8 April 2025

The most important thing about LLMs is that they give something shaped like an answer.

Side note: the terms we ended up using for AI stuff ("LLM", "ChatGPT") sure are clunky. Every sci-fi writer who gave their AIs elegant and meaningful names vastly overestimated the aesthetics of the kind of guys who ended up developing AI.

The answer-shaped thing doesn't have to actually be the answer, nor does it have to bear any real relation to the answer. But as long as you have something that sounds like an answer, you have a baseline that's clearly better than random. That means you can hill-climb by messing around with how you format the problem and what sort of information you provide, and eventually you'll probably reach something that looks like a reasonable score on your benchmark of choice.
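To make that loop concrete, here's a minimal sketch of prompt-format hill climbing. Everything in it is hypothetical: `call_llm` stands in for whatever model API you use, `PIECES` is an arbitrary space of formatting choices, and the benchmark is just a list of (question, gold answer) pairs scored by exact match.

```python
import random

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; plug in your API of choice."""
    raise NotImplementedError

# An arbitrary (illustrative) space of prompt-formatting choices.
PIECES = {
    "preamble": ["", "You are an expert.\n", "Answer carefully.\n"],
    "q_prefix": ["Q: ", "Question: ", ""],
    "a_prefix": ["A:", "Answer:", "The answer is"],
}

def render(fmt: dict, question: str) -> str:
    return f"{fmt['preamble']}{fmt['q_prefix']}{question}\n{fmt['a_prefix']}"

def score(fmt: dict, benchmark: list[tuple[str, str]]) -> float:
    """Exact-match accuracy on (question, gold) pairs."""
    hits = sum(call_llm(render(fmt, q)).strip() == gold for q, gold in benchmark)
    return hits / len(benchmark)

def hill_climb(benchmark: list[tuple[str, str]], steps: int = 50) -> dict:
    current = {k: random.choice(v) for k, v in PIECES.items()}
    best = score(current, benchmark)
    for _ in range(steps):
        # Mutate one formatting choice; keep it only if the score improves.
        key = random.choice(list(PIECES))
        candidate = {**current, key: random.choice(PIECES[key])}
        s = score(candidate, benchmark)
        if s > best:
            current, best = candidate, s
    return current
```

Note that nothing in this loop ever checks whether the outputs are actually answers; it only checks whether they match the benchmark, which is exactly the dynamic described above.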

Importantly, this holds for virtually any task you can formulate in language. You can ask an LLM to do composition, text editing, coding, question answering, and so on, even weird stuff like IGT (interlinear glossed text) generation. If you don't actually check the quality of the results, you could easily conclude you have an omniscient autonomous program. However, if you do check, you often find the results are far worse than those of standard SOTA neural models.

This is not a criticism of LLMs, which are being used for a million tasks far from their original purpose. It is, however, a clear risk in how people use LLMs. Any time a user unknowingly performs a fairly out-of-domain task, they likely receive very low-quality results, but results that are well-formatted and plausible. If users are not clearly aware of what the program is actually doing (and the term "hallucination" probably doesn't help), there is a serious risk of catastrophic misinformation.

On a positive note, this behavior is also probably why in-context learning works so well. The goal is to find a structure in P(t) space that maximizes the overall probability, and having in-context examples of similarly shaped structures probably helps the model find the right answer.
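One rough way to gloss that in standard autoregressive notation (my framing, not a formalism from this post):

```latex
% The model factorizes sequence probability token by token,
\[
  p_\theta(t) = \prod_{i=1}^{|t|} p_\theta(t_i \mid t_{<i}),
\]
% and decoding (approximately) hunts for a high-probability continuation.
% Prepending k examples (x_1, y_1), \ldots, (x_k, y_k) to the query x
% conditions that hunt:
\[
  t^\ast \approx \operatorname*{arg\,max}_{t}
    p_\theta\bigl(t \mid x_1, y_1, \ldots, x_k, y_k, x\bigr),
\]
% shifting probability mass toward continuations shaped like the y_j's,
% which is plausibly why similarly shaped examples help.
```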

mg