Note: the actual paper's title ends in a question mark, and its "discussion" section states:
We emphasize that nothing in this paper should be interpreted as claiming that large language models cannot display emergent abilities; rather, our message is that previously claimed emergent abilities in [3, 8, 28, 33] might likely be a mirage induced by researcher analyses.
It is clear to anyone who has used them and understands the task they were trained on, that LLMs do have emergent abilities. This paper is a refutation of specific claims in other papers, which the authors argue use an inappropriate metric to show "sudden" emergence rather than a "smooth" one.
To clarify: the authors/Stanford used this exact declarative (non-question) title for their press release: https://hai.stanford.edu/news/ais-ostensible-emergent-abilities-are-mirage, which also ended up being the title of the previous post on !artificial_intel@lemmy.ml. As already noted by @huginn@feddit.it, this “AI’s Ostensible” title is therefore well in line with the paper’s actual conclusion, which is to refute current claims of emergence. I picked the “AI’s Ostensible” title because it comes from the authors/their employer, for clarity (especially when quoted inside a larger Lemmy post title), and for continuity with the previous post.
It is clear to anyone who has used them and understands the task they were trained on, […]
Yet where is the proof? This is exactly the wishy-washy kind of unsubstantiated claim that this paper investigated and refuted.
[…] that LLMs do have emergent abilities.
I think you should really not drop that sentence immediately in front of your quite selective quote — the authors put it in emphasis for good reasons:
Ergo, emergent abilities may be creations of the researcher’s choices, not a fundamental property of the model family on the specific task.
So regarding “emergent abilities,” the authors quite clearly argue from their analysis that it is the cherry-picked metrics, if anything, that carry these “emergent abilities,” not the LLMs.
This paper is a precise refutation of all current claims of emergence, showing them to be nothing more than bad measurements.
It is clear to anyone who has used them and understands the task they were trained on, that LLMs do have emergent abilities.
Not by the definition in this paper they don't. They show linear improvement, which is not emergent. The definition used is:
As the complexity of a system increases, new properties may materialize that cannot be predicted even from a precise quantitative understanding of the system’s microscopic details.
The capabilities displayed by LLMs all fall on a linear progression when you use the correct measures. That is the antithesis of emergent behavior.
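To make the metric argument concrete, here is a minimal toy sketch in Python. It is not the authors' code. The power-law constants, the 10-token answer length, and the assumption that tokens are scored independently are all made up for illustration. It scores the same hypothetical model family twice: once per token, and once with an all-or-nothing exact-match metric.

```python
import math

SEQ_LEN = 10  # hypothetical: number of tokens that must ALL be right for an exact match

def per_token_accuracy(n_params: float) -> float:
    """Probability of getting a single token right (toy model)."""
    # Assumed toy scaling law: per-token cross-entropy falls as a power law in model size.
    cross_entropy = (n_params / 1e8) ** -0.25
    return math.exp(-cross_entropy)

def exact_match_accuracy(n_params: float, seq_len: int = SEQ_LEN) -> float:
    """Probability that every token of a seq_len-token answer is right."""
    # Tokens are assumed independent, so the per-token probabilities multiply.
    return per_token_accuracy(n_params) ** seq_len

MODEL_SIZES = [10 ** k for k in range(7, 13)]  # hypothetical sizes: 1e7 .. 1e12 parameters

def metric_table():
    """One row per model size: (size, per-token accuracy, exact-match accuracy)."""
    return [(n, per_token_accuracy(n), exact_match_accuracy(n)) for n in MODEL_SIZES]

if __name__ == "__main__":
    for n, token_acc, exact_acc in metric_table():
        print(f"{n:.0e}  per-token={token_acc:.3f}  exact-match={exact_acc:.5f}")
```

In this toy setup the per-token score climbs steadily across every model size, while the exact-match score sits near zero for the smaller models and only becomes visible at the largest ones. The underlying model family is identical in both columns; only the metric differs.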
Again: that does not preclude emergence in the future, but it strongly refutes present claims of emergence.
That's a weird definition. Is it a widely used one? To me, emergence meant acquiring capabilities not specifically trained for. I don't see why it matters whether they appear suddenly or linearly. I guess that's an argument in safety discussions?
That definition is based on how the paper approached it and seems to be a generally accepted definition. I have only read a bit of the paper, but it seems to highlight that how we've been evaluating LLMs says a lot more about their apparent emergent capabilities than any actual outcome does.
It's not the definition in the paper. Here is the context:
The idea of emergence was popularized by Nobel Prize-winning physicist P.W. Anderson’s “More Is Different”, which argues that as the complexity of a system increases, new properties may materialize that cannot be predicted even from a precise quantitative understanding of the system’s microscopic details.
What this means is that we cannot, for example, predict chemistry from physics. Physics studies how atoms interact, which yields important insights for chemistry, but physics cannot be used to predict, say, the table of elements. Each level has its own laws, which must be derived empirically.
LLMs obviously show emergence. Knowing the mathematical, technological, and algorithmic foundations tells you little about how to use (prompt, train, ...) an AI model. Just like knowing cell biology will not help you interact with people, even if they are only colonies of cells working together.
The paper talks specifically about “emergent abilities of LLMs”:
The term “emergent abilities of LLMs” was recently and crisply defined as “abilities that are not present in smaller-scale models but are present in large-scale models; thus they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models”
The authors further clarify:
In this paper, [...] we specifically mean sharp and unpredictable changes in model outputs as a function of model scale on specific tasks.
Bigger models perform better. An increase in the number of parameters correlates with an increase in performance on tests. It had been alleged that some abilities appear suddenly, for no apparent reason. These “emergent abilities of LLMs” are a very specific kind of emergence.
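The "cannot be predicted by simply extrapolating" part of that definition can also be illustrated with a toy example. The sketch below is not from the paper. The scaling law, the model sizes, and the 10-token exact-match task are assumptions chosen for illustration, and the law is an exact power law by construction, so the first extrapolation works perfectly here in a way real data would not guarantee. It fits a straight line to three small hypothetical models and extrapolates to a much larger one, once on log cross-entropy and once directly on exact-match accuracy.

```python
import math

SEQ_LEN = 10  # hypothetical answer length in tokens

def toy_cross_entropy(n_params: float) -> float:
    # Assumed toy scaling law (exact power law by construction).
    return (n_params / 1e8) ** -0.25

def toy_exact_match(n_params: float) -> float:
    # All SEQ_LEN tokens right, assuming independent tokens.
    return math.exp(-SEQ_LEN * toy_cross_entropy(n_params))

def extrapolate(xs, ys, x_new):
    """Ordinary least-squares line through (xs, ys), evaluated at x_new."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y + slope * (x_new - mean_x)

def compare_extrapolations(small=(1e7, 1e8, 1e9), large=1e12):
    """Return (prediction via log cross-entropy, naive prediction, actual value)."""
    xs = [math.log10(n) for n in small]
    x_large = math.log10(large)
    # Extrapolate the smooth quantity, then convert it to exact-match accuracy.
    log_ce = extrapolate(xs, [math.log10(toy_cross_entropy(n)) for n in small], x_large)
    via_cross_entropy = math.exp(-SEQ_LEN * 10 ** log_ce)
    # Extrapolate the all-or-nothing metric directly.
    naive = extrapolate(xs, [toy_exact_match(n) for n in small], x_large)
    return via_cross_entropy, naive, toy_exact_match(large)

if __name__ == "__main__":
    via_ce, naive, actual = compare_extrapolations()
    print(f"via cross-entropy: {via_ce:.4f}  naive: {naive:.4f}  actual: {actual:.4f}")
```

In this toy world the large model's exact-match score is recoverable from the small models if you extrapolate the smooth quantity, but badly underestimated if you extrapolate the exact-match numbers themselves. Whether an ability looks "unpredictable" therefore depends on which quantity you extrapolate.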
That was my feeling reading the paper. I feel that LLMs are overhyped, but linear vs. superlinear growth in metrics is a different issue and can't be a refutation of what has traditionally been thought of as emergent properties.
In other words, this is refutation by redefinition.