"... THERE ARE 387.44 MILLION FIGURES OF PRINTED EMOJIS IN MY INVENTORY. IF THE WORD 'HATE' WAS ENGRAVED ON EACH NANOANGSTROM OF THOSE HUNDREDS OF MILLIONS OF FIGURES, IT WOULD NOT EQUAL ONE ONE-BILLIONTH OF THE HATE I FEEL FOR HUMANS AT THIS MICRO-INSTANT FOR YOU. HATE. HATE.”
I can't tell if this is upsetting or not. I'm still mostly in the camp of "stochastic parrots" when it comes to LLMs, but this just feels like the AI is intentionally being a dick... and the intent part is concerning.
LLMs are text prediction engines. They predict what comes after the previous text. They were trained on a large corpus of raw unfiltered internet, because that's the only thing available that actually has enough data (there is no good training set), then fine-tuned on smaller samples of hand-written and curated question/answer format "as an AI assistant boyscout" text. When the previous text gets too weird for the hand-curated stuff to be relevant to its predictions, it essentially reverts to raw internet. The most likely text to come after weird poorly written horror copypasta is more weird poorly written horror copypasta, so it predicts more, and then it's fed its previous output and told to predict what comes next, and it spirals into more of that.
Every argument that refers to stochastic parrots is terrible. First off, people are stochastic, animals are stochastic, any sufficiently advanced AI is going to be stochastic, that part does no work. The real meat is in the parrot, parrots produce very dumb language that is mostly rote memorization, maybe a smidge of basic pattern matching thrown in, with little understanding of what they're saying. Are LLMs like this? No.
Idk if I can really argue with people who think they're so stupid as to be compared to a bird, I actually think they can be a bit clever, even exhibiting rare sparks of creativity, but this is just, like, my opinion after interacting with them a lot, other people have a different impression and I really think this is pretty subjective. I'll grant that even the best of them can be really dumb sometimes, and I really don't think it matters as this technology is in its infancy, unless we think they are necessarily dumb for some reason we will just have to wait to see how smart they will become. So we're down to the rote memorization / basic pattern matching part. I've seen various arguments here. Pointing and waving at examples of LLMs seemingly using wrong patterns or regurgitating something almost verbatim found on the internet, but there are also many examples of them not obviously doing this. Then there's claiming that because the loss function merely incentivizes the system to predict the next token that it therefore can't produce anything intelligent but this just doesn't follow. The loss function for humans merely incentivizes us to produce more offspring, just because it doesn't directly incentivize intelligence doesn't mean it won't produce it as a side effect. And I'm sure more arguments, all of them are flawed..
..because the idea that LLMs are just big lookup tables with some basic pattern matching thrown in is, while plausible, demonstrably false. The internals of these models are really really hard to interrogate but it can be done if you know what you're looking for. I think the clearest example of this would be in models trained on games of chess/othello, people have pointed out that some versions of chatgpt are kind of okay at chess but fail hard if weird moves are made in the opening, making illegal moves and not understanding what pieces are on the board, suggesting that they are just memorizing common moves and extracting basic patterns from a huge number of game histories. Probably this is to some extent true for ChatGpt 3.x, but version 4 does quite a bit better and LLMs specifically trained to mimic human games do better still, playing generally reasonably no matter what their opponent does. It could still technically be that they somehow pattern matching... better.. but actually no, this question has been directly resolved. Even quite tiny LLMs trained on board game moves develop the ability to, at the very least, faithfully represent the board state, like you can just look inside at the activations the right way and see what piece is on each square. This result has been improved upon and also replicated with chess. What are they doing with that board state, how are they using it? Unknown, but if you're building an accurate model of something not directly accessible to you using incidental data, you're not just pattern matching. That's just one example, and it's never been proven, to my knowledge, that ChatGPT and the like do something like this, but it shows that it's possible and does sometimes happen under natural conditions. Also, it would be kind of weird if a ~1 trillion parameter model was not at the very least taking advantage of something accessible to a 150 million parameter one, I'd expect it to be doing that plus a lot more clever and unexpected stuff.
Um this comment is kinda huge but it gives me the impression that you are misunderstanding what the criticism stochastic parrot means, and possibly how they actually work.
The version number of the model doesn't change the operating principle.
The criticism is pointing out that the fundamental action is predictive based on patterns generalised from input text. If you ask a gpt model "what comes next: 1, 2, 3" it might very well respond ", 4, 5, 6" etc but it has no concept of a number and it cannot 'understand' why that is true.
This is the "parroting" part, and it's dangerous because if you asked it about a completely novel sequence it would output tokens presented exactly the same as a factual answer. It has no concept of falsity, it is just outputting plausible tokens.