Admit it: ‘Artificial general intelligence’ may already be obsolete
Expecting OpenAI’s GPT and other large language models to beat humans at thinking like a human might be missing the point.
Elon Musk filed a lawsuit in San Francisco’s Superior Court accusing OpenAI and its CEO, Sam Altman, of betraying the startup’s initial commitment to openness, the betterment of society, and lack of profit as a motive. Among other things, Musk’s 35-page complaint argues that OpenAI has violated its original deal to share its GPT large language models with Microsoft, which stated that the software giant would lose access to new LLMs once OpenAI had achieved AGI. According to the complaint, OpenAI reached that epoch-shifting moment a year ago with GPT-4, its most powerful model to date.
Musk—who cofounded OpenAI but left in 2018—is at least as entitled as anyone to come up with his own definition of AGI. His complaint describes it as “a general purpose artificial intelligence system—a machine having intelligence for a wide variety of tasks like a human.” That does sound like GPT-4 as I, a mere layperson, experience it in ChatGPT Plus.
But Musk’s declaration that the AGI era is already upon us is hardly the consensus among AI scientists. Even those who think it’s not far off predict arrival dates that are at least a few years away. And GPT-4 falls well short of meeting OpenAI’s own explanation of the term: “A highly autonomous system that outperforms humans at most economically valuable work.”
Consider the evidence:
GPT-4 isn’t remotely autonomous; indeed, it does its best work when humans provide plenty of hand-holding in the form of detailed prompts.
The world is still in the process of figuring out what tasks GPT-4 can do, and we frequently overrate its competence.
That’s not even getting into the fact that OpenAI’s reference to “most economically valuable work” suggests that true AGI may involve not just software but also sophisticated robotics that don’t exist yet.
To guess when OpenAI—or a rival such as Google, Anthropic, Meta, Mistral, or Perplexity—might reach AGI, as OpenAI defines it, is to expect that it’ll be an obvious moment in time. But OpenAI’s definition, like all the others, is squishy and difficult to put to a conclusive test. To riff on Supreme Court Justice Potter Stewart’s famous comment about pornography, maybe we’ll know it when we see it. At the moment, however, I’m convinced that obsessing over AGI’s existence or nonexistence is counterproductive.
The whole notion of AGI is predicated on the assumption that AI started out dumber than a human but could someday match or exceed our level of thinking. Already, though, generative AI is different than human intelligence—far closer to omniscient than any individual flesh-and-blood thinker, yet also preternaturally gullible and prone to blurring fact and fiction in ways that don’t map to common human frailties. That’s because it’s a predictive engine, trained to string together words without truly understanding them. If its present trajectory of simulated brilliance mixed with boneheadedness continues, it might wander off in a direction far afield from most definitions of AGI.
Even if the world lands on a new, more inclusive definition of AGI, it may be hard to prove whether a particular LLM has attained it. Musk’s lawsuit cites proof points of GPT-4’s reasoning power, such as its scoring in the 90th percentile on the Uniform Bar Exam for lawyers and the 99th percentile on the GRE Verbal Assessment. That it can do so is astounding. But acing tests is not synonymous with performing useful work. And even if it were, who gets to decide how many tests an LLM must pass before it’s achieved AGI rather than just bobbled somewhere in its vicinity?
For decades, the Turing Test—which a computer would pass by fooling a human into thinking that it, too, was human—was computer science’s beloved thought experiment for determining when AI had gotten real. Strangely enough, it’s useless as a tool for assessing today’s LLM-based chatbots. But not because they know too little to fake humanity convincingly, or can’t express it glibly enough—but because they betray their artificiality by being so good at churning out endless wordage on more topics than any human knows. AGI could end up in a similar predicament: a benchmark, devised by humans, that’s rendered obsolete by the technology it was meant to measure.
DID YOU HEAR THE ONE ABOUT THE “MAC CAR?”
Last week, Apple’s long, expensive quest to build an autonomous EV entered its rearview-mirror phase—a sad fate my colleague Jared Newman blamed on the company’s sometimes counterproductive pursuit of perfection. Wondering what an Apple car would be like has been an obsession for techies since 2012, when news broke that Steve Jobs had toyed with getting into the automobile business even before there was an iPhone. Or maybe it started in 2008, when reports of a meeting between Steve Jobs and Volkswagen’s CEO led to wild speculation about an “iCar.”
Or how about 1998? According to Snopes, that’s when a joke involving cars designed by software companies began spreading like crabgrass across the internet, eventually evolving into an urban legend involving a Bill Gates keynote and a General Motors press release. Along with a Microsoft car that crashed twice a day and occasionally needed its engine replaced for no apparent reason, it mentioned a “Mac car” that “was powered by the sun, was reliable, five times as fast, twice as easy to drive—but would only run on 5% of the roads.”
I guess I should elaborate a bit. This is from a famous Supreme Court case concerning 'obscenity'. It's almost impossible to provide any kind of definition concerning reason or thinking because it's on the very edge of what we can ever really 'know'. At the same time I know that if we train something on both the questions and the answers and make it really efficient at giving the right answers, it's obviously not thinking, just indexing information. A great example is how AI can't create new information without a seed of absolute randomness. Humans don't have a random bone in their body.
A fun (though outdated) video series about the edge of the knowable:
You know, Alan Turing describes almost the same problem in 1950, though he talks about defining "thinking". He was famously good at reasoning and proposed a solution.
That's kinda the whole point of my comment: things like Turing's method completely fall apart under heavy scrutiny. Further, the Turing Test specifically tells you nothing about whether or not something IS thinking, just that it MAY be. Big difference.
I see you didn't engage with the rest of my comment tho. Would you like to?
Just wanted to add this, as it and stuff like it come up pretty quickly when you research the Turing Test:
"On the other hand, there are several criticisms and limitations of the Turing Test as a measure of machine intelligence. Some of the main issues include:
The test focuses solely on the ability to mimic human-like behavior and communication, rather than on the underlying intelligence or consciousness of the machine.
The test is heavily dependent on the human evaluator’s subjective judgment, and may be influenced by factors such as the machine’s appearance or the human’s own biases.
The test does not take into account the possibility that a machine could be intelligent in ways that are fundamentally different from human intelligence.
The test does not consider the possibility of a machine deceiving the human evaluator, by providing pre-programmed or rehearsed responses rather than truly understanding the meaning of the questions."
LLMs would fall into the last, as they train on the "answers" so to speak and just match them to the "question".
I see you didn’t engage with the rest of my comment tho. Would you like to?
I am not sure if I should. The topic is veering into the spiritual. To me, this is merely a matter of intellectual curiosity. But for many people it is a very emotional subject. I do not wish to cause emotional distress.
We both know this has nothing to do with 'emotional distress' and everything to do with your overly large ego being bruised by the fact that you're wrong. It's classic fallacious behavior to argue as you have and then not engage with the opposition. The only "emotionally distressed" one here is you, and it's honestly really sad considering it's an anonymous forum and nobody even knows that it's you being stupid behind the screen. :/
Huh. An emotional subject, indeed. I didn't think merely pointing it out would be enough to trigger you. Sorry for causing you distress. I'm just not good at picking up emotional cues.
If I were still a teenager, I would not have worried about causing anyone distress. I've had many exchanges with people about matters that touch on the religious or spiritual. I've come to understand some things. Some people, if they stop voicing the "right" opinion, they will be disowned by their families and shunned by their communities. Other people have specific ideas about life after death. To them, if anything contradicts these ideas, then it's like they learn that their relatives are dead and they themselves will die soon. To me, all this is just interesting. It seems cruel to expose others to this kind of threat and emotional distress while I'm just sitting here all comfortable. I'm sure it took me way longer than others to understand that.
I don't know what your situation is. You could have told me not to worry but instead responded rather emotionally. I don't know what to make of that.
But you want a point. I guess I can do that.
We need to step back and ask how we know things. In science, it's all about experiments. You try things out. It's not quite as straight-forward as it seems but we don't need the details. Another way to know things is a legal system. If you want to know whose property something is, science cannot help you. In case of doubt, you have to go to court and get a judgment. There are lots of other ways but we don't need to bother.
Obscenity is not a matter for science. There is no experiment which can determine if something is or isn't obscene. The courts decide and they use no uniform standard.
If reason is like obscenity, then it is for the courts to decide or the law-makers.
I really just don't get why somebody would get emotional over an argument like this but to each their own I suppose. The reason for the emotionality of my reply is rather simply stated: I still don't believe you had any intent to spare anybody 'emotional distress' and were trying to remain aloof and, honestly, rather cunty, by bringing up something literally everybody even mildly interested in AI knows all about as if it's the end all be all of understanding the potential of thinking arising from a machine. On top of that, you purposefully haven't engaged with any of the points directly refuting the things you've said. Honestly, some of the emotionality comes from when I remember being like you, thinking I knew everything, and whenever somebody would hold me to my words I'd do something along the lines of what you're doing (engaging in argumentative discussion dishonestly in order to maintain the appearance of 'winning' when I really should have been learning more and changing my mind instead of bringing up the same tired pop-culture "smart people" bs.)
Anyway,
My point wasn't about obscenity. It's about the nebulousness of something like reason, and the Turing test isn't scientific in the first place, so I'm really not sure where you got all this 'science vs law' bs from.
The point wasn't that reason is like obscenity, but that I can clearly see, from the way that we train LLMs, that they aren't reasoning in any form, rather using values that have been derived over time from the training data fed in and the 'reward' system used to get the right answers over time. An LLM is no more than a complicated calculator, controlled in many ways by the humans that train it, just as with any form of machine learning. The point is rather that I "know it when I see it."
I've read some studies on 'game states', which is the closest that AI scientists have come to anything resembling reason, but even in a model that played the relatively simple game of Othello, the metric they were testing the AI (which was trained on data of legal Othello boardstates) against to 'prove' that it was 'thinking' (creating game states) was that it was doing better at choosing legal moves than random chance. Another reason it might have been doing better than random chance? Oh yeah... the training data full of legal boardstates. And when the AI was trained on less data? Oh? Would you look at that? The margin by which it beats random chance falls drastically. Almost like the LLM has no fucking clue what's going on and it's just matching boardstates... indexing. It doesn't understand the rules of Othello; it's just matching piece placement locations with the legal boardstates it was trained on. A human trained on even a few hundred (vs thousands) of such boardstates could likely start to reason out the rules of the game quite easily.
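For anyone who wants to see what a "random chance" baseline for legal moves could look like, here is a minimal Python sketch. It is not the code or the metric from the studies mentioned above. It assumes the baseline is "pick an empty square uniformly at random and check whether it is a legal move", and every name in it (initial_board, is_legal, legal_moves, random_baseline) is made up for illustration.

```python
# Toy Othello legality check and a random-guess baseline.
# Assumption: the baseline is a uniform guess among empty squares;
# the studies discussed above may define their baseline differently.

EMPTY, BLACK, WHITE = ".", "B", "W"
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def initial_board():
    """Return the standard 8x8 starting position."""
    board = [[EMPTY] * 8 for _ in range(8)]
    board[3][3], board[4][4] = WHITE, WHITE
    board[3][4], board[4][3] = BLACK, BLACK
    return board


def is_legal(board, row, col, player):
    """A move is legal if it brackets at least one run of opponent discs."""
    if board[row][col] != EMPTY:
        return False
    opponent = WHITE if player == BLACK else BLACK
    for dr, dc in DIRECTIONS:
        r, c = row + dr, col + dc
        seen_opponent = False
        # Walk over a contiguous run of opponent discs in this direction.
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
            r, c = r + dr, c + dc
            seen_opponent = True
        # The run must be closed off by one of the player's own discs.
        if seen_opponent and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            return True
    return False


def legal_moves(board, player):
    """List all legal (row, col) moves for the player, in row-major order."""
    return [(r, c) for r in range(8) for c in range(8)
            if is_legal(board, r, c, player)]


def random_baseline(board, player):
    """Probability that a uniformly random empty square is a legal move."""
    empties = sum(row.count(EMPTY) for row in board)
    return len(legal_moves(board, player)) / empties if empties else 0.0


if __name__ == "__main__":
    board = initial_board()
    print(legal_moves(board, BLACK))      # 4 legal moves at the start
    print(random_baseline(board, BLACK))  # 4 of 60 empty squares
```

At the opening position only 4 of the 60 empty squares are legal for Black, so a blind guess is right about 7% of the time. Beating that number is a low bar, and it does not by itself show whether a model has learned the rules or is matching positions it has seen.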
I'm not even against AI or anything, but to call the machine learning that we have now anything close to true, thinking AI is just foolish talk.