It's not making the Turing test obsolete. It was obvious from day one that the Turing test is not an intelligence test. You could simply create a sufficiently big dictionary of "if human says X, respond with Y" and it would fool any person into thinking it's talking with a human, with zero intelligence behind it. The Turing test was always about checking how good a program is at chatting. If you want to test something else, you have to come up with another test. If you want to test chatbots, you will still use the Turing test.
Sounds to me like that sufficiently large dictionary would be intelligent. Like, a dictionary that can produce the correct response to everything said sounds like a system that can produce the correct response to anything said. That system could advise you on your career or invent machines or whatever.
So could a book be considered intelligent if it were large enough to contain the answer to any possible question? Or the search tool that simply matches your input to the output the book provides: would that be intelligence?
To me, something can't be considered intelligent if it lacks the ability to learn.
No, a dictionary is not intelligent. A dictionary simply matches one text to another. A HashMap is not intelligent. But it can fool a human into thinking it is.
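To make the analogy concrete, here is a minimal sketch of that dictionary-as-chatbot idea; the canned replies are made up for illustration. It can sound fluent on inputs it has seen, but there is no reasoning anywhere, just key/value matching.

```python
# A lookup-table "chatbot": maps an exact input string to a canned reply.
canned_replies = {
    "hello": "Hi there! How are you today?",
    "how are you?": "I'm doing great, thanks for asking.",
    "what's your favorite color?": "Probably blue. It reminds me of the sea.",
}

def fake_chatbot(message: str) -> str:
    # Anything outside the table exposes the trick immediately.
    return canned_replies.get(message.lower().strip(), "Interesting, tell me more!")

print(fake_chatbot("Hello"))                                        # canned, sounds human
print(fake_chatbot("Explain quantum tunneling to a five-year-old."))  # generic deflection
```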
"You could simply create a sufficiently big dictionary of 'if human says X, respond with Y' and it would fool any person into thinking it's talking with a human, with zero intelligence behind it."
The Turing Test isn't really intended to identify a computer -- Turing's problem wasn't that we needed a way to identify computers.
At the time -- well, and to some extent today -- some people firmly felt that a computer could not actually think, that that is something "special" that only humans can do.
It's intended to support Turing's argument for a behavioral approach to thinking -- that if a computer can behave indistinguishably from a human that we agree thinks, then that should be the bar for what we talk about when talking about thinking.
There have been people since who have aimed to actually build such a chatbot, but for Turing, this was just a hypothetical to support his argument.
The test was introduced by Turing in his 1950 paper "Computing Machinery and Intelligence" while working at the University of Manchester.[5] It opens with the words: "I propose to consider the question, 'Can machines think?'" Because "thinking" is difficult to define, Turing chooses to "replace the question by another, which is closely related to it and is expressed in relatively unambiguous words."[6]
Turing did not intend for his idea to be used to test the intelligence of programs—he wanted to provide a clear and understandable example to aid in the discussion of the philosophy of artificial intelligence.[82] John McCarthy argues that we should not be surprised that a philosophical idea turns out to be useless for practical applications. He observes that the philosophy of AI is "unlikely to have any more effect on the practice of AI research than philosophy of science generally has on the practice of science."[83][84]
There is, however, still the concept of the Chinese Room thought experiment, and I don't think AI will topple that one for a while.
For those who don't know and don't wish to browse off the site, the thought experiment posits a situation in which a man who does not understand Chinese is placed in a room and told to respond to sets of Chinese characters that come into the room. He has a little booklet of responses—all completely in Chinese—for him to use to send responses out of the room. The thought experiment asks whether the man himself, or the system of the Chinese Room as a whole, can be said to understand Chinese.
With the Turing Test getting all of the media spotlight in AI, machine learning, and cognitive science, I think the Chinese Room should enter the conversation as the field looks towards AGI.
The Chinese Room has already been surpassed by LLMs, which have been shown to contain neurons that activate in such high correlation with abstract concepts like "formal text" or "positive sentiment" that tweaking them is one of the options some LLM-based chatbots present to the user.
Analysis of the activation space has also shown that LLMs categorize and cluster sequences of text representing similar concepts closer to each other, which lets them produce reasonably accurate zero-shot responses to inputs that were never in the training set (that "weren't in the book" of the Chinese Room).
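As a rough illustration of that clustering claim (not the specific interpretability studies above), here is a small sketch assuming the third-party sentence-transformers package and its all-MiniLM-L6-v2 model are available: paraphrases of the same concept end up with much more similar embeddings than unrelated sentences.

```python
# Sketch: sentences expressing the same concept sit closer together in the
# model's embedding space than sentences about unrelated concepts.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "The weather is lovely today.",    # concept A
    "It's a beautiful, sunny day.",    # concept A, paraphrased
    "The stock market fell sharply.",  # unrelated concept B
]
emb = model.encode(sentences)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("A vs A':", cosine(emb[0], emb[1]))  # high similarity
print("A vs B :", cosine(emb[0], emb[2]))  # noticeably lower
```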
Imagine someone asked you "If Desk plus Love equals Fruit, why is turtle blue?"
AI will actually TRY to solve it.
Human nature would be to ask if the person asking the question is having a stroke or requires medical attention.
So, I asked this to the three different conversation styles of Bing Chat.
The Precise style actually tried to solve it, came to the conclusion the question might be of philosophical nature, including some potential meanings, and asked for clarification.
The Balanced style told me basically the same as the other reply by admiralteal, that the question makes no sense and I should give more context if I actually want it answered.
The Creative style told me it didn't understand the first part, but then answered the second part (the turtles being blue) seriously.
You're saying the test would work.
In 43+ years on this planet I've never HEARD someone seriously use "non sequitur" properly in a sentence.
Asking if the intention is sincere would be another flag given the circumstances (knowing they were being tested).
Toss in a couple of real questions like "What is the 42nd digit of pi?" and "What is the square root of -i?", and you'd find the AI pretty quickly.
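Both of those do have definite answers an evaluator could check offline; a quick sketch, assuming the mpmath package is available for the extra digits of pi (cmath is standard library):

```python
# Check the two "real questions" numerically.
import cmath
import mpmath  # assumed available; used only for extra decimal places of pi

mpmath.mp.dps = 60                   # more than enough working precision
pi_str = mpmath.nstr(mpmath.pi, 50)  # "3.1415926535..."
print("42nd digit after the decimal point:", pi_str.split(".")[1][41])

# Principal square root of -i is (1 - i) / sqrt(2).
print("sqrt(-i) =", cmath.sqrt(-1j))
```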
"If Desk plus Love equals Fruit, why is turtle blue?"
Assuming "Desk = x", "Love = y", "Fruit = x+y", and "turtle blue = z", it is so because you assigned arbitrary values to the words such that they fulfill the equation.
The idea that "a computer would deserve to be called intelligent if it could deceive a human into believing that it was human" was already obsolete 50 years ago with ELIZA. Clever though it was, examining the source code made it clear that it did not deserve to be called intelligent any more than today's average toaster does.
And over the past 30 years, ever-evolving chatbots have made it increasingly difficult to administer a meaningful Turing test. It requires care and expertise. It can't be automated, and it can't be done by the average person who hasn't been specifically trained for it. They're much better at fooling people who've never talked to one before, but I think someone with lots of practice identifying the bots of 2013 would still have little trouble catching out those of today.
It cannot be automated or systematized because neural networks are the tool you use to defeat systems like that. If there's a defined, objective test, a neural network can train for/on that test and 'learn' to ace it. It's just what they do.
The only way to test for 'true' intelligence would be to perfectly define it first, such that when the NN aced the test that would prove intelligence. That is, IF you could perfectly define intelligence, doing so would more or less give you all the tools you needed to create it.
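To make the "train on the test" point concrete, here is a minimal sketch assuming scikit-learn, with a made-up three-question benchmark: a small network is fit on the exact test it is then scored on, and it aces it without generalizing at all.

```python
# If the "intelligence test" is a fixed, known set of questions, a network can
# be fit on that exact set and score perfectly without generalizing at all.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# A hypothetical fixed benchmark of question/answer pairs.
questions = ["What is 2+2?", "Capital of France?", "Color of the sky?"]
answers = ["4", "Paris", "blue"]

vec = CountVectorizer()
X = vec.fit_transform(questions)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
clf.fit(X, answers)

print(clf.score(X, answers))  # 1.0 -- a perfect score that proves nothing
```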
All these people claiming we already have general AI or even anything like it have put the cart so far before the horse.
If a neural network can do it, then a neural network can do it... so we either have to accept that a neural network can be intelligent, or that no human can be intelligent.
If we accept that human NNs can be intelligent, then the only remaining question is how to compare a human NN to a machine NN.
Right now, analysis of LLMs shows that they exhibit human-like categorization, superhuman knowledge, and subhuman reasoning. So, depending on the measure, current LLMs can fall anywhere on the scale from "not AGI" to "AGI overlord". It's reasonable to expect larger models, with more multimodal training, to become fully "AGI overlord" by all measures in the very near future.
I disagree with the "limitations" they ascribe to the Turing test - if anything, they're implementation issues. For example:
For instance, any of the games played during the test are imitation games designed to test whether or not a machine can imitate a human. The evaluators make decisions solely based on the language or tone of messages they receive.
There's absolutely no reason why the evaluators shouldn't take the content of the messages into account, and use it to judge the reasoning ability of whoever they're chatting with.
The problem with AI is that it does not understand anything. You can have a completely reasonable-sounding conversation that is just full of stupidity, and the AI does not notice because it does not know anything.
Another issue with AI is that it works until it does not, and that failure can be rather severe and unexpected. Again, because the AI knows nothing.
Seems like we need some test to address this; the two are basically the same problem. Or maybe some training so that the AI can know what it does not know.
Understanding the general sanity of some of their responses. Synthesizing new ideas. Having a larger context. AIs tend to be idiot savants on one hand and really mediocre on the other.
You could argue that this is just a reflection of lack of training and scale but I wonder.
You will change my mind when I have had a machine interaction where the machine does not seem like an idiot.
Edit: AI people call the worst of these "hallucinations", but they are just nonsensical stuff that proves these AIs know nothing and are just dumb correlation engines.
I'm reminded of the apocryphal Gandhi quote: "first they ignore you, then they laugh at you, then they fight you, then you win." It seems like the general zeitgeist is somewhere between the laugh and fight stages for AI right now.
It’s just too scary to acknowledge. Same thing with aliens. They’re both horrifying literally beyond imagination, and both for the same reason, and so it’s more natural to avoid acknowledging it.
Everything we’ve ever known is a house of cards and it’s terrifying to bring that to awareness.
🤖 I'm a bot that provides automatic summaries for articles:
To try to answer this question, a team of researchers has proposed a novel framework that works like a psychological study for software.
This is why the Turing Test may no longer be relevant, and there is a need for new evaluation methods that could effectively assess the intelligence of machines, according to the researchers.
During the Turing Test, evaluators play different games involving text-based communications with real humans and AI programs (machines or chatbots).
The same applies to AI, according to a study from Stanford University, which suggests that machines that can self-reflect are more practical for human use.
“AI agents that can leverage prior experience and adapt well by efficiently exploring new or changing environments will lead to much more adaptive, flexible technologies, from household robotics to personalized learning tools,” Nick Haber, an assistant professor from Stanford University who was not involved in the current study, said.
"It doesn't tell us anything about what a system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence," Mustafa Suleyman, an AI expert and co-founder of DeepMind, told Bloomberg.