The Turing test, proposed by Alan Turing, is a way to tell computers and humans apart. The test is pretty simple:
A judge sends text to two unseen responders, one human and one computer, and they send text back. Each responder tries to convince the judge that it is the human. A computer is said to pass the Turing test if it is declared the human about half the time: if the judge picks it as often as chance would, the conversation evidently gave nothing away, and the verdict was effectively a guess.
The Loebner Prize is awarded to computer programs that pass this test. No computer has yet passed it; in fact, none has even won the second-place silver medal. Every year the bronze medal is awarded to the best program of that year.
http://www.loebner.net/Prizef/loebner-prize.html
This is a transcript of the "winner" in 2008. http://loebner.net/Prizef/2008_Contest/Elbot.pdf
As noted, this log represents the best program submitted in 2008. It still failed the Turing test: the judges were able to determine it was the computer.
I have several problems with the use of the Turing test as a benchmark for AI.
It requires computers to lie. The most straightforward question is "Are you a computer?" or "Are you a human?" If the computer answers truthfully, the game is over; it has to pretend to be human. The test will tell us when computers can mimic human speech patterns, but why should that be the pinnacle of AI achievement?
It punishes superiority. Ask a human "What is phi^(ln(pi + sin(13)))?" and they won't have any idea. Ask a computer the same question and it can correctly respond, rather quickly, with ~1.79344. People make errors in typing; machines do not.
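For the curious, that arithmetic checks out; here is a minimal sketch in Python (note the quoted ~1.79344 only comes out if sin is evaluated in degrees; in radians the answer would be about 1.84):

```python
import math

# phi^(ln(pi + sin(13))), with sin(13) taken in degrees
# (the quoted ~1.79344 matches the degree convention;
# radians would give roughly 1.84 instead).
phi = (1 + math.sqrt(5)) / 2               # golden ratio, ~1.6180
inner = math.pi + math.sin(math.radians(13))
result = phi ** math.log(inner)            # math.log is the natural log
print(round(result, 5))                    # -> 1.79344
```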
Relatedly, if the test imposes no automatic delay on messages, there could be cases in which the computer responds "too quickly" to be human.
The Turing test also doesn't really mimic human communication. A chat room like #smogon has several people talking at once, and no one user is responsible for responding to every statement. One of the "laws of chatterbots" I've seen is that every input text must produce a response. This is obviously flawed in a multi-user setting like IRC (and most human communication isn't one-on-one), because three such bots in one channel would produce a combinatorial explosion. As soon as one line is uttered, each of the three responds to it, giving 3 lines. Each bot then responds to the 2 new lines the others wrote, giving 6; each then responds to the 4 newest lines it didn't write, giving 12; and so on, doubling forever (see the sketch below). Some things just don't warrant a response.
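A toy simulation of that rule makes the doubling concrete (a hypothetical sketch; the only behavior modeled is "reply once to every line you didn't write"):

```python
# Toy model: each of three bots replies once to every line it didn't
# write. We track only each line's author, since that is all we need
# to decide who is obligated to reply.
NUM_BOTS = 3

new_lines = ["human"]  # round 0: a single human message starts things off
for round_num in range(1, 6):
    replies = []
    for bot in range(NUM_BOTS):
        # This bot replies to every new line written by someone else.
        replies.extend(bot for line in new_lines if line != bot)
    new_lines = replies
    print(f"round {round_num}: {len(new_lines)} new lines")
# Prints 3, 6, 12, 24, 48 -- the channel never goes quiet.
```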
Given these problems, what tests would you propose to use for measuring advancements in artificial intelligence, or do you consider the Turing test acceptable?