The Turing test, proposed by Alan Turing, is a way to tell computers and humans apart. The test is pretty simple:
A judge sends text to two unseen responders, one human and one computer, and they send text back. Each responder tries to convince the judge that it is the human. A computer is said to pass the Turing test if it is declared the human about half the time: if the judge picks it as often as chance would, the conversation evidently gave nothing away, and the verdict was effectively a guess.
The Loebner Prize is awarded to computer programs that pass this test. No computer has yet passed it; in fact, none has even won the second-place silver medal. Every year the bronze medal is awarded to the best program of that year.
http://www.loebner.net/Prizef/loebner-prize.html
This is a transcript of the "winner" in 2008. http://loebner.net/Prizef/2008_Contest/Elbot.pdf
As noted, this log represents the best program submitted in 2008. It still failed the Turing test: the judges were able to determine it was the computer.
I have several problems with the use of the Turing test as a benchmark for AI.
It requires computers to lie. The most straightforward question is "Are you a computer?" or "Are you a human?" If the computer answers truthfully, the game is over; it has to pretend to be human. The test will tell us when computers can mimic human speech patterns, but why should that be the pinnacle of AI achievement?
It punishes superiority. Ask a human "What is phi^(ln(pi + sin(13)))?" and they won't have any idea. Ask a computer the same question and it can correctly respond, rather quickly, with ~1.79344. People make errors in typing; machines do not.
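For the curious, that arithmetic checks out; here is a minimal sketch in Python (note the quoted ~1.79344 only comes out if sin is evaluated in degrees; in radians the answer would be about 1.84):

```python
import math

# phi^(ln(pi + sin(13))), with sin(13) taken in degrees
# (the quoted ~1.79344 matches the degree convention;
# radians would give roughly 1.84 instead).
phi = (1 + math.sqrt(5)) / 2               # golden ratio, ~1.6180
inner = math.pi + math.sin(math.radians(13))
result = phi ** math.log(inner)            # math.log is the natural log
print(round(result, 5))                    # -> 1.79344
```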
Relatedly, if the test imposes no automatic delay on messages, there could be cases in which the computer responds "too quickly" to be human.
The Turing test also doesn't really mimic human communication. A chat room like #smogon has several people talking at once, and no one user is responsible for responding to every statement. One of the "laws of chatterbots" I've seen is that every input text must produce a response. This is obviously flawed in a multi-user setting like IRC (and most human communication isn't one-on-one), because three such bots in one channel would produce a combinatorial explosion. As soon as one line is uttered, each of the three responds to it, giving 3 lines. Each bot then responds to the 2 new lines the others wrote, giving 6; each then responds to the 4 newest lines it didn't write, giving 12; and so on, doubling forever (see the sketch below). Some things just don't warrant a response.
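A toy simulation of that rule makes the doubling concrete (a hypothetical sketch; the only behavior modeled is "reply once to every line you didn't write"):

```python
# Toy model: each of three bots replies once to every line it didn't
# write. We track only each line's author, since that is all we need
# to decide who is obligated to reply.
NUM_BOTS = 3

new_lines = ["human"]  # round 0: a single human message starts things off
for round_num in range(1, 6):
    replies = []
    for bot in range(NUM_BOTS):
        # This bot replies to every new line written by someone else.
        replies.extend(bot for line in new_lines if line != bot)
    new_lines = replies
    print(f"round {round_num}: {len(new_lines)} new lines")
# Prints 3, 6, 12, 24, 48 -- the channel never goes quiet.
```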
Given these problems, what tests would you propose to use for measuring advancements in artificial intelligence, or do you consider the Turing test acceptable?