The Turing test
And now for something completely different.
The Turing test for distinguishing humans from machines is one of the most famous concepts in the philosophy of our field. But what was Turing's original idea, and which purposes does the test serve?
Versions of the Turing test
The mainstream version of the Turing test (also known as the Standard Turing Test) proceeds as follows, defining three parties:
- A is a machine, attempting to disguise itself as a human.
- B is a real human.
- C is a human who converses with both A and B through a textual channel.
C compares the answers to the questions he puts to A and B; A passes the Turing test if it is able to fool C into believing that it is the real human.
However, the original test proposed by Turing was quite different from this direct man-machine comparison, and is called the imitation game.
In the original imitation game:
- A is a man.
- B is a woman.
- C is a human interrogator.
Both A and B try to convince C that they are the woman. The question posed by the test is: when A is replaced by a machine with the goal of impersonating the woman, will C decide wrongly as often as when A is a man?
There are several reasons for not considering the two tests equivalent, and for declaring the original imitation game more powerful:
- in the imitation game, similarity to a human is not the criterion for detecting intelligence, although human intelligence is used as part of the test to set a reasonable baseline. In fact, chatbots such as the famous ELIZA have at times fooled people in informal versions of the standard Turing test, yet we still don't consider them intelligent.
- moreover, in this original version a machine can outperform a human: in that case, it will convince the interrogator more often than the man himself.
- the foundation of the imitation game is statistical: the metric is the frequency of failure or success.
We can say that the imitation game requires a superior level of intelligence as it would check not only the knowledge of natural language and reasoning, but also the ability to impersonate and lie.
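To make the statistical criterion concrete, here is a minimal sketch in Python. It assumes we have already recorded, for each round, whether C decided wrongly; the function names, the `tolerance` parameter, and the sample outcomes are all hypothetical and only serve as illustration.

```python
def wrong_rate(rounds):
    """Fraction of rounds in which the interrogator C decided wrongly."""
    if not rounds:
        raise ValueError("need at least one round")
    return sum(1 for wrong in rounds if wrong) / len(rounds)


def machine_does_as_well(man_rounds, machine_rounds, tolerance=0.0):
    """True if C is wrong at least as often with the machine as with the man.

    `tolerance` is a hypothetical slack for small differences in frequency.
    """
    return wrong_rate(machine_rounds) >= wrong_rate(man_rounds) - tolerance


# Made-up outcomes: True means C decided wrongly in that round.
man_rounds = [True, False, False, True, False,
              False, False, True, False, False]
machine_rounds = [True, False, True, False, False,
                  True, False, False, True, False]

if __name__ == "__main__":
    print(wrong_rate(man_rounds), wrong_rate(machine_rounds))
    print(machine_does_as_well(man_rounds, machine_rounds))
```

With so few rounds the difference between the two frequencies means very little; a real experiment would need many more rounds and a proper significance test before drawing any conclusion.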
Criticism of the test
Despite Alan Turing's status in the Olympus of computer science, many criticisms have been leveled at his test. For example, a computer has to simulate human behavior just to pass the test, even where it could offer a better performance in the questioning; the machine has to introduce typing errors and model some "artificial stupidity".
In general, the sets of human behavior and intelligent behavior overlap only partially: a machine may be more intelligent than a human (like IBM's Watson becoming a Jeopardy! champion) but unable to show it, or unable to replicate human abilities such as the sense of smell while nevertheless possessing the ability to reason.
In essence, paraphrasing Peter Norvig: planes are tested for how well they fly, not for how they imitate birds or how they can fool pigeons into believing that they are pigeons too. As such, the validation of artificial intelligence results never passes through the Turing test.
AI has simpler goals and validation methods than the natural language test. Consider for example object recognition: its goal is not to imitate a human completely, producing a verbal description of an object and reasoning about its use, but just to detect the presence of an object or a face from a predetermined set inside a video or a large set of photographs.
Validation is performed by calculating precision and recall over a set of new photographs, never used in the training of the mechanism under test; this and many other examples from machine learning show how artificial intelligence has many interesting applications even without the ability to build real thinking machines.
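As a rough illustration of this kind of validation, the sketch below computes precision and recall for a detector on a held-out set. The labels are invented for the example (True means "the object is present"), and `precision_recall` is a hypothetical helper, not part of any particular library.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of booleans."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Precision: how many of the detections were correct.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: how many of the real objects were detected.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up ground truth and detector output for eight unseen photographs.
actual = [True, True, True, False, False, True, False, False]
predicted = [True, False, True, True, False, True, False, False]

if __name__ == "__main__":
    precision, recall = precision_recall(predicted, actual)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Nothing in this measurement asks whether the detector "thinks"; it only counts how often its answers match the labels on photographs it has never seen.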
The Completely Automated Public Turing test to tell Computers and Humans Apart, CAPTCHA to its friends, is an example of a reverse Turing test. In this case, a machine (hence the "automated" part) has to be able to tell the difference between a human and another machine. Usually, this means distinguishing between legitimate human users and abusers such as bots and spammer tools.
Captchas have many problems, such as the poor accessibility of image-based implementations, which has led to audio equivalents. Nevertheless, the arms race between more powerful recognition algorithms and more deformed captchas continues.
The result is captchas made of incomprehensible words, and spammers who choose to rent cheap human labor on Amazon's Mechanical Turk to solve thousands of captchas for a few dollars.
There are alternatives to captchas, based on making the problem more difficult or just different from the usual deformed text image. We have yet to see if these new captchas are a form of security through obscurity or truly new challenges for artificial intelligence.
Returning to the philosophical side of the Turing test, it is considered only a behavioral test: its assumption is that we are only interested in distinguishing between machines and humans by their external behavior, not by the presence of consciousness.
Even if we had a perfect imitation of a human, could we say that it is really a thinking machine (maybe acquiring the rights of a living being as a result)? The human mind may be just information and patterns, so that it can be reproduced inside a computer; or it may be something more, due to particular biological and chemical properties of the brain. The Turing test does not address this concern, defining equivalence as identical behavior in a very dualist way.
But that's a story for another day...