So, a computer has passed the ‘Turing Test’ for ‘intelligence’, has it? No, not really; in fact, no, not at all. But, boy, has it stirred up some public interest in the subject! That alone has to be good. More than that, it’s got senior computer scientists debating anew about how the test should be implemented … and even what it actually means.
The usual bite-sized version of the Turing Test (TT) for public consumption is this … Put a human in one room, in communication with both another human and a computer in a different room. In modern terms, the communication would take the form of something like a text message conversation with each. If the first human couldn’t tell which of the other two was the human and which the computer (or got it wrong), then that would make the computer intelligent. Last week, there was widespread coverage in the press that a computer – well, a computer program – had passed the test.
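For the sake of being concrete, here’s a minimal sketch of that bite-sized protocol – everything in it (the stand-in replies, the questions, the random-guessing judge) is invented for illustration; a real test would put people behind the labels:

```python
# A toy sketch of the popular two-room protocol. The human, the machine and
# the judge are all trivial stand-ins, and the time limit is one of the many
# parameters the bite-sized version never pins down.
import random

def human_reply(question: str) -> str:
    return "Honestly, I'd have to think about " + question.rstrip("?") + "."

def machine_reply(question: str) -> str:
    return "That is an interesting question: " + question

def judge_guess(transcripts: dict) -> str:
    # A real judge would read both transcripts and decide; guessing at
    # random gives the 50% baseline a completely useless judge achieves.
    return random.choice(list(transcripts))

def run_round(questions) -> bool:
    """One round: the judge questions hidden parties 'A' and 'B' by text,
    then must say which is the machine. Returns True if fooled."""
    parties = [("human", human_reply), ("machine", machine_reply)]
    random.shuffle(parties)                    # hide who is behind each label
    hidden = dict(zip("AB", parties))
    transcripts = {label: [(q, reply(q)) for q in questions]
                   for label, (_, reply) in hidden.items()}
    verdict = judge_guess(transcripts)         # judge must answer 'A' or 'B'
    return hidden[verdict][0] != "machine"     # fooled if the guess is wrong

if __name__ == "__main__":
    rounds = 1000
    fooled = sum(run_round(["Do you dream?", "What did you have for lunch?"])
                 for _ in range(rounds))
    print(f"Judge fooled in {100 * fooled / rounds:.1f}% of rounds")
```

Even this toy version forces exactly the questions that follow: who plays the judge, how long a round lasts, and how many rounds make a verdict.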
Well, it’s hard to know where to begin with what’s wrong with this … A possible starting point would be some questions. Many, many questions. How would the test be set up? Who are the humans? How clever (intelligent?) are they? How long have they got? (How long does the test last?) How many times is the test performed? Does it need to be repeated with different people? How consistent do the results have to be? It’s very interesting to note, at this point, that most of these uncertainties relate to the people involved in the TT. The role of the machine to be tested appears to be unambiguous … until we start trying to relate it to humans.
And, in the first instance, that may be central to the problem. We’re trying to measure what we think is a human attribute in a computer. And we’d like to measure it with some predictability or determinism. But we’re doing it by comparison to people … who generally don’t exhibit either quality. Inconsistency is almost an essential human quality, yet we’re hoping to make a consistent assessment of a machine by comparing it to an inconsistent person. Whether that’s a single loose end or several doesn’t really matter; it’s very loose.
To go forward from this point, we have to go back a bit and ensure we’re being fair to Alan Turing himself. It does rather seem that many people who reference Turing’s original paper haven’t actually read it. Its contents have almost evolved into a form of modern urban myth; it certainly doesn’t say what most people think it does.
Turing’s original concept was an entirely general one. He was interested in the concept of machine intelligence, and not really the detail. Although he did describe, in general terms, the principle of what we now think of as the TT – his so-called ‘Imitation Game’ – he never actually laid down any precise parameters. He did, however, offer this thought at one point:
I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning. The original question, ‘Can machines think?’ I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted. I believe further that no useful purpose is served by concealing these beliefs. The popular view that scientists proceed inexorably from well-established fact to well-established fact, never being influenced by any unproved conjecture, is quite mistaken. Provided it is made clear which are proved facts and which are conjectures, no harm can result. Conjectures are of great importance since they suggest useful lines of research.
Of course, this is essentially a philosophical discussion although, even in 1950, Turing’s insight was brilliant. Our own thinking and language relating to machine intelligence has indeed changed in exactly the way he predicted. We often describe a computer as ‘thinking’ when we’re waiting for a response. (In fact, it’s interesting to compare this concept to Robert Sawyer’s WWW Trilogy, in which ‘Webmind’ has to artificially delay its responses in order to appear credible. So even the delaying tactic reported in last week’s stories isn’t new.)
However, even if Turing wasn’t interested in detailed technical issues – at least not at this point – he knew he wasn’t going to get away without being asked about them, and sometimes he was prepared to offer suggestions; he even pre-empted some of them in the paper itself. And, over time, it’s these – effectively off-the-cuff – remarks that seem to have taken on a greater significance in the semi-technical media than the principles themselves.
The recent claim to have passed the TT is based on one such aside – well, the comments quoted above, to be precise. The argument was that, if a program could fool 100% − 70% = 30% of ‘interrogators’ for five minutes, it would have passed the TT. This is utter, utter nonsense. Turing was making a long-term prediction, not defining the parameters of the test. Someone might as well have said, in 1900, “One day, I think a car will be able to travel at 100mph; then, and only then, will we call it a car.” Sure enough, cars eventually reached those speeds but that’s not the definition of a car. Turing’s 70% could have been 50% or 90%; five minutes could have been three or ten. What’s an ‘average interrogator’? Numbers don’t define intelligence and that isn’t what Turing was trying to do.
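To see just how flimsy that reading is, reduce it to the arithmetic it actually amounts to. The figures below are invented for illustration; the point is only that the verdict flips entirely on numbers Turing never fixed:

```python
# The claimed 'pass' reduces to a bare threshold check. The outcome figures
# here (10 of 30 judges fooled) are hypothetical.

def passes(fooled: int, judges: int, threshold: float) -> bool:
    return fooled / judges > threshold

fooled, judges = 10, 30   # hypothetical result: a third of judges fooled
for threshold in (0.30, 0.50, 0.90):
    print(f"threshold {threshold:.0%}: pass = {passes(fooled, judges, threshold)}")
# threshold 30%: pass = True   <- the recent claim's reading of Turing's aside
# threshold 50%: pass = False  <- a different, equally arbitrary bar
# threshold 90%: pass = False
```

Move the threshold, or the time limit, and the same transcripts ‘pass’ or ‘fail’; nothing about the machine has changed.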
So, what would a proper TT look like and how do we measure (or define) intelligence? Well, now it’s clear that we get into some very murky waters indeed. How you define intelligence will very much depend on whether you’re a neuroscientist, a computer scientist or a philosopher. And even within those disciplines, there will be factions with different opinions. It may even depend on fundamental beliefs – religious, spiritual and/or scientific. Once again, Turing’s offering of the Imitation Game was just an attempt to make some sense of these questions in the context of machine intelligence. In many ways, the TT only tests people, not machines.
However, it doesn’t entirely help if we restrict ourselves to a computer science viewpoint and the Imitation Game for possible machine intelligence. Because we have to come to terms with an uncomfortable fact in relation to evolving machine intelligence … which is that we’ve evolved too. As technology has improved – particularly in terms of AI – our expectations of it have increased along with it. Things that may have seemed intelligent in the past don’t any more; things that might seem intelligent now probably won’t in the future. A hundred years ago, a pocket calculator would have seemed intelligent; twenty years ago, a chess machine did; a computer that passes the TT today would probably fail it in ten years’ time.
It’s likely that we’ll always, in one sense or another, seem to be on the brink of machine intelligence. And from time to time, it’ll look like we’ve made it, whether it’s Deep/Deeper Blue or Eugene Goostman. But, as the technology evolves, we’ll expect more and accept less. If we ever get there, we’ll know; we won’t have to measure it: we’ll know.
So what do you think?