Intelligence Tests for Developing Agents
by JS
The Turing Test is a famous test of machine intelligence. In the basic setup, a human communicates with an agent through some kind of computer terminal (so the agent is never visible). The human is tasked with determining whether the agent is a machine or another human. The goal of machine agents is to imitate natural conversation to the point of being indistinguishable from a human participant.
Like many good ideas, this test has caused a lot of controversy over the years. As artificial intelligence has moved away from the study of the mind and into concrete application areas, discussions about intelligence tests (Turing and otherwise) have languished somewhat. There are occasional exceptions to this trend. For instance, at this year's ICDL in Ann Arbor, Jivko Sinapov and Alexander Stoytchev presented an intelligence test for robots in their award-winning paper, "The odd one out task: Toward an intelligence test for robots."
One property that intelligence tests all seem to share is that they depend critically on human observers building internal models of agents that are indistinguishable from the models they build for other humans. Even in constrained situations like the odd one out task, these assessments usually take the form of a task that both humans and other agents can perform. Our interpretation of the functional nature of that performance becomes the basis for deciding whether the agent has "passed" the test.
This leads to a number of interesting problems. For one, there is a many-to-one correspondence between methods and results (many methods may yield the same behavior), so our attribution of human-like models to agents may be mistaken: the agent could be doing something simple while still producing the correct contextual behavior. There is also the opposite problem: intelligent behavior need not look like human behavior, so even when we fail to attribute human-like models to an artificial agent, we may be missing other signs of intelligence. I imagine this kind of error gives headaches to science fiction authors and to people searching for intelligent life elsewhere in the universe.
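To make the many-to-one point concrete, here is a toy sketch. It is a hypothetical illustration, not anything from the papers mentioned: the agents, the arithmetic task, and the fixed question set are all invented for the example. One agent applies a general rule; the other simply memorizes the answers to the questions it will be asked. On the test set an observer cannot tell them apart; the difference only appears on questions outside it.

```python
# Hypothetical toy example: two agents with very different internals
# that behave identically on a fixed set of test questions.

def reasoning_agent(a: int, b: int) -> int:
    """Answers by applying a general rule (here, addition)."""
    return a + b

# The fixed questions the (hypothetical) observer will ask.
TEST_QUESTIONS = [(2, 3), (10, 7), (0, 0), (41, 1)]

# A second agent that has merely memorized the answers to those questions.
LOOKUP_TABLE = {(2, 3): 5, (10, 7): 17, (0, 0): 0, (41, 1): 42}

def lookup_agent(a: int, b: int):
    """Answers by table lookup; returns None for anything not memorized."""
    return LOOKUP_TABLE.get((a, b))

def observed_behavior(agent, questions):
    """All the observer ever sees: the agent's answers to the test questions."""
    return [agent(a, b) for a, b in questions]

# Same observed behavior, different methods.
print(observed_behavior(reasoning_agent, TEST_QUESTIONS) ==
      observed_behavior(lookup_agent, TEST_QUESTIONS))  # True

# Only a question outside the test set separates them.
print(reasoning_agent(5, 5), lookup_agent(5, 5))  # 10 None
```

Judged purely on the test questions, both agents "pass"; attributing understanding of addition to either one is an inference the behavior alone does not license.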
There have been a number of alternative proposals for intelligence tests geared towards developmental agents, most recently in a paper by Paul Cohen. In that paper, Cohen summarizes several possible replacements for the venerable Turing Test:
- Robot soccer
- Never ending language learning
- A virtual third grader
Unlike these proposals, the new body problem lacks an easily describable goal. This seems to arise naturally in any relatively task-free scenario, and many of Cohen's examples skirt the issue by incorporating a variety of tasks or task-like elements into each test. The new body problem simply states a desire to learn about a new body starting from scratch. It's simple, but the criterion for success is still vague. What ought an agent that can solve the new body problem be able to achieve? Certainly we can imagine any number of tasks that we could use to test an agent, but any restriction on the tasks would lead us back into the morass of subjectively attributing human-like models of intelligence to agents.
Here's a possibility that I'm currently considering. What if the end goal of an agent solving the new body problem is to describe itself? We would still have to apply subjective judgments to the descriptions it provides (e.g. a low-level inventory of raw sensors and motors is not really demonstrative of the kind of development we would like to see). On the other hand, we do not seem to bring the same a priori biases to evaluating an agent's description of itself as we would to evaluating an agent within a particular task domain. In other words, we would have an open-ended evaluation for an open-ended learning problem, which strikes me as precisely the kind of particularist approach we would ultimately need if we ever wanted to truly test for intelligence.
