This is an interesting setup and has inspired much research, but it doesn't immediately translate into practical usefulness: a computer system may pass as human, yet still be unable to help me accomplish any task.
Instead, I find I'm mostly interested in a stricter variety: in each interaction the investigator chooses a preferred response; the goal of the computer system is to be chosen as the preferred side as many times as possible. An even stricter variant would be to use not a single human, but a whole panel of experts that can collaborate freely.
In other words, to pass the test the computer system must be able to perform any task that can be expressed through a chat-like interface (which includes audio, video, etc) better than a human expert. Passing this test is my working definition of AGI.
An investigator interacts with a computer system and one or more humans in a conversation that consists of one or more turns.
Each turn consists of one message from the investigator, followed by one reply each from the computer system and the humans.
A message consists of arbitrarily interleaved text, images, video and audio.
After seeing the replies, the investigator chooses one preferred reply (or optionally declares them tied). The computer system and humans are then informed of the chosen reply and the content of all replies.
The investigator can continue for as many turns as desired. At the end of the investigation we compute the win rate for each side as the fraction of its replies that were preferred by the investigator.¹
We may consider a computer system artificially generally intelligent (AGI) if it reaches a 50% win rate, and artificially super intelligent (ASI) if it reaches a 95% win rate against any set of humans.
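The scoring rule above is simple enough to state in code. Here is a minimal sketch in Python; the function and label names are my own, not part of any specified protocol, and ties count as half a win for each side as the footnote describes:

```python
def win_rate(outcomes):
    """Return the computer system's win rate over a list of turn outcomes.

    Each element of `outcomes` records which reply the investigator
    preferred on that turn: "computer", "human", or "tie".
    """
    score = 0.0
    for choice in outcomes:
        if choice == "computer":
            score += 1.0      # preferred reply: full win
        elif choice == "tie":
            score += 0.5      # tie: half a win for each side
    return score / len(outcomes)

# Example: 3 computer wins, 1 tie, 1 human win over 5 turns.
outcomes = ["computer", "computer", "tie", "human", "computer"]
rate = win_rate(outcomes)
print(rate)  # 3.5 / 5 = 0.7
```

Under this rule, a win rate of at least 0.5 would meet the AGI threshold, and at least 0.95 the ASI threshold.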
¹ Ties (if allowed) are counted as 0.5 wins for each side.