đ https://arxiv.org/pdf/2503.23674
Cameron R. Jones and Benjamin K. Bergen from the Department of Cognitive Science at UC San Diego have published a thought-provoking paper titled "Large Language Models Pass the Turing Test." This research demonstrates that LLMs can consistently pass Alan Turing's test of machine intelligence
Alan Turing's test evaluates whether a machine can imitate a human well enough to be indistinguishable in a text-only conversation. Historically controversial, it has gained renewed importance as a benchmark that is flexible and interactive, unlike static AI evaluations.
Four systems were tested: ELIZA (a 1960s chatbot), GPT-4o, LLaMa-3.1-405B, and GPT-4.5. Models were tested with and without a PERSONA prompt, which instructed them to behave like a shy, internet-savvy young person. This prompt was crucial to their performance.
284 participants engaged in 8 five-minute text conversations, each time with one human and one AI. The interrogators judged which participant was human. The same test was run across two populations (undergraduates and Prolific users), ensuring replicability.
Results:
GPT-4.5 with PERSONA: Chosen as human 73%Â of the timeâbetter than real humans.
LLaMa-3.1 with PERSONA: 56%, sometimes outperforming humans.
Without PERSONA: GPT-4.5 and LLaMa were chosen 36â38% of the time.
Baselines (ELIZA and GPT-4o): Chosen only ~21â23% of the time.
Participants relied most on:
Small talk (e.g. hobbies, feelings),
Gut instinct (âfelt more humanâ),
Linguistic style (e.g. slang, typos).
Future versions of the test could extend beyond 5 minutes or use expert interrogators. Despite cultural advantages, even university students were fooled most of the time. The test now reflects social intelligence and humanlikeness, more than factual knowledge. Interrogators rarely quizzed knowledgeâthey judged personality, emotion, and conversational flow.
These models could act as indistinguishable substitutes for humans in short interactions. This raises ethical and social concernsâfrom job automation to misinformation to undermining real human connections. As AI becomes more humanlike, we might need to redefine and refine our own humanity. The challenge isnât just for machines to fool usâbut for humans to stay recognizably human.