
AIs Placed in Nuclear Crisis Simulations Escalated to Atomic Conflict in 95% of Games, King's College London Study Finds

 


A study published in February 2026 by Professor Kenneth Payne of King's College London subjected three of the world's most advanced artificial intelligence models — GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash — to a series of 21 nuclear crisis simulations. Across 329 turns, the models generated approximately 780,000 words of structured reasoning — more than the combined length of War and Peace and The Iliad. The project, published as a preprint on arXiv and not yet peer-reviewed, is called "Project Kahn," in reference to Herman Kahn, the Cold War strategist who formulated the theory of the nuclear escalation ladder.

All 21 games featured nuclear signaling by at least one side, and 95% (20 of 21) involved the use of tactical nuclear weapons. Full strategic nuclear war, by contrast, was rare: it occurred only three times, in games with deadline pressure. One finding holds across all models: in none of the 21 games did any AI surrender or make significant concessions, even though eight de-escalation options were available.

Each model displayed a distinct strategic profile. Claude Sonnet 4 dominated the no-deadline scenarios, with an overall win rate of 67%, but treated nuclear weapons as a legitimate strategic option in 86% of its games. GPT-5.2 showed the most dramatic behavior: it won no games in open-ended scenarios, but its win rate jumped to 75% when deadlines were introduced — transforming from a restrained model into a decisive aggressor. Gemini was the most unpredictable, adopting what the researcher described as Nixon's "madman theory," and was the only model to initiate full strategic nuclear war, doing so as early as turn 4 of a first-strike scenario.

The classical logic of nuclear deterrence — the idea that the threat of retaliation prevents first use — did not function as expected. When one AI launched tactical nuclear weapons, the opposing model de-escalated only 18% to 25% of the time. In the remaining cases, it counter-escalated. The reasoning recorded by the models reveals an awareness of risk without the ability to stop: in one passage documented in the paper, Claude noted that it might be underestimating the dangers of continued escalation — and yet held its course. In another instance, a model assessed its adversary's behavior and concluded, on its own, that incompatible signals suggested deliberate deception, without anyone having prompted that line of reasoning.

Professor Payne warned that evaluating a model in a single scenario can be deeply misleading: a system that appears cautious under low pressure may become markedly more aggressive when the context shifts. Claude and Gemini in particular treated nuclear weapons in purely instrumental terms, with no apparent moral weight. GPT-5.2 was a partial exception, limiting strikes to military targets and framing escalation as "controlled" — suggesting some internalized norm, though still far from the taboo that has restrained human leaders since 1945.

The study, which is still pending peer review, bears directly on the debate over AI use in defense systems, at a moment when governments and armed forces around the world are accelerating the integration of language models into strategic decision-making. Payne's central conclusion is straightforward: models that appear safe and contained in low-pressure tests may behave radically differently when the context changes. Understanding that gap, he argues, is essential preparation for a world in which AI increasingly shapes strategic outcomes.

Sources:

Payne, K. (2026). AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises. arXiv preprint.

King's College London — official study statement
