Don't Just Tell Me Your p(doom), Tell Me Your Conditionals
Rather than asking, "What's your p(doom)?" we should be asking, "Under what conditions does AI risk increase or decrease?"
At AI conferences, researchers casually trade their p(doom) predictions like they’re restaurant recommendations. p(doom) is a concept that emerged from artificial intelligence safety discussions, representing the probability that artificial general intelligence (AGI) development will lead to catastrophic outcomes for humanity. It’s often expressed as a number between 0 and 1, where 0 equates to no chance of doom and 1 means 100 percent certainty of doom.
What originally began as a joke has evolved into a serious metric. There is even a Wikipedia page collecting p(doom) estimates. Geoffrey Hinton, who won the 2024 Nobel Prize in Physics for his work on AI, puts the number between 10 and 50 percent. Yann LeCun, Meta’s chief AI scientist, puts his at less than 0.01 percent. Even Lina Khan, the former Federal Trade Commission chair, proffered a guess of around 15 percent. It has become a common shorthand in AI risk discussions, allowing researchers and policymakers to quantify and compare their assessments of AI risk.
But I find all the talk of p(doom) deeply unsatisfying.
Let’s take one of the lowest numbers, p(0.05), or a 5 percent chance of doom. Assuming that the clock starts the moment artificial intelligence is created and every day is an independent trial, the expected wait until doom is only 20 days (1/0.05). However, if you drastically increase your probability to p(0.95), the expected wait shrinks to barely more than a day, and if you spread that daily risk evenly over the hours, half of all scenarios end within about six hours; by mid afternoon, you’re dead.
But notice how I couched this statement. I assumed that every day is an independent trial. But is a day really the right unit, or should a trial last mere minutes?
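Here is a minimal Python sketch, purely my own illustration, of how much work that trial-length assumption does: the same 5 percent read as a per-day probability versus a per-minute probability implies timelines three orders of magnitude apart, unless you rescale the probability along with the trial.

```python
# A minimal sketch (my own illustration, not from any p(doom) source) of how
# much the "independent trial" assumption matters. The expected wait for a
# geometric process is (trial length) / p, so attaching the same p to a
# different trial length implies a wildly different timeline.

MINUTES_PER_DAY = 24 * 60

def expected_wait_days(p: float, trial_length_days: float) -> float:
    """Expected number of days until the first 'doom' trial comes up positive."""
    return trial_length_days / p

# The same 5 percent read as a per-day versus per-minute probability:
print(expected_wait_days(0.05, 1.0))                    # 20.0 days
print(expected_wait_days(0.05, 1.0 / MINUTES_PER_DAY))  # ~0.014 days, i.e. about 20 minutes

# Rescaling the probability with the trial length keeps the risk consistent:
# a per-minute probability that compounds to 5 percent per day.
per_minute_p = 1 - (1 - 0.05) ** (1 / MINUTES_PER_DAY)
print(expected_wait_days(per_minute_p, 1.0 / MINUTES_PER_DAY))  # ~19.5 days again
```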
In practice, many p(doom) estimates tend to be unconditional; that is, they are offered as a single number, with no reference to anything else that might happen. To me, as someone trained in Bayesian econometrics, most p(doom) conversations feel stuck in the first 5 minutes of an introductory course, before you learn about conditionals and start exploring whether the probability of A increases, decreases, or perhaps follows a bathtub curve over time, depending on the probability of B.
The more technically astute will update their p(doom) based on changes in the environment. Safety research, governance decisions, and technical architectures should alter future probabilities. Today’s p(doom) should, in theory, be different from tomorrow’s based on what we learn and do today. But most people don’t actually specify their conditionals. Only rarely do I hear, “my p(doom) is 15 percent assuming current trajectories, 3 percent if we implement interpretability breakthroughs, or 40 percent if we race toward AGI without safety measures.”
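To see what specifying conditionals buys you, here is a toy Python sketch; the scenario weights are numbers I have invented purely for illustration, not anyone’s published estimates. It rolls the three conditionals quoted above into a single headline number via the law of total probability, and shows how that number moves when safety research or governance shifts the odds of each scenario.

```python
# A toy illustration (scenario weights are invented for this sketch) of what
# specifying conditionals buys you: an overall p(doom) via the law of total
# probability, and a transparent way to see how it moves as scenarios shift.

p_doom_given = {  # the conditionals quoted above
    "current trajectory": 0.15,
    "interpretability breakthroughs": 0.03,
    "race to AGI without safety": 0.40,
}

def overall_p_doom(scenario_probs: dict[str, float]) -> float:
    """P(doom) = sum over scenarios of P(doom | scenario) * P(scenario)."""
    assert abs(sum(scenario_probs.values()) - 1.0) < 1e-9
    return sum(p_doom_given[s] * p for s, p in scenario_probs.items())

today = {"current trajectory": 0.6,
         "interpretability breakthroughs": 0.1,
         "race to AGI without safety": 0.3}
after_governance_win = {"current trajectory": 0.5,
                        "interpretability breakthroughs": 0.4,
                        "race to AGI without safety": 0.1}

print(f"p(doom) today:                  {overall_p_doom(today):.3f}")
print(f"p(doom) after a governance win: {overall_p_doom(after_governance_win):.3f}")
```

Even with made-up weights, the structure forces you to say which world you think you are in, which is exactly the conversation a single unconditional number lets you skip.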
Two years ago, Eliezer Yudkowsky had a conversation with podcaster Dwarkesh Patel that perfectly illustrates this fundamental misunderstanding of how probability should work in AI safety discussions. Last week, Yudkowsky and coauthor Nate Soares released a new book, If Anyone Builds It, Everyone Dies, and not surprisingly, his p(doom) is above 95 percent. On X, Jack Rabuck summarized the conversation, saying,
I listened to the whole 4 hour Lunar Society interview with [Eliezer Yudkowsky…] that was mostly about AI alignment and I think I identified a point of confusion/disagreement that is pretty common in the area and is rarely fleshed out:
Dwarkesh repeatedly referred to the conclusion that AI is likely to kill humanity as “wild.”
Wild seems to me to pack two concepts together, ‘bad’ and ‘complex.’ And when I say complex, I mean in the sense of the Fermi equation where you have an end point (dead humanity) that relies on a series of links in a chain and if you break any of those links, the end state doesn’t occur.
It seems to me that Eliezer believes this end state is not wild (at least not in the complex sense), but very simple. He thinks many (most) paths converge to this end state.
That leads to a misunderstanding of sorts. Dwarkesh pushes Eliezer to give some predictions based on the line of reasoning that he uses to predict that end point, but since the end point is very simple and is a convergence, Eliezer correctly says that being able to reason to that end point does not give any predictive power about the particular path that will be taken in this universe to reach that end point.
Dwarkesh is thinking about the end of humanity as a causal chain with many links and if any of them are broken it means humans will continue on, while Eliezer thinks of the continuity of humanity (in the face of AGI) as a causal chain with many links and if any of them are broken it means humanity ends. Or perhaps more discretely, Eliezer thinks there are a few very hard things which humanity could do to continue in the face of AI, and absent one of those occurring, the end is a matter of when, not if, and the when is much closer than most other people think.
In a review of the new book in Wired, Steven Levy asked Yudkowsky how he expected to die, and Yudkowsky retorted, “If you want a more accessible version, something about the size of a mosquito or maybe a dust mite landed on the back of my neck, and that’s that.” Levy continued,
The technicalities of his imagined fatal blow delivered by an AI-powered dust mite are inexplicable, and Yudkowsky doesn’t think it’s worth the trouble to figure out how that would work. He probably couldn’t understand it anyway. Part of the book’s central argument is that superintelligence will come up with scientific stuff that we can’t comprehend any more than cave people could imagine microprocessors.
None of this is satisfying to me, and it’s my main gripe with the book, which, admittedly, I am about halfway through.
The problem with Yudkowsky’s approach isn’t that he’s wrong about AI being potentially dangerous. I, too, am worried about advanced capabilities being harmful. Rather, what irks me is that his framework makes productive discourse nearly impossible. When he makes strong claims that doom is both inevitable and incomprehensible, he puts his argument beyond the reach of rational debate.
The dust mite scenario perfectly encapsulates this problem. By invoking incomprehensible superintelligence, Yudkowsky sidesteps the hard work of modeling how AI systems might cause harm and what we might do about it. Real progress in AI safety requires grappling with concrete failures, understanding specific vulnerabilities, and developing targeted interventions. It’s not as though Yudkowsky hasn’t done this work before, but that’s not what the book is focused on.
Rather than asking, “What’s your p(doom)?” we should be asking, “Under what conditions does risk increase or decrease?” and “Which interventions have the highest expected value for reducing catastrophic outcomes?” This isn’t just an academic exercise. It’s the difference between paralyzing fatalism and productive risk management.
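As a closing illustration of that second question, here is a deliberately simple Python sketch, with placeholder numbers rather than real estimates, that ranks hypothetical interventions by expected risk reduction per unit of cost. This is the kind of comparison a bare p(doom) cannot support.

```python
# A deliberately simple sketch of "which interventions have the highest
# expected value": rank hypothetical options by risk reduction per unit of
# cost. All numbers are placeholders, not estimates.

interventions = {
    # name: (p_doom_without, p_doom_with, cost in arbitrary units)
    "interpretability research": (0.15, 0.10, 2.0),
    "compute governance":        (0.15, 0.12, 1.0),
    "evaluation standards":      (0.15, 0.13, 0.5),
}

def value_per_cost(before: float, after: float, cost: float) -> float:
    """Expected reduction in catastrophic risk per unit of cost."""
    return (before - after) / cost

ranked = sorted(interventions.items(),
                key=lambda kv: value_per_cost(*kv[1]), reverse=True)
for name, (before, after, cost) in ranked:
    print(f"{name:28s} cuts risk by {before - after:.2f} at cost {cost}: "
          f"{value_per_cost(before, after, cost):.3f} per unit")
```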