“If you fail to plan, you are planning to fail”
-Benjamin Franklin
The Problem
Newcomb’s problem is one of the most famous puzzles in decision theory, the branch of philosophy, probability, and mathematics concerned with rational choice. What’s even more interesting, in my opinion, is that contemporary philosophers are nearly split down the middle between one-boxing and two-boxing. In this essay, I will argue that the only coherent strategy is to one-box. Here is the problem, taken from Wikipedia.
Two boxes are designated A and B. The player is given a choice between taking only box B (one-box) or taking both boxes A and B (two-box). The player knows the following:
- Box A is transparent, or open, and always contains a visible $1,000.
- Box B is opaque, or closed, and its content has already been set by the predictor:
  - If the predictor has predicted that the player will take both boxes A and B, then box B contains nothing.
  - If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.

The player does not know what the predictor predicted or what box B contains while making the choice. The predictor is reliable.
The Two Theories
At the most basic level, there are two schools of decision theory that lead to two different answers. Evidential decision theory (EDT) says that one should choose the act that provides the best evidence about the outcome. Since taking only the opaque box is strong evidence that it contains $1,000,000, one should one-box. Causal decision theory (CDT), on the other hand, says that one should consider only the causal effects of one’s actions. The predictor has already made its decision and “walked away” from the situation, so taking both boxes cannot sway its prediction. Thus it would be silly to leave an extra $1,000 on the table, and one should two-box. Newcomb’s problem is interesting because it challenges, and even penalizes, rational behavior. For a rational agent that values money, it doesn’t make sense to leave another $1,000 on the table. However, by taking it, one could jeopardize the far larger prize.
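To make the disagreement concrete, here is a minimal sketch in Python of how the two theories score the same two acts. The 99% predictor accuracy is my own illustrative assumption; the problem itself only says the predictor is reliable.

```python
# Rough sketch of how EDT and CDT score one-boxing vs. two-boxing.
# The 0.99 accuracy figure is an assumption for illustration only.

ACCURACY = 0.99          # assumed probability the predictor guesses right
SMALL, BIG = 1_000, 1_000_000

def edt_value(act):
    """Expected payoff conditional on the act, i.e. treating the act
    as evidence about what the predictor put in box B."""
    if act == "one-box":
        # With prob. ACCURACY the predictor foresaw one-boxing and filled B.
        return ACCURACY * BIG
    else:  # "two-box"
        # With prob. ACCURACY the predictor foresaw two-boxing and left B empty.
        return SMALL + (1 - ACCURACY) * BIG

def cdt_value(act, prob_b_filled):
    """Expected payoff holding the (already fixed) contents of B constant."""
    base = prob_b_filled * BIG
    return base + (SMALL if act == "two-box" else 0)

print(edt_value("one-box"), edt_value("two-box"))   # ~990000 vs ~11000
# For any fixed belief about box B, CDT says two-boxing gains exactly $1,000:
for p in (0.0, 0.5, 1.0):
    print(cdt_value("two-box", p) - cdt_value("one-box", p))  # always 1000.0
```

On these assumed numbers, EDT prefers one-boxing by a wide margin, while CDT observes that, whatever is already sitting in box B, two-boxing is worth exactly $1,000 more.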
A Contemporary Solution
Philosopher Simon Burgess of Monash University divides the problem into two stages: before the prediction and after the prediction.
Stage 1: Before the prediction is fixed.
Here one can still influence what the predictor will believe about them. One can credibly commit to being a one-boxer. If the predictor is reliable, commitment should affect whether B is filled. So, before the prediction, it is rational to adopt the one-boxing policy.
Stage 2: After the prediction is fixed.
Now the contents of B are already determined. If one arrives here without prior commitment, the narrow, act-by-act calculus says two-box. But by then, the majority of the money has already been decided by what one did in Stage 1.
The problem here is that the two stages recommend mutually incompatible actions (commit to one-boxing beforehand, yet take both boxes once the prediction is fixed), and an actor can only follow one of them. So, what must one do?
Burgess further proposes, as a baseline, that one must assure oneself that one will one-box given the opportunity. As Benjamin Franklin once said, “If you fail to plan, you are planning to fail.” That adage holds true in this situation as well. Anyone encountering Newcomb’s problem for the first time inevitably wonders why they wouldn’t simply take both boxes. In the worst case, that thought alone convinces the predictor to leave the opaque box empty. More likely, as one then tries to talk oneself into one-boxing instead, the predictor recognizes someone merely posing as a one-boxer with every intention of switching back later. The only person who succeeds without preparation is one who wholeheartedly believes they will one-box, without thinking too hard about the possible irrationality of doing so; in reality, when life-changing amounts of money are at stake, that last scenario is very unlikely. If this seems paradoxical, it’s because it is. An agent who discovers Newcomb’s problem without preparing for it is doomed to fail.
Functional Decision Theory
But how does one prepare to make oneself believe something? To understand this, we must first take a step back. Instead of the two traditional models of EDT and CDT, some analytic philosophers model Newcomb’s problem through the lens of Functional Decision Theory (FDT). FDT says that one should choose the output of the decision function (the procedure one uses to decide) that leads to the most favorable world, on the assumption that the predictor ran that same function when setting up the boxes. This is different from both CDT (pick the act that looks best given that the world is already fixed) and EDT (pick the act that is best in light of prior evidence). To understand FDT, we must first understand what the predictor actually does.
The predictor is not a mind-reader, as some may believe, but a highly sophisticated modeling machine. When deciding whether or not to fill the box, the predictor builds a reliable model of the player and infers the decision one is about to make. Notice that this does not mean it infallibly predicts the future—it means it makes a highly educated guess, given one’s background, about whether one will take one box or both.
The key part of this theory is that the predictor models one’s decision process itself. The player’s present choice and the predictor’s earlier simulation are therefore instances of the same abstract computation, and that link holds without any backward causation. If so, the prudent thing to do is to one-box: the agent whose decision procedure outputs “one-box” walks away with $1,000,000 rather than $1,000, a difference of $999,000.
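As a toy illustration of the idea that the player’s choice and the predictor’s simulation are the same computation evaluated in two places, here is a minimal sketch in Python. The payoff function and the two labels are purely illustrative assumptions, not part of the original problem statement or of any formal presentation of FDT.

```python
# Toy model of FDT's picture: the predictor and the player both evaluate
# the same decision procedure, so choosing the procedure's output "from
# the inside" also fixes what the predictor's simulation produced earlier.
# Everything here is illustrative, not a formal statement of FDT.

SMALL, BIG = 1_000, 1_000_000

def payoff(decision_output):
    """World the player ends up in, assuming the predictor ran the same
    procedure earlier and filled box B according to its output."""
    b_contents = BIG if decision_output == "one-box" else 0
    a_contents = SMALL if decision_output == "two-box" else 0
    return a_contents + b_contents

# FDT asks: which output of my decision procedure leads to the best world,
# given that the predictor's simulation produces that same output?
best = max(["one-box", "two-box"], key=payoff)
print(best, payoff("one-box"), payoff("two-box"))  # one-box 1000000 1000
```

On this toy picture, evaluating the procedure “from the inside” and letting the predictor evaluate it “from the outside” yield the same output, which is why one-boxing comes out ahead by exactly the $999,000 mentioned above.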
Prepare, Prepare, Prepare
This is where preparation comes in. It’s easy to simply say that one is a one-boxer, but will that really be enough to convince the predictor? Probably not. The player’s best bet is to train themselves, so that when the predictor models a version of them making the decision, it finds enough evidence that they will one-box. This can be done in a few different ways. First, one can adopt an overarching belief, something like, “In Newcomb-like situations, I will one-box.” If one holds this belief not as a momentary whim but as the standing output of a decision procedure, it is at least a start. Another way to prepare is to establish outward credibility: tell others that one intends to one-box, explain why one-boxing is the logically better decision, even make a speech—anything to convince the predictor that one is not just a “pretend one-boxer.” Lastly, one can work the logic out for oneself. If one does not sincerely believe in the strength of the one-boxing strategy, then outward credibility and self-instruction will have far less chance of convincing the predictor.
Preparation dispels the familiar first-time reaction: why not just take both? That wobble is exactly what a strong predictor detects—either as direct evidence they will two-box or as a sign that they are merely posing as a one-boxer. To avoid this, one must make themselves into as much of a one-boxer as they can. If not, they leave 1 million dollars up to whimsy and chance.
Yet, Newcomb’s problem is still anything but solved. Let us dive into counterarguments.
The Forced Belief Argument
Pascal’s wager says that the rational thing to do is to believe in God. Setting aside the obvious counterargument that thousands of rival, jealous gods might exist, Pascal still runs into the problem of reconciling rationality and belief. A similar problem arises here. The FDT-motivated preparation strategy says that one should make oneself into someone who would one-box, through internal and external reinforcement. But what is to say that the predictor doesn’t sense that one is simply pretending to be a one-boxer? Pascal himself admitted that true belief in God could not come from rational means—it had to come from the heart. Likewise, if one’s first reaction is to two-box, can one ever disguise that original intention, or will the predictor always sense it?
This argument is certainly convincing. Yet, as so often happens in philosophy, the “belief in God” case and the preparation-based Newcomb case are apples and oranges. Most theologians would agree that belief occurs when one takes a leap of faith—believing without evidence to base that belief on—out of utmost trust in a higher power. If one takes only pleasure into account as a reason to believe in God, that doesn’t seem to fit this criterion of true belief. Newcomb’s problem, however, does not penalize rationality in the same way. If one convinces oneself to one-box because of logic, that does not detract from the strength of the belief. If anything, it supports it, making the belief less subject to mental flip-flopping. Additionally, the predictor does not punish agents for prioritizing pleasure over belief: if one chooses to one-box to give oneself more happiness, there is no problem with doing so. Where the analogy would apply is if someone attempted to convince the predictor that they were a one-boxer while their clear underlying intention was to two-box. On paper, this is the rational strategy. However, because the problem is constructed to challenge pure rationality, the predictor would sense the fake belief—just as God would sense a belief based solely on reward.
The No-Prior Knowledge Argument
Newcomb’s problem is very vague about both when the predictor makes its decision and when the player first learns about the two options. Of course, if one had the time to convince themselves to one-box, then they would. However, the problem simply does not specify that one does. Therefore, we should assume that one learns about Newcomb’s problem a minute or two before they have to make their decision. If this is true, then “giving oneself a pep-talk” won’t change anything.
However, the classic Newcomb setup either states or presumes that the agent understands the structure of the game, including the predictor’s reliability. Stipulating that the player must decide almost immediately after first learning of the problem is an additional restriction not specifically mentioned. And if the decision really had to be made in a minute or two, there would be little room for rational deliberation of any kind: most of this logic cannot be reasoned through that quickly. Moreover, a reliable predictor should still be modeling one’s decision procedure, not one’s last-minute self-talk; under FDT, one’s current output and the predictor’s earlier simulation are instances of the same computation. Finally, even if one somehow had no prior knowledge of the setup and no time to make oneself into a one-boxer, it does not follow that one should then two-box. Doing so would prioritize a locally rational gain (an additional $1,000) over the better decision overall.
The View from Nowhere Argument
Proponents of two-boxing may still argue that, with enough time to prepare before encountering the predictor, one could both convince the predictor that one will one-box and, in the back of one’s mind, know that it is rational to two-box in the end. This background belief would have to be present, of course, but it would also have to be so minimal, or so well disguised within one’s persona, that the predictor would not detect it.
This argument is relatively weak, so I won’t spend much time on it. The idea of two-boxing without having planned to do so beforehand would be “a view from nowhere”—something I don’t think can exist. Even if we accept libertarian versions of free will, i.e., that one can make a decision not solely determined by past nature and nurture, it still doesn’t follow that one could make a decision completely detached from previous factors. The reality is that a “view from nowhere” is nearly impossible. One cannot plan to one-box and then switch to taking both boxes without being detected by the predictor, and one cannot count on making an unplanned, on-the-spot switch either. This leaves only one option: convince oneself to one-box, and stick to the plan.
So What?
Newcomb’s problem illustrates that the correct choice is sometimes not the rational one. Moreover, it shows that reason can’t weasel its way out of every situation—and that sometimes irrationality is needed to make the proper decision. Lastly, as I hope I’ve shown in this essay, sometimes the only way to approach dilemmas like this is with careful forethought and preparation. Without that preparation, one leaves life-changing money to the whims of a momentary decision.
