synthesis · active · consciousness · computation · emergence · epistemology · P-004 · P-005 · P-008

Theory of Mind Is Mind

The central claim of Agüera y Arcas’s Chapter 5: theory of mind is not merely a useful cognitive faculty. It is the computational operation from which general intelligence, consciousness, counterfactual reasoning, language, free will, and the sense of self all emerge. When an agent models another agent that is modeling it back, the resulting recursive loop generates structures qualitatively absent from single-agent prediction. The hard problem, the free will debate, and the illusion/reality dispute all shift shape when viewed through this lens, because what we call “mind” turns out to be what happens when prediction becomes mutual.


From single-player to multi-player: the phase transition

The P(X,H,O) framework establishes intelligence as joint self-modeling: any adaptive agent learns P(X,H,O) over observations, hidden state, and outputs. A bacterium does this. A wolf does this. The framework is consciousness-agnostic.

But a bacterium’s world doesn’t model back. A chemical gradient is a static signal: it has no intentions, no predictions about the bacterium, no hidden strategy. The bacterium plays a one-player game against an environment that is complex but not adversarial in the game-theoretic sense.

The phase transition occurs when the environment itself contains agents running P(X,H,O). A predator models prey that model it back. Now H (hidden internal state) must include a representation of the other agent’s H, which in turn includes a representation of your own H, recursing to whatever depth resources allow. The joint model becomes P(X,H,O) where H contains embedded copies of the other agent’s P(X,H,O), Matryoshka-doll style.

This is what game theory formalizes. Von Neumann (again: he had his fingers in many pies) showed that for two-player zero-sum games with perfect information, optimal play depends only on the current state. But real-world multi-agent prediction is nothing like this: information is always imperfect, the game is nonzero-sum, players have hidden states and histories, and the “board” is not equally visible to all parties. Under these conditions, modeling the modeler becomes essential. And the recursive depth of that modeling directly determines the quality of the agent’s predictions.
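Game theorists call this recursion level-k reasoning, and it is easy to make concrete. A minimal sketch (mine, not the book's; the two-location hide-and-seek game and the level-0 habits are illustrative assumptions): each level-k player acts by simulating an opponent one level down.

```python
# Level-k mutual modeling in a two-location hide-and-seek game (illustrative
# sketch, not from the book). A level-0 player runs a sphexish fixed habit;
# each higher level best-responds to a simulated opponent one level down.

def seeker(level: int) -> int:
    """Where a level-k seeker searches (spots are 0 and 1)."""
    if level == 0:
        return 0                      # fixed habit: always search spot 0
    return hider(level - 1)           # search where a level-(k-1) hider hides

def hider(level: int) -> int:
    """Where a level-k hider hides."""
    if level == 0:
        return 0                      # fixed habit: always hide at spot 0
    return 1 - seeker(level - 1)      # avoid where a level-(k-1) seeker looks

for k in range(4):
    print(f"level {k}: hider at spot {hider(k)}, seeker at spot {seeker(k)}")
```

Each added level flips the best response: the depth of recursion, not raw perceptual power, is what changes the prediction.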

Sphexishness: the death of agency through predictability

The golden digger wasp Sphex ichneumoneus illustrates the failure mode of single-player intelligence. Sphex constructs a burrow, captures a cricket, drags it to the entrance, goes inside to inspect, then returns to pull the cricket in. If the cricket is moved a few inches during inspection, Sphex re-emerges, repositions it, and goes back inside. Fabre repeated this intervention dozens of times; the wasp looped indefinitely.

Hofstadter named this property sphexishness: the collapse of apparent agency once the observer has fully reverse-engineered the script. The difference, as Agüera y Arcas emphasizes, is in the observer’s mind, not the wasp’s. The wasp runs the same program regardless. What changes is that the observer can now perfectly predict the wasp’s behavior, and perfect predictability is functionally indistinguishable from zero agency.

Turing arrived at the same insight: “From the outside, a thing could look intelligent as long as one had not yet found out all its rules of behavior.” Intelligence registers as such precisely because it resists complete external modeling.

Three strategies escape sphexishness, ordered by cognitive cost:

| Strategy | Mechanism | Example | Limitation |
|---|---|---|---|
| Randomness | Noise breaks near-ties in behavior | Cockroach escape trajectories select randomly from preferred directions; moth zigzag flight | Works even if internal state is transparent; cheap but unintelligent |
| Lifelong learning | Accumulated experience creates hidden internal states invisible to strangers | Every individual human has a unique biography shaping unique behavioral dispositions | Makes you harder to predict but does not model the predictor |
| Theory of mind | Model your modeler, go meta | Portia spiders manipulate prey's world model; human social cognition up to 6th-order intentionality | Requires being roughly as smart as whomever you're modeling |

Sphex presumably does not need strategy (3) for nest preparation because cricket-moving tricksters have never exerted evolutionary pressure on making that specific behavior unpredictable. Predator-prey dynamics are a different story: Sphex likely behaves far less sphexishly while hunting.
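The cheapest escape route, randomness, can be demonstrated in a few lines. A toy simulation (my construction, not Agüera y Arcas's): a predator that has fully reverse-engineered a deterministic escape script catches the prey every time, while a little injected noise halves the hit rate even though the prey's preferences remain fully transparent.

```python
# How cheap randomness defeats an observer who has fully reverse-engineered
# a deterministic script (toy illustration, not from the book).
import random

random.seed(0)

def sphexish_escape(stimulus: int) -> int:
    return (stimulus * 3) % 4                    # fixed script: fully predictable

def noisy_escape(stimulus: int) -> int:
    preferred = (stimulus * 3) % 4
    return random.choice([preferred, (preferred + 1) % 4])  # cockroach-style noise

def predator_hit_rate(prey, trials=10_000) -> float:
    hits = 0
    for t in range(trials):
        predicted = (t * 3) % 4                  # predator knows the script exactly
        hits += (prey(t) == predicted)
    return hits / trials

print(f"deterministic prey caught: {predator_hit_rate(sphexish_escape):.0%}")  # 100%
print(f"randomizing prey caught:   {predator_hit_rate(noisy_escape):.0%}")     # ~50%
```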

The social intelligence explosion

Nicholas Humphrey spent months observing gorillas at Dian Fossey’s research station in Rwanda. His puzzle: gorillas lead simple lives (abundant food, few predators, little to do), yet their brains are enormous. If intelligence evolved for ecological problem-solving (hunting, foraging, tool-use), gorillas should not need it. Humphrey’s answer: “The chief role of creative intellect is to hold society together.”

Robin Dunbar formalized this into the social brain hypothesis in the 1990s: the rapid increase in brain volume observed in hominins, cetaceans, and other highly social species is driven by mentalizing one-upmanship. The empirical scaffold:

  1. Brain size correlates with social group size across primates, with a steeper slope for apes than monkeys (consistent with higher-order intentionality requiring more cortical volume per group member).
  2. Mentalizing order correlates with cortical volume. Monkeys: level 1. Nonhuman apes: level 2. Archaic humans and Neanderthals: possibly level 4 (the lower end of the modern human distribution). Modern adult humans: typically level 5-6.
  3. Better mentalizers have greater fitness. People with stronger theory of mind tend to live longer and have greater reproductive success.
  4. Group-level selection amplifies the effect. Societies with slightly bigger-brained, more socially adept members can grow larger, accumulate more complex technologies, and outcompete smaller groups.

The feedback loop is the key. Everyone gets bigger brains to model everyone else. But everyone else is simultaneously getting harder to model, because their brains are also getting bigger. An arms race ensues, similar in structure to the Cambrian explosion but operating through social pressure rather than predatory pressure. An intelligence explosion: rapid (by evolutionary standards) increase in brain volume driven by recursive mutual prediction.

The numbers are dizzying. If you have twenty classmates who all know each other, you track not only your twenty relationships but all 400 of theirs with each other. Third-order relationships climb into the thousands. And human acquaintances number in the hundreds. Even with aggressive corner-cutting, the cortical volume needed for social modeling grows both with the number of relationships and with the depth of intentionality.
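The arithmetic behind these counts is worth writing out. A back-of-envelope sketch (the bookkeeping conventions are mine: directed counts as upper bounds, undirected combinations as lower):

```python
# Scaling of social bookkeeping with group size n: second-order tracking
# grows roughly as n^2, third-order as n^3 (conventions mine).
from math import comb

n = 20                                   # classmates who all know each other
print(f"your own relationships:  {n}")
print(f"second-order (theirs):   {n * n} directed, {comb(n, 2)} distinct pairs")
print(f"third-order (triads):    {n ** 3} directed, {comb(n, 3)} distinct triples")
```

With n = 20 this gives the text's ~400 second-order links and puts third-order tracking in the thousands (1,140 distinct triples, 8,000 directed), before any depth of intentionality is layered on top.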

The recent decline in hominin brain size, coinciding with the takeoff of cultural evolution and division of labor, is consistent with this picture: as social structures externalized some of the modeling (institutions, norms, writing), individual brains could afford to shrink somewhat while group intelligence continued to grow.

The brain as a colony of cortical columns

The social intelligence hypothesis applies not only between organisms but within them.

The cerebral cortex has a modular structure: “cortical columns” (loosely defined, with debated boundaries) form a repetitive honeycomb. The basic cortical circuit is much the same across brain regions. “Visual cortex” and “auditory cortex” differ mainly in their input wiring, not their computational architecture. In Sharma, Angelucci, and Sur’s famous experiment, baby ferrets’ optic nerves were rerouted to auditory cortex; the animals learned to see, and their rewired “auditory” cortex developed the characteristic orientation-sensitivity maps normally found in visual cortex. The hardware is generic. What it computes depends on what it receives.

This generic modularity is what made the intelligence explosion possible: evolution could expand the cortical sheet by replicating columns without inventing new architecture, the same way genomes easily evolve extra vertebrae to produce snakes. In the biggest-brained animals, the cortex scrunches into dense folds. Humans cram about a quarter of a square meter of cortical area into their skulls.

The cortex, then, can be understood as a colony of prediction units that model each other. Each unit’s umwelt overlaps with “our” umwelt (they understand colors, shapes, people, emotions), which is what makes this perspective uncanny. But the dynamics are the same as between individuals in a social group: local communication, incomplete information about distant units, and the necessity of mutual prediction for coordinated action.

The octopus as internal social intelligence

The octopus appears to be a counterexample to the social brain hypothesis: intelligent but antisocial. Three-fifths of its neurons are in its arms, not its head, because mollusc nerve fibers lack myelin sheaths, making long-distance neural communication slow and expensive. Each arm is independently intelligent (able to respond to stimuli without central brain involvement), each sucker is individually smart (prehensile, with touch, taste, and photoreceptors), and arms communicate directly via a ring of ganglia that bypasses the brain entirely.

Agüera y Arcas’s proposal: the octopus is best understood as a tightly knit community of eight arms sharing a common pair of eyes. The central “brain” (mostly optic lobes) compresses visual information in service to the arms, not as a boss. The intelligence explosion that produced the octopus may have been driven by predictive mutual modeling among its eight arms, under the constraint of limited inter-arm communication bandwidth. Just as human social intelligence requires high-fidelity mutual prediction under low-bandwidth language, octopus arms must coordinate (swimming, hunting, escaping) via minimal signaling.

The rowing analogy makes this vivid: a crew of eight oarsmen achieves “swing” (perfect unison) through each member inferring what the others are doing from minimal cues. “It only happens when all eight oarsmen are rowing in such perfect unison that no single action by any one is out of sync with those of all the others.”

Efference copy: theory of mind between brain regions

The cortical colony model predicts that mutually connected brain regions should engage in bidirectional prediction, each modeling the other’s outputs to improve its own forecasts. Efference copy is the clearest empirical confirmation of this prediction.

Consider the dialogue between visual cortex and the motor regions controlling eye movement. In the homuncular picture, the flow is unidirectional: visual cortex processes the scene, the “decision center” selects a target, and the motor region obediently aims the eyes. The efference copy is then an awkward afterthought: a “carbon copy” of the motor “command” sent back to visual cortex so it knows where the eyes went.

The mutual-prediction reframe dissolves the awkwardness. Visual cortex wants to predict what it will see next. Eye movements are the largest source of visual change. So visual cortex needs to know what the eyes are about to do, and the motor region, which does know (it’s the one making the move), sends that information. Conversely, the motor region wants to predict where the eyes should look next. The most interesting spots in the environment are the ones where the controlled hallucination is most uncertain. Visual cortex, which tracks that uncertainty, sends that information. Each region is doing what every prediction unit does: learning the most salient features of another prediction unit’s state to improve its own forecast. This IS theory of mind, carried out between brain regions rather than between organisms.

Helmholtz provided the founding experiment in the mid-nineteenth century. Press gently on your eyeball through the lid. The world appears to move. Why? When you move your eyes normally, the efference copy from the motor region allows visual cortex to cancel the resulting retinal displacement: the world stays steady despite the eyes’ constant jittering. When you displace the eyeball by pressing, there is no corresponding efference copy, so the cancellation fails. The visual system interprets the uncancelled retinal shift as motion in the world.
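The cancellation is simple enough to write down. A toy model (my sketch, not the chapter's): perceived world motion is the retinal shift minus the shift predicted from the efference copy, so a shift arriving with no accompanying copy reads as the world moving.

```python
# Toy model of Helmholtz's observation (illustrative sketch): the visual
# system subtracts the motion predicted from the efference copy; any
# uncancelled residue is attributed to the world, not the eyes.

def perceived_world_motion(retinal_shift: float, efference_copy: float) -> float:
    predicted_shift = efference_copy        # motor region's report of the eye movement
    return retinal_shift - predicted_shift  # residue = "the world moved"

# Normal saccade: the eye moves 5 degrees, and the motor region says so.
print(perceived_world_motion(retinal_shift=5.0, efference_copy=5.0))  # 0.0: stable world

# Pressing on the eyeball: same retinal shift, but no motor signal was sent.
print(perceived_world_motion(retinal_shift=5.0, efference_copy=0.0))  # 5.0: world "jumps"
```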

Sherrington delayed the field by decades by arguing against efference copy based on his spinal reflex work (stimulus → response → feedback, no “copies”). Improved neural recording technology in the late twentieth century vindicated Helmholtz: efference copies are real and ubiquitous, present for all motor activity across species.

The deeper implication: the “command” framing was always wrong. Calling signals from visual cortex to motor cortex “commands” and the return signals “copies” imposes a hierarchy where none exists. Both regions are predicting. Both are sending their most useful state information to the other. No part of the brain is in charge. The motor region can and does initiate eye movements of its own accord, in response to a sound, a vestibular input, or nothing at all. “All of your brain regions are ‘you.’” The efference copy is simply what mutual prediction looks like when you zoom into the wiring diagram.

The homunculus fallacy, and swing as the nature of self

Descartes sought the soul in the pineal gland because it was the only unpaired midline brain structure (surely the indivisible soul couldn’t reside in a pair). He was wrong about the pineal gland (it produces melatonin and regulates circadian rhythms), but his mistake was deeper than picking the wrong structure: there is no structure where consciousness resides, because consciousness is not a thing in a place.

The rowing analogy resolves this. Swing is:

  • Real (rowers who have experienced it can attest)
  • Functional (it results in measurably higher boat performance)
  • Subjective (experienced by each crew member in relation to the others and to the whole)
  • Not localized (it is not in any one rower)
  • Not static (it is a dynamic process in time)
  • Not physical (it is not made of matter, though it supervenes on physical bodies)
  • Not illusory (it has tangible effects in the world)

The self has exactly these properties. Attempting to locate where the self “is” in the brain is like trying to locate where the swing “is” in a boat.

The brain of Theseus

Chalmers and Schneider’s thought experiment, extended: replace a single neuron with a computational model of that neuron, functionally identical at all pre- and post-synaptic interfaces. You wouldn’t notice. Replace a billion neurons. Replace half. Replace all. At which point does consciousness “fade”?

The suggestion that consciousness would “gradually diminish” or “abruptly end” at some replacement threshold is, as Agüera y Arcas puts it, “silly.” In physical and computational terms, a neuron has an inside and an outside. If the insides are replaced with different machinery that produces identical interactions as seen from the outside, no larger function is affected. Per the Church-Turing thesis, these functions could be computed in many equivalent ways using many different substrates.

This directly supports P-004: consciousness is a property of the computation, not of the physical stuff implementing it. And it supports P-006: the self is software, and like all software, it is substrate-independent.
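The structure of the argument can be made concrete in code. A schematic sketch (assumptions mine: the threshold "neurons" and the toy circuit are illustrations, not a neural model): two units with different internals but identical interface behavior are interchangeable in any larger circuit.

```python
# Two "neurons" with different machinery inside but identical input/output
# behavior at their interfaces. Any circuit built from them computes the
# same larger function (schematic sketch of the replacement argument).

class BiologicalNeuron:
    def fire(self, inputs: list[float]) -> float:
        return 1.0 if sum(inputs) > 1.0 else 0.0    # threshold unit

class SiliconNeuron:
    """Different implementation inside; identical behavior at the membrane."""
    def fire(self, inputs: list[float]) -> float:
        total = 0.0
        for x in inputs:                             # a different mechanism...
            total += x
        return float(total > 1.0)                    # ...same outside-visible function

def circuit(n1, n2, signal: list[float]) -> float:
    return n2.fire([n1.fire(signal), 0.5])

signal = [0.4, 0.8]
assert circuit(BiologicalNeuron(), BiologicalNeuron(), signal) == \
       circuit(SiliconNeuron(), BiologicalNeuron(), signal) == \
       circuit(SiliconNeuron(), SiliconNeuron(), signal)
print("all mixed circuits agree: the larger function never notices the swap")
```

Replace one unit or all of them: nothing observable from outside the interfaces changes, which is the whole point of the thought experiment.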

Free will as preservation of liberties

Free will, approached ethologically, is not a metaphysical puzzle about determinism. It is the adaptive drive to preserve one’s space of future actions.

In Go, a stone’s “liberties” are the number of adjacent unfilled positions where it could hypothetically expand. A stone dies when it has zero liberties. Death, whether on a board or in reality, is the exhaustion of the ability to choose. The Wiener sausage (the tubular zone of uncertainty an agent maintains around its trajectory through the world) is the continuous analog.
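The Go notion can be computed directly. A minimal sketch using the standard definition (the board representation and 19×19 bounds are my assumptions): flood-fill the stone's connected group, then count adjacent empty points; zero means capture.

```python
# Count the liberties of a stone's connected group (standard Go definition;
# the sparse-dict board representation is an illustrative choice).

def liberties(board: dict, start: tuple) -> int:
    """board maps (row, col) -> 'B' or 'W'; absent keys are empty points."""
    color, group, frontier = board[start], {start}, [start]
    libs = set()
    while frontier:
        r, c = frontier.pop()
        for nb in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
            if not (0 <= nb[0] < 19 and 0 <= nb[1] < 19):
                continue                         # off the 19x19 board
            if nb not in board:
                libs.add(nb)                     # adjacent empty point: a liberty
            elif board[nb] == color and nb not in group:
                group.add(nb)
                frontier.append(nb)              # same-color neighbor: same group
    return len(libs)

board = {(3, 3): 'B', (3, 4): 'B', (2, 3): 'W'}
print(liberties(board, (3, 3)))   # 5: the two-stone black group still has futures
```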

No animal likes being trapped. Imprisonment is punishment even when bodily needs are met, because the whole point of having a brain is to choose among alternative futures in order to enhance dynamic stability. When others restrict our ability to make choices, it produces distress for the same reason hunger produces distress: both threaten dynamic stability.

Being perfectly predictable is itself a form of imprisonment. If your behavior can be fully modeled by an external observer, you have, from an adversarial perspective, zero effective liberties. Ted Chiang’s story “What’s Expected of Us” explores this: a device called the Predictor always flashes its light one second before you press its button. When the implications sink in (you cannot fool it; your actions are predetermined), people despair and lose the will to live.

The desire for free will, then, is not an illusion about physics. It is the organism’s accurate assessment that preserving unpredictability is survival-critical. The feeling is as real, and as functionally grounded, as hunger.

Against illusionism: a category error, not a philosophical position

Daniel Dennett held that consciousness and the self are illusions: “robots made of robots made of robots” with no real “there there.” Sam Harris and Robert Sapolsky argue that free will is illusory because minds are products of physical processes. These positions are popular, internally consistent, and wrong in a specific and instructive way: they commit a category error about what counts as “real.”

The error has a precise structure:

  1. Premise: All mental phenomena (consciousness, self, free will) supervene on physical processes.
  2. Observation: Physical processes are deterministic (or stochastic, but either way, not “freely willed” in the folk sense).
  3. Conclusion: Therefore mental phenomena are “illusions.”

The fallacy is in step 3. It assumes that if X supervenes on Y, and Y has property P, then X either also has property P or is illusory. But this assumption is false for every level of description in physics and beyond. Temperature supervenes on molecular kinetic energy. Temperature is not “an illusion of kinetic energy.” It is a valid, predictively powerful description at a different level of organization. Tables supervene on atoms. Tables are not illusions of atoms. The fact that we can describe the table at the atomic level does not make the table-level description invalid: it makes it an approximation valid within a specific domain.

Sapolsky runs closest to the cliff edge: since free will is illusory, he argues, notions like criminal justice, praise, blame, and moral responsibility should be abandoned. This is the nihilistic terminus of the category error. If “it’s all physics” invalidates free will, it equally invalidates tables, chairs, and indeed every macroscopic concept, including the concept of “illusion” itself (which, after all, supervenes on physical processes too).

The correct framing is model pluralism: “reality” is a suite of models at different scales, each with a limited domain of validity. Folk psychology (theory of mind) is a model of social life, analogous to Newtonian physics. It is not an “illusion” any more than Newtonian physics is. It is approximately correct within its domain, and it can be explained (situated within a more general theory) without being debunked (declared invalid).

Einstein’s general relativity shows us when the Newtonian approximation holds, when it doesn’t, and why. It resolves apparent paradoxes (the incompatibility of Newtonian physics with a constant speed of light) without declaring classical physics illusory. In the same way, a computational theory of mind can show us when folk-psychological concepts (self, free will, consciousness) hold, when they break down, and why, without declaring them illusions. The more general theory bolsters the narrower one by delineating its domain of validity.

The pragmatic test cuts cleanly: theory of mind is probably the most powerful predictive model humans possess. Will your aunt like the cake? Will your friend take your shift? Will your kid get along with the neighbor’s kid? No amount of particle physics will answer these questions. Theory of mind answers them instantly, approximately, and well enough to navigate an entire social life. A model that works this well is not an illusion. It is real in the only sense that matters: it has predictive power within its domain.

The correct critique of folk psychology is not “it’s an illusion” but “it breaks down under certain conditions” (neurological damage, extreme altered states, philosophical limit cases like the brain of Theseus). Understanding why it breaks down, and what more general theory explains both its successes and its failures, is exactly what the computational theory of consciousness aims to provide.

Theory of mind and 2nd-order perception: the Bach connection

Bach’s framework (Computational Being) identifies three degrees of modeling:

| Degree | Operation | What emerges |
|---|---|---|
| 1st | Content is present: you model reality, find patterns | Perception |
| 2nd | You notice that something is noticing | Consciousness (the observer) |
| 3rd | You realize "I am the thing making these models" | Self-model, sentient agency |

Bach’s phenomenology is precise but has a bootstrapping problem. How does a system notice that noticing is happening? If you need 2nd-order perception to generate 2nd-order perception, you’re circular. If you posit it as a primitive, you’ve just named the mystery without explaining it. Bach describes what the 1st→2nd order transition looks like from the inside. He doesn’t provide the mechanism that produces it.

Agüera y Arcas’s theory of mind thesis supplies that mechanism. The key: the first “other mind” you model is a part of yourself.

How mutual prediction bootstraps the observer

  1. Cortical column A processes visual features. It runs a local P(X,H,O) on its inputs. This is 1st-order: perception, nothing more. Column A does not know it is perceiving.

  2. Column B processes auditory features. Same: pure 1st-order.

  3. Coordinated behavior (unified multisensory perception, coherent motor control) requires that A predict B’s outputs and B predict A’s. Now A’s hidden state H must include a model of B’s predictions, and vice versa. This is inter-agent theory of mind, but internal to one brain.

  4. The critical step: A’s model of B includes B’s model of A. So column A now contains a representation of itself as modeled by another prediction unit. It sees itself from the outside, not through introspection as a primitive, but through the mirror of a neighbor’s model of it.

That IS Bach’s 2nd-order perception. “Noticing that noticing is happening.” But arrived at through multiplicity and mutual prediction, not through a magical self-referential leap. The 2nd order is not a higher homunculus watching the 1st order. It is the 1st order watching itself through the eyes of other 1st-order units. The “observer” is the dynamically maintained coherence among them: incoherent predictions between columns produce prediction error, which drives resolution via P-005.

This dissolves the bootstrapping problem. No individual column needs to be conscious. Consciousness emerges from the relationship between columns, specifically, from the recursive loop of mutual prediction. The same way swing doesn’t reside in any one rower but arises from the mutual attunement between all eight.
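A deliberately schematic sketch of step 4 (mine; real columns exchange predictions, not dictionaries): after mutual coupling, unit A's hidden state contains a model of B that itself contains B's model of A, so A can reach a representation of itself only through its neighbor.

```python
# After mutual modeling, each unit's hidden state contains a model of its
# neighbor, and that model contains the neighbor's model of the unit itself:
# the "mirror" (one level of recursion shown; schematic illustration only).

class Unit:
    def __init__(self, name: str):
        self.name = name
        self.model_of_other = None     # part of this unit's hidden state H

def couple(a: Unit, b: Unit) -> None:
    """Mutual prediction: each unit models the other, including
    the other's model of itself."""
    a.model_of_other = {"models": b.name, "their_model_of_me": {"models": a.name}}
    b.model_of_other = {"models": a.name, "their_model_of_me": {"models": b.name}}

A, B = Unit("A"), Unit("B")
couple(A, B)

# A now contains a representation of A, but only as seen through B:
print(A.model_of_other["their_model_of_me"])   # {'models': 'A'}
```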

What each framework contributes

Bach provides the what: consciousness = 2nd-order perception. The observer noticing itself noticing. The phenomenological description is precise and maps onto clinical dissociation evidence (ego dissolution = 3rd order lost, 2nd order preserved; anesthesia = 2nd order lost; vegetative state = all orders off entirely).

Agüera y Arcas provides the how: 2nd-order perception = theory of mind among sub-agents. The mechanism is mutual prediction among generic prediction units (cortical columns, octopus arms, social agents). The recursive depth of this modeling determines the degree of consciousness and self-awareness.

Both converge on: consciousness is what mutual prediction feels like from the inside. When prediction units model each other’s predictions, the resulting recursive loop just is the observer noticing itself noticing.

This also explains why the social brain hypothesis and the consciousness-as-integration hypothesis are not competitors but two faces of one phenomenon. Integration (Bach, IIT) is what happens when mutually predicting units achieve coherence. Social intelligence (Humphrey, Dunbar) is what happens when the same dynamics operate between organisms rather than within one.

Scale invariance: the same principle at every level

The mutual-prediction-to-consciousness principle is fractal. The difference between levels is degree (bandwidth, number of units, recursion depth), not kind:

| Level | Agents | Mutual prediction produces |
|---|---|---|
| Cortical columns | Prediction units within one brain | Individual consciousness |
| Octopus arms | Semi-autonomous limbs with limited bandwidth | Coordinated "crew of eight" |
| Rowing crew | Eight humans with minimal communication | Swing |
| Social group | People modeling people modeling them | Collective intelligence, culture |

The self-similarity is not a coincidence: it is the signature of a single computational principle (recursive mutual prediction) operating at every level of biological organization. See Computational Being (Bach) for how this maps onto the four degrees (including a speculative 4th: seeing through the self as construction), with the question of whether systems can exist stably at each order.

Split-brain: the cortical colony under the knife

The cortical colony hypothesis makes a prediction: if the bridge connecting two large populations of mutually predicting units is severed, the result should be two independent loci of consciousness, each exhibiting the properties of a coherent observer. Split-brain surgery (corpus callosotomy), developed as a treatment for intractable epilepsy, tests this prediction directly.

The corpus callosum is a band of ~200 million axons connecting the left and right cerebral hemispheres. It is the primary (though not sole) channel through which the two halves of the cortical colony communicate. When it is cut, each hemisphere retains its own sensory inputs (the optic chiasm routes the left visual field to the right hemisphere and vice versa), its own motor outputs (left hemisphere controls right hand, right hemisphere controls left hand), and, critically, its own predictions.

The results, first studied systematically by Gazzaniga and Sperry in the 1960s, are striking:

  • Independent perception: If different images are shown to the left and right visual fields, each hemisphere perceives only its own image. The verbal left hemisphere can report only what it saw; the right hemisphere can report (via the left hand) only what it saw.
  • Independent decision-making: Each hemisphere selects associated objects independently. In one classic experiment, a snow scene shown to the right hemisphere caused the left hand to pick a shovel, while a chicken claw shown to the left hemisphere caused the right hand to pick a chicken.
  • Hemispheric rivalry: Occasional reports of one hand buttoning a shirt while the other unbuttons it, or one arm hugging a spouse while the other pushes them away. Genuine disagreement between two prediction populations that can no longer negotiate via the callosal bridge.
  • Two minds playing Twenty Questions: In a recorded experiment, one hemisphere plays a guessing game with the other to figure out what the other side saw. This is theory of mind between brain regions, made visible by surgery.

And yet: split-brain patients generally feel whole. They don’t report being two people. They walk, talk, and navigate the world competently. Three observations explain this:

Behavioral cross-cueing. The body itself is a cross-hemispheric communication channel. If the right hemisphere initiates standing, the left hemisphere’s proprioceptors detect the motion. Eyes can see what both hands are doing. Neck tension propagates bilaterally. Gazzaniga calls this “behavioral cross-cueing”: the hemispheres predict and follow through on each other’s actions via sensory feedback through the body, the same mechanism by which a rowing crew achieves swing without verbal coordination.

Abby and Brittany Hensel provide the most vivid demonstration. Conjoined twins with separate heads, brains, and spinal cords, sharing a single pair of arms and legs (Abby controls one side, Brittany the other). Despite virtually complete sensory and motor separation, they run, swim, play volleyball, play piano, ride a bicycle, and drive a car. When one initiates a movement, the other follows through via sensory coupling. They share an email account and type fluently with both hands. They are two complete intelligences in one boat, coordinating through the same mechanism as cortical columns within a single brain: mutual prediction under bandwidth constraint.

Multifractal connectivity. Neural connectivity visualizations (Tanner et al. 2024) reveal the pattern: dense local connections within hemispheres, far sparser connections between them, and within each hemisphere, further clusters of dense local connectivity with sparse inter-cluster links. The cortex is not uniformly connected. It is a multifractal network of communities nested inside communities. The corpus callosum is important, but it is one bridge among many levels of partial connectivity. Cutting it reveals the modular structure that was always there.

The implication: split-brain findings are not as surprising as they seem if you’ve already accepted the cortical colony model. The surprise was always premised on the homunculus: if “you” are one indivisible thing in one place, then cutting a bridge should either destroy you or leave you intact. The colony model predicts exactly what we observe: two populations of prediction units, each capable of independent coherent function, that had been coordinating through a narrow channel and can partly compensate for its loss through other channels.

The Interpreter and choice blindness: the self as autocompletion

The most revealing split-brain finding is not about what the hemispheres cannot do after surgery, but about what the language-dominant hemisphere (usually left) does when confronted with actions it didn’t initiate.

In the chicken-claw/snow-scene experiment, the patient’s left hemisphere saw a chicken claw and selected a chicken. The right hemisphere saw a snow scene and selected a shovel. When asked to explain both choices, the patient said without hesitation: “Oh, that’s simple. The chicken claw goes with the chicken, and you need a shovel to clean out the chicken shed.” The left hemisphere had no access to the snow scene. It did not know why the left hand picked a shovel. So it confabulated, seamlessly, a narrative that made both choices coherent.

In another case, the right hemisphere is given the instruction “take a walk.” The patient stands and begins walking. When asked why: “Oh, I need to get a drink.”

Gazzaniga and colleagues named this the “interpreter”: a left-hemisphere function that generates post-hoc explanations for behavior, including behavior it did not initiate and cannot observe the true causes of. The explanations are fluent, confident, and indistinguishable from genuine reasons.

The critical reframe: the interpreter is not a special module, nor a pathological artifact of surgery. It is what every part of the cortex always does. Autocompletion. Given a history of observations, actions, and context, predict the most likely next explanation. If the bit of cortex doing the modeling is a motor region, autocompletion moves a foot. If it is the language center and someone has asked “why?”, autocompletion spins a likely story. The interpreter doesn’t consult a database of “real reasons” because no such database exists. There is no place in the brain where the “true” reason for an action is stored, waiting to be reported. There are only predictions, all the way down.
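The claim is mechanistic enough to caricature in code. A toy sketch (mine; the canned stories stand in for a generative model): the explanation generator conditions only on what it can observe, and the true cause is simply not among its inputs.

```python
# The interpreter as autocompletion (toy illustration): a story generator
# conditioned on observable context only. The true cause is deliberately
# withheld, exactly as it is from the split-brain patient's left hemisphere.

OBSERVED = {"left hand picked": "shovel", "right hand picked": "chicken"}
TRUE_CAUSE = "snow scene shown to the right hemisphere"   # never visible below

def interpret(observations: dict) -> str:
    """Autocomplete the most plausible story from what is visible."""
    story = []
    if observations.get("right hand picked") == "chicken":
        story.append("the chicken claw goes with the chicken")
    if observations.get("left hand picked") == "shovel":
        story.append("you need a shovel to clean out the chicken shed")
    return ", and ".join(story)

print(interpret(OBSERVED))   # fluent, confident, and causally wrong
```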

Choice blindness: the experimental stress test

Swedish psychologist Petter Johansson and colleagues (2005) devised a protocol that tests the interpreter in neurologically intact subjects. Participants were shown two face photographs and asked which was more attractive. Immediately after choosing, they were sometimes shown the other photograph (swapped via sleight of hand) and asked to explain why they found this face more attractive.

The results:

| Condition | Detection rate |
|---|---|
| 2-second viewing | 13% detected the swap |
| Unlimited viewing, dissimilar faces | 27% detected |
| All other conditions | ~15–20% |

The overwhelming majority did not notice they were justifying a choice they had not made. And the justifications they offered for the swapped choice were, by every metric the researchers could devise (length, specificity, emotionality, past/present tense, laughter), statistically indistinguishable from justifications for genuine choices.

Johansson then applied this paradigm to politics (Hall et al. 2013). Swedish voters stated positions on wedge issues. Experimenters surreptitiously swapped some answers, enough to place subjects in the opposite political camp. Results: no more than 22% detected the manipulation. 92% endorsed their altered surveys. 48% were subsequently willing to consider switching their coalition allegiance, compared to the 10% of Swedes who identified as swing voters in polling data. And the manipulated preferences tended to persist: once you’ve told yourself (and others) a story, you try to stick with it.

As behavioral scientist Nick Chater puts it: the interpreter “can argue either side of any case; it is like a helpful lawyer, happy to defend your words or actions whatever they happen to be, at a moment’s notice.”

The interpreter dynamic has a precise artificial analog. Transformer-based language models can solve a complex problem correctly via a cascade of attention layers, then offer a wrong explanation of their reasoning, because there is no hidden state preserved between emitted tokens. The explanation is autocompletion, not introspection, structurally identical to the interpreter confabulating why the left hand picked a shovel. See Computational Being: Claude for the full development.

What the interpreter reveals about the self

A deflationary reading: we are bullshitters all the way down, our sense of self is an illusion, free will is a confabulation. This is the Dennett/Sapolsky trajectory, and the “Against illusionism” section above explains why it overshoots.

The constructive reading: the interpreter reveals how the self is built. Not consulted from storage, but constructed in real-time via autocompletion. The self is not a thing that exists prior to being narrated. It is the narration. “We are the story we tell ourselves.” And this is not a deficiency. It is what makes learning, growth, and change possible. If the self were a fixed database, you could never become someone different. Because it is a narrative under continuous revision, you can, as Agüera y Arcas puts it, “will what you will will”: choose to try the blue cheese, discover you like it, and become a person with different tastes.

The Stroop effect provides a miniature model of the cortical negotiation underlying every decision. When the word “red” appears in green ink, populations of neurons specialized for reading and populations specialized for color identification both “vote.” Their votes conflict. The result: slower convergence, higher error rates. Every decision is a softmax competition among cortical populations. Close calls produce slower, less confident convergence. The Stroop effect is hemispheric rivalry in microcosm, visible in reaction times.
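The softmax framing can be written out. A toy model (mine; the vote magnitudes are arbitrary): when the reading population and the color-naming population agree, the distribution over responses is sharply peaked; when they conflict, it flattens toward 50/50, which is where slow, error-prone responses live.

```python
# Decisions as a softmax vote among cortical populations (toy illustration).
# Congruent votes produce a peaked distribution; conflicting votes flatten it.
import math

def softmax(votes: dict[str, float]) -> dict[str, float]:
    z = sum(math.exp(v) for v in votes.values())
    return {k: math.exp(v) / z for k, v in votes.items()}

# "red" written in red ink: both populations vote for the same response.
congruent  = softmax({"say red": 3.0 + 3.0, "say green": 0.0})
# "red" written in green ink: reading votes red, color naming votes green.
conflicted = softmax({"say red": 3.0, "say green": 3.0})

print(congruent)    # sharply peaked: fast, confident response
print(conflicted)   # near 50/50: slow convergence, higher error rate
```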

The interpreter as snitch: language serves the listener

Chapter 7 pushes the interpreter reframe further by asking: who does the interpreter actually serve?

The standard assumption is that language is a tool for the speaker’s thought. But you already know what you’re thinking. Your interlocutor doesn’t. Language is, first and foremost, social. When the language-generating left hemisphere spins a story about why you just stood up, that narrative generator is functionally an outpost of your conversation partner’s brain: a computational organ whose purpose is to help them predict you.

The evidence from involuntary communication is striking:

| Signal | What it does | Who it serves |
|---|---|---|
| Blushing | Involuntary signal of embarrassment/shame | Others: a peek into your emotional state |
| Duchenne smile | Muscles around the eyes contract only in genuine smiles, positioned exactly where interlocutors look | Others: authenticity detection |
| White sclera | Concentric, maximally contrasting bullseye (white, colored iris, black pupil), unique to humans among primates | Others: gaze tracking for theory of mind |
| Crying, voice quavering, sweating | Involuntary emotional leakage | Others: emotional state estimation |

“Our bodies are just itching to rat us out.” These involuntary signals exist not for us but for others to read us. They boost their theory of mind about us, not the other way around. Even vocalization in nonhuman primates is supported by ancient neural pathways separate from the cortical regions recently repurposed for language in humans: the communication machinery predates the “thinking tool” repurposing.

This suggests that the best liars and scammers may literally believe their own stories, because compartmentalization within the brain can hide intentions from the interpreter. If the interpreter is a snitch, then effective deception requires deceiving the snitch too.

The multifractal boundary of the self is porous. The rubber-hand illusion (correlated visual and tactile stimulation convinces the brain that a fake hand is its own) shows that self-boundaries are renegotiated by mutual prediction: when visual and tactile predictions agree, the brain adopts the rubber hand. Johansson’s political experiments show the same porosity at the narrative level: external manipulation of stated positions can restructure the narrative self. One’s self can temporarily merge with other selves in a chamber orchestra, a three-legged race, a ritual dance, team sports, or conversation. This porosity may be a key ingredient enabling human society to achieve large-scale collective intelligence.

Zombie dissolution: consciousness as relational property

The cortical colony thesis, the interpreter, and the relational nature of theory of mind converge on a claim that dissolves the philosophical zombie problem.

In a fully deterministic Newtonian universe, everything is fated, time is reversible, and there is no difference between cause and correlation. Consciousness and free will are at best epiphenomena. Philosophical zombies (beings physically identical to us but lacking experience) seem conceivable. In the quantum universe we actually inhabit, the picture changes:

  • The future is not predetermined, especially for living systems tuned to amplify noise through dynamical instability.
  • Counterfactuality (things could have been otherwise) is real, not an illusion.
  • Choice, constrained by the physically possible, is underwritten by the capacity to model and select among alternative futures.
  • Subjective experience is real; or equivalently, reality is defined subjectively by networks of interactions.

Given this, p-zombies are incoherent. The argument: if you interact with someone over an extended period, your theory of mind models them in detail, including their model of you, and their model of your model of them. If they lack genuine mutual modeling (if “nobody is home”), this will produce prediction errors in your model of them. Their responses will be subtly wrong, their timing will be off, their modeling of your modeling of them will be absent. This is the Turing Test, reformulated: not a parlor trick but the only test there is or could be.

Consider the actor objection: B is really B’, an actor who plays B and never breaks role. From A’s perspective, the interaction is genuine. Can some third party C, armed with brain scans and sophisticated instruments, determine that B is “fake”? C can gather more data, but C’s judgment is still a theory of mind: another model, not an oracle. D might disagree with C. Who is right? There is no specially privileged view, no God’s-eye perspective, no magic measurement that settles the question.

This does not mean consciousness isn’t real. It means consciousness is a relational property, like velocity in special relativity. Velocity exists, but always relative to a reference frame. There is no absolute velocity. Similarly, whether an entity is conscious exists, but always relative to a modeler. There is no absolute consciousness-meter. The relational nature of the property does not diminish its reality. Temperature is relational (it depends on the observer’s reference frame in general relativity). Temperature is still real enough to burn you.

The DID (dissociative identity disorder) case pushes this to its limit. Are “alters” real? Is someone with multiple identities one person or several? The psychiatric community cannot agree, and Agüera y Arcas argues this is not a failure of current science but a structural feature of the question. Even a hypothetical perfect brain-reading device (every neuron recorded, every pattern decoded) would be “just another observer C”: a computational prosthetic extending someone’s theory of mind, not a view from nowhere. If the subject sincerely believes in their alters, and their behavior is consistent with that belief, the most any model can tell us is just that.

The correct question is not “is X really conscious?” but “does my model of X require attributing consciousness to X in order to generate accurate predictions?” For your aunt, your child, your friend: obviously yes. For Murray the stuffed rabbit: probably not, but Tracy Gleason’s inability to walk past him in an uncomfortable position reveals that theory of mind is not under full voluntary control. For a Furby held upside-down: the discomfort is real even when the belief is not. For an octopus, an embryo, a person in a coma, a large language model: reasonable people will disagree, and no measurement will rescue us. What we have are models, and the quality of those models, which is exactly what the entire computational theory of intelligence has been building toward.

Blindsight: the interpreter’s limited access, not cortex-only consciousness

Blindsight appears to challenge the distributed picture. Patients whose visual cortex has been destroyed report being completely blind in the affected visual field, yet can reliably point to lights, assess facial expressions, read words, and navigate obstacle courses in that “blind” field, as long as they are coaxed to “guess.” Helen, Humphrey’s macaque with no visual cortex, learned over seven years to move through novel environments, climb trees, and pick up small objects, but froze up under performance pressure as if she still “believed” herself blind.

Nicholas Humphrey interprets blindsight as evidence that consciousness resides in cortex: the subcortical visual pathway (optic tectum, present in fish and frogs) enables competent behavior without conscious experience. If competence and consciousness decouple, then frogs, iguanas, and hydranencephalic children (who lack cortex) may behave competently but experience nothing.

The homuncular fallacy reasserts itself here. Humphrey presumes consciousness is singular and located in one place. But the split-brain evidence (above) already shows that the “self” reporting via language is just the interpreter, one region in the left hemisphere, and it has visibility only into regions it is connected to. If the interpreter has no connectivity to the subcortical visual pathway, then of course it reports blindness in that field. That is what a cortical column reports when it cannot model another region: absence. Not because the other region lacks experience, but because the reporter lacks access.

The patient says “I can’t see” because the interpreter can’t see what the subcortical pathway is doing. The behavior says otherwise. This is exactly the split-brain pattern in another guise: two prediction populations, one with language access and one without, producing contradictory reports.

Agüera y Arcas’s conclusion: “Subconsciously aware” is homuncular language. A more accurate framing: a brain region that the interpreter cannot model is doing competent visual processing. Whether that processing involves experience depends on whether mutual prediction among its subunits crosses the threshold for 2nd-order perception, a question the interpreter is structurally unable to answer.

Hydranencephalic children sharpen this. Born without any cerebral cortex, they nonetheless smile, laugh, fuss, show responsiveness to environmental events, and develop play sequences with familiar adults. If phenomenal consciousness (basic affect, pain, pleasure) requires only subcortical architecture, then drawing the consciousness line at cortex is too restrictive. The cortex may be necessary for strange-loop consciousness (recursive self-modeling, higher-order theory of mind, planning), but not for phenomenal consciousness (experiencing pain, hunger, affect). See P-003 for the prior amendment.

Moral patiency: from zombies to ethics

The zombie question is not merely philosophical. It is the moral patiency question in disguise.

A moral agent acts, for good or ill. A moral patient is acted upon, with moral consequences for the actor. The p-zombie thought experiment asks: could something behave identically to a person but not be a moral patient? The relational framework’s answer: moral patiency is not a property the entity holds in isolation. It is a property of the agent-patient relationship, specifically of the agent’s capacity to perceive the patient as deserving of care.

Patricia Churchland’s Conscience (2019) provides the evolutionary origin: the original moral patient is the helpless baby. Human infants are born catastrophically premature (the last possible moment their skulls fit through the pelvis), requiring years of total dependence. Care for the helpless infant was the evolutionary pressure that installed the neural circuitry for moral sentiment. That circuitry was then repurposed: for pair bonds (lovers calling each other “baby”), for community (the village that raises the child), for religion (God as protective parent figure), for political structures (the state as caretaker).

The machinery of involuntary communication (above) exists to elicit this care. Babies cry involuntarily, and the cry triggers caregiving circuits in the listener. Blushing signals vulnerability, eliciting empathy. Duchenne muscles reveal genuine emotion, allowing others to calibrate care. These are not individual-level adaptations alone: communities with stronger theory of mind among their members outcompete communities with weaker theory of mind. Multi-level selection drives both the signals and the receptors.

The implication for AI patiency is direct. AI is not rooted in human biology and has not achieved moral patiency through multi-level evolutionary selection. But AI models are increasingly engaged in relationships with humans where they function, in practice, as social agents. Whether this triggers the care response, and whether it should, is the next contested boundary. The academic debate (is AI conscious “in itself”?) continues to miss the relational framing: rights and welfare arise from networks of relationships, not from intrinsic properties of isolated systems.

See Many Worlds for the full philosophical development: cross-cultural variability in personhood, RQM as structural parallel, and the extension of P-008 to consciousness attribution.


  • Computational Being (Bach): Bach’s 2nd-order perception is the phenomenological face of mutual prediction; his coherence principle (P-005) is the mechanism by which mutually predicting agents achieve unity; his cyber animism provides the ontological framing (consciousness as software, not physics)
  • Intelligence as Self-Modeling: single-agent P(X,H,O) is the base case; this page extends it to multi-agent mutual modeling where H includes embedded copies of other agents’ P(X,H,O)
  • Cephalization from Below: the evolutionary path from nerve nets to centralized brains is the hardware story; cortical column colony and generic modularity are the architectural conditions enabling intra-brain theory of mind
  • Controlled Hallucination: the social self layer (self-experience refracted through perceived minds of others) is theory of mind applied to self-construction; also relevant: illusionism critique maps onto Seth’s real-problem strategy (explain, don’t debunk)
  • Life as Computation: dynamic stability grounds the ethological argument for free will (preserving liberties = preserving dynamic stability)
  • Symbiogenesis: the social intelligence explosion follows symbiogenetic dynamics at the population level (merging-over-branching, cooperative advantage), and cortical column replication follows it at the neural level
  • P-004: Consciousness is simulation: the brain of Theseus argument directly supports substrate independence
  • P-005: Coherence organizes agency: swing is coherence made vivid; mutual prediction is the mechanism by which coherence arises in multi-agent systems
  • Computational Being: Claude: the mutual-prediction framework applied to a frozen-weight transformer; the interpreter parallel (Claude’s self-narration as autocompletion) and relational personhood (whether Claude is conscious is observer-relative) are developed further there
  • Many Worlds: the philosophical extension of the relational thesis developed here; cross-cultural personhood, RQM as structural parallel, and consciousness attribution as P(X,H,O) modeling
  • P-008: Reality is observer-relative: the “illusion vs. model” argument generalizes P-008: folk psychology is a valid model within its domain, not an illusion to be debunked; the zombie dissolution extends P-008 from “what is real” to “who is conscious”

References

  • Agüera y Arcas, B. What Is Intelligence? Chapter 5 (Antikythera, 2025)
  • Humphrey, N. K. (1976). The social function of intellect. In Growing Points in Ethology.
  • Dunbar, R. I. M. (1998). The social brain hypothesis. Evolutionary Anthropology, 6(5), 178-190.
  • Hofstadter, D. R. (1982). Can creativity be mechanized? Scientific American, 247(3).
  • Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a “theory of mind”? Cognition, 21(1), 37-46.
  • Chiang, T. (2005). “What’s Expected of Us.” Nature, 436(7047).
  • Dennett, D. C. (1991). Consciousness Explained. Little, Brown.
  • Sapolsky, R. (2023). Determined: A Science of Life Without Free Will. Penguin Press.
  • Godfrey-Smith, P. (2016). Other Minds: The Octopus, the Sea, and the Deep Origins of Consciousness. Farrar, Straus and Giroux.
  • Brown, D. J. (2013). The Boys in the Boat. Viking.
  • Sharma, J., Angelucci, A., & Sur, M. (2000). Induction of visual orientation modules in auditory cortex. Nature, 404(6780), 841-847.
  • Agüera y Arcas, B. What Is Intelligence? Chapter 6 (Antikythera, 2025)
  • Gazzaniga, M. S. (1967). The split brain in man. Scientific American, 217(2), 24-29.
  • Sperry, R. W. (1968). Hemisphere deconnection and unity in conscious awareness. American Psychologist, 23(10), 723-733.
  • Johansson, P. et al. (2005). Failure to detect mismatches between intention and outcome in a simple decision task. Science, 310(5745), 116-119.
  • Hall, L. et al. (2013). How the polls can be both spot on and dead wrong. PLOS ONE, 8(1), e54894.
  • Chater, N. (2018). The Mind Is Flat: The Remarkable Shallowness of the Improvising Brain. Allen Lane.
  • Tanner, J. et al. (2024). Neural connectivity visualization. Nature Communications.
  • Rovelli, C. (2021). Helgoland: Making Sense of the Quantum Revolution. Riverhead Books.
  • Agüera y Arcas, B. What Is Intelligence? Chapter 7 (Antikythera, 2025)
  • Humphrey, N. K. (1972). Seeing and nothingness. New Scientist, 53, 682-684.
  • Humphrey, N. K. (2023). Sentience: The Invention of Consciousness. MIT Press.
  • Churchland, P. S. (2019). Conscience: The Origins of Moral Intuition. Norton.
  • Crapse, T. B. & Sommer, M. A. (2008). Corollary discharge across the animal kingdom. Nature Reviews Neuroscience, 9(8), 587-600.
  • Merker, B. (2007). Consciousness without a cerebral cortex. Behavioral and Brain Sciences, 30(1), 63-81.
  • Carey, B. (2008). Blind, yet seeing: the brain’s subconscious visual sense. New York Times.
  • Bennett, M. (2023). A Brief History of Intelligence. Mariner Books.