Will Mayner

Belief manifolds, and how to steer along them

May 2026

A reproduction of Sarfati et al.’s “The Shape of Beliefs”

How LLMs encode in-context beliefs as curved manifolds, and how manifold-aware steering changes them with fewer side effects than linear steering.

BlueDot Technical AI Safety Project

Decomposing introspection in LLMs: representation and report

May 2026

Decomposes concept-injection introspection in Gemma-3 12B and Qwen-2.5 32B into two separable components: representation (what the model encodes about an injection) and report (the prompt-dependent late-layer circuitry that surfaces it). This separation explains apparent conflicts across prior protocols.

PyPhi ↗

2014 – present

A Python library for calculating integrated information, the quantity that integrated information theory (IIT) identifies with consciousness.

View on GitHub →

PhD thesis ↗

2023

Integrated Information Theory: Theoretical Developments & Empirical Applications

Retirement Center for Conscious Organoids ↗

2022

Second Prize, 5th annual Morgridge Ethics Cartooning Competition. A cartoon on the ethics of lab-grown cerebral organoids.

Differentiation analysis in mouse visual cortex ↗

2022

I led a collaboration with the Allen Institute through their OpenScope program, which provides high-throughput two-photon calcium imaging in mice.

We systematically surveyed neurophysiological differentiation—a measure of the extent to which a population of neurons expresses a rich and varied repertoire of states, inspired by prior work on IIT—across layers and areas of mouse visual cortex in response to naturalistic movies and phase-scrambled controls.

The key finding was that naturalistic stimuli evoke more differentiated neural activity than scrambled stimuli, but only in specific populations: layer 2/3 excitatory neurons in the anterolateral and anteromedial visual areas. This effect was robustly modulated by arousal state: effect sizes showed strong correlations with locomotion and pupil diameter, suggesting that the difference in differentiation is more pronounced when animals are engaged.

Differentiation analysis represents an “inside-out” approach to neural activity, in the sense articulated by György Buzsáki (2019): rather than characterizing responses in terms of externally defined stimulus variables, it quantifies the intrinsic diversity of population dynamics. This contrasts with traditional “outside-in” methods such as decoding, which showed uniformly high performance across layers and areas and did not distinguish the specific populations highlighted by neurophysiological differentiation. In line with Romain Brette’s (2019) critique of the neural coding metaphor, these results underscore that decoding accuracy, which is defined relative to an experimenter’s variables and ideal-observer assumptions, does not by itself establish functional or perceptual relevance. By characterizing population activity on its own dynamical terms, differentiation sidesteps these assumptions and may reveal which neural populations carry signatures of functionally relevant dynamics, offering a new lens for understanding how neural activity relates to perception.
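
To make "intrinsic diversity of population dynamics" concrete, here is a minimal sketch of one simple way to quantify it: the median pairwise Euclidean distance between population state vectors. This is an illustrative assumption, not necessarily the exact pipeline used in the study; the function name `differentiation` and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import median


def differentiation(states):
    """Median pairwise Euclidean distance between population state vectors.

    `states` is a sequence of equal-length vectors, one per time window.
    ASSUMPTION: this is one simple operationalization of differentiation,
    chosen for illustration only.
    """
    if len(states) < 2:
        raise ValueError("need at least two states to compare")
    return median(dist(a, b) for a, b in combinations(states, 2))


# Hypothetical toy data: each tuple is one time window, each entry one neuron.
varied = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
uniform = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]

print(differentiation(varied))   # pairwise distances are 5, 10, 5 -> median 5.0
print(differentiation(uniform))  # identical states -> 0.0
```

Note that nothing in this computation refers to the stimulus: the measure depends only on how much the population's own states differ from one another, which is the sense in which the approach is "inside-out".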

View on GitHub →

PyEMD

2014 – 2025

A Python wrapper of a C++ implementation of the Earth Mover's Distance metric (Wasserstein metric), adopted by several ML libraries to compare probability distributions. Now largely superseded by POT (Python Optimal Transport).
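
For intuition about what the Earth Mover's Distance measures, here is a minimal sketch of the one-dimensional special case, where histograms share equally spaced bins and moving one unit of mass by one bin costs 1. This is not PyEMD's API or implementation (the general problem takes an arbitrary ground-distance matrix); the function name `emd_1d` is hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two histograms on the same bins.

    ASSUMPTIONS (illustrative only): bins are equally spaced with unit
    distance between neighbours, and both histograms have equal total mass.
    In this 1D case the distance is the sum of absolute differences
    between the two cumulative sums.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    return sum(abs(c) for c in accumulate(a - b for a, b in zip(p, q)))


# All mass moves two bins to the right.
print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # 2.0
# Half the mass moves two bins (equivalently, all of it moves one bin).
print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))  # 1.0
```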

View on GitHub →
