Will Mayner

AI interpretability·consciousness·neuroscience


I’m a researcher at the Center for Sleep and Consciousness at the University of Wisconsin–Madison, where I work on integrated information theory with Giulio Tononi.

I’m now focusing on mechanistic interpretability, AI safety, AI consciousness, and model welfare.

Some related work from our group: this 2019 conference paper and this recent preprint.

Selected work

A reproduction of Sarfati et al.’s “The Shape of Beliefs”

How LLMs encode in-context beliefs as curved manifolds, and how manifold-aware steering changes them with fewer side effects than linear steering.
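As a rough intuition for why curvature matters, here is a toy example. It is not the method from Sarfati et al. or from the reproduction. It assumes, hypothetically, that belief states lie on a unit circle in a 2-D activation space. Linear steering adds a fixed vector and cuts across the chord, leaving the manifold. A manifold-aware step moves along the arc and stays on it. The helper names (`linear_steer`, `manifold_steer`) are illustrative only.

```python
import math

# Hypothetical toy: "belief states" live on the unit circle in 2-D.

def on_circle(theta):
    """Point on the unit circle at angle theta (radians)."""
    return (math.cos(theta), math.sin(theta))

def linear_steer(x, source, target, alpha):
    """Add a fixed steering vector (target - source), scaled by alpha."""
    return tuple(xi + alpha * (t - s) for xi, s, t in zip(x, source, target))

def manifold_steer(theta, source_theta, target_theta, alpha):
    """Move along the circle: rotate by a fraction alpha of the angular gap."""
    return on_circle(theta + alpha * (target_theta - source_theta))

def radius(p):
    """Distance from the origin; 1.0 means the point is still on the manifold."""
    return math.hypot(*p)

src, tgt = 0.0, math.pi / 2
x = on_circle(src)
lin = linear_steer(x, on_circle(src), on_circle(tgt), 0.5)
man = manifold_steer(src, src, tgt, 0.5)
print(round(radius(lin), 3), round(radius(man), 3))  # prints 0.707 1.0
```

Halfway along, the linear step lands off the manifold at radius about 0.707. The arc step stays at radius 1. This off-manifold drift is one plausible source of the side effects that manifold-aware steering is meant to reduce.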

BlueDot Technical AI Safety Project

Decomposed concept-injection introspection (Gemma-3 12B, Qwen-2.5 32B) into two separable components: representation (what the model encodes about an injection) and report (the prompt-dependent late-layer circuitry that surfaces it). This decomposition explains apparent conflicts across prior protocols.

PyPhi

2014 – present

A Python library for calculating integrated information, the quantity that integrated information theory (IIT) identifies with consciousness.

View on GitHub →

Integrated Information Theory: Theoretical Developments & Empirical Applications

I led a collaboration with the Allen Institute through their OpenScope program, which provides high-throughput two-photon calcium imaging in mice.

We systematically surveyed neurophysiological differentiation across layers and areas of mouse visual cortex in response to naturalistic movies and phase-scrambled controls. Differentiation, a measure inspired by prior work on IIT, quantifies the extent to which a population of neurons expresses a rich and varied repertoire of states.
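Here is a simplified sketch of the idea, not the exact pipeline used in the study. It assumes that population activity has already been summarized as one state vector per time window. Differentiation is then taken as the median pairwise Euclidean distance between those states. The published measure may define states differently, for example from spectral features of each neuron's activity. The function name and the toy data are hypothetical.

```python
import itertools
import math
import statistics

def differentiation(states):
    """Median pairwise Euclidean distance between population state vectors.

    `states` is a list of equal-length vectors, one per time window, each
    giving the activity of every neuron in that window (hypothetical format).
    """
    distances = [math.dist(a, b) for a, b in itertools.combinations(states, 2)]
    return statistics.median(distances)

# Hypothetical data: 3 neurons observed over 4 time windows.
repetitive = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0.1, 0]]
varied = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
print(differentiation(repetitive), differentiation(varied))
```

In the repetitive population the states barely change, so differentiation is near zero. In the varied population the states are spread out, so differentiation is high (about 1.414, i.e. √2). The measure depends only on the diversity of the population's own states and makes no reference to stimulus labels.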

The key finding was that naturalistic stimuli evoke more differentiated neural activity than scrambled stimuli, but only in specific populations: layer 2/3 excitatory neurons in the anterolateral and anteromedial visual areas. This effect was robustly modulated by arousal state: effect sizes showed strong correlations with locomotion and pupil diameter, suggesting that the difference in differentiation is more pronounced when animals are engaged.

Differentiation analysis represents an “inside-out” approach to neural activity, in the sense articulated by György Buzsáki (2019): rather than characterizing responses in terms of externally defined stimulus variables, it quantifies the intrinsic diversity of population dynamics. This contrasts with traditional “outside-in” methods such as decoding, which showed uniformly high performance across layers and areas and did not single out the populations highlighted by neurophysiological differentiation.

In line with Romain Brette’s (2019) critique of the neural coding metaphor, these results underscore that decoding accuracy, which is defined relative to an experimenter’s variables and ideal-observer assumptions, does not by itself establish functional or perceptual relevance. By characterizing population activity on its own dynamical terms, differentiation sidesteps these assumptions. It may therefore reveal which neural populations carry signatures of functionally relevant dynamics, offering a new lens on how neural activity relates to perception.

View on GitHub →

PyEMD

2014 – 2025

A Python wrapper of a C++ implementation of the Earth Mover's Distance metric (Wasserstein metric), adopted by several ML libraries to compare probability distributions. Now largely superseded by POT (Python Optimal Transport).
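For intuition about what the metric computes, here is a minimal pure-Python sketch of the simplest special case. It is not PyEMD's API or its C++ algorithm, which handle general ground distances. It assumes two histograms with equal total mass on the same 1-D grid of evenly spaced bins, with ground distance |i − j| between bins i and j. Under those assumptions, the Earth Mover's Distance reduces to the L1 distance between the two cumulative distributions. The function name `emd_1d` is hypothetical.

```python
import math
from itertools import accumulate

def emd_1d(p, q):
    """Earth Mover's Distance between two histograms on the same 1-D grid.

    Assumes equal total mass and unit spacing between adjacent bins, so the
    ground distance between bins i and j is |i - j|. In this special case the
    EMD equals the L1 distance between the cumulative sums of p and q.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if not math.isclose(sum(p), sum(q)):
        raise ValueError("histograms must have equal total mass")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

# Moving one unit of mass from the first bin to the third costs 2.
print(emd_1d([1, 0, 0], [0, 0, 1]))  # prints 2
```

The cumulative-sum shortcut works only on a 1-D grid. With an arbitrary ground-distance matrix, EMD is a transportation (optimal transport) problem, which is the general case that libraries like PyEMD and POT solve.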

View on GitHub →