I'm Dillon Plunkett, a cognitive (neuro)scientist studying human and AI minds.
Research
I'm interested in complex thought, conscious awareness, and the relationship between the two. What kinds of thoughts are we capable of thinking, and how does the human brain encode them? Which mental processes do we consciously experience, and what determines the character of those experiences?
I'm also interested in artificial intelligence systems. I have all of the same questions about them. I also believe they will likely be either enormously beneficial or catastrophically harmful for life on Earth. Accordingly, my current research is focused on understanding and steering powerful AI systems.
Currently, I am an Anthropic Fellow researching model welfare with Kyle Fish. Previously, I worked with Jorge Morales in the Subjectivity Lab, where my research focused on the ability of AI systems to report on their own internal processes and on how the human mind represents and predicts changes. Before that, I did my PhD research in Joshua Greene's lab at Harvard. And before that, I did research in experimental epistemology, causal inference, and metareasoning with Tania Lombrozo and Tom Griffiths while working in the Concepts and Cognition and Computational Cognitive Science labs at UC Berkeley.
As an undergraduate, I studied philosophy and psychology at Harvard. My thesis work focused on another topic I find fascinating: the rational and moral significance of personal identity. Precisely what makes some future person me, and why should I care more about that person than about other people?
Publications
Full CV (pdf)
- Plunkett, D., Morris, A., Reddy, K., & Morales, J. (under review; presented at ASSC 2025). Self-interpretability: LLMs can describe complex internal processes that drive their decisions.
- Plunkett, D., & Morales, J. (accepted). Representational momentum transcends motion. Psychological Science.
- Plunkett, D., & Greene, J. D. (in prep). Evidence for crossmodal translation of complex ideas in left lateral posterior temporal cortex.
- Plunkett, D., Frankland, S. M., & Greene, J. D. (in revision). Neural representation of compositional ideas with spatial structure.
- Bernhard, R. M., Frankland, S. M., Plunkett, D., Sievers, B., & Greene, J. D. (2023). Evidence for Spinozan “unbelieving” in the right inferior prefrontal cortex. Journal of Cognitive Neuroscience. https://doi.org/10.1162/jocn_a_01964
- Plunkett, D., & Greene, J. D. (2019). Overlooked evidence and a misunderstanding of what trolley dilemmas do best: Commentary on Bostyn, Sevenhant, & Roets (2018). Psychological Science, 30, 1389-1391. https://doi.org/10.1177/0956797619827914
- Plunkett, D., Lombrozo, T., & Buchak, L. (2019). When and why people think beliefs are “debunked” by scientific explanations of their origins. Mind & Language, 35, 3-28. https://doi.org/10.1111/mila.12238
- Wilkenfeld, D. A., Plunkett, D., & Lombrozo, T. (2018). Folk attributions of understanding: Is there a role for epistemic luck? Episteme, 15, 24-49. https://doi.org/10.1017/epi.2016.38
- Wilkenfeld, D. A., Plunkett, D., & Lombrozo, T. (2016). Depth and deference: When and why we attribute understanding. Philosophical Studies, 173, 373-393. https://doi.org/10.1007/s11098-015-0497-y
- Buchsbaum, D., Griffiths, T. L., Plunkett, D., Gopnik, A., & Baldwin, D. (2015). Inferring action structure and causal relationships in continuous sequences of human action. Cognitive Psychology, 76, 30-77. https://doi.org/10.1016/j.cogpsych.2014.10.001
- Lieder, F., Plunkett, D., Hamrick, J. B., Russell, S. J., Hay, N. J., & Griffiths, T. L. (2014). Algorithm selection by rational metareasoning as a model of human strategy selection. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Weinberger (Eds.), Advances in Neural Information Processing Systems 27 (pp. 2870-2878). Red Hook, NY: Curran Associates, Inc.