Portrait of Zachariah Carmichael in front of Mount Shasta

Senior Research Scientist · Meta

Zachariah Carmichael

Model understanding for LLM agents, attribution, and large-scale ML debugging.

CS PhD · Boston Area

I work on opening the black box of generative AI: understanding what large models have learned, why they answered the way they did, and how to tell when their explanations are misleading. At Meta I build attribution and model-understanding workflows for LLMs and agentic systems, turning explanation into intervention by guiding prompt edits, compressing context, and debugging models at scale. I maintain PyTorch Captum, the interpretability library used by the broader PyTorch community.

I earned my PhD at the University of Notre Dame in the Computer Vision Research Lab (advisor: Walter J. Scheirer). My dissertation, "Explainable AI for High-Stakes Decision-Making," covers intrinsically interpretable models, the failure modes of post hoc explainers, and defenses against adversarial manipulation of explanations.


Selected work


AIware 2026 accepted

A Preliminary Study on Explaining Risk of Code Changes Using LLM-based Prediction Models

Y. Liu, K. Jabre, R. Abreu, Z. J. Carmichael, V. Murali, A. Patel, J. Ge, W. Sun

[venue] llm code-review interpretability

ICML Mechanistic Interpretability Workshop 2026 Spotlight

Surrogate Fidelity: When Can Open LLMs Explain Closed Ones?

P. Chlenski, Z. J. Carmichael, A. Warikoo, J. Shao, Y. Ye, O. Yang, V. Miglani, N. Bandi

[venue] llm interpretability surrogate-models

European Conference on Computer Vision (ECCV) 2024

This Probably Looks Exactly Like That: An Invertible Prototypical Network

Z. J. Carmichael, T. Redgrave, D. G. Cedre, W. J. Scheirer

[arXiv] [code] interpretability prototypical-networks normalizing-flows

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

Pixel-Grounded Prototypical Part Networks

Z. J. Carmichael, S. Lohit, A. Cherian, M. J. Jones, W. J. Scheirer

[venue] interpretability prototypical-networks

AAAI Conference on Artificial Intelligence 2023

Unfooling Perturbation-Based Post Hoc Explainers

Z. J. Carmichael, W. J. Scheirer

[arXiv] [venue] [code] xai adversarial-robustness auditing

NeurIPS Workshop XAI in Action: Past, Present, and Future Applications 2023

How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?

Z. J. Carmichael, W. J. Scheirer

[arXiv] [code] xai evaluation

Recent writing

Red Flags in the AI Executive Order about "Dual Use" Models ↗

Nov 15, 2023 · Substack

The government's "dual use" definition has some surprises for the AI industry.

Demystifying ChatGPT and Other Large Language Models ↗

Apr 15, 2023 · Substack

A deep dive into the tech behind the current AI boom.

Noncompliance in Algorithmic Audits and Defending Auditors ↗

Feb 10, 2023 · Medium

Algorithmic audits can reveal harmful biases — but what happens when an auditee tries to hide its decision-making?