I design and run studies to measure how interfaces, automation, and AI systems change human decisions and outcomes. That is a measurement problem, and it is solvable.
PhD in Human Factors · Published in Artificial Intelligence and International Journal of Human-Computer Studies · AI red-teamer · EU AI Act evaluator.
Controlled experiments measuring how AI systems affect human decisions and performance in real deployment contexts.
Systematic bias evaluation, adversarial testing, and conformity assessment of LLM and predictive AI systems.
Measuring how AI deployment affects decision quality, trust, and outcomes at individual and organisational levels.
A reanalysis of experimental data from PhD work using the potential outcomes framework. Can we make stronger causal claims about whether AI explanations actually change human decisions? The reanalysis reframes a controlled XAI experiment as an estimation problem with explicit identification assumptions: a methodological demonstration of what rigorous AI evaluation looks like in practice.
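For illustration only, here is a minimal sketch of that estimation framing, written against simulated stand-in data rather than the actual study dataset. Under random assignment of the explanation condition, the average treatment effect on decision accuracy is identified and can be estimated as a simple difference in group means, with a bootstrap interval for uncertainty. The variable names (`explained`, `correct`) are placeholders, not the real study variables.

```python
# Minimal sketch of a potential-outcomes reanalysis, on hypothetical data.
# Assumes a two-arm randomised design: `explained` = 1 if the participant saw
# an AI explanation, `correct` = 1 if their decision was correct. Random
# assignment is the identification assumption: E[Y(1)] - E[Y(0)] can then be
# estimated as a simple difference in group means.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trial-level data (stand-in for the real experimental dataset).
n = 200
explained = rng.integers(0, 2, size=n)               # randomised treatment
correct = rng.binomial(1, 0.65 + 0.08 * explained)   # simulated outcomes

def ate_difference_in_means(y, t):
    """Estimate the average treatment effect as a difference in group means."""
    return y[t == 1].mean() - y[t == 0].mean()

ate_hat = ate_difference_in_means(correct, explained)

# Nonparametric bootstrap for an uncertainty interval on the ATE estimate.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot.append(ate_difference_in_means(correct[idx], explained[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"ATE estimate: {ate_hat:.3f}  (95% bootstrap CI: {lo:.3f}, {hi:.3f})")
```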
Multi-study experimental programme testing whether counterfactual, normative, and contrastive explanations actually help human operators make better decisions in complex process control. Five published controlled experiments using within-subjects designs, synthesised into a practitioner design framework.
Designed and led a research programme studying how automation affects human operator performance in small modular reactor control rooms. $360K budget, 6 researchers, experimental paradigm built from scratch using a microworld simulator. Collaboration with Idaho National Lab and Canadian Nuclear Safety Commission.
AI evaluation and risk advisory: scoped evaluation criteria, designed red-teaming protocols, conducted systematic bias and performance evaluations of LLM and predictive AI systems, synthesised findings into compliance reports aligned with EU AI Act requirements.
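As a rough illustration of what one step in such an evaluation can look like (not the actual client protocol), the sketch below computes subgroup selection rates and false negative rates for a hypothetical predictive system and reports a disparity ratio. The group names, data, and thresholds are all assumed for the example.

```python
# Minimal sketch of one step in a systematic bias evaluation: comparing a
# predictive system's selection rates and error rates across subgroups.
# All names, data, and thresholds here are illustrative.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical evaluation set: model scores, true labels, and a group attribute.
n = 1000
group = rng.choice(["A", "B"], size=n)
label = rng.binomial(1, 0.3, size=n)
score = np.clip(0.3 * label + rng.normal(0.35, 0.2, size=n), 0, 1)
pred = (score >= 0.5).astype(int)

def group_metrics(pred, label, group, g):
    """Selection rate and false negative rate for one subgroup."""
    mask = group == g
    selection_rate = pred[mask].mean()
    positives = mask & (label == 1)
    fnr = (pred[positives] == 0).mean() if positives.any() else float("nan")
    return selection_rate, fnr

metrics = {g: group_metrics(pred, label, group, g) for g in ["A", "B"]}
for g, (sel, fnr) in metrics.items():
    print(f"group {g}: selection rate {sel:.3f}, false negative rate {fnr:.3f}")

# Disparity ratio between groups; a common screening heuristic flags
# selection-rate ratios below roughly 0.8 for closer review.
ratio = metrics["B"][0] / metrics["A"][0]
print(f"selection-rate ratio B/A: {ratio:.3f}")
```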
Embedded with Ericsson’s Global AI Accelerator to study how internal data science teams interpreted ML model outputs. Contextual interviews, usability evaluations, and concept tests of visualisation prototypes. Translated findings into UI/UX recommendations for their analytics platform.
Full methodology list available on request — get in touch.
Most people working on AI governance are writing policy. Most people working on AI safety are running benchmarks. I sit between those worlds: I design rigorous empirical methods to test whether AI systems do what we claim they do, for the humans who actually use them.
As AI investment scales, the question shifts from "is it accurate?" to "is it worth it?", and both are measurement problems. Organisations that can answer them have a durable advantage over those that cannot.
I am happy to hear from you if you want to discuss a research collaboration, a role, or a question about my work.
Toronto, ON