I design and run studies to measure how interfaces, automation, and AI systems change human decisions and outcomes. That is a measurement problem, and it is solvable.
PhD in Human Factors · Published in Artificial Intelligence and International Journal of Human-Computer Studies · AI red-teamer · EU AI Act evaluator.
Controlled experiments measuring how AI systems affect human decisions and performance in real deployment contexts.
Systematic bias evaluation, adversarial testing, and conformity assessment of LLM and predictive AI systems.
Measuring how AI deployment affects decision quality, trust, and outcomes at individual and organisational levels.
A reanalysis of experimental data from PhD work using the potential outcomes framework. Can we make stronger causal claims about whether AI explanations actually change human decisions? The reanalysis reframes a controlled XAI experiment as an estimation problem with explicit identification assumptions: a methodological demonstration of what rigorous AI evaluation looks like in practice.
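For illustration only, here is a minimal sketch of that estimation framing, written against simulated stand-in data rather than the actual study dataset. Under random assignment of the explanation condition, the average treatment effect on decision accuracy is identified and can be estimated as a simple difference in group means, with a bootstrap interval for uncertainty. The variable names (`explained`, `correct`) are placeholders, not the real study variables.

```python
# Minimal sketch of a potential-outcomes reanalysis, on hypothetical data.
# Assumes a two-arm randomised design: `explained` = 1 if the participant saw
# an AI explanation, `correct` = 1 if their decision was correct. Random
# assignment is the identification assumption: E[Y(1)] - E[Y(0)] can then be
# estimated as a simple difference in group means.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trial-level data (stand-in for the real experimental dataset).
n = 200
explained = rng.integers(0, 2, size=n)               # randomised treatment
correct = rng.binomial(1, 0.65 + 0.08 * explained)   # simulated outcomes

def ate_difference_in_means(y, t):
    """Estimate the average treatment effect as a difference in group means."""
    return y[t == 1].mean() - y[t == 0].mean()

ate_hat = ate_difference_in_means(correct, explained)

# Nonparametric bootstrap for an uncertainty interval on the ATE estimate.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot.append(ate_difference_in_means(correct[idx], explained[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"ATE estimate: {ate_hat:.3f}  (95% bootstrap CI: {lo:.3f}, {hi:.3f})")
```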
Multi-study experimental programme testing whether counterfactual, normative, and contrastive explanations actually help human operators make better decisions in complex process control. Five published controlled experiments using within-subjects designs, synthesised into a practitioner design framework.
Designed and led a research programme studying how automation affects human operator performance in small modular reactor control rooms. $360K budget, 6 researchers, experimental paradigm built from scratch using a microworld simulator. Collaboration with Idaho National Lab and Canadian Nuclear Safety Commission.
AI evaluation and risk advisory: scoped evaluation criteria, designed red-teaming protocols, conducted systematic bias and performance evaluations of LLM and predictive AI systems, synthesised findings into compliance reports aligned with EU AI Act requirements.
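As a rough illustration of what one step in such an evaluation can look like (not the actual client protocol), the sketch below computes subgroup selection rates and false negative rates for a hypothetical predictive system and reports a disparity ratio. The group names, data, and thresholds are all assumed for the example.

```python
# Minimal sketch of one step in a systematic bias evaluation: comparing a
# predictive system's selection rates and error rates across subgroups.
# All names, data, and thresholds here are illustrative.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical evaluation set: model scores, true labels, and a group attribute.
n = 1000
group = rng.choice(["A", "B"], size=n)
label = rng.binomial(1, 0.3, size=n)
score = np.clip(0.3 * label + rng.normal(0.35, 0.2, size=n), 0, 1)
pred = (score >= 0.5).astype(int)

def group_metrics(pred, label, group, g):
    """Selection rate and false negative rate for one subgroup."""
    mask = group == g
    selection_rate = pred[mask].mean()
    positives = mask & (label == 1)
    fnr = (pred[positives] == 0).mean() if positives.any() else float("nan")
    return selection_rate, fnr

metrics = {g: group_metrics(pred, label, group, g) for g in ["A", "B"]}
for g, (sel, fnr) in metrics.items():
    print(f"group {g}: selection rate {sel:.3f}, false negative rate {fnr:.3f}")

# Disparity ratio between groups; a common screening heuristic flags
# selection-rate ratios below roughly 0.8 for closer review.
ratio = metrics["B"][0] / metrics["A"][0]
print(f"selection-rate ratio B/A: {ratio:.3f}")
```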
Embedded with Ericsson’s Global AI Accelerator to study how internal data science teams interpreted ML model outputs. Contextual interviews, usability evaluations, and concept tests of visualisation prototypes. Translated findings into UI/UX recommendations for their analytics platform.
Full methodology list available on request — get in touch.
Most people working on AI governance are writing policy. Most people working on AI safety are running benchmarks. I sit between those worlds: I design rigorous empirical methods to test whether AI systems do what we claim they do, for the humans who actually use them.
As AI investment scales, the question shifts from "is it accurate?" to "is it worth it?", and both are measurement problems. Organisations that can answer them have a durable advantage over those that cannot.
I am happy to hear from you if you want to discuss a research collaboration, a role, or a question about my work.
Toronto, ON