Scientific Researcher

I design and run studies to measure how interfaces, automation, and AI systems change human decisions and outcomes. That is a measurement problem, and it is solvable.

PhD in Human Factors · Published in Artificial Intelligence and International Journal of Human-Computer Studies · AI red-teamer · EU AI Act evaluator.

Human–Machine Interaction · UX Research · AI Evaluation · Experimental Design · Causal Inference · Product Development · AI Governance
At a Glance
Degree · PhD, Human Factors
Research led · $470K
Publications · 8+
Domains · Automotive, Nuclear, AI

Human–AI Interaction Studies

Controlled experiments measuring how AI systems affect human decisions and performance in real deployment contexts.

AI Red-Teaming & Evaluation

Systematic bias evaluation, adversarial testing, and conformity assessment of LLM and predictive AI systems.

Societal Impact Research

Measuring how AI deployment affects decision quality, trust, and outcomes at individual and organisational levels.


Research

Selected work

I have worked across nuclear energy, enterprise ML, process control, and telecommunications, and the core question is always the same: "Does this system actually change what people decide and do, and in the direction we intended?"
PhD Dissertation · University of Toronto

Testing Whether XAI Actually Helps: Controlled Experiments on AI Explanations

Multi-study experimental programme testing whether counterfactual, normative, and contrastive explanations actually help human operators make better decisions in complex process control. Five published controlled experiments, all within-subjects designs, synthesised into a practitioner design framework.

Controlled Experiments · Within-Subjects Design · Statistical Modelling · R
Published in Artificial Intelligence and IJHCS.
Postdoc · University of Toronto

Building a Human-Subjects Evaluation Programme for Modern Nuclear Control Rooms

Designed and led a research programme studying how automation affects human operator performance in small modular reactor control rooms. $360K budget, 6 researchers, experimental paradigm built from scratch using a microworld simulator. Collaboration with Idaho National Lab and Canadian Nuclear Safety Commission.

Programme Design · Simulator Studies · Team Leadership · Multi-Stakeholder
Shaped national research priorities for nuclear HFE.
Consulting · Armilla AI

Red-Teaming & Evaluating AI Systems for Enterprise Clients

AI evaluation and risk advisory: scoped evaluation criteria, designed red-teaming protocols, conducted systematic bias and performance evaluations of LLM and predictive AI systems, synthesised findings into compliance reports aligned with EU AI Act requirements.

AI Red-Teaming · Bias Evaluation · EU AI Act · Risk Assessment
Informed client AI deployment and compliance strategy.
Industry · Ericsson (Mitacs Fellowship)

Making ML Interpretable for Enterprise Data Scientists

Embedded with Ericsson’s Global AI Accelerator to study how internal data science teams interpreted ML model outputs. Contextual interviews, usability evaluations, and concept tests of visualisation prototypes. Translated findings into UI/UX recommendations for their analytics platform.

Contextual Inquiry · Usability Testing · Concept Testing
Published in Ergonomics in Design.

Publications

Selected papers

Full list on Google Scholar. Get in touch if you would like access to a specific paper.

Toolkit

Core capabilities

Experimental & Quantitative

  • Controlled experiment design (between & within subjects)
  • Causal inference & potential outcomes framework
  • Statistical modelling (regression, mixed-effects, SEM)
  • ML evaluation — metrics, benchmarks, red-teaming
  • Construct validity & psychometrics

Technical & Applied

  • R (statistical modelling, causal inference, tidyverse)
  • Python (data processing, scripting, API integration)
  • EU AI Act conformity assessment
  • AI red-teaming & bias evaluation
  • Simulator-based experimental paradigms

Full methodology list available on request — get in touch.


The gap between what AI promises and what it delivers is measurable.

Most people working on AI governance are writing policy. Most people working on AI safety are running benchmarks. I sit between those worlds: I design rigorous empirical methods to test whether AI systems do what we claim they do, for the humans who actually use them.

As AI investment scales, the question shifts from "is it accurate?" to "is it worth it?", and both are measurement problems. Organisations that can answer them have a durable advantage over those that cannot.

  • 01 Published researcher in explainable AI — studied what makes ML outputs useful (and harmful) for human decision-makers
  • 02 Professional AI evaluator — red-teaming and conformity assessment for enterprise AI systems at Armilla AI
  • 03 AI ethics and law instructor — teaching responsible AI deployment at George Brown Polytechnic
  • 04 Z-Inspection® member — international expert network for trustworthy AI assessment
  • 05 Schwartz Reisman Institute affiliate — University of Toronto’s centre for technology and society

Contact

Get in touch

I am happy to hear from you if you want to discuss a research collaboration, a role, or a question about my work.

Toronto, ON