[01] · Selected work

Production agents,
shipped solo & in flight.

Five pieces of work that show how I think about agentic systems, from the full end-to-end product down to the individual building blocks that hold it up.

[01] The portfolio

Five projects.
One throughline: ship it, then prove it.

[01] Village Hale Technologies Inc. · 2026 – ongoing

in flight

Village Hale — the AI village for families

My current focus. A passive, multi-agent household assistant for families across every stage of childhood, 0–18. Grounded local recommendations (‘the village’) and an approval-gated concierge that never acts on its own. Privacy is the product — newborn data redacted by construction, PIPEDA and Quebec Law 25 compliant by default. Empty repo to public production, solo.
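A minimal sketch of what "approval-gated" could look like in code, assuming a simple in-memory queue. `ProposedAction` and `ApprovalGate` are hypothetical names for illustration, not the production design: the concierge can only propose, and nothing runs until a person approves it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProposedAction:
    """An action the concierge wants to take, pending a human decision."""
    description: str
    run: Callable[[], str]
    approved: bool = False


@dataclass
class ApprovalGate:
    """Hypothetical gate: actions are queued and only run once approved."""
    pending: List[ProposedAction] = field(default_factory=list)

    def propose(self, description: str, run: Callable[[], str]) -> ProposedAction:
        action = ProposedAction(description, run)
        self.pending.append(action)
        return action

    def approve(self, action: ProposedAction) -> None:
        # In a real system this would be triggered by a parent, not by the agent.
        action.approved = True

    def execute(self, action: ProposedAction) -> str:
        # Refuse to act unless a human has explicitly approved.
        if not action.approved:
            raise PermissionError(f"not approved: {action.description}")
        self.pending.remove(action)
        return action.run()
```

The point of the design is that the refusal lives in `execute`, so an agent that skips the approval step fails loudly instead of acting quietly.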

Empty repo → prod

solo

Childhood span

0–18

Compliance

PIPEDA · Law 25

Agent autonomy

earned, gated

Multi-agent systems
Privacy-first design
Consumer AI

Read case study

[02] Settled (TripFix) · Sept 2025 – May 2026

shipped

TripFix — autonomous flight-claim co-pilot

An AI co-pilot for flight-delay refund claims. Reads boarding passes and airline emails, drafts the rebuttal letter, escalates only when uncertain. Built as a small team of specialised agents — not one monolithic prompt — so each piece is testable, swappable, and auditable on its own.
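As a rough illustration of "escalates only when uncertain", here is a sketch in which each specialised agent returns an output plus a confidence score, and anything below a threshold is routed to a human. The `AgentResult` type, the `ESCALATE` marker, and the 0.8 threshold are assumptions made for this example, not details of the shipped system.

```python
from dataclasses import dataclass
from typing import Callable, Dict

ESCALATE = "ESCALATE"  # hypothetical marker for "hand this step to a human"


@dataclass
class AgentResult:
    output: str
    confidence: float  # 0.0 to 1.0; how it is scored is left open here


# Each agent is just a function from input text to a result,
# which keeps every piece testable and swappable on its own.
Agent = Callable[[str], AgentResult]


def run_agents(document: str, agents: Dict[str, Agent],
               threshold: float = 0.8) -> Dict[str, str]:
    """Run each agent; keep confident outputs, escalate the rest."""
    results: Dict[str, str] = {}
    for name, agent in agents.items():
        result = agent(document)
        if result.confidence < threshold:
            results[name] = ESCALATE
        else:
            results[name] = result.output
    return results
```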

LLMs orchestrated

14+

Eval dimensions

Citation grounding

deterministic

Headcount in AI

Multi-agent systems
Eval harnesses
Vision reasoning

Read case study

[03] Settled (TripFix) · Apr – May 2026

shipped

Cursor Cloud Agent v1 — conversation timeline rebuild

A flight recorder for cloud agents. Stitches prompt, thinking, and tool calls into a single replayable timeline — so any agent run is auditable in under a minute. Design call: optimise for the operator first, not the model.
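One way to picture the stitching step, assuming each stream (prompt, thinking, tool calls) arrives already ordered by timestamp. The `Event` shape and the `stitch` helper are hypothetical; the sketch only shows a timestamp-ordered merge, not the actual rebuild.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    ts: float      # seconds since the start of the run
    kind: str      # "prompt", "thinking", or "tool_call"
    payload: str


def stitch(*streams: Iterable[Event]) -> List[Event]:
    """Merge per-stream events (each sorted by ts) into one timeline."""
    # heapq.merge does a lazy k-way merge and never re-sorts a whole stream.
    return list(heapq.merge(*streams, key=lambda e: e.ts))
```

Replaying a run is then just iterating the merged list in order.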

Stream types unified

Replay fidelity

100%

PRs merged solo

Agent observability
Tool-use traces

Read case study

[04] Settled (TripFix) · Feb – Apr 2026

in flight

Agentic preparation checklist

An agent that reads a case and figures out what’s missing. Instead of one giant ‘knows everything’ prompt, it loads short markdown skills on demand for the stage it’s in. Cheaper inference, sharper answers, knowledge anyone on the team can edit in a text file.
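A small sketch of loading skills on demand, assuming skills are plain markdown files in one directory and that each stage maps to a short list of them. The stage names, file names, and `build_prompt` helper are invented for illustration.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from case stage to the skill files that stage needs.
STAGE_SKILLS: Dict[str, List[str]] = {
    "intake": ["identify_documents.md"],
    "review": ["identify_documents.md", "check_deadlines.md"],
}


def build_prompt(stage: str, skills_dir: Path, base_prompt: str) -> str:
    """Assemble a prompt from the base text plus only this stage's skills."""
    parts = [base_prompt]
    for filename in STAGE_SKILLS.get(stage, []):
        # Skills are read at call time, so editing the file changes behaviour
        # on the next run without touching code.
        parts.append((skills_dir / filename).read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)
```

Because only the current stage's files are read, the prompt stays short, which is where the cheaper inference comes from.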

Skills authored

Tools wired

Snapshot evals

passing

Agent design
Skills-as-prompts

Read case study

[05] Settled (TripFix) · Nov 2025 – ongoing

shipped

LLM-as-judge evaluation framework

The quality bar for every AI change we ship. Five automatic judges grade each output on truth, sourcing, tone, completeness, and safety. If a new prompt scores worse than the live one, the deploy is blocked. It is the only reason daily prompt iteration is safe at production scale.
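The deploy gate can be sketched as a comparison of average judge scores between the live prompt and the candidate. The five dimension names come from the description above; the 0-to-1 score scale, the "any dimension regresses" rule, and the `tolerance` parameter are assumptions for this example.

```python
from statistics import mean
from typing import Dict, List

DIMENSIONS = ("truth", "sourcing", "tone", "completeness", "safety")

Scores = Dict[str, float]  # one judged sample: dimension -> score in [0, 1]


def average(samples: List[Scores]) -> Scores:
    """Average each dimension across all judged samples."""
    return {d: mean(s[d] for s in samples) for d in DIMENSIONS}


def should_block(live: List[Scores], candidate: List[Scores],
                 tolerance: float = 0.0) -> bool:
    """Block the deploy if the candidate regresses on any dimension."""
    live_avg = average(live)
    cand_avg = average(candidate)
    return any(cand_avg[d] < live_avg[d] - tolerance for d in DIMENSIONS)
```

Checking each dimension separately, rather than one blended score, means a gain in tone cannot hide a drop in safety.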

Evaluators

Daily judged samples

hundreds

Regressions caught pre-deploy

many

Evals
Production safety

Read case study
