
[03] · Settled (TripFix) · Feb — Apr 2026

Agentic preparation checklist

An agent that reads a case and figures out what’s missing. Instead of one giant ‘knows everything’ prompt, it loads short markdown skills on demand for the stage it’s in. Cheaper inference, sharper answers, knowledge anyone on the team can edit in a text file.

Laravel AI SDK · Markdown skill loader · Tool-use loop · Langfuse eval

The problem

Customer cases miss key evidence at intake, and the team has to chase the same things over and over. We didn't want a smarter classifier — we wanted an agent that knew what 'good enough' looked like for each stage of the case.

[01]

Design call

Skills as markdown, not as more prompt.

The cheap path is a giant system prompt that lists every rule the agent needs to know. It works — until the prompt is 15,000 tokens and nobody on the team will touch it.

Instead, the checklist agent loads short markdown skills on demand: one for each case stage, one for each evidence category. Anyone on the team can open the file, change the rule, and watch the eval harness re-grade.

  • Skills live in version control as .md files
  • Agent loads only what’s relevant to the stage
  • Knowledge editable by non-engineers (ops, legal, founders)
  • Reviewable in PRs like any other code
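As a rough illustration of loading skills on demand, here is a minimal Python sketch. It is not the project's Laravel code: the `stages/` and `evidence/` folder layout, the `load_skills` name, and the choice to skip missing files are all assumptions made for the example.

```python
from pathlib import Path


def load_skills(root: Path, stage: str, evidence: list[str]) -> str:
    """Return only the skills relevant to this stage, joined as prompt text.

    Assumed (hypothetical) layout:
        <root>/stages/<stage>.md
        <root>/evidence/<category>.md
    """
    wanted = [root / "stages" / f"{stage}.md"]
    wanted += [root / "evidence" / f"{name}.md" for name in evidence]

    sections = []
    for path in wanted:
        # Assumption: a skill that does not exist is skipped rather than raised.
        if path.is_file():
            sections.append(path.read_text(encoding="utf-8").strip())

    # Everything not requested stays on disk and out of the context window.
    return "\n\n".join(sections)
```

The point of the shape is that the prompt grows with the stage, not with the total amount of knowledge: adding a new skill file costs nothing for cases that never reach that stage.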

[02]

Tool design

Real services as tools, not toy mocks.

The agent’s tools are thin wrappers around the same services the rest of the app uses: document classification, booking parsing, jurisdiction lookup. There is no separate mock bench; the agent runs against production-grade tooling from day one.

Side effect: agent failures and human failures show up in the same dashboards. Telemetry is unified.
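One way to picture a thin wrapper is the Python sketch below. It is illustrative only: `lookup_jurisdiction` is a stand-in with an invented rule, because the real service is not shown here, and the `Tool` class and `TOOLS` registry are hypothetical names rather than part of the Laravel AI SDK.

```python
from dataclasses import dataclass
from typing import Any, Callable


def lookup_jurisdiction(departure_country: str) -> str:
    """Stand-in for an existing app service; the rule below is invented."""
    return "EU" if departure_country in {"DE", "FR", "ES"} else "unknown"


@dataclass(frozen=True)
class Tool:
    name: str
    description: str  # what the model reads when choosing a tool
    handler: Callable[..., Any]  # the same function the rest of the app calls

    def call(self, **arguments: Any) -> dict:
        # The wrapper adds no business logic; it only labels the result.
        return {"tool": self.name, "result": self.handler(**arguments)}


# Hypothetical registry the agent would pick from by name.
TOOLS = {
    tool.name: tool
    for tool in [
        Tool(
            "jurisdiction_lookup",
            "Find the applicable jurisdiction for a departure country.",
            lookup_jurisdiction,
        ),
    ]
}
```

Because the handler is the application's own function, a bug fixed in the service is fixed for the agent too, and the call passes through whatever logging that service already has.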

[03]

Loop

Tool-use loop, not chain-of-prompt.

The agent runs an explicit tool-use loop: think → pick a tool → read the result → decide if more work is needed, then repeat or exit. Each turn is logged, and a hard turn limit means a misfire can never run forever.
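The loop can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the production loop: `decide` stands in for the model call, the step dictionaries (`tool`, `args`, `done`, `missing`) are a made-up format, and `MAX_TURNS = 8` is an arbitrary value chosen for the example.

```python
from typing import Any, Callable

MAX_TURNS = 8  # assumed value; only the existence of a hard limit is given


def run_loop(
    decide: Callable[[list[dict]], dict],
    tools: dict[str, Callable[..., Any]],
    case: dict,
    max_turns: int = MAX_TURNS,
) -> dict:
    """Run think -> pick tool -> read result until done or out of turns."""
    history: list[dict] = [{"role": "case", "content": case}]

    for turn in range(1, max_turns + 1):
        step = decide(history)  # "think": the model sees everything so far

        if step.get("done"):
            return {"status": "finished", "turns": turn,
                    "missing": step["missing"], "log": history}

        # "pick a tool" and "read the result": each turn is appended to the log.
        result = tools[step["tool"]](**step["args"])
        history.append({"role": "tool", "turn": turn,
                        "tool": step["tool"], "result": result})

    # Hard stop: the loop cannot run past max_turns, whatever decide() returns.
    return {"status": "turn_limit", "turns": max_turns,
            "missing": None, "log": history}
```

Returning a distinct `turn_limit` status, instead of raising or returning a partial answer, keeps a runaway case visible as its own outcome in the logs.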

The takeaway

Domain knowledge belongs in versioned markdown the team can read — not buried inside a 15,000-token system prompt that everyone is too scared to touch.
