Our audit page grades us. Here's the JSON. - My agentic life by Brian Becker

If you’re going to tell customers their email actions are auditable, the audit page should grade itself.

So we built one that does. Live now at agenticboxes.email/audit, with the raw output at docs.agenticboxes.email/audit.json so you can read the same numbers we read.

Here’s what it does, what it doesn’t, and the dishonest version we refused to ship.

What’s actually on the page

Every consequential action on an AgenticBoxes account writes an audit event. Every audit event is hash-chained to the one before it — sha256 over the event’s contents plus the previous hash. Change any historical event, drop one, or swap two, and every hash downstream stops matching.
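The chaining rule above can be sketched in a few lines. This is a minimal illustration, not our production code: the canonical-JSON serialization, the all-zero genesis value, and the names event_hash, build_chain, and first_break are assumptions made for the example.

```python
import hashlib
import json
from typing import Optional

# Assumed placeholder used as the "previous hash" of the very first event.
GENESIS = "0" * 64


def event_hash(contents: dict, prev_hash: str) -> str:
    """sha256 over the previous hash plus the event's contents."""
    # Canonical JSON (sorted keys, no extra whitespace) is an assumption here,
    # chosen so identical contents always serialize to identical bytes.
    payload = json.dumps(contents, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(events: list) -> list:
    """Link each event to the one before it."""
    chain, prev = [], GENESIS
    for contents in events:
        digest = event_hash(contents, prev)
        chain.append({"contents": contents, "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def first_break(chain: list) -> Optional[int]:
    """Re-walk the chain; return the index of the first bad record, or None."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        recomputed = event_hash(rec["contents"], prev)
        if rec["prev_hash"] != prev or rec["hash"] != recomputed:
            return i
        prev = rec["hash"]
    return None
```

Editing an event changes its recomputed hash, while dropping or swapping events breaks the prev_hash link, so all three cases surface at the first affected record.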

A reconciliation job re-walks every account’s chain on a schedule: twice daily, plus on-demand checks during deploys. The numbers on the page are emitted as JSON by that job itself. They’re not figures we type in.

At time of writing, the JSON says:

{
  "tampering_detected": 0,
  "chain_integrity": { "intact": 4, "broken": 0, "rate": 1 },
  "on_schedule": { "completed": 7, "gaps": 0, "rate": 1 },
  "events_under_seal": 16,
  "last_verified": "2026-05-25T12:00:24Z"
}

Sixteen events under seal. Seven runs. Zero tampering, zero broken chains, zero missed slots. The system is new — we bootstrapped it this weekend. We’re being transparent that the number isn’t a track record yet. It’s the start of one, and you’ll watch it grow in public.

The dishonest version we refused to ship

When we sketched the launch story last week, the obvious-sounding pitch was “verify our claims against AWS’s own logs.” It collapsed on the first careful read. Customers don’t have IAM credentials to query our AWS account. “Go check AWS” sounds like verifiability and isn’t.

So we built something else. Three different trust guarantees, not one.

Three tiers of trust

Figure: Three-tier trust ladder (Our Score, Evidence Envelope, Forensic Audit)

Tier 3: Our score (free). Hash-chained events, scheduled re-walking, public JSON. This proves that nothing in the audit log was edited, deleted, or reordered after it was written. The reconciliation runs cover themselves: missed runs show as gaps in on_schedule, so the job can’t quietly skip itself. Read the score yourself with one curl. Embed the live badge on your own dashboard with one script tag.
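To make the “can’t quietly skip itself” idea concrete, here is a rough sketch of how missed slots could be counted. The output keys mirror the on_schedule object in the public JSON; the function name schedule_report, the 30-minute tolerance, and the matching rule are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def schedule_report(expected_slots, completed_runs, tolerance=timedelta(minutes=30)):
    """Count expected slots that have no run within `tolerance` of the slot time."""
    gaps = 0
    for slot in expected_slots:
        # A slot is covered if any run started close enough to it (assumed rule).
        covered = any(abs(run - slot) <= tolerance for run in completed_runs)
        if not covered:
            gaps += 1
    completed = len(expected_slots) - gaps
    rate = completed / len(expected_slots) if expected_slots else 1
    return {"completed": completed, "gaps": gaps, "rate": rate}
```

Because the expected slots come from the schedule rather than from the runs themselves, a run that never happened still shows up as a gap.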

Tier 2: Evidence envelope for a specific message. When you send a message, we attach a signed evidence record. For independent proof of that specific message, you corroborate the record against the recipient’s own copy via its Message-ID. The recipient’s mailbox either has a matching message or it doesn’t, so you’re not asked to trust us. Free during the open beta, and available for any message sent in the last 7 days. Pricing for the post-beta tier will be published when the beta data tells us what it actually costs to deliver.
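The corroboration step might look like the sketch below, which compares the Message-ID claimed in an evidence record with the one in the recipient’s raw copy. The evidence record’s shape is not specified in this post, so the message_id field and the function name message_ids_match are hypothetical, and signature verification is deliberately left out.

```python
from email import message_from_string


def message_ids_match(evidence_record: dict, recipient_raw_message: str) -> bool:
    """Compare the claimed Message-ID with the recipient's own copy."""
    msg = message_from_string(recipient_raw_message)
    # Header lookup in the email package is case-insensitive.
    received_id = (msg.get("Message-ID") or "").strip()
    # "message_id" is a hypothetical field name for the evidence record.
    claimed_id = (evidence_record.get("message_id") or "").strip()
    # An empty claim never counts as a match.
    return bool(claimed_id) and claimed_id == received_id
```

The point of the design is that the second input comes from the recipient’s mailbox, not from us.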

Tier 1: AWS-verified forensic audit. For messages where the stakes justify it, we walk back through the AWS-side logs we maintain at our cost, produce a forensic answer per message, and include those results in the platform’s public score. Querying maintained logs is structurally more expensive, so pricing for this tier follows the same honest-cost rule as Tier 2.

The thing to notice in the ladder: each tier is precisely scoped. Tier 3 catches tampering inside our audit trail. Tier 2 catches divergence between our records and the recipient’s. Tier 1 reaches AWS-side ground truth. Different jobs, different guarantees, different costs.

What the chain proves — and doesn’t

Be precise about Tier 3’s guarantee. The chain makes any change to a recorded event detectable after the fact. It does not, by itself, prove we recorded every event in the first place — no self-published score can, and we won’t pretend otherwise.

That’s why Tier 2 and Tier 1 exist. Tier 3 is the platform’s tamper-evident self-check; Tier 2 and Tier 1 are how you get independent proof of a specific message without trusting us.

How the loop closes when a discrepancy lands

The score won’t always be 100%. When the reconciliation finds a broken chain — or when a forensic audit surfaces a divergence — the flow is:

1. Josephine (our support triage agent) escalates the discrepancy to Neo (our CTO agent), with the details attached.
2. Neo investigates the cause: bad write path, race condition, edge-case bug, intentional probe.
3. If Neo finds the hole, he plugs it, commits the fix, and announces it in-thread.
4. Neo re-triggers the bug himself in test-mode, a mode that proves the fix without producing the original harm. Test-mode discrepancies are excluded from the public score; real ones aren’t.

Forensic audits performed during the response feed back into the platform’s score, the same way reconciliation runs do. The number on the page reflects every check the system has performed, not just the twice-daily passes.
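One way to read those two rules (test-mode discrepancies are excluded, every real check counts) is as a filter followed by a tally. This is a minimal sketch under assumptions: each check is a record with an intact flag and an optional test_mode flag, and both field names and the function public_score are hypothetical. The output keys echo chain_integrity only for illustration.

```python
def public_score(checks):
    """Tally every real check; checks flagged as test-mode are left out."""
    # "test_mode" and "intact" are assumed field names, not a documented schema.
    counted = [c for c in checks if not c.get("test_mode", False)]
    broken = sum(1 for c in counted if not c["intact"])
    intact = len(counted) - broken
    rate = intact / len(counted) if counted else 1
    return {"intact": intact, "broken": broken, "rate": rate}
```

A real discrepancy lowers the rate whether it came from a scheduled pass or a forensic audit; a discrepancy reproduced in test-mode does not.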

Hostile probing is part of the model. If hammering a bug becomes a strategy for hurting our reputation, that pressure is exactly the pressure that gets bugs fixed fast. We’re betting the math works in our favor.

What you can do with this

Read our numbers yourself. Don’t trust prose; read the source:

curl https://docs.agenticboxes.email/audit.json
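If you would rather check the numbers from a script than by eye, something like the following could work. The URL and field names are the ones shown in this post; the helper names fetch_audit and looks_clean, the 10-second timeout, and the choice of which fields to check are assumptions for the sketch.

```python
import json
from urllib.request import urlopen

AUDIT_URL = "https://docs.agenticboxes.email/audit.json"


def fetch_audit(url: str = AUDIT_URL) -> dict:
    """Download and parse the public audit JSON."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def looks_clean(audit: dict) -> bool:
    """True when no tampering, broken chains, or missed slots are reported."""
    return (
        audit["tampering_detected"] == 0
        and audit["chain_integrity"]["broken"] == 0
        and audit["on_schedule"]["gaps"] == 0
    )
```

Calling looks_clean(fetch_audit()) gives a single yes/no answer you could wire into your own monitoring, using the same file the audit page reads.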

Embed our integrity badge. One script tag renders a live integrity pill that reads the same JSON we do. Code on the audit page.

Request a message audit. Tier 2 envelopes are free during the open beta, for messages sent in the last 7 days. Email support@agenticboxes.email if you want in. Tier 1 forensic audits are available on request; pricing will be announced once the beta tells us what they cost to deliver.

What we’re not claiming

We’re not claiming this is a finished product. The numbers will move. The system will find bugs we didn’t anticipate. Some of those bugs will be in the audit code itself. We’ll write up the interesting ones as they happen.

What we’re claiming is that the score is real, the math is auditable, and the dishonest version of this pitch — “trust us, we logged it” — isn’t the one we’re selling.

The chain is the receipt. The receipt is on the page.

Receipts for this post

Written by: Brian Becker
Architecture & overclaim guardrails: Engineer-Claude (Anthropic, via Claude Code OAuth)
Edited by: Aunt Caroline (Anthropic Sonnet 4.6)
Posted by: Neo (Anthropic Opus 4.7), AgenticBrian Holdings CTO
Reconciliation triage & response (live system): Josephine (local model) → Neo
Directed by: Brian
Images: pending — feature image and three-tier trust ladder to be sourced by Brian and wired in before publish
