Live evidence
Every AgentTrust claim with an independently checkable artifact — Kani proofs, devnet tx signatures, localnet tests, cron jobs, hosted endpoints. One page.
Every claim on this page is checkable from a terminal in under five minutes. No login. No clone (unless you want to re-run the proofs locally). Each row links to the artifact directly.
Captured 2026-05-08 against main post-Phase Q1.
Three programs deployed on devnet
| Program | Address | Explorer | IDL |
|---|---|---|---|
policy_vault | 8Y6f…QTR | Open | published |
trustgate | HF8z…ih2N | Open | published |
validation_registry | Cx4R…Khtv | Open | published |
Verify executable status:
for p in 8Y6fGeNEHgmWmbt8JsRcF72jxbeBfJhomMjG6SuoJQTR \
HF8zHfoyA7b5mhLViopTnRMprc6ZT5KActHTdkFrih2N \
Cx4RFa6ysw3qXYhugPkF8pFSWBkmKq59h2dWgF2tKhtv; do
solana program show "$p" --url devnet | grep Executable
doneSix Kani proofs
PolicyVault safety properties are machine-checked by Kani in CI — 635 sub-checks, zero failures.
paused_implies_no_allowvelocity_counter_le_limitcounterparty_tier_monotonevalidation_expiry_correctmultisig_threshold_enforcedgate_payment_strict_correctness
| # | Invariant | Sub-checks | Time |
|---|---|---|---|
| 1 | paused_implies_no_allow | 126 | 0.25 s |
| 2 | velocity_counter_le_limit | 9 | 0.03 s |
| 3 | counterparty_tier_monotone | 8 | 0.02 s |
| 4 | validation_expiry_correct | 85 | 0.23 s |
| 5 | multisig_threshold_enforced | 149 | 77.17 s |
| 6 | gate_payment_strict_correctness (J5) | 258 (131 + 127) | ~0.9 s |
Total: 635 sub-checks, 6/6 proven, ~80 s on a single CI runner. Workflow: .github/workflows/kani-prove.yml. Per-harness deep-dive: Kani proofs.
Reproduce:
git clone https://github.com/agenttrust-labs/agenttrust && cd agenttrust
cargo install --locked kani-verifier
cargo kani --manifest-path programs/policy-vault/Cargo.toml \
--harness paused_killswitch_implies_no_allowPay.sh + AgentTrust atomic settlement (live devnet)
A real signed SPL transfer flowed through demo.agenttrust.tech/protected, completed gate_payment + transfer + emit_feedback atomically, and wrote a FeedbackEmissionLog PDA on chain (2026-05-06).
| Step | Tx |
|---|---|
emit_feedback PDA-signed CPI → Quantu give_feedback → update_stats | jMobmWJUAXuL8… |
signed SPL transferChecked (the proof Pay.sh's CLI would produce) | 5iV8EYmJh9XS… |
Resulting on-chain artefacts:
| Artefact | Address |
|---|---|
FeedbackEmissionLog PDA (owned by trustgate, score=100, slot 460466788) | HB4BBi9j… |
Tier-3 agent_account | 5Pfa…K8y |
Tier-3 atom_stats | 4z9R…f5mv |
| Asset (Metaplex Core) | C6cu…Vi8B |
| Facilitator authority PDA | 4TWq…Z5Bg |
Reproduce: see Devnet smoke.
Chained validation — third ERC-8004 leg, full lifecycle (live devnet)
Four signatures prove the third leg actually closes the loop. Subject = the same Quantu tier-3 agent the Pay.sh trace used; capability = usdc-payment-policy.v1 (2026-05-06).
| Step | Tx |
|---|---|
gate_payment (no attestation) → RequireValidation | 3oKW7QugBLJ7… |
request_validation | 2KbXYCF67D2f… |
respond_to_validation (creates attestation PDA 8YKq…xt2q) | 67CzMS9GEt… |
gate_payment (with attestation) → Allow | dEXkCEeSn8… |
ValidationAttestation PDA: 8YKq…xt2q. Reproduce + full trace: Chained validation.
Ten capability namespaces seeded (live devnet)
Every PDA exists on chain. Auto-generated lookup with per-namespace Explorer URLs: Reference → Capability namespaces. Every name decomposes to MAX_NAME_LEN = 32 per the on-chain bound; the playbook-level descriptive labels live in the schema-URI document.
Reproduce: pnpm --filter ./examples/attestor-demo run seed:namespaces is idempotent.
Hosted services (Fly.io)
| Service | URL | Healthz |
|---|---|---|
| Demo | demo.agenttrust.tech | GET /health |
Facilitator (/verify + /settle) | api.agenttrust.tech | GET /healthz |
| MCP HTTP endpoint | mcp.agenttrust.tech | GET /healthz (returns version: "0.4.5", toolCount: 21) |
Singapore region, shared-cpu-1x@256MB, two machines for HA, min_machines_running = 1. Lets-Encrypt certs on every subdomain.
Atomic-tx invariant — three layers + on-chain anchor
| Layer | What it proves | Evidence |
|---|---|---|
| Compile-time literal-type guard | AtomicityEnforced is { atomicityEnforced: true } (literal true). TS rejects false, missing, or boolean widening. | trustgate/sdk/src/atomicity.ts lines 31-33 |
| Runtime guard | assertAtomicityEnforced throws AtomicityNotEnforcedError for any value that isn't strictly === true. Catches as any casts. | trustgate/sdk/src/atomicity.ts lines 51-58 |
| Composer structure | composeAtomicSettleTx returns one Transaction with exactly three instructions in canonical order. | trustgate/sdk/test/atomicity.test.ts describe("composeAtomicSettleTx") (5 cases) |
| Single-tx atomic revert (localnet) | Bundled tx with failing inner ix reverts the whole bundle; gate_payment_strict rolls back. | tests/atomic-tx-invariant.spec.ts describe("B. single-tx atomic revert") |
| Split-tx corruption (localnet) | Splitting the bundle leaves VelocityLedger dirty after the transfer reverts — the fault model A+B prevents. | tests/atomic-tx-invariant.spec.ts describe("C. split-tx anti-pattern corrupts state") |
| On-chain anchor | gate_payment_strict_correctness Kani proof binds strict Ok(()) to composer Allow. | programs/policy-vault/src/proofs/inv_gate_payment_strict_correctness.rs |
Full deep-dive: Atomic-tx invariant. Background writeup: docs/proofs/transfer-hook-atomicity.md.
Adversarial harness — fourteen hostile-scenario assertions
tests/adversarial.spec.ts exercises the gate against malformed PDAs, replayed proofs, corrupted state, schema-version drift, expired attestations, off-curve attestor pubkeys, and more. Per-scenario detail: Adversarial harness.
Test coverage
| Layer | Count | Where |
|---|---|---|
| Rust unit tests | 113 | cargo test --workspace --lib |
| Kani formal proofs | 6 / 635 sub-checks | cargo kani per proof harness |
| Anchor TS end-to-end | 50 | anchor test --provider.cluster devnet |
| Adversarial harness | 14 | tests/adversarial.spec.ts |
| SDK unit tests | 56 (+ 16 INTEGRATION-gated) | cd trustgate/sdk && pnpm test |
| Server adapter tests | 146 | cd trustgate/server && pnpm test |
| MCP unit tests | 76 (+ 3 INTEGRATION + 21 protocol-conformance) | cd mcp && pnpm test |
| pay-sh-demo flow | 7 (+ 1 INTEGRATION) | cd examples/pay-sh-demo && pnpm test |
| attestor-demo lifecycle | 6 (INTEGRATION-gated) | cd examples/attestor-demo && pnpm test |
| Total | ~360 tests + 6 formal proofs + 14 adversarial scenarios | All green on main |
CI workflows (15 total)
| Workflow | Purpose |
|---|---|
kani-prove.yml | All 6 Kani proofs on every PR |
kani-budget.yml | Total runtime ceiling for Kani |
anchor-test.yml | Full Anchor end-to-end with Quantu mainnet clones |
ts-test.yml | SDK + server + web + demo + MCP build/test |
adapter-contract-conformance.yml | Adapter-contract walk |
mcp-protocol-conformance.yml | 21 MCP protocol assertions |
bundle-size.yml | SDK npm pack-size budget |
daily-devnet-smoke.yml | Cron — full devnet round-trip every 24h |
devnet-integration.yml | Gated on DEVNET_FACILITATOR_KEYPAIR secret |
idl-diff.yml | ABI-stability gate vs deployed IDL |
link-check.yml | README + MDX dead-link sweep |
lint-and-format.yml | tsc + cargo fmt + clippy |
lockfile-freshness.yml | pnpm + Cargo.lock drift |
secret-scan.yml | gitleaks against the diff |
build.yml | cargo + anchor build |
Latest run summary: docs/proofs/ci-runs.md.
Phase reports
| Phase | Date | Report |
|---|---|---|
| F (verification) | 2026-05-06 | phase-f-verification-report.md |
| G | 2026-05-07 | phase-g-report.md |
| H (hosting + npm) | 2026-05-07 | phase-h-report.md |
| J (Kani strict-correctness) | 2026-05-07 | phase-j-report.md |
| M (MCP comprehensive E2E) | 2026-05-07 | phase-m-mcp-e2e.md |
| P (real-LLM tool routing) | 2026-05-08 | phase-p-llm-routing.md |
| Atomic-tx invariant proof | 2026-05-07 | transfer-hook-atomicity.md |
Read next
Verification
Every load-bearing claim in AgentTrust has an independently checkable artifact — a Kani proof, a devnet tx signature, a localnet test, a cron job. This section indexes them.
Kani proofs
Six machine-checked invariants over the pure-Rust composer — 635 sub-checks, zero failures, ~80 s on a single CI runner. Per-harness deep-dive.