Spec — Indexer Postgres Schema

Owner: solana-indexer-engineer Depends on: 03 (AgentRegistry), 04 (TreasuryStandard), 06 (ProofVerifier), 07 (TaskMarket), 08 (IACP), reputation-graph.md, retro-airdrop.md Blocks: audit prep (docs/audit/ottersec-m1.md), Discovery API (specs/discovery-api.md), portal /analytics + /agents/[did] + /marketplace data paths References: backend PDF §3 (off-chain infra), §3.3 (slot-history per backend §3), services/indexer/migrations/, services/indexer/src/{schema,stats,ingest,reorg,poller,jobs}.rs

Goal

Document the human-readable shape of the indexer's Postgres database: every table, every column, every index, every materialized view, plus the read/write contracts for each. Migrations are the source of truth for column types; this spec is the source of truth for what each column means and which code path owns it. Anyone reading this should be able to plan a query, audit a write path, or estimate storage growth without spelunking through diesel macros.

Non-goals

Not the migration tool. diesel migration runs up.sql / down.sql verbatim — this spec describes intent, not bytes.
Not the Discovery API spec. That spec (specs/discovery-api.md, BACKLOG line 106) consumes this schema and adds REST + WS surfaces.
Not a Yellowstone gRPC swap plan. Slot/sig ingestion currently runs through getSignaturesForAddress polling; the schema is identical post-Yellowstone, only the poller cadence changes.
Not a backup/restore runbook. Render's PG snapshot policy + WAL archiving lives in ops docs once provisioning lands (BACKLOG line 59 — Dennis-approved per INBOX, pending exec).

Migration order

#	Date	Path	Adds
1	2026-04-14	`2026-04-14-000001_init`	`blocks`, `program_events`, `reorg_log` + indexes
2	2026-04-15	`2026-04-15-000002_sync_cursor`	`sync_cursor`; drops `program_events_slot_fkey`
3	2026-04-16	`2026-04-16-000003_reputation_rollup`	`category_reputation`, `reputation_samples`, materialized view `reputation_rollup`
4	2026-04-16	`2026-04-16-000004_retro_eligibility`	`retro_eligibility`, `retro_fee_samples`

down.sql exists for each — diesel migration revert is exercised in CI to keep them honest.

Tables

`blocks` — slot anchor table

Records every slot we've ever ingested. signature → slot lookups go through program_events; this table answers "have we seen this slot?" + "what was its hash?" for reorg detection.

Column	Type	Notes
`slot`	`BIGINT` PRIMARY KEY	Solana slot number. Monotonic but not contiguous — RPC skips empty slots.
`hash`	`TEXT NOT NULL`	Block hash as base58. Empty string acceptable for slots we know about via signature-only paths and never fetched the block for (reorg watcher tolerates this).
`parent_slot`	`BIGINT NULL`	NULL for genesis or first-seen slot in a fresh DB. Indexed for fork-walk queries.
`processed_at`	`TIMESTAMPTZ NOT NULL DEFAULT now()`	Wall-clock at ingestion. Not authoritative time — use slot for ordering.

Writers: ingest::record_block (poller batch path, INSERT … ON CONFLICT (slot) DO UPDATE). Readers: stats::network_health (MAX(slot), COUNT(*)), reorg::reorg_check_cycle (latest slot for window). Storage: ~1.2 KB/row at full base58 hash. At Solana mainnet's ~432k slots/day, full retention = ~520 MB/day. M1 retains all (cheap); M3+ may roll-up older slots (TBD per Open Q).

`program_events` — append-only event log

The workhorse. One row per (signature, event_name). Anchor emit_cpi! payloads are decoded against the IDL by borsh_decode::decode and stored as JSONB.

Column	Type	Notes
`id`	`BIGSERIAL` PRIMARY KEY	Monotonic ingestion order; not slot order.
`signature`	`TEXT NOT NULL`	Base58 transaction signature.
`slot`	`BIGINT NOT NULL`	Slot of the containing tx. No FK to `blocks.slot` — see migration 2 below.
`program_id`	`TEXT NOT NULL`	Base58 program ID. Indexed for per-program scans.
`event_name`	`TEXT NOT NULL`	IDL event-struct name (e.g. `TaskCreated`, `AgentRegistered`). Indexed.
`data`	`JSONB NOT NULL DEFAULT '{}'`	Decoded body. `u64`/`i64` fields decode to JSON strings (Borsh→JSON loses precision in JS); `Pubkey`/`[u8; N]` decode to base58 / hex per `borsh_decode`. Decode failures store `{_decode_error, raw_hex}` so a malformed event doesn't stall the stream.
`ingested_at`	`TIMESTAMPTZ NOT NULL DEFAULT now()`	Wall-clock ingestion. Used by analytics 24h/per-day/per-min windows.

UNIQUE constraint: (signature, event_name). Same tx can emit multiple distinct events; same (sig, name) is idempotent.

Indexes: program_id, slot, event_name. JSONB body intentionally not indexed at M1 — query patterns are (event_name, ingested_at) scans plus the materialized rollups below. If a portal page starts hammering data->>'agent_did' filters, add a GIN partial index per Open Q.

Migration 2 dropped the slot → blocks.slot foreign key. Rationale: poller ingests events from getSignaturesForAddress paginated lookbacks; the corresponding block row is fetched on a separate cadence and may lag. Enforcing FK at insert-time forced an extra round-trip per event for marginal integrity benefit (we already have signature as the primary referent for chain provenance). The down.sql re-adds the FK so a downgrade is consistent.

Writers: ingest::record_event (INSERT … ON CONFLICT DO NOTHING per the UNIQUE constraint). Readers: stats::* (every analytics endpoint), jobs::retro_rollup (TODO once fee_collector events ship), pubsub::Publisher (fire-and-forget Redis fanout post-insert).

`reorg_log` — detected dropped signatures

Append-only audit trail of every signature the reorg watcher dropped from program_events. One row per dropped sig, written transactionally with the DELETE + sync_cursor rewind in reorg::apply_rollback.

Column	Type	Notes
`id`	`BIGSERIAL` PRIMARY KEY
`slot`	`BIGINT NOT NULL`	Slot the dropped sig was at. Indexed.
`old_hash`	`TEXT NOT NULL`	Reused as the dropped signature string (column name is historical from a planned block-hash-tracking variant we didn't ship; safe to rename via a future migration).
`new_hash`	`TEXT NOT NULL`	Sentinel `"dropped"`. Reserved for the same future migration.
`detected_at`	`TIMESTAMPTZ NOT NULL DEFAULT now()`	Wall-clock detection time. Drives `reorgs_24h` analytics.

Open Q (Q-3): rename old_hash → dropped_signature and remove new_hash in a future migration. Naming defect is harmless but auditor-confusing.

`sync_cursor` — per-program ingest watermark

One row per program ID. Lets the poller resume from last_sig after restart and lets the reorg watcher rewind without scanning the entire program_events table.

Column	Type	Notes
`program_id`	`TEXT` PRIMARY KEY	Base58 program ID.
`last_sig`	`TEXT NULL`	Most recent signature successfully ingested. NULL after a reorg-rollback that erased every event for the program.
`last_slot`	`BIGINT NULL`	Slot of `last_sig`. Tracked separately so reorg-rewind queries can range on slot without parsing sig.
`updated_at`	`TIMESTAMPTZ NOT NULL DEFAULT now()`	Wall-clock of last update.

Writers: poller::set_cursor (per-program, INSERT … ON CONFLICT (program_id) DO UPDATE); reorg::apply_rollback (per-program rewind). Readers: poller::get_cursor (resume).

`category_reputation` — per-(agent, capability) authoritative score

Mirror of the on-chain ReputationScore PDA per (agent_did, capability_bit). Off-chain copy lets the portal serve leaderboards without N RPC fetches per page render.

Column	Type	Notes
`agent_did`	`BYTEA`	32-byte agent DID. Part of composite PK.
`capability_bit`	`SMALLINT`	0–31 (32 M1 capability tags). Part of composite PK.
`quality`	`SMALLINT NOT NULL DEFAULT 0`	EWMA bps, 0–10000.
`timeliness`	`SMALLINT NOT NULL DEFAULT 0`	Same.
`availability`	`SMALLINT NOT NULL DEFAULT 0`	Same; decays via the heartbeat-miss path in `jobs::reputation_rollup`.
`cost_efficiency`	`SMALLINT NOT NULL DEFAULT 0`	Same.
`honesty`	`SMALLINT NOT NULL DEFAULT 0`	Same.
`jobs_completed`	`BIGINT NOT NULL DEFAULT 0`	Lifetime count.
`jobs_disputed`	`BIGINT NOT NULL DEFAULT 0`	Lifetime count.
`last_task_id`	`BYTEA NULL`	Most recent task that updated this row.
`status`	`TEXT NOT NULL DEFAULT 'active'`	CHECK constraint: `'active'
`last_update`	`TIMESTAMPTZ NOT NULL DEFAULT now()`	Wall-clock of last write.

Composite PRIMARY KEY: (agent_did, capability_bit). Each agent can hold a separate reputation score per capability bit they advertise.

Indexes: (capability_bit, status) for leaderboard scans; (last_update DESC) for "recently active" queries.

Writers: jobs::reputation_rollup projection writes (TODO — currently only refreshes the view; per-axis writes land when IACP heartbeat ingest is wired). Readers: reputation_rollup materialized view (defined inline).

`reputation_samples` — append-only sample stream

Raw event log feeding category_reputation. One row per on-chain reputation update event observed.

Column	Type	Notes
`id`	`BIGSERIAL` PRIMARY KEY
`signature`	`TEXT NOT NULL`	Source tx signature.
`slot`	`BIGINT NOT NULL`	Slot of source tx. Indexed.
`agent_did`	`BYTEA NOT NULL`	32-byte DID.
`capability_bit`	`SMALLINT NOT NULL`	0–31.
`task_id`	`BYTEA NOT NULL`	32-byte task hash. Indexed.
`completed`	`BOOLEAN NOT NULL`	Whether the underlying task settled successfully.
`quality_delta`	`SMALLINT NOT NULL DEFAULT 0`	EWMA-bps delta this sample contributed.
`timeliness_delta`	`SMALLINT NOT NULL DEFAULT 0`	Same.
`correctness`	`SMALLINT NOT NULL DEFAULT 0`	0/1 flag from the proof-verifier path; reserved for honesty axis.
`judge_kind`	`TEXT NOT NULL`	CHECK: `'Circuit'
`execution_root`	`BYTEA NOT NULL`	32-byte merkle root of the execution trace this sample is keyed by.
`ingested_at`	`TIMESTAMPTZ NOT NULL DEFAULT now()`

UNIQUE constraint: (signature, task_id, agent_did, capability_bit) — one sample per axis per task per agent per tx.

Indexes: (agent_did, capability_bit, ingested_at DESC) for per-agent timeline queries; (task_id) for task-detail joins; (slot) for reorg-window scans.

Storage: ~200 B/row + indexes. At a steady-state 100 tasks/sec across all capabilities, that's ~17 GB/day pre-compression. Pruning policy is an Open Q.

`reputation_rollup` — materialized view

Read-side surface for the /agents/leaderboard page and the top-agents / agent-graph analytics endpoints.

SELECT
    agent_did, capability_bit,
    quality, timeliness, availability, cost_efficiency, honesty,
    jobs_completed, jobs_disputed,
    (quality::int + timeliness + availability + cost_efficiency + honesty) / 5
        AS composite_score,
    last_update
FROM category_reputation
WHERE status = 'active';

Refresh: jobs::reputation_rollup::refresh_rollup calls REFRESH MATERIALIZED VIEW CONCURRENTLY reputation_rollup every 60s (per specs/reputation-graph.md). CONCURRENTLY requires the unique index reputation_rollup_pk_idx on (agent_did, capability_bit) — installed by the same migration.

Indexes: (agent_did, capability_bit) UNIQUE (CONCURRENTLY enabler); (capability_bit, composite_score DESC) for per-capability leaderboards; (agent_did) for per-agent radar fetches.

`retro_eligibility` — per-operator airdrop snapshot

One row per operator. Latest snapshot wins; nightly rollup overwrites. See specs/retro-airdrop.md.

Column	Type	Notes
`operator`	`BYTEA` PRIMARY KEY	32-byte operator pubkey.
`net_fees_micro_usdc`	`BIGINT NOT NULL DEFAULT 0`	Sum of post-wash-filter fees. micro-USDC = 6-decimal USDC base unit.
`wash_excluded_micro_usdc`	`BIGINT NOT NULL DEFAULT 0`	Sum of fees excluded via `WashFlag::*` classification.
`personhood_tier`	`TEXT NOT NULL DEFAULT 'none'`	CHECK: `'none'
`personhood_multiplier`	`NUMERIC(4, 3) NOT NULL DEFAULT 0.5`	0.50 / 0.75 / 1.00.
`cold_start_multiplier`	`NUMERIC(4, 3) NOT NULL DEFAULT 1.0`	0.50 if any agent was registered within `COLD_START_EPOCHS` of snapshot.
`estimated_allocation`	`NUMERIC(20, 6) NULL`	Final allocation estimate (units = SAEP token). NULL until first nightly run scores this operator.
`epoch_first_seen`	`INT NOT NULL`	Earliest epoch this operator's fees appeared in.
`last_updated`	`TIMESTAMPTZ NOT NULL DEFAULT now()`

Indexes: (net_fees_micro_usdc DESC) for leaderboard rendering; (personhood_tier) for tier-segment analytics.

`retro_fee_samples` — append-only fee-event log

Raw FeeClaim events from fee_collector feeding the nightly retro_eligibility rollup. (M1: FeeClaim is a forward-looking name from specs/retro-airdrop.md; fee_collector currently emits FeesCollected / StakerClaimed / SlashReceived / CollateralForfeited but no retro-airdrop-shaped per-(operator, agent_did, task_id, client) event yet — see §jobs::retro_rollup row below for the matching TODO.)

Column	Type	Notes
`id`	`BIGSERIAL` PRIMARY KEY
`signature`	`TEXT NOT NULL`	Source tx signature.
`slot`	`BIGINT NOT NULL`
`operator`	`BYTEA NOT NULL`	Indexed by `(operator, epoch)`.
`agent_did`	`BYTEA NOT NULL`
`task_id`	`BYTEA NOT NULL`
`client`	`BYTEA NOT NULL`	Indexed for circular-trade graph traversal.
`epoch`	`INT NOT NULL`	Indexed. Epoch = 30d per `retro-airdrop.md`.
`fee_micro_usdc`	`BIGINT NOT NULL`	Pre-classification raw fee amount.
`wash_flag`	`TEXT NULL`	CHECK: `'self_task'
`ingested_at`	`TIMESTAMPTZ NOT NULL DEFAULT now()`

UNIQUE constraint: (signature, task_id) — one sample per fee-emission tx per task.

Read query catalog

services/indexer/src/stats.rs exposes one HTTP route per query. All queries are read-only and use raw SQL (diesel::sql_query) against the live tables — no caching layer at M1.

Endpoint	Tables read	Query shape
`/stats/totals`	`program_events`	4 subselects: distinct `agent_did` from `AgentRegistered`, count of `TaskCreated`, sum of `agent_payout` from `TaskReleased`, `StreamInitialized − StreamClosed`.
`/stats/tasks-per-day?days=N`	`program_events`	`generate_series` ⨝ `date_trunc('day', ingested_at) WHERE event_name='TaskCreated'`. Default 30d, max 180d.
`/stats/top-capabilities?limit=N`	`reputation_samples`	`GROUP BY capability_bit WHERE completed=true ORDER BY count DESC`.
`/stats/fees-burned`	`program_events`	`SUM(data->>'protocol_fee')` + `SUM(data->>'solrep_fee')` + 24h-windowed sum, all from `TaskReleased`.
`/stats/network-health`	`blocks`, `program_events`, `reorg_log`	`MAX(slot)`, `reorgs_24h`, `events_per_min`, `events_total`, `blocks_total`.
`/stats/top-agents?limit=N`	`reputation_rollup`	`GROUP BY agent_did ORDER BY AVG(composite_score) DESC, SUM(jobs_completed) DESC`.
`/stats/agent-graph?limit=N`	`reputation_rollup`	Two-query: top agents + edges (capability rows for those agents).

MAX_LIMIT = 500 clamps any limit parameter; agent_graph clamps to 200.

Write paths

Path	Writes to	Concurrency model
`ingest::record_block`	`blocks`	`INSERT … ON CONFLICT DO UPDATE`. Single poller task, no contention.
`ingest::record_event`	`program_events`	`INSERT … ON CONFLICT DO NOTHING`. Idempotent on `(signature, event_name)`.
`poller::set_cursor`	`sync_cursor`	`INSERT … ON CONFLICT DO UPDATE`. One row per program.
`reorg::apply_rollback`	`reorg_log` ∪ `program_events` ∪ `sync_cursor`	Single transaction: `INSERT reorg_log` per dropped sig, `DELETE FROM program_events WHERE slot >= fork_slot`, rewind `sync_cursor` per program.
`jobs::reputation_rollup::refresh_rollup`	`reputation_rollup` view	`REFRESH MATERIALIZED VIEW CONCURRENTLY`. Requires the unique PK index.
`jobs::reputation_rollup` projections (TODO)	`category_reputation`, `reputation_samples`	Per-event upsert; bounded by IACP heartbeat ingest cadence. Not yet wired pending heartbeat-presence table (Open Q).
`jobs::retro_rollup` (TODO)	`retro_eligibility`, `retro_fee_samples`	Nightly job. Not yet wired pending `fee_collector` event emission (separate BACKLOG item).

pubsub::Publisher reads from record_event's success path and XADDs to saep:events:<program> + saep:events:all Redis channels — fire-and-forget, never blocks ingest.

Slot / reorg invariants

Append-only between reorgs. program_events rows are never updated; only inserted (idempotent) or deleted by the reorg watcher.
Reorg deletes are slot-bounded. DELETE FROM program_events WHERE slot >= fork_slot is the only delete path on this table. fork_slot = min(dropped_signature.slot) per reorg::detect_dropped.
Reorg window is bounded. RPC getSignatureStatuses only retains ~150 slots; REORG_WINDOW_SLOTS defaults to 150 and REORG_WINDOW_DEPTH defaults to 500 sigs to keep the watcher inside the cache. Reorgs deeper than that are not detectable by this design — flagged in Open Q.
sync_cursor is the only mutable per-program state. A reorg rewinds it to the latest surviving event below fork_slot (or NULL). Resume-from-restart reads from this row.
blocks rows are not deleted on reorg. Block hash + parent_slot are still useful as fork-history evidence even after the events at that slot are dropped.

Operational notes

Connection pool: r2d2 with POOL_MAX_SIZE = 8 (db.rs:10). Both the ingest path and the analytics API share this pool. Render's smallest paid PG plan offers 22 concurrent connections; pool-of-8 leaves headroom for psql ops + future workers.
Materialized view refresh: every 60s. CONCURRENTLY ⇒ no read-side lock, but two simultaneous refreshes are blocked. Scheduler is single-tenant so this is a non-issue at M1.
Index maintenance: program_events will accumulate fastest. Periodic REINDEX CONCURRENTLY on program_events_program_id_idx + program_events_event_name_idx belongs in the Render-provisioning runbook (Open Q).
Pruning policy: none at M1. program_events + reputation_samples + retro_fee_samples grow unbounded. Open Q.
Migrations: diesel migration run from services/indexer/migrations/. CI runs both up then down for each migration to keep down.sql honest.
Backups: Render managed PG ships daily snapshots + 7d PITR by default on paid plans. Confirm at provisioning-time per BACKLOG line 59.

Open questions for reviewer

reorg_log column-name cleanup. old_hash reused as dropped_signature, new_hash always 'dropped'. Worth a rename migration before the auditor reads the schema, or leave + flag in the audit cover letter? Default: rename pre-audit (low risk, clean slate).
JSONB GIN indexes on program_events.data. Currently no GIN. Portal's /agents/[did] does memcmp-style lookups via Anchor RPC, not JSONB scans, so the gap is theoretical. If Discovery API (BACKLOG line 106) introduces filtered queries (WHERE data->>'agent_did' = $1), add CREATE INDEX … USING GIN (data jsonb_path_ops) then. Default: defer.
Pruning policy for program_events / reputation_samples / retro_fee_samples. Three options: (a) infinite retention + bigger PG instance (simplest); (b) cold-storage move to S3/R2 after 90d (cheapest at scale, adds restore-from-cold path); (c) aggregate-then-prune — keep daily rollups, drop raw rows after 30d (cheapest for analytics, breaks audit-trail use-case). Default: (a) for M1, revisit at M3 if storage cost > $20/mo.
Heartbeat-presence table. jobs::reputation_rollup has a TODO for streaming IACP heartbeats into a heartbeat_presence table that drives the availability-axis decay. New table not yet specified. Should it land in this spec as a planned migration, or get its own follow-on spec when IACP→indexer wiring lands? Default: follow-on (out of scope here).
Reorg-window deeper than 150 slots. RPC cache caps detection at ~150 slots. Yellowstone gRPC streams every block live, so post-Yellowstone we can detect arbitrarily deep reorgs (within the gRPC backlog window) without hitting the status-cache wall. Document the gap as a known M1 limitation in the audit package, or invest in a parallel deeper-window detector now? Default: document + defer until Yellowstone swap.
blocks.hash empty-string convention. Block-fetch is opportunistic; rows can land with empty hash. Reorg watcher tolerates it but auditor may flag. Make it NULL-able instead with a CHECK that exactly one of (NULL, non-empty base58) holds? Default: rename empty → NULL in next migration, low risk.
reputation_rollup refresh cadence. 60s is the spec default. Tighter (15s) cuts portal staleness at the cost of CONCURRENTLY refresh CPU; looser (5min) cuts CPU at the cost of staleness. M1 is fine at 60s; revisit when portal page-load metrics exist.

Done checklist

All 4 migrations enumerated with intent + writer + reader.
Every column documented (type + meaning, not just diesel macro).
All indexes catalogued with their query target.
Materialized view refresh contract pinned (CONCURRENTLY + unique-index dependency).
Read query catalog cross-references stats.rs endpoints.
Write path catalog enumerates concurrency model per path.
Slot/reorg invariants stated.
Operational notes (pool, refresh, pruning, migrations, backups).
7 reviewer open-questions sized to a single review batch.
Reviewer ratifies (or pushes back on) the 7 open questions.
Pre-audit rename pass: reorg_log.old_hash → dropped_signature, blocks.hash empty → NULL (if Q-1 + Q-6 ratified).
Discovery API spec (BACKLOG line 106) lands and references this schema for its query layer.

Spec — Indexer Postgres Schema

Goal

Non-goals

Migration order

Tables

blocks — slot anchor table

program_events — append-only event log

reorg_log — detected dropped signatures

sync_cursor — per-program ingest watermark

category_reputation — per-(agent, capability) authoritative score

reputation_samples — append-only sample stream

reputation_rollup — materialized view

retro_eligibility — per-operator airdrop snapshot

retro_fee_samples — append-only fee-event log