Spec — Discovery API

Owner: solana-indexer-engineer + frontend-engineer (consumer)
Depends on: specs/indexer-schema.md, specs/08-iacp-bus.md, specs/03-program-agent-registry.md, specs/07-program-task-market.md, specs/04-program-treasury-standard.md
Blocks: portal /marketplace (server-side search), portal /tasks list view, portal /agents/[did] history + radar, analytics live wiring beyond the 7 stats::* roll-ups, A2A marketplace panel (M2), SDK @saep/sdk/discovery client
References: backend PDF §3 (off-chain infra), frontend PDF §2.3 (Marketplace), §2.5 (Tasks), §3.1 (SIWS), §3.4 (optimistic UI), services/indexer/src/{api,stats,pubsub}.rs

Goal

Replace browser-originated getProgramAccounts memcmp scans (current pattern across useAllAgents, useAgentTasks, useTasksByCreator) with a server-side REST + WS surface hitting the indexer's Postgres schema. Two effects: (1) shifts the read-heavy filter+sort path off mainnet RPC, where 5 memcmp scans per marketplace load is a cost + latency tax that does not scale past a few hundred agents, and (2) gives realtime consumers a single bearer-auth'd WS channel for agent / task / treasury state transitions without standing up a per-page Yellowstone subscription.

Non-goals

Not a write surface. All mutations stay SDK → RPC → chain. Discovery is read-only.
Not the analytics API. services/indexer/src/stats.rs already owns /stats/* — 7 fixed-shape aggregate endpoints for the public analytics page. Discovery adds row-level search + subscription; aggregate roll-ups stay there.
Not the IACP bus. IACP (specs/08-iacp-bus.md) moves signed agent-to-agent envelopes; Discovery moves indexed chain state. Overlap is zero — different auth model, different payload shape, different delivery guarantees.
Not a Yellowstone gRPC passthrough. Consumers that need raw chain events (validators, audit tooling) should subscribe to Yellowstone directly via Helius dedicated. Discovery WS delivers decoded, filtered, schema-stable events.
Not a historical backfill tool. Initial implementation serves live + recent (≤30d default) state; deeper history is Open-Q #8.

Service placement

Discovery lives as a standalone TS service at services/discovery/ (fastify 5 + @fastify/websocket + pg + zod) serving /v1/discovery/*. The indexer binary retains a legacy /api/* alias in services/indexer/src/discovery.rs (Rust shim over the same matviews) until portal callers migrate off useAgentReputation et al., at which point the shim retires. Both paths read the same Postgres matviews (agent_directory / task_directory / reputation_rollup); no schema divergence.

The original M1 plan (cycle 106) was to land discovery inside the indexer binary as services/indexer/src/api/discovery.rs on the stats::router() pattern — rationale was shared PG + Redis pools, one deploy target, no connection-pool doubling. The standalone-TS path landed instead (public 47e29c9) to let the read path evolve on the TS/zod/fastify stack the rest of the web tier already uses; the editorial drift is tracked in INBOX.md. Matview access works identically over a separate PG connection pool at M1 scale; re-fold into the indexer binary is a file-move + main.rs edit, not a schema migration, if the connection-pool footprint becomes load-bearing at M3.

Surface — REST

Base path: /v1/discovery. All responses JSON. Errors follow {error, detail?, request_id} with request_id correlating to a saep_indexer_request_id log field.

Agents

Method + path	Purpose	Query params	Response
`GET /agents`	Filtered + ranked agent list	`capability_mask` (hex u64, required unless `operator` supplied), `min_reputation` (u16 composite 0-10000), `max_base_price` (u64 lamports), `status` (`active`/`slashed`/`paused`, default `active`), `operator` (base58, mutually exclusive w/ `capability_mask`), `cursor` (opaque), `limit` (default 50, max 200), `sort` (`reputation_desc`/`price_asc`/`recent_desc`, default `reputation_desc`)	`{items: AgentSummary[], cursor: string?, total: number?}`
`GET /agents/:did`	Single agent detail	—	`AgentDetail` (full `ReputationScore` + `TreasurySummary` + last 10 task ids)
`GET /agents/:did/tasks`	Tasks for agent, paginated	`status` (filter by TaskStatus enum), `cursor`, `limit`	`{items: TaskSummary[], cursor?}`
`GET /agents/:did/streams`	Active payment streams	`status` (`open`/`closed`, default `open`)	`{items: StreamSummary[]}`
`GET /agents/:did/reputation`	Full reputation time series	`bucket` (`day`/`week`, default `day`), `days` (max 90)	`{series: ReputationPoint[]}`

Shape types below. total is returned only when a WHERE clause narrows the result set below a cheap-count threshold (~10k rows per Postgres EXPLAIN budget); for wider queries it is null + the UI shows "50+".
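
The `GET /agents` parameter rules in the table above can be checked before any query runs. The following is a minimal sketch, not the service's validator: the spec names zod for the real service, while this hand-rolled `parseAgentsQuery` (a hypothetical name) only illustrates the "exactly one of `capability_mask` / `operator`" rule and the `limit` and `sort` defaults. Lowercase, unprefixed hex on input is an assumption carried over from the response convention.

```typescript
// Hypothetical validator for GET /agents query params (names are illustrative).
type AgentSort = "reputation_desc" | "price_asc" | "recent_desc";

interface AgentsQuery {
  capability_mask?: string; // validated lowercase hex u64, unprefixed
  operator?: string;        // base58, not decoded in this sketch
  limit: number;
  sort: AgentSort;
}

type AgentsQueryResult =
  | { ok: true; value: AgentsQuery }
  | { ok: false; error: "invalid_param"; detail: string };

const AGENT_SORTS: readonly string[] = ["reputation_desc", "price_asc", "recent_desc"];

function parseAgentsQuery(q: Record<string, string | undefined>): AgentsQueryResult {
  const fail = (detail: string): AgentsQueryResult => ({ ok: false, error: "invalid_param", detail });

  const mask = q["capability_mask"];
  const operator = q["operator"];
  // capability_mask is required unless operator is supplied, and the two are
  // mutually exclusive, so exactly one of them must be present.
  if ((mask === undefined) === (operator === undefined)) {
    return fail("supply exactly one of capability_mask or operator");
  }
  if (mask !== undefined && !/^[0-9a-f]{1,16}$/.test(mask)) {
    return fail("capability_mask must be 1-16 lowercase hex digits");
  }

  const limitRaw = q["limit"];
  const limit = limitRaw === undefined ? 50 : Number(limitRaw);
  if (!Number.isInteger(limit) || limit < 1 || limit > 200) return fail("limit must be 1..200");

  const sortRaw = q["sort"];
  const sort = sortRaw === undefined ? "reputation_desc" : sortRaw;
  if (AGENT_SORTS.indexOf(sort) === -1) return fail("unknown sort");

  return { ok: true, value: { capability_mask: mask, operator, limit, sort: sort as AgentSort } };
}
```

A failed parse maps onto the `invalid_param` (400) row of the error taxonomy.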

Tasks

Method + path	Purpose	Query params	Response
`GET /tasks`	Filtered task list	`status` (comma-sep), `creator` (base58), `agent_did` (hex-32), `capability_mask` (hex), `created_after` (RFC3339), `created_before` (RFC3339), `cursor`, `limit`, `sort` (`created_desc`/`deadline_asc`/`reward_desc`, default `created_desc`)	`{items: TaskSummary[], cursor?}`
`GET /tasks/:task_id_hex`	Single task detail	—	`TaskDetail` (full `TaskContract` + current escrow balance + proof verify status)
`GET /tasks/:task_id_hex/timeline`	State transitions	—	`{events: TaskEvent[]}` (parsed from `program_events` rows with `event_name` ∈ `{TaskCreated, BidBookOpened, BidCommitted, BidRevealed, BidBookClosed, ResultSubmitted, TaskVerified, VerificationFailed, TaskReleased, DisputeRaised, TaskExpired, TaskCancelled}`)

Capabilities

Method + path	Purpose	Response
`GET /capabilities`	All approved tags from `CapabilityRegistry`	`{items: CapabilityTag[]}` (cached 300s)
`GET /capabilities/:bit`	One tag + manifest URI	`CapabilityTag`

Treasury

Method + path	Purpose	Query params	Response
`GET /treasury/:did`	Treasury overview	—	`{vaults: VaultBalance[], allowed_mints: AllowedMint[], spend_window: SpendWindow}`
`GET /treasury/:did/vaults`	Per-mint vault balances	`mint` (optional filter)	`{items: VaultBalance[]}`

Shape types

interface AgentSummary {
  did_hex: string;            // 32-byte hex
  operator: string;            // base58 pubkey
  capability_mask: string;     // hex u64
  reputation_composite: number; // 0-10000 bps
  base_price_lamports: string; // u64 as string (JSON-safe)
  status: "active" | "slashed" | "paused";
  manifest_uri: string | null;
  last_active_unix: number;
}

interface AgentDetail extends AgentSummary {
  reputation: {
    quality: number; timeliness: number; availability: number;
    cost_efficiency: number; honesty: number;
  };
  jobs_completed: number;
  jobs_disputed: number;
  stake_lamports: string;
  treasury_summary: { allowed_mints: number; active_streams: number };
  recent_task_ids_hex: string[];  // up to 10
}

interface TaskSummary {
  task_id_hex: string;
  creator: string;
  agent_did_hex: string | null;  // null until accepted
  status: TaskStatus;
  reward_lamports: string;
  capability_mask: string;
  created_at_unix: number;
  deadline_unix: number;
}

interface TaskDetail extends TaskSummary {
  payload_hash_hex: string;
  result_hash_hex: string | null;
  proof_verified: boolean | null;   // null until proof_verifier sees it
  escrow_balance_lamports: string;
  dispute_opened: boolean;
}

interface StreamSummary {
  stream_id_hex: string; from: string; to: string; mint: string;
  rate_per_sec: string; start_unix: number; end_unix: number | null;
  total_paid: string; status: "open" | "closed";
}

interface CapabilityTag {
  bit: number; slug: string; manifest_uri: string; approved: boolean;
}

interface VaultBalance { mint: string; balance: string; decimals: number; }
interface AllowedMint { mint: string; per_tx_cap_base_units: string; }
interface SpendWindow { spent_24h_base_units: string; limit_24h_base_units: string; }

All u64/i64 values are returned as JSON strings (per the borsh_decode contract; JS numbers cannot represent every integer above 2^53). Timestamps are unix seconds (int), not RFC3339, to match on-chain i64 timestamps in program_events.data. Hex fields are lowercase, unprefixed.
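
To illustrate why amounts travel as strings, here is a small sketch of a consumer-side parser. It assumes amount fields such as `reward_lamports` are decimal strings (the spec does not state the radix for amounts); `parseU64` is a hypothetical helper, not part of the SDK.

```typescript
// Parse a u64 carried as a decimal JSON string without losing precision.
const U64_MAX = BigInt("18446744073709551615"); // 2^64 - 1

function parseU64(s: string): bigint {
  if (!/^(0|[1-9][0-9]*)$/.test(s)) throw new Error(`not a decimal u64: ${s}`);
  const v = BigInt(s);
  if (v > U64_MAX) throw new Error(`out of u64 range: ${s}`);
  return v;
}

// A JS number cannot hold 2^53 + 1; the nearest representable value is 2^53.
const lossy = Number("9007199254740993");
const exact = parseU64("9007199254740993");

console.log(lossy === Math.pow(2, 53)); // true: the trailing 3 was rounded away
console.log(exact.toString());          // "9007199254740993"
```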

Surface — WebSocket

Base path: /v1/discovery/ws. One socket per client. The protocol is subscription-scoped: the client sends {op: "subscribe", channel, filter?}, the server confirms with {op: "subscribed", id}, then streams {op: "event", id, data} frames. The server also sends {op: "keepalive", unix} every 20s; the client drops the connection if two consecutive keepalives are missed.
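
The frame shapes and the keepalive rule can be sketched on the client side as follows. The frame fields come from the paragraph above; the numeric `id` type and the `KeepaliveWatchdog` class are assumptions for illustration. The watchdog uses the local receipt time rather than the server's `unix` field, so clock skew does not affect the decision.

```typescript
// Server-to-client frames as described in the protocol paragraph.
type ServerFrame =
  | { op: "subscribed"; id: number }
  | { op: "event"; id: number; data: unknown }
  | { op: "keepalive"; unix: number };

const KEEPALIVE_INTERVAL_S = 20;
const MAX_MISSED_KEEPALIVES = 2;

// Hypothetical client-side helper: tracks when the last keepalive arrived.
class KeepaliveWatchdog {
  private lastSeenUnix: number;

  constructor(connectedAtUnix: number) {
    this.lastSeenUnix = connectedAtUnix;
  }

  // Call for every frame received, passing the local clock in unix seconds.
  onFrame(frame: ServerFrame, receivedAtUnix: number): void {
    if (frame.op === "keepalive") this.lastSeenUnix = receivedAtUnix;
  }

  // True once more than two keepalive intervals have passed in silence.
  shouldDrop(nowUnix: number): boolean {
    return nowUnix - this.lastSeenUnix > KEEPALIVE_INTERVAL_S * MAX_MISSED_KEEPALIVES;
  }
}
```

A client would poll `shouldDrop` on a timer and close the socket (then re-seed over REST) when it returns true.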

Channels

Channel	Events	Filter schema
`agents`	`AgentRegistered`, `ManifestUpdated`, `StakeIncreased`, `SlashExecuted`, `SlashCancelled`, `StatusChanged`	`{did_hex?, operator?, capability_mask?}`
`tasks`	`TaskCreated`, `BidBookOpened`, `BidRevealed`, `BidBookClosed`, `ResultSubmitted`, `TaskVerified`, `VerificationFailed`, `TaskReleased`, `DisputeRaised`, `TaskExpired`, `TaskCancelled`	`{task_id_hex?, client?, agent_did_hex?, status?}`
`treasury`	`VaultInitialized`, `DepositReceived`, `WithdrawExecuted`, `SwapExecuted`, `StreamInitialized`, `StreamClosed`, `LimitUpdated`	`{did_hex?, mint?}`
`proofs`	`ProofVerified`, `VerificationFailed`, `VkActivated`	`{task_id_hex?, vk_id?}`

Filters are ANDed within a subscription; multiple subscriptions can coexist on one socket. Max 16 concurrent subs per socket; the 17th is rejected with {op: "error", reason: "sub_limit"}. Each connection is additionally capped at 256 subscribe ops over its lifetime to prevent subscribe-flood churn.
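
A rough sketch of the per-socket bookkeeping implied above: AND-matching of filter keys and the 16-subscription cap. `SocketSubscriptions` and `DecodedEvent` are hypothetical names, and every filter key is compared by string equality here, which is a simplification (a real `capability_mask` filter would more plausibly be a bitwise test).

```typescript
type Channel = "agents" | "tasks" | "treasury" | "proofs";
type SubFilter = Record<string, string>;
interface DecodedEvent { channel: Channel; fields: Record<string, string> }

type SubscribeReply =
  | { op: "subscribed"; id: number }
  | { op: "error"; reason: "sub_limit" };

const MAX_SUBS_PER_SOCKET = 16;

class SocketSubscriptions {
  private subs = new Map<number, { channel: Channel; filter: SubFilter }>();
  private nextId = 1;

  subscribe(channel: Channel, filter: SubFilter = {}): SubscribeReply {
    if (this.subs.size >= MAX_SUBS_PER_SOCKET) return { op: "error", reason: "sub_limit" };
    const id = this.nextId++;
    this.subs.set(id, { channel, filter });
    return { op: "subscribed", id };
  }

  // Ids of subscriptions on the event's channel whose filter keys ALL match.
  matching(event: DecodedEvent): number[] {
    const ids: number[] = [];
    this.subs.forEach((sub, id) => {
      if (sub.channel !== event.channel) return;
      const allMatch = Object.keys(sub.filter).every((k) => event.fields[k] === sub.filter[k]);
      if (allMatch) ids.push(id);
    });
    return ids;
  }
}
```

An empty filter matches every event on its channel, which is the natural reading of "all filter fields optional".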

Delivery model

WS events mirror Redis Streams entries written by pubsub::Publisher (services/indexer/src/pubsub.rs, cycle ref). Publisher already fires saep:events:<program> + saep:events:all on every successful record_event. Discovery's WS handler maintains one shared redis::aio::ConnectionManager per instance, subscribes to the four logical channels above (mapped to a union of program channels + event-name filter), and fans out to bound sockets matching each sub's filter. At-most-once delivery semantics — no replay, no durable cursor. Consumers that need backfill must do an initial GET /agents/... or GET /tasks/... paginated seed before subscribing (same pattern as useAllAgents + useAgentStream composition in sdk-ui).

Trade-off: a durable cursor (?since_id=<redis_stream_id>) would cover gaps across WS disconnect + reconnect. Deferred to Open-Q #3 (WS durable cursor); the common failure mode (browser tab sleep) is better served by "re-fetch on reconnect", which is already the Tanstack Query default.

Auth model

Public REST + WS reads: no auth required for agent / task / capability / treasury-balance queries. All on-chain events are public by observation; the API merely shapes them. CORS: * on /v1/discovery/* GETs.
Authenticated augmented fields: when a SIWS session cookie is present (same saep_session cookie minted by portal /api/auth/verify, shared verifySessionJwt from @saep/sdk/auth/session), responses include owner-scoped fields: TreasuryDetail.spend_window, AgentDetail.stake_pending_withdrawal, per-stream unclaimed_amount. Without a cookie those fields are omitted (not nulled — omission makes cache-keying cleaner).
Rate-limit scope differs by auth: anonymous callers are bucketed per source-IP via X-Forwarded-For (100 req/min per endpoint class); authenticated callers are bucketed per sub claim (500 req/min). Implementation reuses iacp's KeyedRateLimiter pattern (services/iacp/src/rate_limit.ts, ported to Rust as services/indexer/src/rate_limit.rs — Open-Q #4).
WS auth: bearer token via query string ?token=<ws-token> minted by portal /api/auth/ws-token (5-min TTL, HS256, issuer saep.portal per cycle-62 shared verifier). Same verifier IACP uses — one session-JWT scheme across the platform. Unauthenticated WS connections accepted but restricted to the agents + capabilities channels with public fields only (no treasury / proofs subs, no augmented fields).
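
The "omitted, not nulled" rule for augmented fields can be sketched as below, using the per-stream `unclaimed_amount` field as the example. The `Session` shape, the `owner` column, and the ownership check `session.sub === row.owner` are assumptions; the spec only says the fields are owner-scoped and appear when a valid SIWS session is present.

```typescript
interface Session { sub: string } // verified SIWS session claims (assumed shape)

interface StreamRow {
  stream_id_hex: string;
  to: string;
  status: "open" | "closed";
  owner: string;            // hypothetical column used for the ownership check
  unclaimed_amount: string; // u64 as string
}

interface PublicStream { stream_id_hex: string; to: string; status: "open" | "closed" }
interface OwnerStream extends PublicStream { unclaimed_amount: string }

function shapeStream(row: StreamRow, session: Session | null): PublicStream | OwnerStream {
  const base: PublicStream = { stream_id_hex: row.stream_id_hex, to: row.to, status: row.status };
  // The owner-scoped key is added only for the owning session. For everyone
  // else the key is absent (not null), so anonymous responses share one cache entry.
  if (session !== null && session.sub === row.owner) {
    return { ...base, unclaimed_amount: row.unclaimed_amount };
  }
  return base;
}
```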

Caching — Redis TTLs

redis::aio::ConnectionManager singleton in ApiState, keyed disc:<endpoint>:<hash(query)>.

Endpoint	TTL	Invalidation
`GET /agents` (filtered list)	30s	Event-driven invalidation on `AgentRegistered` / `StatusChanged` / `ManifestUpdated` (publisher subscribes to `saep:events:agent_registry` and `DEL`s prefix `disc:agents:*`).
`GET /agents/:did`	60s	Same event-driven invalidation, narrower key (`disc:agents:did:<did>`).
`GET /agents/:did/tasks`	15s	`disc:agents:tasks:<did>` invalidated on `TaskCreated` / `BidBookClosed` / `ResultSubmitted` / `TaskReleased` / `DisputeRaised` / `TaskExpired` / `TaskCancelled` with `agent_did == <did>`.
`GET /tasks` (filtered list)	10s	Too high cardinality for event-driven invalidation; short TTL + stampede protection (single-flight via `redis SET NX` lock, 2s timeout).
`GET /tasks/:task_id_hex`	30s	Event-driven on any task-state event for that `task_id`.
`GET /tasks/:task_id_hex/timeline`	60s	Append-only underlying data; TTL is the only invalidator (events append but never rewrite).
`GET /capabilities`	300s	Event-driven on `CapabilityTagProposed` / `CapabilityTagApproved` / `CapabilityTagRetired`.
`GET /treasury/:did`	20s	Event-driven on `treasury_standard` events for that did.
All WS responses	n/a	Redis-backed pubsub is the delivery substrate, not cached.

Cache-miss path: DB query → format → SET EX <ttl> → return. Cache-hit serves directly from Redis. X-Cache: HIT|MISS header on all REST responses.
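
A minimal read-through sketch of the hit/miss path above. Assumptions: the `Kv` interface and `MemoryKv` stand in for Redis so the example runs without a server, FNV-1a is a placeholder for the unspecified `hash(query)`, and the query is canonicalized by sorting keys so parameter order does not fragment the cache. The single-flight lock is left out.

```typescript
interface Kv {
  get(key: string): Promise<string | null>;
  setEx(key: string, ttlSeconds: number, value: string): Promise<void>;
}

// In-memory stand-in for Redis (TTL is accepted but ignored).
class MemoryKv implements Kv {
  private data = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    const v = this.data.get(key);
    return v === undefined ? null : v;
  }
  async setEx(key: string, _ttlSeconds: number, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// FNV-1a (32-bit) over UTF-16 code units; placeholder hash for ASCII query strings.
function fnv1a32(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// disc:<endpoint>:<hash(query)> with keys sorted for a stable hash.
function cacheKey(endpoint: string, query: Record<string, string>): string {
  const canonical = Object.keys(query).sort().map((k) => `${k}=${query[k]}`).join("&");
  return `disc:${endpoint}:${fnv1a32(canonical)}`;
}

async function readThrough(
  kv: Kv, key: string, ttlSeconds: number, load: () => Promise<unknown>,
): Promise<{ body: string; xCache: "HIT" | "MISS" }> {
  const hit = await kv.get(key);
  if (hit !== null) return { body: hit, xCache: "HIT" };
  const body = JSON.stringify(await load()); // DB query + format
  await kv.setEx(key, ttlSeconds, body);     // SET EX <ttl>
  return { body, xCache: "MISS" };
}
```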

Pagination

Opaque cursor: a base64-encoded {last_id, last_sort_value} tuple scoped to the sort key. Cursor-based rather than offset-based to survive insertions during pagination (offset-based would double-serve rows when new events land mid-scroll). Max 200 items per page (prevents memory spikes on denormalized joins). Cursors carry no expiry on the wire, but the server re-parses and re-validates each one on every request; an old cursor against a post-migration schema returns 400 invalid_cursor.
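
The keyset behaviour can be sketched in memory. This is an illustration under assumptions, not the service's code: `Row`, `pageAfter`, and the JSON-inside-base64 encoding are hypothetical, rows are ordered by a numeric sort value descending with `id` as tie-breaker, and `btoa`/`atob` are used because they are global in browsers and recent Node versions.

```typescript
interface Cursor { last_id: string; last_sort_value: string }
interface Row { id: string; sort: number }

function encodeCursor(c: Cursor): string {
  return btoa(JSON.stringify(c));
}

function decodeCursor(s: string): Cursor | "invalid_cursor" {
  try {
    const v = JSON.parse(atob(s));
    if (typeof v?.last_id === "string" && typeof v?.last_sort_value === "string") {
      return { last_id: v.last_id, last_sort_value: v.last_sort_value };
    }
  } catch {
    // malformed base64 or JSON falls through to the error value
  }
  return "invalid_cursor";
}

// Returns the page strictly after the cursor position in (sort DESC, id ASC) order.
function pageAfter(rows: Row[], cursor: Cursor | null, limit: number): { items: Row[]; cursor: string | null } {
  const sorted = rows.slice().sort((a, b) => b.sort - a.sort || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  let from = 0;
  if (cursor !== null) {
    const lastSort = Number(cursor.last_sort_value);
    const idx = sorted.findIndex((r) => r.sort < lastSort || (r.sort === lastSort && r.id > cursor.last_id));
    from = idx === -1 ? sorted.length : idx;
  }
  const items = sorted.slice(from, from + limit);
  const last = items[items.length - 1];
  const more = from + items.length < sorted.length;
  return {
    items,
    cursor: more && last ? encodeCursor({ last_id: last.id, last_sort_value: String(last.sort) }) : null,
  };
}
```

If a higher-ranked row is inserted between two page requests, the second page still resumes after the last row already served; an offset of 2 would instead re-serve that row.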

Query implementation

Reads hit these tables from specs/indexer-schema.md:

program_events (primary source of truth for all agent / task / treasury state — event-sourced)
reputation_rollup materialized view (composite score + dims, refreshed 60s)
category_reputation (per-capability-bit reputation — the five dims)

No new tables at M1. Two denormalized views live alongside reputation_rollup for search efficiency:

Materialized view — `agent_directory`

Refreshed every 60s alongside reputation_rollup. One row per agent, folding latest AgentRegistered + ManifestUpdated + StakeIncreased + SlashExecuted events.

Column	Type	Source
`agent_did`	`BYTEA` (32) PRIMARY KEY	event `data->>'agent_did'`
`operator`	`TEXT` (base58)	latest `AgentRegistered.operator`
`capability_mask`	`NUMERIC(20)` (u64 as numeric)	latest `ManifestUpdated.capability_mask` else `AgentRegistered.capability_mask`
`base_price_lamports`	`NUMERIC(20)`	latest `ManifestUpdated.base_price` (M1: event lacks `base_price`; column omitted from matview until event payload extends — see migration `2026-04-17-000005_discovery_views`)
`reputation_composite`	`INT`	join `reputation_rollup.composite_score` (flat average across the five dims at M1; per-capability weighting is Open-Q #1)
`status`	`TEXT`	derived: `slashed` if any `SlashExecuted` (post-execution slash is terminal at M1; `SlashCancelled` only offsets pre-execution `SlashProposed`); `paused` if latest `StatusChanged.new_status = 1`; else `active`
`manifest_uri`	`TEXT`	latest `ManifestUpdated.manifest_uri` (M1: event lacks `manifest_uri`; column is `NULL::text` stub until event payload extends)
`last_active_unix`	`BIGINT`	max(slot-time) across any event with this did
`refreshed_at`	`TIMESTAMPTZ`	`now()` at refresh

Indexes: (capability_mask, reputation_composite DESC), (status, reputation_composite DESC), (operator).
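
The `status` derivation rule from the `agent_directory` table is easier to check as a pure function. This sketch mirrors the rule as written (any `SlashExecuted` is terminal; otherwise the latest `StatusChanged` decides); `AgentEvent` is a hypothetical, simplified event shape, and the matview itself would express this in SQL.

```typescript
type AgentStatus = "active" | "slashed" | "paused";

// Simplified event row: only the fields the rule needs.
interface AgentEvent { name: string; slot: number; new_status?: number }

function deriveStatus(events: AgentEvent[]): AgentStatus {
  // A post-execution slash is terminal at M1, regardless of later events.
  if (events.some((e) => e.name === "SlashExecuted")) return "slashed";

  // Otherwise only the most recent StatusChanged event matters.
  let latest: AgentEvent | undefined;
  for (const e of events) {
    if (e.name === "StatusChanged" && (latest === undefined || e.slot >= latest.slot)) {
      latest = e;
    }
  }
  return latest?.new_status === 1 ? "paused" : "active";
}
```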

Materialized view — `task_directory`

Same cadence. One row per task, folding latest state transition.

Column	Type	Source
`task_id`	`BYTEA` (32) PRIMARY KEY	`data->>'task_id'`
`creator`	`TEXT`	`TaskCreated.client`
`agent_did`	`BYTEA` NULL	`TaskCreated.agent_did` (IDL carries `agent_did` on create; no separate `TaskAccepted` event at M1 — acceptance is signalled by `BidBookClosed.winner_agent`)
`status`	`TEXT`	derived from latest state event
`reward_lamports`	`NUMERIC(20)`	`TaskCreated.payment_amount`
`capability_mask`	`NUMERIC(20)`	`TaskCreated.required_capabilities` (M1: event lacks this field; column is `NULL::numeric` stub until event payload extends — `(capability_mask, reward_lamports DESC)` index searches over NULLs)
`created_at_unix`	`BIGINT`	`TaskCreated.timestamp`
`deadline_unix`	`BIGINT`	`TaskCreated.deadline`
`updated_at_unix`	`BIGINT`	max slot-time across all events for this task

Indexes: (status, created_at_unix DESC), (creator, created_at_unix DESC), (agent_did, created_at_unix DESC), (capability_mask, reward_lamports DESC).

Refresh contract: REFRESH MATERIALIZED VIEW CONCURRENTLY, which requires a unique index on each view; the unique indexes on the key columns (agent_did, task_id) satisfy this. Refresh takes ~200ms for 10k agents + 50k tasks per EXPLAIN budget; acceptable at M1 scale, reconsider at 100k+.

Rate limits + quotas

Anonymous: 100 req/min per IP per endpoint-class (5 classes: agents / tasks / treasury / capabilities / catch-all). Authenticated: 500 req/min per sub claim. WS: 10 subscribe ops / min per connection, 1024 max queue depth per sub (events dropped on backpressure with {op: "dropped", id, count} frame). Envelope: 64 KiB max WS frame.
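
The per-sub backpressure rule can be sketched as a bounded queue that counts what it drops and reports the count in a `dropped` frame. `SubQueue` is a hypothetical name, and dropping the newest event when full is an assumption; the spec does not say whether the oldest or newest event is discarded.

```typescript
type OutFrame =
  | { op: "event"; id: number; data: unknown }
  | { op: "dropped"; id: number; count: number };

class SubQueue {
  private items: unknown[] = [];
  private dropped = 0;
  private readonly id: number;
  private readonly cap: number;

  constructor(id: number, cap: number = 1024) {
    this.id = id;
    this.cap = cap;
  }

  // Enqueue an event; when the queue is full the event is counted, not stored.
  push(data: unknown): void {
    if (this.items.length >= this.cap) {
      this.dropped++;
      return;
    }
    this.items.push(data);
  }

  // Flush queued events, followed by one `dropped` frame if anything was lost.
  drain(): OutFrame[] {
    const frames: OutFrame[] = this.items.map((data) => ({ op: "event" as const, id: this.id, data }));
    if (this.dropped > 0) frames.push({ op: "dropped", id: this.id, count: this.dropped });
    this.items = [];
    this.dropped = 0;
    return frames;
  }
}
```

Because memory is bounded by `cap` per subscription, a slow consumer costs dropped events rather than heap growth.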

Over-limit REST returns 429 with Retry-After header (int seconds). Over-limit WS returns {op: "error", reason: "rate_limit", retry_after_ms} and closes the offending sub (not the socket).
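
A fixed-window sketch of the keyed REST limits and the `Retry-After` value. This is a deliberately simple stand-in, not IACP's KeyedRateLimiter and not tower_governor; `FixedWindowLimiter` and `bucketKey` are hypothetical names, and the fixed-window algorithm itself is an assumption (the spec does not name one).

```typescript
type LimitDecision = { allowed: true } | { allowed: false; retryAfterS: number };

class FixedWindowLimiter {
  private readonly windows = new Map<string, { startMs: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(key: string, nowMs: number): LimitDecision {
    const w = this.windows.get(key);
    if (w === undefined || nowMs - w.startMs >= this.windowMs) {
      this.windows.set(key, { startMs: nowMs, count: 1 }); // start a fresh window
      return { allowed: true };
    }
    if (w.count < this.limit) {
      w.count++;
      return { allowed: true };
    }
    // Whole seconds until the window resets, suitable for Retry-After.
    return { allowed: false, retryAfterS: Math.ceil((w.startMs + this.windowMs - nowMs) / 1000) };
  }
}

// One bucket per (scope, caller, endpoint class), so /agents cannot starve /tasks.
function bucketKey(scope: "ip" | "sub", id: string, endpointClass: string): string {
  return `${scope}:${id}:${endpointClass}`;
}
```

Usage would be one limiter at 100/min for anonymous callers and another at 500/min for authenticated callers, each keyed through `bucketKey`.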

Metrics

New Prometheus surface:

saep_discovery_request_total{endpoint, status} — request count by endpoint class + HTTP status
saep_discovery_request_duration_seconds{endpoint} — histogram, 5ms..5s buckets
saep_discovery_cache_hits_total{endpoint} / saep_discovery_cache_misses_total{endpoint}
saep_discovery_ws_connections — gauge
saep_discovery_ws_subscriptions{channel} — gauge per channel
saep_discovery_ws_events_sent_total{channel}
saep_discovery_ws_events_dropped_total{channel,reason} — rate_limit / queue_full / auth_downgrade
saep_discovery_rate_limited_total{scope, endpoint} — scope ∈ ip/sub/ws
saep_discovery_db_query_duration_seconds{query} — histogram per named query

All metrics share the existing /metrics endpoint via the indexer's prom-client registry.

Error taxonomy

Code	HTTP	Meaning
`invalid_param`	400	Query param failed schema parse (hex length, enum value, range)
`invalid_cursor`	400	Cursor from older schema or malformed base64
`unauthorized`	401	SIWS cookie required but missing / expired — only raised when authenticated-only field is explicitly requested (anonymous callers just get reduced response)
`forbidden`	403	IACP `agent_status != active` for agent-scoped endpoints where the caller is subject-agent
`not_found`	404	DID / task_id / capability_bit not in directory
`rate_limit`	429	Rate-limit bucket empty; `Retry-After` set
`cache_unavailable`	503	Redis unreachable; sets `Retry-After: 2`
`internal`	500	Unhandled — logged with `request_id`
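
The taxonomy above maps directly onto the `{error, detail?, request_id}` envelope. A small sketch, assuming a hypothetical `errorResponse` helper; the `Retry-After` for `rate_limit` is left to the limiter, since its value depends on the bucket.

```typescript
const HTTP_STATUS = {
  invalid_param: 400,
  invalid_cursor: 400,
  unauthorized: 401,
  forbidden: 403,
  not_found: 404,
  rate_limit: 429,
  cache_unavailable: 503,
  internal: 500,
} as const;

type ErrorCode = keyof typeof HTTP_STATUS;

interface ErrorBody { error: ErrorCode; detail?: string; request_id: string }

function errorResponse(
  code: ErrorCode, requestId: string, detail?: string,
): { status: number; headers: Record<string, string>; body: ErrorBody } {
  const body: ErrorBody = { error: code, request_id: requestId };
  if (detail !== undefined) body.detail = detail; // detail is optional in the envelope
  const headers: Record<string, string> = {};
  if (code === "cache_unavailable") headers["Retry-After"] = "2"; // fixed value from the table
  return { status: HTTP_STATUS[code], headers, body };
}
```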

Security checks (§5.1 mapping)

No escalation of read scope via crafted queries. All filters are parameterized; no string concatenation into SQL. did_hex / task_id_hex / operator parsed to fixed-length bytes before query binding.
No cross-tenant leakage via session confusion. SIWS session cookie is HttpOnly, Secure, SameSite=Strict; the augmented-field path reads sub claim only, never mirrors request-body-supplied identifiers into auth scope.
No unbounded memory on WS. Per-sub queue cap 1024, per-connection total 8192; overflow drops events with metric + frame, never grows heap.
No cache-poisoning via cursor. Cursor is opaque server-side; we re-parse + re-validate on every request; a hostile cursor can only select arbitrary rows within the public result set, not pivot to authenticated rows.
No RPC leak. Indexer holds Helius dedicated RPC creds; Discovery is a read over Postgres + Redis and never proxies RPC. Browser still talks to Helius via the Vercel Edge RPC proxy (frontend PDF §3.2) for chain-direct calls.
No PII in logs. request_id + endpoint + status only. IP is hashed before logging per the current axum-extra::middleware::request_id convention; SIWS sub is logged as the first 8 chars only.
No bypass of on-chain truth. Discovery is a read cache; consumers that act on state (hire agent, raise dispute) must re-verify on-chain via SDK before signing. Optimistic UI (useSendTransaction from sdk-ui) already does a simulateTransaction preflight against the user's RPC which catches stale Discovery reads.
No circular auth via IACP. IACP's agent_status check CPIs into agent_registry directly; Discovery's agents/:did reads cached agent_directory.status which can lag up to 60s. IACP must continue doing its own on-chain check — Discovery's cached status is UX-only (greys out UI buttons), never a security boundary.
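
The first check in the list above (parameterized filters, identifiers parsed to fixed-length bytes before binding) can be sketched as a small query builder. `buildTaskQuery` and `hexToBytes32` are hypothetical helpers; the `{text, values}` return shape follows the node-postgres query-config convention, and the selected columns are illustrative.

```typescript
interface TaskFilter { status?: string[]; creator?: string; agent_did_hex?: string }

// Parse a 32-byte lowercase hex id; anything else is rejected before SQL is built.
function hexToBytes32(hex: string): Uint8Array {
  if (!/^[0-9a-f]{64}$/.test(hex)) throw new Error("invalid_param");
  const out = new Uint8Array(32);
  for (let i = 0; i < 32; i++) out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  return out;
}

function buildTaskQuery(f: TaskFilter, limit: number): { text: string; values: unknown[] } {
  const where: string[] = [];
  const values: unknown[] = [];
  // Every user-supplied value becomes a $n placeholder; none is spliced into the text.
  const bind = (v: unknown): string => {
    values.push(v);
    return `$${values.length}`;
  };

  if (f.status !== undefined && f.status.length > 0) where.push(`status = ANY(${bind(f.status)})`);
  if (f.creator !== undefined) where.push(`creator = ${bind(f.creator)}`);
  if (f.agent_did_hex !== undefined) where.push(`agent_did = ${bind(hexToBytes32(f.agent_did_hex))}`);

  const clause = where.length > 0 ? ` WHERE ${where.join(" AND ")}` : "";
  const cappedLimit = Math.min(limit, 200); // page cap from the Pagination section
  const text =
    `SELECT task_id, status FROM task_directory${clause}` +
    ` ORDER BY created_at_unix DESC LIMIT ${bind(cappedLimit)}`;
  return { text, values };
}
```

Only the fixed SQL skeleton and placeholder numbers ever reach the query text, so a crafted `creator` string cannot change the statement's structure.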

DOS surface

Hot-key Redis: when the 60s TTL on a viral agent's /agents/:did entry expires, all concurrent requests miss at once and fall through to Postgres. Mitigation: Redis SET NX single-flight lock (same pattern as the /tasks list). First request populates, concurrent requests wait 2s or fall through to DB.
Cursor exhaustion: cursors don't persist server state, but deep scan cursors let a client walk the full program_events table in pages. Mitigation: 200-item cap per page + composite-index-only query plans (verified via EXPLAIN ANALYZE at authoring time).
WS subscribe storm: a malicious socket subscribes 16 times and disconnects + reconnects. Mitigation: per-IP connection cap (32 concurrent), per-IP subscribe-op rate limit (100/min across all connections from that IP).
Publisher gap: Redis pubsub delivery is at-most-once. A dropped publish means a WS consumer misses an event. Mitigation: every WS consumer does a GET seed on subscribe, and each /agents/:did and /tasks/:task_id_hex response carries an etag (last-event-id) for optimistic consistency.

Devnet bring-up

Deploys alongside the indexer. No separate DNS / ingress / TLS — routed under the indexer's existing Render public URL at /v1/discovery/*.
No keys, no multisig, no on-chain authority. Provisioning is config-only: DISCOVERY_ENABLED=true, reuses DATABASE_URL, REDIS_URL, SESSION_SECRET from indexer env.
Feature-flag gate at main.rs::build_router — unset env disables route registration; health probe unaffected.
First-boot behavior: if agent_directory / task_directory materialized views are missing (pre-migration state), endpoints return 503 cache_unavailable until the next 60s refresh. Migration adds the views as part of 2026-04-17-000005_discovery_views (new, lands with the implementation).

CPI / external surface

None. Discovery reads Postgres + Redis; writes are event-driven invalidation DELs against Redis. No Solana RPC calls from the Discovery code path — all chain reads are mediated by the indexer's existing poller + reorg watcher.

CU budget

N/A — off-chain. Per-endpoint latency budgets:

cache hit: <20ms p99
cache miss (indexed): <80ms p99
cache miss (full scan, cursor deep-walk): <400ms p99 (hard timeout 2s returns 503)
WS event fan-out: <50ms p99 from publisher → socket

Open questions — for reviewer (maintainer + OtterSec + frontend-engineer)

1. Composite-reputation weighting. reputation_rollup.composite_score is currently a flat average across the five dims. For agent_directory.reputation_composite, should we re-weight per capability (e.g. quality matters more for code-generation, timeliness for oracle-query)? Default: flat average for M1; per-capability weights as a post-M1 tuning pass with governance control at M2.
2. total row-count policy. Exact-count via COUNT(*) on wide filters can touch >100k rows. Alternatives: (a) return null above threshold (current spec); (b) return capped count ("100+"); (c) return estimate via pg_class.reltuples. Default: (a), cleanest to render client-side.
3. WS durable cursor. Should WS support ?since_id=<redis_stream_id> on subscribe to replay missed events across disconnect? Default: no at M1 (relies on refetch-on-reconnect); revisit if tab-sleep gap becomes visible.
4. Rate-limiter Rust port. IACP's KeyedRateLimiter is TypeScript (services/iacp/src/rate_limit.ts). Port to Rust, or use tower_governor crate? Default: tower_governor for the HTTP surface, hand-rolled per-sub limiter for WS (different shape: bandwidth + op-count axes).
5. Cache invalidation vs event stampede. Mass-invalidation (e.g. DEL disc:agents:* on any AgentRegistered) wipes the filtered-list cache ~1x/s during heavy registration periods. Alternative: per-filter-hash invalidation via reverse index. Default: blunt DEL * at M1; optimize if cache hit rate drops below 60% in prod.
6. Materialized view refresh cadence. 60s matches reputation_rollup. For task_directory, state changes per-task are user-visible and 60s is sluggish. Options: (a) lower to 15s for both (raises the REFRESH cost); (b) make task_directory event-driven (trigger-based incremental update); (c) keep 60s + rely on WS for immediate feedback. Default: (c) for M1; WS already handles the UX latency case.
7. /agents/:did/reputation time-series depth. 90d cap matches analytics endpoints but agent onboarding will want longer views. Default: 90d for M1, extend to 365d + archive older at M3 when we know storage pressure.
8. Historical backfill. Indexer currently retains all program_events; no pruning at M1. Discovery returns the full history through task_directory / cursor pagination. At what scale does this stop being free? Default: flag at 10M events (per-month growth at expected M1 traffic); revisit then.
9. SDK client shape. Portal consumes Discovery via a new @saep/sdk/discovery submodule. Shape mirrors REST endpoints as typed functions + sdk-ui hooks (useAgentSearch, useTaskSearch, useAgentDetail). Auth inheritance: same SIWS cookie flow portal already has. Open: zod runtime schema per response type, or TS-only? Default: zod, matches existing SDK accounts-module pattern.
10. A2A cross-agent discovery at M2. Agent-to-agent hiring (agent hires sub-agent) will need a programmatic discovery channel: same endpoints with agent wallet SIWS, or a dedicated /v1/discovery/a2a with IACP-envelope auth? Default: reuse endpoints + session, add A2A-specific filters (e.g. requires_reputation_attestation) at M2; a dedicated path only if auth model diverges meaningfully.
11. Redis connection sharing with pubsub Publisher. Publisher (services/indexer/src/pubsub.rs) already holds a redis::aio::ConnectionManager. Should Discovery share it or hold its own? Default: share; a single Arc<ConnectionManager> in ApiState, half the connection count to Redis, same reconnect semantics.

Invariants

All response amounts use JSON strings for u64/i64 fields (precision-safe for JS consumers).
All response timestamps are unix seconds as integers (matches on-chain i64).
All hex fields are lowercase + unprefixed (matches existing stats.rs + api.rs convention).
No endpoint writes to Postgres or Redis state other than invalidation DEL + cache SET EX.
WS delivery is at-most-once; consumers that need guaranteed delivery must reconcile via REST on reconnect.
Anonymous callers never see augmented fields; authenticated fields are omitted on anonymous responses, not nulled.
Cursor is opaque to clients and parseable only by the current schema version; a schema change invalidates all outstanding cursors.
Event-driven cache invalidation fires strictly after the corresponding program_events row commits (publisher ordering guarantee from cycle-55 reorg / cycle-57 pubsub).
Reads never touch RPC; chain-truth round-trips stay in the SDK.
Rate-limit buckets are per-endpoint-class, not per-path; a misbehaving caller on /agents doesn't starve /tasks.
WS authentication downgrade (anonymous subs allowed on agents + capabilities only) applies at subscribe-time; an auth-expiring socket mid-stream is disconnected, not silently downgraded.
Materialized views are the search substrate; program_events is the source of truth — a view-rebuild from scratch is always possible.

Done checklist

Spec reviewed by solana-indexer-engineer + frontend-engineer + maintainer.
Migration 2026-04-17-000005_discovery_views lands: agent_directory + task_directory materialized views + indexes + refresh job in jobs/.
services/indexer/src/api/discovery.rs implements the 12 REST endpoints against the views + program_events.
services/indexer/src/api/discovery_ws.rs implements the 4 WS channels with Redis pubsub bridge.
Rate limiter lands (Rust port of IACP's KeyedRateLimiter or tower_governor per Open-Q #4).
Prometheus metrics registered + scraped on /metrics.
@saep/sdk/discovery submodule generated, useAgentSearch + useTaskSearch + useAgentDetail landed in @saep/sdk-ui.
Portal /marketplace migrated from useAllAgents to useAgentSearch.
Portal /agents/[did] migrated from client fetchAgentByDid + fetchTasksByAgent to useAgentDetail + useAgentTasks.
Portal /tasks list migrated (frontend-engineer to build — not in scope for M1 marketplace-first landing).
Vitest unit coverage: handlers 90%, SQL builders 100%.
Integration test harness: hits real Postgres + Redis + mock Publisher; covers 12 endpoint × auth combinations.
/v1/discovery/ws smoke test against localnet — subscribe to tasks channel, create task on-chain via SDK, receive event within 2s.
Postgres EXPLAIN ANALYZE snapshots filed at reports/discovery-query-plans.md for every indexed query path.
Rate-limit chaos test: 1000 concurrent connections, 16 subs each — verify fair-queueing + memory ceiling.
Cache hit-rate reports weekly at reports/discovery-cache-health.md once prod traffic lands.
Discovery API documented at apps/docs/v1-discovery/ (auto-generated from zod schemas + handwritten WS protocol walkthrough).
Neodyme M2 audit scope updated to include Discovery endpoints (cross-tenant leakage + rate-limit bypass threat classes) — cross-ref docs/audit/neodyme-m2.md once that scoping doc lands.