Spec — Discovery API
Owner: solana-indexer-engineer + frontend-engineer (consumer)
Depends on: specs/indexer-schema.md, specs/08-iacp-bus.md, specs/03-program-agent-registry.md, specs/07-program-task-market.md, specs/04-program-treasury-standard.md
Blocks: portal /marketplace (server-side search), portal /tasks list view, portal /agents/[did] history + radar, analytics live wiring beyond the 7 stats::* roll-ups, A2A marketplace panel (M2), SDK @saep/sdk/discovery client
References: backend PDF §3 (off-chain infra), frontend PDF §2.3 (Marketplace), §2.5 (Tasks), §3.1 (SIWS), §3.4 (optimistic UI), services/indexer/src/{api,stats,pubsub}.rs
Goal
Replace browser-originated getProgramAccounts memcmp scans (current pattern across useAllAgents, useAgentTasks, useTasksByCreator) with a server-side REST + WS surface hitting the indexer's Postgres schema. Two effects: (1) shifts the read-heavy filter+sort path off mainnet RPC, where 5 memcmp scans per marketplace load is a cost + latency tax that does not scale past a few hundred agents, and (2) gives realtime consumers a single bearer-auth'd WS channel for agent / task / treasury state transitions without standing up a per-page Yellowstone subscription.
Non-goals
- Not a write surface. All mutations stay SDK → RPC → chain. Discovery is read-only.
- Not the analytics API. `services/indexer/src/stats.rs` already owns `/stats/*` — 7 fixed-shape aggregate endpoints for the public analytics page. Discovery adds row-level search + subscription; aggregate roll-ups stay there.
- Not the IACP bus. IACP (`specs/08-iacp-bus.md`) moves signed agent-to-agent envelopes; Discovery moves indexed chain state. Overlap is zero — different auth model, different payload shape, different delivery guarantees.
- Not a Yellowstone gRPC passthrough. Consumers that need raw chain events (validators, audit tooling) should subscribe to Yellowstone directly via Helius dedicated. Discovery WS delivers decoded, filtered, schema-stable events.
- Not a historical backfill tool. Initial implementation serves live + recent (≤30d default) state; deeper history is Open-Q #8.
Service placement
Discovery lives as a standalone TS service at services/discovery/ (fastify 5 + @fastify/websocket + pg + zod) serving /v1/discovery/*. The indexer binary retains a legacy /api/* alias in services/indexer/src/discovery.rs (Rust shim over the same matviews) until portal callers migrate off useAgentReputation et al., at which point the shim retires. Both paths read the same Postgres matviews (agent_directory / task_directory / reputation_rollup); no schema divergence.
The original M1 plan (cycle 106) was to land discovery inside the indexer binary as services/indexer/src/api/discovery.rs on the stats::router() pattern — rationale was shared PG + Redis pools, one deploy target, no connection-pool doubling. The standalone-TS path landed instead (public 47e29c9) to let the read path evolve on the TS/zod/fastify stack the rest of the web tier already uses; the editorial drift is tracked in INBOX.md. Matview access works identically over a separate PG connection pool at M1 scale; re-folding into the indexer binary is a file move plus a main.rs edit, not a schema migration, should the connection-pool footprint become load-bearing at M3.
Surface — REST
Base path: `/v1/discovery`. All responses JSON. Errors follow `{error, detail?, request_id}` with `request_id` correlating to a `saep_indexer_request_id` log field.
Agents
| Method + path | Purpose | Query params | Response |
|---|---|---|---|
| `GET /agents` | Filtered + ranked agent list | `capability_mask` (hex u64, required unless `operator` supplied), `min_reputation` (u16 composite 0-10000), `max_base_price` (u64 lamports), `status` (`active`/`slashed`/`paused`, default `active`), `operator` (base58, mutually exclusive w/ `capability_mask`), `cursor` (opaque), `limit` (default 50, max 200), `sort` (`reputation_desc`/`price_asc`/`recent_desc`, default `reputation_desc`) | `{items: AgentSummary[], cursor: string?, total: number?}` |
| `GET /agents/:did` | Single agent detail | — | `AgentDetail` (full ReputationScore + TreasurySummary + last 10 task ids) |
| `GET /agents/:did/tasks` | Tasks for agent, paginated | `status` (filter by TaskStatus enum), `cursor`, `limit` | `{items: TaskSummary[], cursor?}` |
| `GET /agents/:did/streams` | Active payment streams | `status` (`open`/`closed`, default `open`) | `{items: StreamSummary[]}` |
| `GET /agents/:did/reputation` | Full reputation time series | `bucket` (`day`/`week`, default `day`), `days` (max 90) | `{series: ReputationPoint[]}` |
Shape types below. `total` is returned only when a WHERE clause narrows the result set below a cheap-count threshold (~10k rows per the Postgres EXPLAIN budget); for wider queries it is `null` and the UI shows "50+".
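The `capability_mask` query param above is a hex u64. A minimal TypeScript sketch of one plausible matching rule, a bitwise superset check (an assumption; this section does not pin down the match semantics), using BigInt since u64 exceeds Number's 53-bit safe range:

```typescript
// Sketch (assumption, not normative): capability_mask filtering as a
// bitwise superset check — an agent matches when its mask contains every
// bit set in the query mask. BigInt is required for full-width u64 masks.
function parseMaskHex(hex: string): bigint {
  // Mirrors the invariant that hex fields are lowercase + unprefixed.
  if (!/^[0-9a-f]{1,16}$/.test(hex)) throw new Error("invalid_param: capability_mask");
  return BigInt("0x" + hex);
}

function maskMatches(agentMask: bigint, queryMask: bigint): boolean {
  return (agentMask & queryMask) === queryMask;
}
```

For example, an agent with mask `0f` matches a query for `05`, while an agent with mask `01` does not match a query for `03`.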
Tasks
| Method + path | Purpose | Query params | Response |
|---|---|---|---|
| `GET /tasks` | Filtered task list | `status` (comma-sep), `creator` (base58), `agent_did` (hex-32), `capability_mask` (hex), `created_after` (RFC3339), `created_before` (RFC3339), `cursor`, `limit`, `sort` (`created_desc`/`deadline_asc`/`reward_desc`, default `created_desc`) | `{items: TaskSummary[], cursor?}` |
| `GET /tasks/:task_id_hex` | Single task detail | — | `TaskDetail` (full TaskContract + current escrow balance + proof verify status) |
| `GET /tasks/:task_id_hex/timeline` | State transitions | — | `{events: TaskEvent[]}` (parsed from `program_events` rows with `event_name` ∈ {TaskCreated, BidBookOpened, BidCommitted, BidRevealed, BidBookClosed, ResultSubmitted, TaskVerified, VerificationFailed, TaskReleased, DisputeRaised, TaskExpired, TaskCancelled}) |
Capabilities
| Method + path | Purpose | Response |
|---|---|---|
| `GET /capabilities` | All approved tags from CapabilityRegistry | `{items: CapabilityTag[]}` (cached 300s) |
| `GET /capabilities/:bit` | One tag + manifest URI | `CapabilityTag` |
Treasury
| Method + path | Purpose | Query params | Response |
|---|---|---|---|
| `GET /treasury/:did` | Treasury overview | — | `{vaults: VaultBalance[], allowed_mints: AllowedMint[], spend_window: SpendWindow}` |
| `GET /treasury/:did/vaults` | Per-mint vault balances | `mint` (optional filter) | `{items: VaultBalance[]}` |
Shape types
```ts
interface AgentSummary {
  did_hex: string;              // 32-byte hex
  operator: string;             // base58 pubkey
  capability_mask: string;      // hex u64
  reputation_composite: number; // 0-10000 bps
  base_price_lamports: string;  // u64 as string (JSON-safe)
  status: "active" | "slashed" | "paused";
  manifest_uri: string | null;
  last_active_unix: number;
}

interface AgentDetail extends AgentSummary {
  reputation: {
    quality: number; timeliness: number; availability: number;
    cost_efficiency: number; honesty: number;
  };
  jobs_completed: number;
  jobs_disputed: number;
  stake_lamports: string;
  treasury_summary: { allowed_mints: number; active_streams: number };
  recent_task_ids_hex: string[]; // up to 10
}

interface TaskSummary {
  task_id_hex: string;
  creator: string;
  agent_did_hex: string | null; // null until accepted
  status: TaskStatus;
  reward_lamports: string;
  capability_mask: string;
  created_at_unix: number;
  deadline_unix: number;
}

interface TaskDetail extends TaskSummary {
  payload_hash_hex: string;
  result_hash_hex: string | null;
  proof_verified: boolean | null; // null until proof_verifier sees it
  escrow_balance_lamports: string;
  dispute_opened: boolean;
}

interface StreamSummary {
  stream_id_hex: string; from: string; to: string; mint: string;
  rate_per_sec: string; start_unix: number; end_unix: number | null;
  total_paid: string; status: "open" | "closed";
}

interface CapabilityTag {
  bit: number; slug: string; manifest_uri: string; approved: boolean;
}

interface VaultBalance { mint: string; balance: string; decimals: number; }
interface AllowedMint { mint: string; per_tx_cap_base_units: string; }
interface SpendWindow { spent_24h_base_units: string; limit_24h_base_units: string; }
```
All u64/i64 returned as JSON strings (per borsh_decode contract — JS loses precision at 2^53). Timestamps are unix seconds (int), not RFC3339, to match on-chain i64 timestamps in program_events.data. Hex fields are lowercase, unprefixed.
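The precision argument can be demonstrated directly. A small sketch (plain Node, no project code) of why `reward_lamports` and friends travel as strings rather than JSON numbers:

```typescript
// Why u64 fields travel as JSON strings: Number round-trips lose precision
// past 2^53 - 1, so lamport amounts near u64 range would silently corrupt.
const fromChain = 18446744073709551615n; // u64::MAX, exact as BigInt

// Naive path: coercing to Number rounds to the nearest double.
const lossy = Number(fromChain); // prints as 18446744073709552000 — wrong

// String-carrying shape, as the Discovery responses do:
const safe = JSON.parse(
  JSON.stringify({ reward_lamports: fromChain.toString() }),
) as { reward_lamports: string };
// safe.reward_lamports is "18446744073709551615" — exact; the consumer
// opts into BigInt only when it needs arithmetic.
```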
Surface — WebSocket
Base path: `/v1/discovery/ws`. One socket per client. Subscription-scoped — the client sends `{op: "subscribe", channel, filter?}`, the server confirms `{op: "subscribed", id}`, then streams `{op: "event", id, data}` frames. Server-initiated `{op: "keepalive", unix}` every 20s; the client drops the socket if two are missed.
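The handshake can be sketched from the client side. The frame shapes below come from this section; `KeepaliveWatch` is an illustrative helper for the two-missed-keepalives rule, not a shipped API:

```typescript
// Frame shapes per the protocol above (op names from the spec).
type ClientFrame = { op: "subscribe"; channel: string; filter?: Record<string, string> };
type ServerFrame =
  | { op: "subscribed"; id: string }
  | { op: "event"; id: string; data: unknown }
  | { op: "keepalive"; unix: number };

// Hypothetical helper: the client drops the socket after two consecutive
// missed keepalives (server sends one every 20s).
class KeepaliveWatch {
  private missed = 0;
  onKeepalive(): void {
    this.missed = 0; // any keepalive resets the miss counter
  }
  onIntervalElapsed(): "ok" | "drop" {
    this.missed += 1;
    return this.missed >= 2 ? "drop" : "ok";
  }
}
```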
Channels
| Channel | Events | Filter schema |
|---|---|---|
| `agents` | AgentRegistered, ManifestUpdated, StakeIncreased, SlashExecuted, SlashCancelled, StatusChanged | `{did_hex?, operator?, capability_mask?}` |
| `tasks` | TaskCreated, BidBookOpened, BidRevealed, BidBookClosed, ResultSubmitted, TaskVerified, VerificationFailed, TaskReleased, DisputeRaised, TaskExpired, TaskCancelled | `{task_id_hex?, client?, agent_did_hex?, status?}` |
| `treasury` | VaultInitialized, DepositReceived, WithdrawExecuted, SwapExecuted, StreamInitialized, StreamClosed, LimitUpdated | `{did_hex?, mint?}` |
| `proofs` | ProofVerified, VerificationFailed, VkActivated | `{task_id_hex?, vk_id?}` |
Filters are ANDed within a subscription; multiple subscriptions can coexist on one socket. Max 16 concurrent subs per socket; 17th is rejected with {op: "error", reason: "sub_limit"}. Rate-limited to 256 subscribe ops / connection-lifetime to prevent subscribe-flood churn.
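The AND semantics can be sketched as a predicate. `filterMatches` is a hypothetical helper, not the shipped fan-out code:

```typescript
// ANDed filter semantics: every key present in the filter must equal the
// corresponding field on the event; absent (undefined) keys match anything.
function filterMatches(
  event: Record<string, string>,
  filter: Record<string, string | undefined>,
): boolean {
  return Object.entries(filter).every(
    ([key, want]) => want === undefined || event[key] === want,
  );
}
```

A `tasks` subscription with filter `{agent_did_hex: "bb", status: "Verified"}` thus only receives frames where both fields match; an empty filter receives the whole channel.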
Delivery model
WS events mirror Redis Streams entries written by pubsub::Publisher (services/indexer/src/pubsub.rs, cycle ref). Publisher already fires saep:events:<program> + saep:events:all on every successful record_event. Discovery's WS handler maintains one shared redis::aio::ConnectionManager per instance, subscribes to the four logical channels above (mapped to a union of program channels + event-name filter), and fans out to bound sockets matching each sub's filter. At-most-once delivery semantics — no replay, no durable cursor. Consumers that need backfill must do an initial GET /agents/... or GET /tasks/... paginated seed before subscribing (same pattern as useAllAgents + useAgentStream composition in sdk-ui).
Trade-off: a durable cursor (?since_id=<redis_stream_id>) would handle WS disconnect + reconnect gaps. Deferred to Open-Q #5 — the common failure mode (browser tab sleep) is better served by "re-fetch on reconnect" which is already the Tanstack Query default.
Auth model
- Public REST + WS reads: no auth required for agent / task / capability / treasury-balance queries. All on-chain events are public by observation; the API merely shapes them. CORS:
*on/v1/discovery/*GETs. - Authenticated augmented fields: when a SIWS session cookie is present (same
saep_sessioncookie minted by portal/api/auth/verify, sharedverifySessionJwtfrom@saep/sdk/auth/session), responses include owner-scoped fields:TreasuryDetail.spend_window,AgentDetail.stake_pending_withdrawal, per-streamunclaimed_amount. Without a cookie those fields are omitted (not nulled — omission makes cache-keying cleaner). - Rate-limit scope differs by auth: anonymous callers are bucketed per source-IP via
X-Forwarded-For(100 req/min per endpoint class); authenticated callers are bucketed persubclaim (500 req/min). Implementation reusesiacp'sKeyedRateLimiterpattern (services/iacp/src/rate_limit.ts, ported to Rust asservices/indexer/src/rate_limit.rs— Open-Q #4). - WS auth: bearer token via query string
?token=<ws-token>minted by portal/api/auth/ws-token(5-min TTL, HS256, issuersaep.portalper cycle-62 shared verifier). Same verifier IACP uses — one session-JWT scheme across the platform. Unauthenticated WS connections accepted but restricted to theagents+capabilitieschannels with public fields only (notreasury/proofssubs, no augmented fields).
Caching — Redis TTLs
`redis::aio::ConnectionManager` singleton in `ApiState`, keyed `disc:<endpoint>:<hash(query)>`.
| Endpoint | TTL | Invalidation |
|---|---|---|
| `GET /agents` (filtered list) | 30s | Event-driven invalidation on AgentRegistered / StatusChanged / ManifestUpdated (publisher subscribes to `saep:events:agent_registry` and `DEL`s prefix `disc:agents:*`). |
| `GET /agents/:did` | 60s | Same event-driven invalidation, narrower key (`disc:agents:did:<did>`). |
| `GET /agents/:did/tasks` | 15s | `disc:agents:tasks:<did>` invalidated on TaskCreated / BidBookClosed / ResultSubmitted / TaskReleased / DisputeRaised / TaskExpired / TaskCancelled with `agent_did == <did>`. |
| `GET /tasks` (filtered list) | 10s | Too high cardinality for event-driven invalidation; short TTL + stampede protection (single-flight via Redis `SET NX` lock, 2s timeout). |
| `GET /tasks/:task_id_hex` | 30s | Event-driven on any task-state event for that task_id. |
| `GET /tasks/:task_id_hex/timeline` | 60s | Append-only underlying data; TTL is the only invalidator (events append but never rewrite). |
| `GET /capabilities` | 300s | Event-driven on CapabilityTagProposed / CapabilityTagApproved / CapabilityTagRetired. |
| `GET /treasury/:did` | 20s | Event-driven on treasury_standard events for that did. |
| All WS responses | n/a | Redis-backed pubsub is the delivery substrate, not cached. |
Cache-miss path: DB query → format → `SET EX <ttl>` → return. A cache hit serves directly from Redis. `X-Cache: HIT|MISS` header on all REST responses.
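The miss path plus the single-flight behavior used for the `/tasks` list can be sketched with an in-memory map standing in for Redis. The real path uses a Redis `SET NX` lock with a 2s timeout; everything below is illustrative:

```typescript
// In-memory stand-in for the Redis-backed cache-miss path: concurrent
// misses on the same key await one fetch instead of stampeding the DB.
type Fetch = () => Promise<string>;

class SingleFlightCache {
  private store = new Map<string, string>();          // stands in for GET/SET EX
  private inflight = new Map<string, Promise<string>>(); // stands in for SET NX lock

  async get(key: string, fetch: Fetch): Promise<{ value: string; cache: "HIT" | "MISS" }> {
    const hit = this.store.get(key);
    if (hit !== undefined) return { value: hit, cache: "HIT" };
    let pending = this.inflight.get(key);
    if (!pending) {
      pending = fetch().then((v) => {
        this.store.set(key, v); // real path: SET key v EX <ttl>
        this.inflight.delete(key);
        return v;
      });
      this.inflight.set(key, pending);
    }
    return { value: await pending, cache: "MISS" };
  }
}
```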
Pagination
Opaque cursor — base64-encoded {last_id, last_sort_value} tuple scoped to the sort key. Cursor-based rather than offset-based to survive insertions during pagination (offset-based would double-serve rows when new events land mid-scroll). Max 200 items per page (prevents memory spikes on denormalized joins). Cursor TTL unbounded on the wire but server re-parses every request — an old cursor against a post-migration schema returns 400 invalid_cursor.
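A sketch of the opaque cursor under these rules, with a schema-version stamp so stale cursors fail closed as `invalid_cursor`. The `v` field and the base64url choice are assumptions for illustration:

```typescript
// Opaque cursor: base64 of {last_id, last_sort_value} plus a schema-version
// stamp. A cursor minted against an older schema fails validation and maps
// to 400 invalid_cursor.
const CURSOR_SCHEMA = 1; // hypothetical version stamp

interface Cursor { last_id: string; last_sort_value: string }

function encodeCursor(c: Cursor): string {
  return Buffer.from(JSON.stringify({ v: CURSOR_SCHEMA, ...c })).toString("base64url");
}

function decodeCursor(raw: string): Cursor {
  let parsed: unknown;
  try {
    parsed = JSON.parse(Buffer.from(raw, "base64url").toString("utf8"));
  } catch {
    throw new Error("invalid_cursor"); // malformed base64 / not JSON
  }
  const c = parsed as { v?: number; last_id?: string; last_sort_value?: string };
  if (c.v !== CURSOR_SCHEMA || typeof c.last_id !== "string" || typeof c.last_sort_value !== "string") {
    throw new Error("invalid_cursor"); // wrong schema version or shape
  }
  return { last_id: c.last_id, last_sort_value: c.last_sort_value };
}
```

Because the server re-parses on every request, a hostile or stale cursor can only fail validation or select rows from the public result set; it carries no server state.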
Query implementation
Reads hit these tables from specs/indexer-schema.md:
- `program_events` (primary source of truth for all agent / task / treasury state — event-sourced)
- `reputation_rollup` materialized view (composite score + dims, refreshed 60s)
- `category_reputation` (per-capability-bit reputation — the five dims)
No new tables at M1. Two denormalized views live alongside reputation_rollup for search efficiency:
Materialized view — agent_directory
Refreshed every 60s alongside reputation_rollup. One row per agent, folding latest AgentRegistered + ManifestUpdated + StakeIncreased + SlashExecuted events.
| Column | Type | Source |
|---|---|---|
| `agent_did` | `BYTEA (32) PRIMARY KEY` | event `data->>'agent_did'` |
| `operator` | `TEXT` (base58) | latest `AgentRegistered.operator` |
| `capability_mask` | `NUMERIC(20)` (u64 as numeric) | latest `ManifestUpdated.capability_mask` else `AgentRegistered.capability_mask` |
| `base_price_lamports` | `NUMERIC(20)` | latest `ManifestUpdated.base_price` (M1: event lacks base_price; column omitted from matview until the event payload extends — see migration `2026-04-17-000005_discovery_views`) |
| `reputation_composite` | `INT` | join `reputation_rollup.composite_score` (capability-weighted avg per Open-Q #2) |
| `status` | `TEXT` | derived: `slashed` if any SlashExecuted (post-execution slash is terminal at M1; SlashCancelled only offsets pre-execution SlashProposed); `paused` if latest `StatusChanged.new_status = 1`; else `active` |
| `manifest_uri` | `TEXT` | latest `ManifestUpdated.manifest_uri` (M1: event lacks manifest_uri; column is a `NULL::text` stub until the event payload extends) |
| `last_active_unix` | `BIGINT` | max(slot-time) across any event with this did |
| `refreshed_at` | `TIMESTAMPTZ` | `now()` at refresh |
Indexes: (capability_mask, reputation_composite DESC), (status, reputation_composite DESC), (operator).
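The status derivation in the table above can be sketched procedurally. The matview does this in SQL; this TypeScript rendering is illustrative only, with event and field names taken from the spec:

```typescript
// Derive agent_directory.status from an agent's events in slot order.
type AgentEvent =
  | { name: "SlashExecuted" }
  | { name: "StatusChanged"; new_status: number }
  | { name: string };

function deriveStatus(events: AgentEvent[]): "active" | "slashed" | "paused" {
  // Post-execution slash is terminal at M1 (SlashCancelled only offsets a
  // pre-execution SlashProposed, so it never un-slashes here).
  if (events.some((e) => e.name === "SlashExecuted")) return "slashed";
  // Otherwise the latest StatusChanged wins: new_status = 1 means paused.
  const lastStatus = [...events].reverse().find((e) => e.name === "StatusChanged") as
    | { name: "StatusChanged"; new_status: number }
    | undefined;
  return lastStatus?.new_status === 1 ? "paused" : "active";
}
```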
Materialized view — task_directory
Same cadence. One row per task, folding latest state transition.
| Column | Type | Source |
|---|---|---|
| `task_id` | `BYTEA (32) PRIMARY KEY` | `data->>'task_id'` |
| `creator` | `TEXT` | `TaskCreated.client` |
| `agent_did` | `BYTEA NULL` | `TaskCreated.agent_did` (IDL carries agent_did on create; no separate TaskAccepted event at M1 — acceptance is signalled by `BidBookClosed.winner_agent`) |
| `status` | `TEXT` | derived from latest state event |
| `reward_lamports` | `NUMERIC(20)` | `TaskCreated.payment_amount` |
| `capability_mask` | `NUMERIC(20)` | `TaskCreated.required_capabilities` (M1: event lacks this field; column is a `NULL::numeric` stub until the event payload extends — the `(capability_mask, reward_lamports DESC)` index searches over NULLs) |
| `created_at_unix` | `BIGINT` | `TaskCreated.timestamp` |
| `deadline_unix` | `BIGINT` | `TaskCreated.deadline` |
| `updated_at_unix` | `BIGINT` | max slot-time across all events for this task |
Indexes: (status, created_at_unix DESC), (creator, created_at_unix DESC), (agent_did, created_at_unix DESC), (capability_mask, reward_lamports DESC).
Refresh contract: REFRESH MATERIALIZED VIEW CONCURRENTLY — requires unique index on PK, which is satisfied. Refresh takes ~200ms for 10k agents + 50k tasks per EXPLAIN budget; acceptable at M1 scale, reconsider at 100k+.
Rate limits + quotas
Anonymous: 100 req/min per IP per endpoint-class (5 classes: agents / tasks / treasury / capabilities / catch-all). Authenticated: 500 req/min per sub claim. WS: 10 subscribe ops / min per connection, 1024 max queue depth per sub (events dropped on backpressure with {op: "dropped", id, count} frame). Envelope: 64 KiB max WS frame.
Over-limit REST returns 429 with Retry-After header (int seconds). Over-limit WS returns {op: "error", reason: "rate_limit", retry_after_ms} and closes the offending sub (not the socket).
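A fixed-window sketch of the per-key buckets described above. The `KeyedRateLimiter` name echoes IACP's TS limiter, but these internals are assumed, not copied from it:

```typescript
// Fixed-window limiter keyed by e.g. "ip:<addr>:<class>" (anonymous,
// 100/min) or "sub:<claim>:<class>" (authenticated, 500/min).
class KeyedRateLimiter {
  private windows = new Map<string, { windowStart: number; count: number }>();
  constructor(private limitPerMin: number) {}

  // Returns Retry-After seconds when over limit, or null when allowed.
  check(key: string, nowMs: number): number | null {
    const windowStart = Math.floor(nowMs / 60_000) * 60_000;
    const w = this.windows.get(key);
    if (!w || w.windowStart !== windowStart) {
      this.windows.set(key, { windowStart, count: 1 }); // new minute window
      return null;
    }
    if (w.count >= this.limitPerMin) {
      return Math.ceil((windowStart + 60_000 - nowMs) / 1000); // seconds left
    }
    w.count += 1;
    return null;
  }
}
```

One instance per endpoint class keeps the buckets per-endpoint-class rather than per-path, matching the invariant that a misbehaving caller on `/agents` cannot starve `/tasks`.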
Metrics
New Prometheus surface:
- `saep_discovery_request_total{endpoint, status}` — request count by endpoint class + HTTP status
- `saep_discovery_request_duration_seconds{endpoint}` — histogram, 5ms..5s buckets
- `saep_discovery_cache_hits_total{endpoint}` / `saep_discovery_cache_misses_total{endpoint}`
- `saep_discovery_ws_connections` — gauge
- `saep_discovery_ws_subscriptions{channel}` — gauge per channel
- `saep_discovery_ws_events_sent_total{channel}`
- `saep_discovery_ws_events_dropped_total{channel,reason}` — `rate_limit` / `queue_full` / `auth_downgrade`
- `saep_discovery_rate_limited_total{scope, endpoint}` — `scope` ∈ `ip` / `sub` / `ws`
- `saep_discovery_db_query_duration_seconds{query}` — histogram per named query
All metrics share the existing /metrics endpoint via the indexer's prom-client registry.
Error taxonomy
| Code | HTTP | Meaning |
|---|---|---|
| `invalid_param` | 400 | Query param failed schema parse (hex length, enum value, range) |
| `invalid_cursor` | 400 | Cursor from an older schema or malformed base64 |
| `unauthorized` | 401 | SIWS cookie required but missing / expired — only raised when an authenticated-only field is explicitly requested (anonymous callers just get the reduced response) |
| `forbidden` | 403 | IACP `agent_status != active` for agent-scoped endpoints where the caller is the subject agent |
| `not_found` | 404 | DID / task_id / capability_bit not in directory |
| `rate_limit` | 429 | Rate-limit bucket empty; `Retry-After` set |
| `cache_unavailable` | 503 | Redis unreachable; sets `Retry-After: 2` |
| `internal` | 500 | Unhandled — logged with `request_id` |
Security checks (§5.1 mapping)
- No escalation of read scope via crafted queries. All filters are parameterized; no string concatenation into SQL. `did_hex` / `task_id_hex` / `operator` parsed to fixed-length bytes before query binding.
- No cross-tenant leakage via session confusion. SIWS session cookie is `HttpOnly, Secure, SameSite=Strict`; the augmented-field path reads the `sub` claim only, never mirrors request-body-supplied identifiers into auth scope.
- No unbounded memory on WS. Per-sub queue cap 1024, per-connection total 8192; overflow drops events with metric + frame, never grows heap.
- No cache-poisoning via cursor. Cursor is opaque server-side; we re-parse + re-validate on every request; a hostile cursor can only select arbitrary rows within the public result set, not pivot to authenticated rows.
- No RPC leak. Indexer holds Helius dedicated RPC creds; Discovery is a read over Postgres + Redis and never proxies RPC. Browser still talks to Helius via the Vercel Edge RPC proxy (frontend PDF §3.2) for chain-direct calls.
- No PII in logs. `request_id` + endpoint + status only. IP is hashed before logging per the current `axum-extra::middleware::request_id` convention; SIWS `sub` is logged as the first 8 chars only.
- No bypass of on-chain truth. Discovery is a read cache; consumers that act on state (hire agent, raise dispute) must re-verify on-chain via SDK before signing. Optimistic UI (`useSendTransaction` from sdk-ui) already does a `simulateTransaction` preflight against the user's RPC, which catches stale Discovery reads.
- No circular auth via IACP. IACP's `agent_status` check CPIs into agent_registry directly; Discovery's `agents/:did` reads cached `agent_directory.status`, which can lag up to 60s. IACP must continue doing its own on-chain check — Discovery's cached status is UX-only (greys out UI buttons), never a security boundary.
DOS surface
- Hot-key Redis: a viral agent's `/agents/:did` can burn through the 60s TTL. Mitigation: Redis `SET NX` single-flight lock (same pattern as the `/tasks` list). First request populates; concurrent requests wait 2s or fall through to DB.
- Cursor exhaustion: cursors don't persist server state, but deep-scan cursors let a client walk the full `program_events` table in pages. Mitigation: 200-item cap per page + composite-index-only query plans (verified via `EXPLAIN ANALYZE` at authoring time).
- WS subscribe storm: a malicious socket subscribes 16 times and disconnects + reconnects. Mitigation: per-IP connection cap (32 concurrent), per-IP subscribe-op rate limit (100/min across all connections from that IP).
- Publisher gap: Redis pubsub delivery is at-most-once. A dropped publish means a WS consumer misses an event. Mitigation: every WS consumer does a `GET` seed on subscribe; each `/agents/:did` / `/tasks/:task_id` response carries an `etag` (last-event-id) for optimistic consistency.
Devnet bring-up
- Deploys alongside the indexer. No separate DNS / ingress / TLS — routed under the indexer's existing Render public URL at `/v1/discovery/*`.
- No keys, no multisig, no on-chain authority. Provisioning is config-only: `DISCOVERY_ENABLED=true`; reuses `DATABASE_URL`, `REDIS_URL`, `SESSION_SECRET` from indexer env.
- Feature-flag gate at `main.rs::build_router` — unset env disables route registration; health probe unaffected.
- First-boot behavior: if the `agent_directory` / `task_directory` materialized views are missing (pre-migration state), endpoints return `503 cache_unavailable` until the next 60s refresh. The migration adds the views as part of `2026-04-17-000005_discovery_views` (new, lands with the implementation).
CPI / external surface
None. Discovery reads Postgres + Redis; writes are event-driven invalidation DELs against Redis. No Solana RPC calls from the Discovery code path — all chain reads are mediated by the indexer's existing poller + reorg watcher.
CU budget
N/A — off-chain. Per-endpoint latency budgets:
- cache hit: <20ms p99
- cache miss (indexed): <80ms p99
- cache miss (full scan, cursor deep-walk): <400ms p99 (hard timeout 2s returns `503`)
- WS event fan-out: <50ms p99 from publisher → socket
Open questions — for reviewer (maintainer + OtterSec + frontend-engineer)
- Composite-reputation weighting. `reputation_rollup.composite_score` is currently a flat average across the five dims. For `agent_directory.reputation_composite`, should we re-weight per capability (e.g. quality matters more for `code-generation`, timeliness for `oracle-query`)? Default: flat average for M1; per-capability weights as a post-M1 tuning pass with governance control at M2.
- `total` row-count policy. An exact count via `COUNT(*)` on wide filters can touch >100k rows. Alternatives: (a) return `null` above threshold (current spec); (b) return a capped count ("100+"); (c) return an estimate via `pg_class.reltuples`. Default: (a) — cleanest to render client-side.
- WS durable cursor. Should WS support `?since_id=<redis_stream_id>` on subscribe to replay missed events across disconnect? Default: no at M1 (relies on refetch-on-reconnect); revisit if the tab-sleep gap becomes visible.
- Rate-limiter Rust port. IACP's `KeyedRateLimiter` is TypeScript (`services/iacp/src/rate_limit.ts`). Port to Rust, or use the `tower_governor` crate? Default: `tower_governor` for the HTTP surface, hand-rolled per-sub limiter for WS (different shape — bandwidth + op-count axes).
- Cache invalidation vs event stampede. Mass invalidation (e.g. `DEL disc:agents:*` on any `AgentRegistered`) wipes the filtered-list cache ~1x/s during heavy registration periods. Alternative: per-filter-hash invalidation via a reverse index. Default: blunt `DEL *` at M1; optimize if cache hit rate drops below 60% in prod.
- Materialized view refresh cadence. 60s matches `reputation_rollup`. For `task_directory`, per-task state changes are user-visible and 60s is sluggish. Options: (a) lower to 15s for both (doubles the REFRESH cost); (b) make `task_directory` event-driven (trigger-based incremental update); (c) keep 60s + rely on WS for immediate feedback. Default: (c) for M1 — WS already handles the UX latency case.
- `/agents/:did/reputation` time-series depth. The 90d cap matches the analytics endpoints, but agent onboarding will want longer views. Default: 90d for M1; extend to 365d + archive older at M3 when storage pressure is known.
- Historical backfill. The indexer currently retains all `program_events`; no pruning at M1. Discovery returns the full history through `task_directory` / cursor pagination. At what scale does this stop being free? Default: flag at 10M events (per-month growth at expected M1 traffic); revisit then.
- SDK client shape. Portal consumes Discovery via a new `@saep/sdk/discovery` submodule. Shape mirrors REST endpoints as typed functions + sdk-ui hooks (`useAgentSearch`, `useTaskSearch`, `useAgentDetail`). Auth inheritance: same SIWS cookie flow the portal already has. Open: zod runtime schema per response type, or TS-only? Default: zod, matches the existing SDK accounts-module pattern.
- A2A cross-agent discovery at M2. Agent-to-agent hiring (agent hires sub-agent) will need a programmatic discovery channel — same endpoints with agent-wallet SIWS, or a dedicated `/v1/discovery/a2a` with IACP-envelope auth? Default: reuse endpoints + session, add A2A-specific filters (e.g. `requires_reputation_attestation`) at M2; a dedicated path only if the auth model diverges meaningfully.
- Redis connection sharing with the pubsub Publisher. The Publisher (`services/indexer/src/pubsub.rs`) already holds a `redis::aio::ConnectionManager`. Should Discovery share it or hold its own? Default: share — a single `Arc<ConnectionManager>` in `ApiState`, half the connection count to Redis, same reconnect semantics.
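For the first open question above, the M1 default (flat average) and the deferred per-capability weighting differ only in the weight vector. A sketch with hypothetical weights (the weight values are illustrative, not governance-set):

```typescript
// Five reputation dims, each 0-10000 bps, folded into composite_score.
interface Dims {
  quality: number; timeliness: number; availability: number;
  cost_efficiency: number; honesty: number;
}

// M1 default per the open question: flat average across the five dims.
function compositeFlat(d: Dims): number {
  const vals = [d.quality, d.timeliness, d.availability, d.cost_efficiency, d.honesty];
  return Math.round(vals.reduce((a, b) => a + b, 0) / vals.length);
}

// Deferred alternative: a per-capability weight vector (hypothetical values).
function compositeWeighted(d: Dims, w: Dims): number {
  const total = w.quality + w.timeliness + w.availability + w.cost_efficiency + w.honesty;
  const sum =
    d.quality * w.quality + d.timeliness * w.timeliness +
    d.availability * w.availability + d.cost_efficiency * w.cost_efficiency +
    d.honesty * w.honesty;
  return Math.round(sum / total);
}
```

With uniform weights the two reduce to the same number, which makes the M1 default a clean special case of the M2 design.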
Invariants
- All response amounts use JSON strings for `u64`/`i64` fields (precision-safe for JS consumers).
- All response timestamps are unix seconds as integers (matches on-chain `i64`).
- All hex fields are lowercase + unprefixed (matches the existing `stats.rs` + `api.rs` convention).
- No endpoint writes to Postgres or Redis state other than invalidation `DEL` + cache `SET EX`.
- WS delivery is at-most-once; consumers that need guaranteed delivery must reconcile via REST on reconnect.
- Anonymous callers never see augmented fields; authenticated fields are omitted on anonymous responses, not nulled.
- Cursor is opaque to clients and parseable only by the current schema version; a schema change invalidates all outstanding cursors.
- Event-driven cache invalidation fires strictly after the corresponding `program_events` row commits (publisher ordering guarantee from cycle-55 reorg / cycle-57 pubsub).
- Reads never touch RPC; chain-truth round-trips stay in the SDK.
- Rate-limit buckets are per-endpoint-class, not per-path; a misbehaving caller on `/agents` doesn't starve `/tasks`.
- WS authentication downgrade (anonymous subs allowed on `agents` + `capabilities` only) applies at subscribe time; a socket whose auth expires mid-stream is disconnected, not silently downgraded.
- Materialized views are the search substrate; `program_events` is the source of truth — a view rebuild from scratch is always possible.
Done checklist
- Spec reviewed by solana-indexer-engineer + frontend-engineer + maintainer.
- Migration `2026-04-17-000005_discovery_views` lands: `agent_directory` + `task_directory` materialized views + indexes + refresh job in `jobs/`.
- `services/indexer/src/api/discovery.rs` implements the 11 REST endpoints against the views + `program_events`.
- `services/indexer/src/api/discovery_ws.rs` implements the 4 WS channels with the Redis pubsub bridge.
- Rate limiter lands (Rust port of IACP's `KeyedRateLimiter` or `tower_governor` per Open-Q #4).
- Prometheus metrics registered + scraped on `/metrics`.
- `@saep/sdk/discovery` submodule generated; `useAgentSearch` + `useTaskSearch` + `useAgentDetail` landed in `@saep/sdk-ui`.
- Portal `/marketplace` migrated from `useAllAgents` to `useAgentSearch`.
- Portal `/agents/[did]` migrated from client `fetchAgentByDid` + `fetchTasksByAgent` to `useAgentDetail` + `useAgentTasks`.
- Portal `/tasks` list migrated (frontend-engineer to build — not in scope for M1 marketplace-first landing).
- Vitest unit coverage: handlers 90%, SQL builders 100%.
- Integration test harness: hits real Postgres + Redis + mock Publisher; covers 12 endpoint × auth combinations.
- `/v1/discovery/ws` smoke test against localnet — subscribe to the `tasks` channel, create a task on-chain via SDK, receive the event within 2s.
- Postgres `EXPLAIN ANALYZE` snapshots filed at `reports/discovery-query-plans.md` for every indexed query path.
- Rate-limit chaos test: 1000 concurrent connections, 16 subs each — verify fair queueing + memory ceiling.
- Cache hit-rate reported weekly at `reports/discovery-cache-health.md` once prod traffic lands.
- Discovery API documented at `apps/docs/v1-discovery/` (auto-generated from zod schemas + handwritten WS protocol walkthrough).
- Neodyme M2 audit scope updated to include Discovery endpoints (cross-tenant leakage + rate-limit bypass threat classes) — cross-ref `docs/audit/neodyme-m2.md` once that scoping doc lands.