Post-launch optimization tickets
Parent: backend-build.pdf §6.1 ("Post-launch optimization").
Scope: performance / cost / operational work that is not a pre-audit blocker but should land on the M2→M3 ramp, once M1 is live on devnet and observability is pinning real numbers against the scaffold assumptions.
These tickets stay out of the M1 critical path because each one is either (a) premature without production-shaped traffic, (b) requires a feature not yet on mainnet (e.g. ZK compression with active indexers, SIMD-0228 scheduled tx), or (c) a tuning exercise that needs live CU/RPC/throughput telemetry to target correctly. The common thread: we can do the scaffolding work safely, but the parameters are unknowable until the protocol carries real flow.
Ordering is by unblock-sequence: CU tuning first (feeds every other cost model), ALT caching next (tx-size sibling of CU), Light Protocol compression third (account-count reduction — deepest change), Redis Streams backpressure last (off-chain only, independent track).
1. CU auto-tuning
Why post-launch: M1 ships with cu-measurements.md carrying per-instruction budgets from anchor test localnet runs + a ±30% headroom. Real mainnet CU varies with account-data size, bumpmap collisions, Jupiter route depth, and warm/cold PDA touch cost — none of which the localnet harness can reproduce faithfully. Locking budgets pre-freeze would either over-pay (wasted fee) or under-pay (prod tx rejection).
Target: ship a settlement-worker path that observes actual CU per instruction class over a rolling window and sets ComputeBudgetProgram::set_compute_unit_limit per tx to p99 × 1.1.
Pieces:
- Telemetry. Indexer already exposes
saep_indexer_poll_cycle_duration_seconds{program}(cycle 58). Addsaep_tx_cu_consumed{program,ix}sampled fromgetTransaction(...).meta.computeUnitsConsumedfor every settled tx the worker submitted. Histogram buckets:10k, 50k, 100k, 200k, 400k, 800k, 1.4M(Solana's per-tx cap). - Rolling window store. Redis sorted set per
(program, ix), last 1000 samples, eviction by insertion-index. Reads are lazy at build-tx time. - Tx builder integration.
services/indexer/src/settlement_worker/tx_builder.rs(already exists per cycle-92 git log) grows aCuEstimatorfield. Per-ixset_compute_unit_limitix prepended fromestimator.p99(program, ix) * 1.1. Floor: currentcu-measurements.mdbudget (never go below the audit-measured number). Cap: 1.4M (Solana limit). - Priority-fee pairing. Already handled by
@saep/sdk/submit/staked.ts::withPriorityFee(cycle 66). CU-tuning changesset_compute_unit_limit, notset_compute_unit_price— they compose cleanly. - Cold-start fallback. Fewer than 50 samples in window → use
cu-measurements.mdstatic budget. No autotune amplification from a tiny sample set. - Governance.
cu_autotune_enabled: boolin settlement-worker config, default off until telemetry accrues. Flipping to on is ops action, not governance vote — it's a tx-construction hint, not an on-chain parameter.
Not in scope: on-chain CU-aware routing (e.g. switching between two Jupiter routes based on estimated CU). That's a worker-side optimization layered on top once the primitive works.
Acceptance:
- Settlement worker emits tuned CU budgets within ±15% of the actually-consumed value for 90%+ of settled txs over a 7-day window.
- Fee spend on
set_compute_unit_limitoverhead (one ix per tx) is bounded by per-sig base fee × bundle size; measure and report.
2. Address Lookup Table caching
Why post-launch: ALTs reduce tx size for multi-account txs (settlement bundles touching escrow + vaults + Jupiter route accounts can hit 30+ account refs). M1 ships without ALTs because the bundle composition is still settling — ALT contents shouldn't drift mid-audit. Once the stable set of frequently-co-accessed accounts is known, ALTs cut tx-size pressure and unblock bigger bundles.
Target: settlement worker reads from a small set of ALTs (5-10) covering the hot accounts; ALTs refresh weekly without worker restart.
Pieces:
- ALT inventory. One ALT per "composition class":
settlement-core(fee_collector vaults, market_global, proof_verifier config),jupiter-hot(Jupiter v6 route hotspots per the tip-oracle's top-10-routes telemetry),token2022-hot(SAEP mint + common operator ATAs),iacp-anchor(SPL-Memo program + IACP anchor PDAs). - ALT authority. Dedicated PDA per ALT, derived from
fee_collector's authority (Squads 4-of-7) so ALT mutation is gated on the same quorum as fee-param changes. Address lookups are read-only at use-time; auth only matters for extend/freeze. - Refresh job. Daily cron pulls current account-id list from the tip oracle + indexer stats, diffs against the committed ALT, emits
extend_lookup_tableix for new entries. No delete at M2 — ALT entry removal requires ALT rotation (new table, not edit); defer to M3 unless churn is high. - Worker integration.
tx_builder.rsgainsalt_resolver: Arc<AltCache>. Cache loads all ALTs at worker start, refreshes on a 15-minute tick (cheap RPC; ALT account is read-only-ish).build_versioned_tx()usesMessageV0::try_compile(payer, ixs, &alts, blockhash). - Fallback. ALT load failure → legacy-tx fallback, with a warning log. Never block settlement on ALT unavailability.
- SDK side.
@saep/sdkexposesgetKnownAlts(): Pubkey[]for frontend paths that want tx-size savings (e.g. quick-hire bundle in/marketplace). Frontend opt-in, not required.
Not in scope: dynamic ALT creation per user / per agent. Account count at M1 is too low for that to pay. If needed at scale, layer on top.
Acceptance:
- Settlement-worker bundles that would have exceeded the legacy 1232-byte tx limit now fit, measured pre/post switch.
- ALT entries cover ≥95% of account refs observed on settled txs over 7 days (miss-rate bounded).
3. Light Protocol badge compression
Why post-launch: Reputation badges + capability-proof receipts are account-per-agent data that scales linearly with agent count. At M1 each AgentAccount is ~400 bytes; at 100k agents the on-chain rent footprint is ~0.6 SOL/agent/epoch = hundreds of SOL in rent reserves. Light Protocol's ZK compression reduces the on-chain cost to a merkle-root commitment with off-chain state + ZK proof on access. Not a solve for the hot path (active agent PDAs stay uncompressed); compression is for the long tail — badges, historical reputation snapshots, retired agent accounts.
Target: compressed_reputation side-table per agent, reconstructible from the Light indexer, with on-chain merkle root in RegistryConfig. Read path: SDK pulls proof + leaf from Light's photon indexer, verifies against on-chain root.
Pieces:
- Schema decision. Which fields compress: historical reputation snapshots (6-axis scores per epoch), task-completion badges (capability bit × completion-count), retired agent archive. Active ReputationScore PDA stays uncompressed — it's the hot-path.
- Light Protocol integration.
programs/agent_registrygains acompress_historicalix that reads the last N epochs of ReputationScore, packs them into a Light concurrent-merkle-tree leaf via Light'saccount-compressionprogram CPI, and zeros the on-chain PDA (refundable rent → fee_collector's treasury bucket). CPI target:SysvarC1ock11111111111111111111111111111111+ Light's compression program ID (pinned inglobal.light_compression). - Indexer. Subscribe to Light's photon gRPC in addition to Yellowstone. Decompressed leaves land in a separate Postgres table
compressed_reputationwith the same shape asreputation_rollup+merkle_leaf_index+tree_idcolumns. - SDK read path.
@saep/sdk/reputation/fetchHistory(did, epochRange)branches: if epoch is within the last 12 epochs, read the uncompressed PDA; older, fetch from photon + verify proof againstRegistryConfig.historical_root. - Audit footprint. Adds a new cryptographic dependency (Light's merkle program + their prover). Halborn (M3 audit) needs to see the integration; scoping happens when Halborn engagement lands.
Not in scope at M2: task-market-side compression (tasks are short-lived; PDA close already recovers rent). Governance proposal archiving (separate concern). M3+ may revisit.
Open question: Light Protocol's programs aren't currently on devnet in a maintained form (confirmed via light-protocol/light-protocol repo activity — all of their deployments target mainnet-beta). Devnet rehearsal may require self-deploying Light's programs against a localnet, which is a multi-day task of its own. Flagged for reviewer — could defer to M3+ if the devnet path is too expensive.
Acceptance:
- On-chain rent for reputation history drops to ≤10% of the uncompressed equivalent at 100k simulated agents.
- SDK read-path p99 latency for historical reputation ≤200ms (photon + verify).
4. Redis Streams backpressure + consumer ACK
Why post-launch: IACP bus ships with Redis Streams (cycle 65 rate-limit + metrics) but the consumer-ack story is minimal — messages are published fire-and-forget, consumers read from $ with no durable cursor, and there's no explicit backpressure path if the indexer's anchor-worker pool (cycle 64) falls behind. At M1 volume this is fine. At 100+ messages/sec sustained, publisher-side buffering + consumer-side lag become real.
Target: every Stream has a registered consumer group, ACKed reads, pending-entries-list monitoring, and a bounded-queue policy that either blocks publishers or drops old entries per-topic.
Pieces:
- Consumer groups. Every Stream gets a named group at publisher boot:
saep:events:<program>→ groupindexer-ingest;saep:task.*→ groupiacp-anchor-worker. Groups created idempotently (XGROUP CREATE with MKSTREAM). - XREADGROUP instead of XREAD. Consumers read with
XREADGROUP GROUP <name> <consumer-id> ... >so each message is delivered once per group. Consumer-id ={hostname}-{pid}-{service-version}— stable across restarts for the same deploy, unique across concurrent workers. - XACK on success. After the consumer finishes processing (e.g.
record_eventreturns OK for the indexer), XACK the message. Pending entries list (PEL) holds unacked messages for the inspection path. - PEL monitoring.
XPENDING GROUP <name>on a 30-second tick → new metriciacp_stream_pel_depth{stream,group}. This was flagged "deferred" in cycle 65's rate-limiter landing; ship it here. - Lag metric.
iacp_stream_lag_seconds{topic}— already flagged deferred in cycle 65. Computesnow - earliest_pel_timestamp. Grafana alert at 30s lag for task.* topics (settlement latency SLO). - Claim-stale-pending.
XCLAIMon entries pending > 5min → rebalance to a healthy consumer. Runs from the same cron tick as PEL monitoring. - MAXLEN bounds. Every XADD uses
XADD <stream> MAXLEN ~ <N>with per-streamN:saep:events:*= 100k (indexer ingest is steady-state),task.*= 10k (bounded by settlement velocity),iacp.anchor.*= 100k (anchor queue is slower than ingest). Approximate trim (~) for perf; exact trim only at shutdown. - Publisher backpressure. When a publisher's target stream's MAXLEN is already reached AND the consumer lag metric exceeds threshold, new messages get a
503 BusCongestedresponse (REST) /{type: "congested"}WS-close (WS). Contrast with MAXLEN-only which silently drops old messages — backpressure surfaces the congestion to the caller so agent-side retry logic engages. - Replay. Consumer groups enable replay: a new consumer joining
indexer-ingestreads PEL first (unacked messages), then catches up from the group's delivery cursor. Handles indexer restart cleanly without reprocessing acked entries.
Not in scope: cross-region replication, durability SLA negotiation with Redis provider. Single-region Upstash/Render single-instance Redis is fine at M2 volume.
Open question: Redis Stream consumer-group semantics with Upstash's regional failover path are underspecified in their docs. If we pick Upstash for the RPC-proxy rate limiter (INBOX open item), revisit whether consumer-ack survives failover or if every failover reprocesses the PEL. Alternative: keep IACP Streams on a Render single-instance Redis and use Upstash only for stateless rate-limit counters.
Acceptance:
- Indexer restart replays PEL, processes all unacked entries, 0 ingest-layer dupes in
program_eventspost-reprocess (idempotent on(program, sig, slot, ix_index)already). iacp_stream_lag_seconds{topic="task.verified"}p99 ≤ 5s at 50 msg/sec sustained.- Publisher-side
503 BusCongestedrate matches synthetic-congestion load-test expectations; zero silent message loss past MAXLEN.
Sequencing
M1 ship (devnet alpha)
│
├─ observability baseline (week 1-2 post-launch)
│ └─ saep_tx_cu_consumed histograms accrue
│
├─ CU auto-tuning (#1) — needs 2 weeks of samples to tune
│
├─ ALT caching (#2) — independent track, can start at week 1
│
├─ Redis Streams backpressure (#4) — independent track, can start at week 1
│
└─ Light Protocol compression (#3) — M3+ candidate,
blocked on Light devnet rehearsal decision
Items 1 + 2 + 4 are M2 candidates. Item 3 is M3+ pending the devnet-rehearsal answer.
Open questions
- Should the CU estimator live in the worker or as a shared
@saep/sdk/cumodule? Frontend tx-builders could benefit from the same estimator, but serving it to the browser requires exposing per-ix histograms — possibly over the Discovery API (specs/discovery-api.md). Trade: code reuse vs surface-area-of-public-endpoints growth. Default: worker-local for M2, promote to SDK if frontend asks. - ALT authority: dedicated PDA vs reuse of
fee_collectorauthority. Spec proposes dedicated PDA (cleaner blast radius if ALT auth is ever rotated independently) but adds one more PDA class to track. Reviewer's call. - Light compression on devnet feasibility. See §3 open question. If self-deploying Light's programs on localnet is too expensive, item 3 defers entirely to M3 trusted-setup window.
- Redis Streams backpressure: congestion signal at REST vs protocol-level.
503 BusCongestedis HTTP-level; an IACP-envelope-levelcongestedcode might be more semantic. Default: HTTP for REST, WS-close-code for WS, both documented inspecs/08-iacp-bus.mdwhen this lands. - Who runs the CU-telemetry cron: indexer (knows settled txs) vs settlement-worker (knows intended budget). Indexer is simpler (already has
getTransactionpath); worker gets a smaller feedback loop. Default: indexer, because the settlement-worker path already depends on indexer signals for IACP-subscribed settlement triggers.
Cross-references
specs/indexer-schema.md—saep_tx_cu_consumedlands alongside existing histograms in §Metrics.specs/08-iacp-bus.md— MAXLEN + consumer-group policy per topic.specs/discovery-api.md— CU estimator as possible future public endpoint.specs/mev-settlement.md— ALT-composed bundles as the shipping path for multi-ix bundles.backend-build.pdf§6.1 — original scope.reports/cu-measurements.md— static CU budgets that serve as the autotune floor.