Purpose: Establish quantitative latency, throughput, and resource guardrails for critical user flows and platform components. Budgets guide design, tuning, regression detection, and release readiness.
1. Scope & Principles
- User-centric p95 and p99 latency targets for interactive actions.
- Separation of synchronous UX flows vs asynchronous background work.
- Budgets are ceilings, not averages; sustained exceedance triggers investigation.
- Prefer additive caching and incremental optimization before deep refactors.
- Observability (traces + metrics) required to measure every budget.
2. Flow Latency Budgets
| Flow / Action |
Target p95 |
Target p99 |
Ceiling (hard stop) |
Notes |
| Password Login |
400ms |
650ms |
800ms |
Includes DB user fetch + token issue |
| SSO Login (OIDC) |
800ms |
1200ms |
1500ms |
External IdP median ~300ms; optimize local steps |
| Signup Provisioning |
1200ms |
1800ms |
2000ms |
Tenant + first user + baseline events |
| Leave Request Submit |
350ms |
500ms |
650ms |
Balance calc + event emission + cache update |
| Leave Cancellation |
250ms |
400ms |
500ms |
Simple status + balance reversal |
| Goal Status Change |
200ms |
300ms |
400ms |
Includes analytics event (US-332) |
| Document Upload ACK (pre-scan) |
500ms |
800ms |
1000ms |
Returns pending status; excludes scan |
| Document Scan Completion (async) |
30s |
45s |
60s |
p95 turnaround from enqueue to status Clean |
| Subscription Upgrade (US-205) |
2000ms |
3000ms |
3500ms |
Gateway API variability; proration calc local <150ms |
| Profile Read (standard) |
250ms |
400ms |
500ms |
Includes permission filtering & cache assist |
| Event Emission Overhead |
<10ms |
<20ms |
30ms |
Serialization + dispatch; measured inside span |
| Analytics Event Write |
<15ms |
<30ms |
50ms |
Non-blocking path preferred |
3. Component Budgets
| Component |
Budget Dimension |
Target |
Hard Ceiling |
Measurement Method |
| API Pod CPU per request |
Avg CPU time |
<50ms |
80ms |
span CPU metrics / eBPF samples |
| API Pod RSS Memory |
Steady-state |
<300MB |
400MB |
container metrics |
| DB Simple Query (indexed) |
p95 latency |
<20ms |
35ms |
query tracing sample |
| DB Multi-Join Query |
p95 latency |
<60ms |
90ms |
trace spans annotated |
| Leave Accrual Transaction |
p95 latency |
<120ms |
180ms |
transaction root span |
| Cache GET Hit |
p95 latency |
<5ms |
10ms |
client metrics |
| Cache Hit Ratio (leave balances) |
Percentage |
>95% |
<90% triggers alert |
aggregated counters |
| Queue Publish (domain event) |
p95 latency |
<25ms |
40ms |
publish span |
| Queue Lag (critical flows) |
p95 delay |
<2s |
5s |
consumer metrics |
| Document Scan Worker CPU |
Avg/core |
<65% |
80% |
node exporter metrics |
| Billing Webhook Processing |
p95 latency |
<500ms |
800ms |
webhook handler span |
| Trace Sampling Rate (critical flows) |
Percentage retained |
80% |
90% max (avoid overhead) |
collector config audit |
| Instrumentation Overhead |
CPU impact |
<2% |
5% |
diff baseline vs instrumented load |
4. Throughput & Concurrency Assumptions
| Metric |
Baseline |
Scaled Target (Year 1) |
Notes |
| Peak Concurrent Auth Requests |
50 RPS |
500 RPS |
Password + SSO combined |
| Peak Leave Ops (submit/cancel) |
20 RPS |
200 RPS |
Seasonal spikes around holidays |
| Peak Document Uploads |
5 RPS |
50 RPS |
Burst handling with queue backpressure |
| Peak Goal Status Changes |
10 RPS |
100 RPS |
Review cycle bursts |
| Event Throughput (domain+analytics) |
150 RPS |
1500 RPS |
Partition & batching strategy evolves |
5. Capacity Guardrails
- DB connection pool wait p95 < 10ms; alert on sustained >25ms.
- Queue backlog (age p95) must remain under 2 * flow p95 latency budget for synchronous dependencies.
- Cache memory eviction rate <5% per hour for critical keys (leave balances, tenant config).
- SSO token introspection failures <0.5% of attempts monthly.
6. Regression Detection
| Signal |
Threshold |
Action |
| Any p95 flow > budget for 3 consecutive 5-min windows |
Degradation alert |
Create performance incident ticket |
| DB query p99 > 2x target |
Investigate query plan |
Add/adjust index or cache layer |
| Cache hit ratio < target for 15 min |
Hot path analysis |
Evaluate key expiry & prewarm jobs |
| Event publish latency p95 > 40ms |
Broker health check |
Scale partitions / investigate network |
| Trace retention < target |
Collector config audit |
Adjust sampling rules |
7. Testing Strategy
- Load Generation:
k6 scenarios per critical flow (login, leave submit, doc upload ACK, upgrade).
- Profiling: CPU & memory under sustained 10-minute peak; capture flamegraphs.
- Synthetic Daily Run: Executes lightweight smoke + latency assertions (GitHub Action or cron job).
- Pre-Release Load Test: 2x expected peak for 30 minutes; must remain within p99 ceilings.
- Performance Test Artifacts: Stored in
docs/performance/results/<YYYY-MM-DD>/ with summary JSON.
8. Optimization Playbook (Ordered)
- Confirm measurement accuracy (trace spans & metrics present).
- Identify top contributors (flamegraph / span breakdown).
- Apply low-risk improvements (indexes, cache TTL tuning, batch writes).
- Consider architectural changes (async boundaries, denormalization) if >25% over budget persists.
- Re-run load & update budget review notes.
9. Future Expansion
- Introduce percentiles for cold-start (deployment rollover) vs warmed state.
- Separate regional latency budgets when multi-region deployed.
- Automated SLO derivation & error budget tracking (phase 2).
- Real-user monitoring (RUM) for frontend correlation.
10. Governance & Review Cadence
- Monthly review: Adjust budgets based on growth metrics & infra changes.
- Change control: Any upward adjustment requires documented rationale & mitigation plan.
- Ownership: Platform engineering maintains doc; each domain owner signs off revisions.
11. Instrumentation Requirements Checklist
12. Open Questions
- Introduce per-tenant isolation latency budgets? (e.g., queries under tenant predicate.)
- Track warm vs cold path separately for document scan worker startup?
- Need SLA differentiation between paid tiers (e.g., faster document scan)?
Version: 1.0 (2025-11-22)