Core HR Performance Budget¶

Purpose: Establish quantitative latency, throughput, and resource guardrails for critical user flows and platform components. Budgets guide design, tuning, regression detection, and release readiness.

1. Scope & Principles¶

User-centric p95 and p99 latency targets for interactive actions.
Separation of synchronous UX flows vs asynchronous background work.
Budgets are ceilings, not averages; sustained exceedance triggers investigation.
Prefer additive caching and incremental optimization before deep refactors.
Observability (traces + metrics) required to measure every budget.

2. Flow Latency Budgets¶

Flow / Action	Target p95	Target p99	Ceiling (hard stop)	Notes
Password Login	400ms	650ms	800ms	Includes DB user fetch + token issue
SSO Login (OIDC)	800ms	1200ms	1500ms	External IdP median ~300ms; optimize local steps
Signup Provisioning	1200ms	1800ms	2000ms	Tenant + first user + baseline events
Leave Request Submit	350ms	500ms	650ms	Balance calc + event emission + cache update
Leave Cancellation	250ms	400ms	500ms	Simple status + balance reversal
Goal Status Change	200ms	300ms	400ms	Includes analytics event (US-332)
Document Upload ACK (pre-scan)	500ms	800ms	1000ms	Returns pending status; excludes scan
Document Scan Completion (async)	30s	45s	60s	p95 turnaround from enqueue to status Clean
Subscription Upgrade (US-205)	2000ms	3000ms	3500ms	Gateway API variability; proration calc local <150ms
Profile Read (standard)	250ms	400ms	500ms	Includes permission filtering & cache assist
Event Emission Overhead	<10ms	<20ms	30ms	Serialization + dispatch; measured inside span
Analytics Event Write	<15ms	<30ms	50ms	Non-blocking path preferred

3. Component Budgets¶

Component	Budget Dimension	Target	Hard Ceiling	Measurement Method
API Pod CPU per request	Avg CPU time	<50ms	80ms	span CPU metrics / eBPF samples
API Pod RSS Memory	Steady-state	<300MB	400MB	container metrics
DB Simple Query (indexed)	p95 latency	<20ms	35ms	query tracing sample
DB Multi-Join Query	p95 latency	<60ms	90ms	trace spans annotated
Leave Accrual Transaction	p95 latency	<120ms	180ms	transaction root span
Cache GET Hit	p95 latency	<5ms	10ms	client metrics
Cache Hit Ratio (leave balances)	Percentage	>95%	<90% triggers alert	aggregated counters
Queue Publish (domain event)	p95 latency	<25ms	40ms	publish span
Queue Lag (critical flows)	p95 delay	<2s	5s	consumer metrics
Document Scan Worker CPU	Avg/core	<65%	80%	node exporter metrics
Billing Webhook Processing	p95 latency	<500ms	800ms	webhook handler span
Trace Sampling Rate (critical flows)	Percentage retained	80%	90% max (avoid overhead)	collector config audit
Instrumentation Overhead	CPU impact	<2%	5%	diff baseline vs instrumented load

4. Throughput & Concurrency Assumptions¶

Metric	Baseline	Scaled Target (Year 1)	Notes
Peak Concurrent Auth Requests	50 RPS	500 RPS	Password + SSO combined
Peak Leave Ops (submit/cancel)	20 RPS	200 RPS	Seasonal spikes around holidays
Peak Document Uploads	5 RPS	50 RPS	Burst handling with queue backpressure
Peak Goal Status Changes	10 RPS	100 RPS	Review cycle bursts
Event Throughput (domain+analytics)	150 RPS	1500 RPS	Partition & batching strategy evolves

5. Capacity Guardrails¶

DB connection pool wait p95 < 10ms; alert on sustained >25ms.
Queue backlog (age p95) must remain under 2 * flow p95 latency budget for synchronous dependencies.
Cache memory eviction rate <5% per hour for critical keys (leave balances, tenant config).
SSO token introspection failures <0.5% of attempts monthly.

6. Regression Detection¶

Signal	Threshold	Action
Any p95 flow > budget for 3 consecutive 5-min windows	Degradation alert	Create performance incident ticket
DB query p99 > 2x target	Investigate query plan	Add/adjust index or cache layer
Cache hit ratio < target for 15 min	Hot path analysis	Evaluate key expiry & prewarm jobs
Event publish latency p95 > 40ms	Broker health check	Scale partitions / investigate network
Trace retention < target	Collector config audit	Adjust sampling rules

7. Testing Strategy¶

Load Generation: k6 scenarios per critical flow (login, leave submit, doc upload ACK, upgrade).
Profiling: CPU & memory under sustained 10-minute peak; capture flamegraphs.
Synthetic Daily Run: Executes lightweight smoke + latency assertions (GitHub Action or cron job).
Pre-Release Load Test: 2x expected peak for 30 minutes; must remain within p99 ceilings.
Performance Test Artifacts: Stored in docs/performance/results/<YYYY-MM-DD>/ with summary JSON.

8. Optimization Playbook (Ordered)¶

Confirm measurement accuracy (trace spans & metrics present).
Identify top contributors (flamegraph / span breakdown).
Apply low-risk improvements (indexes, cache TTL tuning, batch writes).
Consider architectural changes (async boundaries, denormalization) if >25% over budget persists.
Re-run load & update budget review notes.

9. Future Expansion¶

Introduce percentiles for cold-start (deployment rollover) vs warmed state.
Separate regional latency budgets when multi-region deployed.
Automated SLO derivation & error budget tracking (phase 2).
Real-user monitoring (RUM) for frontend correlation.

10. Governance & Review Cadence¶

Monthly review: Adjust budgets based on growth metrics & infra changes.
Change control: Any upward adjustment requires documented rationale & mitigation plan.
Ownership: Platform engineering maintains doc; each domain owner signs off revisions.

11. Instrumentation Requirements Checklist¶

Trace spans exist for each budgeted flow start/end.
Span attributes include tenantId (non-PII surrogate), user role, component.
Metrics exported: latency histograms, cache hits, queue lag, DB timings.
Alert rules defined for all regression thresholds.
Load test scripts stored & versioned.

12. Open Questions¶

Introduce per-tenant isolation latency budgets? (e.g., queries under tenant predicate.)
Track warm vs cold path separately for document scan worker startup?
Need SLA differentiation between paid tiers (e.g., faster document scan)?

Version: 1.0 (2025-11-22)