P95/P99
- Anand Nerurkar
- May 6
- 1 min read
⏱️ Latency (P95/P99 Response Time) is a key performance metric used to measure how fast a system or API responds under real-world load — not just on average, but at the “worst-case” tail of user experiences.
📌 Definitions:
- P95 (95th percentile latency): 95% of all requests are faster than this value — the slowest 5% take longer. ➤ A good balance between average performance and worst-case outliers.
- P99 (99th percentile latency): 99% of all requests are faster than this value — only 1% are slower. ➤ Highlights rare but impactful latency spikes.
🧠 Why It Matters:
| Metric | What It Tells You |
| --- | --- |
| Average latency | General performance under normal load |
| P95 latency | How most users experience your app |
| P99 latency | Worst-case response time under load — can expose outliers, contention, or scaling issues |
📊 Example:
Let’s say your API returns:
| Request # | Response Time |
| --- | --- |
| 1–90 | 100–200ms |
| 91–95 | 300–400ms |
| 96–99 | 800–1000ms |
| 100 | 2,000ms |
- P95 latency ≈ 400ms
- P99 latency ≈ 1,000ms
- Max latency = 2,000ms
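The example above can be reproduced with a few lines of Python. This is a minimal sketch using the nearest-rank percentile definition (the smallest value that at least p% of samples fall at or below); the `percentile` helper and the sample data are illustrative, not a production metrics pipeline.

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest sample value such that
    at least p% of all samples are <= that value."""
    s = sorted(latencies_ms)
    k = math.ceil(p / 100 * len(s))  # rank of the percentile sample
    return s[k - 1]                  # convert 1-based rank to 0-based index

# Hypothetical data mirroring the table above:
# 90 fast requests, 5 moderate, 4 slow, 1 extreme outlier.
latencies = [150] * 90 + [400] * 5 + [1000] * 4 + [2000]

print(percentile(latencies, 95))  # 400  -> P95
print(percentile(latencies, 99))  # 1000 -> P99
print(max(latencies))             # 2000 -> max latency
```

Note how the single 2,000ms outlier never shows up in P95 or P99 — only the max reveals it, which is why dashboards often plot all three.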
✅ Target Ranges (Good Practice):
| API Type | P95 Target | P99 Target |
| --- | --- | --- |
| Internal APIs | < 300ms | < 500ms |
| External/Public APIs | < 500ms | < 1000ms |
| Critical real-time (e.g., payments) | < 100ms | < 250ms |
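Targets like these are easy to encode as an automated SLO check. The sketch below is a hypothetical helper (the `meets_slo` function and `TARGETS_MS` mapping are illustrative names, not from any standard library) that compares measured P95/P99 values against the table above.

```python
# Target thresholds from the table above, in milliseconds.
TARGETS_MS = {
    "internal": {"p95": 300, "p99": 500},
    "external": {"p95": 500, "p99": 1000},
    "realtime": {"p95": 100, "p99": 250},
}

def meets_slo(api_type, p95_ms, p99_ms):
    """Return True only if BOTH measured percentiles are under target."""
    t = TARGETS_MS[api_type]
    return p95_ms < t["p95"] and p99_ms < t["p99"]

print(meets_slo("internal", 250, 480))  # True: both under target
print(meets_slo("realtime", 120, 200))  # False: P95 exceeds 100ms
```

A check like this can gate a deployment pipeline: roll back or alert when a canary release pushes tail latency past the target.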