P95/P99

  • Writer: Anand Nerurkar
  • May 6
  • 1 min read

⏱️ Latency (P95/P99 Response Time) is a key performance metric used to measure how fast a system or API responds under real-world load — not just on average, but at the “worst-case” tail of user experiences.

📌 Definitions:

  • P95 (95th percentile latency): 95% of all requests are faster than this value — the slowest 5% take longer. ➤ A good balance between average performance and worst-case outliers.

  • P99 (99th percentile latency): 99% of all requests are faster than this value — only 1% are slower. ➤ Highlights rare but impactful latency spikes.

🧠 Why It Matters:

| Metric | What It Tells You |
| --- | --- |
| Average latency | General performance under normal load |
| P95 latency | How most users experience your app |
| P99 latency | Worst-case response time under load — can expose outliers, contention, or scaling issues |

📊 Example:

Let’s say your API returns:

| Request # | Response Time |
| --- | --- |
| 1–90 | 100–200ms |
| 91–95 | 300–400ms |
| 96–99 | 800–1,000ms |
| 100 | 2,000ms |

  • P95 latency ≈ 400ms

  • P99 latency ≈ 1,000ms

  • Max latency = 2,000ms
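The numbers above can be reproduced with a short nearest-rank percentile sketch. The concrete values and the `percentile` helper are illustrative (chosen to fall inside the ranges in the table), not from any particular library:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that
    at least p% of observations are <= it."""
    s = sorted(values)
    k = math.ceil(p / 100 * len(s)) - 1  # 0-based rank
    return s[max(k, 0)]

# 100 response times (ms) matching the example table
latencies = (
    [100 + i for i in range(90)]   # requests 1-90: 100-189 ms
    + [300, 325, 350, 375, 400]    # requests 91-95
    + [800, 867, 933, 1000]        # requests 96-99
    + [2000]                       # request 100
)

print(percentile(latencies, 95))  # 400
print(percentile(latencies, 99))  # 1000
print(max(latencies))             # 2000
```

Note how a single 2,000ms outlier barely moves P95 or P99 but dominates the max — which is exactly why tail percentiles are more robust targets than worst-case latency.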

🎯 Target Ranges (Good Practice):

| API Type | P95 Target | P99 Target |
| --- | --- | --- |
| Internal APIs | < 300ms | < 500ms |
| External/Public APIs | < 500ms | < 1,000ms |
| Critical real-time (e.g., payments) | < 100ms | < 250ms |
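As a sketch, these targets could be encoded as a simple threshold check in an SLO gate or CI performance test. The dictionary keys and function name here are illustrative, not from any specific monitoring tool:

```python
# Target ranges (ms) from the table above; keys are illustrative labels
TARGETS = {
    "internal": {"p95": 300, "p99": 500},
    "external": {"p95": 500, "p99": 1000},
    "realtime": {"p95": 100, "p99": 250},
}

def within_target(api_type, p95_ms, p99_ms):
    """Return True only if BOTH measured percentiles beat their targets."""
    t = TARGETS[api_type]
    return p95_ms < t["p95"] and p99_ms < t["p99"]

print(within_target("internal", 250, 450))  # True: both under target
print(within_target("realtime", 120, 200))  # False: P95 breaches 100 ms
```

Checking both percentiles matters: an API can have a healthy P95 while its P99 exposes contention or GC pauses that hit 1 in 100 users.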
