
Key Performance Metrics in Microservices Architecture

  • Writer: Anand Nerurkar
  • Aug 20

🔹 1. Infrastructure & Platform Metrics

Track system health and scalability.

  • CPU, Memory, Disk, Network Utilization → To ensure pods/containers are not resource-starved.

  • Container & Pod Restarts (Kubernetes) → Indicates stability issues.

  • Autoscaling Effectiveness → How quickly and accurately the HPA (Horizontal Pod Autoscaler) or cluster autoscaler reacts to load changes.

  • Error Events (K8s, Istio, etc.) → CrashLoopBackOff, OOMKilled, etc.
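One way to reason about autoscaling effectiveness is to compare the replica count the HPA should want with the count actually running. The sketch below applies the scaling rule documented for the Kubernetes HPA (desired = ceil(current replicas × current metric / target metric), with a default tolerance of 0.1). The function name and the min/max replica defaults are illustrative assumptions, not values from any real cluster, and the real controller also applies stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA scaling rule for a single metric.

    min_replicas and max_replicas are hypothetical defaults for illustration.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

If the running replica count stays below this value for long stretches under load, that is a hint that scaling is lagging or capped by the maximum.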

🔹 2. Service-Level Metrics (Golden Signals from SRE)

These come from Istio/Envoy telemetry combined with application-level telemetry.

  • Latency (p95, p99) → Response time of each service/API.

  • Traffic → Request throughput, measured in requests per second (RPS) or queries per second (QPS).

  • Errors → 4xx (client issues), 5xx (server failures), gRPC errors.

  • Saturation → Queue depth, thread pool usage.
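To make the latency, traffic, and error signals concrete, here is a minimal sketch that derives them from a batch of request records collected over one time window. The `Request` type, the `golden_signals` function, and the nearest-rank percentile method are assumptions chosen for illustration. In practice these numbers usually come from a metrics backend rather than hand-rolled code.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(requests, window_seconds):
    """Summarize latency, traffic, and errors for one observation window."""
    if not requests:
        raise ValueError("no requests in window")
    latencies = [r.latency_ms for r in requests]
    client_errors = sum(1 for r in requests if 400 <= r.status < 500)
    server_errors = sum(1 for r in requests if r.status >= 500)
    return {
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "rps": len(requests) / window_seconds,
        "error_rate_4xx": client_errors / len(requests),
        "error_rate_5xx": server_errors / len(requests),
    }


if __name__ == "__main__":
    sample = [Request(latency_ms=float(i), status=200) for i in range(1, 101)]
    print(golden_signals(sample, window_seconds=10))
```

Saturation is left out of the sketch because it is read from resource gauges (queue depth, thread pool usage) rather than computed from request records.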

🔹 3. Application Metrics

  • API Success vs Failure Rate → Helps in SLA/SLO tracking.

  • Dependency Latency → DB queries, Kafka topics, downstream services.

  • Circuit Breaker Metrics (Istio/Resilience4j) → Open/half-open states, retries.

  • Cache Hit Ratio (Redis, CDN) → Efficiency of caching strategy.

  • DB Metrics → Slow queries, connection pool utilization, replication lag.
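The ratios in this section are simple to compute once the raw counters exist. The sketch below shows the API success rate, a check against an SLO target, and the cache hit ratio. The function names and the 99.5% target in the usage line are hypothetical values picked for illustration.

```python
def ratio(part, whole):
    """Return part / whole, or None when there is no data yet."""
    if whole == 0:
        return None
    return part / whole


def api_success_rate(successes, failures):
    """Share of API calls that succeeded."""
    return ratio(successes, successes + failures)


def meets_slo(successes, failures, slo_target):
    """True when the observed success rate is at or above the SLO target."""
    rate = api_success_rate(successes, failures)
    return rate is not None and rate >= slo_target


def cache_hit_ratio(hits, misses):
    """Share of cache lookups served from the cache."""
    return ratio(hits, hits + misses)


if __name__ == "__main__":
    print(api_success_rate(995, 5))
    print(meets_slo(995, 5, slo_target=0.995))  # 0.995 is an assumed target
    print(cache_hit_ratio(900, 100))
```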

🔹 4. Reliability & Resilience Metrics

  • MTTR (Mean Time to Recovery)

  • MTBF (Mean Time Between Failures)

  • Service Availability (uptime %)

  • Request Retry Count & Fallback Usage
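MTTR, MTBF, and availability can all be derived from the same incident log. The sketch below assumes each incident is a (start, end) pair of datetimes that falls inside the observation window and that incidents do not overlap. The function name and the sample dates are illustrative.

```python
from datetime import datetime, timedelta


def reliability_summary(incidents, window_start, window_end):
    """Compute MTTR, MTBF, and availability for one observation window.

    incidents: list of (start, end) datetime pairs, assumed non-overlapping
    and fully contained in the window.
    """
    if not incidents:
        raise ValueError("at least one incident is needed for MTTR/MTBF")
    window = window_end - window_start
    downtime = sum((end - start for start, end in incidents), timedelta())
    uptime = window - downtime
    count = len(incidents)
    return {
        "mttr": downtime / count,   # mean time to recovery
        "mtbf": uptime / count,     # mean operating time between failures
        "availability_pct": 100 * (uptime / window),
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    end = start + timedelta(days=30)
    outages = [
        (datetime(2024, 1, 5, 10), datetime(2024, 1, 5, 11)),
        (datetime(2024, 1, 20, 2), datetime(2024, 1, 20, 5)),
    ]
    print(reliability_summary(outages, start, end))
```

With these definitions, availability equals MTBF / (MTBF + MTTR), which is a quick consistency check on the three numbers.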

🔹 5. Security Metrics

  • Auth/Token Failures (OAuth/JWT/OPA)

  • mTLS handshake failures (Istio)

  • Unauthorized access attempts

  • Vulnerability scan & patch compliance

🔹 6. Business & Domain Metrics

Directly tied to business outcomes.

  • Transaction Success Rate (e.g., successful payments/loans approved).

  • User Onboarding Conversion Rate

  • Revenue-impacting Latency (checkout, payment APIs).

  • Abandonment Rate (loan journey drop-offs).

  • Fraud Detection Accuracy (false positives/negatives).
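Fraud detection accuracy is easier to interpret when split into several rates, because fraud is rare and plain accuracy can look high even when most fraud is missed. The sketch below derives precision, recall, false positive rate, and accuracy from confusion-matrix counts, plus the abandonment rate for a funnel. All function names and the sample counts are hypothetical.

```python
def fraud_detection_metrics(tp, fp, tn, fn):
    """Rates from confusion-matrix counts.

    tp: fraud correctly flagged     fp: legitimate wrongly flagged
    tn: legitimate correctly passed fn: fraud that was missed
    """
    def safe_div(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "precision": safe_div(tp, tp + fp),          # flagged cases that were fraud
        "recall": safe_div(tp, tp + fn),             # fraud cases that were caught
        "false_positive_rate": safe_div(fp, fp + tn),
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
    }


def abandonment_rate(started, completed):
    """Share of journeys (for example, loan applications) that were not completed."""
    if started == 0:
        return None
    return 1 - completed / started


if __name__ == "__main__":
    print(fraud_detection_metrics(tp=80, fp=20, tn=880, fn=20))
    print(abandonment_rate(started=200, completed=150))
```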

🔹 7. Observability & Tracing Metrics

  • Distributed Tracing (Jaeger/Zipkin/Tempo) → End-to-end latency.

  • Log Volume & Error Patterns (ELK, Loki, Splunk).

  • Service-to-Service Dependency Graph (Istio Kiali).
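As a rough illustration of how end-to-end latency falls out of trace data, the sketch below treats a trace as a flat list of spans with start and end timestamps. The `Span` type is a simplified stand-in invented for this example. It is not the data model or API of Jaeger, Zipkin, or Tempo.

```python
from dataclasses import dataclass


@dataclass
class Span:
    service: str
    start_ms: float
    end_ms: float


def end_to_end_latency_ms(spans):
    """Time from the earliest span start to the latest span end in one trace."""
    if not spans:
        raise ValueError("trace has no spans")
    return max(s.end_ms for s in spans) - min(s.start_ms for s in spans)


def time_by_service(spans):
    """Total span duration per service (includes time spent waiting on callees)."""
    totals = {}
    for s in spans:
        totals[s.service] = totals.get(s.service, 0.0) + (s.end_ms - s.start_ms)
    return totals


if __name__ == "__main__":
    trace = [
        Span("gateway", 0.0, 120.0),
        Span("loan-service", 10.0, 100.0),
        Span("database", 20.0, 60.0),
    ]
    print(end_to_end_latency_ms(trace))
    print(time_by_service(trace))
```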

✅ In practice, you should track a combination of Golden Signals + Business KPIs.

  • Golden Signals (Latency, Traffic, Errors, Saturation) → For SRE/DevOps.

  • Business KPIs → For Product/Management.

 
 
 

©2024 by AeeroTech. Proudly created with Wix.com
