
Capacity Planning - Digital Banking

  • Writer: Anand Nerurkar
  • May 16
  • 5 min read

Capacity Planning Guide for AKS Cluster — Banking Use Case

1. Understand the Inputs

Parameter | Banking Standard / Example Input
Concurrent Users | 10,000+
TPS (Transactions per Second) | 1,000 to 2,000
Microservices | ~30 Spring Boot microservices
Replicas per service | 3 replicas (for HA and load balancing)
Sidecars | 1 Istio Envoy proxy per app pod
Messaging | Kafka cluster with 3 brokers + 3 ZooKeeper nodes
Observability stack | Prometheus, Grafana, ELK (~6 pods)
Node capacity | ~7 pods per node (accounting for overhead)

2. Calculate Total Pods Required Per Region

Component | Pod Count Calculation | Result
Microservice pods | 30 services × 3 replicas | 90
Istio sidecars | 1 Envoy per app pod × 90 app pods | 90
Kafka brokers + ZooKeeper | 3 + 3 | 6
Istio control plane | istiod, ingress gateway (approximate) | 6
Observability stack | Prometheus, Grafana, ELK (~6 pods) | 6
System/DaemonSet pods | Kubernetes system pods, monitoring agents (approx.) | 10
Total pods | Sum of all components | 208

3. Calculate Number of Nodes Needed

  • Assume 7 pods per node as a safe average (leaves room for system overhead and sidecars)

  • Total nodes per region = Total pods / pods per node


Nodes per region = 208 / 7 ≈ 30

  • Add 30–50% buffer for autoscaling, failures, and burst loads

Max nodes per region = 30 × 1.5 = 45
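
To sanity-check these numbers, here is a minimal Python sketch that simply reproduces the arithmetic above. The pod counts, 7-pods-per-node density, and 50% buffer are the example inputs from this guide, not fixed constants, so adjust them to your own workload profile.

import math

# Example inputs from this guide -- adjust to your own workload profile.
services = 30
replicas_per_service = 3
infra_pods = 6 + 6 + 6 + 10   # Kafka + ZK, Istio control plane, observability, system pods
pods_per_node = 7             # conservative density, leaves headroom for overhead
buffer = 1.5                  # ~50% buffer for autoscaling, failures, and bursts

app_pods = services * replicas_per_service           # 90
# Envoy sidecars actually run as containers inside the app pods; this guide
# counts them separately as a conservative sizing proxy.
sidecar_pods = app_pods                               # 90
total_pods = app_pods + sidecar_pods + infra_pods     # 208

min_nodes = math.ceil(total_pods / pods_per_node)     # 208 / 7 -> 30
max_nodes = math.ceil(min_nodes * buffer)             # 30 * 1.5 -> 45

print(f"Total pods: {total_pods}, min nodes: {min_nodes}, max nodes: {max_nodes}")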


4. Scale Pods and Nodes Based on TPS

  • Estimate TPS per pod:

    Industry benchmark: 50–100 TPS per Spring Boot pod (depends on JVM tuning, DB latency)

  • For 2000 TPS peak load:

Pods required = 2000 / 75 ≈ 27 pods (using ~75 TPS per pod, the midpoint of the benchmark range)


  • Since you have 30 services × 3 replicas = 90 pods baseline (more than enough), scaling is often driven by load spikes on specific services.

  • Use Horizontal Pod Autoscaler (HPA) to increase replicas for high-load microservices (see the sketch after this list).

  • When pods cannot be scheduled due to lack of resources, Cluster Autoscaler (CA) adds nodes.
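
For reference, the HPA computes its target as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The short sketch below applies that rule to a hypothetical high-load payment service; the replica count and CPU figures are illustrative only, not measurements from this system.

import math

def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    # Kubernetes HPA scaling rule: scale replicas in proportion to how far the
    # observed metric is from its target, rounding up.
    return math.ceil(current_replicas * (current_metric / target_metric))

# Hypothetical example: a payment service running 3 replicas averages 90% CPU
# while its HPA target is 70% (the CPU threshold used later in this guide).
print(hpa_desired_replicas(current_replicas=3, current_metric=90, target_metric=70))  # -> 4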


5. Node Pools Design

Node Pool | Purpose | Approx. Nodes | Notes
App Services | Spring Boot microservices | 20–25 | Main business logic
Kafka Cluster | Kafka brokers and ZooKeeper | 6 | High IOPS, persistent storage
Istio Components | Ingress gateway, control plane | 3–5 | Control plane stability
Observability | Prometheus, Grafana, ELK | 3–5 | Monitoring and logging
System | Kubernetes system pods | 1–3 | Essential node-level system services

6. Active-Active Setup

  • Deploy two identical AKS clusters in South India and West India.

  • Each cluster sized as above (min ~30 nodes, max ~45 nodes).

  • Use Kafka MirrorMaker2 or Confluent Replicator for geo-replication.

  • Multi-AZ enabled inside each region for fault tolerance.

  • Traffic routed with Azure Front Door or Traffic Manager.


7. Scaling Triggers and Strategy

What to Scale | How / When
Pods (HPA) | CPU > 70%, memory > 75%, or Kafka lag above threshold
Nodes (Cluster Autoscaler) | Pods unschedulable due to resource limits
Kafka brokers | Monitor throughput and disk usage; increase if lag spikes
Istio / observability | Scale with HPA or manually for stability

8. Additional Considerations

  • JVM tuning for microservices is critical to TPS performance.

  • Use KEDA for event-driven autoscaling of Kafka consumers (a lag-based sizing sketch follows this list).

  • Use PodDisruptionBudgets to ensure minimum availability during scaling/updates.

  • Use taints/tolerations for node pool isolation.

  • Monitor latency, error rates, and throughput continuously using Prometheus and Application Insights.
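
To illustrate the lag-driven sizing that KEDA performs for Kafka consumers, here is a rough sketch: one replica per lag-threshold's worth of pending messages, capped at the partition count because extra consumers in a group sit idle. The lag, threshold, and partition numbers are hypothetical; actual behaviour is governed by your KEDA ScaledObject configuration.

import math

def estimate_consumer_replicas(total_lag: int, lag_threshold: int, partitions: int) -> int:
    # Rough lag-based sizing in the spirit of KEDA's Kafka scaler: one replica per
    # `lag_threshold` pending messages, never more replicas than partitions.
    desired = math.ceil(total_lag / lag_threshold)
    return max(1, min(desired, partitions))

# Hypothetical example: 12,000 messages of lag, 1,000 messages per replica,
# and a 12-partition topic -> 12 consumer replicas.
print(estimate_consumer_replicas(total_lag=12_000, lag_threshold=1_000, partitions=12))  # -> 12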

Summary Table per Region

Component | Pods | Nodes (Min) | Nodes (Max)
Spring Boot microservices | 90 | - | -
Istio sidecars | 90 | - | -
Kafka + ZooKeeper | 6 | - | -
Istio control plane | 6 | - | -
Observability stack | 6 | - | -
System / DaemonSets | 10 | - | -
Total | 208 | 30 | 45

Step-by-Step Capacity Planning Calculation

Step 1: Estimate Number of Application Pods

  • You have 30 microservices, each deployed with 3 replicas for availability and load balancing.

App pods = 30 × 3 = 90 pods


Step 2: Add Sidecar Pods for Istio

  • Istio injects one Envoy sidecar per app pod for service mesh features.

Sidecar pods = App pods = 90 pods


Step 3: Add Platform and System Pods

Component | Pods Estimate
Kafka brokers + ZooKeeper | 3 + 3 = 6 pods
Istio control plane + ingress | ~6 pods
Observability stack (Prometheus, Grafana, ELK) | ~6 pods
System DaemonSets (DNS, monitoring, logging) | ~10 pods

Infra pods = 6 + 6 + 6 + 10 = 28 pods

Step 4: Calculate Total Pods Per Region

Total pods = App pods + Sidecar pods + Infra pods = 90 + 90 + 28 = 208 pods


Step 5: Estimate Pods Per Node

  • Industry best practice suggests ~7 pods per AKS node to allow CPU/memory overhead for Kubernetes system components, sidecars, and ensure stability.

Pods per node ≈ 7


Step 6: Calculate Number of Nodes Needed

Divide total pods by pods per node:

Min nodes = 208 / 7 ≈ 30 nodes


Add buffer for autoscaling and failover (~50%):


Max nodes = 30 × 1.5 = 45 nodes


Step 7: Verify TPS and Pod Throughput


  • Assume each Spring Boot pod handles ~75 TPS based on JVM, DB, and network tuning.

  • For peak 2000 TPS:


Pods required for TPS = 2000 / 75 ≈ 27 pods


Since you have 90 pods running, the system can handle up to 75 TPS × 90 = 6750 TPS (theoretically) at peak capacity, which gives room for scaling, retries, and spiky traffic.
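
A quick headroom check, assuming the ~75 TPS-per-pod figure holds after tuning (treat it as an input to validate with load tests rather than a guarantee):

# Headroom check using the example figures from this guide.
tps_per_pod = 75        # assumed per-pod throughput after JVM/DB tuning
app_pods = 90           # 30 services x 3 replicas
peak_tps = 2000         # expected peak load

theoretical_capacity = tps_per_pod * app_pods        # 6750 TPS
peak_utilization = peak_tps / theoretical_capacity   # ~0.30

print(f"Capacity: {theoretical_capacity} TPS, peak utilization: {peak_utilization:.0%}")
# -> Capacity: 6750 TPS, peak utilization: 30%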


Step 8: Active-Active Setup

  • The above sizing is per region (South India and West India).

  • Each region runs a full AKS cluster sized at 30–45 nodes for HA and compliance.

  • Kafka is geo-replicated for data consistency.

Summary:

Calculation Aspect | Value
App pods | 90
Sidecar pods | 90
Infra pods | 28
Total pods | 208
Pods per node | 7
Min nodes | 30
Max nodes | 45
TPS per pod | 75
TPS supported (90 pods) | 6,750 TPS (theoretical max)


The “Add Platform and System Pods” estimates come from the typical infrastructure and platform services running in an AKS cluster that supports a microservices banking app with Kafka, Istio, and observability tooling.


Here’s how each part contributes to the pod count:

1. Kafka Brokers + Zookeeper

  • Kafka usually runs as a cluster of brokers, with 3 brokers for fault tolerance and throughput.

  • Zookeeper is needed by Kafka for metadata management, typically 3 nodes for HA.

Total: 3 (Kafka) + 3 (ZK) = 6 pods

2. Istio Control Plane + Ingress Gateway

  • Istio’s control plane consists of multiple components, consolidated into a single istiod deployment in recent Istio releases (the older Mixer component for policy and telemetry has been removed), plus the ingress gateway:

    • Pilot (service discovery)

    • Citadel (security)

    • Ingress Gateway (edge routing)

  • Typically, you run:

    • 4 control plane pods (for redundancy and load)

    • 2 ingress gateway pods (for HA ingress routing)

Total: 4 + 2 = 6 pods

3. Observability Stack

  • The monitoring and logging stack is usually composed of:

    • Prometheus server

    • Grafana dashboard

    • ELK stack (Elasticsearch, Logstash, Kibana)

  • For HA and scaling, these run on multiple pods:

    • Prometheus: 2 pods (primary + replica)

    • Grafana: 2 pods

    • ELK components: 2 pods (could be more depending on scale)

Estimated total: ~6 pods

4. System / DaemonSet Pods

  • Kubernetes system components include:

    • CoreDNS

    • kube-proxy

    • Azure monitoring agents

    • Fluentd/Log collectors

    • Other DaemonSets for node-level operations

  • These typically run one pod per node, so a 30-node cluster actually hosts around 30 such pods; as a cluster-level planning figure, a safe estimate is to reserve about 10 pods' worth of capacity for these essential system services (with some overlap, since the DaemonSets run on every node).

Summary Table:

Platform Component | Pods Estimate | Reasoning
Kafka brokers + ZooKeeper | 6 | 3 brokers + 3 ZK nodes
Istio control plane + gateway | 6 | 4 control plane + 2 ingress pods
Observability (Prometheus, Grafana, ELK) | 6 | Multi-pod HA for the monitoring stack
Kubernetes system/DaemonSets | 10 | Node agents and system pods
Total | 28 pods | Sum of the above

Why approximate?

  • Exact counts depend on cluster size, workload, and HA requirements.

  • These values represent a typical production-grade setup for a banking system with multiple microservices and observability tools.

  • You can tune up or down depending on your architecture maturity and scale.


Is this for only one region (South India)? What about the other region (West India)?


Yes, the pod and node counts I shared are for a single AKS cluster in one region, for example South India.

What about the other region (West India)?

  • You would deploy an identical AKS cluster in West India as part of your active-active setup.

  • This means the same pod counts and node counts apply per region, assuming the workload and scale are similar.

  • Each region operates independently but replicates data (e.g., via Kafka MirrorMaker2) for resiliency and disaster recovery.

Summary:

Region | Pods (approx.) | Nodes (min) | Nodes (max)
South India | ~208 | ~30 | ~45
West India | ~208 | ~30 | ~45

Why replicate clusters?

  • High availability: If one region goes down, the other continues serving.

  • Low latency: Users routed to the nearest region.

  • Regulatory compliance: Data residency and disaster recovery mandates.

  • Scalability: Each region scales independently based on local demand.



 
 
 
