Digital Banking Platform: Capacity
- Anand Nerurkar
- May 16
Cloud Architecture Review Document: Digital Banking Platform
1. Executive Summary
This document outlines the cloud-native architecture, capacity planning, and scaling strategy for the Digital Banking Platform, designed to support high concurrency, transaction throughput, and regulatory compliance using Microsoft Azure and Kubernetes.
2. Business Objectives
Modernize legacy banking systems into scalable microservices
Ensure high availability and disaster recovery
Achieve 500-1200+ TPS during peak load
Support 10,000+ concurrent users
Comply with BFSI regulatory mandates (e.g., RBI, SEBI)
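Little's Law (L = λW) links the throughput and concurrency targets. The sketch below shows the average cycle time those targets imply. It assumes a closed system in steady state, where each concurrent user issues one transaction per cycle of response time plus think time. The function name is illustrative, and the resulting cycle times are arithmetic, not measured figures.

```python
def implied_cycle_time_s(concurrent_users: int, tps: float) -> float:
    """Little's Law: L = lambda * W, so W = L / lambda.

    concurrent_users -- L, the average number of users in the system
    tps              -- lambda, completed transactions per second
    Returns W, the average seconds per user cycle (response + think time).
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return concurrent_users / tps


if __name__ == "__main__":
    # Business targets: 10,000+ concurrent users at 500 to 1,200+ TPS
    for tps in (500, 1_000, 1_200):
        w = implied_cycle_time_s(10_000, tps)
        print(f"{tps:>5} TPS -> {w:.1f} s per user cycle")
```

At 1,000 TPS, 10,000 users imply one transaction per user roughly every 10 seconds. If users cycle faster than that, for example with shorter think times, the same population generates more than the target TPS.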
3. Target Architecture Overview
Cloud Platform: Microsoft Azure
Orchestration: Azure Kubernetes Service (AKS)
Regions: Active-Active setup in North India and West India
Architecture Style: Microservices (Spring Boot), Event-Driven
Message Broker: Apache Kafka
API Gateway: Azure API Management + Istio Ingress Gateway
Observability: Prometheus, Grafana, ELK, Azure Monitor
Security: Azure Key Vault, Azure AD B2C, WAF, NSG, Pod Security Admission (the successor to Pod Security Policies, which were removed in Kubernetes 1.25)
4. Workload Summary
Microservices: 30 Spring Boot services
Replicas per service: 3
Kafka Brokers: 3 per region
Zookeeper Nodes: 3 per region
Istio Control Plane: 4 components
Observability Stack: Prometheus, Grafana, ELK (6 pods total)
Sidecars: Envoy (1 per app pod)
5. Capacity Planning (Per Region)
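Component | Pods |
Microservices (30 x 3) | 90 |
Istio Sidecars (Envoy, 1 per app pod) | 90 |
Kafka + Zookeeper | 6 |
Istio Control Plane | 4 |
Istio Ingress Gateway | 2 |
Observability Stack | 6 |
System/DaemonSets | 10 |
Total (sidecars counted separately) | 208 |
Note: Envoy sidecars are injected as containers into the 90 application pods rather than scheduled as separate pods, so the actual Kubernetes pod count is 118 (208 - 90). Counting each sidecar as its own unit, as above, is a conservative way to budget for the extra CPU and memory the sidecars consume.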
Pods per node (average): 7
Minimum nodes: 208 / 7 ≈ 29.7, rounded up to 30
Maximum nodes: 30 + 50% overhead (15) = 45
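A small script can reproduce the node arithmetic, which makes it easy to rerun when the service count, replica count, or pod density changes. This sketch takes the table's figures as given, including the sidecar row. `plan_nodes` and the dictionary keys are illustrative names.

```python
import math

# Per-region pod counts from the capacity planning table
PODS_PER_REGION = {
    "microservices": 30 * 3,
    "istio_sidecars": 90,
    "kafka_zookeeper": 6,
    "istio_control_plane": 4,
    "istio_ingress_gateway": 2,
    "observability": 6,
    "system_daemonsets": 10,
}


def plan_nodes(pods: dict[str, int], pods_per_node: int,
               overhead: float) -> tuple[int, int, int]:
    """Return (total pods, minimum nodes, maximum nodes)."""
    total = sum(pods.values())
    # Round up: a fractional node still requires one more VM
    min_nodes = math.ceil(total / pods_per_node)
    # Extra capacity added on top of the minimum (0.5 = 50% overhead)
    max_nodes = min_nodes + math.ceil(min_nodes * overhead)
    return total, min_nodes, max_nodes


print(plan_nodes(PODS_PER_REGION, pods_per_node=7, overhead=0.5))  # (208, 30, 45)
```

Dropping the sidecar row gives 118 pods and a minimum of 17 nodes at the same density. That gap suggests deriving the per-node figure from the pods' CPU and memory requests rather than from a fixed pod count.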
6. Scaling Strategy
Horizontal Pod Autoscaler (HPA): Scales replicas toward targets of 70% CPU and 75% memory utilization, or on Kafka consumer lag exposed through an external metrics adapter
Cluster Autoscaler (CA): Adds or removes nodes based on pod schedulability
KEDA: Optional for event-driven autoscaling (e.g., on Kafka consumer lag)
Rate Limiting: Istio and Azure API Management, with circuit breakers and retries
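A small model helps reason about these scaling rules. The first function follows the replica formula documented for the Kubernetes HPA, including its default 10% tolerance and its choice of the largest proposal across metrics. The second is a simplified lag-based rule in the spirit of KEDA's Kafka scaler, not its actual implementation. Neither models stabilization windows or scaling rate limits. All function names and numbers are illustrative.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         metrics: list[tuple[float, float]],
                         min_replicas: int,
                         max_replicas: int,
                         tolerance: float = 0.1) -> int:
    """HPA rule: desired = ceil(current * currentMetric / targetMetric).

    metrics holds (current_value, target_value) pairs, e.g. (91, 70) for
    91% average CPU against a 70% target. With several metrics the largest
    proposal wins; ratios within the tolerance leave replicas unchanged.
    """
    proposals = []
    for current, target in metrics:
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current_replicas)  # close enough: no change
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    return max(min_replicas, min(max_replicas, max(proposals)))


def lag_based_consumers(offsets: list[tuple[int, int]],
                        lag_threshold: int,
                        min_replicas: int,
                        max_replicas: int) -> int:
    """Simplified lag-driven sizing for a Kafka consumer group.

    offsets holds (log_end_offset, committed_offset) per partition;
    lag_threshold is the lag one consumer is expected to absorb.
    """
    total_lag = sum(end - committed for end, committed in offsets)
    desired = math.ceil(total_lag / lag_threshold)
    # Consumers beyond the partition count would sit idle in the group
    desired = min(desired, len(offsets))
    return max(min_replicas, min(max_replicas, desired))


# 3 replicas at 91% CPU (target 70%) and 60% memory (target 75%)
print(hpa_desired_replicas(3, [(91, 70), (60, 75)], min_replicas=3, max_replicas=10))  # 4

# Six partitions with 4,400 messages of total lag, 1,000 per consumer
partitions = [(10_500, 10_000), (20_800, 20_000), (5_200, 4_000),
              (7_300, 7_000), (9_900, 9_000), (3_700, 3_000)]
print(lag_based_consumers(partitions, lag_threshold=1_000, min_replicas=1, max_replicas=10))  # 5
```

The partition cap explains why lag-based scaling only helps up to the topic's partition count. Beyond that point, adding partitions raises the ceiling, while adding pods does not.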
7. High Availability & DR
Active-Active Deployment: North India + West India (Azure Paired Regions)
Multi-AZ within each region for zone redundancy
Geo-redundant Kafka topics (optional, via MirrorMaker 2 or Confluent Replicator)
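Composite availability gives a back-of-the-envelope view of what active-active buys. Components that must all work multiply their availabilities, while independent redundant copies fail only if all of them fail at once. This sketch assumes regional failures are independent and failover is instant, both optimistic assumptions. The percentages are hypothetical inputs, not Azure SLA figures.

```python
import math


def serial(*availabilities: float) -> float:
    """All components are required: multiply their availabilities."""
    return math.prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Any one copy suffices: 1 minus the chance that all fail together."""
    return 1 - math.prod(1 - a for a in availabilities)


# Hypothetical per-region figures: cluster 99.95%, Kafka + data tier 99.9%
region = serial(0.9995, 0.999)
both_regions = parallel(region, region)
print(f"single region: {region:.5%}, active-active: {both_regions:.7%}")
```

In practice, shared dependencies (DNS, identity, a global gateway) and imperfect failover pull the real figure well below the parallel result. That gap is why the DR drills described later in this review matter.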
8. Observability & Monitoring
Prometheus + Grafana: Real-time metrics (latency, throughput, resource usage)
Azure Monitor + Application Insights: Logs, traces, health checks
ELK Stack: Centralized logging, error analysis, audit trails
9. Security & Compliance
Authentication: Azure AD B2C, OAuth2/JWT
Secrets Management: Azure Key Vault
Network Security: NSG, WAF, private endpoints
Regulatory Compliance: Data residency, RBAC, ISO 27001, PCI-DSS alignment
10. Summary
This architecture provides a highly scalable, secure, cloud-native foundation for digital banking, designed to handle 1,000+ TPS with 10,000 concurrent users while ensuring compliance and operational excellence. It follows Azure and CNCF best practices and is built for resilience, observability, and agility.
11. Next Steps
Validate the architecture with load testing and chaos engineering:
Use tools like JMeter, Locust, or k6 for load testing APIs and Kafka throughput.
Apply chaos engineering using Azure Chaos Studio or Litmus to test resiliency of pods, nodes, and services under failure scenarios.
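For quick smoke checks before a full JMeter, Locust, or k6 run, the core measurement is simple: drive a fixed concurrency, record per-request latency, and derive TPS and a tail percentile. The sketch below uses only the Python standard library. `run_load` and `fake_api_call` are hypothetical names, and a thread pool suits only I/O-bound calls at modest scale.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load(call: Callable[[], None], total_requests: int, concurrency: int) -> dict:
    """Closed-loop load: `concurrency` workers issue `total_requests` calls."""

    def timed(_: int) -> float:
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, range(total_requests)))
    wall = time.perf_counter() - wall_start

    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
    p95 = statistics.quantiles(latencies, n=100)[94]
    return {"requests": total_requests, "tps": total_requests / wall, "p95_s": p95}


if __name__ == "__main__":
    # Stand-in for an API call; replace with a real client request
    def fake_api_call() -> None:
        time.sleep(0.01)

    print(run_load(fake_api_call, total_requests=500, concurrency=50))
```

Reporting p95 alongside TPS matters because a system can hit its throughput target while its tail latency breaches the user-facing SLO.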
Finalize node pool isolation (Kafka, app, system):
Create dedicated node pools for application services, Kafka workloads, and system components.
Use taints and tolerations to enforce workload placement and avoid resource contention.
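Taints and tolerations follow a simple matching rule, which this sketch models in simplified form: a pod can land on a node only if one of the pod's tolerations matches every `NoSchedule` or `NoExecute` taint on that node. The `workload=kafka` taint is a hypothetical example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None     # None with operator "Exists" matches any key
    operator: str = "Equal"       # "Equal" or "Exists"
    value: str = ""
    effect: Optional[str] = None  # None matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect is not None and tol.effect != taint.effect:
        return False
    if tol.key is None:
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def can_schedule(tolerations: list[Toleration], node_taints: list[Taint]) -> bool:
    """Hard taints must all be tolerated; PreferNoSchedule is only a soft hint."""
    hard = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


kafka_pool = [Taint("workload", "kafka", "NoSchedule")]
kafka_pod = [Toleration("workload", "Equal", "kafka", "NoSchedule")]
app_pod: list[Toleration] = []

print(can_schedule(kafka_pod, kafka_pool))  # True
print(can_schedule(app_pod, kafka_pool))    # False
```

A taint only repels other pods. To keep Kafka pods on the Kafka pool, pair the toleration with a node selector or node affinity.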
Implement CI/CD with security gates:
Set up Azure DevOps pipelines with static code analysis (e.g., SonarQube), image scanning (e.g., Trivy), and secrets validation.
Enforce environment promotion from QA to UAT to Prod through manual approval gates.
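An image-scanning gate reduces to one rule: fail the stage if the scan reports any finding at a blocking severity. Trivy can enforce this on its own with `--severity CRITICAL,HIGH --exit-code 1`. The sketch below shows the same logic applied to a JSON report, which is useful when several scanners feed one gate. The `Results`, `Vulnerabilities`, `Severity`, and `VulnerabilityID` field names follow Trivy's JSON output but should be checked against the version in use. The sample report and its placeholder CVE IDs are made up.

```python
import json

BLOCKING_SEVERITIES = frozenset({"CRITICAL", "HIGH"})


def blocking_findings(report_json: str,
                      blocking: frozenset = BLOCKING_SEVERITIES) -> list[tuple[str, str, str]]:
    """Return (target, vulnerability id, severity) for findings that fail the build."""
    report = json.loads(report_json)
    findings = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be missing or null when a target is clean
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            if severity in blocking:
                findings.append((result.get("Target", "?"),
                                 vuln.get("VulnerabilityID", "?"),
                                 severity))
    return findings


def gate_exit_code(report_json: str) -> int:
    """0 lets the pipeline continue; 1 stops promotion."""
    findings = blocking_findings(report_json)
    for target, vuln_id, severity in findings:
        print(f"BLOCKED: {severity} {vuln_id} in {target}")
    return 1 if findings else 0


# Made-up report with placeholder IDs, shaped like a Trivy JSON report
SAMPLE_REPORT = json.dumps({
    "Results": [
        {"Target": "payments-api:1.4.2",
         "Vulnerabilities": [
             {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
             {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"},
         ]},
        {"Target": "app.jar", "Vulnerabilities": None},
    ]
})
print(gate_exit_code(SAMPLE_REPORT))
```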
Set up DR drills and compliance reviews:
Schedule quarterly DR drills simulating region failover and backup recovery.
Align reviews with ISO 27001, RBI MSP guidelines, and document audit readiness checkpoints.
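A DR drill is only useful if its outcome is measured against explicit recovery objectives. This sketch computes the achieved recovery time (RTO) and data-loss window (RPO) from three drill timestamps. The 15-minute RTO and 5-minute RPO targets, the timestamps, and the names `DrillTimeline` and `evaluate_drill` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrillTimeline:
    last_replicated_write: datetime  # newest write confirmed in the DR region
    outage_start: datetime           # moment the primary region was failed
    service_restored: datetime       # DR region serving traffic again


def evaluate_drill(t: DrillTimeline, rto: timedelta, rpo: timedelta) -> dict:
    achieved_rto = t.service_restored - t.outage_start
    # Writes committed after the last replicated one may be lost on failover
    achieved_rpo = t.outage_start - t.last_replicated_write
    return {
        "rto": achieved_rto,
        "rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto,
        "rpo_met": achieved_rpo <= rpo,
    }


drill = DrillTimeline(
    last_replicated_write=datetime(2024, 1, 15, 9, 59, 30),
    outage_start=datetime(2024, 1, 15, 10, 0, 0),
    service_restored=datetime(2024, 1, 15, 10, 12, 0),
)
print(evaluate_drill(drill, rto=timedelta(minutes=15), rpo=timedelta(minutes=5)))
```

Recording these figures each quarter gives auditors a trend rather than a single pass or fail result.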