Active-Passive Setup Scenario
- Anand Nerurkar
- May 16
- 7 min read
In an active-passive setup, the replication flow and failover mechanism are quite different from active-active. Here’s how it works — step by step — in text format for a banking platform deployed on Azure AKS with Kafka, databases, and observability.
🔄 Replication Flow: Active-Passive Setup
🧍 Scenario:
User 1 is served by the active region (North India).
West India is configured as a standby/passive region.
Only one region handles traffic at a time.
If North India fails, the West India region is activated.
📍 1. Request Routing (Before Failure)
All user requests are routed to the North India AKS cluster (active).
West India AKS is running, but does not serve live traffic (except health probes).
Azure Front Door or Traffic Manager is configured to send all production traffic to North India.
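Failover in this design hinges on those health probes. Below is a minimal sketch of a probe endpoint the routing layer could call, assuming a Spring Boot service; the /healthz path and the dependency check are illustrative, not part of the original setup.
```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HealthProbeController {

    // Azure Front Door / Traffic Manager probes this path; a non-2xx answer
    // marks the region unhealthy and triggers the failover routing rule.
    @GetMapping("/healthz")
    public ResponseEntity<String> health() {
        boolean dependenciesUp = checkKafkaAndDatabase(); // hypothetical check
        return dependenciesUp
                ? ResponseEntity.ok("UP")
                : ResponseEntity.status(503).body("DOWN");
    }

    private boolean checkKafkaAndDatabase() {
        // In a real service this would ping the Kafka brokers and the primary DB.
        return true;
    }
}
```
Front Door (or Traffic Manager) is configured in priority mode, so West India only starts receiving traffic when this endpoint in North India stops answering with 2xx.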
📍 2. Kafka Replication (Active → Passive)
Kafka is deployed in North India only for active processing.
Kafka MirrorMaker 2 replicates all topics and partitions asynchronously to the West India Kafka cluster.
West India Kafka does not accept writes — it mirrors only.
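To sanity-check that mirroring is actually flowing, the passive cluster can be inspected with Kafka's AdminClient. This is a small sketch under the assumption that MirrorMaker 2 runs with its default replication policy, which prefixes mirrored topics with the source cluster alias (for example north.payment-events); the bootstrap address, alias, and topic names are placeholders.
```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

import java.util.Properties;
import java.util.Set;

public class MirrorCheck {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Bootstrap address of the passive (West India) cluster -- placeholder value
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-west.example.internal:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            Set<String> topics = admin.listTopics().names().get();
            // With MM2's DefaultReplicationPolicy, mirrored topics carry the
            // source cluster alias as a prefix, e.g. "north.payment-events".
            topics.stream()
                  .filter(name -> name.startsWith("north."))
                  .forEach(name -> System.out.println("Mirrored topic present: " + name));
        }
    }
}
```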
📍 3. Database Replication
Azure SQL or Cosmos DB is geo-replicated:
Primary write replica in North India.
Read-only replicas in West India.
No writes happen in West India unless a manual failover is triggered.
📍 4. Configuration and Secrets
App configs and secrets (Azure App Configuration, Azure Key Vault) are synced across regions using CI/CD or GitOps pipelines.
Both clusters have the same app code and infra, but only North India is receiving traffic.
📍 5. Monitoring and Observability
Observability stack (Prometheus, Grafana, ELK) is deployed in both regions, but only North India is active.
Metrics are pushed from North India; West India components remain in hot standby.
🚨 6. Failure Scenario: North India Goes Down
Azure Front Door detects North India failure via probe.
Manual or automated failover process is initiated:
West India becomes active:
Kafka in West India becomes the primary producer/consumer cluster.
Azure SQL failover is triggered to promote West India DB as primary.
User requests now route to West India AKS.
All microservices start processing transactions using the now-active West India cluster.
🧠 Key Differences from Active-Active
Feature | Active-Active | Active-Passive |
Traffic distribution | Both regions live | One region handles 100% traffic |
Failover type | Seamless, health-based (auto) | Manual or semi-automated |
Data sync | Bi-directional Kafka, DB writes both | One-way Kafka/DB sync (active → passive) |
Resource utilization | High (both active) | Lower (one idle) |
Recovery time | Instant (seconds) | Minutes (depends on automation) |
Cost | Higher | Lower |
🧱 Azure Services for Active-Passive
Layer | Service | Function |
Traffic Mgmt | Azure Front Door / Traffic Manager | Failover routing |
Message Bus | Kafka + MirrorMaker 2 | Topic replication from active → passive |
Database | Azure SQL Geo-Replication | Standby read-replica, failover when needed |
Config Mgmt | Azure App Config + GitOps | Keep apps synced across regions |
Secrets | Azure Key Vault | Synced or staged across regions |
✅ Summary of Flow (Text Format)
User 1 → Azure Front Door → North India AKS → Kafka (North) → DB (North)
↓
MirrorMaker2 → West India Kafka (read-only)
DB Geo-Replication → West India DB (read-only)
On failure:
Azure Front Door → West India AKS → West India Kafka (now primary) → DB (failover to primary)
Active-Passive Setup - Scenario
===
Let’s walk through what happens in an Active-Passive setup for your scenario. Here’s a detailed impact and flow breakdown:
🧾 Scenario Recap:
User 1 is doing a transaction in North India (active region)
User 5 is doing a transaction in West India (passive region)
Suddenly, North India region fails
❓What Happens to User 1?
✅ Before Failure:
User 1’s request goes to North India AKS, hits:
Spring Boot microservices
Kafka (North India brokers)
Azure SQL (North India primary DB)
❌ When Failure Happens:
In-progress transaction may:
Fail temporarily if not completed before the region crash.
Be partially committed (needs retry or reconciliation).
Be lost if not yet replicated (depends on DB/Kafka sync delay).
Recovery:
After manual failover, User 1 will be routed to West India, where:
The latest Kafka topic state (replicated via MirrorMaker2) is available.
The database (Azure SQL) is promoted as primary in West India.
❓What Happens to User 5?
⚠️ Important Clarification:
In Active-Passive, only the active region handles traffic.
So under normal operations, User 5 should not have been served by the passive region.
🟡 Possibilities:
If West India is truly passive:
User 5's request should be rejected or rerouted by Azure Front Door.
If User 5 reached West India AKS somehow (e.g., misconfiguration):
The app might have been running but Kafka and DB were in read-only state, leading to:
Errors
Incomplete processing
Logging without persistence
🔁 Failover Process Summary
Step | Description |
1. Region failure | North India becomes unresponsive |
2. Health check triggers | Azure Front Door/Traffic Manager detects failure |
3. Reroute traffic | All traffic now routed to West India |
4. DB failover | Azure SQL read replica in West India is promoted to read-write primary |
5. Kafka role switch | Kafka in West India is activated (if prepared) or takes over as primary |
6. Users resume | Both User 1 and User 5 now operate from West India |
⚠️ Design Considerations for Banking (Active-Passive)
Aspect | Recommendation |
User Stickiness | Use session affinity + re-auth if region changes |
Idempotency | Ensure APIs and Kafka consumers can retry safely (see the sketch after this table) |
Data Sync | Keep replication delay minimal (seconds) for Kafka and DB |
DR Drills | Run quarterly simulations to test seamless switch |
Failure Recovery | Queue uncommitted requests, reconcile after failover |
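To make the idempotency row concrete, here is a minimal sketch of a Kafka consumer that records each event ID before applying the business update, assuming Spring Kafka, Spring JDBC, and a processed_events table with a unique key on event_id; the topic, group, and table names are illustrative.
```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class PaymentEventHandler {

    private final JdbcTemplate jdbc;

    public PaymentEventHandler(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Topic, group, and table names are illustrative.
    @KafkaListener(topics = "payment-events", groupId = "payments")
    @Transactional
    public void onEvent(ConsumerRecord<String, String> record) {
        try {
            // processed_events has a unique constraint on event_id, so a
            // replayed message fails this insert instead of being applied twice.
            jdbc.update("INSERT INTO processed_events (event_id) VALUES (?)", record.key());
        } catch (DuplicateKeyException alreadyProcessed) {
            return; // duplicate delivery after failover -- safe to ignore
        }
        // Apply the actual business update here, inside the same transaction,
        // so the dedupe record and the state change commit (or roll back) together.
    }
}
```
Because the dedupe insert and the business update share one transaction, a message replayed after failover is either recognized as already processed or applied exactly once.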
✅ Final Outcome
User | Initial Region | Impact After Failure | Status Post-Failover |
User 1 | North India | Interrupted mid-transaction | Routed to West, retries resume |
User 5 | West India | Likely rejected (if truly passive) | Served from West India (now active) |
In active-passive, User 1 completed a transaction in the active region (North India) and West India is passive. North India suffers an outage and User 1 connects again, so User 1 is rerouted to the passive region. But what if the passive region has no Kafka topics replicated and only the DB is synced?
Excellent question — this goes to the heart of a critical design challenge in active-passive systems. Let’s unpack it clearly:
🔁 Scenario Breakdown
User 1 did a transaction in North India (Active).
Kafka events related to that transaction were only written in North India.
Azure SQL (or other DB) was replicated to West India using geo-replication.
North India goes down.
User 1 reconnects → Azure Front Door routes to West India (Passive).
❓ What Happens if Kafka Topics Are Not Replicated to West India?
🔻 Result:
User 1’s transaction events (Kafka-based) are not available in West India.
Services relying on event-driven data (e.g., fraud detection, downstream updates, audit logging) won’t have those events.
The database has the updated state (e.g., account/loan updated), but Kafka consumers and workflows in West India may behave inconsistently or restart from stale offsets.
🔎 Why This Is a Problem in Banking:
Risk | Impact |
Incomplete event history | Audit trails, fraud detection fail |
Downstream async services break | e.g., notification, KYC processing, scoring |
Kafka consumers in passive region | See a gap in topic partitions |
Data loss on event side | Even if DB is consistent, workflows break |
✅ Best Practice: ALWAYS REPLICATE KAFKA TOPICS Across Regions — Even in Active-Passive
To avoid this issue:
➤ Use Kafka MirrorMaker 2 or Confluent Replicator to:
Continuously mirror topics from North (Active) to West (Passive).
Keep Kafka offsets, events, and partitions in sync.
This ensures that when North goes down, the West region’s Kafka has all recent events, and services in West can resume seamlessly.
✔️ Updated Flow (with Kafka replication):
User 1 → North India AKS → Kafka (North) + DB (North)
↘ MirrorMaker2 → Kafka (West)
↘ Azure SQL Geo-Replication → DB (West)
North India goes down
User 1 reconnects → Azure Front Door → West India
→ Kafka (West) has all events
→ DB (West) promoted to read-write
→ Application continues with consistent state + event trail
🧠 Summary
Component | Without Kafka Replication | With Kafka Replication |
Kafka topic state | Lost in West India | Fully replicated, ready to consume |
Event workflows | Broken, inconsistent | Resume normally |
Audit/fraud logs | Missing gaps | Complete |
User experience | Retry fails or acts on old data | Retry succeeds with accurate state |
✅ Recommendation
Even in active-passive, Kafka topics must be replicated using:
MirrorMaker2 (open-source Kafka)
Confluent Replicator (Confluent Platform)
And make sure:
Kafka consumers in passive region are ready but idle.
After failover, Kafka brokers in passive region become primary, and consumers start immediately.
✅ Keep Kafka consumers in the passive region “ready” but not actively consuming, so that they can immediately take over when the active region fails.
🎯 Goal
Deploy Kafka consumers in the passive region but keep them idle (not consuming messages) until failover is triggered.
✅ Strategies to Keep Consumers Ready but Idle
1. Start Consumers with Paused State
Kafka consumers (in Spring Boot, Java, Python, etc.) can be started and immediately paused using:
consumer.pause(consumer.assignment());
They stay connected to the Kafka cluster but do not fetch records until resumed.
On failover:
consumer.resume(consumer.assignment());
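If the consumers are Spring Kafka listeners rather than hand-rolled clients, the same pause/resume pattern can be driven through the KafkaListenerEndpointRegistry. A sketch, assuming a listener registered with id "txn-listener"; how the failover signal reaches these methods (ops endpoint, config refresh, etc.) is up to your design.
```java
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.stereotype.Component;

@Component
public class RegionFailoverSwitch {

    private final KafkaListenerEndpointRegistry registry;

    public RegionFailoverSwitch(KafkaListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    // Called at startup in the passive region to hold the listener idle.
    public void standBy() {
        MessageListenerContainer container = registry.getListenerContainer("txn-listener");
        if (container != null) {
            container.pause();   // stay connected to the cluster, but stop fetching records
        }
    }

    // Invoked by whatever signals failover (ops runbook endpoint, config refresh, etc.).
    public void activatePassiveRegion() {
        MessageListenerContainer container = registry.getListenerContainer("txn-listener");
        if (container != null) {
            container.resume();  // consumers were deployed paused; start fetching now
        }
    }
}
```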
2. Use Feature Flags or Config Switch
Control activation via Spring Cloud Config, Azure App Configuration, or custom flag in your CI/CD pipeline.
For example:
consumer.enabled: false # In passive region
Consumers check this flag before starting polling.
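With Spring Kafka, the flag can be wired straight into the listener's autoStartup attribute, so the passive region runs the same code but the listener container never starts polling. A sketch assuming the consumer.enabled property above; the listener id and topic are illustrative.
```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class TransactionListener {

    // consumer.enabled=false in the passive region keeps this listener container
    // from ever starting to poll; switching the property to true (as part of
    // failover) activates it.
    @KafkaListener(
            id = "txn-listener",
            topics = "payment-events",
            autoStartup = "${consumer.enabled:false}")
    public void onMessage(String payload) {
        // process the transaction event
    }
}
```
After failover, the container can be started through the KafkaListenerEndpointRegistry shown earlier, or by redeploying the pods with consumer.enabled=true.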
3. Assign a Consumer Group but Do Not Subscribe
Initialize consumers with a group ID but do not assign or subscribe to any topics.
// Do not call consumer.subscribe(...) until failover
On failover, the app subscribes to relevant topics and starts consuming from the latest or committed offset.
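A plain-Java sketch of this strategy: the consumer is created with its group ID up front, but subscribe() is only called once a failover signal arrives. The flag file, bootstrap address, group, and topic below are placeholders; any toggle (for example the ACTIVE_REGION switch in the next strategy) would work the same way.
```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class StandbyConsumer {

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-west.example.internal:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payments-consumer-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Failover signal: a flag file dropped (or mounted via ConfigMap) when
        // this region is promoted to active -- the path is illustrative.
        Path failoverFlag = Path.of("/config/region-active");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Created with a group ID, but no subscribe() yet, so the standby stays idle.
            while (!Files.exists(failoverFlag)) {
                Thread.sleep(5_000);
            }
            // Failover: subscribe now and continue from committed/replicated offsets.
            consumer.subscribe(List.of("payment-events"));
            while (true) {
                consumer.poll(Duration.ofMillis(500))
                        .forEach(r -> System.out.println("Processing event " + r.key()));
            }
        }
    }
}
```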
4. Run Passive Cluster in “Readiness Mode”
In Kubernetes, deploy consumer pods but:
Set replicas = 0, or
Use a readiness/startup probe to hold them in a non-ready state
Use an environment variable like ACTIVE_REGION=true/false to toggle behavior
5. Control Partition Assignment in MirrorMaker2
Ensure that MirrorMaker2 keeps topics (and consumer offsets, via checkpoints) in sync, but do not let passive-region consumer groups join and take partition ownership.
Only after failover do you allow consumer groups in the passive region to take over partitions.
🛡️ Additional Tips for Safe Failover
Practice | Why It Matters |
Ensure idempotent consumers | Replaying messages after failover won't cause duplicates |
Commit offsets atomically | Avoid partial commits that cause re-processing |
Health-check Kafka topics | Verify topic replication status before switching |
Monitor lag in passive region | Helps ensure passive Kafka is fully synced |
🔄 Failover Activation Process (Simplified)
Active region fails
Azure Front Door reroutes traffic
Passive consumers:
Resume polling (or start)
Subscribe to topics
Begin consuming messages
Offsets and partitions are picked up from replicated state
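For that last step, MirrorMaker 2's checkpoints let the newly active region translate the consumer group's committed offsets from the failed cluster into local positions. A sketch using RemoteClusterUtils from Kafka's connect-mirror-client module; the bootstrap address, the "north" cluster alias, and the group ID are placeholders.
```java
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class FailoverOffsetTranslator {

    // Returns the consumer group's offsets, translated from the failed North
    // cluster's checkpoints into positions that are valid on the West cluster.
    public static Map<TopicPartition, OffsetAndMetadata> offsetsOnWest() throws Exception {
        Map<String, Object> props = new HashMap<>();
        // Connect to the surviving West cluster, where MM2 wrote its checkpoints.
        props.put("bootstrap.servers", "kafka-west.example.internal:9092"); // placeholder

        // "north" is the source-cluster alias configured in MirrorMaker 2;
        // the group ID below is a placeholder.
        return RemoteClusterUtils.translateOffsets(
                props, "north", "payments-consumer-group", Duration.ofSeconds(30));
    }
}
```
The returned map can be fed to consumer.seek(...) (or committed) before the West-region consumers begin polling.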
✅ Summary
Technique | Result |
consumer.pause() on startup | Ready but idle |
Feature flags / config switch | Controlled start/stop |
No subscription until failover | No consumption activity |
Deployment gating in AKS | Controlled pod behavior |