Active-Passive Setup Scenario
- Anand Nerurkar
- May 16
- 7 min read
In an active-passive setup, the replication flow and failover mechanism are quite different from active-active. Here’s how it works — step by step — in text format for a banking platform deployed on Azure AKS with Kafka, databases, and observability.
🔄 Replication Flow: Active-Passive Setup
🧍 Scenario:
User 1 is served by the active region (North India).
West India is configured as a standby/passive region.
Only one region handles traffic at a time.
If North India fails, the West India region is activated.
📍 1. Request Routing (Before Failure)
All user requests are routed to the North India AKS cluster (active).
West India AKS is running, but does not serve live traffic (except health probes).
Azure Front Door or Traffic Manager is configured to send all production traffic to North India.
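Failover in this design hinges on those health probes. Below is a minimal sketch of a probe endpoint the routing layer could call, assuming a Spring Boot service; the /healthz path and the dependency check are illustrative, not part of the original setup.
```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HealthProbeController {

    // Azure Front Door / Traffic Manager probes this path; a non-2xx answer
    // marks the region unhealthy and triggers the failover routing rule.
    @GetMapping("/healthz")
    public ResponseEntity<String> health() {
        boolean dependenciesUp = checkKafkaAndDatabase(); // hypothetical check
        return dependenciesUp
                ? ResponseEntity.ok("UP")
                : ResponseEntity.status(503).body("DOWN");
    }

    private boolean checkKafkaAndDatabase() {
        // In a real service this would ping the Kafka brokers and the primary DB.
        return true;
    }
}
```
Front Door (or Traffic Manager) is configured in priority mode, so West India only starts receiving traffic when this endpoint in North India stops answering with 2xx.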
📍 2. Kafka Replication (Active → Passive)
Kafka is deployed in North India only for active processing.
Kafka MirrorMaker 2 replicates all topics and partitions asynchronously to the West India Kafka cluster.
West India Kafka does not accept writes — it mirrors only.
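To sanity-check that mirroring is actually flowing, the passive cluster can be inspected with Kafka's AdminClient. This is a small sketch under the assumption that MirrorMaker 2 runs with its default replication policy, which prefixes mirrored topics with the source cluster alias (for example north.payment-events); the bootstrap address, alias, and topic names are placeholders.
```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

import java.util.Properties;
import java.util.Set;

public class MirrorCheck {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Bootstrap address of the passive (West India) cluster -- placeholder value
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-west.example.internal:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            Set<String> topics = admin.listTopics().names().get();
            // With MM2's DefaultReplicationPolicy, mirrored topics carry the
            // source cluster alias as a prefix, e.g. "north.payment-events".
            topics.stream()
                  .filter(name -> name.startsWith("north."))
                  .forEach(name -> System.out.println("Mirrored topic present: " + name));
        }
    }
}
```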
📍 3. Database Replication
Azure SQL or Cosmos DB is geo-replicated:
Primary write replica in North India.
Read-only replicas in West India.
No writes happen in West India unless a manual failover is triggered.
📍 4. Configuration and Secrets
App configs and secrets (Azure App Configuration, Azure Key Vault) are synced across regions using CI/CD or GitOps pipelines.
Both clusters have the same app code and infra, but only North India is receiving traffic.
📍 5. Monitoring and Observability
Observability stack (Prometheus, Grafana, ELK) is deployed in both regions, but only North India is active.
Metrics are pushed from North India; West India components remain in hot standby.
🚨 6. Failure Scenario: North India Goes Down
Azure Front Door detects North India failure via probe.
Manual or automated failover process is initiated:
West India becomes active:
Kafka in West India becomes the primary producer/consumer cluster.
Azure SQL failover is triggered to promote West India DB as primary.
User requests now route to West India AKS.
All microservices start processing transactions using the now-active West India cluster.
🧠 Key Differences from Active-Active
Feature | Active-Active | Active-Passive |
Traffic distribution | Both regions live | One region handles 100% traffic |
Failover type | Seamless, health-based (auto) | Manual or semi-automated |
Data sync | Bi-directional Kafka, DB writes both | One-way Kafka/DB sync (active → passive) |
Resource utilization | High (both active) | Lower (one idle) |
Recovery time | Instant (seconds) | Minutes (depends on automation) |
Cost | Higher | Lower |
🧱 Azure Services for Active-Passive
Layer | Service | Function |
Traffic Mgmt | Azure Front Door / Traffic Manager | Failover routing |
Message Bus | Kafka + MirrorMaker 2 | Topic replication from active → passive |
Database | Azure SQL Geo-Replication | Standby read-replica, failover when needed |
Config Mgmt | Azure App Config + GitOps | Keep apps synced across regions |
Secrets | Azure Key Vault | Synced or staged across regions |
✅ Summary of Flow (Text Format)
User 1 → Azure Front Door → North India AKS → Kafka (North) → DB (North)
↓
MirrorMaker2 → West India Kafka (read-only)
DB Geo-Replication → West India DB (read-only)
On failure:
Azure Front Door → West India AKS → West India Kafka (now primary) → DB (failover to primary)
Active-Passive Setup - Scenario
===
Let’s walk through what happens in an Active-Passive setup for your scenario. Here’s a detailed impact and flow breakdown:
🧾 Scenario Recap:
User 1 is doing a transaction in North India (active region)
User 5 is doing a transaction in West India (passive region)
Suddenly, North India region fails
❓What Happens to User 1?
✅ Before Failure:
User 1’s request goes to North India AKS, hits:
Spring Boot microservices
Kafka (North India brokers)
Azure SQL (North India primary DB)
❌ When Failure Happens:
In-progress transaction may:
Fail temporarily if not completed before the region crash.
Be partially committed (needs retry or reconciliation).
Be lost if not yet replicated (depends on DB/Kafka sync delay).
Recovery:
After manual failover, User 1 will be routed to West India, where:
The latest Kafka topic state (replicated via MirrorMaker2) is available.
The database (Azure SQL) is promoted as primary in West India.
❓What Happens to User 5?
⚠️ Important Clarification:
In Active-Passive, only the active region handles traffic.
So under normal operations, User 5 should not have been served by the passive region.
🟡 Possibilities:
If West India is truly passive:
User 5's request should be rejected or rerouted by Azure Front Door.
If User 5 reached West India AKS somehow (e.g., misconfiguration):
The app might have been running but Kafka and DB were in read-only state, leading to:
Errors
Incomplete processing
Logging without persistence
🔁 Failover Process Summary
Step | Description |
1. Region failure | North India becomes unresponsive |
2. Health check triggers | Azure Front Door/Traffic Manager detects failure |
3. Reroute traffic | All traffic now routed to West India |
4. DB failover | Azure SQL read replica in West India is promoted to read-write primary |
5. Kafka role switch | Kafka in West India is activated (if prepared) or takes over as primary |
6. Users resume | Both User 1 and User 5 now operate from West India |
⚠️ Design Considerations for Banking (Active-Passive)
Aspect | Recommendation |
User Stickiness | Use session affinity + re-auth if region changes |
Idempotency | Ensure APIs and Kafka consumers can retry safely (see the sketch after this table) |
Data Sync | Keep replication delay minimal (seconds) for Kafka and DB |
DR Drills | Run quarterly simulations to test seamless switch |
Failure Recovery | Queue uncommitted requests, reconcile after failover |
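To make the idempotency row concrete, here is a minimal sketch of a Kafka consumer that records each event ID before applying the business update, assuming Spring Kafka, Spring JDBC, and a processed_events table with a unique key on event_id; the topic, group, and table names are illustrative.
```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class PaymentEventHandler {

    private final JdbcTemplate jdbc;

    public PaymentEventHandler(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Topic, group, and table names are illustrative.
    @KafkaListener(topics = "payment-events", groupId = "payments")
    @Transactional
    public void onEvent(ConsumerRecord<String, String> record) {
        try {
            // processed_events has a unique constraint on event_id, so a
            // replayed message fails this insert instead of being applied twice.
            jdbc.update("INSERT INTO processed_events (event_id) VALUES (?)", record.key());
        } catch (DuplicateKeyException alreadyProcessed) {
            return; // duplicate delivery after failover -- safe to ignore
        }
        // Apply the actual business update here, inside the same transaction,
        // so the dedupe record and the state change commit (or roll back) together.
    }
}
```
Because the dedupe insert and the business update share one transaction, a message replayed after failover is either recognized as already processed or applied exactly once.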
✅ Final Outcome
User | Initial Region | Impact After Failure | Status Post-Failover |
User 1 | North India | Interrupted mid-transaction | Routed to West, retries resume |
User 5 | West India | Likely rejected (if truly passive) | Served from West India (now active) |
In active-passive, User 1 completed a transaction in the active region (North India) and West India is passive. North India suffers an outage and User 1 connects again, so User 1 is rerouted to the passive region. But what if the passive region has no Kafka topics replicated and only the DB is synced?
Excellent question — this goes to the heart of a critical design challenge in active-passive systems. Let’s unpack it clearly:
🔁 Scenario Breakdown
User 1 did a transaction in North India (Active).
Kafka events related to that transaction were only written in North India.
Azure SQL (or other DB) was replicated to West India using geo-replication.
North India goes down.
User 1 reconnects → Azure Front Door routes to West India (Passive).
❓ What Happens if Kafka Topics Are Not Replicated to West India?
🔻 Result:
User 1’s transaction events (Kafka-based) are not available in West India.
Services relying on event-driven data (e.g., fraud detection, downstream updates, audit logging) won’t have those events.
The database has the updated state (e.g., account/loan updated), but Kafka consumers and workflows in West India may behave inconsistently or restart from stale offsets.
🔎 Why This Is a Problem in Banking:
Risk | Impact |
Incomplete event history | Audit trails, fraud detection fail |
Downstream async services break | e.g., notification, KYC processing, scoring |
Kafka consumers in passive region | See a gap in topic partitions |
Data loss on event side | Even if DB is consistent, workflows break |
✅ Best Practice: ALWAYS REPLICATE KAFKA TOPICS Across Regions — Even in Active-Passive
To avoid this issue:
➤ Use Kafka MirrorMaker 2 or Confluent Replicator to:
Continuously mirror topics from North (Active) to West (Passive).
Keep Kafka offsets, events, and partitions in sync.
This ensures that when North goes down, the West region’s Kafka has all recent events, and services in West can resume seamlessly.
✔️ Updated Flow (with Kafka replication):
User 1 → North India AKS → Kafka (North) + DB (North)
↘ MirrorMaker2 → Kafka (West)
↘ Azure SQL Geo-Replication → DB (West)
North India goes down
User 1 reconnects → Azure Front Door → West India
→ Kafka (West) has all events
→ DB (West) promoted to read-write
→ Application continues with consistent state + event trail
🧠 Summary
Component | Without Kafka Replication | With Kafka Replication |
Kafka topic state | Lost in West India | Fully replicated, ready to consume |
Event workflows | Broken, inconsistent | Resume normally |
Audit/fraud logs | Missing gaps | Complete |
User experience | Retry fails or acts on old data | Retry succeeds with accurate state |
✅ Recommendation
Even in active-passive, Kafka topics must be replicated using:
MirrorMaker2 (open-source Kafka)
Confluent Replicator (Confluent Platform)
And make sure:
Kafka consumers in passive region are ready but idle.
After failover, Kafka brokers in passive region become primary, and consumers start immediately.
✅ Keep Kafka consumers in the passive region “ready” but not actively consuming, so that they can immediately take over when the active region fails.
🎯 Goal
Deploy Kafka consumers in the passive region but keep them idle (not consuming messages) until failover is triggered.
✅ Strategies to Keep Consumers Ready but Idle
1. Start Consumers with Paused State
Kafka consumers (in Spring Boot, Java, Python, etc.) can be started and immediately paused using:
consumer.pause(consumer.assignment());
They stay connected to the Kafka cluster but do not fetch records until resumed.
On failover:
consumer.resume(consumer.assignment());
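If the consumers are Spring Kafka listeners rather than hand-rolled clients, the same pause/resume pattern can be driven through the KafkaListenerEndpointRegistry. A sketch, assuming a listener registered with id "txn-listener"; how the failover signal reaches these methods (ops endpoint, config refresh, etc.) is up to your design.
```java
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.stereotype.Component;

@Component
public class RegionFailoverSwitch {

    private final KafkaListenerEndpointRegistry registry;

    public RegionFailoverSwitch(KafkaListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    // Called at startup in the passive region to hold the listener idle.
    public void standBy() {
        MessageListenerContainer container = registry.getListenerContainer("txn-listener");
        if (container != null) {
            container.pause();   // stay connected to the cluster, but stop fetching records
        }
    }

    // Invoked by whatever signals failover (ops runbook endpoint, config refresh, etc.).
    public void activatePassiveRegion() {
        MessageListenerContainer container = registry.getListenerContainer("txn-listener");
        if (container != null) {
            container.resume();  // consumers were deployed paused; start fetching now
        }
    }
}
```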
2. Use Feature Flags or Config Switch
Control activation via Spring Cloud Config, Azure App Configuration, or custom flag in your CI/CD pipeline.
For example:
consumer.enabled: false # In passive region
Consumers check this flag before starting polling.
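With Spring Kafka, the flag can be wired straight into the listener's autoStartup attribute, so the passive region runs the same code but the listener container never starts polling. A sketch assuming the consumer.enabled property above; the listener id and topic are illustrative.
```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class TransactionListener {

    // consumer.enabled=false in the passive region keeps this listener container
    // from ever starting to poll; switching the property to true (as part of
    // failover) activates it.
    @KafkaListener(
            id = "txn-listener",
            topics = "payment-events",
            autoStartup = "${consumer.enabled:false}")
    public void onMessage(String payload) {
        // process the transaction event
    }
}
```
After failover, the container can be started through the KafkaListenerEndpointRegistry shown earlier, or by redeploying the pods with consumer.enabled=true.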
3. Assign a Consumer Group but Do Not Subscribe
Initialize consumers with a group ID but do not assign or subscribe to any topics.
// Do not call consumer.subscribe(...) until failover
On failover, the app subscribes to relevant topics and starts consuming from the latest or committed offset.
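A plain-Java sketch of this strategy: the consumer is created with its group ID up front, but subscribe() is only called once a failover signal arrives. The flag file, bootstrap address, group, and topic below are placeholders; any toggle (for example the ACTIVE_REGION switch in the next strategy) would work the same way.
```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class StandbyConsumer {

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-west.example.internal:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payments-consumer-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Failover signal: a flag file dropped (or mounted via ConfigMap) when
        // this region is promoted to active -- the path is illustrative.
        Path failoverFlag = Path.of("/config/region-active");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Created with a group ID, but no subscribe() yet, so the standby stays idle.
            while (!Files.exists(failoverFlag)) {
                Thread.sleep(5_000);
            }
            // Failover: subscribe now and continue from committed/replicated offsets.
            consumer.subscribe(List.of("payment-events"));
            while (true) {
                consumer.poll(Duration.ofMillis(500))
                        .forEach(r -> System.out.println("Processing event " + r.key()));
            }
        }
    }
}
```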
4. Run Passive Cluster in “Readiness Mode”
In Kubernetes, deploy consumer pods but:
Set replicas = 0, or
Use a readiness/startup probe to hold them in a non-ready state
Use an environment variable like ACTIVE_REGION=true/false to toggle behavior
5. Control Partition Assignment in MirrorMaker2
Ensure that MirrorMaker2 keeps topics (and consumer offsets, via checkpoints) in sync, but do not let passive-region consumer groups join and take partition ownership.
Only after failover do you allow consumer groups in the passive region to take over partitions.
🛡️ Additional Tips for Safe Failover
Practice | Why It Matters |
Ensure idempotent consumers | Replaying messages after failover won't cause duplicates |
Commit offsets atomically | Avoid partial commits that cause re-processing |
Health-check Kafka topics | Verify topic replication status before switching |
Monitor lag in passive region | Helps ensure passive Kafka is fully synced |
🔄 Failover Activation Process (Simplified)
Active region fails
Azure Front Door reroutes traffic
Passive consumers:
Resume polling (or start)
Subscribe to topics
Begin consuming messages
Offsets and partitions are picked up from replicated state
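For that last step, MirrorMaker 2's checkpoints let the newly active region translate the consumer group's committed offsets from the failed cluster into local positions. A sketch using RemoteClusterUtils from Kafka's connect-mirror-client module; the bootstrap address, the "north" cluster alias, and the group ID are placeholders.
```java
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class FailoverOffsetTranslator {

    // Returns the consumer group's offsets, translated from the failed North
    // cluster's checkpoints into positions that are valid on the West cluster.
    public static Map<TopicPartition, OffsetAndMetadata> offsetsOnWest() throws Exception {
        Map<String, Object> props = new HashMap<>();
        // Connect to the surviving West cluster, where MM2 wrote its checkpoints.
        props.put("bootstrap.servers", "kafka-west.example.internal:9092"); // placeholder

        // "north" is the source-cluster alias configured in MirrorMaker 2;
        // the group ID below is a placeholder.
        return RemoteClusterUtils.translateOffsets(
                props, "north", "payments-consumer-group", Duration.ofSeconds(30));
    }
}
```
The returned map can be fed to consumer.seek(...) (or committed) before the West-region consumers begin polling.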
✅ Summary
Technique | Result |
consumer.pause() on startup | Ready but idle |
Feature flags / config switch | Controlled start/stop |
No subscription until failover | No consumption activity |
Deployment gating in AKS | Controlled pod behavior |