
Active-Passive Setup Scenario

  • Writer: Anand Nerurkar
  • May 16
  • 7 min read

In an active-passive setup, the replication flow and failover mechanism are quite different from active-active. Here is how it works, step by step, for a banking platform deployed on Azure AKS with Kafka, databases, and observability.


🔄 Replication Flow: Active-Passive Setup

🧍 Scenario:

  • User 1 is served by the active region (North India).

  • West India is configured as a standby/passive region.

  • Only one region handles traffic at a time.

  • If North India fails, the West India region is activated.

📍 1. Request Routing (Before Failure)

  • All user requests are routed to the North India AKS cluster (active).

  • West India AKS is running, but does not serve live traffic (except health probes).

  • Azure Front Door or Traffic Manager is configured to send all production traffic to North India.
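
One way to realize this routing is Azure Traffic Manager in Priority mode: all traffic goes to the priority-1 endpoint while it is healthy, and fails over to priority 2 otherwise. A hedged CLI sketch (profile, resource group, and endpoint names are placeholders, not from the original platform):

```shell
# Create a Priority-routed Traffic Manager profile
az network traffic-manager profile create \
  --name bank-tm --resource-group bank-rg \
  --routing-method Priority --unique-dns-name bank-tm-demo

# North India: primary endpoint (active)
az network traffic-manager endpoint create \
  --profile-name bank-tm --resource-group bank-rg \
  --name north-aks --type externalEndpoints \
  --target northindia.bank.example.com --priority 1

# West India: standby endpoint (passive)
az network traffic-manager endpoint create \
  --profile-name bank-tm --resource-group bank-rg \
  --name west-aks --type externalEndpoints \
  --target westindia.bank.example.com --priority 2
```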

📍 2. Kafka Replication (Active → Passive)

  • Kafka in North India handles all active processing.

  • Kafka MirrorMaker 2 replicates all topics and partitions asynchronously to the West India Kafka cluster.

  • West India Kafka does not accept writes — it mirrors only.
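
A minimal MirrorMaker 2 configuration for this one-way mirroring might look like the following (cluster aliases and bootstrap addresses are illustrative):

```properties
# mm2.properties: one-way topic mirroring, north -> west
clusters = north, west
north.bootstrap.servers = kafka-north.bank.internal:9092
west.bootstrap.servers = kafka-west.bank.internal:9092

# Replicate only from active to passive
north->west.enabled = true
north->west.topics = .*
west->north.enabled = false

# Keep consumer group offsets and checkpoints in sync so passive
# consumers can resume from the right position after failover
sync.group.offsets.enabled = true
emit.checkpoints.enabled = true
```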


📍 3. Database Replication

  • Azure SQL or Cosmos DB is geo-replicated:

    • Primary write replica in North India.

    • Read-only replicas in West India.

  • No writes happen in West India unless a manual failover is triggered.
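
With Azure SQL, this pattern maps naturally onto a failover group; a hedged CLI sketch (server, group, and database names are placeholders):

```shell
# Pair the North (primary) and West (secondary) servers in a failover group
az sql failover-group create \
  --name bank-fg --resource-group bank-rg \
  --server sql-north --partner-server sql-west \
  --add-db bankdb

# Manual failover: promote West to read-write primary
az sql failover-group set-primary \
  --name bank-fg --resource-group bank-rg --server sql-west
```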

📍 4. Configuration and Secrets

  • App configs and secrets (Azure App Configuration, Azure Key Vault) are synced across regions using CI/CD or GitOps pipelines.

  • Both clusters run the same app code and infrastructure, but only North India receives traffic.

📍 5. Monitoring and Observability

  • Observability stack (Prometheus, Grafana, ELK) is deployed in both regions, but only North India is active.

  • Metrics are pushed from North India; West India components remain in hot standby.

🚨 6. Failure Scenario: North India Goes Down

  1. Azure Front Door detects North India failure via probe.

  2. Manual or automated failover process is initiated:

    • West India becomes active:

      • Kafka in West India becomes the primary producer/consumer cluster.

      • Azure SQL failover is triggered to promote West India DB as primary.

  3. User requests now route to West India AKS.

  4. All microservices start processing transactions using the now-active West India cluster.

🧠 Key Differences from Active-Active

| Feature | Active-Active | Active-Passive |
|---|---|---|
| Traffic distribution | Both regions live | One region handles 100% of traffic |
| Failover type | Seamless, health-based (automatic) | Manual or semi-automated |
| Data sync | Bi-directional Kafka, DB writes in both | One-way Kafka/DB sync (active → passive) |
| Resource utilization | High (both active) | Lower (one region idle) |
| Recovery time | Near-instant (seconds) | Minutes (depends on automation) |
| Cost | Higher | Lower |

🧱 Azure Services for Active-Passive

| Layer | Service | Function |
|---|---|---|
| Traffic Mgmt | Azure Front Door / Traffic Manager | Failover routing |
| Message Bus | Kafka + MirrorMaker 2 | Topic replication from active → passive |
| Database | Azure SQL Geo-Replication | Standby read replica, failover when needed |
| Config Mgmt | Azure App Configuration + GitOps | Keep apps synced across regions |
| Secrets | Azure Key Vault | Synced or staged across regions |

✅ Summary of Flow (Text Format)

```
User 1 → Azure Front Door → North India AKS → Kafka (North) → DB (North)
         MirrorMaker 2 → West India Kafka (read-only)
         DB Geo-Replication → West India DB (read-only)

On failure:
Azure Front Door → West India AKS → West India Kafka (now primary) → DB (failover to primary)
```


Active-Passive Setup - Scenario

===


Let’s walk through what happens in an active-passive setup for this scenario. Here’s a detailed impact and flow breakdown:

🧾 Scenario Recap:

  • User 1 is doing a transaction in North India (active region)

  • User 5 is doing a transaction in West India (passive region)

  • Suddenly, North India region fails

❓What Happens to User 1?

✅ Before Failure:

  • User 1’s request goes to North India AKS and hits:

    • Spring Boot microservices

    • Kafka (North India brokers)

    • Azure SQL (North India primary DB)

❌ When Failure Happens:

  • An in-progress transaction may:

    • Fail if it did not complete before the region crashed.

    • Be partially committed (requiring retry or reconciliation).

    • Be lost if not yet replicated (depending on DB/Kafka sync delay).

Recovery:

  • After manual failover, User 1 is routed to West India, where:

    • The latest Kafka topic state (replicated via MirrorMaker2) is available.

    • The database (Azure SQL) is promoted as primary in West India.

❓What Happens to User 5?

⚠️ Important Clarification:

  • In Active-Passive, only the active region handles traffic.

  • So under normal operations, User 5 should not have been served by the passive region.

🟡 Possibilities:

  • If West India is truly passive:

    • User 5's request should be rejected or rerouted by Azure Front Door.

  • If User 5 reached West India AKS somehow (e.g., misconfiguration):

    • The app might have been running but Kafka and DB were in read-only state, leading to:

      • Errors

      • Incomplete processing

      • Logging without persistence


🔁 Failover Process Summary

| Step | Description |
|---|---|
| 1. Region failure | North India becomes unresponsive |
| 2. Health check triggers | Azure Front Door / Traffic Manager detects the failure |
| 3. Reroute traffic | All traffic is now routed to West India |
| 4. DB failover | The Azure SQL read replica in West India is promoted to read-write primary |
| 5. Kafka role switch | Kafka in West India is activated (if prepared) or takes over as primary |
| 6. Users resume | Both User 1 and User 5 now operate from West India |

⚠️ Design Considerations for Banking (Active-Passive)

Aspect

Recommendation

User Stickiness

Use session affinity + re-auth if region changes

Idempotency

Ensure APIs and Kafka consumers can retry safely

Data Sync

Keep replication delay minimal (seconds) for Kafka and DB

DR Drills

Run quarterly simulations to test seamless switch

Failure Recovery

Queue uncommitted requests, reconcile after failover
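
The idempotency recommendation above can be sketched in plain Java: the handler remembers processed transaction IDs, so replaying the same Kafka event after failover is a no-op. A minimal sketch (the class and ID scheme are illustrative, not from the original platform):

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch of an idempotent transaction handler: replaying the same
 * event after failover must not double-apply the amount.
 */
public class IdempotentLedger {
    private final Set<String> processedTxnIds = new HashSet<>();
    private long balancePaise = 0;

    /** Applies the transaction once; duplicates are skipped. Returns true if applied. */
    public boolean apply(String txnId, long amountPaise) {
        if (!processedTxnIds.add(txnId)) {
            return false; // already processed: a replay after failover is safe
        }
        balancePaise += amountPaise;
        return true;
    }

    public long balancePaise() {
        return balancePaise;
    }
}
```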

✅ Final Outcome

| User | Initial Region | Impact After Failure | Status Post-Failover |
|---|---|---|---|
| User 1 | North India | Interrupted mid-transaction | Routed to West India; retries resume |
| User 5 | West India | Likely rejected (if region was truly passive) | Served by West India (now active) |



Question: In active-passive, User 1 completed a transaction in the active region (North India) and West India is passive. North India goes into outage and User 1 connects again, so User 1 is rerouted to the passive region. But what if no Kafka topics were replicated to the passive region, and only the DB was synced?


Excellent question — this goes to the heart of a critical design challenge in active-passive systems. Let’s unpack it clearly:

🔁 Scenario Breakdown

  • User 1 did a transaction in North India (active).

  • Kafka events for that transaction were written only in North India.

  • Azure SQL (or another DB) was geo-replicated to West India.

  • North India goes down.

  • User 1 reconnects → Azure Front Door routes to West India (passive).

❓ What Happens if Kafka Topics Are Not Replicated to West India?

🔻 Result:

  • User 1’s transaction events (Kafka-based) are not available in West India.

  • Services relying on event-driven data (e.g., fraud detection, downstream updates, audit logging) won’t have those events.

  • The database has the updated state (e.g., account/loan updated), but Kafka consumers and workflows in West India may behave inconsistently or restart from stale offsets.

🔎 Why This Is a Problem in Banking:

| Risk | Impact |
|---|---|
| Incomplete event history | Audit trails and fraud detection fail |
| Downstream async services break | e.g., notification, KYC processing, scoring |
| Kafka consumers in the passive region | See a gap in topic partitions |
| Data loss on the event side | Even if the DB is consistent, workflows break |

✅ Best Practice: ALWAYS REPLICATE KAFKA TOPICS Across Regions — Even in Active-Passive

To avoid this issue:

Use Kafka MirrorMaker 2 or Confluent Replicator to:

  • Continuously mirror topics from North (Active) to West (Passive).

  • Keep Kafka offsets, events, and partitions in sync.

This ensures that when North goes down, the West region’s Kafka has all recent events, and services in West can resume seamlessly.

✔️ Updated Flow (with Kafka replication):

```
User 1 → North India AKS → Kafka (North) + DB (North)
           ↘ MirrorMaker 2 → Kafka (West)          [topic replication]
           ↘ Azure SQL Geo-Replication → DB (West) [read-only replica]

North India goes down

User 1 reconnects → Azure Front Door → West India
  → Kafka (West) has all events
  → DB (West) promoted to read-write
  → Application continues with a consistent state and event trail
```


🧠 Summary

| Component | Without Kafka Replication | With Kafka Replication |
|---|---|---|
| Kafka topic state | Lost in West India | Fully replicated, ready to consume |
| Event workflows | Broken, inconsistent | Resume normally |
| Audit/fraud logs | Gaps in the record | Complete |
| User experience | Retry fails or acts on stale data | Retry succeeds with accurate state |

✅ Recommendation

Even in active-passive, Kafka topics must be replicated using:

  • MirrorMaker2 (open-source Kafka)

  • Confluent Replicator (Confluent Platform)

And make sure:

  • Kafka consumers in passive region are ready but idle.

  • After failover, Kafka brokers in passive region become primary, and consumers start immediately.


Keep Kafka consumers in the passive region “ready” but not actively consuming, ✅ so that they can take over immediately when the active region fails.

🎯 Goal

Deploy Kafka consumers in the passive region but keep them idle (not consuming messages) until failover is triggered.

✅ Strategies to Keep Consumers Ready but Idle

1. Start Consumers in a Paused State

  • Kafka consumers (in Spring Boot, Java, Python, etc.) can be started and immediately paused:

```java
consumer.pause(consumer.assignment());
```

  • They stay connected to the Kafka cluster but do not fetch records until resumed.

  • On failover:

```java
consumer.resume(consumer.assignment());
```

2. Use Feature Flags or Config Switch

  • Control activation via Spring Cloud Config, Azure App Configuration, or custom flag in your CI/CD pipeline.

  • For example:

```yaml
consumer.enabled: false  # in the passive region
```

  • Consumers check this flag before they start polling.

3. Assign a Consumer Group but Do Not Subscribe

  • Initialize consumers with a group ID but do not assign or subscribe to any topics.

```java
// Do not call consumer.subscribe(...) until failover
```

  • On failover, the app subscribes to the relevant topics and starts consuming from the latest or last-committed offset.

4. Run Passive Cluster in “Readiness Mode”

  • In Kubernetes, deploy the consumer pods but:

    • Set replicas: 0, or

    • Use a readiness/startup probe to hold them in a non-ready state

    • Use an environment variable like ACTIVE_REGION=true/false to toggle behavior
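
As a sketch, the readiness-mode deployment for the passive region might look like this (names, image, and the ACTIVE_REGION convention are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: txn-consumer
  namespace: banking
spec:
  replicas: 0            # scaled to 0 in the passive region; scale up on failover
  selector:
    matchLabels:
      app: txn-consumer
  template:
    metadata:
      labels:
        app: txn-consumer
    spec:
      containers:
        - name: txn-consumer
          image: bankacr.azurecr.io/txn-consumer:1.0
          env:
            - name: ACTIVE_REGION
              value: "false"   # app checks this before subscribing to topics
```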

5. Control Partition Assignment in MirrorMaker2

  • Ensure that MirrorMaker2 syncs topics but does not auto-assign partition ownership in passive consumers.

  • Only after failover do you allow consumer groups in the passive region to take over partitions.

🛡️ Additional Tips for Safe Failover

| Practice | Why It Matters |
|---|---|
| Ensure idempotent consumers | Replaying messages after failover won't cause duplicates |
| Commit offsets atomically | Avoids partial commits that cause re-processing |
| Health-check Kafka topics | Verify topic replication status before switching |
| Monitor lag in the passive region | Helps ensure the passive Kafka cluster is fully synced |
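
Replication lag in the passive region can be checked with standard Kafka tooling; for example (broker address and group name are placeholders):

```shell
# Show current offset, log-end offset, and LAG per partition for the
# mirrored consumer group in the passive (West) cluster
bin/kafka-consumer-groups.sh \
  --bootstrap-server kafka-west.bank.internal:9092 \
  --describe --group txn-processor
```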

🔄 Failover Activation Process (Simplified)

  1. Active region fails

  2. Azure Front Door reroutes traffic

  3. Passive consumers:

    • Resume polling (or start)

    • Subscribe to topics

    • Begin consuming messages

  4. Offsets and partitions are picked up from replicated state

✅ Summary

| Technique | Result |
|---|---|
| consumer.pause() on startup | Ready but idle |
| Feature flags / config switch | Controlled start/stop |
| No subscription until failover | No consumption activity |
| Deployment gating in AKS | Controlled pod behavior |

