Modernization Around Core CBS
- Anand Nerurkar
- Feb 18
- 17 min read
Updated: Feb 26
🏦 Current State (Typical Bank Using Finacle Modules)
Traditional setup:
Channel → Finacle LOS → Finacle LMS → Finacle CBS
Where:
LOS = Origination
LMS = Loan servicing
CBS = Ledger & accounting
Problem:
Tight vendor coupling
Limited agility
Slow product innovation
Heavy customization risk
Upgrade pain
🎯 Target: Digital Lending Modernization Around Core
You introduce a Digital Lending Platform (DLP) layer.
Mobile/Web
↓
Digital Lending Platform (Your Layer)
↓
LOS / LMS / CBS (as transactional engines)
This layer becomes:
✔ Journey orchestrator
✔ Product configurator
✔ Rules engine
✔ Eligibility engine
✔ Document workflow engine
✔ API façade
LOS/LMS/CBS become back-end processors.
🧠 What Should Your Digital Layer Own?
1️⃣ Customer Journey Orchestration
Instead of:
LOS controlling workflow
You control:
Step progression
Dynamic form rendering
Pre-approved offers
Consent management
Multi-product bundling
LOS only gets:
Final application payload
2️⃣ Credit Decisioning (Externalized)
Instead of:
Hardcoded LOS rules
You introduce:
Independent decision engine
Real-time bureau calls
Alternative data scoring
Risk-based pricing
This gives agility.
3️⃣ Product Configuration Outside CBS
Rather than:
Finacle product tables driving everything
You maintain:
Product catalog service
Rate computation service
Fee computation service
CBS just books:
Loan account
Schedule
Accounting entries
4️⃣ Event-Driven Loan Lifecycle
Instead of tight LOS-LMS coupling:
Loan Approved → Event
Loan Disbursed → Event
EMI Paid → Event
Delinquent → Event
Your digital layer subscribes and reacts:
Push notifications
Cross-sell triggers
Collection workflow updates
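The subscribe-and-react pattern above can be sketched as a tiny in-process dispatcher. This is illustrative only: in production the handlers would be Kafka consumers in separate services, and the handler names here are hypothetical.

```python
# Minimal event-dispatch sketch: digital-layer handlers register for
# lifecycle events and react independently of LOS/LMS.
HANDLERS = {}

def on(event_type):
    """Decorator registering a handler for one event type."""
    def register(fn):
        HANDLERS.setdefault(event_type, []).append(fn)
        return fn
    return register

@on("EMI_PAID")
def trigger_cross_sell(event):
    # Hypothetical reaction: evaluate a cross-sell offer
    return f"cross-sell evaluation for customer {event['customer_id']}"

@on("DELINQUENT")
def update_collections(event):
    # Hypothetical reaction: push the loan into a collections workflow
    return f"collections workflow update for loan {event['loan_id']}"

def dispatch(event):
    """Fan the event out to every registered handler."""
    return [handler(event) for handler in HANDLERS.get(event["type"], [])]
```

The key property is that LOS/LMS only emit the events; adding a new reaction touches the digital layer alone.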
🧩 Architecture Blueprint
Channels
↓
API Gateway
↓
Digital Lending Microservices
- Journey Service
- Eligibility Service
- Pricing Engine
- Document Service
- Offer Service
↓
Integration Layer
↓
LOS / LMS / CBS
CBS remains:
✔ Ledger
✔ GL posting
✔ Regulatory engine
LOS/LMS become:
✔ Transaction processors
✔ Data persistence engines
🚨 But Here’s the Critical Design Principle
Do NOT:
Rebuild accounting logic
Recalculate EMI independently of CBS
Duplicate amortization logic
Create shadow balances
Otherwise you create a reconciliation nightmare.
Digital layer must be:
Orchestration + intelligence, NOT the financial book of record.
🔥 Why This Approach Is Powerful
1️⃣ Vendor Independence
If tomorrow you:
Replace Finacle LMS
Introduce new lending core
Only integration layer changes.
Channels remain untouched.
2️⃣ Faster Product Launch
Want to launch:
BNPL
Instant top-up
Co-lending product
You configure it in digital layer.
LOS just receives structured booking instruction.
3️⃣ Scalable Digital Traffic
Heavy:
Pre-eligibility checks
Simulations
EMI calculators
Should NOT hit CBS.
Digital layer handles it.
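Those simulations are pure arithmetic, so they never need to touch CBS. A minimal sketch of the standard reducing-balance EMI formula (illustrative; Finacle's own schedule computation may differ in rounding and day-count conventions):

```python
def emi(principal, annual_rate_pct, months):
    """Standard reducing-balance EMI: P * r * (1+r)^n / ((1+r)^n - 1),
    where r is the monthly rate. Served from the digital layer,
    never from the core ledger."""
    r = annual_rate_pct / 12 / 100
    if r == 0:
        return principal / months          # interest-free edge case
    factor = (1 + r) ** months
    return principal * r * factor / (factor - 1)

# e.g. emi(100000, 12, 12) is roughly 8884.88 per month
```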
⚖ Trade-Offs (Important for CTO Discussion)
| Benefit | Risk |
| --- | --- |
| Agility | Increased architecture complexity |
| Decoupling | Need strong governance |
| Innovation speed | Requires strong integration discipline |
| Vendor flexibility | Must avoid duplication of core logic |
🎯 Strategic Positioning
“Can we modernize digital lending around LOS/LMS/CBS instead of using them end-to-end?”
“Yes. We can introduce a digital lending abstraction layer that owns journey orchestration, decisioning, and product configuration, while LOS/LMS/CBS remain transactional engines. This enables agility and vendor decoupling without compromising ledger integrity.”
That sounds enterprise-ready.
🏦 High-Level Architecture Vision
Core principle:
Digital layer owns journey + intelligence.
Core banking owns accounting + regulatory ledger.
Using:
Finacle / TCS BaNCS (CBS – account booking)
Fenergo (KYC / onboarding)
Actimize (Fraud / AML)
🧱 1️⃣ Macro Architecture
[ Mobile / Web App ]
↓
[ API Gateway ]
↓
[ Digital Lending Platform - Microservices Layer ]
↓
[ Event Bus (Kafka / MQ) ]
↓
[ Risk / ML / RegTech / CBS Integrations ]
Core components inside your abstraction layer:
Identity Service
Consent Service
Document Service
Application Service
Orchestration Service
Risk Decision Engine
Notification Service
Integration Adapter Layer
🚀 2️⃣ End-to-End Journey Flow (Event-Driven)
Let’s go step by step.
🔐 Step 1 – Login
OAuth2 / OIDC
MFA if needed
Customer profile fetch (cached, not direct CBS hit)
Event:
USER_AUTHENTICATED
📝 Step 2 – Consent Management
Customer accepts:
Data usage
Bureau pull
Income verification
AML screening
Stored in:
Consent DB (immutable store)
Event:
CONSENT_CAPTURED
Important: Consent must be auditable & tamper-proof.
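One hedged way to make the consent store tamper-evident is an append-only, hash-chained log. This is a sketch only; real deployments typically use WORM storage or a dedicated audit vault, and the field names here are invented.

```python
import hashlib
import json

class ConsentLog:
    """Append-only consent store: each entry includes the hash of the
    previous entry, so editing any record breaks verification."""
    def __init__(self):
        self.entries = []

    def append(self, customer_id, consent_type):
        prev = self.entries[-1]["hash"] if self.entries else "GENESIS"
        body = {"customer": customer_id, "consent": consent_type, "prev": prev}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "hash": digest})

    def verify(self):
        """Recompute the chain; any tampering yields False."""
        prev = "GENESIS"
        for e in self.entries:
            body = {k: e[k] for k in ("customer", "consent", "prev")}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or digest != e["hash"]:
                return False
            prev = e["hash"]
        return True
```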
📂 Step 3 – Application Initiated
Customer enters:
Loan amount
Tenure
Employer details
Income details
System generates:
Application Reference ID
Event:
APPLICATION_INITIATED
This triggers the pipeline.
⚙️ 3️⃣ Event-Driven Processing Pipeline
Now the orchestration begins.
🧾 Stage 1 – OCR & Document Processing
Input:
Uploaded salary slip / bank statement
Process:
OCR extraction
Data normalization
Confidence scoring
Event:
OCR_COMPLETED
If confidence is low → route to manual review queue
🪪 Stage 2 – KYC (via Fenergo)
Call: Fenergo API
Checks:
Identity verification
Sanction screening
PEP check
Event:
KYC_COMPLETED
If failed → APPLICATION_REJECTED
📊 Stage 3 – Credit Risk Assessment
Parallel execution:
Bureau pull
Bank internal ML model:
Credit model
Income stability model
Bank rule engine (DTI, FOIR etc.)
Event:
CREDIT_RISK_COMPLETED
Output:
Risk grade
Recommended limit
Pricing band
🛡 Stage 4 – Fraud & AML Screening
Call: Actimize
Also:
Internal fraud ML model
Device fingerprinting
Velocity checks
Event:
FRAUD_RISK_COMPLETED
AML_SCREENING_COMPLETED
If high risk → escalate to manual investigation queue
🧠 4️⃣ Decision Engine (Central Brain)
Now orchestration service aggregates:
OCR output
KYC status
Credit score
Fraud score
AML status
Internal rule evaluation
Decision logic:
If all green → APPROVED
If moderate risk → REFER
If failed → REJECTED
Event:
LOAN_APPROVED
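The aggregation step can be sketched as a simple policy function. The thresholds and grade names below are invented for illustration; a real bank would externalize them in a rule engine with versioning and explainability.

```python
def decide(kyc_passed, aml_clear, fraud_score, credit_grade):
    """Aggregate upstream signals into APPROVED / REFER / REJECTED.
    fraud_score is assumed in [0, 1]; credit_grade 'A'..'E' is a
    hypothetical internal scale."""
    if not kyc_passed or not aml_clear or fraud_score >= 0.8:
        return "REJECTED"   # hard failures short-circuit everything else
    if credit_grade in ("A", "B") and fraud_score < 0.3:
        return "APPROVED"   # all green
    return "REFER"          # moderate risk -> manual underwriting
```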
📜 5️⃣ Loan Agreement & E-Sign
System generates:
Personalized agreement PDF
Risk-based pricing
EMI schedule preview
Customer signs via:
E-sign provider
Event:
LOAN_AGREEMENT_SIGNED
Only after signed → Disbursement allowed.
🏦 6️⃣ CBS Booking (Finacle API Call)
Now your Integration Adapter Layer calls:
Finacle API:
Create customer if new
Create loan account
Create repayment schedule
Post initial disbursement
Critical:
Idempotency key = Application ID
Retry-safe
Effectively exactly-once (a retry returns the original booking, never a duplicate)
Event:
LOAN_ACCOUNT_CREATED
DISBURSEMENT_SUCCESS
CBS now becomes system of record.
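A sketch of the retry-safe booking call. The adapter class and response shape are hypothetical; the point is that the application ID doubles as the idempotency key, so a retry after a lost response cannot double-book.

```python
class CbsBookingAdapter:
    """Idempotent CBS booking sketch. In production the key store is a
    durable table checked inside the same transaction, not a dict."""
    def __init__(self):
        self._booked = {}  # idempotency key (application ID) -> booking result

    def book_loan(self, application_id, amount):
        if application_id in self._booked:
            # Retry after a lost response: return the original booking,
            # never create a second loan account.
            return self._booked[application_id]
        result = {"loan_account": f"LN-{application_id}", "disbursed": amount}
        self._booked[application_id] = result
        return result
```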
🔔 7️⃣ Notification & Downstream Updates
Events trigger:
SMS / Email
CRM update
Analytics pipeline
Collection strategy assignment
Regulatory reporting update
Event:
NOTIFICATION_SENT
🧩 8️⃣ Event Bus Orchestration Pattern
Important:
Use choreography pattern where possible.
Only use orchestration for:
Critical approval decision
CBS booking coordination
Everything else async.
🛡 9️⃣ Resilience & Control Mechanisms
Must include:
✔ Circuit breaker for CBS
✔ Timeout control for Actimize / Fenergo
✔ Retry with exponential backoff
✔ Dead letter queue
✔ Manual review workflow
✔ Reconciliation microservice
Never allow: duplicate disbursement.
📊 10️⃣ Data & Audit Layer
Maintain:
Immutable event store
Application state machine table
Full traceability (for regulator)
Risk decision explainability (model governance)
Critical for compliance.
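The application state machine table can be enforced in code. A sketch with hypothetical states: illegal transitions raise, and every legal transition lands in an audit trail for traceability.

```python
# Allowed transitions per state (hypothetical lifecycle; a real bank's
# state model would include REFER, e-sign, and disbursement sub-states).
TRANSITIONS = {
    "INITIATED": {"KYC_DONE", "REJECTED"},
    "KYC_DONE": {"APPROVED", "REFERRED", "REJECTED"},
    "APPROVED": {"SIGNED"},
    "SIGNED": {"DISBURSED"},
}

class Application:
    def __init__(self, app_id):
        self.app_id = app_id
        self.state = "INITIATED"
        self.audit = []  # (from_state, to_state) pairs, append-only

    def transition(self, new_state):
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.audit.append((self.state, new_state))
        self.state = new_state
```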
🧠 11️⃣ Separation of Concerns
| Layer | Responsibility |
| --- | --- |
| Digital Layer | Journey + Intelligence |
| RegTech (Fenergo) | Compliance |
| Risk (Actimize + ML) | Fraud & AML |
| CBS (Finacle) | Accounting |
| Event Bus | Decoupling |
| Notification | Customer communication |
🎯 Final Architecture Summary
User
↓
Digital Lending Platform
↓
Event Pipeline
↓
Risk + RegTech + ML
↓
Decision Engine
↓
Finacle CBS (Ledger)
↓
Notifications + Analytics
CBS touched only once → at booking & disbursement.
Everything else happens outside.
💎 Why This Architecture Is Strong
✔ Composable
✔ Vendor-independent
✔ Cloud scalable
✔ Upgrade safe
✔ AI-friendly
✔ Regulator auditable
✔ Future core replacement ready
1️⃣ How would you modernize the Finacle ecosystem?
Objective: Add agility, digital capabilities, and integration without disrupting the core CBS (Finacle).
Step-by-Step Approach:
Assess Core Footprint
Identify Finacle modules in use: CBS, Treasury, Loan, Deposits.
Map upstream/downstream systems: LOS, LMS, Payments, AML.
Define Bounded Contexts
Keep Finacle as system of record.
Create a Digital Lending / Digital Banking Layer as an abstraction layer.
Event-Driven Integration
Introduce Kafka / Event Bus for all domain events.
Use Outbox pattern in digital services.
Digital services subscribe to events for orchestration and analytics.
Adapter Layer
AxisCBSAdapter → abstracts Finacle APIs (REST / SOAP)
Handles protocol/message transformation, retries, throttling.
Digital Services
Microservices for loan journey: KYC, Credit, Fraud, AML, Consent, Notifications.
Store projections in Postgres / Redis / CosmosDB.
Data Platform
Raw → Curated → Analytics → Feature Store
Enable ML for credit, fraud, collections.
Hybrid Modernization Pattern
Strangler pattern: gradually shift business logic to digital layer.
Avoid touching core CBS.
2️⃣ How do you ensure event consistency in banking?
Objective: Avoid mismatched states across digital layer and core systems.
Patterns and Approach:
Outbox + Event Bus
Every state change writes to Outbox table, then published to Kafka.
Guarantees at-least-once publishing; idempotent consumers make it effectively exactly-once.
Idempotency
Every command includes idempotency key (loan ID, transaction ID).
Prevents duplicate processing in LOS/LMS/CBS.
Saga / Choreography
Use an orchestrated saga for multi-step flows (loan application → approval → account creation → disbursement).
If one step fails, compensating actions roll the completed steps back.
Acknowledgement & Retry
Adapter waits for system ACK before marking event processed.
Retries with exponential backoff, dead-letter queues for failed events.
Reconciliation
Near real-time and end-of-day reconciliation service.
Detects and resolves mismatches across Digital ↔ LOS ↔ LMS ↔ CBS.
3️⃣ How do you design for 99.99% availability?
Objective: Ensure high reliability for critical banking operations.
Step-by-Step Design:
Microservice Resilience
Stateless services with autoscaling.
Circuit breakers, bulkheads, rate limiting.
Database Layer
Multi-AZ deployments, hot standby replicas.
Use distributed cache (Redis / CosmosDB) for low-latency reads.
Message Bus
Kafka clusters with replication factor ≥3.
Multi-region setup if necessary.
Core System Adapter
Retry, idempotency, failover endpoints.
Monitoring & Alerts
SLA monitoring, anomaly detection.
Pager duty integration.
Disaster Recovery
Multi-region DR for Kafka, databases, core adapters.
Automatic failover with minimal RPO/RTO.
4️⃣ How would you architect UPI at scale?
Requirements: very high throughput (tens of thousands of TPS at peak), low latency (<1 sec), high reliability.
Step-by-Step Architecture:
API Gateway
Terminals / apps call stateless UPI gateway.
Event-Driven Transaction Processing
Transaction events → Kafka → Orchestrator → Core CBS/Ledger.
High-Throughput Ledger
Use Finacle / TCS BaNCS with an adapter layer.
Ensure atomic debit/credit.
Concurrency & Idempotency
UPI transaction ID = idempotency key.
Prevent duplicate debits.
Scaling
Horizontal scaling of gateways & microservices.
Kafka partitioning by VPA/Bank ID for throughput.
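Partitioning by VPA can be sketched as stable key hashing, which is conceptually what Kafka's default partitioner does with a record key. The partition count and hash choice below are illustrative only.

```python
import hashlib

def partition_for(vpa, num_partitions=12):
    """Stable partition choice: all events for one VPA land in the same
    partition, preserving per-payer ordering while spreading load
    across partitions."""
    digest = hashlib.md5(vpa.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```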
Real-Time Settlement
NEFT/IMPS/UPI pipelines for inter-bank settlement.
Event-driven notifications for payer/payee.
Fraud & Risk
Real-time ML scoring per transaction.
5️⃣ How do you manage cost vs performance tradeoff?
Principles:
Use Cloud Elasticity
Autoscale for peaks (e.g., EMI dates, salary days).
Scale down in off-peak.
Caching & Projection
Redis for real-time journey state.
Avoid hitting LOS/LMS/CBS for every UI request.
Batch for Non-Critical Work
EOD batch reconciliation.
Analytics pipelines in Spark / Databricks.
Prioritize SLAs
Critical flows (payments, loan approval) → synchronous, high-cost path.
Non-critical (analytics, dashboards) → asynchronous, low-cost.
6️⃣ How do you run ARB (Application Reconciliation Batch)?
Objective: Detect mismatches between digital layer and LOS/LMS/CBS.
Steps:
Query all domain events in the day.
Compare:
Loan application IDs, loan amount, status.
EMI schedule, disbursed amount, ledger entries.
Generate reconciliation report:
Exceptions flagged
Operations ticket created
Automate retries for minor issues.
Persist ARB results in audit/logging system for compliance.
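The comparison step above can be sketched as a keyed diff between the digital layer's projection and the core records. Record shapes and field names are illustrative.

```python
def reconcile(digital, cbs):
    """Compare digital-layer projections with CBS records keyed by
    application ID. Both inputs: app_id -> (status, amount).
    Returns a list of (app_id, reason) exceptions for ops ticketing."""
    exceptions = []
    for app_id, record in digital.items():
        core = cbs.get(app_id)
        if core is None:
            exceptions.append((app_id, "missing in CBS"))
        elif core != record:
            exceptions.append((app_id, f"mismatch: digital={record} cbs={core}"))
    # Records booked in CBS that the digital layer never saw
    for app_id in cbs.keys() - digital.keys():
        exceptions.append((app_id, "missing in digital layer"))
    return exceptions
```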
7️⃣ How do you balance build vs buy?
Framework:
Buy: LOS, LMS, CBS, Finacle — core transactional systems.
Build: Digital orchestration, ML scoring, event-driven pipelines, adapters.
Criteria:
Strategic differentiation → build (e.g., digital journey UX, ML models).
Commodity / stable → buy (e.g., core banking, payments clearing).
Regulatory compliance → buy unless internal expertise exists.
8️⃣ How do you handle regulatory audit scenario?
Steps:
Immutable Audit Logs
Store raw events in Data Lake raw zone.
Append-only, timestamped, versioned.
Lineage Tracking
All transformations logged: raw → curated → feature → decision.
Reconciliation Evidence
EOD / ARB reports, exception handling, SLA adherence.
Explainable ML
Keep feature + model version for each decision (RBI compliance).
Access Control & Governance
SailPoint for roles.
Fine-grained audit of who accessed or modified data.
Regulatory Reporting
Export dashboards and reports to regulator-ready formats.
✅ Summary Table for Interview
| Question | Key Answer Pillars |
| --- | --- |
| Finacle Modernization | Digital Layer + Adapters + Event-driven + Strangler pattern |
| Event Consistency | Outbox, Kafka, Saga, Idempotency, Reconciliation |
| 99.99% Availability | Resilient microservices, multi-AZ DB, Kafka replication, DR |
| UPI at Scale | API Gateway, Event-driven, Idempotency, Real-time ML |
| Cost vs Performance | Autoscaling, caching, batch vs real-time, SLA prioritization |
| ARB | Event comparison, reconciliation reports, ops ticketing |
| Build vs Buy | Strategic differentiation → Build, Commodity → Buy |
| Regulatory Audit | Immutable logs, lineage, explainable ML, role-based access, reports |
1️⃣ Hybrid Architecture (On-Prem Finacle + Cloud Microservices)
Objective: Modernize banking operations without disrupting Finacle, while leveraging cloud agility.
Step-by-Step Approach:
Assess Core Footprint
On-prem Finacle = system of record, immutable business logic.
Identify modules to modernize: CBS, Loan, Treasury, Payments.
Define Hybrid Boundary
Keep Finacle on-prem for critical core banking.
Move digital services, analytics, ML scoring, orchestration, notifications to cloud.
Integration Layer
Introduce AxisCBSAdapter on bank side:
Handles REST / SOAP calls
Protocol transformation
Retry / throttling / idempotency
Adapter ensures cloud services don’t directly touch Finacle.
Data Management
Cloud databases for digital layer:
Postgres (relational state)
Redis (caching, low-latency reads)
CosmosDB (geo-redundant, multi-region)
Event-driven updates ensure consistency with Finacle.
Network & Security
Use secure VPN / ExpressRoute / PrivateLink for cloud ↔ on-prem traffic.
Apply zero-trust principles for microservices.
2️⃣ API Gateway Pattern
Objective: Centralized entry-point for all digital traffic.
Step-by-Step:
Expose Microservices
All digital microservices (loan-svc, kyc-svc, fraud-svc, aml-svc) exposed via API Gateway.
Functions of API Gateway
Authentication & authorization (Azure AD / OAuth2 / JWT)
Rate limiting & throttling
Routing to services
Aggregation of responses (for multi-service calls)
Protocol translation (HTTP → gRPC / REST)
Benefits
Single entry-point for apps & UPI APIs.
Shield core Finacle from direct exposure.
Enables analytics on traffic (request counts, latencies).
3️⃣ Event-Driven Using Kafka
Objective: Loose coupling, scalability, eventual consistency.
Step-by-Step:
Event Sourcing Pattern
Digital microservices emit domain events (loan-initiated, KYC-verified, loan-approved).
Outbox pattern ensures exactly-once delivery.
Kafka Topics
One topic per domain event type:
loan-initiated-event
kyc-verified-event
credit-score-verified-event
loan-account-created-event
Consumers
Orchestration services, audit service, data lake pipelines, and adapters subscribe to events.
Enables real-time processing, ML scoring, and reconciliation.
Advantages
High throughput, fault-tolerant.
Microservices don’t call each other synchronously → avoids tight coupling.
4️⃣ Service Mesh
Objective: Simplify microservice communication, observability, and security in a cloud environment.
Step-by-Step:
Deploy a Service Mesh (e.g., Istio / Linkerd) in Kubernetes / OpenShift:
Sidecar proxies manage service-to-service traffic.
Provides mTLS encryption.
Handles service discovery, retries, load balancing.
Observability Features
Distributed tracing (Jaeger)
Metrics collection (Prometheus)
Traffic routing / blue-green / canary deployments
Benefits
Decouples networking concerns from application logic.
Standardizes resilience policies across services (retry, timeout, circuit breaker).
5️⃣ Observability Stack
Objective: Monitor microservices and hybrid environment for performance and failures.
Step-by-Step:
Metrics
Use Prometheus / Grafana to capture CPU, memory, request latency, error rates.
Tracing
Distributed tracing (Jaeger / OpenTelemetry)
Trace events from UI → Digital Layer → Adapters → Finacle
Logging
Centralized logging (ELK / EFK stack)
Include structured logs, correlation IDs, and request IDs.
Alerting
SLA / SLO violations trigger alerts.
PagerDuty / OpsGenie integration.
Business Observability
Track key banking KPIs (loan approvals, disbursements, failures) in real time.
6️⃣ SRE (Site Reliability Engineering) Model
Objective: Ensure 99.99% availability and operational excellence.
Step-by-Step:
Define SLIs / SLOs / SLAs
Example: Loan approval API latency < 2s 99.9% of the time.
Availability of digital layer = 99.99%.
Automated Incident Management
Self-healing microservices with retry + circuit breaker.
Reconciliation service detects mismatches and triggers ops tickets automatically.
Capacity Planning
Use auto-scaling based on load.
Pre-provisioned Kafka partitions & replicas for peak banking hours.
Change Management
Canary releases / blue-green deployments.
Reduce risk for production changes in hybrid setup.
Postmortems & Learning
Every outage / mismatch triggers blameless postmortem.
Feed improvements back to system design.
🔹 Combined Hybrid Architecture Flow (Axis Bank Style)
User Apps / UPI API
↓
API Gateway
↓
Digital Layer Microservices
↓
Event Bus (Kafka) ←→ Service Mesh (Istio)
↓
Adapters (LOS / LMS / CBS)
↓
On-Prem Systems (Finacle / LOS / LMS)
↓
Data Lake → Analytics → Feature Store (ML)
↓
Observability Stack + SRE dashboards
✅ This ensures:
Hybrid integration (on-prem + cloud)
Event consistency
Observability
High availability
Cloud-native microservice patterns
🏦 Typical ABC Bank-Style Hybrid Deployment Model
🔹 On-Prem (ABC Bank Data Center)
This is where regulated, core, and legacy systems stay.
Deployed On-Prem:
CBS (like Finacle)
LOS
LMS
Enterprise Service Bus (if legacy)
Core payment switch
Core DB clusters
Core adapters (often)
Security & HSM modules
Why On-Prem?
Regulatory requirements
Data residency
Tight control on financial ledger
Lower latency to core systems
Vendor support model
CBS is always treated as System of Record and rarely moved fully to cloud in traditional banks.
☁️ Cloud (Digital Modernization Layer)
This is where innovation happens.
Deployed in Cloud:
Digital lending microservices (loan-svc, kyc-svc, fraud-svc)
Orchestration service
Kafka (managed cluster)
API Gateway
ML scoring services
Feature store
Data lake
Redis / Postgres / CosmosDB
Observability stack
Reconciliation engine
Why Cloud?
Elastic scaling (UPI peaks, EMI days)
Faster deployments
Microservices-friendly
Lower infra management overhead
AI/ML workloads
🔄 How They Connect (Very Important)
You NEVER expose Finacle directly to cloud.
Instead:
Cloud Digital Layer
↓
Secure VPN / Private Link / ExpressRoute
↓
On-Prem Adapter Layer
↓
Finacle CBS
🧩 Where Should Adapter Be Deployed?
There are two common models:
✅ Model 1 (Most Common in Banks)
Adapter deployed On-Prem
Cloud Digital
↓
Secure Channel
↓
AxisCBSAdapter (On-Prem)
↓
Finacle CBS
Why?
Keeps Finacle insulated
Better latency (adapter close to CBS)
Easier protocol transformation
Centralized throttling & control
Security boundary protection
This is the safest enterprise model.
⚡ Model 2 (Less Common)
Adapter in Cloud.
But this increases:
Security exposure
Network dependency risk
Compliance complexity
Most large banks prefer adapter near core.
📍 Final Deployment Architecture
On-Prem Zone
Finacle CBS
LOS
LMS
CBS Adapter
LOS Adapter
LMS Adapter
Cloud Zone
API Gateway
Digital Microservices
Kafka
ML Services
Data Lake
Reconciliation Engine
Monitoring stack
🛡 Security Considerations
Between Cloud ↔ On-Prem:
Mutual TLS
IP whitelisting
Private connectivity (not public internet)
Strict firewall rules
Rate limiting at adapter layer
🎯
Where would you deploy Finacle and adapters in hybrid setup?
“In a hybrid model, Finacle CBS remains on-prem as the financial system of record. We deploy the CBS adapter on-prem as well, close to Finacle, to handle protocol transformation, resiliency, and security control. The digital microservices layer runs in cloud for scalability and agility, connected via secure private network links.”
How to ensure Reliability in Banking
====
🎯 Step 1: Define What Reliability Means in Banking
In banking, reliability means:
No data loss
No duplicate transactions
No financial inconsistencies
High availability (99.99%+)
Predictable performance
Fast recovery from failure
So reliability = Correctness + Availability + Resilience + Recoverability
🏗 Step 2: Reliability at Each Layer
We ensure reliability across 7 layers.
1️⃣ Infrastructure Reliability
In Cloud (Digital Layer)
Multi-AZ deployment
Auto-scaling groups
Managed Kubernetes (AKS / EKS)
Load balancers with health checks
Distributed Kafka cluster (replication factor ≥ 3)
On-Prem (CBS Side)
Active-passive or active-active CBS
Database replication
Redundant network paths
Dual firewalls
2️⃣ Service-Level Reliability (Microservices)
Each microservice must:
✅ Be Stateless
No session stored locally.
✅ Use Circuit Breaker
If CBS is slow:
Stop calling
Return fallback response
Prevent cascading failure
✅ Timeouts + Retries
Set strict timeout (e.g., 2s)
Retry with exponential backoff
Max retry threshold
✅ Bulkhead Pattern
Separate connection pools for:
CBS calls
LOS calls
LMS calls
Prevents one failure from affecting entire system.
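The circuit breaker and backoff logic above can be sketched together. Thresholds and cooldowns are illustrative; in a Java-based stack this is typically Resilience4j or Hystrix-style configuration rather than hand-rolled code.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: opens after N consecutive failures,
    fast-fails while open, and half-opens after a cooldown."""
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open - fast fail / fallback")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0
        return result

def call_with_backoff(breaker, fn, attempts=3, base_delay=0.1):
    """Retry through the breaker with exponential backoff (0.1s, 0.2s, ...)."""
    for i in range(attempts):
        try:
            return breaker.call(fn)
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)
```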
3️⃣ Data Reliability
This is most critical in banking.
🔐 Idempotency
Every request includes:
X-Idempotency-Key = LoanID or TransactionID
Prevents duplicate loan creation.
🔄 Outbox Pattern
When service updates DB:
Save business data
Save event in outbox table
Background process publishes event to Kafka
Guarantees:
No event loss
No partial update
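A sketch of the outbox write plus the background relay. Plain lists stand in for the database transaction and the Kafka producer; the real guarantee is at-least-once publish, with consumer-side dedup making it effectively exactly-once.

```python
def handle_approval(db_rows, outbox_rows, loan):
    """Business write and outbox write happen in ONE unit of work.
    In production both are rows in the same DB transaction."""
    db_rows.append({"loan_id": loan["id"], "status": "APPROVED"})
    outbox_rows.append({"type": "LOAN_APPROVED", "loan_id": loan["id"],
                        "published": False})

def relay_outbox(outbox_rows, broker):
    """Background publisher: pushes unpublished rows to the broker.
    A crash between publish and flag update causes a re-publish,
    which idempotent consumers must tolerate."""
    for row in outbox_rows:
        if not row["published"]:
            broker.append(dict(row))
            row["published"] = True
```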
📦 Kafka Reliability
Replication factor ≥ 3
Acknowledgment level = ALL
Dead letter queues for failed messages
Consumer offset tracking
4️⃣ Transaction Reliability (Saga Pattern)
Loan creation involves:
Loan approval
CBS account creation
LMS loan schedule
Disbursement
We use Orchestrated Saga Pattern:
If CBS fails:
Compensate → mark loan as FAILED
Do not proceed to LMS
This prevents inconsistent state.
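A sketch of the orchestrated saga. Step and compensation names are hypothetical, and a real orchestration service would persist progress so compensation survives a crash of the orchestrator itself.

```python
def run_saga(steps):
    """steps: list of (name, action, compensate) callables. Actions run
    in order; on the first failure, compensations for the already
    completed steps run in reverse, so the loan ends FAILED rather
    than half-booked."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            for _, comp in reversed(completed):
                comp()
            return "FAILED"
        completed.append((name, compensate))
    return "COMPLETED"
```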
5️⃣ Hybrid Network Reliability
Between Cloud ↔ On-Prem:
Private connectivity (ExpressRoute / MPLS)
Mutual TLS
Retry logic at adapter
Secondary failover endpoint
If primary CBS endpoint fails:
Switch to secondary
6️⃣ Monitoring & Observability
Reliability without visibility is impossible.
Metrics
API latency
Error rate
CBS response time
Kafka lag
Tracing
Track full journey:
User → Gateway → LoanSvc → Adapter → CBS
Alerting
SLA breach alerts
Kafka consumer lag alerts
DB replication lag alerts
7️⃣ Recovery & Reconciliation
Even with best design, failures happen.
So we implement:
Near Real-Time Reconciliation
Digital vs LOS vs LMS vs CBS comparison.
End-of-Day ARB
Financial reconciliation.
Replay Capability
Kafka allows replaying events from offset.
Manual Override Dashboard
Ops team can:
Retry
Reconcile
Re-trigger events
🛡 Reliability Example Scenario
Scenario:
Loan account created in CBS but response lost.
Without reliability → duplicate loan risk.
With reliability:
Adapter uses idempotency key.
If retry happens:
CBS detects duplicate
Returns existing loan account
Reconciliation confirms consistency.
No financial corruption.
📊
How will you ensure reliability in hybrid banking architecture?
You say:
“I design reliability across infrastructure, service, data, and operational layers. I ensure stateless microservices with circuit breakers, idempotent APIs, outbox-based event publishing, Kafka replication, saga-based transaction management, and continuous reconciliation between digital and core systems. Additionally, we implement observability and automated recovery mechanisms to maintain 99.99% availability.”
That answer shows maturity.
🧠 Bonus: Reliability Pyramid
Infrastructure Stability
↓
Service Resilience
↓
Data Consistency
↓
Transaction Integrity
↓
Monitoring & Recovery
1️⃣ How Do You Run ARB? (Architecture Review Board)
ARB is not a meeting. It’s a governance mechanism to control architecture quality, risk, cost, and alignment.
🎯 Step 1: Define ARB Charter
Clearly define:
Scope (All new systems? Only Tier-1 changes?)
Review triggers:
New platform
Major integration
Cloud adoption
Vendor onboarding
Regulatory-impacting change
Without scope clarity, ARB becomes chaos.
🎯 Step 2: Standardized Submission Template
Every proposal must include:
Business objective
Current architecture
Proposed architecture diagram
NFRs (availability, performance, RTO/RPO)
Security model
Data classification
Integration points
Cost estimate
Build vs Buy analysis
Risk assessment
This prevents emotional decisions.
🎯 Step 3: Structured Review Dimensions
ARB evaluates across 7 pillars:
| Pillar | What We Check |
| --- | --- |
| Alignment | Does it align with enterprise target architecture? |
| Security | IAM, encryption, PII handling |
| Reliability | HA, DR, resiliency patterns |
| Integration | Event-driven? APIs? Tight coupling? |
| Data | Duplication? Data ownership? |
| Cost | Capex/Opex impact |
| Compliance | RBI/GDPR/SOX implications |
🎯 Step 4: Decision Outcomes
ARB decisions should be:
Approved
Approved with conditions
Rework required
Rejected
And everything documented.
No informal approvals.
🎯 Step 5: Post-Approval Governance
ARB doesn’t end after approval.
You track:
Design compliance
Drift detection
Production alignment
Otherwise teams deviate later.
2️⃣ What Do You Evaluate?
This is critical.
🏗 Architecture Evaluation Areas
1️⃣ Technical Fit
Does it align with hybrid architecture?
Does it reuse shared services (Kafka, API Gateway)?
Is it introducing a new stack unnecessarily?
2️⃣ NFR Coverage
Availability target?
Scaling model?
Latency expectations?
DR plan?
3️⃣ Integration Strategy
REST vs Event?
Synchronous dependencies?
Risk of cascading failures?
4️⃣ Data Governance
Source of truth defined?
Data duplication?
Audit trail available?
5️⃣ Operational Readiness
Monitoring defined?
SLOs documented?
Support model defined?
6️⃣ Vendor Risk (If Buy)
Lock-in risk?
Exit strategy?
SLA commitment?
7️⃣ Long-Term Sustainability
Tech roadmap?
Skills availability?
Community support?
3️⃣ How Do You Prevent Tech Sprawl?
Tech sprawl = uncontrolled tools, frameworks, vendors.
It kills maintainability and increases cost.
🎯 Step 1: Define Approved Technology Stack
Example:
Backend: Java / .NET
Messaging: Kafka
Cache: Redis
DB: Postgres
Observability: Prometheus + Grafana
API Gateway: Standardized
No arbitrary new tools without ARB approval.
🎯 Step 2: Platform Engineering Model
Provide shared platforms:
Shared CI/CD
Shared Kafka cluster
Shared Kubernetes
Shared observability stack
When platform is easy, teams won’t build their own.
🎯 Step 3: Reuse-First Principle
Before approving new tech, ask:
Can existing platform solve this?
Is 80% solution acceptable?
🎯 Step 4: Periodic Rationalization
Every 6–12 months:
List all tools in use
Identify duplicates
Decommission low-usage systems
This prevents entropy.
4️⃣ How Do You Handle Deviations?
Deviations are inevitable.
What matters is how you control them.
🎯 Step 1: Categorize Deviation
| Type | Example |
| --- | --- |
| Minor | Version mismatch |
| Medium | Using an alternate DB |
| Major | Introducing a new event bus |
🎯 Step 2: Temporary vs Permanent
If deviation is needed:
Document reason
Define sunset timeline
Define rollback plan
Never allow undocumented permanent deviation.
🎯 Step 3: Risk Assessment
Evaluate:
Security impact?
Operational complexity?
Compliance violation?
Future integration risk?
🎯 Step 4: Compensating Controls
If deviation allowed:
Additional monitoring
Extra documentation
Restricted scope
Periodic review
🎯 Step 5: Governance Tracking
Maintain:
Architecture Deviation Register
Include:
System name
Deviation type
Risk level
Expiry date
Owner
Without tracking, deviations become architecture debt.
🎤
How do you run ARB and prevent sprawl?
“We operate ARB with structured submission templates and evaluate proposals across alignment, security, NFRs, integration, cost, and compliance. We enforce a standardized enterprise tech stack and promote reuse via platform engineering. Deviations are documented, risk-assessed, time-bound, and tracked in a deviation register to prevent architectural entropy.”
Modernization Around MainFrame
-------
🎯 Strong Leadership Answer Framework
1️⃣ Acknowledge the Reality
“Absolutely — mainframe teams are usually protective because they’ve built and maintained the system for 15–20 years. There is institutional knowledge and natural resistance.”
2️⃣ Address the “You Don’t Understand” Concern
“We never approached modernization with the mindset that we knew better. In fact, the first phase was reverse engineering and knowledge acquisition.”
What you actually did:
Conducted application portfolio assessment
Identified COBOL modules, JCL jobs, CICS transactions
Mapped batch vs online flows
Created dependency maps
Captured data lineage
Built functional decomposition
This shows depth.
3️⃣ Did You Learn It Yourself?
This is where you differentiate as a leader.
Do NOT say “I learned COBOL myself.”
Say this:
“As a technology leader, I didn’t try to become a COBOL expert. Instead, I ensured we had hybrid squads — legacy SMEs + modern engineers — and I personally drove structured knowledge capture workshops.”
“However, I invested time to understand high-level mainframe architecture, batch scheduling, VSAM datasets, and integration touchpoints so that modernization decisions were informed.”
That balances hands-on awareness + leadership positioning.
4️⃣ How Did You Handle Non-Cooperation?
“Resistance reduced significantly when we reframed modernization from ‘replacement’ to ‘risk reduction and performance uplift’. We involved mainframe SMEs in design reviews and made them co-owners of the future state.”
Psychology > Technology.
Also mention:
Incentivized knowledge documentation
Formal KT sign-offs
Shadow runs
Parallel production runs
5️⃣ Who Validated If You Didn’t Know?
This is the most powerful part.
Say:
“Validation was not individual-driven — it was systemic.”
Then explain:
Automated regression testing
Data reconciliation scripts
Parallel run comparisons
Output parity validation
Audit logs
Business sign-off checkpoints
For BFSI especially:
Reconciliation at field level
Batch output comparison
T+1 reconciliation reports
That shows governance maturity.
6️⃣ Address Time Consumption
Say:
“Yes, modernization is time-consuming if done blindly. We reduced risk and timeline by using a strangler pattern approach rather than big-bang replacement.”
Mention:
API wrapping
Incremental module migration
Domain-driven segmentation
Event-driven extraction
💎
“Mainframe modernization always comes with knowledge concentration and resistance. We addressed this by first building deep system understanding through reverse engineering workshops and dependency mapping. Rather than positioning modernization as replacement, we positioned it as risk reduction and performance uplift. We formed hybrid squads combining mainframe SMEs and cloud-native engineers. Validation was systematic — automated regression suites, parallel run comparisons, and reconciliation at data and transaction levels ensured parity. Instead of big-bang migration, we adopted a strangler approach — gradually extracting domains via APIs and microservices. My role was to ensure technical correctness, stakeholder alignment, and controlled risk — not to become a COBOL developer, but to orchestrate a structured transition.”