Modernization Around Core CBS
- Anand Nerurkar
- Feb 18
- 17 min read
Updated: Feb 26
🏦 Current State (Typical Bank Using Finacle Modules)
Traditional setup:
Channel → Finacle LOS → Finacle LMS → Finacle CBS
Where:
LOS = Origination
LMS = Loan servicing
CBS = Ledger & accounting
Problem:
Tight vendor coupling
Limited agility
Slow product innovation
Heavy customization risk
Upgrade pain
🎯 Target: Digital Lending Modernization Around Core
You introduce a Digital Lending Platform (DLP) layer.
Mobile/Web
↓
Digital Lending Platform (Your Layer)
↓
LOS / LMS / CBS (as transactional engines)
This layer becomes:
✔ Journey orchestrator
✔ Product configurator
✔ Rules engine
✔ Eligibility engine
✔ Document workflow engine
✔ API façade
LOS/LMS/CBS become back-end processors.
🧠 What Should Your Digital Layer Own?
1️⃣ Customer Journey Orchestration
Instead of:
LOS controlling workflow
You control:
Step progression
Dynamic form rendering
Pre-approved offers
Consent management
Multi-product bundling
LOS only gets:
Final application payload
2️⃣ Credit Decisioning (Externalized)
Instead of:
Hardcoded LOS rules
You introduce:
Independent decision engine
Real-time bureau calls
Alternative data scoring
Risk-based pricing
This gives agility.
3️⃣ Product Configuration Outside CBS
Rather than:
Finacle product tables driving everything
You maintain:
Product catalog service
Rate computation service
Fee computation service
CBS just books:
Loan account
Schedule
Accounting entries
4️⃣ Event-Driven Loan Lifecycle
Instead of tight LOS-LMS coupling:
Loan Approved → Event
Loan Disbursed → Event
EMI Paid → Event
Delinquent → Event
Your digital layer subscribes and reacts:
Push notifications
Cross-sell triggers
Collection workflow updates
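The subscribe-and-react pattern above can be sketched as a tiny in-process dispatcher. This is illustrative only: in production the handlers would be Kafka consumers in separate services, and the handler names here are hypothetical.

```python
# Minimal event-dispatch sketch: digital-layer handlers register for
# lifecycle events and react independently of LOS/LMS.
HANDLERS = {}

def on(event_type):
    """Decorator registering a handler for one event type."""
    def register(fn):
        HANDLERS.setdefault(event_type, []).append(fn)
        return fn
    return register

@on("EMI_PAID")
def trigger_cross_sell(event):
    # Hypothetical reaction: evaluate a cross-sell offer
    return f"cross-sell evaluation for customer {event['customer_id']}"

@on("DELINQUENT")
def update_collections(event):
    # Hypothetical reaction: push the loan into a collections workflow
    return f"collections workflow update for loan {event['loan_id']}"

def dispatch(event):
    """Fan the event out to every registered handler."""
    return [handler(event) for handler in HANDLERS.get(event["type"], [])]
```

The key property is that LOS/LMS only emit the events; adding a new reaction touches the digital layer alone.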
🧩 Architecture Blueprint
Channels
↓
API Gateway
↓
Digital Lending Microservices
- Journey Service
- Eligibility Service
- Pricing Engine
- Document Service
- Offer Service
↓
Integration Layer
↓
LOS / LMS / CBS
CBS remains:
✔ Ledger
✔ GL posting
✔ Regulatory engine
LOS/LMS become:
✔ Transaction processors
✔ Data persistence engines
🚨 But Here’s the Critical Design Principle
Do NOT:
Rebuild accounting logic
Recalculate EMI independently of CBS
Duplicate amortization logic
Create shadow balances
Otherwise you create a reconciliation nightmare.
Digital layer must be:
Orchestration + intelligence, NOT the financial book of record.
🔥 Why This Approach Is Powerful
1️⃣ Vendor Independence
If tomorrow you:
Replace Finacle LMS
Introduce new lending core
Only integration layer changes.
Channels remain untouched.
2️⃣ Faster Product Launch
Want to launch:
BNPL
Instant top-up
Co-lending product
You configure it in digital layer.
LOS just receives structured booking instruction.
3️⃣ Scalable Digital Traffic
Heavy:
Pre-eligibility checks
Simulations
EMI calculators
Should NOT hit CBS.
Digital layer handles it.
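Those simulations are pure arithmetic, so they never need to touch CBS. A minimal sketch of the standard reducing-balance EMI formula (illustrative; Finacle's own schedule computation may differ in rounding and day-count conventions):

```python
def emi(principal, annual_rate_pct, months):
    """Standard reducing-balance EMI: P * r * (1+r)^n / ((1+r)^n - 1),
    where r is the monthly rate. Served from the digital layer,
    never from the core ledger."""
    r = annual_rate_pct / 12 / 100
    if r == 0:
        return principal / months          # interest-free edge case
    factor = (1 + r) ** months
    return principal * r * factor / (factor - 1)

# e.g. emi(100000, 12, 12) is roughly 8884.88 per month
```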
⚖ Trade-Offs (Important for CTO Discussion)
| Benefit | Risk |
| --- | --- |
| Agility | Increased architecture complexity |
| Decoupling | Need strong governance |
| Innovation speed | Requires strong integration discipline |
| Vendor flexibility | Must avoid duplication of core logic |
🎯 Strategic Positioning
“Can we modernize digital lending around LOS/LMS/CBS instead of using them end-to-end?”
“Yes. We can introduce a digital lending abstraction layer that owns journey orchestration, decisioning, and product configuration, while LOS/LMS/CBS remain transactional engines. This enables agility and vendor decoupling without compromising ledger integrity.”
That sounds enterprise-ready.
🏦 High-Level Architecture Vision
Core principle:
Digital layer owns journey + intelligence.
Core banking owns accounting + regulatory ledger.
Using:
Finacle / TCS BaNCS (CBS – account booking)
Fenergo (KYC / onboarding)
Actimize (Fraud / AML)
🧱 1️⃣ Macro Architecture
[ Mobile / Web App ]
↓
[ API Gateway ]
↓
[ Digital Lending Platform - Microservices Layer ]
↓
[ Event Bus (Kafka / MQ) ]
↓
[ Risk / ML / RegTech / CBS Integrations ]
Core components inside your abstraction layer:
Identity Service
Consent Service
Document Service
Application Service
Orchestration Service
Risk Decision Engine
Notification Service
Integration Adapter Layer
🚀 2️⃣ End-to-End Journey Flow (Event-Driven)
Let’s go step by step.
🔐 Step 1 – Login
OAuth2 / OIDC
MFA if needed
Customer profile fetch (cached, not direct CBS hit)
Event:
USER_AUTHENTICATED
📝 Step 2 – Consent Management
Customer accepts:
Data usage
Bureau pull
Income verification
AML screening
Stored in:
Consent DB (immutable store)
Event:
CONSENT_CAPTURED
Important: Consent must be auditable & tamper-proof.
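One hedged way to make the consent store tamper-evident is an append-only, hash-chained log. This is a sketch only; real deployments typically use WORM storage or a dedicated audit vault, and the field names here are invented.

```python
import hashlib
import json

class ConsentLog:
    """Append-only consent store: each entry includes the hash of the
    previous entry, so editing any record breaks verification."""
    def __init__(self):
        self.entries = []

    def append(self, customer_id, consent_type):
        prev = self.entries[-1]["hash"] if self.entries else "GENESIS"
        body = {"customer": customer_id, "consent": consent_type, "prev": prev}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "hash": digest})

    def verify(self):
        """Recompute the chain; any tampering yields False."""
        prev = "GENESIS"
        for e in self.entries:
            body = {k: e[k] for k in ("customer", "consent", "prev")}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or digest != e["hash"]:
                return False
            prev = e["hash"]
        return True
```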
📂 Step 3 – Application Initiated
Customer enters:
Loan amount
Tenure
Employer details
Income details
System generates:
Application Reference ID
Event:
APPLICATION_INITIATED
This triggers the pipeline.
⚙️ 3️⃣ Event-Driven Processing Pipeline
Now the orchestration begins.
🧾 Stage 1 – OCR & Document Processing
Input:
Uploaded salary slip / bank statement
Process:
OCR extraction
Data normalization
Confidence scoring
Event:
OCR_COMPLETED
If confidence is low → route to manual review queue
🪪 Stage 2 – KYC (via Fenergo)
Call: Fenergo API
Checks:
Identity verification
Sanction screening
PEP check
Event:
KYC_COMPLETED
If failed → APPLICATION_REJECTED
📊 Stage 3 – Credit Risk Assessment
Parallel execution:
Bureau pull
Bank internal ML model:
Credit model
Income stability model
Bank rule engine (DTI, FOIR etc.)
Event:
CREDIT_RISK_COMPLETED
Output:
Risk grade
Recommended limit
Pricing band
🛡 Stage 4 – Fraud & AML Screening
Call: Actimize
Also:
Internal fraud ML model
Device fingerprinting
Velocity checks
Event:
FRAUD_RISK_COMPLETED
AML_SCREENING_COMPLETED
If high risk → escalate to manual investigation queue
🧠 4️⃣ Decision Engine (Central Brain)
Now orchestration service aggregates:
OCR output
KYC status
Credit score
Fraud score
AML status
Internal rule evaluation
Decision logic:
If all green → APPROVED
If moderate risk → REFER
If failed → REJECTED
Event:
LOAN_APPROVED
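The aggregation step can be sketched as a simple policy function. The thresholds and grade names below are invented for illustration; a real bank would externalize them in a rule engine with versioning and explainability.

```python
def decide(kyc_passed, aml_clear, fraud_score, credit_grade):
    """Aggregate upstream signals into APPROVED / REFER / REJECTED.
    fraud_score is assumed in [0, 1]; credit_grade 'A'..'E' is a
    hypothetical internal scale."""
    if not kyc_passed or not aml_clear or fraud_score >= 0.8:
        return "REJECTED"   # hard failures short-circuit everything else
    if credit_grade in ("A", "B") and fraud_score < 0.3:
        return "APPROVED"   # all green
    return "REFER"          # moderate risk -> manual underwriting
```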
📜 5️⃣ Loan Agreement & E-Sign
System generates:
Personalized agreement PDF
Risk-based pricing
EMI schedule preview
Customer signs via:
E-sign provider
Event:
LOAN_AGREEMENT_SIGNED
Only after signed → Disbursement allowed.
🏦 6️⃣ CBS Booking (Finacle API Call)
Now your Integration Adapter Layer calls:
Finacle API:
Create customer if new
Create loan account
Create repayment schedule
Post initial disbursement
Critical:
Idempotency key = Application ID
Retry-safe
Effectively exactly-once (a retry returns the original booking, never a duplicate)
Event:
LOAN_ACCOUNT_CREATED
DISBURSEMENT_SUCCESS
CBS now becomes system of record.
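A sketch of the retry-safe booking call. The adapter class and response shape are hypothetical; the point is that the application ID doubles as the idempotency key, so a retry after a lost response cannot double-book.

```python
class CbsBookingAdapter:
    """Idempotent CBS booking sketch. In production the key store is a
    durable table checked inside the same transaction, not a dict."""
    def __init__(self):
        self._booked = {}  # idempotency key (application ID) -> booking result

    def book_loan(self, application_id, amount):
        if application_id in self._booked:
            # Retry after a lost response: return the original booking,
            # never create a second loan account.
            return self._booked[application_id]
        result = {"loan_account": f"LN-{application_id}", "disbursed": amount}
        self._booked[application_id] = result
        return result
```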
🔔 7️⃣ Notification & Downstream Updates
Events trigger:
SMS / Email
CRM update
Analytics pipeline
Collection strategy assignment
Regulatory reporting update
Event:
NOTIFICATION_SENT
🧩 8️⃣ Event Bus Orchestration Pattern
Important:
Use choreography pattern where possible.
Only use orchestration for:
Critical approval decision
CBS booking coordination
Everything else async.
🛡 9️⃣ Resilience & Control Mechanisms
Must include:
✔ Circuit breaker for CBS
✔ Timeout control for Actimize / Fenergo
✔ Retry with exponential backoff
✔ Dead letter queue
✔ Manual review workflow
✔ Reconciliation microservice
Never allow: duplicate disbursement.
📊 10️⃣ Data & Audit Layer
Maintain:
Immutable event store
Application state machine table
Full traceability (for regulator)
Risk decision explainability (model governance)
Critical for compliance.
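The application state machine table can be enforced in code. A sketch with hypothetical states: illegal transitions raise, and every legal transition lands in an audit trail for traceability.

```python
# Allowed transitions per state (hypothetical lifecycle; a real bank's
# state model would include REFER, e-sign, and disbursement sub-states).
TRANSITIONS = {
    "INITIATED": {"KYC_DONE", "REJECTED"},
    "KYC_DONE": {"APPROVED", "REFERRED", "REJECTED"},
    "APPROVED": {"SIGNED"},
    "SIGNED": {"DISBURSED"},
}

class Application:
    def __init__(self, app_id):
        self.app_id = app_id
        self.state = "INITIATED"
        self.audit = []  # (from_state, to_state) pairs, append-only

    def transition(self, new_state):
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.audit.append((self.state, new_state))
        self.state = new_state
```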
🧠 11️⃣ Separation of Concerns
| Layer | Responsibility |
| --- | --- |
| Digital Layer | Journey + Intelligence |
| RegTech (Fenergo) | Compliance |
| Risk (Actimize + ML) | Fraud & AML |
| CBS (Finacle) | Accounting |
| Event Bus | Decoupling |
| Notification | Customer communication |
🎯 Final Architecture Summary
User
↓
Digital Lending Platform
↓
Event Pipeline
↓
Risk + RegTech + ML
↓
Decision Engine
↓
Finacle CBS (Ledger)
↓
Notifications + Analytics
CBS touched only once → at booking & disbursement.
Everything else happens outside.
💎 Why This Architecture Is Strong
✔ Composable
✔ Vendor-independent
✔ Cloud scalable
✔ Upgrade safe
✔ AI-friendly
✔ Regulator auditable
✔ Future core replacement ready
1️⃣ How would you modernize the Finacle ecosystem?
Objective: Add agility, digital capabilities, and integration without disrupting the core CBS (Finacle).
Step-by-Step Approach:
Assess Core Footprint
Identify Finacle modules in use: CBS, Treasury, Loan, Deposits.
Map upstream/downstream systems: LOS, LMS, Payments, AML.
Define Bounded Contexts
Keep Finacle as system of record.
Create a Digital Lending / Digital Banking Layer as an abstraction layer.
Event-Driven Integration
Introduce Kafka / Event Bus for all domain events.
Use Outbox pattern in digital services.
Digital services subscribe to events for orchestration and analytics.
Adapter Layer
AxisCBSAdapter → abstracts Finacle APIs (REST / SOAP)
Handles protocol/message transformation, retries, throttling.
Digital Services
Microservices for loan journey: KYC, Credit, Fraud, AML, Consent, Notifications.
Store projections in Postgres / Redis / CosmosDB.
Data Platform
Raw → Curated → Analytics → Feature Store
Enable ML for credit, fraud, collections.
Hybrid Modernization Pattern
Strangler pattern: gradually shift business logic to digital layer.
Avoid touching core CBS.
2️⃣ How do you ensure event consistency in banking?
Objective: Avoid mismatched states across digital layer and core systems.
Patterns and Approach:
Outbox + Event Bus
Every state change writes to Outbox table, then published to Kafka.
Guarantees at-least-once publishing; idempotent consumers make it effectively exactly-once.
Idempotency
Every command includes idempotency key (loan ID, transaction ID).
Prevents duplicate processing in LOS/LMS/CBS.
Saga / Choreography
Use an orchestrated saga for multi-step flows (loan application → approval → account creation → disbursement).
If one step fails, compensating actions roll the completed steps back.
Acknowledgement & Retry
Adapter waits for system ACK before marking event processed.
Retries with exponential backoff, dead-letter queues for failed events.
Reconciliation
Near real-time and end-of-day reconciliation service.
Detects and resolves mismatches across Digital ↔ LOS ↔ LMS ↔ CBS.
3️⃣ How do you design for 99.99% availability?
Objective: Ensure high reliability for critical banking operations.
Step-by-Step Design:
Microservice Resilience
Stateless services with autoscaling.
Circuit breakers, bulkheads, rate limiting.
Database Layer
Multi-AZ deployments, hot standby replicas.
Use distributed cache (Redis / CosmosDB) for low-latency reads.
Message Bus
Kafka clusters with replication factor ≥3.
Multi-region setup if necessary.
Core System Adapter
Retry, idempotency, failover endpoints.
Monitoring & Alerts
SLA monitoring, anomaly detection.
Pager duty integration.
Disaster Recovery
Multi-region DR for Kafka, databases, core adapters.
Automatic failover with minimal RPO/RTO.
4️⃣ How would you architect UPI at scale?
Requirements: very high throughput (tens of thousands of TPS at peak), low latency (<1 sec), high reliability.
Step-by-Step Architecture:
API Gateway
Terminals / apps call stateless UPI gateway.
Event-Driven Transaction Processing
Transaction events → Kafka → Orchestrator → Core CBS/Ledger.
High-Throughput Ledger
Use Finacle / TCS BaNCS with an adapter layer.
Ensure atomic debit/credit.
Concurrency & Idempotency
UPI transaction ID = idempotency key.
Prevent duplicate debits.
Scaling
Horizontal scaling of gateways & microservices.
Kafka partitioning by VPA/Bank ID for throughput.
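Partitioning by VPA can be sketched as stable key hashing, which is conceptually what Kafka's default partitioner does with a record key. The partition count and hash choice below are illustrative only.

```python
import hashlib

def partition_for(vpa, num_partitions=12):
    """Stable partition choice: all events for one VPA land in the same
    partition, preserving per-payer ordering while spreading load
    across partitions."""
    digest = hashlib.md5(vpa.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```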
Real-Time Settlement
NEFT/IMPS/UPI pipelines for inter-bank settlement.
Event-driven notifications for payer/payee.
Fraud & Risk
Real-time ML scoring per transaction.
5️⃣ How do you manage cost vs performance tradeoff?
Principles:
Use Cloud Elasticity
Autoscale for peaks (e.g., EMI dates, salary days).
Scale down in off-peak.
Caching & Projection
Redis for real-time journey state.
Avoid hitting LOS/LMS/CBS for every UI request.
Batch for Non-Critical Work
EOD batch reconciliation.
Analytics pipelines in Spark / Databricks.
Prioritize SLAs
Critical flows (payments, loan approval) → synchronous, high-cost path.
Non-critical (analytics, dashboards) → asynchronous, low-cost.
6️⃣ How do you run ARB (Application Reconciliation Batch)?
Objective: Detect mismatches between digital layer and LOS/LMS/CBS.
Steps:
Query all domain events in the day.
Compare:
Loan application IDs, loan amount, status.
EMI schedule, disbursed amount, ledger entries.
Generate reconciliation report:
Exceptions flagged
Operations ticket created
Automate retries for minor issues.
Persist ARB results in audit/logging system for compliance.
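The comparison step above can be sketched as a keyed diff between the digital layer's projection and the core records. Record shapes and field names are illustrative.

```python
def reconcile(digital, cbs):
    """Compare digital-layer projections with CBS records keyed by
    application ID. Both inputs: app_id -> (status, amount).
    Returns a list of (app_id, reason) exceptions for ops ticketing."""
    exceptions = []
    for app_id, record in digital.items():
        core = cbs.get(app_id)
        if core is None:
            exceptions.append((app_id, "missing in CBS"))
        elif core != record:
            exceptions.append((app_id, f"mismatch: digital={record} cbs={core}"))
    # Records booked in CBS that the digital layer never saw
    for app_id in cbs.keys() - digital.keys():
        exceptions.append((app_id, "missing in digital layer"))
    return exceptions
```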
7️⃣ How do you balance build vs buy?
Framework:
Buy: LOS, LMS, CBS, Finacle — core transactional systems.
Build: Digital orchestration, ML scoring, event-driven pipelines, adapters.
Criteria:
Strategic differentiation → build (e.g., digital journey UX, ML models).
Commodity / stable → buy (e.g., core banking, payments clearing).
Regulatory compliance → buy unless internal expertise exists.
8️⃣ How do you handle regulatory audit scenario?
Steps:
Immutable Audit Logs
Store raw events in Data Lake raw zone.
Append-only, timestamped, versioned.
Lineage Tracking
All transformations logged: raw → curated → feature → decision.
Reconciliation Evidence
EOD / ARB reports, exception handling, SLA adherence.
Explainable ML
Keep feature + model version for each decision (RBI compliance).
Access Control & Governance
SailPoint for roles.
Fine-grained audit of who accessed or modified data.
Regulatory Reporting
Export dashboards and reports to regulator-ready formats.
✅ Summary Table for Interview
| Question | Key Answer Pillars |
| --- | --- |
| Finacle Modernization | Digital Layer + Adapters + Event-driven + Strangler pattern |
| Event Consistency | Outbox, Kafka, Saga, Idempotency, Reconciliation |
| 99.99% Availability | Resilient microservices, multi-AZ DB, Kafka replication, DR |
| UPI at Scale | API Gateway, Event-driven, Idempotency, Real-time ML |
| Cost vs Performance | Autoscaling, caching, batch vs real-time, SLA prioritization |
| ARB | Event comparison, reconciliation reports, ops ticketing |
| Build vs Buy | Strategic differentiation → Build, Commodity → Buy |
| Regulatory Audit | Immutable logs, lineage, explainable ML, role-based access, reports |
1️⃣ Hybrid Architecture (On-Prem Finacle + Cloud Microservices)
Objective: Modernize banking operations without disrupting Finacle, while leveraging cloud agility.
Step-by-Step Approach:
Assess Core Footprint
On-prem Finacle = system of record, immutable business logic.
Identify modules to modernize: CBS, Loan, Treasury, Payments.
Define Hybrid Boundary
Keep Finacle on-prem for critical core banking.
Move digital services, analytics, ML scoring, orchestration, notifications to cloud.
Integration Layer
Introduce AxisCBSAdapter on bank side:
Handles REST / SOAP calls
Protocol transformation
Retry / throttling / idempotency
Adapter ensures cloud services don’t directly touch Finacle.
Data Management
Cloud databases for digital layer:
Postgres (relational state)
Redis (caching, low-latency reads)
CosmosDB (geo-redundant, multi-region)
Event-driven updates ensure consistency with Finacle.
Network & Security
Use secure VPN / ExpressRoute / PrivateLink for cloud ↔ on-prem traffic.
Apply zero-trust principles for microservices.
2️⃣ API Gateway Pattern
Objective: Centralized entry-point for all digital traffic.
Step-by-Step:
Expose Microservices
All digital microservices (loan-svc, kyc-svc, fraud-svc, aml-svc) exposed via API Gateway.
Functions of API Gateway
Authentication & authorization (Azure AD / OAuth2 / JWT)
Rate limiting & throttling
Routing to services
Aggregation of responses (for multi-service calls)
Protocol translation (HTTP → gRPC / REST)
Benefits
Single entry-point for apps & UPI APIs.
Shield core Finacle from direct exposure.
Enables analytics on traffic (request counts, latencies).
3️⃣ Event-Driven Using Kafka
Objective: Loose coupling, scalability, eventual consistency.
Step-by-Step:
Event Sourcing Pattern
Digital microservices emit domain events (loan-initiated, KYC-verified, loan-approved).
Outbox pattern ensures exactly-once delivery.
Kafka Topics
One topic per domain event type:
loan-initiated-event
kyc-verified-event
credit-score-verified-event
loan-account-created-event
Consumers
Orchestration services, audit service, data lake pipelines, and adapters subscribe to events.
Enables real-time processing, ML scoring, and reconciliation.
Advantages
High throughput, fault-tolerant.
Microservices don’t call each other synchronously → avoids tight coupling.
4️⃣ Service Mesh
Objective: Simplify microservice communication, observability, and security in a cloud environment.
Step-by-Step:
Deploy a Service Mesh (e.g., Istio / Linkerd) in Kubernetes / OpenShift:
Sidecar proxies manage service-to-service traffic.
Provides mTLS encryption.
Handles service discovery, retries, load balancing.
Observability Features
Distributed tracing (Jaeger)
Metrics collection (Prometheus)
Traffic routing / blue-green / canary deployments
Benefits
Decouples networking concerns from application logic.
Standardizes resilience policies across services (retry, timeout, circuit breaker).
5️⃣ Observability Stack
Objective: Monitor microservices and hybrid environment for performance and failures.
Step-by-Step:
Metrics
Use Prometheus / Grafana to capture CPU, memory, request latency, error rates.
Tracing
Distributed tracing (Jaeger / OpenTelemetry)
Trace events from UI → Digital Layer → Adapters → Finacle
Logging
Centralized logging (ELK / EFK stack)
Include structured logs, correlation IDs, and request IDs.
Alerting
SLA / SLO violations trigger alerts.
PagerDuty / OpsGenie integration.
Business Observability
Track key banking KPIs (loan approvals, disbursements, failures) in real time.
6️⃣ SRE (Site Reliability Engineering) Model
Objective: Ensure 99.99% availability and operational excellence.
Step-by-Step:
Define SLIs / SLOs / SLAs
Example: Loan approval API latency < 2s 99.9% of the time.
Availability of digital layer = 99.99%.
Automated Incident Management
Self-healing microservices with retry + circuit breaker.
Reconciliation service detects mismatches and triggers ops tickets automatically.
Capacity Planning
Use auto-scaling based on load.
Pre-provisioned Kafka partitions & replicas for peak banking hours.
Change Management
Canary releases / blue-green deployments.
Reduce risk for production changes in hybrid setup.
Postmortems & Learning
Every outage / mismatch triggers blameless postmortem.
Feed improvements back to system design.
🔹 Combined Hybrid Architecture Flow (Axis Bank Style)
User Apps / UPI API
↓
API Gateway
↓
Digital Layer Microservices
↓
Event Bus (Kafka) ←→ Service Mesh (Istio)
↓
Adapters (LOS / LMS / CBS)
↓
On-Prem Systems (Finacle / LOS / LMS)
↓
Data Lake → Analytics → Feature Store (ML)
↓
Observability Stack + SRE dashboards
✅ This ensures:
Hybrid integration (on-prem + cloud)
Event consistency
Observability
High availability
Cloud-native microservice patterns
🏦 Typical ABC Bank-Style Hybrid Deployment Model
🔹 On-Prem (ABC Bank Data Center)
This is where regulated, core, and legacy systems stay.
Deployed On-Prem:
CBS (like Finacle)
LOS
LMS
Enterprise Service Bus (if legacy)
Core payment switch
Core DB clusters
Core adapters (often)
Security & HSM modules
Why On-Prem?
Regulatory requirements
Data residency
Tight control on financial ledger
Lower latency to core systems
Vendor support model
CBS is always treated as System of Record and rarely moved fully to cloud in traditional banks.
☁️ Cloud (Digital Modernization Layer)
This is where innovation happens.
Deployed in Cloud:
Digital lending microservices (loan-svc, kyc-svc, fraud-svc)
Orchestration service
Kafka (managed cluster)
API Gateway
ML scoring services
Feature store
Data lake
Redis / Postgres / CosmosDB
Observability stack
Reconciliation engine
Why Cloud?
Elastic scaling (UPI peaks, EMI days)
Faster deployments
Microservices-friendly
Lower infra management overhead
AI/ML workloads
🔄 How They Connect (Very Important)
You NEVER expose Finacle directly to cloud.
Instead:
Cloud Digital Layer
↓
Secure VPN / Private Link / ExpressRoute
↓
On-Prem Adapter Layer
↓
Finacle CBS
🧩 Where Should Adapter Be Deployed?
There are two common models:
✅ Model 1 (Most Common in Banks)
Adapter deployed On-Prem
Cloud Digital
↓
Secure Channel
↓
AxisCBSAdapter (On-Prem)
↓
Finacle CBS
Why?
Keeps Finacle insulated
Better latency (adapter close to CBS)
Easier protocol transformation
Centralized throttling & control
Security boundary protection
This is the safest enterprise model.
⚡ Model 2 (Less Common)
Adapter in Cloud.
But this increases:
Security exposure
Network dependency risk
Compliance complexity
Most large banks prefer adapter near core.
📍 Final Deployment Architecture
On-Prem Zone
Finacle CBS
LOS
LMS
CBS Adapter
LOS Adapter
LMS Adapter
Cloud Zone
API Gateway
Digital Microservices
Kafka
ML Services
Data Lake
Reconciliation Engine
Monitoring stack
🛡 Security Considerations
Between Cloud ↔ On-Prem:
Mutual TLS
IP whitelisting
Private connectivity (not public internet)
Strict firewall rules
Rate limiting at adapter layer
🎯
Where would you deploy Finacle and adapters in hybrid setup?
“In a hybrid model, Finacle CBS remains on-prem as the financial system of record. We deploy the CBS adapter on-prem as well, close to Finacle, to handle protocol transformation, resiliency, and security control. The digital microservices layer runs in cloud for scalability and agility, connected via secure private network links.”
How to ensure Reliability in Banking
====
🎯 Step 1: Define What Reliability Means in Banking
In banking, reliability means:
No data loss
No duplicate transactions
No financial inconsistencies
High availability (99.99%+)
Predictable performance
Fast recovery from failure
So reliability = Correctness + Availability + Resilience + Recoverability
🏗 Step 2: Reliability at Each Layer
We ensure reliability across 7 layers.
1️⃣ Infrastructure Reliability
In Cloud (Digital Layer)
Multi-AZ deployment
Auto-scaling groups
Managed Kubernetes (AKS / EKS)
Load balancers with health checks
Distributed Kafka cluster (replication factor ≥ 3)
On-Prem (CBS Side)
Active-passive or active-active CBS
Database replication
Redundant network paths
Dual firewalls
2️⃣ Service-Level Reliability (Microservices)
Each microservice must:
✅ Be Stateless
No session stored locally.
✅ Use Circuit Breaker
If CBS is slow:
Stop calling
Return fallback response
Prevent cascading failure
✅ Timeouts + Retries
Set strict timeout (e.g., 2s)
Retry with exponential backoff
Max retry threshold
✅ Bulkhead Pattern
Separate connection pools for:
CBS calls
LOS calls
LMS calls
Prevents one failure from affecting entire system.
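The circuit breaker and backoff logic above can be sketched together. Thresholds and cooldowns are illustrative; in a Java-based stack this is typically Resilience4j or Hystrix-style configuration rather than hand-rolled code.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: opens after N consecutive failures,
    fast-fails while open, and half-opens after a cooldown."""
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open - fast fail / fallback")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0
        return result

def call_with_backoff(breaker, fn, attempts=3, base_delay=0.1):
    """Retry through the breaker with exponential backoff (0.1s, 0.2s, ...)."""
    for i in range(attempts):
        try:
            return breaker.call(fn)
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)
```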
3️⃣ Data Reliability
This is most critical in banking.
🔐 Idempotency
Every request includes:
X-Idempotency-Key = LoanID or TransactionID
Prevents duplicate loan creation.
🔄 Outbox Pattern
When service updates DB:
Save business data
Save event in outbox table
Background process publishes event to Kafka
Guarantees:
No event loss
No partial update
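A sketch of the outbox write plus the background relay. Plain lists stand in for the database transaction and the Kafka producer; the real guarantee is at-least-once publish, with consumer-side dedup making it effectively exactly-once.

```python
def handle_approval(db_rows, outbox_rows, loan):
    """Business write and outbox write happen in ONE unit of work.
    In production both are rows in the same DB transaction."""
    db_rows.append({"loan_id": loan["id"], "status": "APPROVED"})
    outbox_rows.append({"type": "LOAN_APPROVED", "loan_id": loan["id"],
                        "published": False})

def relay_outbox(outbox_rows, broker):
    """Background publisher: pushes unpublished rows to the broker.
    A crash between publish and flag update causes a re-publish,
    which idempotent consumers must tolerate."""
    for row in outbox_rows:
        if not row["published"]:
            broker.append(dict(row))
            row["published"] = True
```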
📦 Kafka Reliability
Replication factor ≥ 3
Acknowledgment level = ALL
Dead letter queues for failed messages
Consumer offset tracking
4️⃣ Transaction Reliability (Saga Pattern)
Loan creation involves:
Loan approval
CBS account creation
LMS loan schedule
Disbursement
We use Orchestrated Saga Pattern:
If CBS fails:
Compensate → mark loan as FAILED
Do not proceed to LMS
This prevents inconsistent state.
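A sketch of the orchestrated saga. Step and compensation names are hypothetical, and a real orchestration service would persist progress so compensation survives a crash of the orchestrator itself.

```python
def run_saga(steps):
    """steps: list of (name, action, compensate) callables. Actions run
    in order; on the first failure, compensations for the already
    completed steps run in reverse, so the loan ends FAILED rather
    than half-booked."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            for _, comp in reversed(completed):
                comp()
            return "FAILED"
        completed.append((name, compensate))
    return "COMPLETED"
```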
5️⃣ Hybrid Network Reliability
Between Cloud ↔ On-Prem:
Private connectivity (ExpressRoute / MPLS)
Mutual TLS
Retry logic at adapter
Secondary failover endpoint
If primary CBS endpoint fails:
Switch to secondary
6️⃣ Monitoring & Observability
Reliability without visibility is impossible.
Metrics
API latency
Error rate
CBS response time
Kafka lag
Tracing
Track full journey:
User → Gateway → LoanSvc → Adapter → CBS
Alerting
SLA breach alerts
Kafka consumer lag alerts
DB replication lag alerts
7️⃣ Recovery & Reconciliation
Even with best design, failures happen.
So we implement:
Near Real-Time Reconciliation
Digital vs LOS vs LMS vs CBS comparison.
End-of-Day ARB
Financial reconciliation.
Replay Capability
Kafka allows replaying events from offset.
Manual Override Dashboard
Ops team can:
Retry
Reconcile
Re-trigger events
🛡 Reliability Example Scenario
Scenario:
Loan account created in CBS but response lost.
Without reliability → duplicate loan risk.
With reliability:
Adapter uses idempotency key.
If retry happens:
CBS detects duplicate
Returns existing loan account
Reconciliation confirms consistency.
No financial corruption.
📊
How will you ensure reliability in hybrid banking architecture?
You say:
“I design reliability across infrastructure, service, data, and operational layers. I ensure stateless microservices with circuit breakers, idempotent APIs, outbox-based event publishing, Kafka replication, saga-based transaction management, and continuous reconciliation between digital and core systems. Additionally, we implement observability and automated recovery mechanisms to maintain 99.99% availability.”
That answer shows maturity.
🧠 Bonus: Reliability Pyramid
Infrastructure Stability
↓
Service Resilience
↓
Data Consistency
↓
Transaction Integrity
↓
Monitoring & Recovery
1️⃣ How Do You Run ARB? (Architecture Review Board)
ARB is not a meeting. It’s a governance mechanism to control architecture quality, risk, cost, and alignment.
🎯 Step 1: Define ARB Charter
Clearly define:
Scope (All new systems? Only Tier-1 changes?)
Review triggers:
New platform
Major integration
Cloud adoption
Vendor onboarding
Regulatory-impacting change
Without scope clarity, ARB becomes chaos.
🎯 Step 2: Standardized Submission Template
Every proposal must include:
Business objective
Current architecture
Proposed architecture diagram
NFRs (availability, performance, RTO/RPO)
Security model
Data classification
Integration points
Cost estimate
Build vs Buy analysis
Risk assessment
This prevents emotional decisions.
🎯 Step 3: Structured Review Dimensions
ARB evaluates across 7 pillars:
| Pillar | What We Check |
| --- | --- |
| Alignment | Does it align with enterprise target architecture? |
| Security | IAM, encryption, PII handling |
| Reliability | HA, DR, resiliency patterns |
| Integration | Event-driven? APIs? Tight coupling? |
| Data | Duplication? Data ownership? |
| Cost | Capex/Opex impact |
| Compliance | RBI/GDPR/SOX implications |
🎯 Step 4: Decision Outcomes
ARB decisions should be:
Approved
Approved with conditions
Rework required
Rejected
And everything documented.
No informal approvals.
🎯 Step 5: Post-Approval Governance
ARB doesn’t end after approval.
You track:
Design compliance
Drift detection
Production alignment
Otherwise teams deviate later.
2️⃣ What Do You Evaluate?
This is critical.
🏗 Architecture Evaluation Areas
1️⃣ Technical Fit
Does it align with hybrid architecture?
Does it reuse shared services (Kafka, API Gateway)?
Is it introducing a new stack unnecessarily?
2️⃣ NFR Coverage
Availability target?
Scaling model?
Latency expectations?
DR plan?
3️⃣ Integration Strategy
REST vs Event?
Synchronous dependencies?
Risk of cascading failures?
4️⃣ Data Governance
Source of truth defined?
Data duplication?
Audit trail available?
5️⃣ Operational Readiness
Monitoring defined?
SLOs documented?
Support model defined?
6️⃣ Vendor Risk (If Buy)
Lock-in risk?
Exit strategy?
SLA commitment?
7️⃣ Long-Term Sustainability
Tech roadmap?
Skills availability?
Community support?
3️⃣ How Do You Prevent Tech Sprawl?
Tech sprawl = uncontrolled tools, frameworks, vendors.
It kills maintainability and increases cost.
🎯 Step 1: Define Approved Technology Stack
Example:
Backend: Java / .NET
Messaging: Kafka
Cache: Redis
DB: Postgres
Observability: Prometheus + Grafana
API Gateway: Standardized
No arbitrary new tools without ARB approval.
🎯 Step 2: Platform Engineering Model
Provide shared platforms:
Shared CI/CD
Shared Kafka cluster
Shared Kubernetes
Shared observability stack
When platform is easy, teams won’t build their own.
🎯 Step 3: Reuse-First Principle
Before approving new tech, ask:
Can existing platform solve this?
Is 80% solution acceptable?
🎯 Step 4: Periodic Rationalization
Every 6–12 months:
List all tools in use
Identify duplicates
Decommission low-usage systems
This prevents entropy.
4️⃣ How Do You Handle Deviations?
Deviations are inevitable.
What matters is how you control them.
🎯 Step 1: Categorize Deviation
| Type | Example |
| --- | --- |
| Minor | Version mismatch |
| Medium | Using an alternate DB |
| Major | Introducing a new event bus |
🎯 Step 2: Temporary vs Permanent
If deviation is needed:
Document reason
Define sunset timeline
Define rollback plan
Never allow undocumented permanent deviation.
🎯 Step 3: Risk Assessment
Evaluate:
Security impact?
Operational complexity?
Compliance violation?
Future integration risk?
🎯 Step 4: Compensating Controls
If deviation allowed:
Additional monitoring
Extra documentation
Restricted scope
Periodic review
🎯 Step 5: Governance Tracking
Maintain:
Architecture Deviation Register
Include:
System name
Deviation type
Risk level
Expiry date
Owner
Without tracking, deviations become architecture debt.
🎤
How do you run ARB and prevent sprawl?
“We operate ARB with structured submission templates and evaluate proposals across alignment, security, NFRs, integration, cost, and compliance. We enforce a standardized enterprise tech stack and promote reuse via platform engineering. Deviations are documented, risk-assessed, time-bound, and tracked in a deviation register to prevent architectural entropy.”
Modernization Around MainFrame
-------
🎯 Strong Leadership Answer Framework
1️⃣ Acknowledge the Reality
“Absolutely — mainframe teams are usually protective because they’ve built and maintained the system for 15–20 years. There is institutional knowledge and natural resistance.”
2️⃣ Address the “You Don’t Understand” Concern
“We never approached modernization with the mindset that we knew better. In fact, the first phase was reverse engineering and knowledge acquisition.”
What you actually did:
Conducted application portfolio assessment
Identified COBOL modules, JCL jobs, CICS transactions
Mapped batch vs online flows
Created dependency maps
Captured data lineage
Built functional decomposition
This shows depth.
3️⃣ Did You Learn It Yourself?
This is where you differentiate as a leader.
Do NOT say “I learned COBOL myself.”
Say this:
“As a technology leader, I didn’t try to become a COBOL expert. Instead, I ensured we had hybrid squads — legacy SMEs + modern engineers — and I personally drove structured knowledge capture workshops.”
“However, I invested time to understand high-level mainframe architecture, batch scheduling, VSAM datasets, and integration touchpoints so that modernization decisions were informed.”
That balances hands-on awareness + leadership positioning.
4️⃣ How Did You Handle Non-Cooperation?
“Resistance reduced significantly when we reframed modernization from ‘replacement’ to ‘risk reduction and performance uplift’. We involved mainframe SMEs in design reviews and made them co-owners of the future state.”
Psychology > Technology.
Also mention:
Incentivized knowledge documentation
Formal KT sign-offs
Shadow runs
Parallel production runs
5️⃣ Who Validated If You Didn’t Know?
This is the most powerful part.
Say:
“Validation was not individual-driven — it was systemic.”
Then explain:
Automated regression testing
Data reconciliation scripts
Parallel run comparisons
Output parity validation
Audit logs
Business sign-off checkpoints
For BFSI especially:
Reconciliation at field level
Batch output comparison
T+1 reconciliation reports
That shows governance maturity.
6️⃣ Address Time Consumption
Say:
“Yes, modernization is time-consuming if done blindly. We reduced risk and timeline by using a strangler pattern approach rather than big-bang replacement.”
Mention:
API wrapping
Incremental module migration
Domain-driven segmentation
Event-driven extraction
💎
“Mainframe modernization always comes with knowledge concentration and resistance. We addressed this by first building deep system understanding through reverse engineering workshops and dependency mapping. Rather than positioning modernization as replacement, we positioned it as risk reduction and performance uplift. We formed hybrid squads combining mainframe SMEs and cloud-native engineers. Validation was systematic — automated regression suites, parallel run comparisons, and reconciliation at data and transaction levels ensured parity. Instead of big-bang migration, we adopted a strangler approach — gradually extracting domains via APIs and microservices. My role was to ensure technical correctness, stakeholder alignment, and controlled risk — not to become a COBOL developer, but to orchestrate a structured transition.”