DIGITAL LENDING RFP Solution
- Anand Nerurkar
- Mar 23
- 12 min read
RFP Proposal Solution Presentation: Digital Lending (with Color-Coded Architecture)
1. Opening
"Thank you for the opportunity. I'll walk you through our approach to building a next-generation digital lending platform, leveraging hybrid multi-cloud, AI/ML, and GenAI, while ensuring resilience, compliance, and cost optimization."
Executive Summary
We propose a next-generation digital lending platform built on:
Hybrid multi-cloud architecture
Primary: Azure (Mumbai)
DR / Failover: GCP (Chennai)
On-prem core banking systems: LOS, LMS, CBS with low-latency adapters
Real-time fraud detection: ML pipelines with offline & online feature stores, real-time scoring (<100 ms)
GenAI copilots: Underwriter Copilot, Borrower Assistant, Lending Agreement Reviewer, powered by an enterprise Knowledge Hub (LLMOps + RAG layer)
Regulatory compliance via Fenergo (KYC/CDD/EDD) and NICE Actimize (AML)
Key Business Outcomes
40% reduction in underwriting effort
Real-time fraud detection
High availability with multi-cloud resilience
Regulatory compliance & audit readiness
2. Business Challenges
"We understand the key challenges are:
Fraud losses
Regulatory compliance (KYC / AML)
High availability & DR readiness
Scaling to 150k concurrent users
Leveraging AI & GenAI for efficiency"
3. Core Design Principles
1. Business-first active-active
Active-active applied to critical lending journey only
Not every component (avoids over-engineering)
2. Hybrid architecture
Core banking + compliance → on-prem
Digital + AI/ML + GenAI → cloud
3. Event-driven architecture
Loose coupling
Resilience + replay capability
4. Cost-optimized resilience
Active-active (critical)
Active-passive (non-critical + ML DR)
5. Failover = Activation, Not Restart
GCP doesn't "wait"; it takes over instantly
4. Architecture Walkthrough
Legend:
[AA] Critical / Active-Active
[AP] Non-Critical / Active-Passive
[ML] AI / ML Layer
[GEN] GenAI Layer
[DATA] Data Platform
[EVT] Event Layer
[CORE] On-Prem Core & Compliance
[SEC] Security / IAM
+--------------------------------+
|       CUSTOMER CHANNELS        |
|    Web / Mobile / RM Portal    |
+---------------+----------------+
                |
+---------------v----------------+
|      GLOBAL TRAFFIC LAYER      |
|     DNS / Traffic Manager      |
+---------------+----------------+
                |
        +-------+--------+
        |                |
+-------v--------+  +----v-----------------+
| Azure (Mumbai) |  | GCP (Chennai)        |
| Primary Region |  | DR / Failover        |
+----------------+  +----------------------+
| [AA] Digital   |  | [AA] Digital Apps    |
|      Apps      |  | [AP] Passive Apps    |
| [AA] APIs + UI |  |                      |
| [DATA] CosmosDB|  | [DATA] DR DB         |
|  Lending       |  |                      |
|  Timeline      |  |                      |
| [DATA] Azure   |  | [DATA] GCP Data Lake |
|  Data Lake     |  |                      |
|  Raw->Curated  |  |                      |
|  ->Features    |  |                      |
| [ML] Feature   |  | [ML] Feature Store   |
|  Store (Online)|  |      (Replicated)    |
| [ML] Azure ML  |  | [ML] GKE / Vertex AI |
|  Inference     |  |      ML DR Endpoint  |
| [GEN] GenAI    |  | [GEN] GenAI DR       |
|  Copilots      |  |                      |
+-------+--------+  +----+-----------------+
        |                |
        +-------+--------+
                |
+---------------v----------------+
|       [EVT] EVENT LAYER        |
|      Kafka / BDR / Redis       |
+---------------+----------------+
                |
+---------------v----------------+
|      [CORE] ON-PREM CORE       |
|        LOS / LMS / CBS         |
|      Fenergo / Actimize        |
+---------------+----------------+
                |
+---------------v----------------+
|      [SEC] SECURITY & IAM      |
|      Keycloak + Azure AD       |
+--------------------------------+
"At a high level:
Customers access via web/mobile → routed through the global traffic layer
Azure (Mumbai) acts as primary region
GCP (Chennai) acts as DR / failover
Design Choices
[AA] Critical services → active-active
[AP] Non-critical → active-passive
[EVT] Event layer ensures consistency
[DATA] Data platform powers AI/ML
[ML] ML handles real-time decisions
[GEN] GenAI enhances business efficiency"
5. Digital Lending Flow
Digital Lending Layer (Azure Primary / GCP DR)
Customer-facing services: login, consent, KYC, AML checks, income stability, decision engine, agreement, loan-account setup, disbursement
Critical services active-active: KYC, AML, income verification, decision engine
Non-critical services active-passive for cost optimization: e.g., document storage, reporting, analytics
Failover:
Users routed to nearest region; in case of outage, traffic flows to DR region
Data consistency ensured via Kafka + Postgres BDR + Redis
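The geo-based routing with failover described above can be sketched as follows. This is a minimal illustration, not production routing logic; region names, the geo-to-region map, and the health-check dictionary are all illustrative assumptions.

```python
# Hypothetical geo -> home-region mapping (illustrative names).
PRIMARY_BY_GEO = {
    "in-west": "azure-mumbai",   # users near Mumbai stick to Azure
    "in-south": "gcp-chennai",   # users near Chennai stick to GCP
}

def route(user_geo, region_health):
    """Return the region that should serve this user.

    Users stick to their home region (which avoids cross-region write
    conflicts); traffic shifts to another healthy region only when the
    home region is down.
    """
    home = PRIMARY_BY_GEO.get(user_geo, "azure-mumbai")
    if region_health.get(home):
        return home
    for region, healthy in region_health.items():
        if healthy and region != home:
            return region  # failover: any healthy region takes over
    raise RuntimeError("no healthy region available")
```

Keeping a user pinned to one region is what makes the later idempotency and conflict-avoidance arguments tractable.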
Design Choice

| Area | Decision | Why |
| --- | --- | --- |
| Critical services | Active-active | Business continuity |
| Non-critical services | Active-passive | Cost optimization |
| User routing | Geo-based | Avoid data conflicts |

Key Insight
"Active-active is applied at the business capability level, not to every service."
This avoids data conflicts and ensures continuity.
Functional Requirements

| Capability | Response |
| --- | --- |
| Customer onboarding | Supported |
| Document upload | Supported |
| KYC/AML integration | Supported |
| Loan processing | Supported |
| Multi-channel access | Supported |
| Core banking integration | Supported |
6. Data Platform & ML (Highlight Strongly)
"We built a modern data platform:
Unified data ingestion:
Transactions
Customer behavior
Device / session data
Data lake + streaming pipelines
CosmosDB → event timeline
Azure Data Lake:
Raw → Curated → Analytics
Feature Engineering
Feature Strategy:
Feature Engineering Pipeline
Build features like:
Transaction velocity
Device fingerprint
Behavioral patterns
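A transaction-velocity feature like the one listed above can be computed as a simple sliding-window count. This is a minimal sketch; the window size and input shape are illustrative assumptions, not the platform's actual feature definition.

```python
from datetime import datetime, timedelta

def transaction_velocity(txn_times, now, window_minutes=10):
    """Number of transactions in the sliding window ending at `now`.

    A sudden spike in velocity is a classic fraud signal; the 10-minute
    window is an illustrative assumption.
    """
    cutoff = now - timedelta(minutes=window_minutes)
    return sum(1 for t in txn_times if cutoff < t <= now)
```

In a real pipeline this would run in the streaming feature-engineering job, with the result written to the online feature store.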
Feature Store Architecture
Offline Feature Store
Historical data
Model training
Online Feature Store
Low-latency feature access
Used during inference
Feature Materialization
Sync offline → online store
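The offline-to-online materialization step can be sketched as a watermarked upsert. This is a toy in-memory version under assumed schemas (`customer_id`, `event_time`); a real system would use a feature-store materialization job.

```python
def materialize(offline_batch, online_store):
    """Sync a batch of offline feature rows into the online store.

    Keeps only the newest row per customer (watermark on event_time), so a
    late-arriving batch never overwrites fresher online features.
    """
    for row in offline_batch:
        key = row["customer_id"]
        current = online_store.get(key)
        if current is None or row["event_time"] > current["event_time"]:
            online_store[key] = row
    return online_store
```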
Real-Time Inference
Deployed as:
REST/gRPC endpoint
Latency:
< 50–100 ms
GCP DR
Replicated:
Critical datasets
Feature store (online)
CosmosDB timeline
Event & Data Layer
Kafka MirrorMaker: cross-region event replication
Postgres BDR: database replication
Redis Enterprise: cache replication
Purpose: ensure HA, DR, and active-active consistency
Design Choice

| Area | Decision | Why |
| --- | --- | --- |
| Data Lake replication | Selective (not full) | Cost optimization |
| Feature replication | Real-time (Kafka) | ML consistency |
| Offline data | Batch replication | Not latency sensitive |

Key Line:
"We replicate features in real time, not just data, ensuring ML accuracy during failover."
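Real-time feature replication to the DR region can be sketched as a version-guarded apply of streamed events. The event shape (`entity_id`, `feature`, `version`, `value`) is an illustrative assumption; the point is tolerating out-of-order delivery across Kafka partitions.

```python
def apply_feature_events(events, dr_online_store):
    """Apply streamed feature updates to the DR region's online store.

    Events may arrive out of order, so only the highest version per
    (entity, feature) pair is kept; a stale replay can never clobber a
    fresher value.
    """
    for e in events:
        key = (e["entity_id"], e["feature"])
        current = dr_online_store.get(key)
        if current is None or e["version"] > current["version"]:
            dr_online_store[key] = e
    return dr_online_store
```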
Multi-Cloud Connectivity

| Connection | Primary | DR / Secondary |
| --- | --- | --- |
| Cloud to on-prem | Azure ExpressRoute | GCP Cloud Interconnect |
| Fallback | VPN | VPN |
7ļøā£ ML Deployment Strategy
āAzure ML is primary for:
Training
Model Registry
Real time Inference
GCP (DR)
Containerized ML endpoints:
Models are containerized and deployed on GCP (GKE / Vertex AI)
Online features are replicated
Activated on failover
Design Choice

| Area | Decision | Why |
| --- | --- | --- |
| ML deployment | Active-passive | Avoid complexity |
| Model replication | Container-based | Cloud portability |
| Feature sync | Streaming | Real-time accuracy |

Key Insight
"We avoided full active-active ML to reduce complexity while ensuring DR readiness."
This ensures:
Cost optimization
DR readiness
Strong Line:
"So during failover, GCP has both the model and the features, which ensures real-time decisioning continues without disruption."
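The "GCP always has the latest model" guarantee can be sketched as a registry-mirroring step in CI/CD. Registries are modeled here as plain dicts of name to ordered (version, container image) tuples; all names are illustrative assumptions, not actual Azure ML or GKE APIs.

```python
def mirror_latest_model(primary_registry, dr_registry, model_name):
    """Ensure the DR registry holds the primary registry's latest version.

    Idempotent: copies the (version, image) tuple only when the DR side is
    out of date, so the CI/CD job can run after every registration.
    """
    version, image = primary_registry[model_name][-1]
    dr_versions = dr_registry.setdefault(model_name, [])
    if not dr_versions or dr_versions[-1][0] != version:
        dr_versions.append((version, image))
    return version
```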
8. GenAI / Enterprise Knowledge Hub
LLMOps Pipeline: model orchestration, versioning, prompt management
RAG Layer: fetches regulatory rules, loan policies, past knowledge
Copilots:
Underwriter Copilot: highlights high-risk cases and recommends actions; reduces effort by 40%
Borrower Assistant: guides loan applicants; improves customer experience
Lending Agreement Reviewer: summarizes contracts, payment terms, EMI, and affordability
Deployment: hybrid (Azure cloud for LLM inference, on-prem for sensitive knowledge)
We introduce GenAI through an enterprise knowledge hub:
Architecture
LLMOps pipeline
RAG layer
Hybrid deployment:
Cloud → inference
On-prem → sensitive knowledge
Design Choice

| Area | Decision | Why |
| --- | --- | --- |
| GenAI deployment | Hybrid | Data security |
| Knowledge base | On-prem | Regulatory compliance |

This ensures compliance + scalability.
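The RAG layer's retrieve-then-prompt flow can be sketched as below. This toy version ranks policy snippets by keyword overlap purely to illustrate the flow; a production knowledge hub would use vector search, and all function names here are illustrative assumptions.

```python
def retrieve(query, policy_docs, k=2):
    """Toy retrieval: rank policy snippets by term overlap with the query."""
    q_terms = set(query.lower().split())
    return sorted(
        policy_docs,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query, policy_docs):
    """Ground the LLM call in retrieved policy context, not model memory."""
    context = "\n".join(retrieve(query, policy_docs))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Grounding copilot answers in retrieved on-prem policy text is what keeps sensitive knowledge out of the hosted model.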
9. Core Banking & Compliance (On-Prem)
Core Systems: LOS, LMS, CBS
Adapters: LOS Adapter, LMS Adapter, CBS Adapter for low latency
Compliance:
Fenergo: KYC/CDD/EDD
NICE Actimize: AML & fraud
Integration: event-driven; the digital layer sends transactions to core & compliance
Benefit: regulatory workflow, reporting, audit-ready
Design Choice
Keep core & compliance on-prem for regulatory control and stability
10. Resilience & DR
"Resilience is multi-layered:
Active-active → critical services
Active-passive → ML + non-critical
Kafka + BDR → data sync
Failover:
If Azure fails:
Traffic → GCP
GCP uses:
Replicated features
ML models
Lending continues seamlessly"
Security & Regulatory Compliance
"We use:
IAM: Keycloak integrated with Azure AD
Protocols: OIDC / OAuth2 / SAML for browser-based apps
JWT tokens: used for service-to-service and user authentication
Zero-trust
Encryption:
TLS / AES-256
Compliance:
Data residency
Auditability
Compliance workflows via:
Fenergo
NICE Actimize
Data Consistency Strategy
PostgreSQL BDR → transactional replication
Kafka MirrorMaker → event replication
Redis → cache sync
Supported by:
Idempotency
Versioning
Controlled writes
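The idempotency element of the consistency strategy can be sketched as a result cache keyed by an idempotency key. This is a minimal in-memory illustration (a real system would persist keys in a shared store with TTLs); the class and method names are assumptions.

```python
class IdempotentProcessor:
    """Controlled-write sketch: each request carries an idempotency key.

    A replayed request (e.g. an event re-delivered after a regional
    failover) returns the cached result instead of re-executing the side
    effect, so a disbursement can never happen twice.
    """

    def __init__(self):
        self._results = {}

    def process(self, idempotency_key, handler, payload):
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = handler(payload)
        self._results[idempotency_key] = result
        return result
```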
RFP WAR ROOM & HYPERSCALER PARTNERSHIP
War Room Setup
Solutioning, Finance, Delivery, Leadership
Hyperscaler SMEs (Azure + GCP)
Hyperscaler Contributions
| Area | Contribution |
| --- | --- |
| Azure | ML, CosmosDB, Data Lake |
| GCP | DR, ML inference, interconnect |
| Both | Reference architectures, security, CI/CD |
Key Insight
"Hyperscalers reduced solution risk and accelerated design by ~20%."
Delivery Plan & Model

| Phase | Duration |
| --- | --- |
| Foundation + Data Platform | 4 months |
| Core + Integration | 6 months |
| AI/Fraud Implementation | 6 months |
| UAT & Compliance | 3 months |

Total build: 15–18 months with overlapping phases (19 months if run strictly sequentially); end-to-end program timeline: 18–22 months
"5 squads, 7–8 members each
POD Team: Digital, AI/ML, GenAI, Integration, DevSecOps"
MLOps: Azure ML for training & deployment
LLMOps: hybrid (Azure + on-prem sensitive data)
Program governance: Agile, sprint-based delivery with KPIs
Governance
Hybrid governance:
Weekly program reviews
Architecture Review Board (EARB/ARB)
Risk tracking & escalation
Commercials

| Component | Cost ($M) |
| --- | --- |
| Implementation | 40–50 |
| Cloud (Azure + GCP) | 12–15 |
| AI/ML | 15–20 |
| GenAI | 15–20 |
| COTS (Fenergo + Actimize) | 25–30 |
| Support | 20–25 |

Total: $127–160M
Assumptions & Dependencies
Core APIs exposed via adapters
Network stability
Regulatory approvals
Risks & Mitigation

| Risk | Mitigation |
| --- | --- |
| ML model drift | Continuous monitoring & retraining |
| Fraud latency | Online feature store & low-latency inference |
| Compliance delays | Async event-driven workflow |
| Data conflicts | Controlled writes + idempotency |
| Cloud outage | Multi-cloud failover |
| Feature inconsistency | Real-time sync |
"What differentiates our solution:
Business-driven active-active (not over-engineered)
Realistic ML DR strategy
Feature-level consistency for AI accuracy
Hybrid compliance-ready architecture
Strong hyperscaler-backed design"
"This architecture balances:
Resilience (multi-cloud DR)
Cost (selective active-active)
Intelligence (AI/ML + GenAI)
while ensuring a future-ready, compliant digital lending platform."
"This solution integrates cloud-native digital lending with on-prem core banking and compliance platforms, and introduces real-time fraud detection using advanced AI/ML capabilities. Customer onboarding is handled in the cloud, while compliance workflows such as KYC and AML are executed on-prem via event-driven integration with platforms like Fenergo and NICE Actimize through low-latency adapters. The fraud detection layer uses a feature store architecture with real-time inference to detect risk within milliseconds. GenAI-powered copilots for underwriting, borrower assistance, and agreement analysis are built on an enterprise knowledge hub using LLMOps and RAG. The platform is deployed across Azure and GCP with resilient connectivity to on-prem systems, ensuring high availability and regulatory compliance. This design enables a scalable, intelligent, and secure digital lending ecosystem with a superior customer and operations experience."
The panel may ask a few questions; be ready with your answers.
Q1. If both regions are active, why do you still need DR?
"Active-active ensures availability, but DR is still required for catastrophic failure scenarios."
Explain clearly:
Active-active = both regions serve traffic
But:
Cloud-wide outage
Data corruption
Cyber attack
DR ensures:
Clean recovery point
Isolation from failure
Punchline:
"Active-active is for availability; DR is for survivability and recovery."
Q2. If ML is not active-active, is this a true active-active system?
"Yes, because active-active is applied at the business capability level, not to every component."
Break it down:
Critical user journey (loan processing) → active-active
ML inference → active-passive (but fast failover)
Trade-off:
Avoid complexity
Optimize cost
Punchline:
"We prioritize active-active for business continuity, not for every technical component."
Q3. How do you avoid data conflicts in active-active?
"We prevent conflicts by design, not by resolution."
Steps:
Geo-routing → user sticks to one region
Session affinity / token routing
Idempotent APIs
Event ordering (Kafka partitioning)
Punchline:
"Instead of resolving conflicts later, we design the system to avoid them upfront."
Q4. What if CosmosDB and Postgres become inconsistent?
"We treat them as different sources of truth."
Explain:
Postgres → transactional truth
CosmosDB → event timeline / projection
Sync via events (event sourcing pattern)
If they mismatch:
Rebuild CosmosDB from the event logs
Punchline:
"CosmosDB is eventually consistent and rebuildable; Postgres is the source of truth."
Q5. Why not keep everything on one cloud?
"We chose multi-cloud for risk diversification and regulatory alignment, not trend."
Explain:
Avoid vendor lock-in
Regulatory requirements (data locality / resilience)
DR isolation (true independence)
Punchline:
"Multi-cloud is a strategic risk decision, not just a technology choice."
Q6. How do you test DR?
"We follow structured DR testing."
Steps:
Planned failover drills
Partial failure testing (ML / DB / API)
Data validation post failover
RTO / RPO measurement
Punchline:
"DR is validated continuously, not assumed."
Q7. What is your RTO and RPO?
"Defined based on business criticality."
Example:
Critical services:
RTO: few minutes
RPO: near-zero
Non-critical:
RTO: hours
RPO: acceptable lag
Punchline:
"RTO/RPO are business-driven, not technology-driven."
Q8. How do you control cloud cost in this architecture?
"Cost optimization is built into the architecture."
Steps:
Active-active only for critical services
Active-passive for non-critical
ML not active-active
Storage tiering (hot / cold data)
Reserved instances / committed usage
Punchline:
"We balance resilience and cost through selective activation."
Q9. What if Kafka replication fails?
"We design for failure."
Steps:
Retry + backpressure
Dead Letter Queue (DLQ)
Replay from offset
Monitoring & alerts
Punchline:
"Event-driven systems are resilient because they support replay and recovery."
Q10. How do you ensure security across multi-cloud?
"We enforce centralized identity with federated control."
Steps:
Keycloak + Azure AD federation
OIDC / SAML for authentication
JWT tokens for services
Zero-trust principles
Encryption in transit & at rest
Punchline:
"Identity is centralized, enforcement is distributed."
Q. Why didn't you make ML fully active-active across Azure and GCP?
"Full active-active ML across clouds adds significant complexity, especially around model consistency, feature synchronization, and latency. Instead, we designed active-active at the application layer and active-passive for ML inference: Azure ML handles primary inference, and GCP hosts containerized DR endpoints. This ensures resilience without unnecessary cost and operational overhead, while still meeting real-time fraud detection SLAs."
Q. How do you ensure feature consistency between Azure and GCP?
"We separate offline and online feature flows. Offline features (training) are replicated in batch. Online features are synchronized using Kafka-based streaming. Feature materialization ensures the same feature definitions are used across regions. This guarantees that ML predictions remain consistent during failover."
Q. What if feature replication lags? Won't ML predictions be wrong?
"Good point. We handle this with SLA-based lag monitoring for feature pipelines, graceful degradation (fallback rules or last-known features), and prioritized real-time sync for critical features. So even in lag scenarios, we ensure controlled and explainable decisions, which is important for banking."
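That graceful-degradation policy can be sketched as a feature fetch that labels its own freshness. The store layout, key format, and the 30-second staleness SLA are illustrative assumptions.

```python
from datetime import datetime, timedelta

def get_online_feature(store, key, now,
                       max_lag=timedelta(seconds=30), default=0.0):
    """Fetch a feature with SLA-based staleness handling.

    Fresh values are used directly; lagging values are still served but
    flagged (so the decision stays explainable); a missing feature falls
    back to a rule-based default.
    """
    entry = store.get(key)
    if entry is None:
        return default, "fallback"
    if now - entry["updated_at"] > max_lag:
        return entry["value"], "stale"
    return entry["value"], "fresh"
```

The returned status label is what lets the decision engine switch to conservative rules when replication lags.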
Q. How do you deploy ML models from Azure to GCP?
"Models trained in Azure ML are serialized (e.g., ONNX / pickle / containerized format), packaged into Docker containers, and deployed on GKE or Vertex AI endpoints in GCP. CI/CD pipelines ensure that every model version in Azure is replicated to GCP, maintaining DR readiness."
Q. How fast is your failover for ML inference?
"Failover is near real-time. Traffic is rerouted via DNS / API gateway, and GCP already has the latest model container and synced online features, so inference resumes almost immediately, with minimal latency impact."
Q. Why use CosmosDB for the lending timeline? Why not Postgres?
"CosmosDB is ideal because it handles high-volume, event-based, semi-structured data, provides low-latency reads and writes globally, and supports a flexible schema for evolving lending events. Postgres is used for transactional consistency, while CosmosDB is optimized for event timeline and journey tracking."
Q. How do you ensure data consistency in an active-active setup?
"We use a combination of Kafka (event streaming) for eventual consistency, Postgres BDR for database replication, and idempotent APIs to prevent duplicate processing. Also, user operations are typically region-local, which reduces conflict scenarios."
Q. What if both regions process the same user request?
"We avoid that using geo-routing (the user sticks to one region), session affinity / token-based routing, and idempotency keys for APIs. So duplicate processing is prevented at the design level."
Q. How did the hyperscaler partnership really help here?
"Hyperscaler collaboration was key. The Azure team helped with ML architecture, the feature store, and CosmosDB patterns; the GCP team validated the DR strategy, containerized ML inference, and the interconnect setup. We leveraged reference architectures and accelerators, reducing solutioning time by ~20%. This ensured the design was validated, scalable, and production-ready."
Q. What is your biggest risk in this architecture?
"The biggest risk is feature inconsistency across regions impacting ML decisions. We mitigate this via real-time feature sync for critical features, monitoring & alerting, and fallback strategies. This ensures decision reliability even during DR scenarios."
Multi-Cloud Data & ML Replication Strategy
1. Azure Data Lake → GCP Data Lake Replication
Purpose: DR / failover of raw, curated, and analytics data for the digital lending pipeline
Approach:
| Step | Description |
| --- | --- |
| 1. Raw data ingestion | All lending events, transactions, and KYC/AML logs are ingested into Azure Data Lake (raw layer). |
| 2. Curated / analytics layer | Transform raw data into curated + aggregated datasets for ML feature engineering. |
| 3. Feature engineering | Offline features are generated here and materialized into the online feature store. |
| 4. Cloud-to-cloud replication | Option 1: scheduled / streaming data export from Azure Blob Storage to GCP Cloud Storage / Data Lake. Option 2: Apache Spark / Dataflow pipelines with Azure-to-GCP connectors. Option 3: hybrid Kafka topics streaming transformed features to the GCP feature store in near real time. |
| 5. Online feature store in GCP | Updated features are consumed by containerized ML inference endpoints on GCP during DR. |
Key Principle: only critical datasets & features are replicated to DR (cost optimization). Non-critical analytics can be rebuilt in DR on demand.
2. ML Model & Inference Replication
Primary Region (Azure ML)
Training, online inference (<100 ms latency), feature access from the Azure feature store
Generates ML models, serialized artifacts, and endpoint containers
DR Region (GCP / GKE)
Containerized ML inference deployed to GKE for DR failover
Feature replication: online features synced from Azure to GCP using streaming / event-driven pipelines (Kafka MirrorMaker or Dataflow)
Offline features / model artifacts replicated using cloud storage sync (Blob to GCS)
The DR endpoint becomes active only if Azure ML goes down
Notes:
Azure ML itself doesn't run natively on GCP; we export models as containerized endpoints and deploy them on GKE
Features must be kept in sync to ensure inference correctness; this is done via streaming replication or event-driven pipelines
Non-critical model artifacts (training datasets, offline analytics) can be stored in GCP cold storage; real-time inference uses synced online features + the model container
3. Data Consistency & DR
The event layer (Kafka / Postgres BDR / Redis) replicates transactional & real-time events across regions
Single-writer principle:
Active-active for critical services ensures both regions can serve traffic
Features / ML pipelines are reconciled continuously to handle eventual consistency
Failover Scenario:
Azure primary goes down → users routed to GCP
Containerized ML endpoints in GCP + replicated online features serve real-time inference
Core digital lending workflow continues uninterrupted
"For multi-cloud DR, Azure Data Lake feeds our ML feature pipeline: offline features are materialized, and online features are synced via streaming pipelines to GCP. ML models trained on Azure ML are exported as containerized endpoints and deployed to GKE in GCP. In DR, the online feature store + model container serve inference at sub-100 ms latency, ensuring critical lending workflows continue even if the Azure region is down. Event-driven replication (Kafka / Postgres BDR / Redis) ensures data consistency, and we replicate only critical datasets to optimize cost. Non-critical analytics can be rebuilt in DR on demand."