
Modernization Legacy Mutual Fund

  • Writer: Anand Nerurkar
  • May 18
  • 11 min read

Updated: May 21

🏛️ Mutual Fund Platform Modernization: Enterprise-Scale Architecture

Business Vision


  1. Modernize a legacy mutual fund transaction and investment management platform into a high-performance, resilient, cloud-native, intelligent system.

  2. Support 150K+ concurrent investors, 6000+ TPS, multi-tenant onboarding, SEBI/GST/FATCA compliance, real-time insights, and real-time NAV processing.

  3. Enable distributor and admin operations via a secure and scalable platform.

  4. Build on a Spring Boot-based microservices architecture hosted on Azure Cloud using AKS, Istio, and Kafka.


Target Architecture Overview

  • Microservices: 40+ Spring Boot services

  • Cloud: Azure

  • Container Orchestration: AKS (Azure Kubernetes Service)

  • Service Mesh: Istio (traffic control, mTLS, policy enforcement)

  • Messaging: Kafka (event-driven processing)

  • Database: Azure SQL, Cosmos DB

  • Monitoring: Prometheus, Grafana, Azure Monitor

  • Logging: ELK Stack

  • Authentication: Azure AD B2C

  • CI/CD: Azure DevOps


Business Outcomes (Before vs. After Modernization)

| Metric | Legacy System | Modernized Platform |
|---|---|---|
| Concurrent Users Supported | ~10,000 | 150,000+ |
| Transactions per Second | ~500 TPS | 6000+ TPS, burstable to 8000+ |
| NAV Update Frequency | 3x/day | Every 5–15 mins, validated + cached |
| Audit Readiness | Manual + fragmented | Immutable, append-only CosmosDB + SEBI auto-reports |
| Deployment Downtime | High (2–4 hrs/month) | <10 mins, zero-downtime via Istio + Helm |
| Support Resolution Time | Manual, 1–2 days | GenAI-assisted, reduced to hours |
| Investor Experience | Static, transactional | Conversational, intelligent, and contextual |


Tech Strategy (Aligned to Business)

| Business Objective | Tech Strategy |
|---|---|
| Scale to millions of investors and 150K concurrent users | AKS with HPA, Kafka, Redis, CosmosDB |
| High throughput | Kafka event-driven microservices (Spring Boot) |
| Regulatory compliance | Immutable logs in CosmosDB for audit, RBAC, Azure Key Vault, SEBI/FATCA export |
| Investor intelligence | Embedded GenAI advisor, RAG-powered chat, recommendation engine |
| Reduced time-to-market | Azure DevOps + Helm + Istio (canary/blue-green CI/CD) |
| Real-time portfolio and NAV | Kafka streaming + Redis caching + CosmosDB historical logs |
| Multi-tenant support | Istio VirtualServices + JWT claims + scoped RBAC |


🔹✅ Enterprise-Scale Architecture Principles

  • Event-Driven Microservices using Kafka

  • Real-Time Ingestion + Async Flow with fallback

  • Immutable Audit Logs in Cosmos DB

  • Azure AKS Active-Active (2 Regions) + Kafka MirrorMaker

  • DevSecOps Gates: CVE scans, key vault integration, RBAC policies

  • Observability: Prometheus (HPA, latency), Grafana, ELK

  • SEBI/FATCA Compliance: Scheduled, auditable reports

  • RBAC + Azure AD: Fine-grained access across personas


💡 Bonus Impact Metrics (Before vs. After)

| Metric | Legacy | Modernized |
|---|---|---|
| Order Processing Time | 10–12s | 3–4s |
| NAV Refresh | 3x/day (batch) | Every 15 min |
| Downtime | 5–7 hrs/year | <1 hr/year |
| Deployment Risk | High | Zero-downtime |
| SEBI Audit Cycle | Manual, 3–5 days | Automated, <2 hours |

Capability Map (Functional + GenAI)

🎯 Functional Capabilities

  • Investor Onboarding (via Admin/Distributor)

  • Risk Profiling

  • Fund Discovery & NAV Access

  • Transaction Management (Buy/Sell/Switch)

  • Payment Gateway Integration

  • Portfolio Management

  • NAV Feed Ingestion & Publication

  • Distributor Management

  • Commission Calculation

  • Support Ticketing (CRM)

  • Document Management (T&Cs, CAS)

  • Analytics & Insights

  • SEBI, FATCA, GST Compliance

  • Audit Trail Logging

  • Notifications (SMS, Email, App)

🤖 GenAI-Enhanced Capabilities

  • GenAI Conversational Assistant (Investor, Distributor)

  • NAV & Fund Comparison Chatbot (RAG)

  • KYC Auto-fill via OCR/NLP

  • Document Summarization (T&Cs, Factsheets)

  • Portfolio Health Advisor

  • Transaction Anomaly Explanation

  • Auto-generated SEBI/GST report summary

  • CRM Ticket Reply Drafting

  • AML Alert Explainer (compliance flow)


Capability to Microservice Mapping

| Capability | Microservices Involved |
|---|---|
| Investor Registration | InvestorService, KYCService, CredentialService |
| Risk Profiling | RiskEngineService (questionnaire, risk scoring, MF suitability) |
| Fund Discovery & NAV | ProductCatalogService (fund explorer, NAV view, factsheets, offer documents), NAVPublisherService |
| Order Management | OrderService, PaymentService, TransactionEngine (Buy, Sell, Switch, Modify, Cancel Orders) |
| Portfolio Mgmt | HoldingService, ReportService |
| NAV Feed Processing | NAVIngestorService, NAVCalculatorService, NAVPublisherService (Real-time NAV Ingestion, Validation, Publication, Redis Caching) |
| Distributor Ops | DistributorService, LeadService, CRMAdapterService (Lead Management, Commission Payouts, Hierarchy, CRM Integration) |
| Commission Mgmt | CommissionEngine, LedgerService (slab-wise, event-based, referral-based computation) |
| Notifications | NotificationService, AlertService (SMS, email, alerts) |
| Compliance & Reporting | ComplianceService, AuditTrailService (SEBI/FATCA Reporting, Audit Trail, RBAC, Immutable Logs) |
| GenAI Advice | GenAIAdvisorService, RAG Orchestrator |
| Doc Summarization | DocSummarizerService |
| Fraud & AML Detection | FraudDetectionService, AMLScreeningAdapter |
| Document Management | Onboarding Docs, Fund Factsheet, Statement Storage (Blob) |
| Analytics & BI | Fund Performance, Investor Behavior, Operational KPIs |
| Support & Ticketing | CRM Integration (Zendesk, Freshdesk), Ticket Tracking |
| Admin Configuration | Fund Setup, Limits, NAV triggers, Access Management |
| Fraud Monitoring | Anomaly Detection, Velocity Check, Manual Override |
| Loyalty & Referral | Referral Tracking, Campaigns, Rewards |


🔗 EXTERNAL SYSTEM INTEGRATIONS – Data Pipelines & APIs

| Area | System/API | Kafka Topics & Flow | Purpose | Integration Mode |
|---|---|---|---|---|
| KYC | UIDAI / NSDL | kyc.completed ← from KYC Service | Aadhaar eKYC, XML auth | REST + Secure Callback |
| eSign | Digio, SignDesk | eSign callbacks → esign.completed | PAN Verification, e-Sign | API + Signed PDF exchange |
| NAV Feed | CAMS / KFinTech (SFTP) | NAVIngestorService → nav.raw-feeds → nav.validated → nav.broadcast | NAV Feed, Folio Sync, Order Status | SFTP / API + file ingest |
| Payments | UPI / BillDesk | Webhook → payment.success / payment.failed | Payment Collection (UPI, Mandate, NetBanking) | API + Webhook |
| CRM | Distributor CRM APIs | Pull investor-lead mapping, lead status | Ticket sync, Distributor inquiries | API + OAuth |
| SEBI/AMFI | Audit exports | compliance.report.generated / API export | Reporting API / XML Upload | Batch file + webhook |
| Blob Storage | Azure Blob | Statements and reports from ReportService | Reporting API / XML Upload | Batch file + webhook |
| Monitoring | Prometheus / Grafana | Scrapes Istio sidecars, shows TPS, CPU, NAV delay | | |
| Audit | Cosmos DB | AuditTrailService consumes all key events | | |
| | Income Tax Portal / PAN DB | | PAN validation and linking with Aadhaar | REST / SOAP |
| | GSTN / E-Invoice Portal | | Commission invoices, distributor GST compliance | REST API + GSP |
| | NSDL CAS (Consolidated Account Statement) | | Consolidated investor holdings reporting | File Upload or API |
| | Banking APIs (ICICI/HDFC/Axis) | | | Account Aggregator / API Gateway |
| | Azure Active Directory | | Admin access control, RBAC, and Just-In-Time access | SAML / OAuth |
| | Azure AD B2C | | Investor authentication and authorization | OAuth2 / OpenID |
| | CRM (e.g., Zendesk, Salesforce) | | Ticketing, distributor support, lead management | REST API |
| | SMS / Email Gateways (MSG91, Twilio, SendGrid) | | Notifications | REST / SMTP |
| | Analytics / BI Platform (Power BI, Azure Synapse) | | Business reporting, fund insights | Data Export + ETL / Kafka Connect |

✅ Top 20 Enterprise Risks and Mitigations

| # | Risk Description | Category | Mitigation Strategy |
|---|---|---|---|
| 1 | Incorrect NAV pricing | Business | NAV validation rules; Prometheus alerts for deviation beyond threshold |
| 2 | Duplicate order execution | Business | Enforce unique orderId; TransactionEngine idempotency logic |
| 3 | Commission miscalculation | Business | CommissionEngine logic + reconciliation audits; audit logs persisted in Cosmos DB |
| 4 | Delayed NAV feed | Operations | TTL check on NAV in Redis; fallback to Cosmos; alert via nav_age_seconds metric |
| 5 | Failed deployment causing downtime | Operations | Istio canary rollout, Helm rollback; Spring Boot health checks + synthetic testing |
| 6 | Manual SEBI reporting errors | Operations | Auto-generated SEBI-compliant reports with approval workflow and blob backups |
| 7 | Kafka consumer lag or topic overload | Technology | Partitioned topics, Prometheus lag alerts, autoscaler for consumer groups |
| 8 | Redis cache failure | Technology | Read-through fallback to Cosmos DB; Redis cluster with high-availability failover |
| 9 | Pod restarts due to memory/CPU spikes | Technology | Liveness/readiness probes; resource requests/limits; HPA tuning |
| 10 | PII data exposure | Security | AES-256 encryption at rest, TLS in transit; field masking; Key Vault tokenization |
| 11 | Public API abuse or denial-of-service | Security | Azure API Management throttling; rate limiting; JWT validation + RBAC |
| 12 | Hardcoded secrets or leaked credentials | Security | Use of Azure Key Vault + sealed secrets; pipeline security scanning |
| 13 | Audit trail tampering or loss | Governance | Write-once logs in Cosmos DB; append-only policy; RBAC-controlled access |
| 14 | SEBI/FATCA non-compliance | Governance | Automated scheduled exports; report APIs; policy-driven audit templates |
| 15 | Missing user activity logs | Governance | AuditTrailService with Kafka hooks + correlation ID logging |
| 16 | Admin/staff misuse of elevated privileges | People | Azure AD RBAC enforcement; scoped access levels; Just-In-Time role elevation |
| 17 | Fund configuration error | People | Admin UI with validations; dual-approval workflow for sensitive changes |
| 18 | Inconsistent CI/CD across teams | Process | Unified Azure DevOps pipeline templates; Helm-based release strategy |
| 19 | Missed disaster recovery drill | Process | Quarterly DR simulations; failover dashboards; observability alerts post-switch |
| 20 | Unauthorized access to audit data | Governance | Role-based export APIs; encryption of exports; audit logging of report generation |
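
Risk 2 (duplicate order execution) relies on a unique orderId plus idempotency in the TransactionEngine. A minimal sketch of that idea follows, assuming an in-memory dictionary in place of a durable store; the class shape, field names, and the unit calculation are illustrative and not taken from the platform's code.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionEngine:
    """Toy stand-in for the TransactionEngine's deduplication step."""
    # orderId -> result of the first successful execution (hypothetical store)
    _processed: dict = field(default_factory=dict)

    def execute(self, order_id: str, amount: float, nav: float) -> dict:
        # A redelivered event with a known orderId returns the stored result
        # instead of allocating units a second time.
        if order_id in self._processed:
            return self._processed[order_id]
        result = {"orderId": order_id, "units": round(amount / nav, 4)}
        self._processed[order_id] = result
        return result


engine = TransactionEngine()
first = engine.execute("ORD-1001", amount=5000.0, nav=55.1247)
replay = engine.execute("ORD-1001", amount=5000.0, nav=56.0)  # duplicate delivery
```

The replayed call returns the original allocation even though a different NAV is passed in, which is the behaviour an at-least-once Kafka consumer needs. In production the lookup and the write would have to be atomic and persisted.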


🔹 🔁 Inbound Data Pipelines (Batch + Real-Time)


| Source System | Data Pipeline | Purpose |
|---|---|---|
| CAMS / KFinTech (RTA) | Daily NAV File → Kafka → Redis | NAV Updates every 15 mins |
| Razorpay / PG | Webhook → Kafka payment.success | Realtime payment event ingestion |
| UIDAI / NSDL | API Polling / Webhook → Kafka | eKYC and eSign event updates |
| Distributor CRM | API to create lead.created | Lead sync from distributor CRM |
| SEBI Data Pull | Scheduled batch download | Fund-level reports |
| Kafka Ingestor for Events | Kafka → Data Lake (ELT jobs) | Analytics, compliance tracking |
| Internal Scheduler | Cron → NAVPublisher | NAV push to Redis/Cosmos every 5–15 mins |

🔹 Real-Time Capabilities (Streaming + Cache)

| Feed | Mechanism |
|---|---|
| NAV Feed | Kafka + Redis + Cosmos DB |
| Order Events | Kafka order.placed → transaction.completed |
| Notifications | Kafka notification.sent + Async UI push |
| Compliance Audit Logs | Kafka → Cosmos append-only |

📊 NAV Pipeline: Every 15 Minutes

  1. NAV file drop (SFTP) → NAVIngestorService

  2. File parsed → Kafka nav.raw-feeds

  3. NAVCalculatorService computes final NAV → nav.validated

  4. NAVPublisherService:

    • Cache in Redis (nav.current)

    • Persist in Cosmos DB

    • Kafka nav.broadcast to downstream

  5. Alerts if delay > 900s → nav.alert.raised
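
The 900-second delay check in step 5 can be sketched as a small age calculation. This is an illustration only: the function names are hypothetical, and it assumes NAV records carry an ISO-8601 UTC timestamp like the one in the Redis example later in this article.

```python
from datetime import datetime, timezone

NAV_MAX_AGE_SECONDS = 900  # threshold taken from the pipeline description


def nav_age_seconds(nav_timestamp: str, now: datetime) -> float:
    """Seconds elapsed since the NAV record's ISO-8601 UTC timestamp."""
    # datetime.fromisoformat does not accept a trailing "Z" before Python 3.11.
    published = datetime.fromisoformat(nav_timestamp.replace("Z", "+00:00"))
    return (now - published).total_seconds()


def should_raise_alert(nav_timestamp: str, now: datetime) -> bool:
    """True when the NAV is older than the allowed freshness window."""
    return nav_age_seconds(nav_timestamp, now) > NAV_MAX_AGE_SECONDS


now = datetime(2025, 5, 15, 11, 20, 0, tzinfo=timezone.utc)
fresh = should_raise_alert("2025-05-15T11:10:00Z", now)  # 600 s old
stale = should_raise_alert("2025-05-15T11:00:00Z", now)  # 1200 s old
```

The same age value is what a nav_age_seconds gauge would export to Prometheus; publishing to nav.alert.raised would happen wherever should_raise_alert returns True.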


    🧠 NAVPublisherService — Redis Update Strategy

    ⏱️ Update Interval: Every 15 minutes (configurable via cron/scheduler)

    🔁 Flow Summary:

    1. NAVIngestorService picks up NAV feed file from SFTP / Blob

    2. Each NAV record is published to Kafka topic: nav.raw-feeds

    3. NAVCalculatorService consumes and processes NAV values:

      • Applies rounding, fee rules, currency conversion

      • Publishes to nav.validated

    4. NAVPublisherService listens to nav.validated:

      • Persists the NAV to Cosmos DB (for historical reference)

      • ✅ Updates Redis cache with the latest NAV every 15 minutes

        • Key: nav:<fundCode>

        • TTL: e.g., 20 minutes to prevent stale reads

      • Publishes to Kafka nav.broadcast for real-time use (e.g., alerting, UI push)

    💡 Why Redis Cache is Updated Every 15 Minutes?

    • This design assumes RTA feeds (e.g., CAMS/KFinTech) deliver refreshed NAV values every 15 minutes.

    • Redis provides low-latency access for:

      • Transaction Engine (unit calculation)

      • Portfolio Service (current valuation)

      • Investor Dashboard UI (live NAV)

    Example Redis Entry:

Key: nav:HDFC123
Value: {
  "fundCode": "HDFC123",
  "nav": 55.1247,
  "currency": "INR",
  "timestamp": "2025-05-15T11:00:00Z"
}
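
To make the TTL and fallback behaviour concrete, here is a rough sketch that uses in-memory stand-ins for both Redis and Cosmos DB. The NavCache class, the read_nav helper, and the injectable clock are assumptions made for illustration; a real service would call a Redis client and the Cosmos DB SDK instead.

```python
import time


class NavCache:
    """In-memory stand-in for Redis: values expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self._clock():
            self._entries.pop(key, None)  # drop the expired entry, if any
            return None
        return entry[1]


def read_nav(fund_code, cache, history_store):
    """Return (nav_record, source); fall back to the historical store on a miss."""
    record = cache.get(f"nav:{fund_code}")
    if record is not None:
        return record, "cache"
    # Cache miss or expired entry: serve the last persisted NAV instead.
    return history_store.get(fund_code), "fallback"


# Fake clock so the example is deterministic.
fake_now = [0.0]
cache = NavCache(ttl_seconds=20 * 60, clock=lambda: fake_now[0])
history = {"HDFC123": {"fundCode": "HDFC123", "nav": 55.1247, "currency": "INR"}}

cache.set("nav:HDFC123", history["HDFC123"])
hit = read_nav("HDFC123", cache, history)
fake_now[0] = 21 * 60  # 21 minutes later, past the 20-minute TTL
miss = read_nav("HDFC123", cache, history)
```

With a 20-minute TTL and a 15-minute refresh, a single missed refresh is enough to expire the key, at which point reads fall through to the historical store. Returning the source alongside the record lets callers raise an alert when they are serving fallback data.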


🤝 Commission Calculation

  1. On transaction.completed → CommissionEngine invoked

  2. Payout calculated based on distributor mapping

  3. LedgerService updated → Kafka commission.calculated

  4. Distributor dashboard updated
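
A slab-wise payout step could look roughly like the sketch below. The slab boundaries, rates, and event field names are hypothetical, since the article does not specify them; Decimal is used so currency amounts are not subject to binary floating-point rounding.

```python
from decimal import Decimal

# Hypothetical slabs: (upper bound of transaction amount, commission rate).
SLABS = [
    (Decimal("100000"), Decimal("0.0100")),
    (Decimal("1000000"), Decimal("0.0075")),
    (None, Decimal("0.0050")),  # no upper bound
]


def commission_for(amount: Decimal) -> Decimal:
    """Apply the rate of the first slab whose upper bound covers the amount."""
    for upper, rate in SLABS:
        if upper is None or amount <= upper:
            return (amount * rate).quantize(Decimal("0.01"))
    raise ValueError("no slab matched")


def on_transaction_completed(event: dict, distributor_by_investor: dict):
    """Build a commission.calculated payload, or None when no distributor is mapped."""
    distributor_id = distributor_by_investor.get(event["investorId"])
    if distributor_id is None:
        return None
    return {
        "orderId": event["orderId"],
        "distributorId": distributor_id,
        "commission": commission_for(Decimal(event["amount"])),
    }


mapping = {"INV-1": "DIST-9"}
payout = on_transaction_completed(
    {"orderId": "ORD-1001", "investorId": "INV-1", "amount": "250000"}, mapping)
direct = on_transaction_completed(
    {"orderId": "ORD-1002", "investorId": "INV-2", "amount": "5000"}, mapping)
```

Investors with no distributor mapping produce no payout, which mirrors step 2 where the payout depends on the distributor mapping.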


🛡️ Fraud Detection

  1. TransactionEngine publishes transaction.completed

  2. FraudDetectionService listens → applies velocity rule

  3. If suspicious → Kafka fraud.alert.raised → Admin alert

  4. Manual review triggered via Admin portal
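
The velocity rule in step 2 is not specified further, so the following is only one plausible reading: a sliding-window counter per investor. The limit of 3 events per 60 seconds is an assumed value, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class VelocityRule:
    """Flag an investor who completes more than max_events within window_seconds."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # investorId -> recent event timestamps

    def is_suspicious(self, investor_id: str, event_time: float) -> bool:
        timestamps = self._recent[investor_id]
        timestamps.append(event_time)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamps and event_time - timestamps[0] > self.window_seconds:
            timestamps.popleft()
        return len(timestamps) > self.max_events


rule = VelocityRule(max_events=3, window_seconds=60)
flags = [rule.is_suspicious("INV-1", t) for t in (0, 10, 20, 30, 200)]
```

Only the fourth event inside the same minute is flagged; the event at t=200 starts a fresh window. A True result is where the service would publish fraud.alert.raised.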


🔹 ✅ Enhanced Real-World Features

| Category | Feature Example |
|---|---|
| SLAs | NAV data freshness < 900 seconds, Order latency < 2s |
| Auditability | Investor order trace from UI → Kafka → Transaction → DB |
| SLA Breach Alert | NAV ingestion delay → Prometheus → PagerDuty + UI Banner |
| NAV Fall-back | Redis → Cosmos DB fallback with alerting |
| Data Sync | Folio reconciliation with RTA → SFTP file → Kafka ingestion |
| Multi-tenant Ops | Separate fund house access + Istio gateway segmentation |


🔂 CROSS-SERVICE EVENT MAP (Kafka Topics)

| Topic Name | Produced By | Consumed By |
|---|---|---|
| investor.registered | AdminService | KYCService, AccountService |
| kyc.completed | KYCService | CredentialService |
| credentials.issued | CredentialService | NotificationService |
| fund.created | AdminService | ProductCatalogService |
| order.placed | OrderService | PaymentService, TransactionEngine |
| payment.success | PaymentService | TransactionEngine |
| transaction.completed | TransactionEngine | HoldingService, CommissionEngine |
| portfolio.updated | HoldingService | PortfolioService |
| nav.raw-feeds | NAVIngestorService | NAVCalculatorService |
| nav.validated | NAVCalculatorService | NAVPublisherService |
| nav.broadcast | NAVPublisherService | Redis Cache, NotificationService |
| notification.sent | NotificationService | N/A |
| commission.calculated | CommissionEngine | DistributorDashboardService |
| lead.created | DistributorService | CRMService |
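
One way to reason about this event map is to treat it as a graph and ask which topics can follow from a given event. The sketch below copies a subset of the rows into a dictionary and walks it breadth-first. It assumes, purely for illustration, that a service consuming a topic may go on to publish every topic it is listed as producing.

```python
from collections import deque

# topic -> (producer, consumers), copied from a subset of the event map.
EVENT_MAP = {
    "order.placed": ("OrderService", ["PaymentService", "TransactionEngine"]),
    "payment.success": ("PaymentService", ["TransactionEngine"]),
    "transaction.completed": ("TransactionEngine", ["HoldingService", "CommissionEngine"]),
    "portfolio.updated": ("HoldingService", ["PortfolioService"]),
    "commission.calculated": ("CommissionEngine", ["DistributorDashboardService"]),
}


def downstream_topics(start_topic: str) -> list:
    """Breadth-first list of topics reachable from start_topic."""
    produced_by = {}
    for topic, (producer, _) in EVENT_MAP.items():
        produced_by.setdefault(producer, []).append(topic)

    seen, order, queue = {start_topic}, [start_topic], deque([start_topic])
    while queue:
        _, consumers = EVENT_MAP[queue.popleft()]
        for service in consumers:
            for topic in produced_by.get(service, []):
                if topic not in seen:
                    seen.add(topic)
                    order.append(topic)
                    queue.append(topic)
    return order


chain = downstream_topics("order.placed")
```

A walk like this is useful for impact analysis, for example listing which topics and consumers are affected if the TransactionEngine is degraded.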

🔹 🧩 Example Microservice Inventory (~40+ services)

| Service Name | Domain Area |
|---|---|
| InvestorService | Investor profile, preferences |
| KYCService | Aadhaar/PAN validation, UIDAI/NSDL |
| OrderService | Order placement, status tracking |
| TransactionEngine | NAV allocation, validation, settlement |
| NAVIngestorService | Ingest file from RTA (CAMS/KFinTech) |
| NAVCalculatorService | Apply rounding, formula |
| NAVPublisherService | Cache to Redis, store in Cosmos |
| PaymentService | Mandate, UPI, webhook handlers |
| HoldingService | Portfolio state, holding snapshot |
| ReportService | Monthly, quarterly statements |
| CommissionEngine | Event-driven commission calculator |
| NotificationService | SMS, Email, App push |
| DistributorService | Lead mgmt, hierarchy, commissions |
| CRMAdapterService | Integrates Freshdesk/Zendesk |
| AdminService | Admin access, user management |
| AuthService | AuthZ/AuthN, Azure AD & AD B2C |
| AuditTrailService | Kafka event logger to Cosmos DB |
| ComplianceReportService | SEBI/FATCA audit generator |
| FraudDetectionService | Velocity rules, anomaly alerts |
| DataLakeIngestor | Kafka to data lake ingestion |

✅ Real-World Mutual Fund Platforms Have Multiple Data Pipelines, Not Just One

🔎 Why Multiple Pipelines?

Enterprise mutual fund platforms operate in a highly integrated, regulated, and data-rich ecosystem. Different types of data — with different SLAs, formats, sources, and consumers — demand specialized and decoupled pipelines for performance, compliance, and observability.

🔹 Examples of Independent Real-World Pipelines

| Pipeline | Purpose | Characteristics |
|---|---|---|
| NAV Feed Ingestion | Ingest fund NAV from RTA | SFTP/API → Kafka → Redis/Cosmos |
| Transaction Audit Trail | Immutable logs for SEBI, FATCA | Kafka → CosmosDB append-only |
| Commission Calculation | Track commission for each transaction | Kafka → CommissionEngine → Ledger |
| SEBI Reporting | Periodic audit submission | Cosmos → CSV generator → Secure Upload |
| Fraud Monitoring | Real-time fraud pattern detection | Kafka → FraudDetectionService → Alert |
| Notification Pipeline | SMS, Email, Push for events | Kafka → NotificationService |
| Analytics & BI Pipeline | PowerBI or Azure Synapse integration | Kafka → DataLake → BI Export |
| CRM/Ticketing Feed | Support tickets, lead mgmt | CRM API → Kafka → TicketService |
| Payment Events Feed | Razorpay/PG webhook events | API → Kafka → PaymentService |

✅ Each pipeline has unique:

  • SLAs (e.g., NAV < 15 min, alerts < 1 min, reports daily)

  • Data formats (CSV, JSON, binary)

  • Sources (SFTP, REST APIs, Webhooks)

  • Destinations (Redis, CosmosDB, Data Lake, Email/SMS)


🧩 Cluster Sizing Calculation (BFSI Standard)


✅ What’s the Standard TPS per Pod?

In BFSI-grade production environments, the typical sustained TPS per Spring Boot pod (with Kafka, Istio, Redis, logging, etc.) is:

| Complexity of Microservice | Realistic TPS per Pod (Sustained) |
|---|---|
| Lightweight stateless service | 150–200 TPS |
| Medium complexity (with Kafka, Redis) | 80–120 TPS |
| Heavy logic or I/O-bound (e.g., TransactionEngine) | 40–80 TPS |

✅ Cluster Sizing Principles (BFSI-Standard Aligned)

| Parameter | Industry Standard / Best Practice |
|---|---|
| TPS per Spring Boot Pod | 50–100 TPS depending on complexity |
| Pods per Node (AKS) | 6–8 pods per node (max 10 in controlled use cases) |
| Service Replication | 2–3 replicas minimum for HA (zone fault tolerance) |
| System Overhead Pods | +40–60% for Istio, logging, Kafka, observability |
| Node Sizing Buffer | Always round up for peak load + HPA headroom |
| CPU/Memory Requests | Aligned to JVM heap sizing, Istio + metrics agents |

Industry Standard Range: 50–100 TPS for core services under realistic latency + durability SLAs.

🔎 Industry Standard for Pod Density in BFSI Workloads

| Context | Typical Pod Density (Pods per Node) |
|---|---|
| BFSI-grade workloads with Istio, Kafka, monitoring, encryption | 6–8 pods per node (recommended) |
| Lightweight stateless apps | 10–15 pods per node (rare in BFSI) |

📌 Especially with Istio sidecars, JVM-based Spring Boot apps, and heavy observability, 7 pods per node is the most reliable target for BFSI.

✅ Calculation Based on Realistic TPS/Pod

Let’s recalculate based on a safer 75 TPS per pod, which is very reasonable for BFSI-grade transaction microservices under load with Istio, Kafka, and security instrumentation.


AKS Cluster Sizing – BFSI Industry Standard

✅ Final AKS Cluster Sizing (Based on 7 Pods/Node)

| Metric | Value | Explanation |
|---|---|---|
| Concurrent Users Target | 150,000 | Investor + distributor workload |
| TPS Target | 6,000 | Transactions-per-second goal |
| TPS per Pod | 75 | Safe, BFSI-compliant throughput per pod |
| Estimated Pods for TPS | 80 | 6000 / 75 |
| Microservices | 30 | Business-domain aligned |
| Replicas per Service | 3 | For HA, load distribution |
| Adjusted Total Service Pods | 90 | 30 services * 3 |
| Infra/System Pods (50%) | 45 | Kafka, Redis, Istio, logging, tracing, agents |
| Total Pods Required | 135 | Core + infra |
| Pods per Node (BFSI conservative) | 7 | Aligns with Istio overhead and JVM resource usage |
| Final Estimated Node Count | 20 | 135 / 7 rounded up |



🧮 Step-by-Step AKS Cluster Sizing Calculation (BFSI Industry Standard)

Target: 150K+ concurrent users, 6000+ TPS, Spring Boot + Kafka + Istio on Azure AKS (active-active)

🔹 1. Define Core Input Metrics

| Metric | Value | Justification |
|---|---|---|
| Concurrent Users | 150,000 | Real-world scale during market open |
| TPS Required | 6,000 | Order + NAV + notifications |
| TPS per Pod (BFSI standard) | 75 | With Istio, Kafka, metrics, JVM |
| No. of Core Microservices | 30 | Order, KYC, NAV, Transaction, etc. |
| Replicas per Service (HA) | 3 | Zone-level HA standard |
| Pods per Node (BFSI std) | 7 | After accounting for sidecars & infra overhead |

🔹 2. Calculate Required Pods

🧩 a. Pods needed for TPS

6000 TPS ÷ 75 TPS/pod = 80 pods needed to meet demand

🧩 b. Estimate App Pods (30 services x 3 replicas)

30 services × 3 replicas = 90 app pods (standard HA requirement)

🧩 c. Add System/Infra Overhead (50% extra)

90 × 1.5 = 135 total pods (incl. Istio, Kafka, Redis, monitoring)

🔹 3. Calculate Node Requirement

135 pods ÷ 7 pods/node = ~19.3 → round up → 20 nodes per region

➡️ Final: 20 nodes/region × 2 regions (active-active) = 40 nodes total
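
The arithmetic in steps 2 and 3 can be reproduced with a short function, which makes it easy to re-run the sizing with different inputs. The function and key names are illustrative; the inputs are the ones stated in this section.

```python
import math


def size_cluster(tps_target, tps_per_pod, services, replicas,
                 infra_overhead, pods_per_node, regions):
    """Reproduce the sizing arithmetic step by step."""
    pods_for_tps = math.ceil(tps_target / tps_per_pod)
    app_pods = services * replicas
    # Apply the infra overhead to the app pods, rounding up to whole pods.
    total_pods = math.ceil(app_pods * (1 + infra_overhead))
    nodes_per_region = math.ceil(total_pods / pods_per_node)
    return {
        "pods_for_tps": pods_for_tps,
        "app_pods": app_pods,
        "total_pods": total_pods,
        "nodes_per_region": nodes_per_region,
        "total_nodes": nodes_per_region * regions,
        # True when the HA replica count alone already covers the TPS demand.
        "ha_covers_tps": app_pods >= pods_for_tps,
    }


sizing = size_cluster(tps_target=6000, tps_per_pod=75, services=30, replicas=3,
                      infra_overhead=0.5, pods_per_node=7, regions=2)
```

One caveat: the 90 app pods exceed the 80 pods needed for 6,000 TPS only in aggregate. The calculation does not say how that load is distributed across the 30 services, so hot services such as the TransactionEngine would still need their own HPA limits.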

🔹 4. Kafka, Redis, Istio Config Sizing

| Component | Configuration |
|---|---|
| Kafka Brokers | 5 brokers × 100 partitions each (active-active) |
| Redis | Premium cache × 3 shards with geo-replication |
| Istio | Enabled globally, sidecars auto-injected per pod |
| Cosmos DB | Multi-region write, 10k RU/s per partition |
| Ingress | Azure Front Door + Istio Gateway for multi-region routing |


✅ This sizing follows the BFSI rules of thumb used above (75 TPS per pod, 7 pods per node, 3 replicas per service) and is intended to provide:

  • Performance headroom

  • Predictable latency under load

  • HA/DR readiness

  • Compliance scalability (NAV, order processing, KYC)


🔹 Observability & Governance

| Area | Detail |
|---|---|
| Audit Log Retention | Cosmos DB, 7-year TTL, write-once policy |
| Transaction Traceability | Correlation ID with logs per event |
| Prometheus Metrics | nav_age_seconds, tx_latency, HPA_scale_trigger |
| Alerts & Dashboards | Grafana (real-time), ELK, Teams/PagerDuty for ops alerts |
| SEBI/FATCA Reporting | Auto CSV/JSON reports, API download, access logs enabled |
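
The write-once audit store and correlation-ID traceability can be illustrated with a toy in-memory log. This is a sketch under stated assumptions: the class and field names are hypothetical, and real immutability would come from database policy and access control rather than from application code.

```python
import uuid


class AppendOnlyAuditLog:
    """Toy append-only store: exposes append and read, no update or delete."""

    def __init__(self):
        self._records = {}  # eventId -> record, kept in insertion order

    def append(self, topic: str, payload: dict, correlation_id: str) -> str:
        event_id = str(uuid.uuid4())
        # Store a copy so later mutation of the caller's dict cannot alter history.
        self._records[event_id] = {
            "eventId": event_id,
            "topic": topic,
            "correlationId": correlation_id,
            "payload": dict(payload),
        }
        return event_id

    def trace(self, correlation_id: str) -> list:
        """All topics recorded for one request, in insertion order."""
        return [r["topic"] for r in self._records.values()
                if r["correlationId"] == correlation_id]


log = AppendOnlyAuditLog()
log.append("order.placed", {"orderId": "ORD-1001"}, correlation_id="corr-42")
log.append("payment.success", {"orderId": "ORD-1001"}, correlation_id="corr-42")
log.append("order.placed", {"orderId": "ORD-2002"}, correlation_id="corr-43")
trail = log.trace("corr-42")
```

Filtering by correlation ID is what turns a flat event log into the per-order trace an auditor would ask for.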

 Deployment & DR Strategy

| Feature | Implementation |
|---|---|
| CI/CD | Azure DevOps + Helm + Istio + rollback |
| Canary Deployment | 10% → 25% → 100% traffic shift via Istio |
| Blue/Green for Core Services | Parallel clusters with manual cutover |
| Active-Active Cluster | AKS South + West India, Kafka MM2 + Cosmos geo-write |
| Monthly DR Drill | Redis rehydration, failover reroute, Cosmos failproof |
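
The 10% → 25% → 100% canary progression amounts to a loop with a health gate at each step. In the sketch below, the error-rate callback and the 1% error budget are assumptions; in practice the weights would be applied through Istio routing and the error rate read from Prometheus.

```python
CANARY_STEPS = [10, 25, 100]  # percent of traffic sent to the new version


def run_canary(error_rate_at, max_error_rate=0.01):
    """Walk the traffic steps; roll back to 0% if a step exceeds the error budget.

    error_rate_at is a hypothetical callback returning the observed error rate
    (0.0 to 1.0) for the canary at a given traffic percentage.
    """
    history = []
    for weight in CANARY_STEPS:
        history.append(weight)
        if error_rate_at(weight) > max_error_rate:
            history.append(0)  # rollback: all traffic returns to the stable version
            return "rolled_back", history
    return "promoted", history


healthy = run_canary(lambda weight: 0.001)
failing = run_canary(lambda weight: 0.05 if weight >= 25 else 0.001)
```

The failing run stops at 25% and never exposes the full user base to the bad release, which is the point of staging the shift.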


🔹 BFSI Compliance Highlights

  • Aadhaar/PAN encrypted & masked

  • SEBI reports: auto-generated JSON/CSV via APIs

  • Cosmos DB with 7-year TTL for audit

  • Role-based views: Admin, Investor, Distributor

  • DR: Kafka MM2 + Cosmos geo-write + Redis HA


Why This is Enterprise-Grade

1. High Concurrency and TPS Ready

  • Designed for 150,000+ concurrent users (up from ~10,000 on the legacy system)

  • Built to handle 6,000+ transactions per second (TPS), burstable to 8,000+

  • Uses horizontal scaling via AKS + Istio with HPA (pods) and CA (nodes)

2. Event-Driven Architecture (Kafka Backbone)

  • Fully asynchronous, decoupled microservices

  • Each critical domain publishes/consumes Kafka events (e.g., transaction.completed, nav.broadcast, portfolio.updated)

  • Supports high-throughput processing with per-partition ordering guarantees and fault tolerance

3. Real-Time Data Flows

  • NAV updates every 5–15 minutes, flowing through SFTP → Kafka → Redis → UI in near real time

  • Portfolio values and alerts update live based on NAV changes

  • Commission tracking and distributor dashboards update in real time

4. Role-Based Separation & Flow Control

  • Clearly segmented flows:

    • Investor: login, fund browse, order, portfolio

    • Distributor: lead creation, commission calculation

    • Admin: registration, KYC, fund setup, audit, DR

  • Enables RBAC, observability, and scaling per role boundary

5. External System Integration

  • KYC APIs (UIDAI, NSDL), Payment Gateways, eSign (Digio), NAV Feed (CAMS, KFinTech)

  • CRM APIs, SEBI/AMFI reporting

  • Secured with Azure AD B2C / RBAC, Key Vault, and Istio mTLS

6. Compliance & Observability

  • Cosmos DB as immutable audit store

  • Prometheus + Grafana for detailed metrics (latency, TPS, resource usage)

  • ELK stack for centralized logging

  • SEBI compliance via automated report generation and trail visibility

7. Disaster Recovery & Multi-Region Setup

  • Active-Active AKS Clusters (e.g., South India + West India)

  • Kafka MirrorMaker2 ensures topic replication across regions

  • DBs (Azure SQL, Cosmos DB) in geo-redundant setup

  • Failover readiness tested via DR simulation flows

8. DevOps-Driven Continuous Delivery

  • Azure DevOps Pipelines for CI/CD

  • Helm + Kubernetes for deployment

  • Automated rollbacks, blue/green or canary releases supported


✅ BFSI Standards Alignment – Breakdown

🔹 1. Scalability & Performance

| BFSI Expectation | Your Architecture |
|---|---|
| Handle 100K–200K concurrent users | ✔ Designed for 150K+ users, 6K+ TPS |
| Multi-region HA | ✔ Active-active AKS in South & West India |
| Zero-downtime deployment | ✔ Istio canary & blue/green with rollback |
| Horizontal scaling | ✔ HPA & Cluster Autoscaler with proactive scaling |

🔹 2. Security & Compliance (SEBI / RBI / IRDAI)

| BFSI Standard Requirement | Your Approach |
|---|---|
| 7+ years of audit log retention | ✔ Cosmos DB + immutability + TTL enforcement |
| Role-based access (RBAC) | ✔ Azure AD, Istio policies, scoped APIs |
| PII Encryption | ✔ PAN, Aadhaar encrypted + masked via Key Vault |
| DR readiness & tested failover | ✔ Kafka MM2, Cosmos geo-replication, monthly drills |
| Secure deployments | ✔ DevSecOps gates (SonarQube, CVE scan, Key Vault) |
| Immutable logs | ✔ Write-once, append-only Cosmos setup |
| UIDAI / NSDL / Digio integration | ✔ External KYC/eSign services integrated securely |

🔹 3. Observability & Operations

| BFSI Observability Practices | Your Design |
|---|---|
| Real-time infra + app monitoring | ✔ Prometheus + Grafana dashboards |
| Log traceability | ✔ ELK with contextual enrichment |
| Business SLA tracking (e.g., NAV) | ✔ nav_age_seconds + Prometheus alerts |
| Region failover visibility | ✔ Redis + Cosmos + Kafka readiness and alerting |

🔹 4. BFSI Domain Patterns

| Key Industry Pattern | Your Coverage |
|---|---|
| Event-driven transaction systems | ✔ Kafka-based asynchronous orchestration |
| Idempotent transaction engines | ✔ Unique orderId + deduplication logic |
| Real-time portfolio updates | ✔ Redis + Kafka + NAV recalculations |
| SEBI reporting | ✔ Scheduled compliant report generation |
| Payment reconciliation workflows | ✔ Payment → Kafka → Transaction → Audit trail |

📌 Final Verdict: ✅ BFSI-Grade Architecture

  • Complies with BFSI performance, resilience, and audit standards

  • ✔ Built-in observability, compliance, and HA

  • ✔ Well-positioned for SEBI inspections, cyber audits, and RBI IT governance reviews

  • ✔ Aligns with architectures used by top AMCs, NBFCs, insurers, and banks



👥 Persona-Based Architecture Walkthrough (Mutual Fund Platform)

👤 1. Investor Persona

"Retail investor engaging with funds for investment, redemption, or viewing portfolio."

🔄 Key Journeys:

  • Login & View Funds

  • Place Order (Buy/Sell/Switch)

  • View Portfolio & Transaction History

  • Receive Notifications & Statements

🧩 Microservices Involved:

  • AuthService (Azure AD B2C)

  • ProductCatalogService

  • OrderService, PaymentService

  • TransactionEngine

  • HoldingService, ReportService

  • NotificationService, PreferenceService

🔁 Event Flow:

  1. Login (OAuth via Azure AD B2C) → token with investorId

  2. Views NAV → served by NAVPublisher via Redis

  3. Places order → Kafka order.placed → PaymentService

  4. PG callback → Kafka payment.success

  5. TransactionEngine allocates units using NAV

  6. Kafka transaction.completed → triggers:

    • Portfolio update

    • Commission payout (if referred)

    • Email/SMS from NotificationService

🔗 External Integrations:

  • Razorpay (payment)

  • MSG91/Twilio (notification)

  • UIDAI/NSDL (onboarding via distributor/admin)

  • NSDL CAS (CAS sync)

🤝 2. Distributor Persona

"Advisors or agencies facilitating investor onboarding and earning commission."

🔄 Key Journeys:

  • Add Investor Leads

  • Track Commission

  • Download Investor Reports

  • Submit Support Requests

🧩 Microservices Involved:

  • DistributorService

  • LeadService, InvestorService

  • CommissionEngine, LedgerService

  • CRMAdapterService

  • ReportService, AuthService

🔁 Event Flow:

  1. Distributor logs in (Azure AD) → scoped RBAC

  2. Adds lead → Kafka lead.created

  3. Initiates registration → investor.registered

  4. When investor transacts → transaction.completed triggers:

    • CommissionEngine → commission.calculated

    • LedgerService logs payout

  5. Support ticket via CRM → CRMAdapterService sends to Zendesk

🔗 External Integrations:

  • GST Portal (commission invoice)

  • CRM (Zendesk/Salesforce)

  • SMS/Email

  • Aadhaar Vault (tokenized storage)

👨‍💼 3. Admin Persona

"Operations user managing fund setup, NAV, investor creation, and DR oversight."

🔄 Key Journeys:

  • Create Funds, Set NAV

  • Create Investors (Direct)

  • Monitor System Health

  • Manage Access & Reports

🧩 Microservices Involved:

  • AdminService, FundSetupService

  • NAVIngestorService, NAVPublisherService

  • KYCService, CredentialService

  • ComplianceReportService, AuditTrailService

  • DocumentService, AlertService

🔁 Event Flow:

  1. Admin logs in (Azure AD) → creates fund → Kafka fund.created

  2. NAV file drop → SFTP → NAVIngestorService

  3. Kafka nav.raw-feeds → calculated → nav.validated

  4. NAVPublisher pushes to Redis + Cosmos + Kafka nav.broadcast

  5. Admin creates investor (back office) → kyc.initiated → kyc.completed

🔗 External Integrations:

  • RTA (CAMS/KFinTech) for NAV

  • UIDAI/NSDL

  • Aadhaar Vault

  • Azure Monitor/Log Analytics

  • SEBI Gateway

🛡️ 4. Compliance Officer Persona

"Responsible for regulatory reports, data audits, and access governance."

🔄 Key Journeys:

  • Access Audit Trails

  • Generate Regulatory Reports

  • Verify User Actions & Data Flows

🧩 Microservices Involved:

  • AuditTrailService (event-sink from Kafka)

  • ComplianceReportService

  • ReportService, AccessLogService

🔁 Event Flow:

  1. Compliance logs in (Azure AD, scoped role)

  2. Requests audit report → AuditTrailService pulls from Cosmos DB

  3. SEBI/FATCA reports → auto-generated or on-demand → report.generated

  4. Any breach (e.g., delayed NAV, failed KYC) → alert.raised → escalated

🔗 External Integrations:

  • SEBI Upload Portal

  • Cosmos DB (7-year retention)

  • Azure RBAC Logs

  • Power BI (for dashboards)


🔐 System-Wide Architecture Enforcement

| Concern | Approach |
|---|---|
| Security & RBAC | Azure AD/AD B2C, Istio AuthorizationPolicies, microservice-level auth |
| Scalability | AKS + Kafka + Redis; HPA/CA + pod/node isolation |
| Compliance | Cosmos DB audit, immutable logs, SEBI reporting automation |
| Observability | Prometheus + Grafana + ELK + Azure Monitor |
| Failover/DR | Active-active AKS, Kafka MM2, Cosmos geo-replication |
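
At the microservice level, tenant-scoped RBAC on JWT claims reduces to two checks: the tenant must match and the role must grant the action. The sketch below assumes the token signature has already been verified upstream (for example at the gateway) and that claims arrive as a plain dictionary; the permission names and tenant IDs are hypothetical.

```python
# Hypothetical role -> allowed actions mapping; the real policy set is not given.
ROLE_PERMISSIONS = {
    "investor": {"order:place", "portfolio:read"},
    "distributor": {"lead:create", "commission:read"},
    "admin": {"fund:create", "audit:export"},
}


def is_allowed(claims: dict, action: str, resource_tenant: str) -> bool:
    """Allow only if the token's tenant matches and its role grants the action."""
    if claims.get("tenantId") != resource_tenant:
        return False  # never cross fund-house boundaries
    return action in ROLE_PERMISSIONS.get(claims.get("role"), set())


claims = {"investorId": "INV-1", "role": "investor", "tenantId": "amc-a"}
same_tenant = is_allowed(claims, "order:place", "amc-a")
other_tenant = is_allowed(claims, "order:place", "amc-b")
wrong_role = is_allowed(claims, "audit:export", "amc-a")
```

Checking the tenant first means a valid role in one fund house never grants access to another's data, even if the mesh-level policy is misconfigured.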


✅ Real-World Mutual Fund Platform – Full Sequence & Event-Driven Flow

👤 Investor Persona Flow – Place Order, View Portfolio

| Step | Initiator | Action | Event Produced | Produced By | Consumed By | Outcome |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Investor | Logs in via portal | | AuthService (AD B2C) | | JWT token with investorId, role, tenantId issued |
| 2 | Investor | Browses fund list | | ProductCatalogService | | NAV and fund metadata fetched from Redis/cache |
| 3 | Investor | Places order | order.placed | OrderService | PaymentService | PaymentService initiates PG request (e.g. Razorpay) |
| 4 | Payment Gateway | Sends callback (after investor pays) | payment.success | PGWebhookHandler | PaymentService | Validates payment, marks it success, sends next event |
| 5 | PaymentService | Validates order + payment | transaction.ready | PaymentService | TransactionEngine | TransactionEngine begins NAV validation and unit allocation |
| 6 | TransactionEngine | Allocates units | transaction.completed | TransactionEngine | HoldingService, AuditTrailService | Units credited, event recorded in audit DB |
| 7 | HoldingService | Updates portfolio | portfolio.updated | HoldingService | PortfolioService | Portfolio cache and DB updated |
| 8 | NotificationService | Sends alert | notification.sent | NotificationService | Twilio/MSG91/Email | Investor gets SMS/email |
| 9 | Investor | Asks: “Compare Fund A vs B” | | GenAIChatOrchestrator | ProductCatalogService, NAVPublisher | Response using RAG (factsheet, NAV, perf) |
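The unit allocation in step 6 comes down to dividing the invested amount by the applicable NAV. The sketch below illustrates that arithmetic only; the class name, the three-decimal unit precision, and the round-down rule are assumptions for illustration, not values taken from the platform.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Illustrative unit allocation: units = invested amount / applicable NAV. */
final class UnitAllocator {
    // Assumption: units are held to 3 decimal places and rounded down.
    static final int UNIT_SCALE = 3;

    static BigDecimal allocateUnits(BigDecimal amount, BigDecimal nav) {
        if (amount.signum() <= 0 || nav.signum() <= 0) {
            throw new IllegalArgumentException("amount and NAV must be positive");
        }
        // BigDecimal avoids the binary floating-point error a double would introduce.
        return amount.divide(nav, UNIT_SCALE, RoundingMode.DOWN);
    }
}
```

For example, an order of 10000 at a NAV of 25.00 yields 400.000 units under these assumptions.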

🤝 Distributor Persona Flow – Onboard Investor, Track Commission

| Step | Initiator | Action | Event Produced | Produced By | Consumed By | Outcome |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Distributor | Adds new lead | lead.created | LeadService | CRMAdapterService | Lead stored in CRM (Zendesk/Salesforce) |
| 2 | Distributor | Converts lead to investor | investor.registered | InvestorService | KYCService | Initiates KYC flow with Aadhaar/PAN |
| 3 | KYCService | Completes KYC | kyc.completed | KYCService | CredentialService | Login credentials issued via SMS/email |
| 4 | Investor | Places order | order.placed | OrderService | PaymentService | Starts order lifecycle |
| 5 | TransactionEngine | Order finalized | transaction.completed | TransactionEngine | CommissionEngine | Commission calculated, event published |
| 6 | CommissionEngine | Commission calculated | commission.calculated | CommissionEngine | LedgerService | Ledger updated, dashboard refreshed |
| 7 | Distributor | Asks: “Why is my payout low?” | | GenAICommissionExplainer | LedgerService | GenAI explains based on ledger entries |

👨‍💼 Admin Persona Flow – Fund Setup, NAV, User Management

| Step | Initiator | Action | Event Produced | Produced By | Consumed By | Outcome |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Admin | Creates a new fund | fund.created | AdminService | FundSetupService | Fund metadata persisted, fund listed |
| 2 | RTA | Uploads NAV file (every 15 mins) | nav.raw-feeds | NAVIngestorService | NAVCalculatorService | Parses file, schema & threshold validation |
| 3 | NAVCalculator | Valid NAV published | nav.validated | NAVCalculatorService | NAVPublisherService | NAV pushed to Redis, Cosmos, Kafka |
| 4 | NAVPublisher | NAV made public | nav.broadcast | NAVPublisherService | UI, PortfolioService | Latest NAV visible in UI and used for order processing |
| 5 | Admin | Downloads audit trail | audit.export.triggered | AuditTrailService | CosmosDB Exporter | Audit logs (immutable) exported |
| 6 | Admin | Uploads document | document.signed | DocumentService | Blob Storage, UI | Signed docs stored in Blob with hash verification |

🛡️ Compliance Officer Persona Flow – AML, Reporting, Audit

| Step | Initiator | Action | Event Produced | Produced By | Consumed By | Outcome |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | KYCService | Starts Aadhaar/PAN validation | kyc.initiated | KYCService | AMLScreeningService | AML/PEP scan started |
| 2 | AML API | Raises a match | aml.alert.raised | AMLScreeningAdapter | AlertService | Compliance officer notified, ticket created |
| 3 | Compliance | Runs SEBI/FATCA report | report.scheduled | ComplianceService | CosmosDB, ReportService | CSV/JSON generated and uploaded to SEBI gateway |
| 4 | Compliance | Reviews alert summary | | GenAIAMLExplainer | Cosmos + AML flags | Human-readable GenAI explanation of AML match |
| 5 | Officer | Verifies admin logs | audit.accessed | AuditTrailService | CosmosDB + UI | Immutable audit trail checked |

📡 NAV Feed Pipeline Flow (Realtime)

| Step | Action Source | Event Produced | Produced By | Consumed By | Outcome |
| --- | --- | --- | --- | --- | --- |
| 1 | RTA SFTP Upload | File placed | | NAVIngestorService | File picked from blob or SFTP |
| 2 | NAV file processed | nav.raw-feeds | NAVIngestorService | NAVCalculatorService | Validated, cleaned |
| 3 | NAV calculated | nav.validated | NAVCalculatorService | NAVPublisherService | Rounding, formula applied |
| 4 | NAV cached/broadcasted | nav.broadcast | NAVPublisherService | UI, PortfolioService, TransactionEngine | Real-time NAV available for display & transactions |
| 5 | Monitoring | Metrics emitted | Prometheus Exporter | Grafana, Alerting | Alert if NAV delayed (nav_age_seconds > 900) |
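The rounding and threshold validation applied between nav.raw-feeds and nav.validated could look roughly like the following. This is a sketch under stated assumptions: the class name, the four-decimal NAV precision, and the configurable maximum percentage move against the previous NAV are all hypothetical, since the post does not specify the actual rules.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical threshold check: reject a NAV that moves too far from the previous value. */
final class NavThresholdValidator {
    private final BigDecimal maxChangePercent; // assumption: configured per fund category

    NavThresholdValidator(BigDecimal maxChangePercent) {
        this.maxChangePercent = maxChangePercent;
    }

    boolean isWithinThreshold(BigDecimal previousNav, BigDecimal newNav) {
        if (previousNav.signum() <= 0 || newNav.signum() <= 0) {
            return false; // a non-positive NAV is never valid
        }
        BigDecimal changePercent = newNav.subtract(previousNav).abs()
                .multiply(BigDecimal.valueOf(100))
                .divide(previousNav, 6, RoundingMode.HALF_UP);
        return changePercent.compareTo(maxChangePercent) <= 0;
    }

    /** Assumption: published NAVs carry four decimal places, rounded half-up. */
    static BigDecimal round(BigDecimal nav) {
        return nav.setScale(4, RoundingMode.HALF_UP);
    }
}
```

A NAV that fails the check would be held back for review rather than published as nav.validated.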

👤 INVESTOR FLOW — Place Order and View Portfolio

🧭 Use Case: Investor places an order and tracks it end-to-end

  1. Investor logs in → AuthService authenticates via Azure AD B2C → JWT token is issued with investorId, tenantId, and roles → No event (handled via stateless auth)

  2. Investor searches for funds → UI calls ProductCatalogService → Fund metadata and NAV fetched from Redis (cached by NAVPublisher) → No event (real-time REST)

  3. Investor places an order → OrderService validates inputs → Publishes order.placed to Kafka

  4. PaymentService consumes order.placed → Initiates payment via Razorpay API → Registers webhook endpoint → Order marked as "Awaiting Payment"

  5. Razorpay sends callback (payment success) → PGWebhookHandler receives response → Publishes payment.success event

  6. PaymentService consumes payment.success → Validates signature and transaction ID → Publishes transaction.ready event

  7. TransactionEngine consumes transaction.ready → Reads latest NAV from Redis → Allocates fund units → Publishes transaction.completed

  8. HoldingService consumes transaction.completed → Updates portfolio state in DB and cache → Publishes portfolio.updated

  9. NotificationService consumes portfolio.updated → Fetches user contact preferences → Sends SMS/Email → Publishes notification.sent (for audit trail)

  10. Investor visits dashboard → PortfolioService queries updated holdings → Latest data shown in UI
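The signature validation in step 6 is typically an HMAC check over the raw webhook body. The sketch below assumes the gateway signs the body with HMAC-SHA256 using a shared webhook secret and sends the hex digest in a request header, which is the scheme Razorpay documents for webhooks; the class and method names are hypothetical, and the exact header name and payload handling should be confirmed against the gateway's documentation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper for verifying a payment gateway webhook signature. */
final class WebhookSignatureVerifier {

    /** Computes the lowercase hex HMAC-SHA256 of the payload using the shared secret. */
    static String hmacSha256Hex(String payload, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Returns true only if the received signature matches the one computed locally. */
    static boolean isValid(String rawBody, String receivedSignature, String secret) {
        byte[] expected = hmacSha256Hex(rawBody, secret).getBytes(StandardCharsets.UTF_8);
        byte[] received = receivedSignature.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual compares in constant time, which avoids timing leaks.
        return MessageDigest.isEqual(expected, received);
    }
}
```

The check must run on the raw request body exactly as received; re-serializing the JSON before hashing can change the bytes and make a valid signature fail.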

🤝 DISTRIBUTOR FLOW — Onboard Investor and Track Commission

🧭 Use Case: Distributor creates a lead, completes KYC, tracks commissions

  1. Distributor logs in (Azure AD) → Role: distributor, JWT token issued

  2. Adds lead → LeadService creates entry → Publishes lead.created

  3. CRMAdapterService consumes lead.created → Syncs with Zendesk/Salesforce via API → Lead created in CRM

  4. Distributor initiates registration → InvestorService creates new investor → Publishes investor.registered

  5. KYCService consumes investor.registered → Triggers Aadhaar/PAN verification via Digio/NSDL → On success, publishes kyc.completed

  6. CredentialService consumes kyc.completed → Issues login credentials via SMS/email → Access activated

  7. Investor places a transaction later → The Investor flow above applies → On transaction.completed, CommissionEngine is invoked

  8. CommissionEngine publishes commission.calculated → Includes payout type, hierarchy, tax → Consumed by LedgerService

  9. LedgerService updates records → Dashboard refreshed

👨‍💼 ADMIN FLOW — Fund Setup, NAV Publication, DR & Logs

🧭 Use Case: Admin configures a new fund, uploads NAV, and audits logs

  1. Admin logs in (Azure AD) → JWT with admin role

  2. Creates fund → AdminService triggers fund creation → Publishes fund.created

  3. FundSetupService consumes fund.created → Stores metadata, assigns default categories

  4. NAV file dropped by CAMS/KFinTech (every 15 mins) → Blob trigger → NAVIngestorService reads file → Parses schema, checks timestamp → Publishes nav.raw-feeds

  5. NAVCalculatorService consumes nav.raw-feeds → Computes final NAV using formula → Publishes nav.validated

  6. NAVPublisherService consumes nav.validated → Pushes to Redis (for UI), Cosmos DB (history) → Publishes nav.broadcast

  7. All relevant consumers (UI, PortfolioService, TransactionEngine) → Fetch updated NAV

  8. Admin triggers audit export → AuditTrailService reads Cosmos append-only logs → Publishes audit.export.triggered → CSV/JSON made available for download

🛡️ COMPLIANCE FLOW — AML, SEBI Reporting, Audit Trail

🧭 Use Case: AML alert raised during KYC, reports submitted

  1. KYCService triggers PAN/Aadhaar validation → Publishes kyc.initiated

  2. AMLScreeningService consumes kyc.initiated → Checks with AML/PEP APIs → If flagged, publishes aml.alert.raised

  3. AlertService consumes aml.alert.raised → Shows alert on Compliance dashboard → Ticket auto-generated

  4. ComplianceService runs daily job → Reads audit logs + transactions → Publishes report.scheduled

  5. ReportExporter consumes report.scheduled → Generates SEBI-compliant output → Secure file uploaded to SEBI gateway

  6. Compliance Officer requests AML explanation → Query sent to GenAIAMLExplainerService → Returns plain-English justification from JSON + logs

📡 NAV FEED PIPELINE FLOW – Every 15 mins

  1. NAV file dropped by RTA (CAMS/KFinTech) → Stored in blob or SFTP folder

  2. NAVIngestorService picks up file → Validates schema, checksum → Publishes nav.raw-feeds

  3. NAVCalculatorService consumes nav.raw-feeds → Applies rounding, threshold rules → Publishes nav.validated

  4. NAVPublisherService consumes nav.validated → Updates:

    • Redis cache (UI)

    • Cosmos DB (audit)

    • Kafka nav.broadcast

  5. Consumers (UI, Portfolio, TransactionEngine) → Subscribe to nav.broadcast → React in near real-time

  6. Prometheus tracks nav_age_seconds → Alerts if >900 seconds old
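The staleness rule in the last step reduces to comparing the age of the most recent published NAV against 900 seconds (one 15-minute feed cycle). A minimal sketch of that comparison follows; the class and method names are hypothetical, and in practice the value would be exported as the nav_age_seconds gauge with the threshold living in the alert rule.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative staleness check behind the nav_age_seconds alert. */
final class NavFreshnessCheck {
    static final long MAX_NAV_AGE_SECONDS = 900; // 15 minutes, matching the alert threshold

    /** Seconds elapsed between the last NAV publication and the given instant. */
    static long navAgeSeconds(Instant lastPublished, Instant now) {
        return Duration.between(lastPublished, now).getSeconds();
    }

    /** True when the NAV is older than the allowed age and an alert should fire. */
    static boolean isStale(Instant lastPublished, Instant now) {
        return navAgeSeconds(lastPublished, now) > MAX_NAV_AGE_SECONDS;
    }
}
```

Passing the current time in explicitly, rather than calling Instant.now() inside the method, keeps the rule easy to test.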



GenAI Capability


🔹 1. GenAI Capabilities by Use Case

| GenAI Capability | Target Persona | Purpose |
| --- | --- | --- |
| Conversational Assistant (RAG) | Investor, Distributor | Answer fund questions, NAV, portfolio insights |
| Goal-based Investment Advisory | Investor | Recommend funds based on risk, age, goal |
| Document Summarization (Offer Docs) | Investor | Simplify long PDFs (factsheets, T&Cs) |
| Portfolio Health Analysis | Investor | Natural language explanation of holdings/performance |
| Anomaly Detection & Root Cause | Admin, Ops | Explain abnormal system or NAV behavior using logs |
| Auto-fill KYC & Form Validation | Admin, Distributor | Extract Aadhaar/PAN data and auto-fill during onboarding |
| Email/Support Reply Drafting | CRM Agent | Draft contextual replies using past tickets and metadata |
| AML/PEP Screening Summarizer | Compliance Officer | Explain alerts and matches from AML APIs |
| Fund Comparison Bot | Investor | Chatbot to compare funds with reasoning (RAG + LLM) |
| Commission Breakdown Explainer | Distributor | “Why did I get this commission?” (explainable GenAI) |

🔹 2. New GenAI Microservices to Introduce

| Microservice | Function |
| --- | --- |
| GenAIAdvisorService | LLM-based fund advice and recommendations |
| KYCDataExtractorService | Extract Aadhaar/PAN data via OCR/NLP |
| DocSummarizerService | Summarize PDF factsheets, offer documents |
| PortfolioExplainerService | Use GenAI to explain investor’s gains/losses |
| ConversationOrchestrator | Multimodal chat interface orchestrating fund/NAV/portfolio lookups |
| AnomalyExplainerService | Use logs/metrics + LLM to explain failures or spikes (e.g., NAV delay) |
| SupportReplyGenerator | Generate email replies based on ticket + previous resolutions |

🔹 3. Integration Architecture (Text Version)

Investor Chat Journey:

  1. Investor: “Show me best-performing funds in tech for 3 years”

  2. Chat UI → ConversationOrchestrator

  3. Orchestrator:

    • Hits RAG layer (Vector DB of fund factsheets)

    • Calls ProductCatalogService for NAV history

    • LLM composes explanation: “Fund A outperformed due to X, Y...”

  4. GenAI reply sent to UI (with visual + text)

NAV Spike Investigation:

  1. Ops observes NAV jump

  2. Sends query: “Why was NAV for Fund X high yesterday?”

  3. AnomalyExplainerService:

    • Pulls NAV feed logs, recent events, Redis trend

    • Calls LLM, which returns an explanation such as: “NAV spike due to large inflow from corporate investor X...”

  4. Reply published to Alert Dashboard

🔹 4. GenAI Architecture Components

| Component | Role |
| --- | --- |
| LLM Backend | Azure OpenAI / private GPT / Claude / Gemini |
| RAG Layer | Vector DB (e.g., Pinecone, Azure Search with Embeddings) |
| Prompt Orchestrator | Dynamically compose prompts per use case |
| Chat UI | Angular/React with session memory, contextual tool invocation |
| Auth Hook | JWT-based tenant+user scoping for chat sessions |

🔹 5. Observability & Governance for GenAI

| Concern | Control |
| --- | --- |
| Data Leakage | Use private LLM or tokenized prompts; redact sensitive input |
| Prompt Injection | Validate and limit prompt inputs |
| Compliance Logging | Store prompt-response trace in Cosmos or Blob (7-year retention) |
| Output Explainability | RAG citations + fund sources |
| RBAC Access | Restrict chat tools per user role |
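Redacting sensitive input before a prompt leaves the platform can start with simple pattern matching on well-known identifier formats. The sketch below masks PAN values (five letters, four digits, one letter) and 12-digit Aadhaar numbers; the class name and placeholder tokens are hypothetical, and a real deployment would likely need broader coverage (names, account numbers, phone numbers) than these two patterns.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-prompt redaction of PAN and Aadhaar identifiers. */
final class PromptRedactor {
    // PAN format: five uppercase letters, four digits, one uppercase letter (e.g. ABCDE1234F).
    private static final Pattern PAN = Pattern.compile("\\b[A-Z]{5}[0-9]{4}[A-Z]\\b");
    // Aadhaar: 12 digits, optionally grouped in fours by spaces or hyphens.
    // Note: any other 12-digit number in the text is masked as well.
    private static final Pattern AADHAAR =
            Pattern.compile("\\b[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}\\b");

    static String redact(String prompt) {
        String masked = PAN.matcher(prompt).replaceAll("[PAN_REDACTED]");
        return AADHAAR.matcher(masked).replaceAll("[AADHAAR_REDACTED]");
    }
}
```

Running the redactor before the compliance log is written keeps raw identifiers out of the stored prompt-response trace as well.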

✅ Summary: Business Value of GenAI Integration

| Persona | Added Value |
| --- | --- |
| Investor | Smarter, advisory experience with explainable insights |
| Distributor | Faster commission clarity, lead scoring |
| Admin | Self-explaining system behaviors, onboarding automation |
| Compliance | AML/PEP rationalization, report drafting |
| CRM/Support | Faster, accurate, personalized responses |







Resilience Strategy for External API Integration

To ensure high availability and fault-tolerance when interacting with external systems (UIDAI, PAN DB, SEBI, CRM, LLM APIs), the platform implements the following resilience strategies:

🔄 Circuit Breaker Pattern

  • Prevents cascading failures when external services are slow or unavailable.

  • Implemented using Resilience4j at the service layer.

  • Automatically blocks requests when failure threshold is breached.
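To make the state transitions concrete, here is a dependency-free sketch of a circuit breaker. It is not Resilience4j's implementation: Resilience4j tracks a failure rate over a sliding window, whereas this simplified version opens after a fixed number of consecutive failures. The class name, parameters, and injected clock are hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Simplified circuit breaker: opens after N consecutive failures, retries after a wait. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that open the circuit
    private final long openMillis;      // how long calls are rejected before a trial call
    private final LongSupplier clock;   // injected so timing can be controlled in tests
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtMillis;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!allowCall()) {
            return fallback.get(); // fail fast while the circuit is open
        }
        try {
            T result = action.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }

    private synchronized boolean allowCall() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAtMillis < openMillis) {
                return false;
            }
            state = State.HALF_OPEN; // wait elapsed: let a trial call through
        }
        return true;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtMillis = clock.getAsLong();
        }
    }
}
```

A caller might construct it as `new SimpleCircuitBreaker(5, 30_000, System::currentTimeMillis)` and wrap each external call in `call(...)`; the numbers here are placeholders, not recommended settings.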

🔁 Retry with Backoff

  • Uses Retry pattern with exponential backoff.

  • Configurable retry attempts and delay using Spring Boot + Resilience4j.
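The backoff schedule itself is simple arithmetic: each retry waits the initial delay multiplied by a growth factor raised to the retry number, up to a cap. The sketch below shows that calculation and a basic retry loop in plain Java; it stands in for what a Resilience4j Retry configuration would provide, and every name and parameter in it is hypothetical.

```java
import java.util.function.Supplier;

/** Plain-Java illustration of retry with capped exponential backoff. */
final class RetryWithBackoff {

    /** Delay before retry n (1-based): initialDelay * multiplier^(n-1), capped at maxDelay. */
    static long delayMillis(int retryNumber, long initialDelayMillis,
                            double multiplier, long maxDelayMillis) {
        double delay = initialDelayMillis * Math.pow(multiplier, retryNumber - 1);
        return (long) Math.min(delay, maxDelayMillis);
    }

    /** Runs the action up to maxAttempts times, sleeping between failed attempts. */
    static <T> T execute(Supplier<T> action, int maxAttempts, long initialDelayMillis,
                         double multiplier, long maxDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no attempts left, so do not sleep again
                }
                try {
                    Thread.sleep(delayMillis(attempt, initialDelayMillis, multiplier, maxDelayMillis));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("retry interrupted", ie);
                }
            }
        }
        throw last; // every attempt failed: surface the final error
    }
}
```

With an initial delay of 100 ms, a multiplier of 2, and a 1000 ms cap, the waits are 100, 200, 400, 800, and then 1000 ms for each later retry. Adding random jitter to each delay is a common refinement when many clients retry at once.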

⏱ Timeout Handling

  • API calls are wrapped with a TimeLimiter to enforce 2–3 second limits.

  • Ensures downstream services are not held up indefinitely.

🚨 Fallback Methods

  • If an external API still fails after retries, a fallback method returns a default response instead of an error.

  • Example: “We’re unable to verify PAN at the moment. Please try later.”
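The timeout and fallback behaviour described above can be combined in a few lines using the JDK's CompletableFuture (Java 9 or later). This is a sketch rather than the Resilience4j TimeLimiter itself; the class and method names are hypothetical, and the fallback value is supplied by the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative time-limited call that degrades to a fallback value. */
final class TimeLimitedCall {

    static <T> T callWithTimeout(Supplier<T> remoteCall, long timeoutMillis, T fallback) {
        return CompletableFuture.supplyAsync(remoteCall)
                // If no result arrives in time, complete with the fallback instead.
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                // If the call itself throws, also return the fallback.
                .exceptionally(ex -> fallback)
                .join();
    }
}
```

For a PAN check, the fallback could carry the "unable to verify at the moment" message so the caller responds within the time limit. Note that the underlying task is not cancelled on timeout in this sketch; it keeps running on the common pool until it finishes.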

📥 Asynchronous Integration

  • Long-running workflows use Kafka + webhook model.

  • API accepts request → processes async → external system calls back via webhook → flow resumes.

🧠 Caching Valid Responses

  • Responses from frequently called APIs whose results rarely change (like PAN validation) are cached in Redis with a TTL.

  • Reduces dependency on external availability.
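In Redis this is usually a value written with an expiry. The in-memory stand-in below shows the same expire-on-read semantics without a Redis dependency; it is a sketch for illustration, and the class name and injected clock are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** In-memory illustration of a TTL cache; Redis would enforce the expiry server-side. */
final class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested deterministically

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns the cached value, or empty if it is missing or has expired. */
    Optional<V> get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry); // drop the stale entry so the next call refetches
            return Optional.empty();
        }
        return Optional.of(entry.value);
    }
}
```

On a cache miss the service calls the external API and stores the fresh response; only successful, valid responses should be cached, so a transient failure is not served back for the whole TTL.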

🧱 Bulkhead Pattern

  • Isolates external API interaction within limited thread pools.

  • Prevents one API failure from starving other service resources.
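The isolation idea can be shown with a semaphore that caps concurrent calls to one external dependency. This is a swapped-in variant: the text above describes limited thread pools, while the sketch below uses the lighter semaphore form of the same pattern (Resilience4j offers both). The class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Semaphore-based bulkhead: at most maxConcurrentCalls run at once, the rest are rejected. */
final class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> action, Supplier<T> rejected) {
        if (!permits.tryAcquire()) {
            return rejected.get(); // bulkhead full: shed load instead of queueing
        }
        try {
            return action.get();
        } finally {
            permits.release(); // always return the permit, even if the action throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Giving each external system its own bulkhead instance means a slow KYC provider can exhaust only its own permits, leaving calls to other integrations unaffected.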

📈 Monitoring & Alerting

  • API health metrics exposed via Prometheus.

  • Dashboards and alerts configured in Grafana/Azure Monitor.

  • Alerts triggered on high failure rate, open circuit breakers, or timeouts.

🧾 Queue Fallback for Retry

  • Failed external interactions are pushed to a Kafka retry topic.

  • Delayed consumers read from that topic and retry with backoff logic.

This ensures the platform remains responsive and stable even when integrated systems are degraded or unavailable.


 
 
 
