AI 1st Automation -Digital LEnding journey
- Anand Nerurkar
- Nov 23, 2025
- 16 min read
Updated: Mar 3
⭐ End-to-End Digital Lending Architecture – Borrower Journey (Ramesh)
Scenario
Borrower: RameshProduct: Personal Loan – ₹5 Lakhs
PHASE 1 — Borrower Interaction (GenAI + Frontend)
1. Ramesh logs in & applies for a personal loan
UI triggers Borrower GenAI Assistant (LLM-based conversational layer).
Ramesh asks:
“Am I eligible? What documents should I upload?”
GenAI retrieves policy rules from the RAG Layer (Policies, SOPs, KYC rules, eligibility matrix).
2. Upload of documents
Ramesh uploads:
Aadhaar
PAN
Salary slips
Bank statements (PDF)
Selfie (optional)
3. Application ID generated → Event emitted
application.receivedPayload stored in:
Blob Storage (Raw Zone)
Metadata in PostgreSQL
Document hashes in Cosmos DB
PHASE 2 — Document AI + MLOps Pipelines
4. Document AI (MLOps Pipeline #1)
Triggered event:documents.uploaded
AI model performs:
OCR + Layout understanding
Entity extraction (Name, DOB, PAN, Address, Salary, Employer name)
Document classification (KYC / Income / Bank statement / Noise docs)
Fraud signals: signature mismatch, tampering
Output stored in:
Curated zone (Blob)
Structured fields → PostgreSQL
Features → Feature Store
Note: Document AI is a trained model (MLOps pipeline, deployed on AKS via Azure ML runtime)
PHASE 3 — KYC/CDD/EDD
5. KYC Service consumes event
kyc.triggered
This performs:
Aadhaar XML / DigiLocker verification
PAN → NSDL
Face match (selfie vs Aadhaar)
Address consistency check
CDD → occupation, employer risk, geo-risk
EDD → high-risk occupation, mismatch in identity, multiple PAN matches
Fraud check → duplicate applications
6. If KYC fails
Event:kyc.failed
GenAI Borrower Assistant uses:
ContextAPI → timeline
RAG Layer → KYC SOPsto explain:
“Your KYC failed because your Aadhaar address does not match your PAN. Please upload updated Aadhaar.”
No policy or PII embedded — only policy text is in vector DB.
If Ramesh uploads corrections → pipeline restarts.
PHASE 4 — Parallel Risk Engines (Event-Driven)
Once KYC passes:kyc.completed
This triggers 3 parallel microservices:
A. Credit Risk Microservice (MLOps Pipeline #2)
Event: creditRisk.triggered
Actions:
Calls CIBIL/Experian API
Internal Credit ML model (PD, LGD estimation)
Stability of past liabilities
Delinquency prediction
Output → Feature Store + Timeline DB
B. Income Stability Service (MLOps Pipeline #3)
Event: incomeStability.triggered
Consumes data already extracted by OCR—no re-parsing.
Calculates:
Income-to-debt ratio
FOIR
Salary volatility
Employer risk score
Cash flow signal (from bank statements)
C. Fraud & AML/ Sanctions Service (MLOps Pipeline #4)
Event: fraudAndAML.triggered
Performs:
AML model scoring (internal)
Sanctions & PEP checks (API-based)
Hunter/Experian Fraud API
Anomaly detection (ML)
Device/browser fingerprint
Geo-location check
Outputs → Feature Store + Timeline
PHASE 5 — AI-Augmented Decisioning
Event: risk.allCompleted
Rule Engine + Model Fusion
Inputs:
Credit Score + ML PD
Income Stability Score
Fraud Score
AML Score
Policy constraints (interest rate caps, risk tiers)
Rule Engine outputs:
Auto-Approve
Auto-Reject
Manual Review
GenAI Underwriter Copilot
(LLM-based, for internal bank use)
Fetches:
All risk outputs via ContextAPI
Policies / SOP from RAG
AML/credit rules
Document AI results
And generates:
Risk summary
Policy deviations
Reasons for decision
Questions to ask borrower
Recommendation for final approval
The underwriter edits the summary →Human-in-the-loop feedback captured →Goes to LLMOps pipeline for reinforcement tuning.
PHASE 6 — Borrower Experience by GenAI
At every stage Ramesh can ask:
“Why is my loan delayed?”
“What is FOIR?”
“What happens after KYC?”
“Why did fraud score increase?”
GenAI responds using:
ContextAPI (application status, reasons)
RAG Layer (policy text)
Domain prompting (explain in simple terms)
No PII stored in vector DB.
PHASE 7 — Loan Agreement + e-Sign + CBS Account Creation
If approved:
Loan Agreement Generation
Uses traditional template engine
Optional: GenAI summary of agreement terms
EMI
Prepayment rules
Penalties
Tenure
Total cost of credit
Borrower reviews
Asks GenAI:
“Explain this loan agreement in simple terms.”
GenAI uses RAG over SOP/Policy + contextual loan data.
e-Sign Service
Event: loanAgreement.ready
OTP-based / Aadhaar eSign
Signed PDF → Blob Storage
CBS Integration
Event: esign.completed
CBS API creates loan account
Schedules repayment
Disbursal triggered automatically
Borrower Notification
SMS + email + app notification.
PHASE 8 — Analytics Layer (Bank Internal)
Operational Dashboards
Funnel drop-offs
TAT per step (KYC, ML, AML)
Fraud heatmap
Agent productivity
Risk Analytics
PD/LGD trends
NPA prediction
Early warning indicators
AML suspicious patterns
GenAI Governance Analytics
Prompt logs
Toxicity & bias monitoring
Red team insights
PHASE 9 — LLMOps Pipeline (Policies/SOP Only)
When a new regulatory policy arrives:
Ingestion
OCR + purification
Chunking
Embedding
Indexing into vector DB
Versioning + approval
Deploy updated RAG index
Red team testing
Promotion to production
(No PII is ever embedded.)
Summary — AI Models Used (Total 6 ML + 1 LLM)
ML Models (MLOps)
Document AI Model
Credit Risk Model
Income Stability Model
Fraud/Anomaly Model
AML Risk Model
Sanction/PEP ML Model
GenAI (LLMOps)
Borrower Assistant (LLM)
Underwriter Copilot (LLM)
A. End-to-end text architecture — one borrower journey (Ramesh)
Context: Ramesh logs in and applies for a personal loan. This is the full flow (event-driven). I name events and indicate which teams/infra own each step.
User action — Application created
Ramesh logs into portal → fills form → uploads Aadhaar, PAN, payslip, bank statement → application.created published.
Stored: raw files → ADLS Gen2 (raw); metadata + masked pointers → Postgres; timeline entry → Cosmos DB (context store).
Document ingestion & Document-AI
Event: docs.uploaded → Document-AI service consumes.
Document-AI (LayoutLM/ViT + NER + tamper & face-match models) extracts structured fields (name, dob, pan, salary, transactions) and produces confidences.
Outputs: docs.parsed (pointer to curated JSON in ADLS + masked fields in Postgres).
Owner: Feature/Data + Document AI team (MLOps owns model lifecycle for these models).
KYC / Identity validation
Event: kyc.triggered → KYC microservice validates against APIs (PAN / Aadhaar / CKYC) and checks liveness/face match.
Emits: kyc.completed with status {OK | SUSPICIOUS | FAIL_DEFINITE} and coded reasons (no raw PII in event).
If FAIL_DEFINITE → pipeline stops → decision.made = AUTO_REJECT. GenAI Borrower Assistant generates masked explanation and instructs Ramesh on next steps.
AML / Sanctions / PEP checks
Event: aml.triggered → AML microservice checks vendor lists (World-Check/Refinitiv), EU/UN/OFAC, PEP lists, adverse media.
Emits: aml.completed {CLEAR | POTENTIAL_HIT | HIGH_HIT} with reasonCodes and sourceRefs.
If HIGH_HIT → decision.made = AUTO_REJECT. If POTENTIAL_HIT → route to EDD (cdd.triggered).
Parallel predictive checks (after KYC+AML pass)
Orchestrator publishes simultaneously:
creditRisk.triggered → Credit microservice: calls CIBIL + calls Credit Risk ML endpoint → emits credit.completed (bureauScore, pdScore, modelVersion, shapTop).
fraudCheck.triggered → Fraud microservice: vendor call + Fraud ML endpoint → emits fraud.completed.
incomeStability.triggered → Income microservice: consumes parsed JSON → computes DTI, EMI capacity; optionally calls Income Stability ML → emits income.completed.
All model outputs are written to Feature Store (online) and snapshots to ADLS feature zone.
Owner: MLOps + application microservices.
Decision Engine (Rules + ML inputs)
Event: upon receiving credit.completed, fraud.completed, income.completed → Decision Engine (DMN/Drools) executes rules combining thresholds + ML scores.
Produces decision.made = {AUTO_APPROVE | AUTO_REJECT | MANUAL_REVIEW}. Includes ruleVersion, modelVersions, and evidencePointers (doc ids, shapTop).
Persisted to audit store (append-only) with traceId.
GenAI Underwriting Copilot & Borrower Assistant
If MANUAL_REVIEW or upon borrower request: Context API aggregates masked timeline + scores + evidence pointers (from Cosmos/Postgres) and calls LLMOps orchestrator.
RAG retrieves relevant SOP/policy chunks (policy KB stored in vector DB — NO PII).
LLM produces evidence-backed brief: summary, top risks, policy citations, recommended action. Emit underwriter.brief.created.
GenAI also drives the Borrower Assistant: Ramesh can ask “Why my KYC failed?” or “Explain the agreement”, and the assistant responds using Context API + RAG (masked info and SOPs).
Human-in-loop (if required)
Underwriter reviews the brief and documents, updates decision. Event: decision.confirmed (includes underwriterId, changes).
Edits/labels are stored for labeling pipeline.
Post-approval automation
If approved: agreement.generated via DocGen (templating); esign.triggered → Digital eSign provider returns esign.completed.
loan.account.create call to CBS (Finacle/Temenos) → loan.account.created → disbursement → notification to Ramesh.
Audit, Training & Monitoring
Every model inference, LLM prompt/response, and decision is logged (prompt + retrieved policy ids + LLM output) to immutable audit store for compliance.
MLOps monitors model drift, triggers retrain; LLMOps monitors retrieval quality, hallucination rates, and triggers red-team cycles.
Important operational notes for the journey:
PII never flows into vector DB. LLM sees only masked or derived context from Context API.
All events include traceId and auditPointer for full traceability.
Teams: App teams (microservices/orchestration), MLOps (training/serving), LLMOps (RAG/prompt ops), DevOps/SRE, Risk/Compliance.
B. The LLMOps pipeline (policy/SOP → RAG → reasoning)
Explain this sequence in interview terms:
SOP/Policy ingestion
Source: PDFs, DOCX, regulatory circulars, credit policy docs, SOPs.
Preprocess: clean, normalize (remove headers/footers), canonicalize.
Chunking (policy-aware)
Chunk by clause/section boundaries (preserve legal context).
Each chunk carries metadata: docId, sectionId, effectiveDate.
Embedding
Apply embedding model (governed & versioned). For in-house LLMs or Azure OpenAI embeddings.
Store vectors in a vector DB (Pinecone/Milvus/pgvector) with metadata.
Indexing
Build retrieval index and store mapping chunk → clause id → source.
RAG Retrieval
When Context API requests reasoning, LLM orchestrator:
Accepts masked context JSON (scores, reason codes, timeline).
Retrieves relevant policy chunks via vector DB + hybrid lexical checks (to guarantee precision).
Supplies context + snippets to LLM with system prompt that enforces citation & no hallucination.
Prompt orchestration & guardrails
Prompt templates are versioned by LLMOps.
Enforce rule: always cite policy chunk id(s) (evidence pins).
Enforce PII masking / safe-response templates.
Log prompt + retrieved snippets + response.
Response templating & audit
LLM output structured to include: summary, top risks, policy citations, recommended action.
Persist everything in audit store.
Monitoring / Feedback
Track retrieval recall/precision, hallucination incidents, response latency.
Run red-team tests and safety checks regularly.
C. Human-in-loop & retrain lifecycle
Human edits / approvals
Underwriters change decisions or annotate reasoning in the UI. Those edits become labeled data.
Label pipeline
Labeled cases are ingested into training datasets (feature store + label tables). Data is versioned and stored in ADLS (training zone).
Retraining & release
MLOps builds retrain pipelines, evaluates fairness/explainability (SHAP), runs validation, and stores candidate models in model registry (MLflow).
Models pass governance board before production rollout (canary/blue-green).
When to retrain
Retrain is triggered by: drift detection metrics, periodic schedule, or significant label accumulation (e.g., >X manual reviews for a cohort).
Impact of human edit on single application
If underwriter edits and resubmits, that application’s final decision is persisted immediately (no blocking), and it is stored as label for batch retrain. Optionally, a “fast re-score” can be triggered to update downstream counters or portfolio metrics.
D. Policy update (SOP/policy change) handling
Ingest new/updated policy into SOP ingestion pipeline (chunk → embed → index). This updates the vector DB and the mapping of clause ids.
Does NOT automatically re-run full upstream pipeline for all past applications by default (that’s expensive).
Re-evaluation strategies:
In-flight applications: Re-evaluate only open applications (re-query RAG & re-run Decision Engine if policy change affects thresholds). Emit decision.recheck.
Historical reprocessing: Run batch job to flag previously approved cases where compliance now requires review (audit use-case).
Audit: store policyVersion on all future decisions; retain old policy clause ids for historical auditability.
E. Where LLMs are deployed (deployment pattern)
Options depending on model choice and governance:
Managed cloud LLM (Azure OpenAI)
Pros: managed infra, lower ops, compliance contracts available.
Use when vendor models acceptable.
Private LLM (self-hosted) deployed via Azure ML / AKS
Deploy model container to AKS or Azure ML managed endpoints (KServe or Azure ML Real-time endpoints).
Use when tighter control/privacy required (on-prem/data residency).
LLMOps is responsible for container images, autoscaling, GPU scheduling, rate limiting, and prompt caching.
Hybrid
Use managed LLM for non-sensitive user interactions (templates) and private self-hosted smaller LLMs for sensitive, high-control reasoning.
Operational notes: LLM endpoints must be fronted by the LLM Gateway, which enforces prompt templates, quotas, PII masking, and logs every request/response for audit.
F. Red Team testing (what it is & why)
Red Team testing = adversarial testing of LLM/GenAI systems to surface safety and security failures:
Prompt injection tests (attempt to force LLM to reveal hidden data or ignore guards).
Data leakage tests (ensure LLM never reconstructs PII from masked input).
Hallucination benchmarks (give edge-case queries and measure factuality).
Adversarial content / jailbreak attempts (malicious queries to bypass policies).
Bias / fairness tests (measure differential outputs across demographic cohorts).
Load & failure tests (LLM under heavy load, fallback correctness).
Outcome: fix prompts, update guardrails, improve retrieval (RAG), patch model or fallback to templates. Red-team runs are a mandatory LLMOps task before production and periodically after.
G. Is AML+Sanction ML or API?
Hybrid pattern (practical):
Deterministic API checks against vendor lists (World-Check, Refinitiv, OFAC) for exact matches.
ML is used to score fuzzy matches, reduce false positives, detect linkages (graph ML linking aliases / shell companies) and to surface adverse media signals.
So AML service typically combines vendor API + ML for ranking/triage.
H. How many MLOps pipelines are required?
At minimum for this architecture:
Document-AI model pipeline (document classification + field extraction + face match).
Credit Risk model pipeline.
Fraud Detection model pipeline.
Income Stability model pipeline.
AML (if ML components used) pipeline.
(Optional) Behavioral/Intent model pipeline.
Monitoring / retrain pipeline (shared infra for all the above) — drift detection, data pipelines, labeling.
LLMOps is separate (prompt ops, RAG ingestion, embedding lifecycle) and has its own CI/CD hygiene but is not counted as a classic MLOps "model training" pipeline — still needs versioning and QA.
I. Summary — “AI-first automation in lending”
“Our platform is event-driven and AI-first: documents land in ADLS Gen2 and a Document-AI pipeline converts unstructured proofs into trusted structured features. Those features feed multiple MLOps-deployed models — credit, fraud, income stability and AML triage — which run in parallel and push results to a rule-based Decision Engine. The Decision Engine makes a deterministic judgement (auto-approve/reject/manual review) but every recommendation is accompanied by an evidence package: model scores, SHAP explainability, and policy references. For explainability and customer UX we use LLMOps: policies and SOPs are chunked, embedded into a RAG index, and a governed LLM synthesizes evidence-backed briefs for underwriters and natural-language explanations for customers. Human decisions are stored as labels for retrain; MLOps handles model lifecycle, drift detection and governance, while LLMOps manages prompt/versioning, red-team testing and retrieval quality. PII never goes into the vector store — the LLM only sees masked context via a Context API. This design delivers faster decisions, fewer false positives, measurable NPA improvements, and transparent explanations for customers and regulators.”
J. Quick Q&A bullets
Q: Does LLM ever see raw PII? — No. LLM reads masked context from Context API; vector DB only holds policies/SOPs.
Q: What triggers retrain? — drift detection, label accumulation from human edits, or periodic cadence + evaluation.
Q: What to do when policy updates? — ingest policy into RAG; re-evaluate in-flight applications selectively; batch reprocess historical cases if required for compliance.
Q: Where are LLMs hosted? — managed (Azure OpenAI) or private (AKS/AzureML endpoint) depending on governance and latency needs.
Q: How many MLOps pipelines? — minimum 5–7 (document AI, credit, fraud, income, AML + shared monitoring).
🏦 FOIR = Fixed Obligations to Income Ratio
FOIR measures how much of a borrower’s monthly income is already committed to fixed obligations.
It helps banks decide:
“Can this customer afford a new loan?”
📌 FOIR Formula
FOIR=Total Monthly Fixed Obligations * 100
Net Monthly Income
Example
Net Monthly Income = ₹1,00,000
Existing EMI (Home loan) = ₹25,000
Car loan EMI = ₹10,000
Credit card minimum due = ₹5,000
Total Obligations = ₹40,000
FOIR=(40,000/1,00,000)×100=40%
So this borrower’s FOIR = 40%
🧠 Why FOIR Is Important
Banks use FOIR to assess repayment capacity.
Higher FOIR = higher financial stress.
📊 Typical FOIR Thresholds (India – Indicative)
Borrower Type | Acceptable FOIR |
Salaried | 40%–50% |
Self-employed | 50%–60% |
High-income borrowers | Can go slightly higher |
Each bank defines its own internal policy.
🏦 Where FOIR Is Used
Personal Loans
Home Loans
Auto Loans
Credit Card eligibility
Loan top-ups
It is part of underwriting rules engine.
🔎 FOIR vs DTI (Debt-to-Income)
In many global markets, FOIR is similar to DTI ratio.
FOIR = Indian terminology
DTI = International terminology
Conceptually same idea.
🧩 In Digital Lending Architecture
FOIR calculation usually comes from:
Bureau report (existing EMIs)
Bank statement analysis
Internal exposure
Income documents (salary slip / ITR)
It becomes an input into:
Credit decision engine
Risk scoring model
Rule-based approval workflow
Chunking & Embeding ??
----
🔹 Simple Difference First
Concept | What It Does | Purpose |
Chunking | Splits large document into smaller pieces | Makes content searchable |
Embedding | Converts text into numerical vector | Makes semantic comparison possible |
So:
Chunking = Divide
Embedding = Convert
They are sequential steps, not alternatives.
🏦 Let’s Take a Sample Bank Policy Document
Imagine this credit policy:
Credit Policy v3.0
Clause 3.1 – Loan to Value (LTV)For salaried borrowers, maximum LTV shall not exceed 80%.
Clause 3.2 – Self-employed LTVFor self-employed borrowers, maximum LTV shall not exceed 70%.
Clause 4.1 – FOIR NormTotal FOIR must not exceed 50% for salaried applicants.
Clause 4.2 – Manual UnderwritingIf FOIR exceeds 50%, case must go for manual approval.
Now let’s see what chunking does.
📌 Step 1: Chunking (Splitting the Document)
The full document is too large for efficient search.
So we split it like this:
Chunk 1:
Clause 3.1 – Loan to Value (LTV)For salaried borrowers, maximum LTV shall not exceed 80%.
Chunk 2:
Clause 3.2 – Self-employed LTVFor self-employed borrowers, maximum LTV shall not exceed 70%.
Chunk 3:
Clause 4.1 – FOIR NormTotal FOIR must not exceed 50% for salaried applicants.
Chunk 4:
Clause 4.2 – Manual UnderwritingIf FOIR exceeds 50%, case must go for manual approval.
👉 Chunking simply created searchable blocks.
No math yet. No AI yet. Just structured splitting.
📌 Step 2: Embedding (Converting Each Chunk to Vector)
Now each chunk is converted into a numerical representation.
Example (simplified view):
Chunk 1 → [0.23, -0.44, 0.91, 0.11, … 1024 numbers]
Chunk 2 → [0.21, -0.48, 0.88, 0.09, … 1024 numbers]
Chunk 3 → [0.78, 0.15, -0.34, 0.65, … 1024 numbers]
These numbers represent the semantic meaning of the text.
You cannot read them — but the vector DB can compare them mathematically.
🧠 Now User Asks a Question
User query:
“What is the maximum funding for self-employed customers?”
Step 1 → Query is embedded into a vector.
Step 2 → Vector DB compares query vector with chunk vectors using cosine similarity.
Step 3 → It finds Chunk 2 is most similar.
Chunk 2 is retrieved and passed to LLM.
🔎 So What Each One Actually Does
🔹 Chunking
Without chunking:
Whole document becomes one vector
Retrieval becomes inaccurate
LLM receives too much irrelevant text
Hallucination risk increases
Chunking improves:
Precision
Clause-level accuracy
Citation support
🔹 Embedding
Embedding allows:
“LTV” and “funding cap” to match semantically
“Self-employed” and “business applicant” to match
Similar meaning to cluster in vector space
Without embedding:Search becomes keyword-based only.
🎯 Real-World Analogy
Imagine a huge law book.
Chunking = tearing it into individual sections.
Embedding = converting each section into a “meaning fingerprint”.
When someone asks a question:
You don’t read entire book.
You find the section whose fingerprint is closest to the question.
🏦 In Enterprise Knowledge Hub
Your pipeline becomes:
Document
↓
Chunking
↓
Embedding
↓
Vector Storage
↓
Similarity Search
↓
LLM Answer⚠️ Very Important Enterprise Insight
If chunking is bad:
Even best embedding model will fail.
If embedding model is weak:
Even perfect chunking won’t retrieve correctly.
Both are equally important.
📌 Summary in One Line
Chunking prepares content structure.Embedding makes semantic math possible.
Chunking Strategy
===
Chunking is not just “splitting text.”In banking systems, chunking directly affects:
Recall@K
Citation accuracy
Hallucination rate
Regulatory defensibility
Let’s go deep properly.
🧠 1️⃣ Different Chunking Strategies
We’ll use a sample credit policy to explain.
📄 Sample Policy
Credit Policy v3.0
Clause 3.1 – Loan to ValueFor salaried borrowers, maximum LTV shall not exceed 80%.
Clause 3.2 – Self-employed LTVFor self-employed borrowers, maximum LTV shall not exceed 70%.
Clause 4.1 – FOIR NormTotal FOIR must not exceed 50%.
Clause 4.2 – Manual UnderwritingIf FOIR exceeds 50%, case must go for manual approval.
🔹 A. Fixed-Size Chunking
Split every X tokens (e.g., 500 tokens).
Example:
Chunk 1:Clause 3.1 + Clause 3.2
Chunk 2:Clause 4.1 + Clause 4.2
✅ Pros
Easy to implement
Fast
Works for generic corpora
❌ Cons
Breaks logical boundaries
Might merge unrelated clauses
Poor citation accuracy
For banking policies → not ideal.
🔹 B. Semantic / Heading-Based Chunking (Recommended)
Split based on:
Clause
Heading
Section markers
Legal numbering
So:
Chunk 1 → Clause 3.1Chunk 2 → Clause 3.2Chunk 3 → Clause 4.1Chunk 4 → Clause 4.2
✅ Pros
Clause-level retrieval
Clean citation
Higher precision
Better Recall@K for policy queries
❌ Cons
Requires document parsing logic
For Enterprise Knowledge Hub → This is best default.
🔹 C. Hierarchical Chunking (Advanced Enterprise Strategy)
This creates:
Level 1 → Section summaryLevel 2 → Clause chunksLevel 3 → Sub-clause details
Example structure:
Section 3 – LTV Policy→ Clause 3.1→ Clause 3.2
You embed all levels.
Why?
If query is broad:“What are LTV norms?”
Section-level chunk retrieved.
If query is specific:“What is LTV for self-employed?”
Clause-level chunk retrieved.
✅ Pros
Supports broad & narrow queries
Improves recall for mixed query types
Great for large policies
❌ More complex indexing
For large banks → highly recommended.
📊 2️⃣ How Chunk Size Affects Recall@K
Now let’s talk metrics.
Assume 1,000 chunks total.
🔹 Very Large Chunks (1500+ tokens)
Problem:
Contains too much information
Embedding becomes diluted
Query similarity reduces
Lower precision ranking
Recall might drop because:
Correct clause buried in large text block.
🔹 Very Small Chunks (50–100 tokens)
Problem:
Context lost
Meaning fragmented
Retrieval may return incomplete clause
LLM hallucinates missing pieces
Recall might look good… but answer quality poor.
🔹 Sweet Spot (Banking Context)
500–800 tokensOR logical clause-level segmentation
Why?
Enough context
Not too noisy
Clean citation
Stable embeddings
📉 3️⃣ How Bad Chunking Increases Hallucination
This is critical in banking.
❌ Scenario 1: Mixed Clause Chunk
Chunk contains:
LTV rule
FOIR rule
Exception clause
User asks:
“What is FOIR limit?”
LLM sees mixed chunk and may:
Blend LTV and FOIR rules
Misinterpret exception as general rule
Generate incorrect hybrid answer
This is hallucination induced by chunking.
❌ Scenario 2: Broken Clause
Clause split mid-sentence:
Chunk 1:“Total FOIR must not exceed…”
Chunk 2:“50% for salaried applicants.”
If only chunk 1 retrieved:
LLM guesses the number.
This is dangerous in lending context.
🏦 Why This Is Serious in Banking
Because wrong retrieval can lead to:
Wrong loan approval guidance
Regulatory misinterpretation
Credit decision support errors
Audit findings
In BFSI:
Bad chunking = indirect risk exposure.
🎯 Enterprise Recommendation for You
Given you’re building a Bank Knowledge Hub:
Use Hybrid Strategy:
Heading-based chunking
Clause-level granularity
Overlap 50–100 tokens
Add metadata:
Clause number
Version
Effective date
Policy type
🧠 Advanced Optimization
Test 3 chunk strategies:
Strategy | Recall@5 | MRR | Latency |
Fixed 500 | 0.87 | 0.65 | Fast |
Clause-based | 0.93 | 0.74 | Moderate |
Hierarchical | 0.95 | 0.80 | Slightly slower |
Pick the best balance.
Don’t assume.
Measure.
🔥 Powerful Insight
“How do you reduce hallucination in RAG?”
“We control hallucination at retrieval stage through clause-level semantic chunking and high Recall@5 benchmarking. Poor chunking is often the hidden cause of hallucination.”
Dimension size selection policy
===
How do we decide between 768, 1024, 1536 dimensions?
Let’s break this down practically — not academically.
🔹 First: What Does “Vector Dimension” Mean?
When we say:
768-dim → each chunk becomes 768 numbers
1024-dim → 1024 numbers
1536-dim → 1536 numbers
These numbers represent semantic meaning in high-dimensional space.
Higher dimension = more capacity to encode nuance.But also:
More storage
More memory
Slower search
Higher infra cost
So it's a trade-off.
🔎 Key Principle
You DO NOT choose dimension directly.
You choose the embedding model.
The model decides the dimension.
But you evaluate whether that dimension is appropriate for your system.
🧠 How to Decide? (Enterprise Method)
There are 4 decision axes:
1️⃣ Retrieval Accuracy (Primary Factor)
Run benchmarking:
Model | Dim | Recall@5 |
Model A | 768 | 0.88 |
Model B | 1024 | 0.93 |
Model C | 1536 | 0.94 |
If:
1024 gives 93%1536 gives 94%
Is +1% worth:
50% more storage?
Higher RAM?
More latency?
Sometimes yes. Sometimes no.
In banking, usually:
If recall improves meaningfully (>2–3%), higher dimension justified.
2️⃣ Corpus Size (Very Important)
Let’s calculate storage impact.
Storage formula:
Storage = Number_of_chunks × Dimension × 4 bytes(Each float = 4 bytes)
Example
Suppose:
1 million policy chunks
768 dim:
1,000,000 × 768 × 4 bytes≈ 3.07 GB
1536 dim:
1,000,000 × 1536 × 4≈ 6.14 GB
Double storage.
Now multiply across:
Backup
DR
Replication
Index overhead
Cost increases fast.
3️⃣ Latency & Vector DB Performance
Higher dimensions:
Slower similarity search
More RAM usage
Larger index size
If SLA is:
< 200ms retrieval
High concurrency (1000+ users)
768 or 1024 often sufficient.
1536 used when:
Deep semantic nuance required
Large knowledge base
Complex analytical retrieval
4️⃣ Use Case Complexity
Use Case | Recommended Dim Range |
Simple FAQ / SOP lookup | 384–768 |
Policy clause retrieval | 768–1024 |
Multi-document reasoning support | 1024–1536 |
Legal / research corpus | 1536+ |
For your banking Knowledge Hub:
👉 768–1024 is typically optimal.
🏦 Enterprise Decision Framework
Here’s how I would document it:
Step 1: Benchmark 2–3 models
Example:
768-dim model
1024-dim model
1536-dim model
Step 2: Measure
Recall@5
MRR
Retrieval latency
Memory usage
Step 3: Apply Decision Rule
Choose lowest dimension model that:
Meets Recall threshold (≥ 90%)
Meets SLA
Meets storage budget
Do NOT choose highest dimension by default.
🎯 Practical Recommendation for Banking Knowledge Hub
For:
Credit policies
SOP
Lending rules
Regulatory circulars
1024-dim is usually a sweet spot.
Why?
Good semantic richness
Manageable storage
Good recall performance
Stable for enterprise RAG
1536 is useful when:
Very large heterogeneous corpus
Cross-policy reasoning heavy
Legal/regulatory nuanced queries
⚠️ Important Misconception
Higher dimension ≠ better model.
A well-trained 768-dim model can outperform a weak 1536-dim model.
Quality of training > dimension size.
🧠 Architectural Insight
Dimension impacts:
Vector DB design
Memory sizing
Infra budget
DR replication size
Re-embedding cost
So dimension is an infra decision as much as AI decision.
🏦
“We selected a 1024-dimensional embedding model because it met 92% Recall@5 while maintaining optimal storage and sub-200ms latency. Higher dimension models showed marginal improvement but doubled infrastructure footprint.”
.png)

Comments