top of page

AI 1st Automation -Digital LEnding journey

  • Writer: Anand Nerurkar
    Anand Nerurkar
  • Nov 23, 2025
  • 16 min read

Updated: Mar 3

End-to-End Digital Lending Architecture – Borrower Journey (Ramesh)


Scenario

Borrower: RameshProduct: Personal Loan – ₹5 Lakhs

PHASE 1 — Borrower Interaction (GenAI + Frontend)

1. Ramesh logs in & applies for a personal loan

  • UI triggers Borrower GenAI Assistant (LLM-based conversational layer).

  • Ramesh asks:

    “Am I eligible? What documents should I upload?”

  • GenAI retrieves policy rules from the RAG Layer (Policies, SOPs, KYC rules, eligibility matrix).

2. Upload of documents

Ramesh uploads:

  • Aadhaar

  • PAN

  • Salary slips

  • Bank statements (PDF)

  • Selfie (optional)

3. Application ID generated → Event emitted

application.receivedPayload stored in:

  • Blob Storage (Raw Zone)

  • Metadata in PostgreSQL

  • Document hashes in Cosmos DB

PHASE 2 — Document AI + MLOps Pipelines

4. Document AI (MLOps Pipeline #1)

Triggered event:documents.uploaded

AI model performs:

  • OCR + Layout understanding

  • Entity extraction (Name, DOB, PAN, Address, Salary, Employer name)

  • Document classification (KYC / Income / Bank statement / Noise docs)

  • Fraud signals: signature mismatch, tampering

Output stored in:

  • Curated zone (Blob)

  • Structured fields → PostgreSQL

  • Features → Feature Store

Note: Document AI is a trained model (MLOps pipeline, deployed on AKS via Azure ML runtime)

PHASE 3 — KYC/CDD/EDD

5. KYC Service consumes event

kyc.triggered

This performs:

  • Aadhaar XML / DigiLocker verification

  • PAN → NSDL

  • Face match (selfie vs Aadhaar)

  • Address consistency check

  • CDD → occupation, employer risk, geo-risk

  • EDD → high-risk occupation, mismatch in identity, multiple PAN matches

  • Fraud check → duplicate applications

6. If KYC fails

Event:kyc.failed

GenAI Borrower Assistant uses:

  • ContextAPI → timeline

  • RAG Layer → KYC SOPsto explain:

“Your KYC failed because your Aadhaar address does not match your PAN. Please upload updated Aadhaar.”

No policy or PII embedded — only policy text is in vector DB.

If Ramesh uploads corrections → pipeline restarts.

PHASE 4 — Parallel Risk Engines (Event-Driven)

Once KYC passes:kyc.completed

This triggers 3 parallel microservices:

A. Credit Risk Microservice (MLOps Pipeline #2)

Event: creditRisk.triggered

Actions:

  1. Calls CIBIL/Experian API

  2. Internal Credit ML model (PD, LGD estimation)

  3. Stability of past liabilities

  4. Delinquency prediction

Output → Feature Store + Timeline DB

B. Income Stability Service (MLOps Pipeline #3)

Event: incomeStability.triggered

Consumes data already extracted by OCR—no re-parsing.

Calculates:

  • Income-to-debt ratio

  • FOIR

  • Salary volatility

  • Employer risk score

  • Cash flow signal (from bank statements)

C. Fraud & AML/ Sanctions Service (MLOps Pipeline #4)

Event: fraudAndAML.triggered

Performs:

  • AML model scoring (internal)

  • Sanctions & PEP checks (API-based)

  • Hunter/Experian Fraud API

  • Anomaly detection (ML)

  • Device/browser fingerprint

  • Geo-location check

Outputs → Feature Store + Timeline

PHASE 5 — AI-Augmented Decisioning

Event: risk.allCompleted

Rule Engine + Model Fusion

Inputs:

  • Credit Score + ML PD

  • Income Stability Score

  • Fraud Score

  • AML Score

  • Policy constraints (interest rate caps, risk tiers)

Rule Engine outputs:

  • Auto-Approve

  • Auto-Reject

  • Manual Review

GenAI Underwriter Copilot

(LLM-based, for internal bank use)

Fetches:

  • All risk outputs via ContextAPI

  • Policies / SOP from RAG

  • AML/credit rules

  • Document AI results

And generates:

  • Risk summary

  • Policy deviations

  • Reasons for decision

  • Questions to ask borrower

  • Recommendation for final approval

The underwriter edits the summary →Human-in-the-loop feedback captured →Goes to LLMOps pipeline for reinforcement tuning.

PHASE 6 — Borrower Experience by GenAI

At every stage Ramesh can ask:

  • “Why is my loan delayed?”

  • “What is FOIR?”

  • “What happens after KYC?”

  • “Why did fraud score increase?”

GenAI responds using:

  • ContextAPI (application status, reasons)

  • RAG Layer (policy text)

  • Domain prompting (explain in simple terms)

No PII stored in vector DB.

PHASE 7 — Loan Agreement + e-Sign + CBS Account Creation

If approved:

Loan Agreement Generation

  • Uses traditional template engine

  • Optional: GenAI summary of agreement terms

    • EMI

    • Prepayment rules

    • Penalties

    • Tenure

    • Total cost of credit

Borrower reviews

Asks GenAI:

“Explain this loan agreement in simple terms.”

GenAI uses RAG over SOP/Policy + contextual loan data.

e-Sign Service

Event: loanAgreement.ready

  • OTP-based / Aadhaar eSign

  • Signed PDF → Blob Storage

CBS Integration

Event: esign.completed

  • CBS API creates loan account

  • Schedules repayment

  • Disbursal triggered automatically

Borrower Notification

SMS + email + app notification.

PHASE 8 — Analytics Layer (Bank Internal)

Operational Dashboards

  • Funnel drop-offs

  • TAT per step (KYC, ML, AML)

  • Fraud heatmap

  • Agent productivity

Risk Analytics

  • PD/LGD trends

  • NPA prediction

  • Early warning indicators

  • AML suspicious patterns

GenAI Governance Analytics

  • Prompt logs

  • Toxicity & bias monitoring

  • Red team insights

PHASE 9 — LLMOps Pipeline (Policies/SOP Only)

When a new regulatory policy arrives:

  1. Ingestion

  2. OCR + purification

  3. Chunking

  4. Embedding

  5. Indexing into vector DB

  6. Versioning + approval

  7. Deploy updated RAG index

  8. Red team testing

  9. Promotion to production

(No PII is ever embedded.)

Summary — AI Models Used (Total 6 ML + 1 LLM)

ML Models (MLOps)

  1. Document AI Model

  2. Credit Risk Model

  3. Income Stability Model

  4. Fraud/Anomaly Model

  5. AML Risk Model

  6. Sanction/PEP ML Model

GenAI (LLMOps)

  1. Borrower Assistant (LLM)

  2. Underwriter Copilot (LLM)


A. End-to-end text architecture — one borrower journey (Ramesh)

Context: Ramesh logs in and applies for a personal loan. This is the full flow (event-driven). I name events and indicate which teams/infra own each step.

  1. User action — Application created

    • Ramesh logs into portal → fills form → uploads Aadhaar, PAN, payslip, bank statement → application.created published.

    • Stored: raw files → ADLS Gen2 (raw); metadata + masked pointers → Postgres; timeline entry → Cosmos DB (context store).

  2. Document ingestion & Document-AI

    • Event: docs.uploaded → Document-AI service consumes.

    • Document-AI (LayoutLM/ViT + NER + tamper & face-match models) extracts structured fields (name, dob, pan, salary, transactions) and produces confidences.

    • Outputs: docs.parsed (pointer to curated JSON in ADLS + masked fields in Postgres).

    • Owner: Feature/Data + Document AI team (MLOps owns model lifecycle for these models).

  3. KYC / Identity validation

    • Event: kyc.triggered → KYC microservice validates against APIs (PAN / Aadhaar / CKYC) and checks liveness/face match.

    • Emits: kyc.completed with status {OK | SUSPICIOUS | FAIL_DEFINITE} and coded reasons (no raw PII in event).

    • If FAIL_DEFINITE → pipeline stops → decision.made = AUTO_REJECT. GenAI Borrower Assistant generates masked explanation and instructs Ramesh on next steps.

  4. AML / Sanctions / PEP checks

    • Event: aml.triggered → AML microservice checks vendor lists (World-Check/Refinitiv), EU/UN/OFAC, PEP lists, adverse media.

    • Emits: aml.completed {CLEAR | POTENTIAL_HIT | HIGH_HIT} with reasonCodes and sourceRefs.

    • If HIGH_HIT → decision.made = AUTO_REJECT. If POTENTIAL_HIT → route to EDD (cdd.triggered).

  5. Parallel predictive checks (after KYC+AML pass)

    • Orchestrator publishes simultaneously:

      • creditRisk.triggered → Credit microservice: calls CIBIL + calls Credit Risk ML endpoint → emits credit.completed (bureauScore, pdScore, modelVersion, shapTop).

      • fraudCheck.triggered → Fraud microservice: vendor call + Fraud ML endpoint → emits fraud.completed.

      • incomeStability.triggered → Income microservice: consumes parsed JSON → computes DTI, EMI capacity; optionally calls Income Stability ML → emits income.completed.

    • All model outputs are written to Feature Store (online) and snapshots to ADLS feature zone.

    • Owner: MLOps + application microservices.

  6. Decision Engine (Rules + ML inputs)

    • Event: upon receiving credit.completed, fraud.completed, income.completed → Decision Engine (DMN/Drools) executes rules combining thresholds + ML scores.

    • Produces decision.made = {AUTO_APPROVE | AUTO_REJECT | MANUAL_REVIEW}. Includes ruleVersion, modelVersions, and evidencePointers (doc ids, shapTop).

    • Persisted to audit store (append-only) with traceId.

  7. GenAI Underwriting Copilot & Borrower Assistant

    • If MANUAL_REVIEW or upon borrower request: Context API aggregates masked timeline + scores + evidence pointers (from Cosmos/Postgres) and calls LLMOps orchestrator.

    • RAG retrieves relevant SOP/policy chunks (policy KB stored in vector DB — NO PII).

    • LLM produces evidence-backed brief: summary, top risks, policy citations, recommended action. Emit underwriter.brief.created.

    • GenAI also drives the Borrower Assistant: Ramesh can ask “Why my KYC failed?” or “Explain the agreement”, and the assistant responds using Context API + RAG (masked info and SOPs).

  8. Human-in-loop (if required)

    • Underwriter reviews the brief and documents, updates decision. Event: decision.confirmed (includes underwriterId, changes).

    • Edits/labels are stored for labeling pipeline.

  9. Post-approval automation

    • If approved: agreement.generated via DocGen (templating); esign.triggered → Digital eSign provider returns esign.completed.

    • loan.account.create call to CBS (Finacle/Temenos) → loan.account.created → disbursement → notification to Ramesh.

  10. Audit, Training & Monitoring

    • Every model inference, LLM prompt/response, and decision is logged (prompt + retrieved policy ids + LLM output) to immutable audit store for compliance.

    • MLOps monitors model drift, triggers retrain; LLMOps monitors retrieval quality, hallucination rates, and triggers red-team cycles.

Important operational notes for the journey:

  • PII never flows into vector DB. LLM sees only masked or derived context from Context API.

  • All events include traceId and auditPointer for full traceability.

  • Teams: App teams (microservices/orchestration), MLOps (training/serving), LLMOps (RAG/prompt ops), DevOps/SRE, Risk/Compliance.

B. The LLMOps pipeline (policy/SOP → RAG → reasoning)

Explain this sequence in interview terms:

  1. SOP/Policy ingestion

    • Source: PDFs, DOCX, regulatory circulars, credit policy docs, SOPs.

    • Preprocess: clean, normalize (remove headers/footers), canonicalize.

  2. Chunking (policy-aware)

    • Chunk by clause/section boundaries (preserve legal context).

    • Each chunk carries metadata: docId, sectionId, effectiveDate.

  3. Embedding

    • Apply embedding model (governed & versioned). For in-house LLMs or Azure OpenAI embeddings.

    • Store vectors in a vector DB (Pinecone/Milvus/pgvector) with metadata.

  4. Indexing

    • Build retrieval index and store mapping chunk → clause id → source.

  5. RAG Retrieval

    • When Context API requests reasoning, LLM orchestrator:

      • Accepts masked context JSON (scores, reason codes, timeline).

      • Retrieves relevant policy chunks via vector DB + hybrid lexical checks (to guarantee precision).

      • Supplies context + snippets to LLM with system prompt that enforces citation & no hallucination.

  6. Prompt orchestration & guardrails

    • Prompt templates are versioned by LLMOps.

    • Enforce rule: always cite policy chunk id(s) (evidence pins).

    • Enforce PII masking / safe-response templates.

    • Log prompt + retrieved snippets + response.

  7. Response templating & audit

    • LLM output structured to include: summary, top risks, policy citations, recommended action.

    • Persist everything in audit store.

  8. Monitoring / Feedback

    • Track retrieval recall/precision, hallucination incidents, response latency.

    • Run red-team tests and safety checks regularly.

C. Human-in-loop & retrain lifecycle

  1. Human edits / approvals

    • Underwriters change decisions or annotate reasoning in the UI. Those edits become labeled data.

  2. Label pipeline

    • Labeled cases are ingested into training datasets (feature store + label tables). Data is versioned and stored in ADLS (training zone).

  3. Retraining & release

    • MLOps builds retrain pipelines, evaluates fairness/explainability (SHAP), runs validation, and stores candidate models in model registry (MLflow).

    • Models pass governance board before production rollout (canary/blue-green).

  4. When to retrain

    • Retrain is triggered by: drift detection metrics, periodic schedule, or significant label accumulation (e.g., >X manual reviews for a cohort).

  5. Impact of human edit on single application

    • If underwriter edits and resubmits, that application’s final decision is persisted immediately (no blocking), and it is stored as label for batch retrain. Optionally, a “fast re-score” can be triggered to update downstream counters or portfolio metrics.

D. Policy update (SOP/policy change) handling

  • Ingest new/updated policy into SOP ingestion pipeline (chunk → embed → index). This updates the vector DB and the mapping of clause ids.

  • Does NOT automatically re-run full upstream pipeline for all past applications by default (that’s expensive).

  • Re-evaluation strategies:

    • In-flight applications: Re-evaluate only open applications (re-query RAG & re-run Decision Engine if policy change affects thresholds). Emit decision.recheck.

    • Historical reprocessing: Run batch job to flag previously approved cases where compliance now requires review (audit use-case).

  • Audit: store policyVersion on all future decisions; retain old policy clause ids for historical auditability.

E. Where LLMs are deployed (deployment pattern)

Options depending on model choice and governance:

  1. Managed cloud LLM (Azure OpenAI)

    • Pros: managed infra, lower ops, compliance contracts available.

    • Use when vendor models acceptable.

  2. Private LLM (self-hosted) deployed via Azure ML / AKS

    • Deploy model container to AKS or Azure ML managed endpoints (KServe or Azure ML Real-time endpoints).

    • Use when tighter control/privacy required (on-prem/data residency).

    • LLMOps is responsible for container images, autoscaling, GPU scheduling, rate limiting, and prompt caching.

  3. Hybrid

    • Use managed LLM for non-sensitive user interactions (templates) and private self-hosted smaller LLMs for sensitive, high-control reasoning.

Operational notes: LLM endpoints must be fronted by the LLM Gateway, which enforces prompt templates, quotas, PII masking, and logs every request/response for audit.

F. Red Team testing (what it is & why)

Red Team testing = adversarial testing of LLM/GenAI systems to surface safety and security failures:

  • Prompt injection tests (attempt to force LLM to reveal hidden data or ignore guards).

  • Data leakage tests (ensure LLM never reconstructs PII from masked input).

  • Hallucination benchmarks (give edge-case queries and measure factuality).

  • Adversarial content / jailbreak attempts (malicious queries to bypass policies).

  • Bias / fairness tests (measure differential outputs across demographic cohorts).

  • Load & failure tests (LLM under heavy load, fallback correctness).

Outcome: fix prompts, update guardrails, improve retrieval (RAG), patch model or fallback to templates. Red-team runs are a mandatory LLMOps task before production and periodically after.

G. Is AML+Sanction ML or API?

  • Hybrid pattern (practical):

    • Deterministic API checks against vendor lists (World-Check, Refinitiv, OFAC) for exact matches.

    • ML is used to score fuzzy matches, reduce false positives, detect linkages (graph ML linking aliases / shell companies) and to surface adverse media signals.

  • So AML service typically combines vendor API + ML for ranking/triage.

H. How many MLOps pipelines are required?

At minimum for this architecture:

  1. Document-AI model pipeline (document classification + field extraction + face match).

  2. Credit Risk model pipeline.

  3. Fraud Detection model pipeline.

  4. Income Stability model pipeline.

  5. AML (if ML components used) pipeline.

  6. (Optional) Behavioral/Intent model pipeline.

  7. Monitoring / retrain pipeline (shared infra for all the above) — drift detection, data pipelines, labeling.

LLMOps is separate (prompt ops, RAG ingestion, embedding lifecycle) and has its own CI/CD hygiene but is not counted as a classic MLOps "model training" pipeline — still needs versioning and QA.

I. Summary — “AI-first automation in lending”

“Our platform is event-driven and AI-first: documents land in ADLS Gen2 and a Document-AI pipeline converts unstructured proofs into trusted structured features. Those features feed multiple MLOps-deployed models — credit, fraud, income stability and AML triage — which run in parallel and push results to a rule-based Decision Engine. The Decision Engine makes a deterministic judgement (auto-approve/reject/manual review) but every recommendation is accompanied by an evidence package: model scores, SHAP explainability, and policy references. For explainability and customer UX we use LLMOps: policies and SOPs are chunked, embedded into a RAG index, and a governed LLM synthesizes evidence-backed briefs for underwriters and natural-language explanations for customers. Human decisions are stored as labels for retrain; MLOps handles model lifecycle, drift detection and governance, while LLMOps manages prompt/versioning, red-team testing and retrieval quality. PII never goes into the vector store — the LLM only sees masked context via a Context API. This design delivers faster decisions, fewer false positives, measurable NPA improvements, and transparent explanations for customers and regulators.”

J. Quick Q&A bullets

  • Q: Does LLM ever see raw PII? — No. LLM reads masked context from Context API; vector DB only holds policies/SOPs.

  • Q: What triggers retrain? — drift detection, label accumulation from human edits, or periodic cadence + evaluation.

  • Q: What to do when policy updates? — ingest policy into RAG; re-evaluate in-flight applications selectively; batch reprocess historical cases if required for compliance.

  • Q: Where are LLMs hosted? — managed (Azure OpenAI) or private (AKS/AzureML endpoint) depending on governance and latency needs.

  • Q: How many MLOps pipelines? — minimum 5–7 (document AI, credit, fraud, income, AML + shared monitoring).


🏦 FOIR = Fixed Obligations to Income Ratio

FOIR measures how much of a borrower’s monthly income is already committed to fixed obligations.

It helps banks decide:

“Can this customer afford a new loan?”

📌 FOIR Formula

FOIR=Total Monthly Fixed Obligations * 100

Net Monthly Income ​

Example

  • Net Monthly Income = ₹1,00,000

  • Existing EMI (Home loan) = ₹25,000

  • Car loan EMI = ₹10,000

  • Credit card minimum due = ₹5,000

Total Obligations = ₹40,000

FOIR=(40,000/1,00,000)×100=40%

So this borrower’s FOIR = 40%

🧠 Why FOIR Is Important

Banks use FOIR to assess repayment capacity.

Higher FOIR = higher financial stress.

📊 Typical FOIR Thresholds (India – Indicative)

Borrower Type

Acceptable FOIR

Salaried

40%–50%

Self-employed

50%–60%

High-income borrowers

Can go slightly higher

Each bank defines its own internal policy.

🏦 Where FOIR Is Used

  • Personal Loans

  • Home Loans

  • Auto Loans

  • Credit Card eligibility

  • Loan top-ups

It is part of underwriting rules engine.

🔎 FOIR vs DTI (Debt-to-Income)

In many global markets, FOIR is similar to DTI ratio.

FOIR = Indian terminology

DTI = International terminology

Conceptually same idea.

🧩 In Digital Lending Architecture

FOIR calculation usually comes from:

  • Bureau report (existing EMIs)

  • Bank statement analysis

  • Internal exposure

  • Income documents (salary slip / ITR)

It becomes an input into:

  • Credit decision engine

  • Risk scoring model

  • Rule-based approval workflow


Chunking & Embeding ??

----

🔹 Simple Difference First

Concept

What It Does

Purpose

Chunking

Splits large document into smaller pieces

Makes content searchable

Embedding

Converts text into numerical vector

Makes semantic comparison possible

So:

Chunking = Divide
Embedding = Convert

They are sequential steps, not alternatives.


🏦 Let’s Take a Sample Bank Policy Document

Imagine this credit policy:


Credit Policy v3.0

Clause 3.1 – Loan to Value (LTV)For salaried borrowers, maximum LTV shall not exceed 80%.

Clause 3.2 – Self-employed LTVFor self-employed borrowers, maximum LTV shall not exceed 70%.

Clause 4.1 – FOIR NormTotal FOIR must not exceed 50% for salaried applicants.

Clause 4.2 – Manual UnderwritingIf FOIR exceeds 50%, case must go for manual approval.

Now let’s see what chunking does.


📌 Step 1: Chunking (Splitting the Document)

The full document is too large for efficient search.

So we split it like this:

Chunk 1:

Clause 3.1 – Loan to Value (LTV)For salaried borrowers, maximum LTV shall not exceed 80%.

Chunk 2:

Clause 3.2 – Self-employed LTVFor self-employed borrowers, maximum LTV shall not exceed 70%.

Chunk 3:

Clause 4.1 – FOIR NormTotal FOIR must not exceed 50% for salaried applicants.

Chunk 4:

Clause 4.2 – Manual UnderwritingIf FOIR exceeds 50%, case must go for manual approval.

👉 Chunking simply created searchable blocks.

No math yet. No AI yet. Just structured splitting.

📌 Step 2: Embedding (Converting Each Chunk to Vector)

Now each chunk is converted into a numerical representation.

Example (simplified view):

Chunk 1 → [0.23, -0.44, 0.91, 0.11, … 1024 numbers]

Chunk 2 → [0.21, -0.48, 0.88, 0.09, … 1024 numbers]

Chunk 3 → [0.78, 0.15, -0.34, 0.65, … 1024 numbers]

These numbers represent the semantic meaning of the text.

You cannot read them — but the vector DB can compare them mathematically.

🧠 Now User Asks a Question

User query:

“What is the maximum funding for self-employed customers?”

Step 1 → Query is embedded into a vector.

Step 2 → Vector DB compares query vector with chunk vectors using cosine similarity.

Step 3 → It finds Chunk 2 is most similar.

Chunk 2 is retrieved and passed to LLM.

🔎 So What Each One Actually Does

🔹 Chunking

Without chunking:

  • Whole document becomes one vector

  • Retrieval becomes inaccurate

  • LLM receives too much irrelevant text

  • Hallucination risk increases

Chunking improves:

  • Precision

  • Clause-level accuracy

  • Citation support

🔹 Embedding

Embedding allows:

  • “LTV” and “funding cap” to match semantically

  • “Self-employed” and “business applicant” to match

  • Similar meaning to cluster in vector space

Without embedding:Search becomes keyword-based only.

🎯 Real-World Analogy

Imagine a huge law book.

Chunking = tearing it into individual sections.

Embedding = converting each section into a “meaning fingerprint”.

When someone asks a question:

You don’t read entire book.

You find the section whose fingerprint is closest to the question.

🏦 In Enterprise Knowledge Hub

Your pipeline becomes:

Document   
↓
Chunking
↓
Embedding   
↓
Vector Storage   
↓
Similarity Search   
↓
LLM Answer

⚠️ Very Important Enterprise Insight

If chunking is bad:

  • Even best embedding model will fail.

If embedding model is weak:

  • Even perfect chunking won’t retrieve correctly.

Both are equally important.

📌 Summary in One Line

Chunking prepares content structure.Embedding makes semantic math possible.


Chunking Strategy

===

Chunking is not just “splitting text.”In banking systems, chunking directly affects:

  • Recall@K

  • Citation accuracy

  • Hallucination rate

  • Regulatory defensibility

Let’s go deep properly.

🧠 1️⃣ Different Chunking Strategies

We’ll use a sample credit policy to explain.

📄 Sample Policy

Credit Policy v3.0

Clause 3.1 – Loan to ValueFor salaried borrowers, maximum LTV shall not exceed 80%.

Clause 3.2 – Self-employed LTVFor self-employed borrowers, maximum LTV shall not exceed 70%.

Clause 4.1 – FOIR NormTotal FOIR must not exceed 50%.

Clause 4.2 – Manual UnderwritingIf FOIR exceeds 50%, case must go for manual approval.

🔹 A. Fixed-Size Chunking

Split every X tokens (e.g., 500 tokens).

Example:

Chunk 1:Clause 3.1 + Clause 3.2

Chunk 2:Clause 4.1 + Clause 4.2

✅ Pros

  • Easy to implement

  • Fast

  • Works for generic corpora

❌ Cons

  • Breaks logical boundaries

  • Might merge unrelated clauses

  • Poor citation accuracy

For banking policies → not ideal.

🔹 B. Semantic / Heading-Based Chunking (Recommended)

Split based on:

  • Clause

  • Heading

  • Section markers

  • Legal numbering

So:

Chunk 1 → Clause 3.1Chunk 2 → Clause 3.2Chunk 3 → Clause 4.1Chunk 4 → Clause 4.2

✅ Pros

  • Clause-level retrieval

  • Clean citation

  • Higher precision

  • Better Recall@K for policy queries

❌ Cons

  • Requires document parsing logic

For Enterprise Knowledge Hub → This is best default.

🔹 C. Hierarchical Chunking (Advanced Enterprise Strategy)

This creates:

Level 1 → Section summaryLevel 2 → Clause chunksLevel 3 → Sub-clause details

Example structure:

Section 3 – LTV Policy→ Clause 3.1→ Clause 3.2

You embed all levels.

Why?

If query is broad:“What are LTV norms?”

Section-level chunk retrieved.

If query is specific:“What is LTV for self-employed?”

Clause-level chunk retrieved.

✅ Pros

  • Supports broad & narrow queries

  • Improves recall for mixed query types

  • Great for large policies

❌ More complex indexing

For large banks → highly recommended.

📊 2️⃣ How Chunk Size Affects Recall@K

Now let’s talk metrics.

Assume 1,000 chunks total.

🔹 Very Large Chunks (1500+ tokens)

Problem:

  • Contains too much information

  • Embedding becomes diluted

  • Query similarity reduces

  • Lower precision ranking

Recall might drop because:

Correct clause buried in large text block.

🔹 Very Small Chunks (50–100 tokens)

Problem:

  • Context lost

  • Meaning fragmented

  • Retrieval may return incomplete clause

  • LLM hallucinates missing pieces

Recall might look good… but answer quality poor.

🔹 Sweet Spot (Banking Context)

500–800 tokensOR logical clause-level segmentation

Why?

  • Enough context

  • Not too noisy

  • Clean citation

  • Stable embeddings

📉 3️⃣ How Bad Chunking Increases Hallucination

This is critical in banking.

❌ Scenario 1: Mixed Clause Chunk

Chunk contains:

  • LTV rule

  • FOIR rule

  • Exception clause

User asks:

“What is FOIR limit?”

LLM sees mixed chunk and may:

  • Blend LTV and FOIR rules

  • Misinterpret exception as general rule

  • Generate incorrect hybrid answer

This is hallucination induced by chunking.

❌ Scenario 2: Broken Clause

Clause split mid-sentence:

Chunk 1:“Total FOIR must not exceed…”

Chunk 2:“50% for salaried applicants.”

If only chunk 1 retrieved:

LLM guesses the number.

This is dangerous in lending context.

🏦 Why This Is Serious in Banking

Because wrong retrieval can lead to:

  • Wrong loan approval guidance

  • Regulatory misinterpretation

  • Credit decision support errors

  • Audit findings

In BFSI:

Bad chunking = indirect risk exposure.

🎯 Enterprise Recommendation for You

Given you’re building a Bank Knowledge Hub:

Use Hybrid Strategy:

  1. Heading-based chunking

  2. Clause-level granularity

  3. Overlap 50–100 tokens

  4. Add metadata:

    • Clause number

    • Version

    • Effective date

    • Policy type

🧠 Advanced Optimization

Test 3 chunk strategies:

Strategy

Recall@5

MRR

Latency

Fixed 500

0.87

0.65

Fast

Clause-based

0.93

0.74

Moderate

Hierarchical

0.95

0.80

Slightly slower

Pick the best balance.

Don’t assume.

Measure.

🔥 Powerful Insight


“How do you reduce hallucination in RAG?”


“We control hallucination at retrieval stage through clause-level semantic chunking and high Recall@5 benchmarking. Poor chunking is often the hidden cause of hallucination.”

Dimension size selection policy

===

How do we decide between 768, 1024, 1536 dimensions?

Let’s break this down practically — not academically.

🔹 First: What Does “Vector Dimension” Mean?

When we say:

  • 768-dim → each chunk becomes 768 numbers

  • 1024-dim → 1024 numbers

  • 1536-dim → 1536 numbers

These numbers represent semantic meaning in high-dimensional space.

Higher dimension = more capacity to encode nuance.But also:

  • More storage

  • More memory

  • Slower search

  • Higher infra cost

So it's a trade-off.

🔎 Key Principle

You DO NOT choose dimension directly.

You choose the embedding model.

The model decides the dimension.

But you evaluate whether that dimension is appropriate for your system.

🧠 How to Decide? (Enterprise Method)

There are 4 decision axes:

1️⃣ Retrieval Accuracy (Primary Factor)

Run benchmarking:

Model

Dim

Recall@5

Model A

768

0.88

Model B

1024

0.93

Model C

1536

0.94

If:

1024 gives 93%1536 gives 94%

Is +1% worth:

  • 50% more storage?

  • Higher RAM?

  • More latency?

Sometimes yes. Sometimes no.

In banking, usually:

If recall improves meaningfully (>2–3%), higher dimension justified.

2️⃣ Corpus Size (Very Important)

Let’s calculate storage impact.

Storage formula:

Storage = Number_of_chunks × Dimension × 4 bytes

(Each float = 4 bytes)

Example

Suppose:

  • 1 million policy chunks

768 dim:

1,000,000 × 768 × 4 bytes≈ 3.07 GB

1536 dim:

1,000,000 × 1536 × 4≈ 6.14 GB

Double storage.

Now multiply across:

  • Backup

  • DR

  • Replication

  • Index overhead

Cost increases fast.

3️⃣ Latency & Vector DB Performance

Higher dimensions:

  • Slower similarity search

  • More RAM usage

  • Larger index size

If SLA is:

  • < 200ms retrieval

  • High concurrency (1000+ users)

768 or 1024 often sufficient.

1536 used when:

  • Deep semantic nuance required

  • Large knowledge base

  • Complex analytical retrieval

4️⃣ Use Case Complexity

Use Case

Recommended Dim Range

Simple FAQ / SOP lookup

384–768

Policy clause retrieval

768–1024

Multi-document reasoning support

1024–1536

Legal / research corpus

1536+

For your banking Knowledge Hub:

👉 768–1024 is typically optimal.

🏦 Enterprise Decision Framework

Here’s how I would document it:

Step 1: Benchmark 2–3 models

Example:

  • 768-dim model

  • 1024-dim model

  • 1536-dim model

Step 2: Measure

  • Recall@5

  • MRR

  • Retrieval latency

  • Memory usage

Step 3: Apply Decision Rule

Choose lowest dimension model that:

  • Meets Recall threshold (≥ 90%)

  • Meets SLA

  • Meets storage budget

Do NOT choose highest dimension by default.

🎯 Practical Recommendation for Banking Knowledge Hub

For:

  • Credit policies

  • SOP

  • Lending rules

  • Regulatory circulars

1024-dim is usually a sweet spot.

Why?

  • Good semantic richness

  • Manageable storage

  • Good recall performance

  • Stable for enterprise RAG

1536 is useful when:

  • Very large heterogeneous corpus

  • Cross-policy reasoning heavy

  • Legal/regulatory nuanced queries

⚠️ Important Misconception

Higher dimension ≠ better model.

A well-trained 768-dim model can outperform a weak 1536-dim model.

Quality of training > dimension size.

🧠 Architectural Insight

Dimension impacts:

  • Vector DB design

  • Memory sizing

  • Infra budget

  • DR replication size

  • Re-embedding cost

So dimension is an infra decision as much as AI decision.

🏦

“We selected a 1024-dimensional embedding model because it met 92% Recall@5 while maintaining optimal storage and sub-200ms latency. Higher dimension models showed marginal improvement but doubled infrastructure footprint.”


 
 
 

Recent Posts

See All
RFP PRE/POST-PROPOSAL SUBMISSION FLOW

🏆 1. The 5 Pillars to Win a Large Strategic Deal 1. Understand the Client Better Than They Do 👉 Don’t just read RFP — decode it What is their real problem ? What is driving this deal? (compliance, c

 
 
 
DIGITAL LENDING RFP Solution

🎯 RFP Proposal SOLUTION PRESENTATION – DIGITAL LENDING (WITH COLOR-CODED ARCHITECTURE) 1️⃣ Opening “Thank you for the opportunity. I’ll walk you through our approach to building a next-generation dig

 
 
 

Comments

Rated 0 out of 5 stars.
No ratings yet

Add a rating
  • Facebook
  • Twitter
  • LinkedIn

©2024 by AeeroTech. Proudly created with Wix.com

bottom of page