Model Tiering: AI Cost Economics
- Anand Nerurkar
- 2 hours ago
🧠 What is Model Tiering in GenAI?
Model tiering is an architectural strategy where multiple AI models of different sizes, costs, and capabilities are used together, and each request is routed to the most cost-effective model that can meet the requirement.
Not every query needs the most powerful (and expensive) model.
🎯 Why Model Tiering is Critical (Especially in BFSI)
Without tiering:
Every request hits a large LLM
Costs explode
Latency increases
Risk surface grows
With tiering:
55–70% of traffic handled by rules and small models (Tier-0 and Tier-1)
Large models used only for complex cases
Predictable cost + better SLA
🏗️ Typical Model Tiers (Enterprise Reality)
Tier | Model Type | Usage |
Tier-0 | Rules / retrieval / templates | FAQs, static answers |
Tier-1 | Small / distilled LLMs | Summarization, classification |
Tier-2 | Medium LLMs | RAG, reasoning, analysis |
Tier-3 | Large / premium LLMs | Complex reasoning, edge cases |
🔀 How Do You Decide Which Tier to Use?
The tier is chosen along four dimensions:
1️⃣ Task Complexity
Task | Tier |
Keyword lookup / FAQ | Tier-0 |
Simple summarization | Tier-1 |
Policy Q&A (RAG) | Tier-2 |
Multi-step reasoning | Tier-3 |
2️⃣ Risk & Compliance Sensitivity
Risk Level | Tier |
Low (internal ops) | Tier-1 / Tier-2 |
Medium (customer-facing) | Tier-2 |
High (credit, compliance) | Tier-2 + human |
Critical decisions | Human only |
In BFSI, GenAI supports decisions — it does not make them.
3️⃣ Latency & SLA
SLA | Tier |
<300 ms | Tier-0 / Tier-1 |
<800 ms | Tier-2 |
Async allowed | Tier-3 |
4️⃣ Cost Envelope
Cost Target | Tier |
<₹1 per inference | Tier-1 |
₹1–₹3 | Tier-2 |
₹5+ | Tier-3 |
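The four tables above can be combined in more than one way, and the article does not prescribe a rule. The sketch below shows one possible reading, under these assumptions: task complexity and risk set the lowest acceptable tier, SLA and budget set the highest allowed tier, and the cheapest tier inside that window wins. The names `select_tier`, `Decision`, the string labels, and the handling of the unlisted ₹3–₹5 band are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Lowest tier able to handle each task type (Task Complexity table).
TASK_MIN_TIER = {
    "faq": 0,
    "summarization": 1,
    "policy_qa": 2,
    "multi_step_reasoning": 3,
}

# Assumed reading of the risk table: medium and high risk need at least Tier-2.
RISK_MIN_TIER = {"low": 0, "medium": 2, "high": 2}

# Highest tier that fits each SLA class (Latency & SLA table).
SLA_MAX_TIER = {"under_300ms": 1, "under_800ms": 2, "async": 3}


def max_tier_for_budget(budget_inr: float) -> int:
    """Highest tier affordable at a per-inference budget in rupees."""
    if budget_inr < 1:
        return 1
    if budget_inr < 5:
        return 2  # assumption: the unlisted Rs 3-5 band is capped at Tier-2
    return 3


@dataclass
class Decision:
    tier: Optional[int]  # None means no model is used
    human_review: bool
    reason: str


def select_tier(task: str, risk: str, sla: str, budget_inr: float) -> Decision:
    """Pick the cheapest tier that satisfies all four dimensions."""
    if risk == "critical":
        return Decision(None, True, "critical decision: human only")
    floor = max(TASK_MIN_TIER[task], RISK_MIN_TIER[risk])
    ceiling = min(SLA_MAX_TIER[sla], max_tier_for_budget(budget_inr))
    if floor > ceiling:
        # The dimensions conflict; surface it instead of silently downgrading.
        return Decision(None, True, f"needs Tier-{floor}, cap is Tier-{ceiling}")
    return Decision(floor, risk == "high", f"cheapest eligible tier is Tier-{floor}")


print(select_tier("policy_qa", "high", "under_800ms", 2.0))
```

Returning the floor rather than the ceiling reflects the idea that each request should go to the most cost-effective model that can meet the requirement. A conflict (for example, multi-step reasoning under a 300 ms SLA) is reported rather than resolved, since how to trade quality against latency is a business decision.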
🧭 Routing Logic (Enterprise Pattern)
Request →
Complexity Check →
Risk Classification →
SLA Requirement →
Budget Check →
Model Tier Selection →
Fallback / Escalation
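The last step of the flow above, fallback and escalation, could look roughly like the following. This is a minimal sketch, assuming each tier exposes a handler that returns an answer plus a confidence score, and that a low score triggers escalation to the next tier. The confidence threshold, the `Handler` signature, and `route_with_escalation` are hypothetical; the article does not say what signal drives escalation.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical handler: takes the request text, returns (answer, confidence in [0, 1]).
Handler = Callable[[str], Tuple[str, float]]


def route_with_escalation(
    request: str,
    start_tier: int,
    handlers: Dict[int, Handler],
    max_tier: int = 3,
    min_confidence: float = 0.7,  # assumed threshold, tune per use case
) -> Tuple[Optional[int], Optional[str]]:
    """Try start_tier first, then move up one tier at a time while confidence is low."""
    for tier in range(start_tier, max_tier + 1):
        handler = handlers.get(tier)
        if handler is None:
            continue  # this tier is not deployed; try the next one
        answer, confidence = handler(request)
        if confidence >= min_confidence:
            return tier, answer
    # No tier was confident enough: hand the request to a human.
    return None, None


# Stub handlers standing in for real model calls.
stub_handlers: Dict[int, Handler] = {
    1: lambda text: ("short answer", 0.4),
    2: lambda text: ("detailed answer", 0.9),
}

print(route_with_escalation("What is the loan prepayment policy?", 1, stub_handlers))
```

Passing `max_tier` lets the SLA or budget cap stay in force during escalation, so a low-confidence answer on a latency-bound request goes to a human instead of a slower model.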
📊 Realistic Banking Distribution (What Sounds Real)
Tier | Traffic % |
Tier-0 | 10–15% |
Tier-1 | 45–55% |
Tier-2 | 25–30% |
Tier-3 | 5–10% |
If someone says “most traffic goes to GPT-4”, they haven’t scaled GenAI.
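To see why the traffic mix matters for cost, the blended cost per inference can be computed as a traffic-weighted average of per-tier costs. The sketch below uses illustrative numbers only: the shares are picked from inside the ranges in the table so that they sum to 100%, and the per-tier costs are assumed points consistent with the cost envelope (the Tier-0 cost in particular is a guess, since the article gives none).

```python
from typing import Dict


def blended_cost(shares: Dict[int, float], cost_inr: Dict[int, float]) -> float:
    """Traffic-weighted average cost per inference, in rupees."""
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"traffic shares must sum to 1.0, got {total}")
    return sum(share * cost_inr[tier] for tier, share in shares.items())


# Assumed shares, chosen from within the ranges in the distribution table.
shares = {0: 0.15, 1: 0.50, 2: 0.27, 3: 0.08}

# Assumed per-tier costs in rupees, consistent with the cost envelope.
cost_inr = {0: 0.1, 1: 0.8, 2: 2.0, 3: 5.0}

tiered = blended_cost(shares, cost_inr)
all_premium = blended_cost({3: 1.0}, cost_inr)

print(f"tiered: Rs {tiered:.2f} per inference")
print(f"all Tier-3: Rs {all_premium:.2f} per inference")
```

Under these assumptions, Tier-3 accounts for a large part of the blended cost despite carrying only 8% of traffic, which is why the share reaching the premium tier is the number to watch.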
💰 Impact of Model Tiering (Real Numbers)
Metric | Before | After |
Cost / inference | ₹3.8 | ₹1.9 |
Monthly AI spend | ₹5 Cr | ₹2.8 Cr |
P95 latency | 900 ms | 480 ms |
SLA breaches | Frequent | Rare |
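The relative improvements implied by the table follow from simple arithmetic. The helper name `pct_reduction` is just for illustration.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from before to after, rounded to one decimal."""
    return round((before - after) / before * 100, 1)


print(pct_reduction(3.8, 1.9))   # cost per inference (Rs): 50.0
print(pct_reduction(5.0, 2.8))   # monthly AI spend (Rs Cr): 44.0
print(pct_reduction(900, 480))   # P95 latency (ms): 46.7
```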
🎤 Summary
“Model tiering is an architectural approach where we route requests to different AI models based on complexity, risk, SLA, and cost. Simple tasks go to small models or even rules, while only complex, high-value cases reach large LLMs. In production, 55–70% of our traffic was handled by Tier-0 rules and Tier-1 models, 25–30% by Tier-2, and no more than 10% by large models. This cut cost per inference by about 50% (₹3.8 to ₹1.9) while improving latency and maintaining compliance.”