Model Tiering: AI Cost Economics
- Anand Nerurkar
- Dec 18, 2025
- 6 min read
Updated: Mar 3
🧠 What is Model Tiering in GenAI?
Model tiering is an architectural strategy where multiple AI models of different sizes, costs, and capabilities are used together, and each request is routed to the most cost-effective model that can meet the requirement.
Not every query needs the most powerful (and expensive) model.
🎯 Why Model Tiering is Critical (Especially in BFSI)
Without tiering:
Every request hits a large LLM
Costs explode
Latency increases
Risk surface grows
With tiering:
60–75% of traffic handled by small models
Large models used only for complex cases
Predictable cost + better SLA
🏗️ Typical Model Tiers (Enterprise Reality)
| Tier | Model Type | Usage |
| --- | --- | --- |
| Tier-0 | Rules / retrieval / templates | FAQs, static answers |
| Tier-1 | Small / distilled LLMs | Summarization, classification |
| Tier-2 | Medium LLMs | RAG, reasoning, analysis |
| Tier-3 | Large / premium LLMs | Complex reasoning, edge cases |
🔀 How Do You Decide Which Tier to Use?
You decide based on 4 dimensions:
1️⃣ Task Complexity
| Task | Tier |
| --- | --- |
| Keyword lookup / FAQ | Tier-0 |
| Simple summarization | Tier-1 |
| Policy Q&A (RAG) | Tier-2 |
| Multi-step reasoning | Tier-3 |
2️⃣ Risk & Compliance Sensitivity
| Risk Level | Tier |
| --- | --- |
| Low (internal ops) | Tier-1 / Tier-2 |
| Medium (customer-facing) | Tier-2 |
| High (credit, compliance) | Tier-2 + human |
| Critical decisions | Human only |
In BFSI, GenAI supports decisions — it does not make them.
3️⃣ Latency & SLA
| SLA | Tier |
| --- | --- |
| <300 ms | Tier-0 / Tier-1 |
| <800 ms | Tier-2 |
| Async allowed | Tier-3 |
4️⃣ Cost Envelope
| Cost Target | Tier |
| --- | --- |
| <₹1 per inference | Tier-1 |
| ₹1–₹3 | Tier-2 |
| ₹5+ | Tier-3 |
🧭 Routing Logic (Enterprise Pattern)
Request →
Complexity Check →
Risk Classification →
SLA Requirement →
Budget Check →
Model Tier Selection →
Fallback / Escalation
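The routing pipeline above can be sketched as a single selection function. This is a minimal illustration, not a production router; the complexity scores, risk labels, and per-tier cost figures are assumptions taken loosely from the tables in this post.

```python
# Illustrative tier router: complexity, risk, SLA, and budget gates.
# Thresholds and labels are assumptions for the sketch, not production values.

def select_tier(complexity: int, risk: str, sla_ms: int, budget_inr: float) -> str:
    """Pick the cheapest tier that satisfies every constraint."""
    # Critical decisions never reach a model at all.
    if risk == "critical":
        return "human"
    # Start from the tier implied by task complexity (0..3).
    tier = min(complexity, 3)
    # High-risk flows cap out at Tier-2 with a human in the loop.
    if risk == "high":
        tier = min(tier, 2)
    # Tight SLAs rule out the larger, slower tiers.
    if sla_ms < 300:
        tier = min(tier, 1)
    elif sla_ms < 800:
        tier = min(tier, 2)
    # Budget gate: fall back a tier if the cost envelope is exceeded.
    tier_cost = {0: 0.1, 1: 1.0, 2: 3.0, 3: 5.0}  # assumed ₹ per inference
    while tier > 0 and tier_cost[tier] > budget_inr:
        tier -= 1
    return f"Tier-{tier}" + (" + human review" if risk == "high" else "")

print(select_tier(complexity=3, risk="low", sla_ms=5000, budget_inr=10))  # Tier-3
print(select_tier(complexity=3, risk="high", sla_ms=700, budget_inr=3))   # Tier-2 + human review
print(select_tier(complexity=1, risk="low", sla_ms=200, budget_inr=1))    # Tier-1
```

Note the ordering: risk is checked before cost, so a budget constraint can downgrade the tier but never bypass a human-review requirement.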
📊 Realistic Banking Distribution (What Sounds Real)
| Tier | Traffic % |
| --- | --- |
| Tier-0 | 10–15% |
| Tier-1 | 45–55% |
| Tier-2 | 25–30% |
| Tier-3 | 5–10% |
If someone says “most traffic goes to GPT-4”, they haven’t scaled GenAI.
💰 Impact of Model Tiering (Real Numbers)
| Metric | Before | After |
| --- | --- | --- |
| Cost / inference | ₹3.8 | ₹1.9 |
| Monthly AI spend | ₹5 Cr | ₹2.8 Cr |
| P95 latency | 900 ms | 480 ms |
| SLA breaches | Frequent | Rare |
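The after-tiering cost is just a weighted average over the traffic distribution. As a sanity check, here is the arithmetic with assumed per-tier costs (mid-points of the cost envelope; Tier-3 taken as ₹8 since the envelope is open-ended at ₹5+):

```python
# Blended cost per inference = sum(traffic share x per-tier cost).
# Per-tier costs are illustrative assumptions, not measured figures.
traffic = {"Tier-0": 0.125, "Tier-1": 0.50, "Tier-2": 0.275, "Tier-3": 0.075}
cost    = {"Tier-0": 0.0,   "Tier-1": 1.0,  "Tier-2": 3.0,   "Tier-3": 8.0}

blended = sum(traffic[t] * cost[t] for t in traffic)
print(f"Blended cost: ₹{blended:.2f} per inference")  # close to the ₹1.9 figure above
```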
🎤 Summary
“Model tiering is an architectural approach where we route requests to different AI models based on complexity, risk, SLA, and cost. Simple tasks go to small models or even rules, while only complex, high-value cases reach large LLMs. In production, 60–70% of our traffic was handled by Tier-1 models, 25–30% by Tier-2, and less than 10% by large models. This reduced cost per inference by ~40% while improving latency and maintaining compliance.”
🏦 1️⃣ Embedding Model Selection Policy
This governs how you choose the model used for semantic retrieval (RAG, search, clustering).
🔹 Policy Objective
Ensure high-recall, deterministic, and compliant semantic retrieval of enterprise documents without autonomous decision-making.
🔹 A. Functional Selection Criteria
1️⃣ Retrieval Accuracy (Primary Criterion)
Must benchmark on internal gold dataset.
Minimum thresholds:
Recall@5 ≥ 90%
Recall@10 ≥ 95%
MRR ≥ 0.70
If below threshold → model rejected.
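Recall@K and MRR are straightforward to compute over an internal gold dataset. A minimal sketch, assuming each query has ranked chunk IDs from the retriever and exactly one ground-truth chunk:

```python
# Recall@K: fraction of queries whose gold chunk appears in the top-K results.
# MRR: mean of 1/rank of the gold chunk (0 when it is not retrieved at all).

def recall_at_k(results: dict, gold: dict, k: int) -> float:
    hits = sum(1 for q, chunks in results.items() if gold[q] in chunks[:k])
    return hits / len(gold)

def mrr(results: dict, gold: dict) -> float:
    total = 0.0
    for q, chunks in results.items():
        if gold[q] in chunks:
            total += 1.0 / (chunks.index(gold[q]) + 1)
    return total / len(gold)

# Toy benchmark: ranked chunk IDs per query, one gold chunk each.
results = {"q1": ["c7", "c2", "c9"], "q2": ["c4", "c1"], "q3": ["c5", "c8", "c3"]}
gold    = {"q1": "c2", "q2": "c9", "q3": "c5"}

print(recall_at_k(results, gold, k=5))  # 2 of 3 gold chunks retrieved in top-5
print(mrr(results, gold))               # (1/2 + 0 + 1) / 3 = 0.5
```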
2️⃣ Domain Adaptability
Model must:
Handle financial terminology
Recognize synonyms (LTV vs Funding cap)
Work with regulatory language
Support multilingual if required (e.g., English + Hindi)
3️⃣ Chunk Compatibility
Model must:
Perform well with 500–800 token chunks
Preserve semantic similarity for clause-level retrieval
Support heading-based segmentation
🔹 B. Technical Criteria
4️⃣ Deterministic Output
Embedding must be stable:
Same text → same vector.
No randomness allowed.
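Determinism can be verified with a simple repeat-and-compare check in CI. In this sketch, `embed` is a placeholder for whatever embedding client is actually in use; the fake embedder at the bottom exists only to make the example runnable:

```python
import hashlib

def embedding_fingerprint(vector: list[float]) -> str:
    """Stable fingerprint of an embedding, for determinism checks."""
    raw = ",".join(f"{x:.8f}" for x in vector).encode()
    return hashlib.sha256(raw).hexdigest()

def assert_deterministic(embed, text: str, runs: int = 3) -> None:
    """Fail if repeated calls on the same text yield different vectors."""
    fingerprints = {embedding_fingerprint(embed(text)) for _ in range(runs)}
    if len(fingerprints) != 1:
        raise AssertionError(f"Non-deterministic embedding for: {text!r}")

# Stand-in embedder (trivially deterministic) just to exercise the check.
fake_embed = lambda t: [float(len(t)), float(sum(map(ord, t)))]
assert_deterministic(fake_embed, "LTV cap for home loans")
```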
5️⃣ Deployment Compatibility
Depending on data classification:
| Classification | Deployment Rule |
| --- | --- |
| Public | Cloud allowed |
| Internal | VPC only |
| Confidential | On-prem only |
6️⃣ Vector Dimension Efficiency
Evaluate:
Dimensional size (e.g., 768 vs 1024 vs 1536)
Storage impact
Retrieval latency
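The storage side of this trade-off is simple arithmetic: float32 vectors cost 4 bytes per dimension per chunk. A quick sizing sketch (the 5M-chunk corpus is an assumed figure for illustration):

```python
# Storage for float32 vectors: dimensions x 4 bytes x number of chunks.
def index_size_gb(dims: int, num_chunks: int, bytes_per_value: int = 4) -> float:
    return dims * bytes_per_value * num_chunks / 1e9

for dims in (768, 1024, 1536):
    print(f"{dims} dims, 5M chunks: {index_size_gb(dims, 5_000_000):.1f} GB")
```

Doubling dimensions doubles both storage and the per-query distance computation, so the larger vector is only worth it if it measurably lifts Recall@K.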
🔹 C. Risk & Governance Criteria
7️⃣ No Training on Bank Data
Model must:
Not retain enterprise data
Not fine-tune externally unless approved
8️⃣ Version Locking
Model version must be fixed
Re-embedding required upon upgrade
Change requires governance approval
🔹 D. Approved Embedding Model Categories
| Category | Example Use |
| --- | --- |
| Open-source (on-prem) | BGE-M3, e5-large |
| Managed enterprise SaaS | Azure text-embedding-3-large |
| Lightweight edge | all-mpnet-base-v2 |

Final selection must follow the benchmarking report.
🤖 2️⃣ SLM / LLM Model Selection Policy
This governs the generation layer.
🔹 Policy Objective
Ensure safe, explainable, and controlled generation aligned to enterprise and regulatory standards.
🔹 A. Use-Case Based Model Class
1️⃣ Retrieval-Augmented Answering (Policy Q&A)
Preferred:
SLM (Phi-3, Llama 3 8B, Mistral 7B)
Why?
Lower hallucination risk
Faster inference
Controlled cost
2️⃣ Complex Reasoning / Analysis
Use:
Larger LLM (GPT-4 class or Llama 3 70B)
Use only when:
Multi-step reasoning needed
Cross-policy comparison required
Summarization across documents
3️⃣ Drafting / Communication
Use:
Larger LLM for drafting emails, summaries
Not used for decision support.
🔹 B. Hallucination Risk Policy
Model must:
Operate in RAG mode
Never answer outside retrieved context
Return “Not found in policy” if no match
Similarity threshold must be enforced.
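The hallucination-control rules above reduce to a guard in front of the generator. A minimal sketch, assuming the retriever returns `(chunk, similarity)` pairs; the 0.75 threshold and prompt wording are illustrative:

```python
# RAG guardrail: answer only from retrieved context above a similarity floor.
SIMILARITY_THRESHOLD = 0.75  # assumed value; tune against the gold dataset

def answer_from_policy(query: str, retrieved: list[tuple[str, float]], generate) -> str:
    """retrieved: (chunk_text, similarity) pairs, highest similarity first."""
    context = [chunk for chunk, score in retrieved if score >= SIMILARITY_THRESHOLD]
    if not context:
        # Short-circuit: the model is never called without grounded context.
        return "Not found in policy"
    prompt = ("Answer ONLY from the context below. If the answer is not "
              "present, say 'Not found in policy'.\n\n"
              + "\n---\n".join(context)
              + f"\n\nQuestion: {query}")
    return generate(prompt)

# Below-threshold retrieval never reaches the model.
print(answer_from_policy("What is the LTV cap?",
                         [("irrelevant text", 0.42)],
                         generate=lambda p: "should never run"))
```

The important property is that the refusal path is enforced in code, before generation, rather than relying on the prompt alone.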
🔹 C. Data Residency & Privacy
If input contains:
Customer data
Financial details
PII
Then:
Only on-prem or VPC model allowed
No public API usage
🔹 D. Explainability Requirement
Generation model must:
Provide clause citation
Provide source metadata
Log prompt + response
Black-box autonomous generation not allowed.
🔹 E. Latency & Cost Governance
Define acceptable SLA:
Policy Q&A → < 2 sec
Internal agent support → < 3 sec
Choose an SLM when the SLA is critical.
📊 3️⃣ Model Selection Based on Use Case
Here is the practical matrix you can put in an architecture document:
🏦 Enterprise Knowledge Hub
| Use Case | Embedding Model | SLM / LLM | Reason |
| --- | --- | --- | --- |
| Credit policy lookup | High recall (BGE-M3) | SLM (Llama 3 8B) | RAG, deterministic |
| SOP search | Medium model OK | No LLM (semantic search only) | No generation needed |
| Clause comparison | High recall | Larger LLM | Multi-doc reasoning |
| Regulatory circular diff | High recall | Larger LLM | Analytical summarization |
| Internal chatbot | Balanced | SLM | Cost + control |
| Decision automation | High recall | Rule engine + SLM assist | Avoid autonomous AI |
🧠 Strategic Enterprise Principle
Embedding model = Retrieval accuracy
SLM/LLM = Language reasoning
Never couple their selection.
Evaluate independently.
🛡️ Governance Rule (Very Important)
Before production approval:
Embedding benchmark report attached
Generation hallucination test report attached
Adversarial testing performed
Risk classification documented
Monitoring plan approved
Only then is the model production-ready.
🎯
“We maintain separate selection policies for embedding and generation layers. Embeddings are chosen based on Recall@K and deterministic behavior. SLM/LLM selection is based on reasoning complexity, hallucination risk, and data residency requirements.”
🏦 ENTERPRISE AI GOVERNANCE POLICY
(For Knowledge Hub – Policy & SOP Retrieval Platform)
1️⃣ Purpose
This policy defines governance standards for the selection, validation, deployment, and monitoring of AI models (Embedding Models and SLM/LLM) used in the Bank’s Enterprise Knowledge Hub platform.
The objective is to ensure:
Regulatory compliance
Controlled hallucination risk
Explainability and auditability
Data privacy protection
Measurable performance
2️⃣ Scope
This policy applies to:
Embedding models used for semantic retrieval
Small Language Models (SLMs)
Large Language Models (LLMs)
Vector databases
Retrieval-Augmented Generation (RAG) systems
Internal AI-powered assistants accessing policy documents
This policy does NOT permit autonomous credit decisioning.
3️⃣ Model Classification Framework
| Model Type | Function | Risk Category |
| --- | --- | --- |
| Embedding Model | Semantic retrieval | Low–Moderate |
| SLM (≤ 8B params) | Context-bound generation | Moderate |
| Large LLM (> 30B) | Advanced reasoning | Moderate–High |
| Autonomous Agentic AI | Decision support | High (Restricted) |
4️⃣ Embedding Model Selection Policy
4.1 Functional Requirements
Embedding models must:
Achieve Recall@5 ≥ 90%
Achieve Recall@10 ≥ 95%
Achieve MRR ≥ 0.70
Support financial terminology
Preserve clause-level semantics
4.2 Technical Requirements
Deterministic output (same input → same vector)
Document chunk compatibility (500–800 tokens)
Support metadata indexing
Scalable vector dimension management
Deployment compliant with data classification
4.3 Data Residency Rules
| Data Sensitivity | Deployment Rule |
| --- | --- |
| Public | Cloud allowed |
| Internal | Private VPC |
| Confidential / Regulatory | On-Prem only |
4.4 Model Change Management
Model version must be locked in registry
Re-embedding required upon model upgrade
Change approval required from:
Enterprise Architecture
Information Security
Model Risk Team
5️⃣ SLM / LLM Selection Policy
5.1 Use-Case-Based Model Allocation
A. Policy Q&A (RAG Only)
Approved:
Small Language Models (≤ 8B parameters)
Reason:
Reduced hallucination risk
Lower cost
Faster response
Better operational control
B. Complex Policy Analysis
Approved:
Larger LLM under controlled environment
Requires:
Explicit approval
Additional hallucination testing
Legal/compliance review
5.2 Hallucination Control Requirements
Generation models must:
Operate only with retrieved context
Return “Not available in policy” when no match
Enforce similarity threshold before answering
Provide clause citation in response
5.3 Prohibited Use Cases
Without Board-level approval:
Autonomous credit decisions
Risk scoring
Underwriting replacement
Regulatory interpretation without citation
6️⃣ Validation & Benchmarking Framework
6.1 Gold Dataset Requirement
Minimum:
200 domain queries
Verified ground-truth clauses
Coverage across policy categories
6.2 Mandatory Metrics
Recall@5
Recall@10
Mean Reciprocal Rank (MRR)
Retrieval latency
Benchmark results must be documented before production.
6.3 CI/CD Integration
Automated evaluation during deployment
Threshold validation before release
Drift detection monitoring monthly
7️⃣ Monitoring & Ongoing Oversight
7.1 Production Monitoring
Log:
Query
Retrieved chunks
Similarity score
Model version
Response
7.2 Drift Monitoring
Monthly:
Sample 50–100 queries
Manual validation
Recalculate Recall@5
If performance drops >5% → investigation required.
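The drift gate itself is a one-line comparison against the approved baseline; the subtlety is agreeing on what "5%" means. This sketch treats it as percentage points of Recall@5, which is an assumption:

```python
# Monthly drift check: recompute Recall@5 on the sampled queries and flag
# drops of more than 5 percentage points against the approved baseline.
def drift_alert(baseline_recall: float, current_recall: float,
                tolerance: float = 0.05) -> bool:
    """True when retrieval quality degraded beyond the allowed tolerance."""
    return (baseline_recall - current_recall) > tolerance

print(drift_alert(0.92, 0.90))  # 2-point dip, within tolerance -> False
print(drift_alert(0.92, 0.84))  # 8-point drop -> True, investigate
```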
8️⃣ Explainability & Auditability
Every AI response must include:
Policy Name
Clause Number
Version
Effective Date
All interactions retained for minimum regulatory retention period.
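The citation fields and retention requirement suggest a fixed audit-record schema attached to every response. A sketch of such a record; the field names and values are illustrative assumptions, not a mandated format:

```python
import datetime
import json

# Illustrative audit record mirroring the required citation metadata above.
response_record = {
    "answer": "<generated answer text>",
    "citation": {
        "policy_name": "Retail Credit Policy",   # example value
        "clause_number": "4.2.1",                # example value
        "version": "v12",                        # example value
        "effective_date": "2025-04-01",          # example value
    },
    "model_version": "slm-8b-2025-03",           # locked registry version
    "logged_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
}
print(json.dumps(response_record, ensure_ascii=False, indent=2))
```

Persisting this record per interaction gives audit both the provenance (which clause, which version) and the reproducibility anchor (which model version) in one place.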
9️⃣ Risk Assessment & Controls
| Risk | Mitigation |
| --- | --- |
| Retrieval miss | High Recall threshold |
| Hallucination | RAG enforcement |
| Model drift | Scheduled validation |
| Version conflict | Immutable policy storage |
| Data leakage | On-prem deployment controls |
🔟 Governance Structure
Oversight Committee:
CIO (Chair)
CRO
Head of Compliance
Enterprise Architect
Model Risk Officer
Information Security Lead
Approval required before:
New model introduction
Major version upgrade
Use-case expansion
🏛️ Alignment with RBI Expectations
This framework aligns with:
Model Risk Governance principles
Explainability requirements
Audit trail requirements
Data localization norms
Controlled AI adoption guidelines
Key design principle:
AI system provides assisted intelligence, not autonomous decision-making.