Model Tiering: AI Cost Economics

  • Writer: Anand Nerurkar
  • 2 hours ago

🧠 What is Model Tiering in GenAI?

Model tiering is an architectural strategy where multiple AI models of different sizes, costs, and capabilities are used together, and each request is routed to the most cost-effective model that can meet the requirement.

Not every query needs the most powerful (and expensive) model.

🎯 Why Model Tiering is Critical (Especially in BFSI)

Without tiering:

  • Every request hits a large LLM

  • Costs explode

  • Latency increases

  • Risk surface grows

With tiering:

  • 60–75% of traffic handled by small models

  • Large models used only for complex cases

  • Predictable cost + better SLA

🏗️ Typical Model Tiers (Enterprise Reality)

| Tier   | Model Type                   | Usage                         |
|--------|------------------------------|-------------------------------|
| Tier-0 | Rules / retrieval / templates | FAQs, static answers          |
| Tier-1 | Small / distilled LLMs       | Summarization, classification |
| Tier-2 | Medium LLMs                  | RAG, reasoning, analysis      |
| Tier-3 | Large / premium LLMs         | Complex reasoning, edge cases |
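
One way to make the tier list usable by a router is to hold it as data. The sketch below is only an illustration: the names `Tier`, `TierSpec`, `TIER_CATALOG`, and `cheapest_tier_for` are hypothetical, and the exact-match lookup on usage labels is an assumption made to keep the example small.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class Tier(IntEnum):
    """Tiers ordered from cheapest (0) to most expensive (3)."""
    TIER_0 = 0
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3


@dataclass(frozen=True)
class TierSpec:
    model_type: str
    usage: Tuple[str, ...]  # lower-case usage labels taken from the tier list


# Hypothetical catalog mirroring the tier list above.
TIER_CATALOG = {
    Tier.TIER_0: TierSpec("rules / retrieval / templates", ("faq", "static answer")),
    Tier.TIER_1: TierSpec("small / distilled LLM", ("summarization", "classification")),
    Tier.TIER_2: TierSpec("medium LLM", ("rag", "reasoning", "analysis")),
    Tier.TIER_3: TierSpec("large / premium LLM", ("complex reasoning", "edge case")),
}


def cheapest_tier_for(usage: str) -> Optional[Tier]:
    """Return the lowest tier whose usage list contains the label, else None."""
    wanted = usage.strip().lower()
    for tier in sorted(TIER_CATALOG):
        if wanted in TIER_CATALOG[tier].usage:
            return tier
    return None
```

For example, `cheapest_tier_for("Classification")` resolves to `Tier.TIER_1`, while an unknown label returns `None` so the caller can decide how to escalate.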

🔀 How Do You Decide Which Tier to Use?

You decide based on 4 dimensions:

1️⃣ Task Complexity

| Task                 | Tier   |
|----------------------|--------|
| Keyword lookup / FAQ | Tier-0 |
| Simple summarization | Tier-1 |
| Policy Q&A (RAG)     | Tier-2 |
| Multi-step reasoning | Tier-3 |

2️⃣ Risk & Compliance Sensitivity

| Risk Level               | Tier            |
|--------------------------|-----------------|
| Low (internal ops)       | Tier-1 / Tier-2 |
| Medium (customer-facing) | Tier-2          |
| High (credit, compliance) | Tier-2 + human  |
| Critical decisions       | Human only      |

In BFSI, GenAI supports decisions — it does not make them.
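
The risk rules can be written as a small policy lookup. This is a sketch under assumptions: `RiskPolicy`, `RISK_POLICY`, and `policy_for` are hypothetical names, and treating an unknown risk label as critical (fail closed) is a choice made for this example, not something the rules above prescribe.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RiskPolicy:
    allowed_tiers: Tuple[int, ...]  # empty tuple means no model may answer
    human_review: bool              # True when a person must be in the loop


# Hypothetical encoding of the risk levels listed above.
RISK_POLICY = {
    "low": RiskPolicy(allowed_tiers=(1, 2), human_review=False),
    "medium": RiskPolicy(allowed_tiers=(2,), human_review=False),
    "high": RiskPolicy(allowed_tiers=(2,), human_review=True),
    "critical": RiskPolicy(allowed_tiers=(), human_review=True),
}


def policy_for(risk_level: str) -> RiskPolicy:
    """Look up the policy; unknown levels fail closed to 'critical' (assumption)."""
    return RISK_POLICY.get(risk_level.strip().lower(), RISK_POLICY["critical"])
```

A caller would check `allowed_tiers` first: an empty tuple means the request goes straight to a human, and `human_review` marks model output that needs sign-off before it is used.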

3️⃣ Latency & SLA

| SLA           | Tier            |
|---------------|-----------------|
| <300 ms       | Tier-0 / Tier-1 |
| <800 ms       | Tier-2          |
| Async allowed | Tier-3          |
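
Read as a ceiling on which tier a request may use, the SLA bands might be coded as below. The function name `max_tier_for_sla` is hypothetical, `None` is used here to mean "async allowed", and keeping synchronous targets looser than 800 ms at Tier-2 is an assumption, since the bands above do not cover that case.

```python
from typing import Optional


def max_tier_for_sla(sla_ms: Optional[int]) -> int:
    """Return the highest tier permitted for a latency target in milliseconds.

    sla_ms=None means the caller accepts an asynchronous response.
    """
    if sla_ms is None:
        return 3  # async allowed: the largest models are acceptable
    if sla_ms <= 0:
        raise ValueError("sla_ms must be positive")
    if sla_ms <= 300:
        return 1  # tight budget: rules or small models only
    # Assumption: any synchronous target above 300 ms is capped at Tier-2,
    # including targets looser than 800 ms.
    return 2
```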

4️⃣ Cost Envelope

| Cost Target       | Tier   |
|-------------------|--------|
| <₹1 per inference | Tier-1 |
| ₹1–₹3             | Tier-2 |
| ₹5+               | Tier-3 |
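
The cost bands lend themselves to a threshold lookup. The sketch below uses Python's standard `bisect` module; `max_tier_for_budget` is a hypothetical name, and because the bands above leave ₹3 to ₹5 unspecified, the example assumes such budgets stay at Tier-2 until the ₹5 threshold is reached.

```python
from bisect import bisect_right

# Lower bounds (in rupees per inference) at which the next tier becomes affordable.
_THRESHOLDS_INR = [1.0, 5.0]
# Tier allowed below ₹1, from ₹1 up to (not including) ₹5, and from ₹5 upward.
_TIERS = [1, 2, 3]


def max_tier_for_budget(budget_inr: float) -> int:
    """Return the highest tier a per-inference budget can pay for."""
    if budget_inr < 0:
        raise ValueError("budget_inr must be non-negative")
    # bisect_right counts how many thresholds the budget has reached.
    return _TIERS[bisect_right(_THRESHOLDS_INR, budget_inr)]
```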

🧭 Routing Logic (Enterprise Pattern)

Request →
  Complexity Check →
  Risk Classification →
  SLA Requirement →
  Budget Check →
  Model Tier Selection →
  Fallback / Escalation
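
The flow above can be sketched as a chain of filters that each narrow the set of candidate tiers, after which the cheapest survivor is chosen. Everything named here is hypothetical: the request keys (`min_tier`, `risk`, `sla_max_tier`, `budget_max_tier`), the stage functions, and the convention that `None` signals fallback or escalation. It also simplifies risk to a single rule (critical requests get no model).

```python
from typing import Any, Callable, Dict, FrozenSet, List, Optional

Request = Dict[str, Any]
Stage = Callable[[Request, FrozenSet[int]], FrozenSet[int]]

ALL_TIERS: FrozenSet[int] = frozenset({0, 1, 2, 3})


def complexity_check(req: Request, tiers: FrozenSet[int]) -> FrozenSet[int]:
    # Keep tiers at or above the minimum tier judged capable of the task.
    return frozenset(t for t in tiers if t >= req["min_tier"])


def risk_classification(req: Request, tiers: FrozenSet[int]) -> FrozenSet[int]:
    # Critical decisions are human-only, so no model tier survives.
    return frozenset() if req["risk"] == "critical" else tiers


def sla_requirement(req: Request, tiers: FrozenSet[int]) -> FrozenSet[int]:
    # Drop tiers too slow for the latency target.
    return frozenset(t for t in tiers if t <= req["sla_max_tier"])


def budget_check(req: Request, tiers: FrozenSet[int]) -> FrozenSet[int]:
    # Drop tiers that exceed the per-inference budget.
    return frozenset(t for t in tiers if t <= req["budget_max_tier"])


PIPELINE: List[Stage] = [
    complexity_check, risk_classification, sla_requirement, budget_check,
]


def select_tier(req: Request, pipeline: List[Stage] = PIPELINE) -> Optional[int]:
    """Return the cheapest tier that passes every stage, or None to escalate."""
    candidates = ALL_TIERS
    for stage in pipeline:
        candidates = stage(req, candidates)
        if not candidates:
            return None  # fallback / escalation: human queue or async path
    return min(candidates)
```

Picking `min(candidates)` keeps the routing goal explicit: the cheapest tier that still satisfies every constraint. A `None` result leaves the escalation choice (human review, async Tier-3, or a relaxed budget) to the caller.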

📊 Realistic Banking Traffic Distribution

| Tier   | Traffic % |
|--------|-----------|
| Tier-0 | 10–15%    |
| Tier-1 | 45–55%    |
| Tier-2 | 25–30%    |
| Tier-3 | 5–10%     |
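
A distribution like this is easiest to trust when it is checked against routing telemetry. The sketch below compares observed per-tier request counts with the bands above and reports any tier whose share falls outside its band. `TARGET_BANDS` and `out_of_band` are hypothetical names, and the counts in the usage note are made up for illustration.

```python
from typing import Dict, Tuple

# Target share of traffic per tier, in percent (lower bound, upper bound).
TARGET_BANDS: Dict[int, Tuple[float, float]] = {
    0: (10.0, 15.0),
    1: (45.0, 55.0),
    2: (25.0, 30.0),
    3: (5.0, 10.0),
}


def out_of_band(counts: Dict[int, int]) -> Dict[int, float]:
    """Return {tier: observed share in %} for tiers outside their target band."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no requests recorded")
    drift = {}
    for tier, (low, high) in TARGET_BANDS.items():
        share = 100.0 * counts.get(tier, 0) / total
        if not low <= share <= high:
            drift[tier] = round(share, 1)
    return drift
```

With illustrative counts of `{0: 120, 1: 500, 2: 300, 3: 80}` every tier is inside its band and the result is empty. A mix that sends 40% of requests to Tier-3 would be flagged, which is the signal that routing rules or complexity checks need tuning.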

If someone says “most traffic goes to GPT-4”, they haven’t scaled GenAI.

💰 Impact of Model Tiering (Real Numbers)

| Metric           | Before   | After   |
|------------------|----------|---------|
| Cost / inference | ₹3.8     | ₹1.9    |
| Monthly AI spend | ₹5 Cr    | ₹2.8 Cr |
| P95 latency      | 900 ms   | 480 ms  |
| SLA breaches     | Frequent | Rare    |
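
The before and after figures convert to percentage reductions with one line of arithmetic. The helper name `pct_reduction` is hypothetical; the inputs are the numbers quoted above.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from before to after, rounded to one decimal place."""
    if before <= 0:
        raise ValueError("before must be positive")
    return round(100.0 * (before - after) / before, 1)


cost_per_inference = pct_reduction(3.8, 1.9)   # rupees per inference
monthly_spend = pct_reduction(5.0, 2.8)        # rupees, in crore
p95_latency = pct_reduction(900, 480)          # milliseconds

print(cost_per_inference, monthly_spend, p95_latency)  # 50.0 44.0 46.7
```

On these figures, cost per inference halves, monthly spend falls by about 44%, and P95 latency drops by about 47%.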

🎤 Summary

“Model tiering is an architectural approach where we route requests to different AI models based on complexity, risk, SLA, and cost. Simple tasks go to small models or even rules, while only complex, high-value cases reach large LLMs. In production, 55–70% of our traffic was handled by Tier-0 rules and Tier-1 models, 25–30% by Tier-2, and no more than 10% by large models. This cut cost per inference by ~50% (₹3.8 to ₹1.9) and monthly spend by ~44%, while improving latency and maintaining compliance.”

 
 
 

©2024 by AeeroTech. Proudly created with Wix.com
