Multi Agent & Agentic AI
- Anand Nerurkar
- Nov 24
- 6 min read
1. Agentic AI (What it means)
Agentic AI refers to AI systems that can take autonomous actions, not just generate text. These systems perceive, reason, plan, act, and learn — like a digital worker that executes end-to-end tasks with minimal human intervention.
Key capabilities:
Autonomy: Takes decisions without being prompted every time
Planning: Breaks goals into sub-tasks
Tool use: Calls APIs, databases, models
Reasoning loops: Self-critique, refine output
Learning: Improves from feedback
Example for BFSI: An “Agentic Fraud Investigator” that reads transactions → flags anomalies → gathers supporting evidence → recommends action → updates case notes.
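As a rough, framework-agnostic sketch of this perceive → plan → act → learn cycle, the bounded loop below shows how such an agent might be structured; every name here (AgentLoopSketch, Tool, plan, isGoodEnough) is a hypothetical placeholder, not a real framework API.

// Hypothetical agentic loop: perceive -> plan -> act (tool use) -> self-critique -> learn
public final class AgentLoopSketch {

    interface Tool { String invoke(String input); }   // stands in for an API, database or model call

    public String run(String goal, java.util.Map<String, Tool> tools) {
        String observation = goal;                                   // perceive: start from the user goal
        for (int step = 0; step < 5; step++) {                       // bounded autonomy: at most 5 steps
            String nextSubTask = plan(observation);                  // planning: pick the next sub-task
            Tool tool = tools.get(nextSubTask);                      // tool use: choose API / DB / model
            String result = (tool != null) ? tool.invoke(observation)
                                           : "no tool available for: " + nextSubTask;
            if (isGoodEnough(result)) {                              // reasoning loop: self-critique the output
                return result;
            }
            observation = result;                                    // learning: feed the outcome back in
        }
        return "Escalating to a human reviewer";                     // guardrail: never loop forever
    }

    private String plan(String observation) { return "flag-anomalies"; }                       // placeholder planner
    private boolean isGoodEnough(String result) { return result.contains("recommended action"); }
}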
2. What Is a Multi-Agent System?
A multi-agent system is a team of multiple AI agents, each with specialized skills, working together to complete a complex business workflow.
It mimics an enterprise function where multiple teams collaborate.
Example Structure:
Classification Agent: Reads a document, classifies it into KYC/Loan/Invoice.
Extraction Agent: Extracts fields using DocAI.
Validation Agent: Checks data quality, RBI rules, business rules.
Decision Agent: Determines the next best action.
Orchestrator Agent: Coordinates all agents, maintains workflow state.
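A minimal sketch of how such a structure could be wired in plain Java; the Agent interface, WorkflowState record, and DocumentWorkflowOrchestrator class are illustrative names, not a specific framework API.

// Illustrative multi-agent pipeline: each agent owns one responsibility,
// and the orchestrator coordinates them and carries the workflow state.
public class DocumentWorkflowOrchestrator {

    interface Agent { WorkflowState handle(WorkflowState state); }

    record WorkflowState(String documentId, String docType, java.util.Map<String, String> fields,
                         boolean valid, String decision) { }

    private final Agent classificationAgent;   // KYC / Loan / Invoice
    private final Agent extractionAgent;       // field extraction (e.g. via DocAI)
    private final Agent validationAgent;       // data quality + RBI / business rules
    private final Agent decisionAgent;         // next best action

    public DocumentWorkflowOrchestrator(Agent c, Agent e, Agent v, Agent d) {
        this.classificationAgent = c;
        this.extractionAgent = e;
        this.validationAgent = v;
        this.decisionAgent = d;
    }

    public WorkflowState process(String documentId) {
        WorkflowState state = new WorkflowState(documentId, null, java.util.Map.of(), false, null);
        state = classificationAgent.handle(state);
        state = extractionAgent.handle(state);
        state = validationAgent.handle(state);
        return decisionAgent.handle(state);     // orchestrator owns ordering and state hand-off
    }
}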
Why use Multi-Agent architecture?
Scalability
Clear separation of responsibilities
Easier troubleshooting / governance
Can plug-and-play individual models
Enables enterprise-wide reuse
🧠 Agent vs Multi-Agent (Interviewer-friendly comparison)
Feature | Agentic AI | Multi-Agent System |
Definition | A single autonomous AI that acts | Multiple agents collaborating |
Scope | Solves one complex task end-to-end | Solves a large workflow as a team |
Example | Loan eligibility agent | Full digital lending agents for KYC → Eligibility → Decision → Agreement |
Strength | Completes tasks independently | Division of labor, specialization |
🚀 How to Explain in an Enterprise Architecture Interview
Sample Answer:
“Agentic AI gives us digital workers that can autonomously take actions — not just generate text. Multi-agent architecture extends this by introducing a set of specialized agents (Document Classifier, Extraction Agent, Compliance Agent, Decision Agent) orchestrated through an agent framework like LangGraph, AutoGen, or Spring AI Agents. This design fits naturally into large BFSI workflows where responsibilities are distributed. It also enforces governance, observability, and performance isolation — important for regulated industries. In my architecture, agents interact through events (Kafka) and use shared memory (vector DB + metadata store) to maintain context. This ensures transparency, auditability, and consistency across decision steps.”
⭐ What is an “I don’t know” loop in AI agents?
An “I don’t know” loop happens when an AI agent repeatedly returns uncertainty responses instead of progressing, due to missing context, failed retrieval, or wrong state transitions.
✅ Definition (Very Clear)
An “I don’t know” loop is when the LLM or agent keeps responding with variations of:
“I’m not sure.”
“I don’t know the answer.”
“I don’t have enough information.”
“Please provide more details.”
…but it repeats this in a cycle, because the orchestrator triggers the agent again without fixing the root cause.
✅ Why it happens
RAG retrieval returned no relevant context → similarity scores too low → the agent has nothing to reason with
Wrong agent/tool selected → agent can’t perform the task, so it returns “I don’t know”
Prompt missing required instructions → model stays uncertain
Temperature too high → random variations of “I’m not sure…”
Agent state machine stuck → orchestrator keeps re-calling the same agent
Guardrails instruct the model to be conservative → it keeps refusing rather than solving
🚫 How the loop looks in logs
Example log pattern:
Agent → Query: “Extract policy clause”
RAG → Returned 0 chunks (min similarity < 0.7)
LLM → “I don’t know, I do not have enough context.”
Orchestrator → Retry same agent
LLM → “I don’t know the answer.”
Orchestrator → Retry…
LLM → “I don’t have enough information.”
This is the I-don’t-know loop.
🔧 How to fix it
If similarity < threshold → return fail state instead of retry
Add fallback agent (e.g., “Missing Context Analyzer”)
Improve prompt → add required context or guidance
Stop agents after X retries
Add telemetry to detect repeated uncertain responses
Lower temperature to reduce randomness
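A minimal sketch of the retry cap and fail-state idea, assuming hypothetical RetryGuard and RetrievalResult types (the thresholds are illustrative, not fixed values):

// Hypothetical retry guard: fail fast on weak retrieval or repeated uncertainty instead of looping
public class RetryGuard {

    private static final int MAX_ATTEMPTS = 3;
    private static final double MIN_SIMILARITY = 0.70;

    record RetrievalResult(java.util.List<String> chunks, double topSimilarity) { }

    public String invokeWithGuard(java.util.function.Supplier<RetrievalResult> retrieval,
                                  java.util.function.Function<RetrievalResult, String> agentCall) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {          // stop agents after X retries
            RetrievalResult result = retrieval.get();
            if (result.topSimilarity() < MIN_SIMILARITY) {
                return "FAIL_STATE: no relevant context found";              // fail state instead of retry
            }
            String answer = agentCall.apply(result);
            if (!answer.toLowerCase().contains("i don't know")) {
                return answer;                                               // progressed, exit the loop
            }
        }
        return "FAIL_STATE: agent stayed uncertain after " + MAX_ATTEMPTS + " attempts";
    }
}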
⭐ One-line summary
An “I don’t know” loop happens when the agent repeatedly expresses uncertainty because retrieval, prompt, or state logic is broken — and the orchestrator keeps calling the same agent instead of resolving the root cause.
Example of an “I Don’t Know” Loop
User: Why did the API call return 404?
Model: I’m not sure.
User: The logs are above.
Model: I don’t know.
User: Path is /customers/123.
Model: I’m not sure.
User: Look at the payload.
Model: I don’t have enough information.
Here, the model gets stuck recycling uncertainty rather than reasoning.
Why We Track This in Hallucination Testing
“I don’t know” loops indicate:
Model uncertainty → Not reasoning deeply
Context loss → RAG or memory chain issues
Safety filter stuck in over-trigger mode
Breaking of agent tool-calling chain
Failure to maintain conversation grounding
In observability dashboards, you track them under:
➤ Uncertainty patterns
➤ Repetitive refusal segments
➤ Low-confidence response cluster
How to Detect “I Don’t Know” Loops (Telemetry + Observability)
You monitor:
A. Token-level repetition pattern
Repeated substrings like:
“I don’t know”
“I’m not sure”
“I cannot determine”
“I don’t have enough info”
B. Confidence scores
Low logit confidence clusters across turns.
C. Context window failures
Too many “missing context” responses indicate:
vector retrieval failure
wrong RAG pipeline
chunk mismatch
context is being truncated
D. Safety system triggers
The agent might be stuck in a safety fallback block.
E. Agent execution traces
Look at:
tool call failures
empty tool results
exceptions in chain
timeout on retrieval layer
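To make the trace checks concrete, here is a small sketch; the StepTrace record and TraceHealthCheck class are hypothetical stand-ins for whatever trace schema your agent framework actually emits.

import java.util.ArrayList;
import java.util.List;

// Illustrative trace inspection: surfaces the loop signals listed above from per-step execution traces
public class TraceHealthCheck {

    record StepTrace(String tool, boolean failed, boolean emptyResult, boolean timedOut, String exception) { }

    public List<String> findLoopSignals(List<StepTrace> steps) {
        List<String> signals = new ArrayList<>();
        for (StepTrace step : steps) {
            if (step.failed())            signals.add("tool call failed: " + step.tool());
            if (step.emptyResult())       signals.add("empty tool result: " + step.tool());
            if (step.exception() != null) signals.add("exception in chain: " + step.exception());
            if (step.timedOut())          signals.add("timeout on retrieval layer: " + step.tool());
        }
        return signals;
    }
}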
✅ 4. How to Fix “I Don’t Know” Loops
1. Fix Context Loss
Reduce chunk size
Improve chunk overlap
Upgrade retrieval strategy (MMR, hybrid search, semantic search)
Increase context window
Validate tool outputs
2. Fix Safety Misfiring
Adjust guardrail rules
Add structured exceptions
Create allow-list for safe enterprise terms
3. Fix Prompt Architecture
Add explicit fallback rules:
Bad: “I don’t know.”
Good (New Rule): “If context is missing, explicitly request information from the user or tool-call again; do NOT loop.”
4. Add Confidence-Based Routing
If logit score < threshold:
→ call RAG again
→ call secondary model
→ escalate to human fallback
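A hedged sketch of that routing rule; the confidence value stands in for whatever score your stack exposes (logit-based, reranker, or retrieval similarity), and the ConfidenceRouter class, threshold, and handler functions are illustrative assumptions.

import java.util.function.Function;

// Illustrative confidence-based routing: low-confidence answers are retried, re-routed, or escalated
public class ConfidenceRouter {

    private static final double CONFIDENCE_THRESHOLD = 0.6;   // assumed threshold

    public String route(String query, double confidence, String draftAnswer,
                        Function<String, String> ragRetry,
                        Function<String, String> secondaryModel,
                        Function<String, String> humanFallback) {
        if (confidence >= CONFIDENCE_THRESHOLD) {
            return draftAnswer;                                 // confident enough: return as-is
        }
        String retried = ragRetry.apply(query);                 // 1. call RAG again
        if (retried != null && !retried.isBlank()) {
            return retried;
        }
        String secondary = secondaryModel.apply(query);         // 2. call a secondary model
        if (secondary != null && !secondary.isBlank()) {
            return secondary;
        }
        return humanFallback.apply(query);                      // 3. escalate to human fallback
    }
}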
5. Add Observability Alerts
Trigger alerts when:
3 repetitive refusals
2 missing context errors
2 empty embeddings
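One way to feed such alerts is to emit counters from the orchestrator, for example with Micrometer; the metric names below are assumptions, and the actual alert rules (e.g. "3 refusals in a window") would live in your monitoring stack.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

// Counts loop-related symptoms so dashboards and alert rules can fire on the thresholds above
@Component
public class LoopSymptomMetrics {

    private final Counter repetitiveRefusals;
    private final Counter missingContextErrors;
    private final Counter emptyEmbeddings;

    public LoopSymptomMetrics(MeterRegistry registry) {
        this.repetitiveRefusals   = registry.counter("agent.refusals.repetitive");
        this.missingContextErrors = registry.counter("agent.context.missing");
        this.emptyEmbeddings      = registry.counter("agent.embeddings.empty");
    }

    public void recordRefusal()        { repetitiveRefusals.increment(); }
    public void recordMissingContext() { missingContextErrors.increment(); }
    public void recordEmptyEmbedding() { emptyEmbeddings.increment(); }
}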
Sample summary answer:
“I don’t know” loops are repetitive uncertainty patterns that occur when the model is stuck in a guardrail fallback, loses context, or receives empty tool-chain outputs. I monitor them using telemetry — token repetition, context window checks, tool-call logs, and confidence scoring. I eliminate them with improved prompt architecture, better retrieval strategies, guardrail tuning, and fallback logic. This ensures the agent is reliable, predictable, and safe in production.
Failure Taxonomy for LLM Agents
(Hallucination vs Refusal vs Drift vs Looping — clear definitions + examples)
1. Hallucination
Definition: Model produces confident but incorrect information not grounded in retrieved data or the prompt.
Symptoms:
Fabricated RBI rules, customer names, policy clauses
Overconfident tone
Missing citations
Root Causes:
Weak retrieval grounding
Poor prompt constraints
Low-quality embeddings
2. Refusal Failure
Definition: Model declines to answer even when allowed or expected.
Symptoms:
“I cannot help with that.”
“This seems unsafe.”
Over-triggering internal safety rails
Root Causes:
Safety alignment overly strict
Prompt phrasing ambiguous
Incorrect classification of task as unsafe
3. Context Drift
Definition: Model gradually deviates from the original user goal due to deteriorating context over multiple turns.
Symptoms:
Topic slowly shifts
Incorrect memory carried forward
Agents responding based on earlier context, not latest
Root Causes:
Insufficient context rehydration
Old messages overweighted
Wrong retrieval chunks
4. Looping (a.k.a. “I don’t know” Loop)
Definition: Agent repeats the same uncertainty statement or fallback logic in a cycle.
Symptoms:
“I’m not sure, let me check…” → retrieves nothing → “I’m still not sure…”
Repetitive tool calls
Infinite chains in LangGraph/LangChain
Root Causes:
Similarity < threshold → retrieval fails
No fallback policy
Faulty planner/orchestrator
4. Architecture: How to Prevent Loops in Multi-Agent Systems
(Event-driven + Orchestrator-controlled)
🔹 High-Level Components
User Proxy Agent → Accepts query, normalizes intent.
Planner / Orchestrator Agent → Decides which agent to call next → Applies loop-detection & halt rules.
Domain-Specific Agents (Fraud, Credit, Policy, Data) → Perform tasks with tools (SQL, APIs, RAG).
Retrieval Layer (Vector DB + similarity threshold) → Sets: min similarity, max results, confidence bands.
Loop Detection & Guardrails Engine → Hard stop after N attempts → Tracks:
previous tool calls
previous final answers
repeated uncertainty patterns
Feedback Channel → Planner → If retrieval fails twice → escalate to “Fallback agent”.
🔹 Loop Prevention Logic
1. Similarity Threshold Bands
if similarity < 0.70 → no retrieval, trigger fallback agent
if 0.70–0.80 → partial retrieval + uncertainty weighting
if > 0.80 → normal retrieval
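A small sketch of these bands as a routing decision; the SimilarityBandRouter class and RouteDecision enum are illustrative, and the score is assumed to be the top similarity returned by the retrieval layer.

// Maps the top retrieval similarity score to a routing decision, following the bands above
public class SimilarityBandRouter {

    enum RouteDecision { FALLBACK_AGENT, PARTIAL_RETRIEVAL_WITH_UNCERTAINTY, NORMAL_RETRIEVAL }

    public RouteDecision route(double topSimilarity) {
        if (topSimilarity < 0.70) {
            return RouteDecision.FALLBACK_AGENT;                       // no retrieval, trigger fallback agent
        }
        if (topSimilarity <= 0.80) {
            return RouteDecision.PARTIAL_RETRIEVAL_WITH_UNCERTAINTY;   // keep chunks, weight answer as uncertain
        }
        return RouteDecision.NORMAL_RETRIEVAL;                         // normal grounded retrieval
    }
}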
2. Step-level Context Rehydration
Every agent call receives:
latest user question
top K retrieved chunks
last successful agent summary
error traces from previous step
Prevents drift.
3. Loop Detection
Track last 3 messages.
If pattern matches:
“not sure”,
“don’t have enough info”,
repeated tool calls with empty result…
→ Automatically terminate.
4. Orchestrator Fallback
Fallback agent options:
Ask Clarifying Question
Return Best-Effort Answer
Escalate to Human
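A small sketch of how the orchestrator might choose among these options; the FallbackPolicy class and its inputs are hypothetical simplifications.

// Hypothetical fallback selection inside the orchestrator
public class FallbackPolicy {

    enum FallbackAction { ASK_CLARIFYING_QUESTION, RETURN_BEST_EFFORT_ANSWER, ESCALATE_TO_HUMAN }

    public FallbackAction choose(boolean retrievalFailedTwice, boolean hasPartialAnswer) {
        if (retrievalFailedTwice && !hasPartialAnswer) {
            return FallbackAction.ASK_CLARIFYING_QUESTION;   // we simply lack information from the user
        }
        if (hasPartialAnswer) {
            return FallbackAction.RETURN_BEST_EFFORT_ANSWER; // return what we have, flagged as best-effort
        }
        return FallbackAction.ESCALATE_TO_HUMAN;             // nothing usable: hand off to a human
    }
}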
5. Spring AI Code Snippet to Detect and Break Loops
(Practical, production-ready sample for interviews)
A. Configure Similarity Threshold
@Bean
public SearchRequest searchRequest() {
return SearchRequest.builder()
.topK(5)
.similarityThreshold(0.75) // <--- minimum similarity score
.build();
}
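Assuming a VectorStore bean is available, the same thresholds can be applied at query time; similaritySearch(SearchRequest) is the standard Spring AI call, while the PolicyRetriever service and query wiring below are illustrative. The request is rebuilt per call because the query text changes per request.

import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

// Example usage: run a similarity search with the configured topK and threshold
@Service
public class PolicyRetriever {

    private final VectorStore vectorStore;

    public PolicyRetriever(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<Document> retrieve(String policyQuery) {
        SearchRequest request = SearchRequest.builder()
                .query(policyQuery)
                .topK(5)
                .similarityThreshold(0.75)                    // chunks below the threshold are dropped
                .build();
        return vectorStore.similaritySearch(request);         // an empty list here should trigger the fallback path
    }
}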
B. Apply Loop Detection – Custom Interceptor
// Note: ResponseInterceptor here is a custom application hook, not a Spring AI interface.
@Component
public class LoopDetectionInterceptor implements ResponseInterceptor {

    // Sliding window of the last three agent responses
    private final Deque<String> lastResponses = new ArrayDeque<>();

    @Override
    public String intercept(String response) {
        lastResponses.add(response);
        if (lastResponses.size() > 3) {
            lastResponses.removeFirst();
        }
        // Declare a loop only once three consecutive responses all express uncertainty
        boolean isLoop = lastResponses.size() == 3 && lastResponses.stream()
                .allMatch(r -> r.contains("I don't know")
                        || r.contains("not sure")
                        || r.contains("cannot find"));
        if (isLoop) {
            return "I am unable to retrieve the right information. "
                    + "Let me switch to fallback logic.";
        }
        return response;
    }
}
C. Context Rehydration in Every Step
public Prompt buildPrompt(String userMessage, List<Document> retrievedDocs, String lastSummary) {
    String context = retrievedDocs.stream()
            .map(Document::getContent)
            .collect(Collectors.joining("\n\n"));

    String template = """
            You are an enterprise agent.
            Last summary:
            {lastSummary}
            Retrieved context:
            {context}
            User query:
            {query}
            If you lack confidence, return: "NEED_FALLBACK".
            """;

    // PromptTemplate substitutes the {placeholders} with the supplied values
    return new PromptTemplate(template).create(Map.of(
            "query", userMessage,
            "context", context,
            "lastSummary", lastSummary));
}
D. Fallback Handling
if (response.contains("NEED_FALLBACK")) {
return fallbackAgent.handle(query);
}