RAG

  • Writer: Anand Nerurkar
  • Aug 29, 2025
  • 2 min read

🔹 Traditional RAG (Retrieval-Augmented Generation)

  • Definition: A pipeline where a Large Language Model (LLM) is combined with an external knowledge base (usually vector DB + embeddings).

  • Flow:

    1. User Query → Converted into an embedding.

    2. Retriever → Finds the most relevant documents/chunks from the knowledge base.

    3. Augmentation → Retrieved docs are appended to the user query.

    4. LLM → Generates the final answer using the context.

  • Purpose:

    • To overcome the LLM’s knowledge cutoff.

    • To ground responses in retrieved facts, reducing hallucinations.

  • Limitations:

    • Only retrieves text-based documents.

    • Doesn’t learn or adapt — every query is stateless.

    • Context window limitations (too much retrieved text can overwhelm the LLM).
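The four-step flow above can be sketched end to end. This is a minimal illustration, not a production pipeline: the bag-of-words `embed` stands in for a real sentence-embedding model, and the final prompt would be sent to an LLM rather than printed.

```python
import math
from collections import Counter

# Toy embedding: a bag-of-words vector. A real pipeline would call a
# sentence-embedding model here instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 2: retriever finds the most relevant chunks by similarity.
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

# Step 3: augmentation appends the retrieved docs to the user query.
def augment(query: str, docs: list[str]) -> str:
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "RAG combines an LLM with an external knowledge base.",
    "Vector databases store embeddings for similarity search.",
    "Bank branches open at 9 am on weekdays.",
]
query = "What does RAG combine?"
prompt = augment(query, retrieve(query, corpus))
print(prompt)  # Step 4: this prompt would now go to the LLM
```

Note that the retrieved chunks land inside the prompt itself, which is exactly why the context-window limitation above bites: every retrieved token competes for the same budget as the question and the answer.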

🔹 GenAI RAG (Next-Gen or Advanced RAG)

  • Definition: An evolved RAG approach that layers generative-AI capabilities and multiple enhancements on top of the basic pipeline to improve retrieval and reasoning.

  • Enhancements over Traditional RAG:

    1. Multi-modal retrieval: Can fetch not just text, but also structured data (SQL, APIs, graphs, PDFs, images, audio).

    2. Agentic RAG: Uses AI agents to decide how and where to retrieve (e.g., from API, DB, knowledge graph, or vector DB).

    3. Re-ranking: Adds a second, smarter ranking pass beyond raw cosine similarity, often using cross-encoders or fine-tuned models for better relevance.

    4. Context compression: Summarizes long documents before passing them to the LLM (avoids token waste).

    5. Memory-augmented: Keeps past interactions (conversational memory), so queries aren’t stateless.

    6. Dynamic enrichment: Can trigger external tools or apply chain-of-thought reasoning before answering.

  • Purpose:

    • More accurate, domain-aware answers.

    • Better at handling complex enterprise scenarios (like BFSI, healthcare, legal).

    • Enables multi-agent collaboration (e.g., one agent retrieves from SQL, another from docs, another validates).
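Two of the enhancements above, agentic routing and re-ranking, can be sketched together. Everything here is a hedged illustration: the source names (`sql`, `vector_db`, `api`), the keyword heuristics, and the overlap-based scorer are stand-ins, not any specific framework's API; a real system would use an LLM or trained classifier to route, and a cross-encoder to re-rank.

```python
# Agentic routing: decide WHERE to retrieve from before retrieving.
# The keyword rules below are illustrative assumptions only.
def route(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("balance", "transaction", "account")):
        return "sql"          # structured data lives in the core-banking DB
    if any(w in q for w in ("policy", "regulation", "kyc")):
        return "vector_db"    # policy documents are chunked and embedded
    return "api"              # fall back to a live service call

# Re-ranking: rescore retrieved candidates with a stronger signal.
# Term overlap stands in for a cross-encoder relevance score.
def rerank(query: str, candidates: list[str]) -> list[str]:
    terms = set(query.lower().split())
    def score(doc: str) -> int:
        return len(terms & set(doc.lower().split()))
    return sorted(candidates, key=score, reverse=True)

print(route("Show my last five transactions"))
print(rerank("kyc policy", ["branch hours", "kyc policy overview"]))
```

The same routing idea scales to the multi-agent case: each branch becomes its own agent, and a validator agent checks their combined output before the final answer is generated.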

In short:

  • Traditional RAG = "Search + Stuff" → Retrieve docs → Give to LLM → Get answer.

  • GenAI RAG = "Intelligent Retrieval + Reasoning" → Adds multi-modal retrieval, agentic orchestration, re-ranking, context compression, memory, and tool usage for much smarter answers.
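The memory enhancement in particular is easy to see in miniature. A hypothetical sketch, assuming a simple (question, answer) history prepended to each prompt; the prompt format and class name are illustrative, not a library API.

```python
# Memory-augmented RAG sketch: prior turns are kept and prepended,
# so follow-up questions are no longer stateless.
class ConversationalRAG:
    def __init__(self) -> None:
        self.history: list[tuple[str, str]] = []  # (question, answer) turns

    def build_prompt(self, query: str, context: str) -> str:
        turns = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.history)
        return (f"Previous turns:\n{turns}\n\n"
                f"Context:\n{context}\n\nQuestion: {query}")

    def record(self, query: str, answer: str) -> None:
        self.history.append((query, answer))

bot = ConversationalRAG()
bot.record("What is RAG?", "Retrieval-Augmented Generation.")
print(bot.build_prompt("How does it reduce hallucinations?",
                       "RAG grounds answers in retrieved docs."))
```

Because the history also consumes context-window tokens, real systems usually summarize or truncate old turns, which is where the context-compression enhancement above comes back into play.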



©2024 by AeeroTech. Proudly created with Wix.com