
RAG Future Through a Parallax Lens

RAG strategy fails when viewed from a single technical angle. A parallax approach aligns product, governance, and knowledge operations in one frame.

VZ editorial frame

Read this piece through one operating lens: AI does not automate first, it amplifies first. If the underlying decision architecture is clear, AI scales clarity. If it is noisy, AI scales noise and cost.

VZ Lens

In VZ framing, the point is not novelty but decision quality under uncertainty. The practical edge comes from turning this multi-perspective view into repeatable decision rhythms.

Module: PARALLAX (Multi-Perspective Research Engine) | GFIS
Date: March 9, 2026
Status: Complete — 4 streams synthesized
Evidence base: 80+ sources (academic papers, industry reports, engineering blogs, framework docs)


The Prague Window

I’m sitting on the windowsill of the coworking space. Rain streaks down the old-fashioned glass, and behind it, the Vltava flows gray. Four research streams are running in parallel on my laptop—Agentic RAG, Recursive LM, REPL convergence, future architectures. My coffee is already cold.

Outside, the bridge vanishes into the fog, and a question crystallizes within me: what happens when these lines are not just parallel, but converge? When retrieval ceases to be a passive data carrier and becomes an active participant in the thought cycle?

My finger scrolls through a line of code that no longer just queries, but asks back. This isn’t about what we can load. It’s about the system finally learning to process what it has loaded.

Table of Contents

  1. Stream 1: Agentic RAG
  2. Stream 2: Recursive Language Models & REPL
  3. Stream 3: The Agent-RAG-RLM Triangle
  4. Stream 4: Future Architectures (2026-2028)
  5. Cross-Stream Synthesis: The Convergence Map
  6. Key Researchers & Labs
  7. Evidence Gaps & Contradictions
  8. Master Source Index

Stream 1: Agentic RAG

1.1 What Is Agentic RAG? How Does It Differ from Traditional RAG?

Traditional RAG is a five-stage stateless pipeline: user prompt —> retrieval query —> returned documents —> augmented prompt —> LLM generation. Each query is independent; no planning, no re-retrieval, no tool use [IBM RAG Primer; arXiv 2410.12837].

Agentic RAG augments this with autonomous agentic components:

  • Planners that decompose complex queries into sub-queries
  • Evaluators that assess retrieval quality and trigger re-retrieval
  • Tool registries enabling web search, calculators, DB queries, API calls
  • Persistent memory across steps and sessions
  • Control loops that iterate until task completion
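The component list above can be sketched as a single control loop. The following is a minimal illustration with toy stand-ins for the retriever, evaluator, and generator; none of these names come from a specific framework:

```python
# Minimal agentic-RAG control loop: retrieve -> evaluate -> (re-retrieve) -> generate.
# `retrieve`, `evaluate`, `generate` are stand-in callables; real systems would wrap
# a vector store, an LLM-based grader, and an LLM completion call respectively.

def agentic_rag(query, retrieve, evaluate, generate, max_rounds=3):
    """Iterate retrieval until the evaluator accepts the evidence, then generate."""
    docs, history = [], []
    for round_no in range(max_rounds):
        docs = retrieve(query, attempt=round_no)
        verdict = evaluate(query, docs)          # e.g. "good" / "refine"
        history.append((round_no, verdict))
        if verdict == "good":
            break
        query = f"{query} (refined, attempt {round_no + 1})"  # naive re-query policy
    return generate(query, docs), history


# Toy components that exercise the loop deterministically.
def toy_retrieve(query, attempt):
    return [f"doc-{attempt}"]

def toy_evaluate(query, docs):
    return "good" if docs == ["doc-1"] else "refine"  # accept on the 2nd round

def toy_generate(query, docs):
    return f"answer from {docs[0]}"

answer, trace = agentic_rag("what is X?", toy_retrieve, toy_evaluate, toy_generate)
```

The loop terminates either when the evaluator accepts or when `max_rounds` is exhausted; production systems add tool calls and persistent memory at the same decision points.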

Evidence strength: STRONG — Multiple peer-reviewed surveys (arXiv 2501.09136, arXiv 2602.03442), industry adoption documented by IBM, Microsoft, Arize, LangChain.

Pragmatic taxonomy (recommended):

| Pattern | Core Mechanism | Latency | Complexity | Best For |
|---|---|---|---|---|
| Classic RAG | Single retrieve → augment → generate | Low | Low | Simple QA, customer support |
| Iterative RAG | Repeated retrieve+generate with loop policy | Medium | Medium | Multi-hop QA, legal research |
| Agentic RAG | Planner + actors + tools + memory + evaluators | High | High | Enterprise automation, multi-system workflows |
| Agent-with-RAG | External agent uses RAG as one bounded tool | Variable | Medium | Flexible agent architectures |

1.2 Key Papers: Agents Using Retrieval as a Tool

| Paper/System | Year | Core Contribution | Evidence |
|---|---|---|---|
| ReAct (Yao et al.) | 2022/2023 | Thought-action-observation loop with retrieval as an action | STRONG |
| Self-RAG (Asai et al., ICLR 2024) | 2024 | LLM trained to decide when to retrieve and self-critique via reflection tokens | STRONG |
| CRAG (Corrective RAG) | 2024 | Plug-and-play retrieval evaluator that triggers refinement | STRONG |
| Speculative RAG (Google Research) | 2024 | Small model drafts, large model verifies; ~12.97% accuracy gain, 51% latency reduction | STRONG |
| A-RAG (arXiv 2602.03442) | 2025 | Hierarchical retrieval interfaces scaling agentic RAG | MODERATE |
| AIR-RAG (Neurocomputing 2026) | 2026 | Adaptive iterative retrieval without retraining retriever | STRONG (peer-reviewed) |
| PlanRAG (arXiv 2601.19827) | 2026 | Plan-then-retrieve: structured plan targets retrieval | MODERATE |

1.3 Multi-Step Retrieval Approaches

 Multi-Step RAG Variants

 Iterative RAG       Self-RAG            CRAG
  +-----------+      +-----------+      +-----------+
  | Retrieve  |      | LLM with  |      | Retrieve  |
  | Generate  |      | reflection|      | Evaluate  |
  | Loop until|      | tokens    |      | Re-fetch  |
  | confident |      | decides   |      | if poor   |
  +-----------+      +-----------+      +-----------+

 Speculative RAG     AIR-RAG             PlanRAG
  +-----------+      +-----------+      +-----------+
  | Small LM  |      | Adaptive  |      | Plan first|
  | drafts    |      | iterative |      | Retrieve  |
  | Large LM  |      | refinement|      | to plan   |
  | verifies  |      | no retrain|      | Re-plan   |
  +-----------+      +-----------+      +-----------+

Key finding: Retrieval contributes ~41% of end-to-end latency and nearly doubles TTFT in production systems [arXiv 2412.11854v1]. Multi-step approaches multiply this cost but substantially improve correctness on complex queries.

1.4 When Does a RAG Pipeline Become an “Agent”?

Agenticity Checklist (synthesized from Microsoft, Arize, Towards Data Science):

A RAG system qualifies as “agentic” if it satisfies 4+ of these criteria, including items 1, 2, and 5:

  1. Explicit planner/actor loop — can initiate multiple retrieval/generation/tool steps
  2. Autonomous decision logic — decides to retrieve, re-retrieve, call tools without human prompts
  3. Tool invocation APIs — can call external tools (web search, calculators, DBs)
  4. Persistent memory/state — retains context across steps or sessions
  5. Evidence evaluation & re-retrieval — validators that trigger corrective retrieval
  6. Audit trail — logs queries, retrievals, tool calls, reasoning steps
  7. Orchestration & fault tolerance — retry logic, observability, error handling
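The checklist can be applied mechanically. A small sketch of the qualification rule stated above (4+ criteria satisfied, items 1, 2, and 5 among them):

```python
# The agenticity checklist as a predicate. Checklist item numbers follow the
# text: 1 = planner loop, 2 = autonomous decisions, 5 = evidence evaluation.

REQUIRED = {1, 2, 5}   # items that must be present

def is_agentic(satisfied: set[int]) -> bool:
    """satisfied: the checklist item numbers (1-7) a system meets."""
    return len(satisfied) >= 4 and REQUIRED <= satisfied
```

For example, a pipeline with a planner, autonomous decision logic, tool APIs, and an evaluator qualifies; one missing the evaluator (item 5) does not, however many other boxes it ticks.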

The boundary is fuzzy by design: The “RAG agent” vs “agent with RAG” distinction is functional rather than nominal. Sources use both phrasings inconsistently. The checklist above resolves the ambiguity for governance and architecture decisions.

1.5 Real-World Agentic RAG Deployments

| Deployment | Domain | Architecture | Reported Results | Evidence |
|---|---|---|---|---|
| ALMA (AWS Bedrock) | Healthcare | Bedrock + custom RAG + SSO | 98% accuracy on medical residency exam, 65% routine adoption, 98% satisfaction | MODERATE (vendor blog) |
| CFA Institute patterns | Finance | Vector stores (FAISS/Chroma) + SQL agents + web APIs | Internal retrieval reduces hallucination vs web retrieval | MODERATE (industry content) |
| Legal research agents | Legal | Agentic decomposition of statutes/cases/filings | Days-long research compressed to interactive sessions | WEAK (anecdotal) |
| Onyx workplace | Enterprise | Direct data indexing + agentic orchestration | High win rate on 99 workplace questions; avg ~34.7s response | MODERATE (product benchmark) |

1.6 The “RAG Agent” vs “Agent with RAG” Distinction

Does it matter? Yes, for governance and architecture:

  • RAG Agent (agentic RAG): The RAG pipeline itself has agentic control — planner, evaluator, tool-use, memory are embedded in the retrieval-generation loop.
  • Agent with RAG: An external agent (AutoGPT, CrewAI, etc.) treats RAG as one bounded tool among many; the agent’s intelligence is separate from the retrieval system.

Practical implication: “RAG Agent” requires deeper integration of governance controls (audit trails, access controls) within the RAG pipeline itself. “Agent with RAG” can apply governance at the agent orchestration layer.


Stream 2: Recursive Language Models & REPL

2.1 What Are Recursive Language Models?

Definition (Zhang et al., MIT, late 2025): RLMs are an inference strategy — not a new model class — where language models recursively call themselves or other LLMs for intermediate computation. The key innovation: the full context is stored externally in a REPL-like environment, and the LLM emits code/sub-tasks to process it incrementally.

RLM Architecture (Zhang et al.)

  User Query + Context (10M+ tokens)
                 |
                 v
      +---------------------+
      |   RLM Controller    |  <-- Thin wrapper around LLM
      | (REPL environment)  |
      +---------------------+
          |      |      |
          v      v      v
     Sub-task Sub-task Sub-task
     (chunk1) (chunk2) (chunk3)
          |      |      |
          +------+------+
                 |
                 v
        Aggregate Results
                 |
                 v
           Final Answer

Three core primitives (confirmed across MIT paper, Google ADK implementation, practitioner reports):

  1. Programmatic control over context via a REPL-like execution loop
  2. Recursive delegation for task and context decomposition
  3. Agent-mediated aggregation of partial results

State machine formalization: An RLM has transitions including CODE_EXEC, llm_query, and FINAL, with configuration parameters: max_depth, max_subcalls, max_cost, timeout_seconds.
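That formalization can be written down directly. An illustrative encoding of the transition labels and configuration parameters named above; the exact shape is an assumption, not the authors' code:

```python
# RLM state machine sketched as data: transitions CODE_EXEC, llm_query, FINAL,
# plus the configuration parameters named in the text. Illustrative only.

from dataclasses import dataclass

@dataclass
class RLMConfig:
    max_depth: int = 3            # recursion depth limit
    max_subcalls: int = 32        # total sub-LLM calls allowed
    max_cost: float = 1.0         # budget in arbitrary cost units
    timeout_seconds: float = 120.0

TRANSITIONS = {
    "START":     {"CODE_EXEC", "llm_query", "FINAL"},
    "CODE_EXEC": {"CODE_EXEC", "llm_query", "FINAL"},
    "llm_query": {"CODE_EXEC", "llm_query", "FINAL"},
    "FINAL":     set(),           # terminal state
}

def legal(state: str, action: str) -> bool:
    """True if the state machine permits `action` from `state`."""
    return action in TRANSITIONS.get(state, set())
```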

Evidence strength: MODERATE — Single primary paper (Zhang et al., MIT) with growing implementation ecosystem (Google ADK, DSPy modules). Not yet widely replicated in peer-reviewed venues.

2.2 The REPL Paradigm Applied to LLMs

The REPL (Read-Eval-Print Loop) framing treats the LLM workflow as:

  1. Read: Full document/context stored in a Python variable (external to LLM window)
  2. Eval: LLM emits code to inspect, slice, filter, and analyze the context
  3. Print: Intermediate results are aggregated
  4. Loop: Re-enter with refined sub-tasks until answer is complete
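The four steps can be sketched as a driver loop. Here the "model" is a deterministic stand-in (`toy_model`, hypothetical) that greps the externally held context once and then answers from its notes; a real RLM would emit code instead:

```python
# REPL-style driver: the full context lives in an ordinary Python variable,
# and the (stubbed) model proposes one operation per iteration.

def repl_answer(context: str, question: str, propose_step, max_loops=10):
    # Read: context is held outside any model window.
    notes = []
    for _ in range(max_loops):
        step = propose_step(context, question, notes)   # Eval: model picks an op
        if step["op"] == "final":
            return step["text"]
        if step["op"] == "grep":                        # inspect a slice of context
            hits = [ln for ln in context.splitlines() if step["pattern"] in ln]
            notes.append(hits)                          # Print: keep partial result
        # Loop: re-enter with accumulated notes
    return "no answer within budget"

# Deterministic stand-in model: grep once, then answer from the first hit.
def toy_model(context, question, notes):
    if not notes:
        return {"op": "grep", "pattern": "capital"}
    return {"op": "final", "text": notes[0][0]}

ctx = "Paris is the capital of France.\nBerlin is the capital of Germany."
result = repl_answer(ctx, "capital of France?", toy_model)
```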

Who proposed this? Alex Zhang and colleagues at MIT (paper published late 2025). The codebase is available as rlm_repl on GitHub. Google’s ADK community has adopted and extended the pattern.

Key claim: RLMs can process contexts of 10M+ tokens — far beyond any native context window — by decomposing the work recursively. Preliminary results show:

  • 91.3% accuracy on multi-document retrieval tasks (GPT-5-level)
  • 62% accuracy on LongBench-v2 CodeQA vs 22% for non-recursive baselines

Evidence strength: MODERATE — Impressive numbers from a single lab; awaiting independent replication.

2.3 How RLM/REPL Relates to Existing Paradigms

| Paradigm | Relationship to RLM | Key Difference |
|---|---|---|
| Chain-of-Thought (CoT) | RLM can use CoT within each recursive step | CoT is linear; RLM is tree/graph-structured |
| Tree-of-Thought (ToT) | RLM naturally implements ToT via recursive branching | ToT explores thought units; RLM decomposes context |
| Graph-of-Thought (GoT) | RLM sub-tasks can form dependency graphs | GoT models thought dependencies; RLM models context dependencies |
| Self-reflection / Reflexion | RLM supports generate-critique-refine within loops | Reflexion is about output quality; RLM is about context management |
| Iterative RAG | RLM + RAG = recursive retrieval-generation cycles | Iterative RAG retrieves from external stores; RLM processes what's already loaded |

Critical distinction: CoT/ToT/GoT are reasoning strategies about how to think. RLM is a context management strategy about how to handle unbounded input. They are complementary, not competing.

2.4 RLM + RAG Integration

Recursive retrieval-generation cycles combine the best of both:

  Query --> RLM decomposes into sub-queries
    |
    +--> Sub-query 1 --> RAG retrieval --> Generation
    |
    +--> Sub-query 2 --> RAG retrieval --> Generation
    |
    +--> Sub-query 3 --> RAG retrieval --> Generation
    |
    v
  RLM aggregates sub-results
    |
    v
  Final synthesized answer
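The cycle above in miniature, with the decomposer, per-sub-query RAG pass, and aggregator supplied as toy stand-ins:

```python
# Recursive retrieval-generation in three steps: decompose the query,
# run one RAG pass per sub-query, aggregate the partial answers.
# All three components are illustrative stand-ins.

def rlm_rag(query, decompose, rag_pass, aggregate):
    sub_queries = decompose(query)
    partials = [rag_pass(sq) for sq in sub_queries]
    return aggregate(partials)

answer = rlm_rag(
    "who founded X and when?",
    decompose=lambda q: ["who founded X?", "when was X founded?"],
    rag_pass=lambda sq: f"[answer to: {sq}]",
    aggregate=lambda parts: " ".join(parts),
)
```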

Evidence from multi-step retrieval benchmarks:

  • Multi-step retrieval shows >50% improvement over single-step on defined end-to-end evaluation tasks [FRAMES benchmark]
  • RT-RAG (hierarchical tree decomposition) achieves +7.0% F1 and +6.0% EM over SOTA on multi-hop QA benchmarks (MuSiQue, 2WikiMQA, HotpotQA)

2.5 Does Recursive Prompting Improve RAG Quality?

Yes, with caveats.

| Benchmark | Improvement | Method | Evidence |
|---|---|---|---|
| FRAMES (end-to-end RAG) | 0.408 → 0.66 accuracy | Multi-step reasoning | STRONG |
| MuSiQue/2WikiMQA/HotpotQA | +7% F1, +6% EM | RT-RAG hierarchical decomposition | STRONG |
| Game of 24 (ToT) | 4% → 74% success | Tree-of-Thought vs CoT | STRONG |
| TruthfulQA (Reflexion) | Significant gains for smaller models | Generate-critique-refine loops | STRONG |
| LongBench-v2 CodeQA | 22% → 62% | RLM recursive processing | MODERATE |

Caveats: Latency increases substantially with each recursive step, and LLM generation latency often dominates total response time; a ~100ms retrieval speedup can be invisible when generation dominates.

2.6 The “Reasoning Loop” Models and RAG

| Model Family | Reasoning Mechanism | RAG Relationship |
|---|---|---|
| DeepSeek R1 | 671B params, 37B active, 128K context (YaRN), MLA + MoE | Long-context architecture reduces need for retrieval; strong multi-hop performance natively |
| OpenAI o1/o3/o4-mini | RL-trained reasoning with CoT; tool use (web, Python, image gen) | Reasoning + tool use enables agentic RAG patterns within the model API itself |
| Claude Extended Thinking | 128K internal token window for reasoning; "think" tool | Extended reasoning budget improves factuality; can invoke tools/retrieval within thinking |
| LoopLM (Ouro) | Recurrent transformer stack reuse (4 iterations) | 1.4B model matches 12B SOTA on select benchmarks; recurrence substitutes for scale |

Key insight: These reasoning models are making RAG both more powerful and less necessary simultaneously. Long context windows (128K-2M tokens) mean more knowledge can be preloaded (CAG pattern), while reasoning capabilities mean the model can better judge when retrieval IS needed and what to do with results.

Evidence strength: STRONG for DeepSeek R1 and OpenAI o-series benchmarks. MODERATE for Claude extended thinking (fewer public benchmarks). MODERATE for LoopLM (single-lab results).


Stream 3: The Agent-RAG-RLM Triangle

3.1 How Modern AI Agent Frameworks Use RAG

| Framework | RAG Integration | Memory Model | Architecture |
|---|---|---|---|
| LangGraph | Deep — retrieval as tool nodes in state graphs; generate_or_query_or_respond decision nodes | State-based memory with checkpointing and persistence | Graph-based workflow orchestration |
| CrewAI | Built-in RAG tools; role-based memory with RAG | Structured, role-based memory with RAG | Role-playing multi-agent crews |
| AutoGen/AG2 | RAG through tool registration; multi-turn retrieval | Conversational history storage | Multi-agent conversation framework |
| OpenAI Agents SDK | Built-in vector store tools (file_search); web search | Thread-based context persistence | Production-ready single-agent framework |
| LlamaIndex Agents | Native RAG — grounded, reliable retrieval-first design | Index-based retrieval with reranking | RAG-centric agent architecture |

Key observation (DataCamp comparison, 2026): “LangGraph provides state-based memory with checkpointing and persistence. CrewAI uses structured, role-based memory with RAG, while AutoGen stores conversational history.” Each framework makes fundamentally different architectural choices about where RAG sits relative to the agent.

Evidence strength: STRONG — Well-documented in framework docs, extensive practitioner reporting, multiple comparison guides.

3.2 The Memory Problem in Agents

The memory taxonomy mirrors human cognition [arXiv 2512.13564, “Memory in the Age of AI Agents”]:

Agent Memory Architecture
+-------------------------------------------------------+
|                    WORKING MEMORY                     |
|            (Context window: 200K-2M tokens)           |
|         Current conversation + active reasoning       |
+-------------------------------------------------------+
         |                  |                  |
         v                  v                  v
+----------------+  +------------------+  +------------------+
| EPISODIC       |  | SEMANTIC         |  | PROCEDURAL       |
| MEMORY         |  | MEMORY           |  | MEMORY           |
| "What happened"|  | "What I know"    |  | "How to do it"   |
| Interaction    |  | Domain knowledge |  | Skills, workflows|
| history,       |  | concepts, facts  |  | learned patterns |
| past sessions  |  | (RAG/vector DB)  |  |                  |
+----------------+  +------------------+  +------------------+
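The taxonomy can be mirrored by a minimal store: a bounded working memory plus long-term layers. This is an illustrative shape only; products such as Letta or Mem0 implement far richer consolidation policies:

```python
# Minimal agent memory: bounded working memory (context-window analogue)
# plus episodic, semantic, and procedural long-term stores.

from collections import deque

class AgentMemory:
    def __init__(self, working_capacity=4):
        self.working = deque(maxlen=working_capacity)  # bounded, like a context window
        self.episodic = []      # "what happened": interaction history
        self.semantic = {}      # "what I know": facts (RAG/vector DB analogue)
        self.procedural = {}    # "how to do it": named skills/workflows

    def observe(self, event: str):
        self.working.append(event)   # oldest events fall out of working memory
        self.episodic.append(event)  # but persist episodically

mem = AgentMemory(working_capacity=2)
for e in ["greet", "ask", "retrieve"]:
    mem.observe(e)
```

After three observations with capacity 2, only the latest two events remain in working memory, while the episodic store keeps all three: exactly the "working memory is not long-term storage" point above.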

Critical 2026 insight (VentureBeat, Oracle): “Large context windows (200K-400K tokens in Claude Opus 4.5, GPT-5.2, up to 2M in Gemini 3 Pro) have NOT solved agent memory. Injecting full conversation history into every API call creates unsustainable cost and latency. Context windows are working memory — they’re not long-term storage.”

The spectrum is shifting: Traditional RAG —> Agentic RAG —> Full Memory Systems. VentureBeat predicts contextual memory will surpass RAG for agentic AI in 2026.

Key products in the memory space (2026):

  • Mem0: Memory-as-a-service for agents
  • Letta/MemGPT: Stateful memory server with explicit editable memory blocks
  • Cognee: Memory as pipeline (ingestion —> structuring —> recall)
  • Amazon Bedrock AgentCore Memory: Managed extraction, consolidation, retrieval
  • Graphiti (Zep/Neo4j): Knowledge graph memory for temporal agent state

Unsolved tension (Oracle/GDPR): GDPR right-to-be-forgotten requires data deletion, but EU AI Act (August 2026) requires 10-year audit trails for high-risk systems. This creates an architectural paradox for agent memory systems.

Evidence strength: STRONG — ICLR 2026 workshop proposal on MemAgents, comprehensive survey (arXiv 2512.13564), multiple production implementations.

3.3 Knowledge Graphs + RAG + Agents

GraphRAG (Microsoft Research, 2024):

  • Creates entity-centric knowledge graphs from input corpus
  • LLMs precompute community summaries
  • Dramatically improves reasoning over relationship-rich queries
  • Enables queries that require traversing relationships across data types

LightRAG (EMNLP 2025):

  • Dual retriever system: local retriever for entity-level questions, global retriever for complex subgraph reasoning
  • Lightweight, fast, suitable for production
  • Enhanced extraction accuracy for open-source LLMs (Qwen3-30B-A3B)
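The dual-retriever idea reduces to a routing decision. A sketch of that routing; the keyword heuristic and retriever callables are hypothetical stand-ins, not LightRAG's actual API:

```python
# Dual-retriever routing: entity-level questions go to a "local" retriever,
# broader subgraph questions to a "global" one. Cue list is an assumption.

GLOBAL_CUES = ("compare", "overall", "trend", "relationship", "across")

def route(query: str) -> str:
    q = query.lower()
    return "global" if any(cue in q for cue in GLOBAL_CUES) else "local"

def dual_retrieve(query, local_retriever, global_retriever):
    chosen = global_retriever if route(query) == "global" else local_retriever
    return chosen(query)
```

Real systems typically replace the keyword heuristic with an LLM-based query classifier, but the two-path structure is the same.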

Graphiti (Neo4j/Zep):

  • Temporal knowledge graph memory for agents
  • Unlike GraphRAG’s static community summaries, Graphiti handles evolving, temporal data
  • Designed for “agentic world” where memory must update in real-time

Agentic-KGR (OpenReview): Co-evolutionary knowledge graph construction through multi-agent reinforcement learning. When integrated with GraphRAG, achieves superior QA performance with gains in both accuracy and knowledge coverage.

Knowledge Graph + RAG + Agents Integration

        Agent Layer (Planning, Reasoning, Tool Use)
             |                         |
             v                         v
      GraphRAG Layer            Vector RAG Layer
   (Entity-Relationship      (Semantic Similarity
        Traversal)                 Search)
             |                         |
             v                         v
      Knowledge Graph           Vector Database
      (Neo4j, etc.)            (Qdrant, Pinecone)
             |                         |
             +------------+------------+
                          |
                          v
                 Unified Knowledge
                  Representation

Evidence strength: STRONG for GraphRAG (Microsoft Research paper + open-source). STRONG for LightRAG (EMNLP 2025). MODERATE for Graphiti and agentic KG construction.

3.4 Multi-Agent Systems with Different Knowledge Bases

Emerging pattern: Specialized agents with specialized retrieval.

The Multi-Agent RAG Framework for Entity Resolution (MDPI, 2025) demonstrates:

  • Modular coordination with specialized agents (direct matching, indirect matching, household clustering)
  • Each agent writes to logically disjoint sections of shared state
  • Orchestrator deterministically merges results from parallel branches
  • LangGraph-based unified orchestration

Oracle A2A Protocol + LangChain (2025): Scalable multi-agent RAG system where agents communicate via the Agent-to-Agent (A2A) protocol, each with different knowledge bases and capabilities.

The pattern: Different agents for different knowledge domains, connected through:

  • Shared state graphs (LangGraph)
  • Message-passing protocols (A2A, MCP)
  • Hierarchical orchestration (planner agent delegates to specialist agents)
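The disjoint-sections pattern can be sketched directly: each specialist writes to its own key of the shared state, and the orchestrator merges parallel branches in a deterministic order. Agent bodies here are stand-ins:

```python
# Orchestrator for specialists that write to logically disjoint sections
# of shared state; merge order is deterministic (sorted by section name).

def orchestrate(query, agents: dict):
    """agents maps a state-section name -> callable(query) for that specialist."""
    shared_state = {}
    for section, agent in sorted(agents.items()):   # deterministic merge order
        assert section not in shared_state          # disjoint-sections invariant
        shared_state[section] = agent(query)
    return shared_state

state = orchestrate("resolve record 42", {
    "direct_match":   lambda q: ["exact: 42"],
    "indirect_match": lambda q: ["fuzzy: 42a"],
    "household":      lambda q: ["cluster: H7"],
})
```

Because sections never collide, the specialists could run in parallel and be merged afterward without conflicts, which is the property the entity-resolution framework relies on.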

Evidence strength: MODERATE — Documented in framework guides and early academic papers; few large-scale production case studies published.

3.5 The Convergence Thesis

Are RAG, agents, and RLM converging into a single architecture?

Evidence FOR convergence:

  1. NStarX thesis (2026-2030): “RAG will undergo a fundamental architectural shift — from a retrieval pipeline bolted onto LLMs to an autonomous knowledge runtime that orchestrates retrieval, reasoning, verification, and governance as unified operations.”

  2. Glean’s emerging agent stack (2026): Context engineering is the unifying discipline — “the delicate art and science of filling the context window with just the right information for the next step” (Andrej Karpathy). RAG, agent memory, and reasoning all serve this same purpose.

  3. The agentic taxonomy (arXiv 2601.12560): Unified taxonomy breaks agents into Perception, Brain, Planning, Action, Tool Use, and Collaboration — with RAG as one tool within the Action layer and reasoning (including RLM-style recursion) within the Brain layer.

  4. RLM in Google ADK (2026): Recursive Language Models are being implemented directly within agentic frameworks, treating context management as an agent capability.

Evidence AGAINST full convergence (or at least for sustained specialization):

  1. The architecture fork (UCStrategies, 2026): “Standard RAG is dead” — the field is splitting into CAG (for static/small knowledge bases) and Agentic RAG (for complex reasoning), not converging into one.

  2. Latency-accuracy tradeoff: Simple tasks don’t need agent overhead. CAG completes in 2.33s vs RAG’s 94.35s. One-size-fits-all is an anti-pattern.

  3. Framework fragmentation: LangGraph, CrewAI, AutoGen, OpenAI SDK all make fundamentally different architectural choices. No single winning pattern has emerged.

PARALLAX assessment: The convergence is happening at the conceptual level (unified knowledge runtime) but NOT at the implementation level (no single architecture dominates). The field is converging on a shared understanding that retrieval, reasoning, and agent control are interconnected concerns, while diverging on how to implement that understanding.

The Convergence Map (Conceptual)

                 REASONING
                (RLM/REPL,
               CoT/ToT/GoT,
             Extended Thinking)
                    /\
                   /  \
                  /    \
                 / The  \
                /Knowledge\
               / Runtime   \
              /             \
             /_______________\
      RETRIEVAL             AGENCY
  (RAG, GraphRAG,      (Planners, Tool Use,
   Vector Search,       Memory, Multi-Agent,
   Hybrid Index)        Orchestration)

Each vertex is pulling toward the center:
- Retrieval is becoming agentic (self-RAG, CRAG)
- Agents are becoming retrieval-aware (memory systems)
- Reasoning is becoming recursive (RLM, extended thinking)

Stream 4: Future Architectures (2026-2028)

4.1 Next-Gen RAG Architectures Being Proposed

| Architecture | Description | Maturity | Evidence |
|---|---|---|---|
| Agentic RAG | Planner + evaluator + tools + memory within RAG loop | Production (early) | STRONG |
| Cache-Augmented Generation (CAG) | Preload entire knowledge base into extended context window; eliminate retrieval | Production | STRONG |
| GraphRAG | Knowledge graph + community summaries + vector retrieval | Production | STRONG |
| Adaptive/Router RAG | Dynamically selects retrieval strategy per query complexity | Production (early) | MODERATE |
| RLM-RAG | Recursive decomposition of retrieval tasks via REPL | Experimental | MODERATE |
| Federated RAG | Distributed knowledge bases with privacy-preserving retrieval | Research | MODERATE |
| Multimodal RAG | Images, video, audio retrieval in unified embedding space | Production (early) | STRONG |
| Personalized RAG | User-specific retrieval, preference-aware reranking/generation | Research/Early | MODERATE |

4.2 Hybrid Retrieval: Sparse + Dense + KG + Structured

The production consensus for 2026: no single retrieval method suffices.

Production systems routinely maintain multiple knowledge representations:

  • Vector embeddings for semantic search (dense retrieval)
  • BM25/TF-IDF for keyword matching (sparse retrieval)
  • Knowledge graphs for relationship reasoning (graph traversal)
  • Hierarchical indexes for categorical navigation
  • Structured data (SQL, APIs) for factual lookups
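One widely used way to fuse these heterogeneous rankings is Reciprocal Rank Fusion (RRF): each retriever contributes 1/(k + rank) per document, and documents are re-ordered by the summed score. k = 60 is the customary damping constant. A self-contained sketch:

```python
# Reciprocal Rank Fusion over ranked lists from multiple retrievers.

def rrf(rankings, k=60):
    """rankings: list of ranked doc-id lists, one per retriever (best first)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d1", "d2", "d3"]   # vector-search ranking
sparse = ["d3", "d1", "d4"]   # BM25 ranking
fused = rrf([dense, sparse])  # d1 and d3 rise: both retrievers agree on them
```

RRF needs no score calibration across retrievers, which is why it is a common default for sparse+dense hybrids before a learned reranker is added.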

The “10 Types of RAG” in 2026 (multiple sources confirm this diversification):

  1. Naive/Standard RAG
  2. Advanced RAG (with reranking, query rewriting)
  3. Modular RAG (composable pipeline components)
  4. Graph RAG (knowledge graph-augmented)
  5. Agentic RAG (agent-orchestrated)
  6. Adaptive/Router RAG (dynamic strategy selection)
  7. Corrective RAG (CRAG pattern)
  8. Self-RAG (model-internal retrieval decisions)
  9. Speculative RAG (draft/verify pattern)
  10. Multimodal RAG (cross-modal retrieval)

4.3 Personalized RAG

Survey finding (arXiv 2504.10147, “A Survey of Personalization: From RAG to Agent”):

Personalization spans three RAG stages:

  1. Pre-retrieval: User-specific query expansion and reformulation
  2. Retrieval: Personalized reranking based on user history and preferences
  3. Generation: Adapting output style, depth, and focus to individual users
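The three stages compose into one pipeline. A sketch with illustrative stand-in policies for expansion, reranking, and style adaptation (the profile fields and retriever are hypothetical):

```python
# Personalized RAG pipeline: pre-retrieval expansion, preference-aware
# reranking, and a style hint for the generation stage.

def personalize(query, profile, retrieve):
    # 1. Pre-retrieval: expand the query with profile interests.
    expanded = query + " " + " ".join(profile.get("interests", []))
    docs = retrieve(expanded)
    # 2. Retrieval: rerank, boosting docs whose topic the user prefers.
    prefs = profile.get("topic_weights", {})
    docs = sorted(docs, key=lambda d: prefs.get(d["topic"], 0.0), reverse=True)
    # 3. Generation: hand the generator a style hint alongside the evidence.
    return {"docs": docs, "style": profile.get("style", "neutral")}

result = personalize(
    "etf basics",
    {"interests": ["finance"], "topic_weights": {"funds": 2.0, "tax": 1.0},
     "style": "concise"},
    retrieve=lambda q: [{"id": "a", "topic": "tax"}, {"id": "b", "topic": "funds"}],
)
```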

Key systems:

  • PersonaRAG: User-centric agents in the retrieval process [CEUR-WS 2024]
  • PGraphRAG: User-centric knowledge graphs for personalized retrieval [Au et al., 2025]
  • ARAG (arXiv 2506.21931): Agentic RAG for personalized recommendation — separates user understanding, semantic alignment, context synthesis, and ranking into specialized agents

Evidence strength: MODERATE — Active research area with growing paper count but few production deployments documented.

4.4 Multimodal RAG

The state of multimodal RAG in 2026:

  • Unified embedding spaces: CLIP, BLIP2, and custom dual-/multi-tower transformers encode text, images, and audio into shared vector spaces
  • Query planning modules: Classify retrieval need (text, image, audio, composite), decompose multi-hop queries, dynamically re-route
  • Production adoption: IBM, NVIDIA, multiple startups offering multimodal RAG platforms

Key challenge: Cross-modal alignment quality varies significantly. Text-to-image retrieval is mature; audio and video retrieval remain less reliable.

Evidence strength: STRONG for text+image (CLIP/BLIP2 ecosystem). MODERATE for video/audio (NVIDIA blog, early research).

4.5 Federated RAG: Distributed, Privacy-Preserving

Systematic mapping (arXiv 2505.18906): 18 primary studies identified (2020-2025) addressing federated RAG.

Key approaches:

  • HyFedRAG: Privacy-preserving + heterogeneous data; anonymization via Presidio masking, Eraser4RAG, TenSEAL encryption [arXiv 2509.06444]
  • Dual Federated RAG (DF-RAG): Separately federates retrieval and generation components
  • D-RAG: Blockchain-based decentralized RAG with privacy-preserving consensus protocol
  • Privacy-Preserving Federated Embedding Learning: Collaborative training of client-side RAG retrieval models with parameter aggregation on central server [arXiv 2504.19101]

Enterprise driver: EU AI Act + GDPR create dual pressures for both data privacy and auditability that federated approaches naturally address.

Evidence strength: MODERATE — Growing body of academic work but minimal production deployment evidence.

4.6 “RAG 3.0” / “Post-RAG” — What Comes After?

Three competing visions:

Vision 1: CAG replaces RAG for static workloads (UCStrategies, 2026)

  • Context windows expand to 10M+ tokens
  • CAG preloads entire knowledge bases, eliminating retrieval overhead
  • 40.5x speed improvement over standard RAG on benchmarks
  • Prediction: Standard RAG dies; CAG handles static, Agentic RAG handles dynamic

Vision 2: The Knowledge Runtime (NStarX, 2026-2030)

  • RAG evolves from “retrieval pipeline bolted onto LLMs” to “autonomous knowledge runtime”
  • Orchestrates retrieval, reasoning, verification, access control, and audit trails
  • Analogous to Kubernetes for information flow
  • Driven by: EU AI Act compliance, institutional knowledge loss (retirement crisis), economic need for verifiable truth

Vision 3: Memory supersedes RAG (VentureBeat, Oracle, 2026)

  • “Contextual memory will surpass RAG for agentic AI in 2026”
  • RAG retrieves documents; Memory understands context
  • The winners will do both, but memory is the differentiator
  • Shift: RAG —> Agentic RAG —> Full Memory Systems

PARALLAX assessment: These visions are complementary, not competing. CAG handles the “known knowledge” tier. Agentic RAG handles “dynamic discovery.” Memory systems handle “learned experience.” The knowledge runtime is the orchestration layer that decides which to invoke.
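That assessment implies a routing rule. A sketch under stated assumptions: the token threshold and the input signals are illustrative, not taken from any cited source:

```python
# Three-tier knowledge routing: static corpora small enough to preload go to
# CAG, session-dependent questions go to the memory system, everything else
# goes to agentic RAG for dynamic discovery.

CONTEXT_BUDGET_TOKENS = 1_000_000   # assumed preload ceiling for CAG

def route_knowledge(corpus_tokens, corpus_is_static, needs_session_history):
    if needs_session_history:
        return "memory"
    if corpus_is_static and corpus_tokens <= CONTEXT_BUDGET_TOKENS:
        return "cag"
    return "agentic_rag"
```

A knowledge runtime in the NStarX sense would sit exactly where this function sits, with richer signals (freshness, access control, verification requirements) feeding the decision.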


Cross-Stream Synthesis

The Convergence Map

                 2024                    2026                        2028
                  |                       |                           |
 RETRIEVAL:   Basic RAG ---------> Agentic RAG + GraphRAG ----> Knowledge Runtime
                                   + CAG fork                   + Federated RAG
                                                                + Multimodal
                  |                       |                           |
 REASONING:   CoT + ReAct -------> Extended Thinking ---------> RLM + Reasoning
                                   (o1/o3/R1/Claude)            as native agent
                                                                capability
                  |                       |                           |
 AGENCY:      Single agents -----> Multi-agent + Memory ------> Autonomous
              (AutoGPT v1)         (CrewAI, LangGraph)          Knowledge Agents
                                   + GraphRAG Memory            + Self-improving
                                                                memory
                  |                       |                           |
 CONVERGENCE: Separate ----------> Shared concepts -----------> Unified
              concerns             (context engineering)        Knowledge Runtime

Key Cross-Stream Connections

  1. RLM enables better Agentic RAG: Recursive decomposition lets agents handle arbitrarily complex multi-hop queries by breaking them into manageable retrieval sub-tasks.

  2. Agent memory IS the evolution of RAG: Episodic + semantic + procedural memory is RAG generalized. RAG provides semantic memory; the agent adds episodic and procedural layers.

  3. Reasoning models reduce RAG dependence for some tasks: With 128K-2M token windows and strong reasoning, models can preload more context (CAG) and reason better over what they retrieve (fewer retrieval steps needed).

  4. Knowledge graphs bridge all three: GraphRAG serves retrieval (structured search), reasoning (relationship traversal), and agency (dynamic memory updates via Graphiti).

  5. The “context engineering” thesis unifies everything: All three streams are ultimately about one problem — putting the right information in the right format at the right time into the model’s context window.

Evidence Strength Summary by Stream

| Stream | Overall Evidence | Strongest Area | Weakest Area |
|---|---|---|---|
| Agentic RAG | STRONG | Multi-step retrieval benchmarks | Enterprise deployment cost data |
| RLM/REPL | MODERATE | Conceptual framework, early benchmarks | Independent replication, production evidence |
| Agent-RAG-RLM Triangle | MODERATE-STRONG | Framework comparisons, memory taxonomy | Multi-agent heterogeneous RAG in production |
| Future Architectures | MODERATE | Multimodal RAG, hybrid retrieval | Federated RAG in production, personalized RAG |

Key Researchers & Labs

Agentic RAG

  • Shunyu Yao (Princeton/OpenAI) — ReAct framework
  • Akari Asai (UW/Meta) — Self-RAG (ICLR 2024)
  • Microsoft Research — GraphRAG, A-RAG
  • Google Research — Speculative RAG
  • LangChain/LangGraph team — Production agentic RAG patterns
  • Arize AI — Agentic RAG evaluation and observability

RLM/REPL

  • Alex Zhang et al. (MIT) — Recursive Language Models paper (late 2025)
  • LoopLM/Ouro researchers — Recurrent depth substituting for scale
  • DeepSeek — R1 long-context reasoning architecture
  • OpenAI — o1/o3/o4-mini reasoning models
  • Anthropic — Claude extended thinking, “think” tool

Knowledge Graphs + RAG

  • Microsoft Research — GraphRAG
  • HKUDS — LightRAG (EMNLP 2025)
  • Neo4j/Zep — Graphiti temporal knowledge graph
  • DEEP-PolyU — Awesome-GraphRAG curation

Agent Memory

  • Guibin Zhang et al. — “Memory in the Age of AI Agents” (arXiv 2512.13564)
  • Letta/MemGPT team — Stateful memory architecture
  • Mem0 — Memory-as-a-service
  • AWS Bedrock team — AgentCore Memory
  • Cognee — Memory pipeline architecture

Future Architectures

  • NStarX — Knowledge runtime thesis
  • Glean — Emerging agent stack/context engineering
  • Various federated RAG teams (HyFedRAG, D-RAG, FairRAG)
  • PersonaRAG, PGraphRAG, ARAG teams — Personalized RAG

Evidence Gaps & Contradictions

Gaps (Where Evidence Is Thin)

  1. Production cost data for agentic RAG: No standardized, public cost-vs-accuracy comparisons across large-scale deployments. Most data is vendor-selected metrics.

  2. RLM independent replication: Zhang et al.’s results are impressive but from a single lab. Awaiting independent benchmarking and replication.

  3. Cross-model RLM+RAG comparisons: No unified benchmark comparing RLM+RAG across model sizes (small/medium/large) and retrieval modalities (dense vs sparse vs graph) under latency constraints.

  4. Federated RAG in production: Active academic research but near-zero documented production deployments.

  5. Personalized RAG at scale: Growing paper count but few production case studies with measurable outcomes.

  6. Long-horizon robustness: No longitudinal studies of model drift and retrieval decay in recursive/agentic RAG pipelines over time.

  7. Safety under recursive prompting: Attribution pipelines and filtering are recommended but rigorous, reproducible studies of hallucination reduction across tasks and scales are sparse.

Contradictions

  1. “RAG is dead” vs “RAG is evolving”: Some sources claim standard RAG is obsolete (UCStrategies), while others frame the same developments as RAG’s natural evolution (NStarX, RAGFlow). Resolution: standard/naive RAG is indeed being replaced, but the RAG concept broadens rather than disappears.

  2. Context windows solve everything vs context windows solve nothing: Some argue expanding windows (2M+ tokens) eliminate RAG need; others argue windows are working memory, not storage. Resolution: both are right for different workload sizes. CAG works for <1M token corpora; beyond that, retrieval remains necessary.

  3. Convergence vs specialization: NStarX and Glean argue for unified knowledge runtimes; UCStrategies argues for a fork (CAG vs Agentic RAG). Resolution: convergence at the conceptual/orchestration layer; specialization at the implementation layer.

  4. Memory replaces RAG vs memory extends RAG: VentureBeat claims memory “surpasses” RAG; Oracle says “memory extends RAG.” Resolution: semantic memory IS RAG; episodic and procedural memory extend beyond RAG. The taxonomy matters.


Master Source Index

Peer-Reviewed / Strong Evidence

Preprints / Moderate Evidence

Industry / Engineering Sources

Governance & Risk Sources


PARALLAX module note: This research synthesizes 80+ sources across 4 independent research streams. Evidence is strongest for Agentic RAG patterns and agent memory architectures, moderate for RLM/REPL (single-lab origin), and growing rapidly for future architectures (federated, personalized, multimodal). The convergence thesis is supported at the conceptual level but implementation remains fragmented across competing frameworks and approaches.

Strategic Synthesis

  • Define one owner and one decision checkpoint for the next iteration.
  • Measure both speed and reliability so optimization does not degrade quality.
  • Close the loop with one retrospective and one execution adjustment.

Next step

If you want your brand to be represented with context quality and citation strength in AI systems, start with a practical baseline and a priority sequence.