VZ editorial frame
Read this piece through one operating lens: AI does not automate first; it amplifies first. If the underlying decision architecture is clear, AI scales clarity. If it is noisy, AI scales noise and cost.
VZ Lens
In VZ framing, the point is not novelty but decision quality under uncertainty. RAG strategy fails when viewed from a single technical angle. A parallax approach aligns product, governance, and knowledge operations in one frame. The practical edge comes from turning this into repeatable decision rhythms.
Module: PARALLAX (Multi-Perspective Research Engine) | GFIS
Date: March 9, 2026
Status: Complete — 4 streams synthesized
Evidence base: 80+ sources (academic papers, industry reports, engineering blogs, framework docs)
The Prague Window
I’m sitting on the windowsill of the coworking space. Rain streaks down the old-fashioned glass, and behind it, the Vltava flows gray. Four research streams are running in parallel on my laptop—Agentic RAG, Recursive LM, REPL convergence, future architectures. My coffee is already cold.
Outside, the bridge vanishes into the fog, and a question crystallizes within me: what happens when these lines are not just parallel, but converge? When retrieval ceases to be a passive data carrier and becomes an active participant in the thought cycle?
My finger scrolls through a line of code that no longer just queries, but asks back. This isn’t about what we can load. It’s about the system finally learning to process what it has loaded.
Table of Contents
- Stream 1: Agentic RAG
- Stream 2: Recursive Language Models & REPL
- Stream 3: The Agent-RAG-RLM Triangle
- Stream 4: Future Architectures (2026-2028)
- Cross-Stream Synthesis: The Convergence Map
- Key Researchers & Labs
- Evidence Gaps & Contradictions
- Master Source Index
Stream 1: Agentic RAG
1.1 What Is Agentic RAG? How Does It Differ from Traditional RAG?
Traditional RAG is a five-stage stateless pipeline: user prompt —> retrieval query —> returned documents —> augmented prompt —> LLM generation. Each query is independent; no planning, no re-retrieval, no tool use [IBM RAG Primer; arXiv 2410.12837].
Agentic RAG augments this with autonomous agentic components:
- Planners that decompose complex queries into sub-queries
- Evaluators that assess retrieval quality and trigger re-retrieval
- Tool registries enabling web search, calculators, DB queries, API calls
- Persistent memory across steps and sessions
- Control loops that iterate until task completion
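The components above can be sketched as a minimal control loop. This is an illustrative toy, not any framework's API: `plan`, `retrieve`, and `evaluate` are stand-ins for a real planner, retriever, and retrieval evaluator, and the "corrective rewrite" is a deliberately naive placeholder.

```python
# Hypothetical sketch of an agentic RAG control loop: a planner decomposes
# the query, an evaluator scores each retrieval, and the loop re-retrieves
# until the evidence is judged sufficient or the step budget runs out.

def plan(query):
    """Planner stand-in: naively split a compound query into sub-queries."""
    return [q.strip() for q in query.split(" and ")]

def retrieve(sub_query, corpus):
    """Retriever stand-in: keyword overlap against an in-memory corpus."""
    terms = set(sub_query.lower().split())
    return [doc for doc in corpus if terms & set(doc.lower().split())]

def evaluate(docs):
    """Evaluator stand-in: evidence is 'sufficient' if anything came back."""
    return len(docs) > 0

def agentic_rag(query, corpus, max_steps=3):
    evidence = []
    for sub_query in plan(query):
        for _attempt in range(max_steps):
            docs = retrieve(sub_query, corpus)
            if evaluate(docs):
                evidence.extend(docs)
                break
            sub_query = sub_query.split()[-1]  # toy corrective rewrite: relax the query
    return evidence

corpus = ["Prague sits on the Vltava river", "RAG augments prompts with retrieval"]
print(agentic_rag("Vltava river and retrieval augmentation", corpus))
```

The design point the loop illustrates: the retrieval step sits inside an evaluate-and-retry cycle rather than running exactly once, which is the essential difference from the stateless pipeline above.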
Evidence strength: STRONG — Multiple peer-reviewed surveys (arXiv 2501.09136, arXiv 2602.03442), industry adoption documented by IBM, Microsoft, Arize, LangChain.
Pragmatic taxonomy (recommended):
| Pattern | Core Mechanism | Latency | Complexity | Best For |
|---|---|---|---|---|
| Classic RAG | Single retrieve —> augment —> generate | Low | Low | Simple QA, customer support |
| Iterative RAG | Repeated retrieve+generate with loop policy | Medium | Medium | Multi-hop QA, legal research |
| Agentic RAG | Planner + actors + tools + memory + evaluators | High | High | Enterprise automation, multi-system workflows |
| Agent-with-RAG | External agent uses RAG as one bounded tool | Variable | Medium | Flexible agent architectures |
1.2 Key Papers: Agents Using Retrieval as a Tool
| Paper/System | Year | Core Contribution | Evidence |
|---|---|---|---|
| ReAct (Yao et al.) | 2022/2023 | Thought-action-observation loop with retrieval as an action | STRONG |
| Self-RAG (Asai et al., ICLR 2024) | 2024 | LLM trained to decide when to retrieve and self-critique via reflection tokens | STRONG |
| CRAG (Corrective RAG) | 2024 | Plug-and-play retrieval evaluator that triggers refinement | STRONG |
| Speculative RAG (Google Research) | 2024 | Small model drafts, large model verifies; ~12.97% accuracy gain, 51% latency reduction | STRONG |
| A-RAG (arXiv 2602.03442) | 2025 | Hierarchical retrieval interfaces scaling agentic RAG | MODERATE |
| AIR-RAG (Neurocomputing 2026) | 2026 | Adaptive iterative retrieval without retraining retriever | STRONG (peer-reviewed) |
| PlanRAG (arXiv 2601.19827) | 2026 | Plan-then-retrieve: structured plan targets retrieval | MODERATE |
1.3 Multi-Step Retrieval Approaches
Multi-Step RAG Variants
Iterative RAG Self-RAG CRAG
+-----------+ +-----------+ +-----------+
| Retrieve | | LLM with | | Retrieve |
| Generate | | reflection| | Evaluate |
| Loop until| | tokens | | Re-fetch |
| confident | | decides | | if poor |
+-----------+ +-----------+ +-----------+
| | |
v v v
Speculative RAG AIR-RAG PlanRAG
+-----------+ +-----------+ +-----------+
| Small LM | | Adaptive | | Plan first|
| drafts | | iterative | | Retrieve |
| Large LM | | refinement| | to plan |
| verifies | | no retrain| | Re-plan |
+-----------+ +-----------+ +-----------+
Key finding: Retrieval contributes ~41% of end-to-end latency and nearly doubles TTFT in production systems [arXiv 2412.11854v1]. Multi-step approaches multiply this cost but substantially improve correctness on complex queries.
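One variant from the diagram above, the draft/verify split of Speculative RAG, can be sketched in a few lines. This is a hedged toy under stated assumptions: `draft` and `verify_score` stand in for a small drafting model and a large verifying model, and the overlap scoring is purely illustrative.

```python
# Toy sketch of the Speculative RAG pattern: a cheap "drafter" proposes an
# answer per retrieved chunk, and a stronger "verifier" picks the draft best
# supported by the query. Neither function is a real vendor API.

def draft(chunk, query):
    """Small-model stand-in: answer with the chunk text itself."""
    return chunk

def verify_score(draft_text, query):
    """Large-model stand-in: score a draft by term overlap with the query."""
    q = set(query.lower().split())
    d = set(draft_text.lower().split())
    return len(q & d) / max(len(q), 1)

def speculative_rag(query, chunks):
    drafts = [draft(c, query) for c in chunks]               # cheap, parallelizable
    scored = [(verify_score(d, query), d) for d in drafts]   # single verify pass
    return max(scored)[1]

chunks = ["the Vltava flows through Prague", "BM25 is a sparse retriever"]
print(speculative_rag("which river flows through Prague", chunks))
```

The latency logic follows directly: drafting is cheap and parallel, so the expensive model runs once over candidates instead of once per retrieval round.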
1.4 When Does a RAG Pipeline Become an “Agent”?
Agenticity Checklist (synthesized from Microsoft, Arize, Towards Data Science):
A RAG system qualifies as “agentic” if it satisfies 4+ of these criteria, including items 1, 2, and 5:
1. Explicit planner/actor loop — can initiate multiple retrieval/generation/tool steps
2. Autonomous decision logic — decides to retrieve, re-retrieve, call tools without human prompts
3. Tool invocation APIs — can call external tools (web search, calculators, DBs)
4. Persistent memory/state — retains context across steps or sessions
5. Evidence evaluation & re-retrieval — validators that trigger corrective retrieval
6. Audit trail — logs queries, retrievals, tool calls, reasoning steps
7. Orchestration & fault tolerance — retry logic, observability, error handling
The boundary is fuzzy by design: The “RAG agent” vs “agent with RAG” distinction is functional rather than nominal. Sources use both phrasings inconsistently. The checklist above resolves the ambiguity for governance and architecture decisions.
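As a governance aid, the checklist reduces to a small decision rule: 4+ criteria met, and the mandatory three (planner loop, autonomous decisions, evidence evaluation) all present. The criterion names below are illustrative labels, not a standard vocabulary.

```python
# Minimal sketch of the agenticity checklist as a decision rule.
# Criterion names are hypothetical shorthand for the seven items above.

CRITERIA = ["planner_loop", "autonomous_decisions", "tool_apis",
            "persistent_memory", "evidence_evaluation", "audit_trail",
            "fault_tolerance"]
MANDATORY = {"planner_loop", "autonomous_decisions", "evidence_evaluation"}

def is_agentic(capabilities):
    """capabilities: set of criterion names the system satisfies."""
    met = capabilities & set(CRITERIA)
    return len(met) >= 4 and MANDATORY <= met

print(is_agentic({"planner_loop", "autonomous_decisions",
                  "evidence_evaluation", "audit_trail"}))   # meets 4 incl. mandatory
print(is_agentic({"tool_apis", "persistent_memory",
                  "audit_trail", "fault_tolerance"}))       # 4 met, mandatory missing
```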
1.5 Real-World Agentic RAG Deployments
| Deployment | Domain | Architecture | Reported Results | Evidence |
|---|---|---|---|---|
| ALMA (AWS Bedrock) | Healthcare | Bedrock + custom RAG + SSO | 98% accuracy on medical residency exam, 65% routine adoption, 98% satisfaction | MODERATE (vendor blog) |
| CFA Institute patterns | Finance | Vector stores (FAISS/Chroma) + SQL agents + web APIs | Internal retrieval reduces hallucination vs web retrieval | MODERATE (industry content) |
| Legal research agents | Legal | Agentic decomposition of statutes/cases/filings | Days-long research compressed to interactive sessions | WEAK (anecdotal) |
| Onyx workplace | Enterprise | Direct data indexing + agentic orchestration | High win rate on 99 workplace questions; avg ~34.7s response | MODERATE (product benchmark) |
1.6 The “RAG Agent” vs “Agent with RAG” Distinction
Does it matter? Yes, for governance and architecture:
- RAG Agent (agentic RAG): The RAG pipeline itself has agentic control — planner, evaluator, tool-use, memory are embedded in the retrieval-generation loop.
- Agent with RAG: An external agent (AutoGPT, CrewAI, etc.) treats RAG as one bounded tool among many; the agent’s intelligence is separate from the retrieval system.
Practical implication: “RAG Agent” requires deeper integration of governance controls (audit trails, access controls) within the RAG pipeline itself. “Agent with RAG” can apply governance at the agent orchestration layer.
Stream 2: Recursive Language Models & REPL
2.1 What Are Recursive Language Models?
Definition (Zhang et al., MIT, late 2025): RLMs are an inference strategy — not a new model class — where language models recursively call themselves or other LLMs for intermediate computation. The key innovation: the full context is stored externally in a REPL-like environment, and the LLM emits code/sub-tasks to process it incrementally.
RLM Architecture (Zhang et al.)
User Query + Context (10M+ tokens)
|
v
+------------------+
| RLM Controller | <-- Thin wrapper around LLM
| (REPL Environment)|
+------------------+
|
+----+----+----+
| | |
v v v
Sub-task Sub-task Sub-task
(chunk1) (chunk2) (chunk3)
| | |
+----+----+----+
|
v
Aggregate Results
|
v
Final Answer
Three core primitives (confirmed across MIT paper, Google ADK implementation, practitioner reports):
- Programmatic control over context via a REPL-like execution loop
- Recursive delegation for task and context decomposition
- Agent-mediated aggregation of partial results
State machine formalization: An RLM has transitions including CODE_EXEC, llm_query, and FINAL, with configuration parameters: max_depth, max_subcalls, max_cost, timeout_seconds.
Evidence strength: MODERATE — Single primary paper (Zhang et al., MIT) with growing implementation ecosystem (Google ADK, DSPy modules). Not yet widely replicated in peer-reviewed venues.
2.2 The REPL Paradigm Applied to LLMs
The REPL (Read-Eval-Print Loop) framing treats the LLM workflow as:
- Read: Full document/context stored in a Python variable (external to LLM window)
- Eval: LLM emits code to inspect, slice, filter, and analyze the context
- Print: Intermediate results are aggregated
- Loop: Re-enter with refined sub-tasks until answer is complete
Who proposed this? Alex Zhang and colleagues at MIT (paper published late 2025). The codebase is available as rlm_repl on GitHub. Google’s ADK community has adopted and extended the pattern.
Key claim: RLMs can process contexts of 10M+ tokens — far beyond any native context window — by decomposing the work recursively. Preliminary results show:
- 91.3% accuracy on multi-document retrieval tasks (GPT-5-level)
- 62% accuracy on LongBench-v2 CodeQA vs 22% for non-recursive baselines
Evidence strength: MODERATE — Impressive numbers from a single lab; awaiting independent replication.
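The RLM/REPL mechanism described in this stream can be sketched as follows. This is a hedged toy, not the `rlm_repl` codebase: `llm_query` is a keyword-matching stand-in for a model call, while `max_depth` and `max_subcalls` mirror the configuration parameters named in the state-machine formalization above.

```python
# Sketch of the RLM loop: the full context lives in a Python list outside
# any model window; the controller recursively splits it until chunks fit a
# budget, issues an llm_query() per leaf chunk, and aggregates the results.

def llm_query(chunk, question):
    """Model stub: return lines of the chunk mentioning the question term."""
    return [line for line in chunk if question.lower() in line.lower()]

def rlm(context, question, max_depth=3, max_subcalls=8, chunk_size=2):
    calls = [0]
    def recurse(chunk, depth):
        if len(chunk) <= chunk_size or depth >= max_depth or calls[0] >= max_subcalls:
            calls[0] += 1
            return llm_query(chunk, question)   # CODE_EXEC -> llm_query transition
        mid = len(chunk) // 2                   # recursive delegation: split context
        return recurse(chunk[:mid], depth + 1) + recurse(chunk[mid:], depth + 1)
    return recurse(context, 0)                  # FINAL: aggregated partial results

context = ["alpha note", "Vltava flows gray", "beta note", "the Vltava bridge"]
print(rlm(context, "vltava"))
```

The key property the sketch preserves: no single call ever sees the whole context, so the approach scales (in principle) to inputs far beyond a native context window.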
2.3 How RLM/REPL Relates to Existing Paradigms
| Paradigm | Relationship to RLM | Key Difference |
|---|---|---|
| Chain-of-Thought (CoT) | RLM can use CoT within each recursive step | CoT is linear; RLM is tree/graph-structured |
| Tree-of-Thought (ToT) | RLM naturally implements ToT via recursive branching | ToT explores thought units; RLM decomposes context |
| Graph-of-Thought (GoT) | RLM sub-tasks can form dependency graphs | GoT models thought dependencies; RLM models context dependencies |
| Self-reflection / Reflexion | RLM supports generate-critique-refine within loops | Reflexion is about output quality; RLM is about context management |
| Iterative RAG | RLM + RAG = recursive retrieval-generation cycles | Iterative RAG retrieves from external stores; RLM processes what’s already loaded |
Critical distinction: CoT/ToT/GoT are reasoning strategies about how to think. RLM is a context management strategy about how to handle unbounded input. They are complementary, not competing.
2.4 RLM + RAG Integration
Recursive retrieval-generation cycles combine the best of both:
Query --> RLM decomposes into sub-queries
|
+--> Sub-query 1 --> RAG retrieval --> Generation
|
+--> Sub-query 2 --> RAG retrieval --> Generation
|
+--> Sub-query 3 --> RAG retrieval --> Generation
|
v
RLM aggregates sub-results
|
v
Final synthesized answer
Evidence from multi-step retrieval benchmarks:
- Multi-step retrieval shows >50% improvement over single-step on defined end-to-end evaluation tasks [FRAMES benchmark]
- RT-RAG (hierarchical tree decomposition) achieves +7.0% F1 and +6.0% EM over SOTA on multi-hop QA benchmarks (MuSiQue, 2WikiMQA, HotpotQA)
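The recursive retrieval-generation cycle diagrammed above can be sketched as a parallel fan-out. The `decompose`, `retrieve_and_generate`, and aggregation helpers are assumptions standing in for an RLM controller and a RAG pipeline; only the shape of the cycle is taken from the diagram.

```python
# Illustrative RLM + RAG cycle: decompose the query into sub-queries, run a
# retrieval+generation stub for each in parallel, then aggregate.
from concurrent.futures import ThreadPoolExecutor

def decompose(query):
    """RLM stand-in: split the query into sub-queries."""
    return [q.strip() for q in query.split(";")]

def retrieve_and_generate(sub_query, store):
    """RAG stand-in: substring retrieval plus a templated 'generation'."""
    hits = [doc for doc in store if sub_query.lower() in doc.lower()]
    return f"{sub_query}: {hits[0] if hits else 'no evidence'}"

def rlm_rag(query, store):
    subs = decompose(query)
    with ThreadPoolExecutor() as pool:  # fan sub-queries out in parallel
        partials = list(pool.map(lambda s: retrieve_and_generate(s, store), subs))
    return " | ".join(partials)         # RLM aggregates sub-results

store = ["GraphRAG builds entity graphs", "CAG preloads the corpus"]
print(rlm_rag("GraphRAG; CAG", store))
```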
2.5 Does Recursive Prompting Improve RAG Quality?
Yes, with caveats.
| Benchmark | Improvement | Method | Evidence |
|---|---|---|---|
| FRAMES (end-to-end RAG) | 0.408 —> 0.66 accuracy | Multi-step reasoning | STRONG |
| MuSiQue/2WikiMQA/HotpotQA | +7% F1, +6% EM | RT-RAG hierarchical decomposition | STRONG |
| Game of 24 (ToT) | 4% —> 74% success | Tree-of-Thought vs CoT | STRONG |
| TruthfulQA (Reflexion) | Significant gains for smaller models | Generate-critique-refine loops | STRONG |
| LongBench-v2 CodeQA | 22% —> 62% | RLM recursive processing | MODERATE |
Caveats: Latency increases substantially with each recursive step, and LLM generation latency often dominates total response time, so retrieval speedups on the order of ~100ms can be invisible to end users.
2.6 The “Reasoning Loop” Models and RAG
| Model Family | Reasoning Mechanism | RAG Relationship |
|---|---|---|
| DeepSeek R1 | 671B params, 37B active, 128K context (YaRN), MLA + MoE | Long-context architecture reduces need for retrieval; strong multi-hop performance natively |
| OpenAI o1/o3/o4-mini | RL-trained reasoning with CoT; tool use (web, Python, image gen) | Reasoning + tool use enables agentic RAG patterns within the model API itself |
| Claude Extended Thinking | 128K internal token window for reasoning; “think” tool | Extended reasoning budget improves factuality; can invoke tools/retrieval within thinking |
| LoopLM (Ouro) | Recurrent transformer stack reuse (4 iterations) | 1.4B model matches 12B SOTA on select benchmarks; recurrence substitutes for scale |
Key insight: These reasoning models are making RAG both more powerful and less necessary simultaneously. Long context windows (128K-2M tokens) mean more knowledge can be preloaded (CAG pattern), while reasoning capabilities mean the model can better judge when retrieval IS needed and what to do with results.
Evidence strength: STRONG for DeepSeek R1 and OpenAI o-series benchmarks. MODERATE for Claude extended thinking (fewer public benchmarks). MODERATE for LoopLM (single-lab results).
Stream 3: The Agent-RAG-RLM Triangle
3.1 How Modern AI Agent Frameworks Use RAG
| Framework | RAG Integration | Memory Model | Architecture |
|---|---|---|---|
| LangGraph | Deep — retrieval as tool nodes in state graphs; generate_or_query_or_respond decision nodes | State-based memory with checkpointing and persistence | Graph-based workflow orchestration |
| CrewAI | Built-in RAG tools; role-based memory with RAG | Structured, role-based memory with RAG | Role-playing multi-agent crews |
| AutoGen/AG2 | RAG through tool registration; multi-turn retrieval | Conversational history storage | Multi-agent conversation framework |
| OpenAI Agents SDK | Built-in vector store tools (file_search); web search | Thread-based context persistence | Production-ready single-agent framework |
| LlamaIndex Agents | Native RAG — grounded, reliable retrieval-first design | Index-based retrieval with reranking | RAG-centric agent architecture |
Key observation (DataCamp comparison, 2026): “LangGraph provides state-based memory with checkpointing and persistence. CrewAI uses structured, role-based memory with RAG, while AutoGen stores conversational history.” Each framework makes fundamentally different architectural choices about where RAG sits relative to the agent.
Evidence strength: STRONG — Well-documented in framework docs, extensive practitioner reporting, multiple comparison guides.
3.2 The Memory Problem in Agents
The memory taxonomy mirrors human cognition [arXiv 2512.13564, “Memory in the Age of AI Agents”]:
Agent Memory Architecture
+-------------------------------------------------------+
| WORKING MEMORY |
| (Context window: 200K-2M tokens) |
| Current conversation + active reasoning |
+-------------------------------------------------------+
| | |
v v v
+----------------+ +------------------+ +------------------+
| EPISODIC | | SEMANTIC | | PROCEDURAL |
| MEMORY | | MEMORY | | MEMORY |
| "What happened"| | "What I know" | | "How to do it" |
| Interaction | | Domain knowledge | | Skills, workflows|
| history, | | concepts, facts | | learned patterns |
| past sessions | | (RAG/vector DB) | | |
+----------------+ +------------------+ +------------------+
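The taxonomy above can be made concrete with a toy class: three stores behind one `recall()` call, with the context window treated as working memory assembled per step rather than as long-term storage. The dict/list representations are deliberate simplifications of what would be a vector DB (semantic) and an event log (episodic).

```python
# Toy agent memory following the episodic/semantic/procedural taxonomy.

class AgentMemory:
    def __init__(self):
        self.episodic = []      # "what happened": interaction history
        self.semantic = {}      # "what I know": facts (RAG/vector DB stand-in)
        self.procedural = {}    # "how to do it": named workflows

    def remember_event(self, event):
        self.episodic.append(event)

    def learn_fact(self, key, fact):
        self.semantic[key] = fact

    def learn_skill(self, name, steps):
        self.procedural[name] = steps

    def recall(self, topic):
        """Assemble working memory: only what is relevant to this step."""
        return {
            "episodes": [e for e in self.episodic if topic in e],
            "facts": self.semantic.get(topic),
            "skill": self.procedural.get(topic),
        }

mem = AgentMemory()
mem.remember_event("user asked about vltava levels")
mem.learn_fact("vltava", "river through Prague")
mem.learn_skill("vltava", ["query gauge API", "summarize trend"])
print(mem.recall("vltava"))
```

The point of `recall()` is the one the section argues: the context window receives a selected slice per step, never the full stores.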
Critical 2026 insight (VentureBeat, Oracle): “Large context windows (200K-400K tokens in Claude Opus 4.5, GPT-5.2, up to 2M in Gemini 3 Pro) have NOT solved agent memory. Injecting full conversation history into every API call creates unsustainable cost and latency. Context windows are working memory — they’re not long-term storage.”
The spectrum is shifting: Traditional RAG —> Agentic RAG —> Full Memory Systems. VentureBeat predicts contextual memory will surpass RAG for agentic AI in 2026.
Key products in the memory space (2026):
- Mem0: Memory-as-a-service for agents
- Letta/MemGPT: Stateful memory server with explicit editable memory blocks
- Cognee: Memory as pipeline (ingestion —> structuring —> recall)
- Amazon Bedrock AgentCore Memory: Managed extraction, consolidation, retrieval
- Graphiti (Zep/Neo4j): Knowledge graph memory for temporal agent state
Unsolved tension (Oracle/GDPR): GDPR right-to-be-forgotten requires data deletion, but EU AI Act (August 2026) requires 10-year audit trails for high-risk systems. This creates an architectural paradox for agent memory systems.
Evidence strength: STRONG — ICLR 2026 workshop proposal on MemAgents, comprehensive survey (arXiv 2512.13564), multiple production implementations.
3.3 Knowledge Graphs + RAG + Agents
GraphRAG (Microsoft Research, 2024):
- Creates entity-centric knowledge graphs from input corpus
- LLMs precompute community summaries
- Dramatically improves reasoning over relationship-rich queries
- Enables queries that require traversing relationships across data types
LightRAG (EMNLP 2025):
- Dual retriever system: local retriever for entity-level questions, global retriever for complex subgraph reasoning
- Lightweight, fast, suitable for production
- Enhanced extraction accuracy for open-source LLMs (Qwen3-30B-A3B)
Graphiti (Neo4j/Zep):
- Temporal knowledge graph memory for agents
- Unlike GraphRAG’s static community summaries, Graphiti handles evolving, temporal data
- Designed for “agentic world” where memory must update in real-time
Agentic-KGR (OpenReview): Co-evolutionary knowledge graph construction through multi-agent reinforcement learning. When integrated with GraphRAG, achieves superior QA performance with gains in both accuracy and knowledge coverage.
Knowledge Graph + RAG + Agents Integration
Agent Layer (Planning, Reasoning, Tool Use)
| |
v v
GraphRAG Layer Vector RAG Layer
(Entity-Relationship (Semantic Similarity
Traversal) Search)
| |
v v
Knowledge Graph Vector Database
(Neo4j, etc.) (Qdrant, Pinecone)
| |
+-------- + ---------+
|
v
Unified Knowledge
Representation
Evidence strength: STRONG for GraphRAG (Microsoft Research paper + open-source). STRONG for LightRAG (EMNLP 2025). MODERATE for Graphiti and agentic KG construction.
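The integration diagram above can be sketched as a routing rule: traverse the knowledge graph when the query names a known entity, fall back to similarity search otherwise. The graph, documents, and overlap scoring are toy assumptions; a real system would use Neo4j-style traversal and dense embeddings.

```python
# Toy unified retrieval: one-hop entity-graph traversal with a
# similarity-search fallback (plain term overlap stands in for vectors).

GRAPH = {  # entity -> list of (relation, neighbor) edges
    "vltava": [("flows_through", "prague")],
    "prague": [("capital_of", "czechia")],
}
DOCS = ["Prague is the capital of Czechia", "RAG retrieves documents"]

def graph_retrieve(entity):
    return [f"{entity} --{rel}--> {dst}" for rel, dst in GRAPH.get(entity, [])]

def vector_retrieve(query):
    q = set(query.lower().split())
    return max(DOCS, key=lambda d: len(q & set(d.lower().split())))

def unified_retrieve(query):
    for token in query.lower().split():
        edges = graph_retrieve(token)
        if edges:                      # relationship-rich query: traverse graph
            return edges
    return [vector_retrieve(query)]   # otherwise: semantic similarity search

print(unified_retrieve("where does the vltava flow"))
print(unified_retrieve("how does RAG work"))
```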
3.4 Multi-Agent Systems with Different Knowledge Bases
Emerging pattern: Specialized agents with specialized retrieval.
The Multi-Agent RAG Framework for Entity Resolution (MDPI, 2025) demonstrates:
- Modular coordination with specialized agents (direct matching, indirect matching, household clustering)
- Each agent writes to logically disjoint sections of shared state
- Orchestrator deterministically merges results from parallel branches
- LangGraph-based unified orchestration
Oracle A2A Protocol + LangChain (2025): Scalable multi-agent RAG system where agents communicate via the Agent-to-Agent (A2A) protocol, each with different knowledge bases and capabilities.
The pattern: Different agents for different knowledge domains, connected through:
- Shared state graphs (LangGraph)
- Message-passing protocols (A2A, MCP)
- Hierarchical orchestration (planner agent delegates to specialist agents)
Evidence strength: MODERATE — Documented in framework guides and early academic papers; few large-scale production case studies published.
3.5 The Convergence Thesis
Are RAG, agents, and RLM converging into a single architecture?
Evidence FOR convergence:
- NStarX thesis (2026-2030): “RAG will undergo a fundamental architectural shift — from a retrieval pipeline bolted onto LLMs to an autonomous knowledge runtime that orchestrates retrieval, reasoning, verification, and governance as unified operations.”
- Glean’s emerging agent stack (2026): Context engineering is the unifying discipline — “the delicate art and science of filling the context window with just the right information for the next step” (Andrej Karpathy). RAG, agent memory, and reasoning all serve this same purpose.
- The agentic taxonomy (arXiv 2601.12560): Unified taxonomy breaks agents into Perception, Brain, Planning, Action, Tool Use, and Collaboration — with RAG as one tool within the Action layer and reasoning (including RLM-style recursion) within the Brain layer.
- RLM in Google ADK (2026): Recursive Language Models are being implemented directly within agentic frameworks, treating context management as an agent capability.
Evidence AGAINST full convergence (or at least for sustained specialization):
- The architecture fork (UCStrategies, 2026): “Standard RAG is dead” — the field is splitting into CAG (for static/small knowledge bases) and Agentic RAG (for complex reasoning), not converging into one.
- Latency-accuracy tradeoff: Simple tasks don’t need agent overhead. CAG completes in 2.33s vs RAG’s 94.35s. One-size-fits-all is an anti-pattern.
- Framework fragmentation: LangGraph, CrewAI, AutoGen, OpenAI SDK all make fundamentally different architectural choices. No single winning pattern has emerged.
PARALLAX assessment: The convergence is happening at the conceptual level (unified knowledge runtime) but NOT at the implementation level (no single architecture dominates). The field is converging on a shared understanding that retrieval, reasoning, and agent control are interconnected concerns, while diverging on how to implement that understanding.
The Convergence Map (Conceptual)
REASONING
(RLM/REPL,
CoT/ToT/GoT,
Extended Thinking)
/\
/ \
/ \
/ The \
/ Knowledge\
/ Runtime \
/ \
/________________\
RETRIEVAL AGENCY
(RAG, GraphRAG, (Planners, Tool Use,
Vector Search, Memory, Multi-Agent,
Hybrid Index) Orchestration)
Each vertex is pulling toward the center:
- Retrieval is becoming agentic (self-RAG, CRAG)
- Agents are becoming retrieval-aware (memory systems)
- Reasoning is becoming recursive (RLM, extended thinking)
Stream 4: Future Architectures (2026-2028)
4.1 Next-Gen RAG Architectures Being Proposed
| Architecture | Description | Maturity | Evidence |
|---|---|---|---|
| Agentic RAG | Planner + evaluator + tools + memory within RAG loop | Production (early) | STRONG |
| Cache-Augmented Generation (CAG) | Preload entire knowledge base into extended context window; eliminate retrieval | Production | STRONG |
| GraphRAG | Knowledge graph + community summaries + vector retrieval | Production | STRONG |
| Adaptive/Router RAG | Dynamically selects retrieval strategy per query complexity | Production (early) | MODERATE |
| RLM-RAG | Recursive decomposition of retrieval tasks via REPL | Experimental | MODERATE |
| Federated RAG | Distributed knowledge bases with privacy-preserving retrieval | Research | MODERATE |
| Multimodal RAG | Images, video, audio retrieval in unified embedding space | Production (early) | STRONG |
| Personalized RAG | User-specific retrieval, preference-aware reranking/generation | Research/Early | MODERATE |
4.2 Hybrid Retrieval: Sparse + Dense + KG + Structured
The production consensus for 2026: no single retrieval method suffices.
Production systems routinely maintain multiple knowledge representations:
- Vector embeddings for semantic search (dense retrieval)
- BM25/TF-IDF for keyword matching (sparse retrieval)
- Knowledge graphs for relationship reasoning (graph traversal)
- Hierarchical indexes for categorical navigation
- Structured data (SQL, APIs) for factual lookups
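A standard way to merge the channels above is Reciprocal Rank Fusion (RRF), which combines ranked lists without needing comparable scores. The sketch below uses toy rankings; `k=60` is the conventional RRF constant.

```python
# Reciprocal Rank Fusion over sparse and dense rankings: each document
# scores sum(1 / (k + rank)) across the lists it appears in, so documents
# ranked well by multiple channels rise to the top.

def rrf(rankings, k=60):
    """rankings: list of ranked doc-id lists, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

sparse_bm25 = ["doc_a", "doc_c", "doc_b"]   # keyword ranking
dense_vec   = ["doc_b", "doc_a", "doc_d"]   # semantic ranking
print(rrf([sparse_bm25, dense_vec]))
```

Note how `doc_a` wins: it is near the top of both lists, while `doc_b` leads only the dense list. That cross-channel agreement is exactly what hybrid retrieval is after.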
The “10 Types of RAG” in 2026 (multiple sources confirm this diversification):
- Naive/Standard RAG
- Advanced RAG (with reranking, query rewriting)
- Modular RAG (composable pipeline components)
- Graph RAG (knowledge graph-augmented)
- Agentic RAG (agent-orchestrated)
- Adaptive/Router RAG (dynamic strategy selection)
- Corrective RAG (CRAG pattern)
- Self-RAG (model-internal retrieval decisions)
- Speculative RAG (draft/verify pattern)
- Multimodal RAG (cross-modal retrieval)
4.3 Personalized RAG
Survey finding (arXiv 2504.10147, “A Survey of Personalization: From RAG to Agent”):
Personalization spans three RAG stages:
- Pre-retrieval: User-specific query expansion and reformulation
- Retrieval: Personalized reranking based on user history and preferences
- Generation: Adapting output style, depth, and focus to individual users
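The retrieval-stage personalization above can be sketched as preference-aware reranking. The profile format and boost weight are illustrative assumptions, not any surveyed system's design.

```python
# Toy personalized reranker: boost candidate documents that match terms
# drawn from a user's history/preferences, then re-sort.

def personalized_rerank(docs, base_scores, user_prefs, boost=0.5):
    def score(doc):
        overlap = sum(1 for term in user_prefs if term in doc.lower())
        return base_scores[doc] + boost * overlap
    return sorted(docs, key=score, reverse=True)

docs = ["Intro to RAG", "RAG for legal research", "RAG latency tuning"]
base = {d: 1.0 for d in docs}            # identical relevance before the boost
prefs = ["legal"]                        # inferred from the user's history
print(personalized_rerank(docs, base, prefs))
```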
Key systems:
- PersonaRAG: User-centric agents in the retrieval process [CEUR-WS 2024]
- PGraphRAG: User-centric knowledge graphs for personalized retrieval [Au et al., 2025]
- ARAG (arXiv 2506.21931): Agentic RAG for personalized recommendation — separates user understanding, semantic alignment, context synthesis, and ranking into specialized agents
Evidence strength: MODERATE — Active research area with growing paper count but few production deployments documented.
4.4 Multimodal RAG
The state of multimodal RAG in 2026:
- Unified embedding spaces: CLIP, BLIP2, and custom dual-/multi-tower transformers encode text, images, and audio into shared vector spaces
- Query planning modules: Classify retrieval need (text, image, audio, composite), decompose multi-hop queries, dynamically re-route
- Production adoption: IBM, NVIDIA, multiple startups offering multimodal RAG platforms
Key challenge: Cross-modal alignment quality varies significantly. Text-to-image retrieval is mature; audio and video retrieval remain less reliable.
Evidence strength: STRONG for text+image (CLIP/BLIP2 ecosystem). MODERATE for video/audio (NVIDIA blog, early research).
4.5 Federated RAG: Distributed, Privacy-Preserving
Systematic mapping (arXiv 2505.18906): 18 primary studies identified (2020-2025) addressing federated RAG.
Key approaches:
- HyFedRAG: Privacy-preserving + heterogeneous data; anonymization via Presidio masking, Eraser4RAG, TenSEAL encryption [arXiv 2509.06444]
- Dual Federated RAG (DF-RAG): Separately federates retrieval and generation components
- D-RAG: Blockchain-based decentralized RAG with privacy-preserving consensus protocol
- Privacy-Preserving Federated Embedding Learning: Collaborative training of client-side RAG retrieval models with parameter aggregation on central server [arXiv 2504.19101]
Enterprise driver: EU AI Act + GDPR create dual pressures for both data privacy and auditability that federated approaches naturally address.
Evidence strength: MODERATE — Growing body of academic work but minimal production deployment evidence.
4.6 “RAG 3.0” / “Post-RAG” — What Comes After?
Three competing visions:
Vision 1: CAG replaces RAG for static workloads (UCStrategies, 2026)
- Context windows expand to 10M+ tokens
- CAG preloads entire knowledge bases, eliminating retrieval overhead
- 40.5x speed improvement over standard RAG on benchmarks
- Prediction: Standard RAG dies; CAG handles static, Agentic RAG handles dynamic
Vision 2: The Knowledge Runtime (NStarX, 2026-2030)
- RAG evolves from “retrieval pipeline bolted onto LLMs” to “autonomous knowledge runtime”
- Orchestrates retrieval, reasoning, verification, access control, and audit trails
- Analogous to Kubernetes for information flow
- Driven by: EU AI Act compliance, institutional knowledge loss (retirement crisis), economic need for verifiable truth
Vision 3: Memory supersedes RAG (VentureBeat, Oracle, 2026)
- “Contextual memory will surpass RAG for agentic AI in 2026”
- RAG retrieves documents; Memory understands context
- The winners will do both, but memory is the differentiator
- Shift: RAG —> Agentic RAG —> Full Memory Systems
PARALLAX assessment: These visions are complementary, not competing. CAG handles the “known knowledge” tier. Agentic RAG handles “dynamic discovery.” Memory systems handle “learned experience.” The knowledge runtime is the orchestration layer that decides which to invoke.
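The assessment above implies a routing rule, sketched here as a toy: CAG for small static corpora, agentic RAG for dynamic discovery, a memory system when session experience matters. The 1M-token threshold echoes the resolution under Contradictions below; the tier names and signature are illustrative assumptions.

```python
# Hypothetical knowledge-runtime router across the three tiers.

def route(corpus_tokens, corpus_is_static, needs_session_history):
    if needs_session_history:
        return "memory_system"      # learned-experience tier
    if corpus_is_static and corpus_tokens < 1_000_000:
        return "cag"                # preload the whole corpus into context
    return "agentic_rag"            # dynamic-discovery tier

print(route(200_000, True, False))     # small static KB
print(route(50_000_000, True, False))  # too large to preload
print(route(200_000, True, True))      # cross-session personalization
```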
Cross-Stream Synthesis
The Convergence Map
2024 2026 2028
| | |
RETRIEVAL: Basic RAG ---------> Agentic RAG + GraphRAG --> Knowledge Runtime
+ CAG fork + Federated RAG
+ Multimodal
| | |
REASONING: CoT + ReAct -------> Extended Thinking -------> RLM + Reasoning
o1/o3/R1/Claude as native agent
capability
| | |
AGENCY: Single agents -----> Multi-agent + Memory ----> Autonomous
(AutoGPT v1) (CrewAI, LangGraph) Knowledge Agents
+ GraphRAG Memory + Self-improving
memory
| | |
CONVERGENCE: Separate ----------> Shared concepts ----------> Unified
concerns (context engineering) Knowledge Runtime
Key Cross-Stream Connections
- RLM enables better Agentic RAG: Recursive decomposition lets agents handle arbitrarily complex multi-hop queries by breaking them into manageable retrieval sub-tasks.
- Agent memory IS the evolution of RAG: Episodic + semantic + procedural memory is RAG generalized. RAG provides semantic memory; the agent adds episodic and procedural layers.
- Reasoning models reduce RAG dependence for some tasks: With 128K-2M token windows and strong reasoning, models can preload more context (CAG) and reason better over what they retrieve (fewer retrieval steps needed).
- Knowledge graphs bridge all three: GraphRAG serves retrieval (structured search), reasoning (relationship traversal), and agency (dynamic memory updates via Graphiti).
- The “context engineering” thesis unifies everything: All three streams are ultimately about one problem — putting the right information in the right format at the right time into the model’s context window.
Evidence Strength Summary by Stream
| Stream | Overall Evidence | Strongest Area | Weakest Area |
|---|---|---|---|
| Agentic RAG | STRONG | Multi-step retrieval benchmarks | Enterprise deployment cost data |
| RLM/REPL | MODERATE | Conceptual framework, early benchmarks | Independent replication, production evidence |
| Agent-RAG-RLM Triangle | MODERATE-STRONG | Framework comparisons, memory taxonomy | Multi-agent heterogeneous RAG in production |
| Future Architectures | MODERATE | Multimodal RAG, hybrid retrieval | Federated RAG in production, personalized RAG |
Key Researchers & Labs
Agentic RAG
- Shunyu Yao (Princeton/OpenAI) — ReAct framework
- Akari Asai (UW/Meta) — Self-RAG (ICLR 2024)
- Microsoft Research — GraphRAG, A-RAG
- Google Research — Speculative RAG
- LangChain/LangGraph team — Production agentic RAG patterns
- Arize AI — Agentic RAG evaluation and observability
RLM/REPL
- Alex Zhang et al. (MIT) — Recursive Language Models paper (late 2025)
- LoopLM/Ouro researchers — Recurrent depth substituting for scale
- DeepSeek — R1 long-context reasoning architecture
- OpenAI — o1/o3/o4-mini reasoning models
- Anthropic — Claude extended thinking, “think” tool
Knowledge Graphs + RAG
- Microsoft Research — GraphRAG
- HKUDS — LightRAG (EMNLP 2025)
- Neo4j/Zep — Graphiti temporal knowledge graph
- DEEP-PolyU — Awesome-GraphRAG curation
Agent Memory
- Guibin Zhang et al. — “Memory in the Age of AI Agents” (arXiv 2512.13564)
- Letta/MemGPT team — Stateful memory architecture
- Mem0 — Memory-as-a-service
- AWS Bedrock team — AgentCore Memory
- Cognee — Memory pipeline architecture
Future Architectures
- NStarX — Knowledge runtime thesis
- Glean — Emerging agent stack/context engineering
- Various federated RAG teams (HyFedRAG, D-RAG, FairRAG)
- PersonaRAG, PGraphRAG, ARAG teams — Personalized RAG
Evidence Gaps & Contradictions
Gaps (Where Evidence Is Thin)
- Production cost data for agentic RAG: No standardized, public cost-vs-accuracy comparisons across large-scale deployments. Most available data consists of vendor-selected metrics.
- RLM independent replication: Zhang et al.'s results are impressive but come from a single lab. Independent benchmarking and replication are still pending.
- Cross-model RLM+RAG comparisons: No unified benchmark compares RLM+RAG across model sizes (small/medium/large) and retrieval modalities (dense vs sparse vs graph) under latency constraints.
- Federated RAG in production: Active academic research but near-zero documented production deployments.
- Personalized RAG at scale: A growing paper count but few production case studies with measurable outcomes.
- Long-horizon robustness: No longitudinal studies of model drift and retrieval decay in recursive/agentic RAG pipelines over time.
- Safety under recursive prompting: Attribution pipelines and filtering are recommended, but rigorous, reproducible studies of hallucination reduction across tasks and scales are sparse.
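The first gap above (no standardized cost-vs-accuracy data) can at least be narrowed locally: log cost and outcome per query, then aggregate the two axes a public benchmark would report. A minimal sketch, with hypothetical field names and a placeholder price, not tied to any vendor's telemetry schema:

```python
from dataclasses import dataclass

@dataclass
class QueryRecord:
    """One agentic-RAG run: what it consumed and whether it was judged correct."""
    tokens_in: int
    tokens_out: int
    retrieval_calls: int
    correct: bool

def cost_vs_accuracy(records: list[QueryRecord],
                     usd_per_1k_tokens: float = 0.01) -> dict:
    """Aggregate the cost and accuracy axes missing from public comparisons.

    The price per 1k tokens is an illustrative assumption; substitute your
    provider's actual rate card.
    """
    if not records:
        return {"queries": 0, "accuracy": 0.0, "usd_per_query": 0.0}
    total_tokens = sum(r.tokens_in + r.tokens_out for r in records)
    total_cost = total_tokens / 1000 * usd_per_1k_tokens
    return {
        "queries": len(records),
        "accuracy": sum(r.correct for r in records) / len(records),
        "usd_per_query": total_cost / len(records),
    }
```

Tracked per deployment over time, this also yields the longitudinal drift data the last gap calls for.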
Contradictions
- "RAG is dead" vs "RAG is evolving": Some sources claim standard RAG is obsolete (UCStrategies), while others frame the same developments as RAG's natural evolution (NStarX, RAGFlow). Resolution: standard/naive RAG is indeed being replaced, but the RAG concept broadens rather than disappears.
- Context windows solve everything vs context windows solve nothing: Some argue expanding windows (2M+ tokens) eliminate the need for RAG; others argue windows are working memory, not storage. Resolution: both are right for different workload sizes. CAG works for <1M token corpora; beyond that, retrieval remains necessary.
- Convergence vs specialization: NStarX and Glean argue for unified knowledge runtimes; UCStrategies argues for a fork (CAG vs Agentic RAG). Resolution: convergence at the conceptual/orchestration layer; specialization at the implementation layer.
- Memory replaces RAG vs memory extends RAG: VentureBeat claims memory "surpasses" RAG; Oracle says "memory extends RAG." Resolution: semantic memory IS RAG; episodic and procedural memory extend beyond RAG. The taxonomy matters.
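The context-window contradiction above resolves into a routing rule on corpus size. A minimal sketch of that rule, assuming a hypothetical 1M-token cutoff and a crude 4-characters-per-token estimate (both illustrative, not from any cited framework):

```python
# Route a corpus to cache-augmented generation (CAG) or retrieval,
# following the "both are right for different workload sizes" resolution.

CAG_TOKEN_LIMIT = 1_000_000  # illustrative cutoff from the resolution above

def estimate_tokens(corpus: list[str]) -> int:
    """Crude token estimate: roughly 4 characters per token for English text."""
    return sum(len(doc) for doc in corpus) // 4

def choose_strategy(corpus: list[str]) -> str:
    """Return 'cag' when the whole corpus fits the working window,
    'retrieval' when it must live outside the model and be fetched on demand."""
    return "cag" if estimate_tokens(corpus) <= CAG_TOKEN_LIMIT else "retrieval"
```

Small corpora ride in context as working memory; anything larger stays in storage behind a retriever, which is exactly why expanding windows shift the threshold without eliminating retrieval.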
Master Source Index
Peer-Reviewed / Strong Evidence
- Self-RAG (ICLR 2024) — https://github.com/AkariAsai/self-rag
- LightRAG (EMNLP 2025) — https://github.com/HKUDS/LightRAG
- AIR-RAG (Neurocomputing 2026) — Adaptive iterative retrieval
- ReAct (Yao et al., 2022) — https://arxiv.org/abs/2210.03629
- Reflexion (OpenReview) — https://openreview.net/forum?id=FDG2G7JDWO
- Memory in the Age of AI Agents (arXiv 2512.13564) — https://arxiv.org/abs/2512.13564
- Agentic AI Architectures/Taxonomies (arXiv 2601.12560) — https://arxiv.org/html/2601.12560v1
- A Survey of Personalization: From RAG to Agent (arXiv 2504.10147) — https://arxiv.org/html/2504.10147v1
- Federated RAG Systematic Mapping (arXiv 2505.18906) — https://arxiv.org/abs/2505.18906
- HyFedRAG (arXiv 2509.06444) — https://arxiv.org/abs/2509.06444
Preprints / Moderate Evidence
- RLM (Zhang et al., MIT, late 2025) — https://alexzhang13.github.io/blog/2025/rlm/
- A-RAG (arXiv 2602.03442) — https://arxiv.org/abs/2602.03442
- Agentic RAG Survey (arXiv 2501.09136) — https://arxiv.org/abs/2501.09136
- Speculative RAG (Google, arXiv 2407.08223) — https://arxiv.org/pdf/2407.08223
- PlanRAG (arXiv 2601.19827) — https://arxiv.org/html/2601.19827v1
- CRAG — https://github.com/HuskyInSalt/CRAG
- RAG Latency Analysis (arXiv 2412.11854) — https://arxiv.org/html/2412.11854v1
- DeepSeek-R1 — https://huggingface.co/deepseek-ai/DeepSeek-R1
- LoopLM (arXiv 2510.25741v2) — https://arxiv.org/html/2510.25741v2
- ARAG for Personalized Recommendation (arXiv 2506.21931) — https://arxiv.org/html/2506.21931v1
- Agentic-KGR (OpenReview) — https://openreview.net/forum?id=7qQ50LrRn5
- Multi-Agent RAG for Entity Resolution (MDPI) — https://www.mdpi.com/2073-431X/14/12/525
Industry / Engineering Sources
- IBM RAG Primer — https://www.ibm.com/think/topics/retrieval-augmented-generation
- IBM Agentic RAG — https://www.ibm.com/think/topics/agentic-rag
- Microsoft GraphRAG — https://microsoft.github.io/graphrag/
- Microsoft AI Agents for Beginners (Agentic RAG) — https://microsoft.github.io/ai-agents-for-beginners/05-agentic-rag/
- Arize Understanding Agentic RAG — https://arize.com/blog/understanding-agentic-rag/
- LangGraph Agentic RAG Docs — https://docs.langchain.com/oss/python/langgraph/agentic-rag
- Google Speculative RAG Blog — https://research.google/blog/speculative-rag-enhancing-retrieval-augmented-generation-through-drafting/
- OpenAI o3/o4-mini — https://openai.com/index/introducing-o3-and-o4-mini/
- Glean Emerging Agent Stack 2026 — https://www.glean.com/blog/emerging-agent-stack-2026
- NStarX Next Frontier of RAG (2026-2030) — https://nstarxinc.com/blog/the-next-frontier-of-rag-how-enterprise-knowledge-systems-will-evolve-2026-2030/
- UCStrategies Standard RAG Is Dead — https://ucstrategies.com/news/standard-rag-is-dead-why-ai-architecture-split-in-2026/
- VentureBeat 6 Data Predictions 2026 — https://venturebeat.com/data/six-data-shifts-that-will-shape-enterprise-ai-in-2026
- Oracle Agent Memory — https://blogs.oracle.com/developers/agent-memory-why-your-ai-has-amnesia-and-how-to-fix-it
- Neo4j Graphiti — https://neo4j.com/blog/developer/graphiti-knowledge-graph-memory/
- AWS ALMA Healthcare — Referenced in Tavily research
- Onyx Workplace Benchmark — https://www.onyx.app/blog/benchmarking-agentic-rag-on-workplace-questions
- RAGFlow 2025 Year-End Review — https://ragflow.io/blog/rag-review-2025-from-rag-to-context
- NVIDIA Multimodal RAG Blog — https://developer.nvidia.com/blog/an-easy-introduction-to-multimodal-retrieval-augmented-generation-for-video-and-audio/
- RLM in Google ADK — https://discuss.google.dev/t/recursive-language-models-in-adk/323523
- Awesome-GraphRAG — https://github.com/DEEP-PolyU/Awesome-GraphRAG
- Agent Memory Paper List — https://github.com/Shichun-Liu/Agent-Memory-Paper-List
- Multimodal RAG Survey — https://github.com/llm-lab-org/Multimodal-RAG-Survey
Governance & Risk Sources
- SAS RAG Governance (2025) — https://blogs.sas.com/content/sascom/2025/11/25/the-strategic-imperative-governance-for-retrieval-augmented-generation/
- ISACA AI Safety/Risk Blog (2025) — https://www.isaca.org/resources/news-and-trends/isaca-now-blog/2025/safeguarding-the-future-strategies-for-protecting-generative-ai-llms-and-agentic-ai
- Dataiku Governance Notes — https://www.dataiku.com/stories/blog/the-risks-and-governance-requirements-of-agentic-ai
PARALLAX module note: This research synthesizes 80+ sources across 4 independent research streams. Evidence is strongest for Agentic RAG patterns and agent memory architectures, moderate for RLM/REPL (single-lab origin), and growing rapidly for future architectures (federated, personalized, multimodal). The convergence thesis is supported at the conceptual level but implementation remains fragmented across competing frameworks and approaches.
Strategic Synthesis
- Define one owner and one decision checkpoint for the next iteration.
- Measure both speed and reliability so optimization does not degrade quality.
- Close the loop with one retrospective and one execution adjustment.
Next step
If you want your brand represented in AI systems with high context quality and citation strength, start from a practical baseline and work through a prioritized sequence.