
RAG Future Through a Parallax Lens

RAG strategy fails when viewed from a single technical angle. A parallax approach aligns product, governance, and knowledge operations in one frame.

VZ editorial frame

Read this piece through one operating lens: AI does not automate first, it amplifies first. If the underlying decision architecture is clear, AI scales clarity. If it is noisy, AI scales noise and cost.

VZ Lens

In VZ framing, the point is not novelty but decision quality under uncertainty. The practical edge comes from turning this multi-perspective view into repeatable decision rhythms.

Module: PARALLAX (Multi-Perspective Research Engine) | GFIS
Date: March 9, 2026
Status: Complete — 4 streams synthesized
Evidence base: 80+ sources (academic papers, industry reports, engineering blogs, framework docs)


The Prague Window

I’m sitting on the windowsill of the coworking space. Rain streaks down the old-fashioned glass, and behind it, the Vltava flows gray. Four research streams are running in parallel on my laptop—Agentic RAG, Recursive LM, REPL convergence, future architectures. My coffee is already cold.

Outside, the bridge vanishes into the fog, and a question crystallizes within me: what happens when these lines are not just parallel, but converge? When retrieval ceases to be a passive data carrier and becomes an active participant in the thought cycle?

My finger scrolls through a line of code that no longer just queries, but asks back. This isn’t about what we can load. It’s about the system finally learning to process what it has loaded.

Table of Contents

  1. Stream 1: Agentic RAG
  2. Stream 2: Recursive Language Models & REPL
  3. Stream 3: The Agent-RAG-RLM Triangle
  4. Stream 4: Future Architectures (2026-2028)
  5. Cross-Stream Synthesis: The Convergence Map
  6. Key Researchers & Labs
  7. Evidence Gaps & Contradictions
  8. Master Source Index

Stream 1: Agentic RAG

1.1 What Is Agentic RAG? How Does It Differ from Traditional RAG?

Traditional RAG is a five-stage stateless pipeline: user prompt —> retrieval query —> returned documents —> augmented prompt —> LLM generation. Each query is independent; no planning, no re-retrieval, no tool use [IBM RAG Primer; arXiv 2410.12837].

Agentic RAG augments this with autonomous agentic components:

  • Planners that decompose complex queries into sub-queries
  • Evaluators that assess retrieval quality and trigger re-retrieval
  • Tool registries enabling web search, calculators, DB queries, API calls
  • Persistent memory across steps and sessions
  • Control loops that iterate until task completion
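The component list above can be sketched as a single control loop. The following is a minimal illustration with toy stand-ins for the retriever, evaluator, and generator; none of these names come from a specific framework:

```python
# Minimal agentic-RAG control loop: retrieve -> evaluate -> (re-retrieve) -> generate.
# `retrieve`, `evaluate`, `generate` are stand-in callables; real systems would wrap
# a vector store, an LLM-based grader, and an LLM completion call respectively.

def agentic_rag(query, retrieve, evaluate, generate, max_rounds=3):
    """Iterate retrieval until the evaluator accepts the evidence, then generate."""
    docs, history = [], []
    for round_no in range(max_rounds):
        docs = retrieve(query, attempt=round_no)
        verdict = evaluate(query, docs)          # e.g. "good" / "refine"
        history.append((round_no, verdict))
        if verdict == "good":
            break
        query = f"{query} (refined, attempt {round_no + 1})"  # naive re-query policy
    return generate(query, docs), history


# Toy components that exercise the loop deterministically.
def toy_retrieve(query, attempt):
    return [f"doc-{attempt}"]

def toy_evaluate(query, docs):
    return "good" if docs == ["doc-1"] else "refine"  # accept on the 2nd round

def toy_generate(query, docs):
    return f"answer from {docs[0]}"

answer, trace = agentic_rag("what is X?", toy_retrieve, toy_evaluate, toy_generate)
```

The loop terminates either when the evaluator accepts or when `max_rounds` is exhausted; production systems add tool calls and persistent memory at the same decision points.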

Evidence strength: STRONG — Multiple peer-reviewed surveys (arXiv 2501.09136, arXiv 2602.03442), industry adoption documented by IBM, Microsoft, Arize, LangChain.

Pragmatic taxonomy (recommended):

| Pattern | Core Mechanism | Latency | Complexity | Best For |
|---|---|---|---|---|
| Classic RAG | Single retrieve → augment → generate | Low | Low | Simple QA, customer support |
| Iterative RAG | Repeated retrieve+generate with loop policy | Medium | Medium | Multi-hop QA, legal research |
| Agentic RAG | Planner + actors + tools + memory + evaluators | High | High | Enterprise automation, multi-system workflows |
| Agent-with-RAG | External agent uses RAG as one bounded tool | Variable | Medium | Flexible agent architectures |

1.2 Key Papers: Agents Using Retrieval as a Tool

| Paper/System | Year | Core Contribution | Evidence |
|---|---|---|---|
| ReAct (Yao et al.) | 2022/2023 | Thought-action-observation loop with retrieval as an action | STRONG |
| Self-RAG (Asai et al., ICLR 2024) | 2024 | LLM trained to decide when to retrieve and self-critique via reflection tokens | STRONG |
| CRAG (Corrective RAG) | 2024 | Plug-and-play retrieval evaluator that triggers refinement | STRONG |
| Speculative RAG (Google Research) | 2024 | Small model drafts, large model verifies; ~12.97% accuracy gain, 51% latency reduction | STRONG |
| A-RAG (arXiv 2602.03442) | 2025 | Hierarchical retrieval interfaces scaling agentic RAG | MODERATE |
| AIR-RAG (Neurocomputing 2026) | 2026 | Adaptive iterative retrieval without retraining retriever | STRONG (peer-reviewed) |
| PlanRAG (arXiv 2601.19827) | 2026 | Plan-then-retrieve: structured plan targets retrieval | MODERATE |

1.3 Multi-Step Retrieval Approaches

 Multi-Step RAG Variants

 Iterative RAG       Self-RAG            CRAG
  +-----------+      +-----------+      +-----------+
  | Retrieve  |      | LLM with  |      | Retrieve  |
  | Generate  |      | reflection|      | Evaluate  |
  | Loop until|      | tokens    |      | Re-fetch  |
  | confident |      | decides   |      | if poor   |
  +-----------+      +-----------+      +-----------+

 Speculative RAG     AIR-RAG             PlanRAG
  +-----------+      +-----------+      +-----------+
  | Small LM  |      | Adaptive  |      | Plan first|
  | drafts    |      | iterative |      | Retrieve  |
  | Large LM  |      | refinement|      | to plan   |
  | verifies  |      | no retrain|      | Re-plan   |
  +-----------+      +-----------+      +-----------+

Key finding: Retrieval contributes ~41% of end-to-end latency and nearly doubles TTFT in production systems [arXiv 2412.11854v1]. Multi-step approaches multiply this cost but substantially improve correctness on complex queries.

1.4 When Does a RAG Pipeline Become an “Agent”?

Agenticity Checklist (synthesized from Microsoft, Arize, Towards Data Science):

A RAG system qualifies as “agentic” if it satisfies 4+ of these criteria, including items 1, 2, and 5:

  1. Explicit planner/actor loop — can initiate multiple retrieval/generation/tool steps
  2. Autonomous decision logic — decides to retrieve, re-retrieve, call tools without human prompts
  3. Tool invocation APIs — can call external tools (web search, calculators, DBs)
  4. Persistent memory/state — retains context across steps or sessions
  5. Evidence evaluation & re-retrieval — validators that trigger corrective retrieval
  6. Audit trail — logs queries, retrievals, tool calls, reasoning steps
  7. Orchestration & fault tolerance — retry logic, observability, error handling
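The checklist can be applied mechanically. A small sketch of the qualification rule stated above (4+ criteria satisfied, items 1, 2, and 5 among them):

```python
# The agenticity checklist as a predicate. Checklist item numbers follow the
# text: 1 = planner loop, 2 = autonomous decisions, 5 = evidence evaluation.

REQUIRED = {1, 2, 5}   # items that must be present

def is_agentic(satisfied: set[int]) -> bool:
    """satisfied: the checklist item numbers (1-7) a system meets."""
    return len(satisfied) >= 4 and REQUIRED <= satisfied
```

For example, a pipeline with a planner, autonomous decision logic, tool APIs, and an evaluator qualifies; one missing the evaluator (item 5) does not, however many other boxes it ticks.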

The boundary is fuzzy by design: The “RAG agent” vs “agent with RAG” distinction is functional rather than nominal. Sources use both phrasings inconsistently. The checklist above resolves the ambiguity for governance and architecture decisions.

1.5 Real-World Agentic RAG Deployments

| Deployment | Domain | Architecture | Reported Results | Evidence |
|---|---|---|---|---|
| ALMA (AWS Bedrock) | Healthcare | Bedrock + custom RAG + SSO | 98% accuracy on medical residency exam, 65% routine adoption, 98% satisfaction | MODERATE (vendor blog) |
| CFA Institute patterns | Finance | Vector stores (FAISS/Chroma) + SQL agents + web APIs | Internal retrieval reduces hallucination vs web retrieval | MODERATE (industry content) |
| Legal research agents | Legal | Agentic decomposition of statutes/cases/filings | Days-long research compressed to interactive sessions | WEAK (anecdotal) |
| Onyx workplace | Enterprise | Direct data indexing + agentic orchestration | High win rate on 99 workplace questions; avg ~34.7s response | MODERATE (product benchmark) |

1.6 The “RAG Agent” vs “Agent with RAG” Distinction

Does it matter? Yes, for governance and architecture:

  • RAG Agent (agentic RAG): The RAG pipeline itself has agentic control — planner, evaluator, tool-use, memory are embedded in the retrieval-generation loop.
  • Agent with RAG: An external agent (AutoGPT, CrewAI, etc.) treats RAG as one bounded tool among many; the agent’s intelligence is separate from the retrieval system.

Practical implication: “RAG Agent” requires deeper integration of governance controls (audit trails, access controls) within the RAG pipeline itself. “Agent with RAG” can apply governance at the agent orchestration layer.


Stream 2: Recursive Language Models & REPL

2.1 What Are Recursive Language Models?

Definition (Zhang et al., MIT, late 2025): RLMs are an inference strategy — not a new model class — where language models recursively call themselves or other LLMs for intermediate computation. The key innovation: the full context is stored externally in a REPL-like environment, and the LLM emits code/sub-tasks to process it incrementally.

RLM Architecture (Zhang et al.)

  User Query + Context (10M+ tokens)
                 |
                 v
      +---------------------+
      |   RLM Controller    |  <-- Thin wrapper around LLM
      | (REPL environment)  |
      +---------------------+
          |      |      |
          v      v      v
     Sub-task Sub-task Sub-task
     (chunk1) (chunk2) (chunk3)
          |      |      |
          +------+------+
                 |
                 v
        Aggregate Results
                 |
                 v
           Final Answer

Three core primitives (confirmed across MIT paper, Google ADK implementation, practitioner reports):

  1. Programmatic control over context via a REPL-like execution loop
  2. Recursive delegation for task and context decomposition
  3. Agent-mediated aggregation of partial results

State machine formalization: An RLM has transitions including CODE_EXEC, llm_query, and FINAL, with configuration parameters: max_depth, max_subcalls, max_cost, timeout_seconds.
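That formalization can be written down directly. An illustrative encoding of the transition labels and configuration parameters named above; the exact shape is an assumption, not the authors' code:

```python
# RLM state machine sketched as data: transitions CODE_EXEC, llm_query, FINAL,
# plus the configuration parameters named in the text. Illustrative only.

from dataclasses import dataclass

@dataclass
class RLMConfig:
    max_depth: int = 3            # recursion depth limit
    max_subcalls: int = 32        # total sub-LLM calls allowed
    max_cost: float = 1.0         # budget in arbitrary cost units
    timeout_seconds: float = 120.0

TRANSITIONS = {
    "START":     {"CODE_EXEC", "llm_query", "FINAL"},
    "CODE_EXEC": {"CODE_EXEC", "llm_query", "FINAL"},
    "llm_query": {"CODE_EXEC", "llm_query", "FINAL"},
    "FINAL":     set(),           # terminal state
}

def legal(state: str, action: str) -> bool:
    """True if the state machine permits `action` from `state`."""
    return action in TRANSITIONS.get(state, set())
```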

Evidence strength: MODERATE — Single primary paper (Zhang et al., MIT) with growing implementation ecosystem (Google ADK, DSPy modules). Not yet widely replicated in peer-reviewed venues.

2.2 The REPL Paradigm Applied to LLMs

The REPL (Read-Eval-Print Loop) framing treats the LLM workflow as:

  1. Read: Full document/context stored in a Python variable (external to LLM window)
  2. Eval: LLM emits code to inspect, slice, filter, and analyze the context
  3. Print: Intermediate results are aggregated
  4. Loop: Re-enter with refined sub-tasks until answer is complete
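The four steps can be sketched as a driver loop. Here the "model" is a deterministic stand-in (`toy_model`, hypothetical) that greps the externally held context once and then answers from its notes; a real RLM would emit code instead:

```python
# REPL-style driver: the full context lives in an ordinary Python variable,
# and the (stubbed) model proposes one operation per iteration.

def repl_answer(context: str, question: str, propose_step, max_loops=10):
    # Read: context is held outside any model window.
    notes = []
    for _ in range(max_loops):
        step = propose_step(context, question, notes)   # Eval: model picks an op
        if step["op"] == "final":
            return step["text"]
        if step["op"] == "grep":                        # inspect a slice of context
            hits = [ln for ln in context.splitlines() if step["pattern"] in ln]
            notes.append(hits)                          # Print: keep partial result
        # Loop: re-enter with accumulated notes
    return "no answer within budget"

# Deterministic stand-in model: grep once, then answer from the first hit.
def toy_model(context, question, notes):
    if not notes:
        return {"op": "grep", "pattern": "capital"}
    return {"op": "final", "text": notes[0][0]}

ctx = "Paris is the capital of France.\nBerlin is the capital of Germany."
result = repl_answer(ctx, "capital of France?", toy_model)
```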

Who proposed this? Alex Zhang and colleagues at MIT (paper published late 2025). The codebase is available as rlm_repl on GitHub. Google’s ADK community has adopted and extended the pattern.

Key claim: RLMs can process contexts of 10M+ tokens — far beyond any native context window — by decomposing the work recursively. Preliminary results show:

  • 91.3% accuracy on multi-document retrieval tasks (GPT-5-level)
  • 62% accuracy on LongBench-v2 CodeQA vs 22% for non-recursive baselines

Evidence strength: MODERATE — Impressive numbers from a single lab; awaiting independent replication.

2.3 How RLM/REPL Relates to Existing Paradigms

| Paradigm | Relationship to RLM | Key Difference |
|---|---|---|
| Chain-of-Thought (CoT) | RLM can use CoT within each recursive step | CoT is linear; RLM is tree/graph-structured |
| Tree-of-Thought (ToT) | RLM naturally implements ToT via recursive branching | ToT explores thought units; RLM decomposes context |
| Graph-of-Thought (GoT) | RLM sub-tasks can form dependency graphs | GoT models thought dependencies; RLM models context dependencies |
| Self-reflection / Reflexion | RLM supports generate-critique-refine within loops | Reflexion is about output quality; RLM is about context management |
| Iterative RAG | RLM + RAG = recursive retrieval-generation cycles | Iterative RAG retrieves from external stores; RLM processes what's already loaded |

Critical distinction: CoT/ToT/GoT are reasoning strategies about how to think. RLM is a context management strategy about how to handle unbounded input. They are complementary, not competing.

2.4 RLM + RAG Integration

Recursive retrieval-generation cycles combine the best of both:

  Query --> RLM decomposes into sub-queries
    |
    +--> Sub-query 1 --> RAG retrieval --> Generation
    |
    +--> Sub-query 2 --> RAG retrieval --> Generation
    |
    +--> Sub-query 3 --> RAG retrieval --> Generation
    |
    v
  RLM aggregates sub-results
    |
    v
  Final synthesized answer
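The cycle above in miniature, with the decomposer, per-sub-query RAG pass, and aggregator supplied as toy stand-ins:

```python
# Recursive retrieval-generation in three steps: decompose the query,
# run one RAG pass per sub-query, aggregate the partial answers.
# All three components are illustrative stand-ins.

def rlm_rag(query, decompose, rag_pass, aggregate):
    sub_queries = decompose(query)
    partials = [rag_pass(sq) for sq in sub_queries]
    return aggregate(partials)

answer = rlm_rag(
    "who founded X and when?",
    decompose=lambda q: ["who founded X?", "when was X founded?"],
    rag_pass=lambda sq: f"[answer to: {sq}]",
    aggregate=lambda parts: " ".join(parts),
)
```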

Evidence from multi-step retrieval benchmarks:

  • Multi-step retrieval shows >50% improvement over single-step on defined end-to-end evaluation tasks [FRAMES benchmark]
  • RT-RAG (hierarchical tree decomposition) achieves +7.0% F1 and +6.0% EM over SOTA on multi-hop QA benchmarks (MuSiQue, 2WikiMQA, HotpotQA)

2.5 Does Recursive Prompting Improve RAG Quality?

Yes, with caveats.

| Benchmark | Improvement | Method | Evidence |
|---|---|---|---|
| FRAMES (end-to-end RAG) | 0.408 → 0.66 accuracy | Multi-step reasoning | STRONG |
| MuSiQue/2WikiMQA/HotpotQA | +7% F1, +6% EM | RT-RAG hierarchical decomposition | STRONG |
| Game of 24 (ToT) | 4% → 74% success | Tree-of-Thought vs CoT | STRONG |
| TruthfulQA (Reflexion) | Significant gains for smaller models | Generate-critique-refine loops | STRONG |
| LongBench-v2 CodeQA | 22% → 62% | RLM recursive processing | MODERATE |

Caveats: Latency increases substantially with each recursive step, and LLM generation latency often dominates total response time; a ~100ms retrieval speedup can be invisible when generation dominates.

2.6 The “Reasoning Loop” Models and RAG

| Model Family | Reasoning Mechanism | RAG Relationship |
|---|---|---|
| DeepSeek R1 | 671B params, 37B active, 128K context (YaRN), MLA + MoE | Long-context architecture reduces need for retrieval; strong multi-hop performance natively |
| OpenAI o1/o3/o4-mini | RL-trained reasoning with CoT; tool use (web, Python, image gen) | Reasoning + tool use enables agentic RAG patterns within the model API itself |
| Claude Extended Thinking | 128K internal token window for reasoning; "think" tool | Extended reasoning budget improves factuality; can invoke tools/retrieval within thinking |
| LoopLM (Ouro) | Recurrent transformer stack reuse (4 iterations) | 1.4B model matches 12B SOTA on select benchmarks; recurrence substitutes for scale |

Key insight: These reasoning models are making RAG both more powerful and less necessary simultaneously. Long context windows (128K-2M tokens) mean more knowledge can be preloaded (CAG pattern), while reasoning capabilities mean the model can better judge when retrieval IS needed and what to do with results.

Evidence strength: STRONG for DeepSeek R1 and OpenAI o-series benchmarks. MODERATE for Claude extended thinking (fewer public benchmarks). MODERATE for LoopLM (single-lab results).


Stream 3: The Agent-RAG-RLM Triangle

3.1 How Modern AI Agent Frameworks Use RAG

| Framework | RAG Integration | Memory Model | Architecture |
|---|---|---|---|
| LangGraph | Deep — retrieval as tool nodes in state graphs; generate_or_query_or_respond decision nodes | State-based memory with checkpointing and persistence | Graph-based workflow orchestration |
| CrewAI | Built-in RAG tools; role-based memory with RAG | Structured, role-based memory with RAG | Role-playing multi-agent crews |
| AutoGen/AG2 | RAG through tool registration; multi-turn retrieval | Conversational history storage | Multi-agent conversation framework |
| OpenAI Agents SDK | Built-in vector store tools (file_search); web search | Thread-based context persistence | Production-ready single-agent framework |
| LlamaIndex Agents | Native RAG — grounded, reliable retrieval-first design | Index-based retrieval with reranking | RAG-centric agent architecture |

Key observation (DataCamp comparison, 2026): “LangGraph provides state-based memory with checkpointing and persistence. CrewAI uses structured, role-based memory with RAG, while AutoGen stores conversational history.” Each framework makes fundamentally different architectural choices about where RAG sits relative to the agent.

Evidence strength: STRONG — Well-documented in framework docs, extensive practitioner reporting, multiple comparison guides.

3.2 The Memory Problem in Agents

The memory taxonomy mirrors human cognition [arXiv 2512.13564, “Memory in the Age of AI Agents”]:

Agent Memory Architecture
+-------------------------------------------------------+
|                    WORKING MEMORY                     |
|            (Context window: 200K-2M tokens)           |
|         Current conversation + active reasoning       |
+-------------------------------------------------------+
         |                  |                  |
         v                  v                  v
+----------------+  +------------------+  +------------------+
| EPISODIC       |  | SEMANTIC         |  | PROCEDURAL       |
| MEMORY         |  | MEMORY           |  | MEMORY           |
| "What happened"|  | "What I know"    |  | "How to do it"   |
| Interaction    |  | Domain knowledge |  | Skills, workflows|
| history,       |  | concepts, facts  |  | learned patterns |
| past sessions  |  | (RAG/vector DB)  |  |                  |
+----------------+  +------------------+  +------------------+
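The taxonomy can be mirrored by a minimal store: a bounded working memory plus long-term layers. This is an illustrative shape only; products such as Letta or Mem0 implement far richer consolidation policies:

```python
# Minimal agent memory: bounded working memory (context-window analogue)
# plus episodic, semantic, and procedural long-term stores.

from collections import deque

class AgentMemory:
    def __init__(self, working_capacity=4):
        self.working = deque(maxlen=working_capacity)  # bounded, like a context window
        self.episodic = []      # "what happened": interaction history
        self.semantic = {}      # "what I know": facts (RAG/vector DB analogue)
        self.procedural = {}    # "how to do it": named skills/workflows

    def observe(self, event: str):
        self.working.append(event)   # oldest events fall out of working memory
        self.episodic.append(event)  # but persist episodically

mem = AgentMemory(working_capacity=2)
for e in ["greet", "ask", "retrieve"]:
    mem.observe(e)
```

After three observations with capacity 2, only the latest two events remain in working memory, while the episodic store keeps all three: exactly the "working memory is not long-term storage" point above.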

Critical 2026 insight (VentureBeat, Oracle): “Large context windows (200K-400K tokens in Claude Opus 4.5, GPT-5.2, up to 2M in Gemini 3 Pro) have NOT solved agent memory. Injecting full conversation history into every API call creates unsustainable cost and latency. Context windows are working memory — they’re not long-term storage.”

The spectrum is shifting: Traditional RAG —> Agentic RAG —> Full Memory Systems. VentureBeat predicts contextual memory will surpass RAG for agentic AI in 2026.

Key products in the memory space (2026):

  • Mem0: Memory-as-a-service for agents
  • Letta/MemGPT: Stateful memory server with explicit editable memory blocks
  • Cognee: Memory as pipeline (ingestion —> structuring —> recall)
  • Amazon Bedrock AgentCore Memory: Managed extraction, consolidation, retrieval
  • Graphiti (Zep/Neo4j): Knowledge graph memory for temporal agent state

Unsolved tension (Oracle/GDPR): GDPR right-to-be-forgotten requires data deletion, but EU AI Act (August 2026) requires 10-year audit trails for high-risk systems. This creates an architectural paradox for agent memory systems.

Evidence strength: STRONG — ICLR 2026 workshop proposal on MemAgents, comprehensive survey (arXiv 2512.13564), multiple production implementations.

3.3 Knowledge Graphs + RAG + Agents

GraphRAG (Microsoft Research, 2024):

  • Creates entity-centric knowledge graphs from input corpus
  • LLMs precompute community summaries
  • Dramatically improves reasoning over relationship-rich queries
  • Enables queries that require traversing relationships across data types

LightRAG (EMNLP 2025):

  • Dual retriever system: local retriever for entity-level questions, global retriever for complex subgraph reasoning
  • Lightweight, fast, suitable for production
  • Enhanced extraction accuracy for open-source LLMs (Qwen3-30B-A3B)
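The dual-retriever idea reduces to a routing decision. A sketch of that routing; the keyword heuristic and retriever callables are hypothetical stand-ins, not LightRAG's actual API:

```python
# Dual-retriever routing: entity-level questions go to a "local" retriever,
# broader subgraph questions to a "global" one. Cue list is an assumption.

GLOBAL_CUES = ("compare", "overall", "trend", "relationship", "across")

def route(query: str) -> str:
    q = query.lower()
    return "global" if any(cue in q for cue in GLOBAL_CUES) else "local"

def dual_retrieve(query, local_retriever, global_retriever):
    chosen = global_retriever if route(query) == "global" else local_retriever
    return chosen(query)
```

Real systems typically replace the keyword heuristic with an LLM-based query classifier, but the two-path structure is the same.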

Graphiti (Neo4j/Zep):

  • Temporal knowledge graph memory for agents
  • Unlike GraphRAG’s static community summaries, Graphiti handles evolving, temporal data
  • Designed for “agentic world” where memory must update in real-time

Agentic-KGR (OpenReview): Co-evolutionary knowledge graph construction through multi-agent reinforcement learning. When integrated with GraphRAG, achieves superior QA performance with gains in both accuracy and knowledge coverage.

Knowledge Graph + RAG + Agents Integration

        Agent Layer (Planning, Reasoning, Tool Use)
             |                         |
             v                         v
      GraphRAG Layer            Vector RAG Layer
   (Entity-Relationship      (Semantic Similarity
        Traversal)                 Search)
             |                         |
             v                         v
      Knowledge Graph           Vector Database
      (Neo4j, etc.)            (Qdrant, Pinecone)
             |                         |
             +------------+------------+
                          |
                          v
                 Unified Knowledge
                  Representation

Evidence strength: STRONG for GraphRAG (Microsoft Research paper + open-source). STRONG for LightRAG (EMNLP 2025). MODERATE for Graphiti and agentic KG construction.

3.4 Multi-Agent Systems with Different Knowledge Bases

Emerging pattern: Specialized agents with specialized retrieval.

The Multi-Agent RAG Framework for Entity Resolution (MDPI, 2025) demonstrates:

  • Modular coordination with specialized agents (direct matching, indirect matching, household clustering)
  • Each agent writes to logically disjoint sections of shared state
  • Orchestrator deterministically merges results from parallel branches
  • LangGraph-based unified orchestration

Oracle A2A Protocol + LangChain (2025): Scalable multi-agent RAG system where agents communicate via the Agent-to-Agent (A2A) protocol, each with different knowledge bases and capabilities.

The pattern: Different agents for different knowledge domains, connected through:

  • Shared state graphs (LangGraph)
  • Message-passing protocols (A2A, MCP)
  • Hierarchical orchestration (planner agent delegates to specialist agents)
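The disjoint-sections pattern can be sketched directly: each specialist writes to its own key of the shared state, and the orchestrator merges parallel branches in a deterministic order. Agent bodies here are stand-ins:

```python
# Orchestrator for specialists that write to logically disjoint sections
# of shared state; merge order is deterministic (sorted by section name).

def orchestrate(query, agents: dict):
    """agents maps a state-section name -> callable(query) for that specialist."""
    shared_state = {}
    for section, agent in sorted(agents.items()):   # deterministic merge order
        assert section not in shared_state          # disjoint-sections invariant
        shared_state[section] = agent(query)
    return shared_state

state = orchestrate("resolve record 42", {
    "direct_match":   lambda q: ["exact: 42"],
    "indirect_match": lambda q: ["fuzzy: 42a"],
    "household":      lambda q: ["cluster: H7"],
})
```

Because sections never collide, the specialists could run in parallel and be merged afterward without conflicts, which is the property the entity-resolution framework relies on.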

Evidence strength: MODERATE — Documented in framework guides and early academic papers; few large-scale production case studies published.

3.5 The Convergence Thesis

Are RAG, agents, and RLM converging into a single architecture?

Evidence FOR convergence:

  1. NStarX thesis (2026-2030): “RAG will undergo a fundamental architectural shift — from a retrieval pipeline bolted onto LLMs to an autonomous knowledge runtime that orchestrates retrieval, reasoning, verification, and governance as unified operations.”

  2. Glean’s emerging agent stack (2026): Context engineering is the unifying discipline — “the delicate art and science of filling the context window with just the right information for the next step” (Andrej Karpathy). RAG, agent memory, and reasoning all serve this same purpose.

  3. The agentic taxonomy (arXiv 2601.12560): Unified taxonomy breaks agents into Perception, Brain, Planning, Action, Tool Use, and Collaboration — with RAG as one tool within the Action layer and reasoning (including RLM-style recursion) within the Brain layer.

  4. RLM in Google ADK (2026): Recursive Language Models are being implemented directly within agentic frameworks, treating context management as an agent capability.

Evidence AGAINST full convergence (or at least for sustained specialization):

  1. The architecture fork (UCStrategies, 2026): “Standard RAG is dead” — the field is splitting into CAG (for static/small knowledge bases) and Agentic RAG (for complex reasoning), not converging into one.

  2. Latency-accuracy tradeoff: Simple tasks don’t need agent overhead. CAG completes in 2.33s vs RAG’s 94.35s. One-size-fits-all is an anti-pattern.

  3. Framework fragmentation: LangGraph, CrewAI, AutoGen, OpenAI SDK all make fundamentally different architectural choices. No single winning pattern has emerged.

PARALLAX assessment: The convergence is happening at the conceptual level (unified knowledge runtime) but NOT at the implementation level (no single architecture dominates). The field is converging on a shared understanding that retrieval, reasoning, and agent control are interconnected concerns, while diverging on how to implement that understanding.

The Convergence Map (Conceptual)

                 REASONING
                (RLM/REPL,
               CoT/ToT/GoT,
             Extended Thinking)
                    /\
                   /  \
                  /    \
                 / The  \
                /Knowledge\
               / Runtime   \
              /             \
             /_______________\
      RETRIEVAL             AGENCY
  (RAG, GraphRAG,      (Planners, Tool Use,
   Vector Search,       Memory, Multi-Agent,
   Hybrid Index)        Orchestration)

Each vertex is pulling toward the center:
- Retrieval is becoming agentic (self-RAG, CRAG)
- Agents are becoming retrieval-aware (memory systems)
- Reasoning is becoming recursive (RLM, extended thinking)

Stream 4: Future Architectures (2026-2028)

4.1 Next-Gen RAG Architectures Being Proposed

| Architecture | Description | Maturity | Evidence |
|---|---|---|---|
| Agentic RAG | Planner + evaluator + tools + memory within RAG loop | Production (early) | STRONG |
| Cache-Augmented Generation (CAG) | Preload entire knowledge base into extended context window; eliminate retrieval | Production | STRONG |
| GraphRAG | Knowledge graph + community summaries + vector retrieval | Production | STRONG |
| Adaptive/Router RAG | Dynamically selects retrieval strategy per query complexity | Production (early) | MODERATE |
| RLM-RAG | Recursive decomposition of retrieval tasks via REPL | Experimental | MODERATE |
| Federated RAG | Distributed knowledge bases with privacy-preserving retrieval | Research | MODERATE |
| Multimodal RAG | Images, video, audio retrieval in unified embedding space | Production (early) | STRONG |
| Personalized RAG | User-specific retrieval, preference-aware reranking/generation | Research/Early | MODERATE |

4.2 Hybrid Retrieval: Sparse + Dense + KG + Structured

The production consensus for 2026: no single retrieval method suffices.

Production systems routinely maintain multiple knowledge representations:

  • Vector embeddings for semantic search (dense retrieval)
  • BM25/TF-IDF for keyword matching (sparse retrieval)
  • Knowledge graphs for relationship reasoning (graph traversal)
  • Hierarchical indexes for categorical navigation
  • Structured data (SQL, APIs) for factual lookups
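One widely used way to fuse these heterogeneous rankings is Reciprocal Rank Fusion (RRF): each retriever contributes 1/(k + rank) per document, and documents are re-ordered by the summed score. k = 60 is the customary damping constant. A self-contained sketch:

```python
# Reciprocal Rank Fusion over ranked lists from multiple retrievers.

def rrf(rankings, k=60):
    """rankings: list of ranked doc-id lists, one per retriever (best first)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d1", "d2", "d3"]   # vector-search ranking
sparse = ["d3", "d1", "d4"]   # BM25 ranking
fused = rrf([dense, sparse])  # d1 and d3 rise: both retrievers agree on them
```

RRF needs no score calibration across retrievers, which is why it is a common default for sparse+dense hybrids before a learned reranker is added.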

The “10 Types of RAG” in 2026 (multiple sources confirm this diversification):

  1. Naive/Standard RAG
  2. Advanced RAG (with reranking, query rewriting)
  3. Modular RAG (composable pipeline components)
  4. Graph RAG (knowledge graph-augmented)
  5. Agentic RAG (agent-orchestrated)
  6. Adaptive/Router RAG (dynamic strategy selection)
  7. Corrective RAG (CRAG pattern)
  8. Self-RAG (model-internal retrieval decisions)
  9. Speculative RAG (draft/verify pattern)
  10. Multimodal RAG (cross-modal retrieval)

4.3 Personalized RAG

Survey finding (arXiv 2504.10147, “A Survey of Personalization: From RAG to Agent”):

Personalization spans three RAG stages:

  1. Pre-retrieval: User-specific query expansion and reformulation
  2. Retrieval: Personalized reranking based on user history and preferences
  3. Generation: Adapting output style, depth, and focus to individual users
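The three stages compose into one pipeline. A sketch with illustrative stand-in policies for expansion, reranking, and style adaptation (the profile fields and retriever are hypothetical):

```python
# Personalized RAG pipeline: pre-retrieval expansion, preference-aware
# reranking, and a style hint for the generation stage.

def personalize(query, profile, retrieve):
    # 1. Pre-retrieval: expand the query with profile interests.
    expanded = query + " " + " ".join(profile.get("interests", []))
    docs = retrieve(expanded)
    # 2. Retrieval: rerank, boosting docs whose topic the user prefers.
    prefs = profile.get("topic_weights", {})
    docs = sorted(docs, key=lambda d: prefs.get(d["topic"], 0.0), reverse=True)
    # 3. Generation: hand the generator a style hint alongside the evidence.
    return {"docs": docs, "style": profile.get("style", "neutral")}

result = personalize(
    "etf basics",
    {"interests": ["finance"], "topic_weights": {"funds": 2.0, "tax": 1.0},
     "style": "concise"},
    retrieve=lambda q: [{"id": "a", "topic": "tax"}, {"id": "b", "topic": "funds"}],
)
```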

Key systems:

  • PersonaRAG: User-centric agents in the retrieval process [CEUR-WS 2024]
  • PGraphRAG: User-centric knowledge graphs for personalized retrieval [Au et al., 2025]
  • ARAG (arXiv 2506.21931): Agentic RAG for personalized recommendation — separates user understanding, semantic alignment, context synthesis, and ranking into specialized agents

Evidence strength: MODERATE — Active research area with growing paper count but few production deployments documented.

4.4 Multimodal RAG

The state of multimodal RAG in 2026:

  • Unified embedding spaces: CLIP, BLIP2, and custom dual-/multi-tower transformers encode text, images, and audio into shared vector spaces
  • Query planning modules: Classify retrieval need (text, image, audio, composite), decompose multi-hop queries, dynamically re-route
  • Production adoption: IBM, NVIDIA, multiple startups offering multimodal RAG platforms

Key challenge: Cross-modal alignment quality varies significantly. Text-to-image retrieval is mature; audio and video retrieval remain less reliable.

Evidence strength: STRONG for text+image (CLIP/BLIP2 ecosystem). MODERATE for video/audio (NVIDIA blog, early research).

4.5 Federated RAG: Distributed, Privacy-Preserving

Systematic mapping (arXiv 2505.18906): 18 primary studies identified (2020-2025) addressing federated RAG.

Key approaches:

  • HyFedRAG: Privacy-preserving + heterogeneous data; anonymization via Presidio masking, Eraser4RAG, TenSEAL encryption [arXiv 2509.06444]
  • Dual Federated RAG (DF-RAG): Separately federates retrieval and generation components
  • D-RAG: Blockchain-based decentralized RAG with privacy-preserving consensus protocol
  • Privacy-Preserving Federated Embedding Learning: Collaborative training of client-side RAG retrieval models with parameter aggregation on central server [arXiv 2504.19101]

Enterprise driver: EU AI Act + GDPR create dual pressures for both data privacy and auditability that federated approaches naturally address.

Evidence strength: MODERATE — Growing body of academic work but minimal production deployment evidence.

4.6 “RAG 3.0” / “Post-RAG” — What Comes After?

Three competing visions:

Vision 1: CAG replaces RAG for static workloads (UCStrategies, 2026)

  • Context windows expand to 10M+ tokens
  • CAG preloads entire knowledge bases, eliminating retrieval overhead
  • 40.5x speed improvement over standard RAG on benchmarks
  • Prediction: Standard RAG dies; CAG handles static, Agentic RAG handles dynamic

Vision 2: The Knowledge Runtime (NStarX, 2026-2030)

  • RAG evolves from “retrieval pipeline bolted onto LLMs” to “autonomous knowledge runtime”
  • Orchestrates retrieval, reasoning, verification, access control, and audit trails
  • Analogous to Kubernetes for information flow
  • Driven by: EU AI Act compliance, institutional knowledge loss (retirement crisis), economic need for verifiable truth

Vision 3: Memory supersedes RAG (VentureBeat, Oracle, 2026)

  • “Contextual memory will surpass RAG for agentic AI in 2026”
  • RAG retrieves documents; Memory understands context
  • The winners will do both, but memory is the differentiator
  • Shift: RAG —> Agentic RAG —> Full Memory Systems

PARALLAX assessment: These visions are complementary, not competing. CAG handles the “known knowledge” tier. Agentic RAG handles “dynamic discovery.” Memory systems handle “learned experience.” The knowledge runtime is the orchestration layer that decides which to invoke.
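That assessment implies a routing rule. A sketch under stated assumptions: the token threshold and the input signals are illustrative, not taken from any cited source:

```python
# Three-tier knowledge routing: static corpora small enough to preload go to
# CAG, session-dependent questions go to the memory system, everything else
# goes to agentic RAG for dynamic discovery.

CONTEXT_BUDGET_TOKENS = 1_000_000   # assumed preload ceiling for CAG

def route_knowledge(corpus_tokens, corpus_is_static, needs_session_history):
    if needs_session_history:
        return "memory"
    if corpus_is_static and corpus_tokens <= CONTEXT_BUDGET_TOKENS:
        return "cag"
    return "agentic_rag"
```

A knowledge runtime in the NStarX sense would sit exactly where this function sits, with richer signals (freshness, access control, verification requirements) feeding the decision.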


Cross-Stream Synthesis

The Convergence Map

                 2024                    2026                        2028
                  |                       |                           |
 RETRIEVAL:   Basic RAG ---------> Agentic RAG + GraphRAG ----> Knowledge Runtime
                                   + CAG fork                   + Federated RAG
                                                                + Multimodal
                  |                       |                           |
 REASONING:   CoT + ReAct -------> Extended Thinking ---------> RLM + Reasoning
                                   (o1/o3/R1/Claude)            as native agent
                                                                capability
                  |                       |                           |
 AGENCY:      Single agents -----> Multi-agent + Memory ------> Autonomous
              (AutoGPT v1)         (CrewAI, LangGraph)          Knowledge Agents
                                   + GraphRAG Memory            + Self-improving
                                                                memory
                  |                       |                           |
 CONVERGENCE: Separate ----------> Shared concepts -----------> Unified
              concerns             (context engineering)        Knowledge Runtime

Key Cross-Stream Connections

  1. RLM enables better Agentic RAG: Recursive decomposition lets agents handle arbitrarily complex multi-hop queries by breaking them into manageable retrieval sub-tasks.

  2. Agent memory IS the evolution of RAG: Episodic + semantic + procedural memory is RAG generalized. RAG provides semantic memory; the agent adds episodic and procedural layers.

  3. Reasoning models reduce RAG dependence for some tasks: With 128K-2M token windows and strong reasoning, models can preload more context (CAG) and reason better over what they retrieve (fewer retrieval steps needed).

  4. Knowledge graphs bridge all three: GraphRAG serves retrieval (structured search), reasoning (relationship traversal), and agency (dynamic memory updates via Graphiti).

  5. The “context engineering” thesis unifies everything: All three streams are ultimately about one problem — putting the right information in the right format at the right time into the model’s context window.

Evidence Strength Summary by Stream

| Stream | Overall Evidence | Strongest Area | Weakest Area |
|---|---|---|---|
| Agentic RAG | STRONG | Multi-step retrieval benchmarks | Enterprise deployment cost data |
| RLM/REPL | MODERATE | Conceptual framework, early benchmarks | Independent replication, production evidence |
| Agent-RAG-RLM Triangle | MODERATE-STRONG | Framework comparisons, memory taxonomy | Multi-agent heterogeneous RAG in production |
| Future Architectures | MODERATE | Multimodal RAG, hybrid retrieval | Federated RAG in production, personalized RAG |

Key Researchers & Labs

Agentic RAG

  • Shunyu Yao (Princeton/OpenAI) — ReAct framework
  • Akari Asai (UW/Meta) — Self-RAG (ICLR 2024)
  • Microsoft Research — GraphRAG, A-RAG
  • Google Research — Speculative RAG
  • LangChain/LangGraph team — Production agentic RAG patterns
  • Arize AI — Agentic RAG evaluation and observability

RLM/REPL

  • Alex Zhang et al. (MIT) — Recursive Language Models paper (late 2025)
  • LoopLM/Ouro researchers — Recurrent depth substituting for scale
  • DeepSeek — R1 long-context reasoning architecture
  • OpenAI — o1/o3/o4-mini reasoning models
  • Anthropic — Claude extended thinking, “think” tool

Knowledge Graphs + RAG

  • Microsoft Research — GraphRAG
  • HKUDS — LightRAG (EMNLP 2025)
  • Neo4j/Zep — Graphiti temporal knowledge graph
  • DEEP-PolyU — Awesome-GraphRAG curation

Agent Memory

  • Guibin Zhang et al. — “Memory in the Age of AI Agents” (arXiv 2512.13564)
  • Letta/MemGPT team — Stateful memory architecture
  • Mem0 — Memory-as-a-service
  • AWS Bedrock team — AgentCore Memory
  • Cognee — Memory pipeline architecture

Future Architectures

  • NStarX — Knowledge runtime thesis
  • Glean — Emerging agent stack/context engineering
  • Various federated RAG teams (HyFedRAG, D-RAG, FairRAG)
  • PersonaRAG, PGraphRAG, ARAG teams — Personalized RAG

Evidence Gaps & Contradictions

Gaps (Where Evidence Is Thin)

  1. Production cost data for agentic RAG: No standardized, public cost-vs-accuracy comparisons across large-scale deployments. Most data is vendor-selected metrics.

  2. RLM independent replication: Zhang et al.’s results are impressive but from a single lab. Awaiting independent benchmarking and replication.

  3. Cross-model RLM+RAG comparisons: No unified benchmark comparing RLM+RAG across model sizes (small/medium/large) and retrieval modalities (dense vs sparse vs graph) under latency constraints.

  4. Federated RAG in production: Active academic research but near-zero documented production deployments.

  5. Personalized RAG at scale: Growing paper count but few production case studies with measurable outcomes.

  6. Long-horizon robustness: No longitudinal studies of model drift and retrieval decay in recursive/agentic RAG pipelines over time.

  7. Safety under recursive prompting: Attribution pipelines and filtering are recommended but rigorous, reproducible studies of hallucination reduction across tasks and scales are sparse.

Contradictions

  1. “RAG is dead” vs “RAG is evolving”: Some sources claim standard RAG is obsolete (UCStrategies), while others frame the same developments as RAG’s natural evolution (NStarX, RAGFlow). Resolution: standard/naive RAG is indeed being replaced, but the RAG concept broadens rather than disappears.

  2. Context windows solve everything vs context windows solve nothing: Some argue expanding windows (2M+ tokens) eliminate RAG need; others argue windows are working memory, not storage. Resolution: both are right for different workload sizes. CAG works for <1M token corpora; beyond that, retrieval remains necessary.

  3. Convergence vs specialization: NStarX and Glean argue for unified knowledge runtimes; UCStrategies argues for a fork (CAG vs Agentic RAG). Resolution: convergence at the conceptual/orchestration layer; specialization at the implementation layer.

  4. Memory replaces RAG vs memory extends RAG: VentureBeat claims memory “surpasses” RAG; Oracle says “memory extends RAG.” Resolution: semantic memory IS RAG; episodic and procedural memory extend beyond RAG. The taxonomy matters.


Master Source Index

Peer-Reviewed / Strong Evidence

Preprints / Moderate Evidence

Industry / Engineering Sources

Governance & Risk Sources


PARALLAX module note: This research synthesizes 80+ sources across 4 independent research streams. Evidence is strongest for Agentic RAG patterns and agent memory architectures, moderate for RLM/REPL (single-lab origin), and growing rapidly for future architectures (federated, personalized, multimodal). The convergence thesis is supported at the conceptual level but implementation remains fragmented across competing frameworks and approaches.

Strategic Synthesis

  • Define one owner and one decision checkpoint for the next iteration.
  • Measure both speed and reliability so optimization does not degrade quality.
  • Close the loop with one retrospective and one execution adjustment.

Next step

If you want your brand to be represented with context quality and citation strength in AI systems, start with a practical baseline and a priority sequence.