From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Author: Aleksandr Kovyazin
Uploaded: 2024-10-15
This research paper does not present a timeline of events in the traditional sense. Instead, it describes Graph RAG, a graph-based approach to query-focused summarization, as a pipeline whose stages run in sequence:
Pipeline Stages:
Source Documents → Text Chunks: Input documents are divided into smaller text chunks for efficient processing by the LLM.
Text Chunks → Element Instances: The LLM identifies and extracts key elements like entities, relationships, and claims from each text chunk.
Element Instances → Element Summaries: The LLM merges the extracted instances into one coherent description per graph element (e.g., a single summary covering all mentions of a specific entity).
Element Summaries → Graph Communities: The element summaries are used to construct a graph, and community detection algorithms (like Leiden) are applied to identify clusters of related elements.
Graph Communities → Community Summaries: The LLM generates comprehensive summaries for each identified community within the graph.
Community Summaries → Community Answers → Global Answer: Given a user query, each community summary is used to generate a partial answer. These are then combined and summarized into a final, comprehensive answer for the user.
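The final stage above is a map-reduce over community summaries, and a small sketch may make the data flow concrete. The code below is illustrative only: `global_answer`, `toy_map`, and `toy_reduce` are hypothetical names, the toy functions stand in for LLM calls, and the 0-100 helpfulness score and word-count budget are simplifying assumptions rather than details given in this summary.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures standing in for LLM calls.
MapFn = Callable[[str, str], Tuple[str, int]]   # (query, summary) -> (partial answer, score 0-100)
ReduceFn = Callable[[str, List[str]], str]      # (query, partial answers) -> final answer


def global_answer(query: str, community_summaries: List[str],
                  map_fn: MapFn, reduce_fn: ReduceFn, max_words: int = 50) -> str:
    """Map each community summary to a partial answer, then reduce to one answer."""
    # Map step: one scored partial answer per community summary.
    scored = [map_fn(query, summary) for summary in community_summaries]
    # Drop answers scored as unhelpful and rank the rest, most helpful first.
    useful = sorted((p for p in scored if p[1] > 0), key=lambda p: p[1], reverse=True)
    # Greedily fill a context budget (word count is a crude proxy for tokens).
    context: List[str] = []
    used = 0
    for answer, _score in useful:
        size = len(answer.split())
        if used + size > max_words:
            break
        context.append(answer)
        used += size
    # Reduce step: combine the selected partial answers into the global answer.
    return reduce_fn(query, context)


def toy_map(query: str, summary: str) -> Tuple[str, int]:
    """Toy stand-in for an LLM: 50 points per query word found in the summary."""
    hits = sum(1 for word in query.lower().split() if word in summary.lower())
    return summary, min(100, hits * 50)


def toy_reduce(query: str, answers: List[str]) -> str:
    """Toy stand-in for an LLM: join the partial answers."""
    return " | ".join(answers)


summaries = [
    "leiden clusters graph nodes",
    "weather report for tuesday",
    "graph summaries answer queries",
]
print(global_answer("graph queries", summaries, toy_map, toy_reduce))
```

In a real system, `map_fn` and `reduce_fn` would wrap prompts to an LLM. The map calls are independent of one another, so they could run in parallel.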
Cast of Characters
The paper describes a technical approach rather than a narrative, so it has no characters in the traditional sense. The key figures and concepts are as follows:
Key Figures & Concepts:
LLM (Large Language Model): The core engine of the Graph RAG approach, responsible for text processing, extraction, summarization, and answer generation. Specific LLMs mentioned include GPT, Llama, and Gemini.
User: The individual interacting with the Graph RAG system to seek insights from a large corpus of text. Example user roles mentioned include a tech journalist and an educator.
Graph: A representation of the relationships between entities and concepts extracted from the text corpus. This graph's structure is crucial for identifying meaningful communities.
Community Detection Algorithm (e.g., Leiden): An algorithm used to partition the graph into clusters of related elements, forming the basis for generating comprehensive summaries.
Community Summary: An LLM-generated overview of one cluster of related entities and concepts within the graph.
Query: The question posed by the user to the Graph RAG system, seeking information or insights from the text corpus.
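To illustrate the Graph and Community Detection entries above, the sketch below builds a weighted graph from extracted relationship pairs and scores a candidate partition by modularity. Modularity is one of the quality functions that Leiden-style algorithms can optimize. This is not an implementation of Leiden: the code only evaluates a given partition, and `build_graph`, `modularity`, and the use of raw mention counts as edge weights are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

EdgeWeights = Dict[FrozenSet[str], int]


def build_graph(relationships: Iterable[Tuple[str, str]]) -> EdgeWeights:
    """Collapse repeated entity pairs into weighted, undirected edges."""
    weights: Counter = Counter()
    for source, target in relationships:
        if source != target:  # ignore self-loops in this sketch
            weights[frozenset((source, target))] += 1
    return dict(weights)


def modularity(edges: EdgeWeights, communities: List[Set[str]]) -> float:
    """Modularity of a partition: sum over communities of L_c/m - (d_c/2m)^2."""
    total = sum(edges.values())  # m, the total edge weight
    if total == 0:
        return 0.0
    degree: Dict[str, int] = defaultdict(int)
    for pair, weight in edges.items():
        for node in pair:
            degree[node] += weight
    score = 0.0
    for community in communities:
        # L_c: weight of edges with both endpoints inside the community.
        internal = sum(w for pair, w in edges.items() if pair <= community)
        # d_c: summed weighted degree of the community's nodes.
        degree_sum = sum(degree[node] for node in community)
        score += internal / total - (degree_sum / (2 * total)) ** 2
    return score


# Two triangles joined by a single bridge edge (c-d).
pairs = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
graph = build_graph(pairs)
split = modularity(graph, [{"a", "b", "c"}, {"d", "e", "f"}])
merged = modularity(graph, [{"a", "b", "c", "d", "e", "f"}])
print(split > merged)  # the two-triangle split scores higher than one big cluster
```

A community detection algorithm searches for partitions that score well on such a measure. Each resulting cluster is then the unit that receives a community summary.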
Authors:
The paper is authored by a team from Microsoft:
Darren Edge (Microsoft Research)
Ha Trinh (Microsoft Research)
Newman Cheng (Microsoft Strategic Missions and Technologies)
Joshua Bradley (Microsoft Strategic Missions and Technologies)
Alex Chao (Microsoft Office of the CTO)
Apurva Mody (Microsoft Office of the CTO)
Steven Truitt (Microsoft Strategic Missions and Technologies)
Jonathan Larson (Microsoft Research)
https://arxiv.org/abs/2404.16130