GraphRAG Implementation with LlamaIndex - V2¶
GraphRAG (Graphs + Retrieval Augmented Generation) combines the strengths of Retrieval Augmented Generation (RAG) and Query-Focused Summarization (QFS) to effectively handle complex queries over large text datasets. While RAG excels at fetching precise information, it struggles with broader queries that require thematic understanding, a challenge that QFS addresses but cannot scale well. GraphRAG integrates these approaches to offer responsive and thorough querying capabilities across extensive, diverse text corpora.
This notebook provides guidance on building the GraphRAG pipeline using Neo4j and the LlamaIndex PropertyGraph abstractions.
This notebook updates the GraphRAG pipeline to v2. If you haven't yet checked out v1, you can find it here. The following are the updates to the existing implementation:
- Integration with the Neo4j graph database.
- Embedding-based retrieval.
Installation¶
We use `graspologic`'s hierarchical Leiden algorithm for building communities.
!pip install llama-index llama-index-graph-stores-neo4j graspologic numpy==1.24.4 scipy==1.12.0 future
Load Data¶
We will use a sample dataset of news articles retrieved from Diffbot, which Tomaz has conveniently made available on GitHub for easy access.
The dataset contains 2,500 samples; for ease of experimentation, we will use 50 of these samples, which include the `title` and `text` of news articles.
import pandas as pd
from llama_index.core import Document
news = pd.read_csv(
"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/news_articles.csv"
)[:50]
news.head()
| | title | date | text |
|---|---|---|---|
| 0 | Chevron: Best Of Breed | 2031-04-06T01:36:32.000000000+00:00 | JHVEPhoto Like many companies in the O&G secto... |
| 1 | FirstEnergy (NYSE:FE) Posts Earnings Results | 2030-04-29T06:55:28.000000000+00:00 | FirstEnergy (NYSE:FE – Get Rating) posted its ... |
| 2 | Dáil almost suspended after Sinn Féin TD remarks... | 2023-06-15T14:32:11.000000000+00:00 | The Dáil was almost suspended on Thursday afte... |
| 3 | Epic's latest tool can animate hyperrealistic... | 2023-06-15T14:00:00.000000000+00:00 | Today, Epic is releasing a new tool designed t... |
| 4 | EU to Ban Huawei, ZTE from Internal Commission... | 2023-06-15T13:50:00.000000000+00:00 | The European Commission is planning to ban equ... |
Prepare documents as required by LlamaIndex.
documents = [
Document(text=f"{row['title']}: {row['text']}")
for i, row in news.iterrows()
]
Setup API Key and LLM¶
import os
os.environ["OPENAI_API_KEY"] = "sk-.."
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4")
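Since v2 adds embedding-based retrieval, building the index below will also generate embeddings. If no embedding model is configured, LlamaIndex falls back to its default OpenAI embedding model; you can set one explicitly if you prefer (an optional sketch, and the model name here is just one possible choice):
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Optional: pin the embedding model used when building the PropertyGraphIndex.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")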
GraphRAGExtractor¶
The GraphRAGExtractor class is designed to extract triples (subject-relation-object) from text and enrich them by adding descriptions for entities and relationships using an LLM.
Its functionality is similar to that of `SimpleLLMPathExtractor`, but includes additional enhancements to handle entity and relationship descriptions. For guidance on the implementation, you can examine similar existing extractors.
Here's a breakdown of its functionality:
Key components:
- `llm`: The LLM used for extraction.
- `extract_prompt`: The prompt template used to guide the LLM in extracting information.
- `parse_fn`: A function to parse the LLM's output into structured data.
- `max_paths_per_chunk`: Limits the number of triples extracted per text chunk.
- `num_workers`: For parallel processing of multiple text nodes.
Main methods:
- `__call__`: The entry point for processing a list of text nodes.
- `acall`: An asynchronous version of `__call__` for improved performance.
- `_aextract`: The core method that processes each individual node.
Extraction process:
For each input node (text chunk):
- It sends the text to the LLM along with the extraction prompt.
- The LLM's response is parsed to extract entities, relationships, and descriptions of both.
- Entities are converted into `EntityNode` objects, with the entity description stored in the metadata.
- Relationships are converted into `Relation` objects, with the relationship description stored in the metadata.
- Both are added to the node's metadata, under `KG_NODES_KEY` and `KG_RELATIONS_KEY` respectively.
NOTE: In the current implementation, we use only the relationship descriptions. In the next implementation, we will make use of the entity descriptions at the retrieval stage.
import asyncio
import nest_asyncio
nest_asyncio.apply()
from typing import Any, List, Callable, Optional, Union, Dict
from IPython.display import Markdown, display
from llama_index.core.async_utils import run_jobs
from llama_index.core.indices.property_graph.utils import (
default_parse_triplets_fn,
)
from llama_index.core.graph_stores.types import (
EntityNode,
KG_NODES_KEY,
KG_RELATIONS_KEY,
Relation,
)
from llama_index.core.llms.llm import LLM
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts.default_prompts import (
DEFAULT_KG_TRIPLET_EXTRACT_PROMPT,
)
from llama_index.core.schema import TransformComponent, BaseNode
from llama_index.core.bridge.pydantic import BaseModel, Field
class GraphRAGExtractor(TransformComponent):
"""Extract triples from a graph.
Uses an LLM and a simple prompt + output parsing to extract paths (i.e. triples) and entity, relation descriptions from text.
Args:
llm (LLM):
The language model to use.
extract_prompt (Union[str, PromptTemplate]):
The prompt to use for extracting triples.
parse_fn (callable):
A function to parse the output of the language model.
num_workers (int):
The number of workers to use for parallel processing.
max_paths_per_chunk (int):
The maximum number of paths to extract per chunk.
"""
llm: LLM
extract_prompt: PromptTemplate
parse_fn: Callable
num_workers: int
max_paths_per_chunk: int
def __init__(
self,
llm: Optional[LLM] = None,
extract_prompt: Optional[Union[str, PromptTemplate]] = None,
parse_fn: Callable = default_parse_triplets_fn,
max_paths_per_chunk: int = 10,
num_workers: int = 4,
) -> None:
"""Init params."""
from llama_index.core import Settings
if isinstance(extract_prompt, str):
extract_prompt = PromptTemplate(extract_prompt)
super().__init__(
llm=llm or Settings.llm,
extract_prompt=extract_prompt or DEFAULT_KG_TRIPLET_EXTRACT_PROMPT,
parse_fn=parse_fn,
num_workers=num_workers,
max_paths_per_chunk=max_paths_per_chunk,
)
@classmethod
def class_name(cls) -> str:
return "GraphExtractor"
def __call__(
self, nodes: List[BaseNode], show_progress: bool = False, **kwargs: Any
) -> List[BaseNode]:
"""Extract triples from nodes."""
return asyncio.run(
self.acall(nodes, show_progress=show_progress, **kwargs)
)
async def _aextract(self, node: BaseNode) -> BaseNode:
"""Extract triples from a node."""
assert hasattr(node, "text")
text = node.get_content(metadata_mode="llm")
try:
llm_response = await self.llm.apredict(
self.extract_prompt,
text=text,
max_knowledge_triplets=self.max_paths_per_chunk,
)
entities, entities_relationship = self.parse_fn(llm_response)
except ValueError:
entities = []
entities_relationship = []
existing_nodes = node.metadata.pop(KG_NODES_KEY, [])
existing_relations = node.metadata.pop(KG_RELATIONS_KEY, [])
entity_metadata = node.metadata.copy()
for entity, entity_type, description in entities:
entity_metadata["entity_description"] = description
entity_node = EntityNode(
name=entity, label=entity_type, properties=entity_metadata
)
existing_nodes.append(entity_node)
relation_metadata = node.metadata.copy()
for triple in entities_relationship:
subj, obj, rel, description = triple
relation_metadata["relationship_description"] = description
rel_node = Relation(
label=rel,
source_id=subj,
target_id=obj,
properties=relation_metadata,
)
existing_relations.append(rel_node)
node.metadata[KG_NODES_KEY] = existing_nodes
node.metadata[KG_RELATIONS_KEY] = existing_relations
return node
async def acall(
self, nodes: List[BaseNode], show_progress: bool = False, **kwargs: Any
) -> List[BaseNode]:
"""Extract triples from nodes async."""
jobs = []
for node in nodes:
jobs.append(self._aextract(node))
return await run_jobs(
jobs,
workers=self.num_workers,
show_progress=show_progress,
desc="Extracting paths from text",
)
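Before running the full pipeline, here is a minimal smoke test of the extractor wiring. The `dummy_parse_fn` and sample text below are placeholders for illustration (the real prompt and parser are defined later in this notebook); the LLM is still called with the default extraction prompt, so an API key must be set.
from llama_index.core.schema import TextNode

def dummy_parse_fn(response_str):
    # Placeholder parser: ignore the LLM response and return one hand-written
    # entity and one relation in the shapes `_aextract` expects.
    entities = [("Chevron", "Company", "An energy company.")]
    relations = [
        ("Chevron", "O&G sector", "operates in", "Chevron operates in the O&G sector.")
    ]
    return entities, relations

toy_extractor = GraphRAGExtractor(llm=llm, parse_fn=dummy_parse_fn)
toy_nodes = toy_extractor([TextNode(text="Chevron: Best Of Breed ...")])
print(toy_nodes[0].metadata[KG_NODES_KEY])      # [EntityNode(name='Chevron', ...)]
print(toy_nodes[0].metadata[KG_RELATIONS_KEY])  # [Relation(label='operates in', ...)]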
GraphRAGStore¶
The `GraphRAGStore` class is an extension of the `Neo4jPropertyGraphStore` class, designed to implement the GraphRAG pipeline. Here's a breakdown of its key components and functions:
The class uses community detection algorithms to group related nodes in the graph, and then uses an LLM to generate a summary for each community.
Main methods:
- `build_communities()`: Converts the internal graph representation to a NetworkX graph, applies the hierarchical Leiden algorithm for community detection, collects detailed information for each community, and generates a summary for each community.
- `generate_community_summary(text)`: Uses an LLM to generate a summary of the relationships within a community. The summary includes entity names and a synthesis of the relationship descriptions.
- `_create_nx_graph()`: Converts the internal graph representation to a NetworkX graph for community detection.
- `_collect_community_info(nx_graph, clusters)`: Collects detailed information for each node based on its community, creating a string representation of each relationship within the community.
- `_summarize_communities(community_info)`: Generates and stores a summary for each community using an LLM.
- `get_community_summaries()`: Returns the community summaries, building them first if they have not yet been generated.
import re
import networkx as nx
from graspologic.partition import hierarchical_leiden
from collections import defaultdict
from llama_index.core.llms import ChatMessage
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
class GraphRAGStore(Neo4jPropertyGraphStore):
community_summary = {}
entity_info = None
max_cluster_size = 5
def generate_community_summary(self, text):
"""Generate summary for a given text using an LLM."""
messages = [
ChatMessage(
role="system",
content=(
"You are provided with a set of relationships from a knowledge graph, each represented as "
"entity1->entity2->relation->relationship_description. Your task is to create a summary of these "
"relationships. The summary should include the names of the entities involved and a concise synthesis "
"of the relationship descriptions. The goal is to capture the most critical and relevant details that "
"highlight the nature and significance of each relationship. Ensure that the summary is coherent and "
"integrates the information in a way that emphasizes the key aspects of the relationships."
),
),
ChatMessage(role="user", content=text),
]
response = OpenAI().chat(messages)
clean_response = re.sub(r"^assistant:\s*", "", str(response)).strip()
return clean_response
def build_communities(self):
"""Builds communities from the graph and summarizes them."""
nx_graph = self._create_nx_graph()
community_hierarchical_clusters = hierarchical_leiden(
nx_graph, max_cluster_size=self.max_cluster_size
)
self.entity_info, community_info = self._collect_community_info(
nx_graph, community_hierarchical_clusters
)
self._summarize_communities(community_info)
def _create_nx_graph(self):
"""Converts internal graph representation to NetworkX graph."""
nx_graph = nx.Graph()
triplets = self.get_triplets()
for entity1, relation, entity2 in triplets:
nx_graph.add_node(entity1.name)
nx_graph.add_node(entity2.name)
nx_graph.add_edge(
relation.source_id,
relation.target_id,
relationship=relation.label,
description=relation.properties["relationship_description"],
)
return nx_graph
def _collect_community_info(self, nx_graph, clusters):
"""
Collect information for each node based on their community,
allowing entities to belong to multiple clusters.
"""
entity_info = defaultdict(set)
community_info = defaultdict(list)
for item in clusters:
node = item.node
cluster_id = item.cluster
# Update entity_info
entity_info[node].add(cluster_id)
for neighbor in nx_graph.neighbors(node):
edge_data = nx_graph.get_edge_data(node, neighbor)
if edge_data:
detail = f"{node} -> {neighbor} -> {edge_data['relationship']} -> {edge_data['description']}"
community_info[cluster_id].append(detail)
# Convert sets to lists for easier serialization if needed
entity_info = {k: list(v) for k, v in entity_info.items()}
return dict(entity_info), dict(community_info)
def _summarize_communities(self, community_info):
"""Generate and store summaries for each community."""
for community_id, details in community_info.items():
details_text = (
"\n".join(details) + "."
) # Ensure it ends with a period
self.community_summary[
community_id
] = self.generate_community_summary(details_text)
def get_community_summaries(self):
"""Returns the community summaries, building them if not already done."""
if not self.community_summary:
self.build_communities()
return self.community_summary
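If you are curious what `hierarchical_leiden` returns before running it on the full graph, here is a small standalone sketch on a made-up NetworkX graph. Each returned item carries a `node`, a `cluster` id, and a `level` in the hierarchy, which is exactly what `_collect_community_info` iterates over.
import networkx as nx
from graspologic.partition import hierarchical_leiden

# Toy graph: two triangles joined by a single bridge edge.
g = nx.Graph()
g.add_edges_from(
    [
        ("a", "b"), ("b", "c"), ("a", "c"),
        ("x", "y"), ("y", "z"), ("x", "z"),
        ("c", "x"),
    ]
)
for item in hierarchical_leiden(g, max_cluster_size=3):
    print(item.node, item.cluster, item.level)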
GraphRAGQueryEngine¶
The GraphRAGQueryEngine class is a custom query engine designed to process queries using the GraphRAG approach. It leverages the community summaries generated by the GraphRAGStore to answer user queries. Here's a breakdown of its functionality:
Main components:
- `graph_store`: An instance of GraphRAGStore containing the community summaries and the entity-to-community mapping.
- `index`: The PropertyGraphIndex used to retrieve entities relevant to the query.
- `llm`: The language model (LLM) used for generating and aggregating answers.
- `similarity_top_k`: The number of top similar nodes to retrieve for entity extraction.
Main methods:
- `custom_query(query_str: str)`: The main entry point for processing a query. It retrieves the entities relevant to the query, maps them to their communities, generates an answer from each relevant community summary, and then aggregates these answers into a final response.
- `generate_answer_from_summary(community_summary, query)`: Generates an answer to the query based on a single community summary, using the LLM to interpret the community summary in the context of the query.
- `aggregate_answers(community_answers)`: Combines the individual answers from the different communities into a coherent final response, using the LLM to synthesize multiple perspectives into a single concise answer.
Query processing flow:
- Retrieve entities relevant to the query and look up the communities they belong to.
- For each relevant community summary, generate a specific answer to the query.
- Aggregate all community-specific answers into a final, coherent response.
Example usage:
query_engine = GraphRAGQueryEngine(graph_store=graph_store, llm=llm)
response = query_engine.query("query")
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.llms import LLM
from llama_index.core import PropertyGraphIndex
import re
class GraphRAGQueryEngine(CustomQueryEngine):
graph_store: GraphRAGStore
index: PropertyGraphIndex
llm: LLM
similarity_top_k: int = 20
def custom_query(self, query_str: str) -> str:
"""Process all community summaries to generate answers to a specific query."""
entities = self.get_entities(query_str, self.similarity_top_k)
community_ids = self.retrieve_entity_communities(
self.graph_store.entity_info, entities
)
community_summaries = self.graph_store.get_community_summaries()
community_answers = [
self.generate_answer_from_summary(community_summary, query_str)
for id, community_summary in community_summaries.items()
if id in community_ids
]
final_answer = self.aggregate_answers(community_answers)
return final_answer
def get_entities(self, query_str, similarity_top_k):
nodes_retrieved = self.index.as_retriever(
similarity_top_k=similarity_top_k
).retrieve(query_str)
entities = set()
pattern = (
r"^(\w+(?:\s+\w+)*)\s*->\s*([a-zA-Z\s]+?)\s*->\s*(\w+(?:\s+\w+)*)$"
)
for node in nodes_retrieved:
matches = re.findall(
pattern, node.text, re.MULTILINE | re.IGNORECASE
)
for match in matches:
subject = match[0]
obj = match[2]
entities.add(subject)
entities.add(obj)
return list(entities)
def retrieve_entity_communities(self, entity_info, entities):
"""
Retrieve cluster information for given entities, allowing for multiple clusters per entity.
Args:
entity_info (dict): Dictionary mapping entities to their cluster IDs (list).
entities (list): List of entity names to retrieve information for.
Returns:
List of community or cluster IDs to which an entity belongs.
"""
community_ids = []
for entity in entities:
if entity in entity_info:
community_ids.extend(entity_info[entity])
return list(set(community_ids))
def generate_answer_from_summary(self, community_summary, query):
"""Generate an answer from a community summary based on a given query using LLM."""
prompt = (
f"Given the community summary: {community_summary}, "
f"how would you answer the following query? Query: {query}"
)
messages = [
ChatMessage(role="system", content=prompt),
ChatMessage(
role="user",
content="I need an answer based on the above information.",
),
]
response = self.llm.chat(messages)
cleaned_response = re.sub(r"^assistant:\s*", "", str(response)).strip()
return cleaned_response
def aggregate_answers(self, community_answers):
"""Aggregate individual community answers into a final, coherent response."""
# intermediate_text = " ".join(community_answers)
prompt = "Combine the following intermediate answers into a final, concise response."
messages = [
ChatMessage(role="system", content=prompt),
ChatMessage(
role="user",
content=f"Intermediate answers: {community_answers}",
),
]
final_response = self.llm.chat(messages)
cleaned_final_response = re.sub(
r"^assistant:\s*", "", str(final_response)
).strip()
return cleaned_final_response
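As a quick illustration of how `get_entities` pulls entity names out of retrieved node text, here is the same regex applied to a hypothetical triple string of the form stored alongside the chunks:
import re

pattern = r"^(\w+(?:\s+\w+)*)\s*->\s*([a-zA-Z\s]+?)\s*->\s*(\w+(?:\s+\w+)*)$"
sample = "Albert Einstein -> developed -> Theory of Relativity"  # hypothetical
print(re.findall(pattern, sample, re.MULTILINE | re.IGNORECASE))
# [('Albert Einstein', 'developed', 'Theory of Relativity')]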
Build the End-to-End GraphRAG Pipeline¶
Now that we have defined all the necessary components, let's construct the GraphRAG pipeline:
- Create nodes/chunks from the text.
- Build a PropertyGraphIndex using `GraphRAGExtractor` and `GraphRAGStore`.
- Build communities from the graph constructed above and generate a summary for each community.
- Create a `GraphRAGQueryEngine` and start querying.
Create nodes/chunks from the text¶
from llama_index.core.node_parser import SentenceSplitter
splitter = SentenceSplitter(
chunk_size=1024,
chunk_overlap=20,
)
nodes = splitter.get_nodes_from_documents(documents)
len(nodes)
50
Build PropertyGraphIndex using `GraphRAGExtractor` and `GraphRAGStore`¶
KG_TRIPLET_EXTRACT_TMPL = """
-Goal-
Given a text document, identify all entities and their entity types from the text and all relationships among the identified entities.
Given the text, extract up to {max_knowledge_triplets} entity-relation triplets.
-Steps-
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, capitalized
- entity_type: Type of the entity
- entity_description: Comprehensive description of the entity's attributes and activities
2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relation: relationship between source_entity and target_entity
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
3. Output Formatting:
- Return the result in valid JSON format with two keys: 'entities' (list of entity objects) and 'relationships' (list of relationship objects).
- Exclude any text outside the JSON structure (e.g., no explanations or comments).
- If no entities or relationships are identified, return empty lists: { "entities": [], "relationships": [] }.
-An Output Example-
{
"entities": [
{
"entity_name": "Albert Einstein",
"entity_type": "Person",
"entity_description": "Albert Einstein was a theoretical physicist who developed the theory of relativity and made significant contributions to physics."
},
{
"entity_name": "Theory of Relativity",
"entity_type": "Scientific Theory",
"entity_description": "A scientific theory developed by Albert Einstein, describing the laws of physics in relation to observers in different frames of reference."
},
{
"entity_name": "Nobel Prize in Physics",
"entity_type": "Award",
"entity_description": "A prestigious international award in the field of physics, awarded annually by the Royal Swedish Academy of Sciences."
}
],
"relationships": [
{
"source_entity": "Albert Einstein",
"target_entity": "Theory of Relativity",
"relation": "developed",
"relationship_description": "Albert Einstein is the developer of the theory of relativity."
},
{
"source_entity": "Albert Einstein",
"target_entity": "Nobel Prize in Physics",
"relation": "won",
"relationship_description": "Albert Einstein won the Nobel Prize in Physics in 1921."
}
]
}
-Real Data-
######################
text: {text}
######################
output:"""
import json
def parse_fn(response_str: str) -> Any:
json_pattern = r"\{.*\}"
match = re.search(json_pattern, response_str, re.DOTALL)
entities = []
relationships = []
if not match:
return entities, relationships
json_str = match.group(0)
try:
data = json.loads(json_str)
entities = [
(
entity["entity_name"],
entity["entity_type"],
entity["entity_description"],
)
for entity in data.get("entities", [])
]
relationships = [
(
relation["source_entity"],
relation["target_entity"],
relation["relation"],
relation["relationship_description"],
)
for relation in data.get("relationships", [])
]
return entities, relationships
except json.JSONDecodeError as e:
print("Error parsing JSON:", e)
return entities, relationships
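A quick check of `parse_fn` on a hand-written response (the JSON below is a made-up example, not real LLM output):
sample_response = """{
    "entities": [
        {"entity_name": "Chevron", "entity_type": "Company", "entity_description": "An energy company."}
    ],
    "relationships": [
        {"source_entity": "Chevron", "target_entity": "O&G sector", "relation": "operates in", "relationship_description": "Chevron operates in the O&G sector."}
    ]
}"""
entities, relationships = parse_fn(sample_response)
print(entities)       # [('Chevron', 'Company', 'An energy company.')]
print(relationships)  # [('Chevron', 'O&G sector', 'operates in', 'Chevron operates in the O&G sector.')]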
kg_extractor = GraphRAGExtractor(
llm=llm,
extract_prompt=KG_TRIPLET_EXTRACT_TMPL,
max_paths_per_chunk=2,
parse_fn=parse_fn,
)
Docker and Neo4j Setup¶
To launch Neo4j locally, first ensure you have Docker installed. Then, you can start the database with the following Docker command.
docker run \
-p 7474:7474 -p 7687:7687 \
-v $PWD/data:/data -v $PWD/plugins:/plugins \
--name neo4j-apoc \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
neo4j:latest
From here, you can open the database at http://localhost:7474/. On this page, you will be asked to sign in. Use the default username/password of neo4j and neo4j.
Once you log in for the first time, you will be asked to change the password.
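Before constructing the store, you can optionally verify that the database is reachable with the official `neo4j` Python driver (installed as a dependency of the Neo4j graph store integration). Replace `<PASSWORD>` with the password you set above:
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://localhost:7687", auth=("neo4j", "<PASSWORD>")
)
driver.verify_connectivity()  # raises if the container is down or credentials are wrong
driver.close()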
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
# Note: used to be `Neo4jPGStore`
graph_store = GraphRAGStore(
username="neo4j", password="<PASSWORD>", url="bolt://localhost:7687"
)
Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The procedure has a deprecated field. ('config' used by 'apoc.meta.graphSample' is deprecated.)} {position: line: 1, column: 1, offset: 0} for query: "CALL apoc.meta.graphSample() YIELD nodes, relationships RETURN nodes, [rel in relationships | {name:apoc.any.property(rel, 'type'), count: apoc.any.property(rel, 'count')}] AS relationships"
from llama_index.core import PropertyGraphIndex
index = PropertyGraphIndex(
nodes=nodes,
kg_extractors=[kg_extractor],
property_graph_store=graph_store,
show_progress=True,
)
Extracting paths from text: 100%|██████████| 50/50 [05:45<00:00, 6.90s/it] Generating embeddings: 100%|██████████| 1/1 [00:02<00:00, 2.59s/it] Generating embeddings: 100%|██████████| 2/2 [00:03<00:00, 1.90s/it] Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The procedure has a deprecated field. ('config' used by 'apoc.meta.graphSample' is deprecated.)} {position: line: 1, column: 1, offset: 0} for query: "CALL apoc.meta.graphSample() YIELD nodes, relationships RETURN nodes, [rel in relationships | {name:apoc.any.property(rel, 'type'), count: apoc.any.property(rel, 'count')}] AS relationships"
index.property_graph_store.get_triplets()[10]
[EntityNode(label='Software', embedding=None, properties={'id': 'Unreal Engine', 'entity_description': "Unreal Engine is a game engine developed by Epic. It is used in conjunction with Epic's MetaHuman Animator tool to animate hyperrealistic MetaHumans.", 'triplet_source_id': 'b6fbbdc0-cc13-4342-a70e-b0d86f3fd2ad'}, name='MetaHuman Animator'), Relation(label='Integrated', source_id='MetaHuman Animator', target_id='Unreal Engine', properties={'relationship_description': 'The MetaHuman Animator tool developed by Epic is integrated with the Unreal Engine. It applies the captured actor’s facial performance to a hyperrealistic “MetaHuman” in the Unreal Engine.', 'triplet_source_id': 'a6f5c123-65a8-4278-8e24-e103e767b82f'}), EntityNode(label='Software', embedding=None, properties={'id': 'MetaHuman Animator', 'entity_description': 'MetaHuman Animator is a tool developed by Epic that captures an actor’s facial performance using a device as simple as an iPhone and applies it to a MetaHuman in the Unreal Engine. It is designed to produce results quickly and efficiently.', 'triplet_source_id': 'b6fbbdc0-cc13-4342-a70e-b0d86f3fd2ad'}, name='Unreal Engine')]
index.property_graph_store.get_triplets()[10][0].properties
{'id': 'Unreal Engine', 'entity_description': "Unreal Engine is a game engine developed by Epic. It is used in conjunction with Epic's MetaHuman Animator tool to animate hyperrealistic MetaHumans.", 'triplet_source_id': 'b6fbbdc0-cc13-4342-a70e-b0d86f3fd2ad'}
index.property_graph_store.get_triplets()[10][1].properties
{'relationship_description': 'The MetaHuman Animator tool developed by Epic is integrated with the Unreal Engine. It applies the captured actor’s facial performance to a hyperrealistic “MetaHuman” in the Unreal Engine.', 'triplet_source_id': 'a6f5c123-65a8-4278-8e24-e103e767b82f'}
Build Communities¶
This will create communities and generate a summary for each community.
index.property_graph_store.build_communities()
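Optionally, you can peek at what was produced (community ids are assigned by `hierarchical_leiden`):
summaries = index.property_graph_store.get_community_summaries()
print(len(summaries))  # number of detected communities
first_id = next(iter(summaries))
print(first_id, summaries[first_id][:200])  # preview of one summary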
Create the QueryEngine¶
query_engine = GraphRAGQueryEngine(
graph_store=index.property_graph_store,
llm=llm,
index=index,
similarity_top_k=10,
)
Querying¶
response = query_engine.query(
"What are the main news discussed in the document?"
)
display(Markdown(f"{response.response}"))
The document discusses several significant business news items: FirstEnergy's earnings results, the appointment of Tram Nguyen as global head of strategic and sustainable investments at Bank of America, Morgan Stanley hiring Thomas Christl to co-lead its coverage of consumer and retail clients in Europe alongside Imran Ansari, and the significant impact of the COVID-19 pandemic on Delta Air Lines and Southwest Airlines, including the suspension and resumption of their dividend payments.
response = query_engine.query("What are the main news in energy sector?")
display(Markdown(f"{response.response}"))
The main news in the energy sector is that GE Vernova and Amplus Solar have established a supplier-customer relationship. Amplus Solar has chosen GE Vernova to supply and install 40 of its 2.7-132 onshore wind turbines for a 108 MW wind project. This means GE Vernova will provide the necessary equipment and services for the successful execution of the project.