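Knowledge Graph Index¶
This tutorial gives a basic overview of how to use our KnowledgeGraphIndex, which handles automated knowledge graph construction from unstructured text as well as entity-based querying.
If you would like to query knowledge graphs in more flexible ways, including pre-existing ones, please check out our KnowledgeGraphQueryEngine and other constructs.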
%pip install llama-index-llms-openai
# My OpenAI Key
import os
os.environ["OPENAI_API_KEY"] = "INSERT OPENAI KEY"
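Hard-coding the key in a notebook cell makes it easy to leak. A minimal sketch of an alternative, assuming the key may already be exported in the shell; the helper name `ensure_openai_key` is hypothetical and not part of LlamaIndex:

```python
import os
from getpass import getpass


def ensure_openai_key(env=os.environ, ask=getpass):
    """Return the OpenAI key from `env`, prompting for it only when it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Prompt without echoing the key into the notebook output.
        key = ask("OpenAI API key: ").strip()
        env["OPENAI_API_KEY"] = key
    return key
```

Calling `ensure_openai_key()` once before constructing the LLM leaves `OPENAI_API_KEY` set for the rest of the session.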
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
Using Knowledge Graph¶
Building the Knowledge Graph¶
from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex
from llama_index.core.graph_stores import SimpleGraphStore
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from IPython.display import Markdown, display
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
documents = SimpleDirectoryReader(
    "../../../../examples/paul_graham_essay/data"
).load_data()
# define LLM
# NOTE: at the time of this demo, text-davinci-002 did not have rate-limit errors.
# OpenAI has since retired text-davinci-002, so substitute a currently available model.
llm = OpenAI(temperature=0, model="text-davinci-002")
Settings.llm = llm
Settings.chunk_size = 512
from llama_index.core import StorageContext
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
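`max_triplets_per_chunk=2` caps how many (subject, predicate, object) triplets are kept from each text chunk. The sketch below illustrates that idea on a raw LLM completion. The function name `parse_triplets` and the one-triplet-per-parenthesis format are assumptions made for illustration, not the library's internal parser.

```python
import re

# Matches one "(subject, predicate, object)" group; this format is an assumption.
_TRIPLET_RE = re.compile(r"\(([^()]+)\)")


def parse_triplets(completion, max_triplets):
    """Extract up to `max_triplets` well-formed triplets from an LLM completion."""
    triplets = []
    for group in _TRIPLET_RE.findall(completion):
        if len(triplets) >= max_triplets:
            break  # the per-chunk cap has been reached
        parts = [part.strip() for part in group.split(",")]
        # Skip anything that is not exactly three non-empty fields.
        if len(parts) != 3 or not all(parts):
            continue
        triplets.append(tuple(parts))
    return triplets


completion = """
(author, worked on, writing)
(author, worked on, programming)
(Interleaf, added, scripting language)
"""
print(parse_triplets(completion, max_triplets=2))
```

A lower cap keeps the graph small and the build cheap, at the cost of dropping relationships that appear later in a chunk.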
Querying the Knowledge Graph¶
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)
INFO:llama_index.indices.knowledge_graph.retrievers:> Starting query: Tell me more about Interleaf
INFO:llama_index.indices.knowledge_graph.retrievers:> Query keywords: ['Interleaf', 'company', 'software', 'history']
ERROR:llama_index.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage...
INFO:llama_index.indices.knowledge_graph.retrievers:> Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 116 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 116 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
display(Markdown(f"<b>{response}</b>"))
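Interleaf was a software company that developed and published document preparation and desktop publishing software. It was founded in 1986 and was headquartered in Waltham, Massachusetts. The company was acquired by Quark, Inc. in 2000.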
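The log output above mentions triplets "in max depth 2": starting from the query keywords, relationships are followed for up to two hops. A minimal sketch of such a depth-limited walk over a plain dictionary follows. `collect_paths` and the toy graph are hypothetical and only illustrate the idea, not the graph store's actual traversal.

```python
def collect_paths(graph, subject, max_depth=2):
    """Return relationship paths starting at `subject`, at most `max_depth` hops long.

    `graph` maps a subject to a list of (predicate, object) pairs. Each path is a
    flat list: [subject, predicate, object, predicate_next_hop, object_next_hop, ...].
    """
    paths = []
    if max_depth < 1:
        return paths

    def walk(node, path, depth, seen):
        for predicate, obj in graph.get(node, []):
            new_path = path + [predicate, obj]
            paths.append(new_path)
            # Go one hop further unless the depth limit is reached or we would loop.
            if depth + 1 < max_depth and obj not in seen:
                walk(obj, new_path, depth + 1, seen | {obj})

    walk(subject, [subject], 0, {subject})
    return paths


toy_graph = {
    "Interleaf": [("made", "software"), ("added", "scripting language")],
    "software": [("generates", "web sites")],
    "web sites": [("need", "hosting")],
}
for path in collect_paths(toy_graph, "Interleaf", max_depth=2):
    print(path)
```

With `max_depth=2` the walk reaches "web sites" through "software" but stops before "hosting", which is three hops away.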
query_engine = index.as_query_engine(
    include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about what the author worked on at Interleaf",
)
INFO:llama_index.indices.knowledge_graph.retrievers:> Starting query: Tell me more about what the author worked on at Interleaf
INFO:llama_index.indices.knowledge_graph.retrievers:> Query keywords: ['author', 'Interleaf', 'work']
ERROR:llama_index.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage...
INFO:llama_index.indices.knowledge_graph.retrievers:> Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 104 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 104 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
display(Markdown(f"<b>{response}</b>"))
The author worked on a number of projects at Interleaf, including the development of the company's flagship product, Interleaf Publisher.
Query with Embeddings¶
# NOTE: can take a while!
new_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    include_embeddings=True,
)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
# query using the top 5 triplets by embedding similarity plus keywords (duplicate triplets are removed)
# NOTE: query new_index, which was built with include_embeddings=True; querying the earlier
# index instead logs "Index was not constructed with embeddings" and skips embedding retrieval
query_engine = new_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)
response = query_engine.query(
    "Tell me more about what the author worked on at Interleaf",
)
INFO:llama_index.indices.knowledge_graph.retrievers:> Starting query: Tell me more about what the author worked on at Interleaf
INFO:llama_index.indices.knowledge_graph.retrievers:> Query keywords: ['author', 'Interleaf', 'work']
ERROR:llama_index.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage...
INFO:llama_index.indices.knowledge_graph.retrievers:> Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 104 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 104 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
display(Markdown(f"<b>{response}</b>"))
The author worked on a number of projects at Interleaf, including the development of the company's flagship product, Interleaf Publisher.
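`embedding_mode="hybrid"` combines keyword lookup with embedding similarity. The sketch below shows one way such a merge can work: take the triplets matched by keyword, add the `top_k` triplets closest to the query embedding, and drop duplicates. The function names, the toy 2-D vectors, and the scoring are illustrative assumptions, not LlamaIndex internals.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(keywords, query_vec, triplet_vecs, top_k):
    """Merge keyword-matched triplets with the top_k most similar ones, without duplicates."""
    lowered = {k.lower() for k in keywords}
    keyword_hits = [
        t for t in triplet_vecs
        if t[0].lower() in lowered or t[2].lower() in lowered
    ]
    ranked = sorted(
        triplet_vecs, key=lambda t: cosine(query_vec, triplet_vecs[t]), reverse=True
    )
    merged = []
    for triplet in keyword_hits + ranked[:top_k]:
        if triplet not in merged:  # preserve order, remove duplicates
            merged.append(triplet)
    return merged


# Toy embeddings; real ones would come from an embedding model.
triplet_vecs = {
    ("Interleaf", "added", "scripting language"): [0.9, 0.1],
    ("author", "worked on", "programming"): [0.7, 0.3],
    ("software", "generate", "web sites"): [0.1, 0.9],
}
print(hybrid_retrieve(["Interleaf"], [1.0, 0.0], triplet_vecs, top_k=2))
```

Here the keyword match and the best embedding match are the same triplet, so it appears only once in the merged result.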
Visualizing the Graph¶
# create graph
from pyvis.network import Network
g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("example.html")
example.html
[Optional] Try building the graph and manually add triplets!¶
from llama_index.core.node_parser import SentenceSplitter
node_parser = SentenceSplitter()
nodes = node_parser.get_nodes_from_documents(documents)
# initialize an empty index for now
index = KnowledgeGraphIndex(
    [],
)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
# add keyword mappings and nodes manually
# add triplets (subject, relationship, object)
# for node 0
node_0_tups = [
    ("author", "worked on", "writing"),
    ("author", "worked on", "programming"),
]
for tup in node_0_tups:
    index.upsert_triplet_and_node(tup, nodes[0])
# for node 1
node_1_tups = [
    ("Interleaf", "made software for", "creating documents"),
    ("Interleaf", "added", "scripting language"),
    ("software", "generate", "web sites"),
]
for tup in node_1_tups:
    index.upsert_triplet_and_node(tup, nodes[1])
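Conceptually, each upsert records the triplet and links its subject and object to the source text node, so a later keyword lookup can return both the relationship and the text it came from. The class below is a toy model of that bookkeeping. `ToyKGIndex` and its attributes are hypothetical and do not mirror the library's internal data structures.

```python
from collections import defaultdict


class ToyKGIndex:
    """Toy model: a triplet store plus a keyword -> node-id table."""

    def __init__(self):
        self.rel_map = defaultdict(list)       # subject -> [(predicate, object), ...]
        self.keyword_table = defaultdict(set)  # keyword -> {node_id, ...}

    def upsert_triplet_and_node(self, triplet, node_id):
        subject, predicate, obj = triplet
        # "Upsert": adding the same triplet twice leaves a single copy.
        if (predicate, obj) not in self.rel_map[subject]:
            self.rel_map[subject].append((predicate, obj))
        # Both ends of the relationship point back to the source node.
        self.keyword_table[subject].add(node_id)
        self.keyword_table[obj].add(node_id)

    def lookup(self, keyword):
        """Return (relationships, node ids) recorded for a keyword."""
        rels = list(self.rel_map.get(keyword, []))
        node_ids = set(self.keyword_table.get(keyword, set()))
        return rels, node_ids


toy = ToyKGIndex()
toy.upsert_triplet_and_node(("Interleaf", "added", "scripting language"), "node-1")
toy.upsert_triplet_and_node(("Interleaf", "added", "scripting language"), "node-1")
print(toy.lookup("Interleaf"))
```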
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)
INFO:llama_index.indices.knowledge_graph.retrievers:> Starting query: Tell me more about Interleaf
INFO:llama_index.indices.knowledge_graph.retrievers:> Query keywords: ['Interleaf', 'company', 'software', 'history']
ERROR:llama_index.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage...
INFO:llama_index.indices.knowledge_graph.retrievers:> Extracted relationships: The following are knowledge triplets in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 116 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 116 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
str(response)
'\nInterleaf was a software company that developed and published document preparation and desktop publishing software. It was founded in 1986 and was headquartered in Waltham, Massachusetts. The company was acquired by Quark, Inc. in 2000.'