Building a Property Graph with a Predefined Schema
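In this notebook, we walk through building a property graph with Neo4j, Ollama, and Hugging Face.
Specifically, we use SchemaLLMPathExtractor, which lets us specify an exact schema: the possible entity types, the possible relation types, and how they may connect to one another.
This is useful when you want to build a specific kind of graph and constrain what the LLM is allowed to predict.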
%pip install llama-index
%pip install llama-index-llms-ollama
%pip install llama-index-embeddings-huggingface
# Optional
%pip install llama-index-graph-stores-neo4j
%pip install llama-index-graph-stores-nebula
Load Data
First, let's download some sample data to experiment with.
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-06-26 11:12:16--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.007s

2024-06-26 11:12:16 (10.4 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
import nest_asyncio
nest_asyncio.apply()
from typing import Literal
from llama_index.llms.ollama import Ollama
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
# best practice to use upper-case
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]
# define which entities can have which relations
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}
kg_extractor = SchemaLLMPathExtractor(
    llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # if false, allows for values outside of the schema
    # useful for using the schema as a suggestion
    strict=True,
)
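To make the effect of a strict schema concrete, here is a minimal pure-Python sketch of schema checking and triple filtering. It is not the library's implementation: the helper names check_schema and is_allowed are hypothetical, and the rule that a relation must be listed for both the subject's and the object's entity type is an assumption made for illustration.

```python
from typing import Literal, get_args

entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}


def check_schema(schema, entity_types, relation_types):
    """Return a list of inconsistencies in the schema (empty if none)."""
    problems = []
    for entity, allowed in schema.items():
        if entity not in entity_types:
            problems.append(f"unknown entity type: {entity}")
        for rel in allowed:
            if rel not in relation_types:
                problems.append(f"unknown relation type: {entity} -> {rel}")
    return problems


def is_allowed(subject_type, relation, object_type, schema):
    """Hypothetical strict rule (an assumption, not the library's code):
    the relation must be listed for both the subject and the object type."""
    return relation in schema.get(subject_type, []) and relation in schema.get(
        object_type, []
    )


# get_args turns a Literal[...] into a plain tuple of its values.
entity_types = get_args(entities)
relation_types = get_args(relations)
problems = check_schema(validation_schema, entity_types, relation_types)

# Made-up candidate triples of (subject type, relation, object type).
candidates = [
    ("PERSON", "WORKED_AT", "PLACE"),
    ("PERSON", "WORKED_ON", "ORGANIZATION"),
    ("PERSON", "WORKED_WITH", "ANIMAL"),
]
kept = [t for t in candidates if is_allowed(*t, validation_schema)]
print(problems)
print(kept)
```

Under this assumed rule, only the first candidate survives: WORKED_ON is not listed for ORGANIZATION, and ANIMAL is not an entity type in the schema at all.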
Now you can use SimplePropertyGraphStore (in memory), Neo4j, or NebulaGraph to store the graph.
Option 1. Neo4j
To launch Neo4j locally, first make sure you have Docker installed. Then start the database with the following Docker command:
docker run \
-p 7474:7474 -p 7687:7687 \
-v $PWD/data:/data -v $PWD/plugins:/plugins \
--name neo4j-apoc \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
neo4j:latest
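From here, you can open the database at http://localhost:7474/. The page asks you to sign in; use the default username and password, neo4j and neo4j.
After the first login, you will be asked to change the password.
Once that is done, you are ready to create your first property graph!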
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="<password>",
    url="bolt://localhost:7687",
)
vec_store = None
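Rather than hard-coding the password in the notebook, the connection settings can be read from the environment. The following is a minimal sketch: the helper neo4j_settings and the variable names NEO4J_USERNAME, NEO4J_PASSWORD, and NEO4J_URL are hypothetical names chosen for this example.

```python
import os


def neo4j_settings(env=None):
    """Build keyword arguments for the graph store from environment variables.

    The variable names are illustrative assumptions, not a library convention.
    """
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise ValueError("Set NEO4J_PASSWORD to the password chosen at first login")
    return {
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": password,
        "url": env.get("NEO4J_URL", "bolt://localhost:7687"),
    }


# Example with an explicit mapping instead of the real environment.
settings = neo4j_settings({"NEO4J_PASSWORD": "example-password"})
print(settings["username"], settings["url"])

# With llama-index-graph-stores-neo4j installed, the store could then be
# created as: graph_store = Neo4jPropertyGraphStore(**settings)
```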
Option 2. NebulaGraph
To launch NebulaGraph locally, first make sure you have Docker installed. Then start the database with the following commands:
mkdir nebula-docker-compose
cd nebula-docker-compose
curl --output docker-compose.yaml https://raw.githubusercontent.com/vesoft-inc/nebula-docker-compose/master/docker-compose-lite.yaml
docker compose up
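Once that is done, you are ready to create your first property graph!
For other options and details on deploying NebulaGraph, see the NebulaGraph documentation.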
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore
from llama_index.core.vector_stores.simple import SimpleVectorStore
graph_store = NebulaPropertyGraphStore(
    space="llamaindex_nebula_property_graph", overwrite=True
)
vec_store = SimpleVectorStore()
If you want to explore the graph with the NebulaGraph Jupyter extension, run the following commands; otherwise, skip these steps.
%pip install jupyter-nebulagraph
# load NebulaGraph Jupyter extension to enable %ngql magic
%load_ext ngql
# connect to NebulaGraph service
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula
%ngql CREATE SPACE IF NOT EXISTS llamaindex_nebula_property_graph(vid_type=FIXED_STRING(256));
# use the graph space, which is similar to "use database" in MySQL
# The space was created in async way, so we need to wait for a while before using it, retry it if failed
%ngql USE llamaindex_nebula_property_graph;
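Because the space is created asynchronously, the first USE may fail. When this step is driven from Python rather than from the %ngql magic, a small retry loop is one way to handle it. This is a sketch only: retry and use_space are hypothetical names, and use_space stands in for whatever call actually selects the space.

```python
import time


def retry(action, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` calls have failed."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                sleep(delay_seconds)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# Stand-in for a real "USE <space>" call: fails twice, then succeeds.
calls = {"count": 0}


def use_space():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("space not ready yet")
    return "ok"


result = retry(use_space, attempts=5, delay_seconds=0.0)
print(result, calls["count"])
```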
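Start building!
Note: Extraction with local models is slower than with API-based models. Locally served models (for example, through Ollama) are typically limited to sequential processing. Expect this to take about 10 minutes on an M2 Max.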
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    property_graph_store=graph_store,
    vector_store=vec_store,
    show_progress=True,
)
If we inspect the resulting graph, we can see that it contains only the relation and entity types we defined!
# If using NebulaGraph Jupyter extension
%ngql MATCH p=()-[]->() RETURN p LIMIT 20;
%ng_draw
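Or, with Neo4j, view the graph in the Neo4j Browser at http://localhost:7474/.
For information on all kg_extractors, see the documentation.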
from llama_index.core.indices.property_graph import (
    LLMSynonymRetriever,
    VectorContextRetriever,
)
llm_synonym = LLMSynonymRetriever(
    index.property_graph_store,
    llm=Ollama(model="llama3", request_timeout=3600),
    include_text=False,
)
vector_context = VectorContextRetriever(
    index.property_graph_store,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    include_text=False,
)
retriever = index.as_retriever(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ]
)
nodes = retriever.retrieve("What happened at Interleaf?")
for node in nodes:
    print(node.text)
Interleaf -> HAS -> Paul Graham
Interleaf -> HAS -> Emacs
Interleaf -> HAS -> Release Engineering
Interleaf -> HAS -> Viaweb
Interleaf -> HAS -> Y Combinator
Interleaf -> HAS -> impressive technology
Interleaf -> HAS -> smart people
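Each retrieved line above has the form "subject -> RELATION -> object", so the results are easy to post-process. The sketch below parses such lines into tuples and counts relation types; parse_triplet is a hypothetical helper, and the sample lines are copied from the output above rather than fetched from a live index.

```python
from collections import Counter


def parse_triplet(line):
    """Split 'subject -> RELATION -> object' into a 3-tuple, or return None."""
    parts = [part.strip() for part in line.split("->")]
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)


# Sample lines taken from the retriever output shown above.
retrieved = [
    "Interleaf -> HAS -> Paul Graham",
    "Interleaf -> HAS -> Emacs",
    "Interleaf -> HAS -> smart people",
]
triplets = [t for t in map(parse_triplet, retrieved) if t is not None]
relation_counts = Counter(relation for _, relation, _ in triplets)
print(triplets)
print(relation_counts)
```

A quick count like this is one way to confirm that only relation types from the schema appear in the results.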
We can also create a query engine with similar syntax.
query_engine = index.as_query_engine(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ],
    llm=Ollama(model="llama3", request_timeout=3600),
)
response = query_engine.query("What happened at Interleaf?")
print(str(response))
Paul Graham worked there, as well as other smart people. Emacs was also present.
For more information on all retrievers, see the complete guide.