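Property Graph Construction with Predefined Schemas¶

In this notebook, we walk through building a property graph with Neo4j, Ollama, and Huggingface.

Specifically, we will use the SchemaLLMPathExtractor, which lets us specify an exact schema containing the possible entity types and relation types, and define how they can be connected to each other.

This is useful when you have a specific graph you want to build and want to limit what the LLM is predicting.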

%pip install llama-index
%pip install llama-index-llms-ollama
%pip install llama-index-embeddings-huggingface
# Optional
%pip install llama-index-graph-stores-neo4j
%pip install llama-index-graph-stores-nebula

Load Data¶

First, let's download some sample data to play with.

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-06-26 11:12:16--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.007s  

2024-06-26 11:12:16 (10.4 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

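Graph Construction¶

To construct our graph, we will use the SchemaLLMPathExtractor.

Given a schema for the graph, we can extract entities and relations that follow that schema, rather than letting the LLM decide entities and relations at random.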

import nest_asyncio

nest_asyncio.apply()

from typing import Literal
from llama_index.llms.ollama import Ollama
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# best practice to use upper-case
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

# define which entities can have which relations
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # if false, allows for values outside of the schema
    # useful for using the schema as a suggestion
    strict=True,
)
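
To make the effect of a strict schema more concrete, here is a minimal sketch of the kind of filtering such a schema implies. It is a simplified illustration, not the library's implementation: the `Triple` type, the `is_allowed` helper, and the candidate triples are all hypothetical and exist only for this example.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    subject_type: str
    relation: str
    obj: str
    obj_type: str


# Same shape as validation_schema above: entity type -> allowed relations.
SCHEMA: Dict[str, List[str]] = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}


def is_allowed(triple: Triple, schema: Dict[str, List[str]]) -> bool:
    """Keep a triple only if both entity types are known and the
    relation is listed for the subject's entity type."""
    if triple.subject_type not in schema or triple.obj_type not in schema:
        return False
    return triple.relation in schema[triple.subject_type]


# Made-up candidate triples, for illustration only.
candidates = [
    Triple("Paul Graham", "PERSON", "WORKED_AT", "Interleaf", "ORGANIZATION"),
    Triple("Paul Graham", "PERSON", "FOUNDED", "Viaweb", "ORGANIZATION"),
    Triple("Lisp", "LANGUAGE", "PART_OF", "Viaweb", "ORGANIZATION"),
    Triple("Cambridge", "PLACE", "WORKED_ON", "Viaweb", "ORGANIZATION"),
]

kept = [t for t in candidates if is_allowed(t, SCHEMA)]
for t in kept:
    print(f"{t.subject} -> {t.relation} -> {t.obj}")
```

In this sketch, a triple is dropped when its relation is outside the schema, when an entity type is unknown, or when the relation is not permitted for the subject's type. A non-strict mode would correspond to skipping this filter and treating the schema as a suggestion.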

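Now, you can use SimplePropertyGraph, Neo4j, or NebulaGraph to store the graph.

Option 1. Neo4j

To launch Neo4j locally, first ensure you have Docker installed. Then, you can launch the database with the following Docker command: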

docker run \
    -p 7474:7474 -p 7687:7687 \
    -v $PWD/data:/data -v $PWD/plugins:/plugins \
    --name neo4j-apoc \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
    neo4j:latest

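From here, you can open the database at http://localhost:7474/. On this page, you will be asked to sign in. Use the default username/password of neo4j and neo4j.

Once you log in for the first time, you will be asked to change the password.

After this, you are ready to create your first property graph!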

from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="<password>",
    url="bolt://localhost:7687",
)
vec_store = None
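
Rather than hard-coding the password chosen at first login, one option is to read the connection settings from environment variables. The sketch below is one way to do that; the variable names (`NEO4J_USERNAME`, `NEO4J_PASSWORD`, `NEO4J_URL`) and the `neo4j_settings` helper are assumptions made for this example, and the default URL assumes Neo4j is running locally from the Docker command above.

```python
import os


def neo4j_settings(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names are hypothetical; rename them to match your setup.
    """
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError(
            "Set NEO4J_PASSWORD to the password chosen at first login"
        )
    return {
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": password,
        "url": env.get("NEO4J_URL", "bolt://localhost:7687"),
    }


# An explicit mapping is passed here so the sketch runs without real credentials.
settings = neo4j_settings({"NEO4J_PASSWORD": "example-password"})
print({**settings, "password": "***"})  # avoid printing the real password

# The resulting dict matches the keyword arguments used above:
# graph_store = Neo4jPropertyGraphStore(**settings)
```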

Option 2. NebulaGraph

To launch NebulaGraph locally, first ensure you have Docker installed. Then, you can launch the database with the following commands:

mkdir nebula-docker-compose
cd nebula-docker-compose
curl --output docker-compose.yaml https://raw.githubusercontent.com/vesoft-inc/nebula-docker-compose/master/docker-compose-lite.yaml
docker compose up

After this, you are ready to create your first property graph!

For other options and details on deploying NebulaGraph, see the documentation:

Ad-hoc cluster in Google Colab.

Docker Desktop Extension.

from llama_index.graph_stores.nebula import NebulaPropertyGraphStore
from llama_index.core.vector_stores.simple import SimpleVectorStore

graph_store = NebulaPropertyGraphStore(
    space="llamaindex_nebula_property_graph", overwrite=True
)
vec_store = SimpleVectorStore()

If you want to explore the graph with the NebulaGraph Jupyter extension, run the following commands. Otherwise, skip these steps.

%pip install jupyter-nebulagraph

# load NebulaGraph Jupyter extension to enable %ngql magic
%load_ext ngql
# connect to NebulaGraph service
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula
%ngql CREATE SPACE IF NOT EXISTS llamaindex_nebula_property_graph(vid_type=FIXED_STRING(256));

# select the graph space, similar to "USE database" in MySQL
# the space is created asynchronously, so wait a moment before using it and retry if this fails
%ngql USE llamaindex_nebula_property_graph;
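
Because the space is created asynchronously, the first attempt to use it may fail. The `%ngql` magic is simply re-run by hand in the notebook, but the same idea can be sketched as a small retry helper for code that talks to the database programmatically. The `retry` helper and the `use_space` stand-in below are hypothetical and only illustrate the retry pattern; they do not call NebulaGraph.

```python
import time


def retry(action, attempts=5, delay_seconds=2.0):
    """Call `action` until it succeeds or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts:
                raise  # give up and surface the last error
            print(f"attempt {attempt} failed ({exc}); retrying in {delay_seconds}s")
            time.sleep(delay_seconds)


# Stand-in for the real call: fails twice, then succeeds.
calls = {"count": 0}


def use_space():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("space not ready yet")
    return "ok"


result = retry(use_space, attempts=5, delay_seconds=0.01)
print(result, calls["count"])
```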

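Start building!

NOTE: Using a local model for extraction is slower than using API-based models. Local models (like Ollama) are typically limited to sequential processing. Expect this to take about 10 minutes on an M2 Max.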
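
As a rough illustration of what "sequential processing" means here, the sketch below simulates extraction jobs with a cap on how many may run at once and reports the peak number in flight. It does not call any model: `run_jobs` is a hypothetical helper and `asyncio.sleep` stands in for an LLM request. When run inside a notebook, `asyncio.run` typically relies on the `nest_asyncio.apply()` call made earlier.

```python
import asyncio


async def run_jobs(num_jobs: int, max_in_flight: int) -> int:
    """Run simulated extraction jobs and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def extract(chunk_id: int) -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for one LLM call
            in_flight -= 1

    await asyncio.gather(*(extract(i) for i in range(num_jobs)))
    return peak


# A backend that handles one request at a time behaves like max_in_flight=1.
sequential_peak = asyncio.run(run_jobs(num_jobs=8, max_in_flight=1))
# A backend that accepts parallel requests can overlap the waiting time.
concurrent_peak = asyncio.run(run_jobs(num_jobs=8, max_in_flight=4))
print(sequential_peak, concurrent_peak)
```

With a cap of one, every chunk waits for the previous one to finish, so total time grows roughly linearly with the number of chunks.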

from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    property_graph_store=graph_store,
    vector_store=vec_store,
    show_progress=True,
)

If we inspect the graph created, we can see that it only includes the relations and entity types that we defined!

# If using NebulaGraph Jupyter extension
%ngql MATCH p=()-[]->() RETURN p LIMIT 20;

%ng_draw

Or Neo4j

[Image: local graph]

For information on all kg_extractors, see the documentation.

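Querying¶

Now that our graph is created, we can query it.

As is the theme of this notebook, we will use a lower-level API and construct all our retrievers ourselves!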

from llama_index.core.indices.property_graph import (
    LLMSynonymRetriever,
    VectorContextRetriever,
)


llm_synonym = LLMSynonymRetriever(
    index.property_graph_store,
    llm=Ollama(model="llama3", request_timeout=3600),
    include_text=False,
)
vector_context = VectorContextRetriever(
    index.property_graph_store,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    include_text=False,
)

retriever = index.as_retriever(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ]
)
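
Conceptually, combining sub-retrievers means running each one on the same query and merging what they return. The sketch below shows that idea with plain functions. It is a conceptual illustration only: `combine`, `keyword_retriever`, and `vector_retriever` are hypothetical stand-ins with made-up results, and how the library actually merges and ranks results may differ.

```python
from typing import Callable, Iterable, List

Retriever = Callable[[str], Iterable[str]]


def combine(retrievers: List[Retriever], query: str) -> List[str]:
    """Run every retriever and merge the results, keeping first-seen
    order and dropping exact duplicates."""
    seen = set()
    merged = []
    for retrieve in retrievers:
        for text in retrieve(query):
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged


# Hypothetical stand-ins that return fixed results regardless of the query.
def keyword_retriever(query: str) -> List[str]:
    return ["Interleaf -> HAS -> Paul Graham", "Interleaf -> HAS -> Emacs"]


def vector_retriever(query: str) -> List[str]:
    return ["Interleaf -> HAS -> Emacs", "Interleaf -> HAS -> Viaweb"]


results = combine([keyword_retriever, vector_retriever], "What happened at Interleaf?")
for text in results:
    print(text)
```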

nodes = retriever.retrieve("What happened at Interleaf?")

for node in nodes:
    print(node.text)

Interleaf -> HAS -> Paul Graham
Interleaf -> HAS -> Emacs
Interleaf -> HAS -> Release Engineering
Interleaf -> HAS -> Viaweb
Interleaf -> HAS -> Y Combinator
Interleaf -> HAS -> impressive technology
Interleaf -> HAS -> smart people
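
Each retrieved node's text uses the `subject -> RELATION -> object` form shown above. As a quick sanity check, the following sketch parses such lines and confirms that every relation belongs to the set defined earlier. The `parse_triplet` helper is hypothetical, and the input is the output printed above, pasted in as a string.

```python
from collections import Counter

# The relation labels defined when building the extractor.
ALLOWED_RELATIONS = {"HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"}


def parse_triplet(line: str):
    """Split 'subject -> RELATION -> object' into its three parts."""
    subject, relation, obj = (part.strip() for part in line.split("->"))
    return subject, relation, obj


output = """\
Interleaf -> HAS -> Paul Graham
Interleaf -> HAS -> Emacs
Interleaf -> HAS -> Release Engineering
Interleaf -> HAS -> Viaweb
Interleaf -> HAS -> Y Combinator
Interleaf -> HAS -> impressive technology
Interleaf -> HAS -> smart people
"""

triplets = [parse_triplet(line) for line in output.splitlines() if line.strip()]
relation_counts = Counter(relation for _, relation, _ in triplets)
unexpected = set(relation_counts) - ALLOWED_RELATIONS

print(relation_counts)
print("unexpected relations:", unexpected or "none")
```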

We can also create a query engine with similar syntax.

query_engine = index.as_query_engine(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ],
    llm=Ollama(model="llama3", request_timeout=3600),
)

response = query_engine.query("What happened at Interleaf?")

print(str(response))

Paul Graham worked there, as well as other smart people. Emacs was also present.

For more information on all retrievers, see the complete guide.