Property Graph Index
In this notebook, we demonstrate some basic usage of the PropertyGraphIndex in LlamaIndex.
The property graph index here will take unstructured documents, extract a property graph from them, and provide various methods to query that graph.
%pip install llama-index
Setup
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
import nest_asyncio
nest_asyncio.apply()
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Construction
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
index = PropertyGraphIndex.from_documents(
documents,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
show_progress=True,
)
/Users/loganmarkewich/Library/Caches/pypoetry/virtualenvs/llama-index-bXUwlEfH-py3.11/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 25.46it/s]
Extracting paths from text: 100%|██████████| 22/22 [00:12<00:00, 1.72it/s]
Extracting implicit paths: 100%|██████████| 22/22 [00:00<00:00, 36186.15it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00, 1.14it/s]
Generating embeddings: 100%|██████████| 5/5 [00:00<00:00, 5.43it/s]
Let's recap what just happened:
- PropertyGraphIndex.from_documents() - we loaded documents into an index
- Parsing nodes - the index parsed the documents into nodes
- Extracting paths from text - the nodes were passed to an LLM, which was prompted to generate knowledge graph triples (i.e. paths)
- Extracting implicit paths - each node.relationships property was used to infer implicit paths
- Generating embeddings - embeddings were generated for each text node and each graph node (hence this step happens twice)
Let's explore what we created! For debugging purposes, the default SimplePropertyGraphStore includes a helper function to save a networkx representation of the graph to an html file.
index.property_graph_store.save_networkx_graph(name="./kg.html")
Opening the html file in a browser, we can see our graph!
If you zoom in, each "dense" node with many connections is actually a source chunk, with the extracted entities and relations branching off from it.
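Besides the html export, you can inspect the extracted triples programmatically. A minimal sketch, assuming the get_triplets() filter method of the PropertyGraphStore interface (present in recent llama-index-core versions), and assuming an entity named "Interleaf" was extracted from the essay:
# "Interleaf" is a hypothetical lookup - use an entity your run actually extracted
triplets = index.property_graph_store.get_triplets(entity_names=["Interleaf"])
for source, relation, target in triplets[:5]:
    print(source.name, "->", relation.label, "->", target.name)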
Customizing Low-Level Construction
If we wanted, we could do the same ingestion using the lower-level API, leveraging kg_extractors.
from llama_index.core.indices.property_graph import (
ImplicitPathExtractor,
SimpleLLMPathExtractor,
)
index = PropertyGraphIndex.from_documents(
documents,
embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
kg_extractors=[
ImplicitPathExtractor(),
SimpleLLMPathExtractor(
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
num_workers=4,
max_paths_per_chunk=10,
),
],
show_progress=True,
)
For a full guide on all extractors, see the detailed usage page.
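As one more example of a kg_extractor, extraction can also be constrained to a schema with SchemaLLMPathExtractor. A minimal sketch; the entity labels, relation labels, and validation schema below are illustrative assumptions, not values taken from this notebook:
from typing import Literal
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# illustrative schema - swap in labels that fit your own domain
entities = Literal["PERSON", "ORGANIZATION", "PRODUCT"]
relations = Literal["FOUNDED", "WORKED_AT", "BUILT"]
schema = {
    "PERSON": ["FOUNDED", "WORKED_AT", "BUILT"],
    "ORGANIZATION": ["BUILT"],
}

schema_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=schema,
    strict=True,  # drop any triple that falls outside the schema
)
This extractor would then be passed in the kg_extractors list just like the ones above.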
Querying
Querying a property graph index typically consists of using one or more sub-retrievers and combining their results.
Graph retrieval can be thought of as:
- selecting node(s)
- traversing from those nodes
By default, two types of retrieval are used in unison (both can also be constructed explicitly, as sketched a little further below):
- synonym/keyword expansion - use the LLM to generate synonyms and keywords from the query
- vector retrieval - use embeddings to find nodes in your graph
Once nodes are found, you can either:
- return the paths adjacent to the selected nodes (i.e. triples)
- return the paths plus the original source text of the chunk (if available)
retriever = index.as_retriever(
include_text=False, # include source text, default True
)
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")
for node in nodes:
print(node.text)
Interleaf -> Was -> On the way down
Viaweb -> Had -> Code editor
Interleaf -> Built -> Impressive technology
Interleaf -> Added -> Scripting language
Interleaf -> Made -> Scripting language
Viaweb -> Suggested -> Take to hospital
Interleaf -> Had done -> Something bold
Viaweb -> Called -> After
Interleaf -> Made -> Dialect of lisp
Interleaf -> Got crushed by -> Moore's law
Dan giffin -> Worked for -> Viaweb
Interleaf -> Had -> Smart people
Interleaf -> Had -> Few years to live
Interleaf -> Made -> Software
Interleaf -> Made -> Software for creating documents
Paul graham -> Started -> Viaweb
Scripting language -> Was -> Dialect of lisp
Scripting language -> Is -> Dialect of lisp
Software -> Will be affected by -> Rapid change
Code editor -> Was -> In viaweb
Software -> Worked via -> Web
Programs -> Typed on -> Punch cards
Computers -> Skipped -> Step
Idea -> Was clear from -> Experience
Apartment -> Wasn't -> Rent-controlled
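The retriever above combined both default sub-retrievers under the hood. If you want to configure them yourself, here is a minimal sketch, assuming LLMSynonymRetriever, VectorContextRetriever, and the sub_retrievers argument of as_retriever work as in recent llama-index-core versions:
from llama_index.core.indices.property_graph import (
    LLMSynonymRetriever,
    VectorContextRetriever,
)

sub_retrievers = [
    # expand the query into synonyms/keywords and match entity names
    LLMSynonymRetriever(
        index.property_graph_store,
        llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
        include_text=False,
    ),
    # embed the query and retrieve similar graph nodes
    VectorContextRetriever(
        index.property_graph_store,
        embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
        include_text=False,
    ),
]

retriever = index.as_retriever(sub_retrievers=sub_retrievers)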
query_engine = index.as_query_engine(
include_text=True,
)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
Interleaf had smart people and built impressive technology, including adding a scripting language that was a dialect of Lisp. However, despite their efforts, they were eventually impacted by Moore's Law and faced challenges. Viaweb, on the other hand, was started by Paul Graham and had a code editor where users could define their own page styles using Lisp expressions. Viaweb also suggested taking someone to the hospital and called something "After."
For full details on customizing retrieval and querying, see the docs page.
Storage
By default, storage happens using our simple in-memory abstractions - SimpleVectorStore for embeddings and SimplePropertyGraphStore for the property graph.
We can save and load these to/from disk.
index.storage_context.persist(persist_dir="./storage")
from llama_index.core import StorageContext, load_index_from_storage
index = load_index_from_storage(
StorageContext.from_defaults(persist_dir="./storage")
)
index.storage_context.persist(persist_dir="./storage") from llama_index.core import StorageContext, load_index_from_storage index = load_index_from_storage( StorageContext.from_defaults(persist_dir="./storage") )
Vector Stores
While some graph databases support vectors (like Neo4j), you can still specify the vector store to use on top of your graph for cases where vectors are not supported, or where you want to override the default.
Below we will combine ChromaVectorStore with the default SimplePropertyGraphStore.
%pip install llama-index-vector-stores-chroma
from llama_index.core.graph_stores import SimplePropertyGraphStore
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
client = chromadb.PersistentClient("./chroma_db")
collection = client.get_or_create_collection("my_graph_vector_db")
index = PropertyGraphIndex.from_documents(
documents,
embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
graph_store=SimplePropertyGraphStore(),
    vector_store=ChromaVectorStore(chroma_collection=collection),
show_progress=True,
)
index.storage_context.persist(persist_dir="./storage")
Then to load:
index = PropertyGraphIndex.from_existing(
SimplePropertyGraphStore.from_persist_dir("./storage"),
vector_store=ChromaVectorStore(chroma_collection=collection),
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
)
This is slightly different than purely using the storage context, but the syntax is cleaner now that we are starting to mix things together.
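As a quick sanity check, the reloaded index can be queried exactly like the one built from documents; a minimal sketch:
# the reloaded index behaves like the original
retriever = index.as_retriever(include_text=False)
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")
print(len(nodes))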