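Neo4j Property Graph Index¶
Neo4j is a production-grade graph database, capable of storing a property graph, performing vector search, filtering, and more.
The easiest way to get started is with a cloud-hosted instance on Neo4j Aura.
In this notebook, we instead cover how to run the database locally with docker.
If you already have an existing graph, skip to the end of this notebook.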
%pip install llama-index llama-index-graph-stores-neo4j
Docker Setup¶
To launch Neo4j locally, first make sure docker is installed. Then you can start the database with the following docker command:
docker run \
    -p 7474:7474 -p 7687:7687 \
    -v $PWD/data:/data -v $PWD/plugins:/plugins \
    --name neo4j-apoc \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
    neo4j:latest
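From here, you can open the database at http://localhost:7474/. On this page, you will be asked to sign in. Use the default username/password of neo4j and neo4j.
After your first login, you will be asked to change the password.
After that, you are ready to create your first property graph!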
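The container can take a few seconds to start accepting connections. As a minimal sketch, assuming the Bolt port 7687 published by the docker command above, a small helper can poll the port before the rest of the notebook runs. The function name `wait_for_port` and its defaults are illustrative and not part of Neo4j or LlamaIndex.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 60.0, interval: float = 1.0
) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that the port is open,
            # not that the login credentials are valid.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 7687)` should return True once the database started above is listening.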
Environment Setup¶
A little environment setup is needed before we start.
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
import nest_asyncio
nest_asyncio.apply()
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Index Construction¶
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

# Note: used to be `Neo4jPGStore`
graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="llamaindex",
    url="bolt://localhost:7687",
)
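The password above is the one chosen at first login, so it will differ per machine. One way to avoid hard-coding it, sketched here under assumptions, is to read the connection settings from environment variables. The variable names `NEO4J_USERNAME`, `NEO4J_PASSWORD`, and `NEO4J_URL` and the helper `neo4j_settings` are hypothetical choices for this example; only the `username`, `password`, and `url` keyword names come from the cell above.

```python
import os
from typing import Dict, Mapping


def neo4j_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect Neo4j connection settings, defaulting to the local docker setup."""
    # NEO4J_PASSWORD is a hypothetical variable name chosen for this sketch.
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise ValueError("Set NEO4J_PASSWORD to the password chosen at first login.")
    return {
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": password,
        "url": env.get("NEO4J_URL", "bolt://localhost:7687"),
    }
```

The resulting dictionary can then be unpacked into the constructor, for example `Neo4jPropertyGraphStore(**neo4j_settings())`.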
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0)
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 21.63it/s]
Extracting paths from text with schema: 100%|██████████| 22/22 [01:06<00:00, 3.02s/it]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00, 1.06it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00, 1.89it/s]
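Now that the graph is created, we can explore it in the UI at http://localhost:7474/.
The easiest way to see the entire graph is to run a cypher command such as "match n=() return n" in the query bar at the top.
To delete the entire graph, a useful command is "match n=() detach delete n". Note that this removes every node and relationship in the database.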
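Cypher can also be sent from Python instead of the browser UI. The sketch below counts nodes and relationships through the store's `structured_query` method. It assumes that `structured_query` returns a list of dictionary rows keyed by the `RETURN` aliases, so treat the result indexing as an assumption to check against your installed version. The helper name `graph_size` is illustrative.

```python
from typing import Any, Tuple


def graph_size(store: Any) -> Tuple[int, int]:
    """Return (node_count, relationship_count) for the connected database.

    `store` is any object exposing structured_query(query), for example the
    graph_store created earlier in this notebook.
    """
    # Assumption: each call returns a list of dict rows such as [{"count": 42}].
    node_rows = store.structured_query("MATCH (n) RETURN count(n) AS count")
    rel_rows = store.structured_query("MATCH ()-[r]->() RETURN count(r) AS count")
    return node_rows[0]["count"], rel_rows[0]["count"]
```

Calling `graph_size(graph_store)` before and after indexing is a quick way to confirm that extraction actually wrote something to the database.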
Querying and Retrieval¶
retriever = index.as_retriever(
    include_text=False,  # include source text in returned nodes, default True
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

for node in nodes:
    print(node.text)
Interleaf -> Got crushed by -> Moore's law
Interleaf -> Made -> Scripting language
Interleaf -> Had -> Smart people
Interleaf -> Inspired by -> Emacs
Interleaf -> Had -> Few years to live
Interleaf -> Made -> Software
Interleaf -> Had done -> Something bold
Interleaf -> Added -> Scripting language
Interleaf -> Built -> Impressive technology
Interleaf -> Was -> Company
Viaweb -> Was -> Profitable
Viaweb -> Was -> Growing rapidly
Viaweb -> Suggested -> Hospital
Idea -> Was clear from -> Experience
Idea -> Would have to be embodied as -> Company
Painting department -> Seemed to be -> Rigorous
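With `include_text=False`, each returned node's text is a `subject -> relation -> object` string, as in the output above. If structured data is easier to work with downstream, the strings can be split back into tuples. This is a minimal sketch that assumes none of the three parts contains `->` itself. The helper names are illustrative and not part of LlamaIndex.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triplet = Tuple[str, str, str]


def parse_triplet(text: str) -> Triplet:
    """Split 'subject -> relation -> object' into a 3-tuple of stripped strings."""
    parts = [part.strip() for part in text.split("->")]
    if len(parts) != 3:
        raise ValueError(f"expected 'subject -> relation -> object', got: {text!r}")
    return parts[0], parts[1], parts[2]


def group_by_subject(texts: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (relation, object) pairs under their subject, keeping input order."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for text in texts:
        subject, relation, obj = parse_triplet(text)
        grouped[subject].append((relation, obj))
    return dict(grouped)
```

With the retriever results above, `group_by_subject(node.text for node in nodes)` would collect all Interleaf facts under one key and all Viaweb facts under another.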
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
Interleaf had smart people and built impressive technology but got crushed by Moore's Law. Viaweb was profitable and growing rapidly.
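Loading from an Existing Graph¶
If you have an existing graph (whether or not it was created with LlamaIndex), we can connect to it and use it!
NOTE: If your graph was created outside of LlamaIndex, the most useful retrievers will be text-to-cypher or cypher templates. Other retrievers rely on properties that LlamaIndex inserts.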
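To make the idea of a cypher template concrete, here is a hedged sketch of a parameterized query run directly against the store. It assumes that nodes in your graph carry a `name` property, which may not hold for a graph created outside LlamaIndex, and that `structured_query` accepts a `param_map` argument. The template text and helper names are hypothetical. This is plain parameterized Cypher, not LlamaIndex's template retriever class.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical template: the `name` property is an assumption about your schema.
NEIGHBORS_TEMPLATE = (
    "MATCH (n)-[r]-(m) "
    "WHERE n.name IN $names "
    "RETURN n.name AS source, type(r) AS relation, m.name AS target "
    "LIMIT $limit"
)


def neighbors_query(names: List[str], limit: int = 25) -> Tuple[str, Dict[str, Any]]:
    """Return the template and its parameter map; values are never inlined."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return NEIGHBORS_TEMPLATE, {"names": list(names), "limit": limit}


def fetch_neighbors(store: Any, names: List[str], limit: int = 25) -> Any:
    """Run the template through the store's structured_query method."""
    query, params = neighbors_query(names, limit)
    return store.structured_query(query, param_map=params)
```

Passing values through `$names` and `$limit` rather than formatting them into the string keeps the query text fixed, which is the main point of a template: only the parameters vary between calls.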
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="794613852",
    url="bolt://localhost:7687",
)

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
)
From here, we can still insert more documents!
from llama_index.core import Document
document = Document(text="LlamaIndex is great!")
index.insert(document)
nodes = index.as_retriever(include_text=False).retrieve("LlamaIndex")
print(nodes[0].text)
Llamaindex -> Is -> Great
For full details on construction, retrieval, and querying of a property graph, see the full docs page.