In [ ]
%pip install llama-index llama-index-graph-stores-nebula jupyter-nebulagraph
Docker Setup¶
To start NebulaGraph locally, first make sure you have Docker installed. Then you can launch the database with the following commands.
mkdir nebula-docker-compose
cd nebula-docker-compose
curl --output docker-compose.yaml https://raw.githubusercontent.com/vesoft-inc/nebula-docker-compose/master/docker-compose-lite.yaml
docker compose up
After that, you are ready to create your first property graph!
For other options and details on deploying NebulaGraph, see the docs.
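Before moving on, it can help to confirm the containers came up healthy. A minimal check, assuming the docker-compose setup above with the graphd service listening on the default port 9669:

```shell
# List the compose services and their status
docker compose ps

# Optionally, check that the graph service port is reachable
nc -z 127.0.0.1 9669 && echo "graphd is accepting connections"
```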
In [ ]
# load NebulaGraph Jupyter extension to enable %ngql magic
%load_ext ngql
# connect to NebulaGraph service
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula
# create a graph space (think of it as a database instance) named llamaindex_nebula_property_graph
%ngql CREATE SPACE IF NOT EXISTS llamaindex_nebula_property_graph(vid_type=FIXED_STRING(256));
Connection Pool Created
Out[ ]
In [ ]
# use the graph space, which is similar to "use database" in MySQL
# The space is created asynchronously, so wait a moment before using it; retry if it fails
%ngql USE llamaindex_nebula_property_graph;
Out[ ]
Environment Setup¶
We need a bit of environment setup to get started.
In [ ]
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
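Hardcoding the key in a cell is convenient for a demo, but easy to leak when sharing the notebook. Here is a small sketch (not part of the original notebook; the helper name is our own) that reads the key from the environment instead and fails fast when it is missing:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, raising if unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return value


# Usage: read the key that the OpenAI client will pick up from the environment
# api_key = require_env("OPENAI_API_KEY")
```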
In [ ]
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
In [ ]
import nest_asyncio
nest_asyncio.apply()
In [ ]
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
We'll use gpt-4o and the local embedding model intfloat/multilingual-e5-large. You can switch to your preferred models by editing the lines below.
In [ ]
%pip install llama-index-embeddings-huggingface
In [ ]
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
Settings.llm = OpenAI(model="gpt-4o", temperature=0.3)
Settings.embed_model = HuggingFaceEmbedding(
model_name="intfloat/multilingual-e5-large"
)
# Settings.embed_model = OpenAIEmbedding(model_name="text-embedding-3-small")
Index Construction¶
Prepare the property graph store:
In [ ]
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore
graph_store = NebulaPropertyGraphStore(
space="llamaindex_nebula_property_graph", overwrite=True
)
And the vector store:
In [ ]
from llama_index.core.vector_stores.simple import SimpleVectorStore
vec_store = SimpleVectorStore()
Finally, build the index!
In [ ]
from llama_index.core.indices.property_graph import PropertyGraphIndex
from llama_index.core.storage.storage_context import StorageContext
from llama_index.llms.openai import OpenAI
index = PropertyGraphIndex.from_documents(
documents,
property_graph_store=graph_store,
vector_store=vec_store,
show_progress=True,
)
index.storage_context.vector_store.persist("./data/nebula_vec_store.json")
/Users/loganmarkewich/Library/Caches/pypoetry/virtualenvs/llama-index-caVs7DDe-py3.11/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 20.96it/s]
Extracting paths from text: 100%|██████████| 22/22 [00:19<00:00, 1.15it/s]
Extracting implicit paths: 100%|██████████| 22/22 [00:00<00:00, 25253.06it/s]
Generating embeddings: 100%|██████████| 1/1 [00:01<00:00, 1.06s/it]
Generating embeddings: 100%|██████████| 5/5 [00:02<00:00, 2.50it/s]
Now that the graph is created, we can explore it with jupyter-nebulagraph.
In [ ]
%ngql SHOW TAGS
Out[ ]
Name | |
---|---|
0 | Chunk__ |
1 | Entity__ |
2 | Node__ |
3 | Props__ |
In [ ]
%ngql SHOW EDGES
Out[ ]
Name | |
---|---|
0 | Relation__ |
1 | __meta__node_label__ |
2 | __meta__rel_label__ |
In [ ]
%ngql MATCH p=(v:Entity__)-[r]->(t:Entity__) RETURN v.Entity__.name AS src, r.label AS relation, t.Entity__.name AS dest LIMIT 15;
Out[ ]
src | relation | dest | |
---|---|---|---|
0 | We | Charged | $100 a month for a small store |
1 | We | Charged | $300 a month for a big store |
2 | We | Started working on | Building the software |
3 | We | Started | Company |
4 | We | Started | Investment firm |
5 | We | Opened for business | January 1996 |
6 | We | Had | Something viable |
7 | We | Decided to try making | A version of the store builder |
8 | Growth rate | Takes care of | Absolute number |
9 | Stock | Went up | 5x |
10 | Jessica Livingston | In charge of | Marketing at a Boston investment bank |
11 | Language | Would be | Dialect of Lisp |
12 | Language | Used | Early version of Fortran |
13 | Arc | Compiled into | Scheme |
14 | Deal | Became | Model for Y Combinator |
In [ ]
%ngql MATCH p=(v:Entity__)-[r]->(t:Entity__) RETURN p LIMIT 2;
Out[ ]
p | |
---|---|
0 | ("We" :Props__{_node_content: __NULL__, _node_type: __NULL__, creation_date: "2024-05-31", doc_id: __NULL__, document_id: __NULL__, file_name: "paul_graham_essay.txt", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_size: 75042, file_type: "text/plain", last_modified_date: "2024-05-31", ref_doc_id: __NULL__, triplet_source_id: "4145ba08-a096-4ac1-8f7c-f40642c857cc"} :Node__{label: "entity"} :Entity__{name: "We"})-[:Relation__@0{label: "Charged", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_name: "paul_graham_essay.txt", file_type: "text/plain", file_size: 75042, _node_type: __NULL__, creation_date: "2024-05-31", document_id: __NULL__, last_modified_date: "2024-05-31", doc_id: __NULL__, _node_content: __NULL__, ref_doc_id: __NULL__, triplet_source_id: "0faa4540-57bb-4b94-8bc2-46431d980182"}]->("$100 a month for a small store" :Props__{_node_content: __NULL__, _node_type: __NULL__, creation_date: "2024-05-31", doc_id: __NULL__, document_id: __NULL__, file_name: "paul_graham_essay.txt", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_size: 75042, file_type: "text/plain", last_modified_date: "2024-05-31", ref_doc_id: __NULL__, triplet_source_id: "0faa4540-57bb-4b94-8bc2-46431d980182"} :Node__{label: "entity"} :Entity__{name: "$100 a month for a small store"}) |
1 | ("We" :Props__{_node_content: __NULL__, _node_type: __NULL__, creation_date: "2024-05-31", doc_id: __NULL__, document_id: __NULL__, file_name: "paul_graham_essay.txt", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_size: 75042, file_type: "text/plain", last_modified_date: "2024-05-31", ref_doc_id: __NULL__, triplet_source_id: "4145ba08-a096-4ac1-8f7c-f40642c857cc"} :Node__{label: "entity"} :Entity__{name: "We"})-[:Relation__@0{label: "Charged", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_name: "paul_graham_essay.txt", file_type: "text/plain", file_size: 75042, _node_type: __NULL__, creation_date: "2024-05-31", document_id: __NULL__, last_modified_date: "2024-05-31", doc_id: __NULL__, _node_content: __NULL__, ref_doc_id: __NULL__, triplet_source_id: "0faa4540-57bb-4b94-8bc2-46431d980182"}]->("$300 a month for a big store" :Props__{_node_content: __NULL__, _node_type: __NULL__, creation_date: "2024-05-31", doc_id: __NULL__, document_id: __NULL__, file_name: "paul_graham_essay.txt", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_size: 75042, file_type: "text/plain", last_modified_date: "2024-05-31", ref_doc_id: __NULL__, triplet_source_id: "0faa4540-57bb-4b94-8bc2-46431d980182"} :Node__{label: "entity"} :Entity__{name: "$300 a month for a big store"}) |
In [ ]
%ng_draw
Querying and Retrieval¶
In [ ]
retriever = index.as_retriever(
include_text=False, # include source text in returned nodes, default True
)
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")
for node in nodes:
print(node.text)
Interleaf -> Got a job at -> I
Interleaf -> Crushed -> Moore's law
Interleaf -> Was -> Company
Interleaf -> Built -> Impressive technology
Interleaf -> Added -> Scripting language
Interleaf -> Had -> Smart people
Interleaf -> Made -> Software for creating documents
Viaweb -> Called -> Company
Viaweb -> Worked for -> Dan giffin
Viaweb -> Was -> Application service provider
In viaweb -> Was -> Code editor
Viaweb stock -> Was -> Valuable
Viaweb logo -> Had -> White v on red circle
In [ ]
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
Interleaf was a company that built impressive technology and had smart people, but it was ultimately crushed by Moore's Law in the 1990s due to the exponential growth in the power of commodity processors. Despite adding a scripting language and making software for creating documents, it could not keep up with the rapid advancements in hardware. Viaweb, on the other hand, was an application service provider that created a code editor for users to define their own page styles, which were actually Lisp expressions. The company was eventually bought by Yahoo in the summer of 1998. The Viaweb stock became valuable, and the acquisition marked a significant turning point for its founders. The Viaweb logo featured a white "V" on a red circle, which later inspired the Y Combinator logo.
Loading from an Existing Graph¶
If you have an existing graph, we can connect to and use it!
In [ ]
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore
graph_store = NebulaPropertyGraphStore(
space="llamaindex_nebula_property_graph"
)
from llama_index.core.vector_stores.simple import SimpleVectorStore
vec_store = SimpleVectorStore.from_persist_path("./data/nebula_vec_store.json")
index = PropertyGraphIndex.from_existing(
property_graph_store=graph_store,
vector_store=vec_store,
)
From here, we can still insert more documents!
In [ ]
from llama_index.core import Document
document = Document(text="LlamaIndex is great!")
index.insert(document)
In [ ]
nodes = index.as_retriever(include_text=False).retrieve("LlamaIndex")
print(nodes[0].text)
Llamaindex -> Is -> Great