In [ ]
%pip install llama-index llama-index-graph-stores-nebula jupyter-nebulagraph
Docker Setup¶
To start NebulaGraph locally, first make sure you have Docker installed. Then you can launch the database with the following commands.
mkdir nebula-docker-compose
cd nebula-docker-compose
curl --output docker-compose.yaml https://raw.githubusercontent.com/vesoft-inc/nebula-docker-compose/master/docker-compose-lite.yaml
docker compose up
After that, you are ready to create your first property graph!
For other options and details on deploying NebulaGraph, see the docs.
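Before moving on, it can help to confirm the containers came up healthy. A minimal check, assuming the docker-compose setup above with the graphd service listening on the default port 9669:

```shell
# List the compose services and their status
docker compose ps

# Optionally, check that the graph service port is reachable
nc -z 127.0.0.1 9669 && echo "graphd is accepting connections"
```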
In [ ]
# load NebulaGraph Jupyter extension to enable %ngql magic
%load_ext ngql
# connect to NebulaGraph service
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula
# create a graph space (think of it as a database instance) named llamaindex_nebula_property_graph
%ngql CREATE SPACE IF NOT EXISTS llamaindex_nebula_property_graph(vid_type=FIXED_STRING(256));
Connection Pool Created
Out[ ]
In [ ]
# use the graph space, which is similar to "use database" in MySQL
# The space is created asynchronously, so wait a moment before using it; retry if it fails
%ngql USE llamaindex_nebula_property_graph;
Out[ ]
Environment Setup¶
We need a bit of environment setup to get started.
In [ ]
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
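Hardcoding the key in a cell is convenient for a demo, but easy to leak when sharing the notebook. Here is a small sketch (not part of the original notebook; the helper name is our own) that reads the key from the environment instead and fails fast when it is missing:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, raising if unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return value


# Usage: read the key that the OpenAI client will pick up from the environment
# api_key = require_env("OPENAI_API_KEY")
```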
In [ ]
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
In [ ]
import nest_asyncio
nest_asyncio.apply()
In [ ]
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
We'll use gpt-4o and the local embedding model intfloat/multilingual-e5-large. You can switch to your preferred models by editing the lines below.
In [ ]
%pip install llama-index-embeddings-huggingface
In [ ]
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
Settings.llm = OpenAI(model="gpt-4o", temperature=0.3)
Settings.embed_model = HuggingFaceEmbedding(
model_name="intfloat/multilingual-e5-large"
)
# Settings.embed_model = OpenAIEmbedding(model_name="text-embedding-3-small")
Index Construction¶
Prepare the property graph store:
In [ ]
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore
graph_store = NebulaPropertyGraphStore(
space="llamaindex_nebula_property_graph", overwrite=True
)
And the vector store:
In [ ]
from llama_index.core.vector_stores.simple import SimpleVectorStore
vec_store = SimpleVectorStore()
Finally, build the index!
In [ ]
from llama_index.core.indices.property_graph import PropertyGraphIndex
from llama_index.core.storage.storage_context import StorageContext
from llama_index.llms.openai import OpenAI
index = PropertyGraphIndex.from_documents(
documents,
property_graph_store=graph_store,
vector_store=vec_store,
show_progress=True,
)
index.storage_context.vector_store.persist("./data/nebula_vec_store.json")
/Users/loganmarkewich/Library/Caches/pypoetry/virtualenvs/llama-index-caVs7DDe-py3.11/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 20.96it/s]
Extracting paths from text: 100%|██████████| 22/22 [00:19<00:00, 1.15it/s]
Extracting implicit paths: 100%|██████████| 22/22 [00:00<00:00, 25253.06it/s]
Generating embeddings: 100%|██████████| 1/1 [00:01<00:00, 1.06s/it]
Generating embeddings: 100%|██████████| 5/5 [00:02<00:00, 2.50it/s]
Now that the graph is created, we can explore it with jupyter-nebulagraph.
In [ ]
%ngql SHOW TAGS
Out[ ]
Name | |
---|---|
0 | Chunk__ |
1 | Entity__ |
2 | Node__ |
3 | Props__ |
In [ ]
%ngql SHOW EDGES
Out[ ]
Name | |
---|---|
0 | Relation__ |
1 | __meta__node_label__ |
2 | __meta__rel_label__ |
In [ ]
%ngql MATCH p=(v:Entity__)-[r]->(t:Entity__) RETURN v.Entity__.name AS src, r.label AS relation, t.Entity__.name AS dest LIMIT 15;
Out[ ]
src | relation | dest | |
---|---|---|---|
0 | We | Charged | $100 a month for a small store |
1 | We | Charged | $300 a month for a big store |
2 | We | Started working on | Building the software |
3 | We | Started | Company |
4 | We | Started | Investment firm |
5 | We | Opened for business | January 1996 |
6 | We | Had | Something viable |
7 | We | Decided to try making | A version of the store builder |
8 | Growth rate | Takes care of | Absolute number |
9 | Stock | Went up | 5x |
10 | Jessica Livingston | In charge of | Marketing at a Boston investment bank |
11 | Language | Would be | Dialect of Lisp |
12 | Language | Used | Early version of Fortran |
13 | Arc | Compiled into | Scheme |
14 | Deal | Became | Model for Y Combinator |
In [ ]
%ngql MATCH p=(v:Entity__)-[r]->(t:Entity__) RETURN p LIMIT 2;
Out[ ]
p | |
---|---|
0 | ("We" :Props__{_node_content: __NULL__, _node_type: __NULL__, creation_date: "2024-05-31", doc_id: __NULL__, document_id: __NULL__, file_name: "paul_graham_essay.txt", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_size: 75042, file_type: "text/plain", last_modified_date: "2024-05-31", ref_doc_id: __NULL__, triplet_source_id: "4145ba08-a096-4ac1-8f7c-f40642c857cc"} :Node__{label: "entity"} :Entity__{name: "We"})-[:Relation__@0{label: "Charged", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_name: "paul_graham_essay.txt", file_type: "text/plain", file_size: 75042, _node_type: __NULL__, creation_date: "2024-05-31", document_id: __NULL__, last_modified_date: "2024-05-31", doc_id: __NULL__, _node_content: __NULL__, ref_doc_id: __NULL__, triplet_source_id: "0faa4540-57bb-4b94-8bc2-46431d980182"}]->("$100 a month for a small store" :Props__{_node_content: __NULL__, _node_type: __NULL__, creation_date: "2024-05-31", doc_id: __NULL__, document_id: __NULL__, file_name: "paul_graham_essay.txt", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_size: 75042, file_type: "text/plain", last_modified_date: "2024-05-31", ref_doc_id: __NULL__, triplet_source_id: "0faa4540-57bb-4b94-8bc2-46431d980182"} :Node__{label: "entity"} :Entity__{name: "$100 a month for a small store"}) |
1 | ("We" :Props__{_node_content: __NULL__, _node_type: __NULL__, creation_date: "2024-05-31", doc_id: __NULL__, document_id: __NULL__, file_name: "paul_graham_essay.txt", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_size: 75042, file_type: "text/plain", last_modified_date: "2024-05-31", ref_doc_id: __NULL__, triplet_source_id: "4145ba08-a096-4ac1-8f7c-f40642c857cc"} :Node__{label: "entity"} :Entity__{name: "We"})-[:Relation__@0{label: "Charged", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_name: "paul_graham_essay.txt", file_type: "text/plain", file_size: 75042, _node_type: __NULL__, creation_date: "2024-05-31", document_id: __NULL__, last_modified_date: "2024-05-31", doc_id: __NULL__, _node_content: __NULL__, ref_doc_id: __NULL__, triplet_source_id: "0faa4540-57bb-4b94-8bc2-46431d980182"}]->("$300 a month for a big store" :Props__{_node_content: __NULL__, _node_type: __NULL__, creation_date: "2024-05-31", doc_id: __NULL__, document_id: __NULL__, file_name: "paul_graham_essay.txt", file_path: "/Users/loganmarkewich/giant_change/llama_index/docs/docs/examples/property_graph/data/paul_graham/paul_graham_essay.txt", file_size: 75042, file_type: "text/plain", last_modified_date: "2024-05-31", ref_doc_id: __NULL__, triplet_source_id: "0faa4540-57bb-4b94-8bc2-46431d980182"} :Node__{label: "entity"} :Entity__{name: "$300 a month for a big store"}) |
In [ ]
%ng_draw
Querying and Retrieval¶
In [ ]
retriever = index.as_retriever(
include_text=False, # include source text in returned nodes, default True
)
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")
for node in nodes:
print(node.text)
Interleaf -> Got a job at -> I
Interleaf -> Crushed -> Moore's law
Interleaf -> Was -> Company
Interleaf -> Built -> Impressive technology
Interleaf -> Added -> Scripting language
Interleaf -> Had -> Smart people
Interleaf -> Made -> Software for creating documents
Viaweb -> Called -> Company
Viaweb -> Worked for -> Dan giffin
Viaweb -> Was -> Application service provider
In viaweb -> Was -> Code editor
Viaweb stock -> Was -> Valuable
Viaweb logo -> Had -> White v on red circle
In [ ]
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
Interleaf was a company that built impressive technology and had smart people, but it was ultimately crushed by Moore's Law in the 1990s due to the exponential growth in the power of commodity processors. Despite adding a scripting language and making software for creating documents, it could not keep up with the rapid advancements in hardware. Viaweb, on the other hand, was an application service provider that created a code editor for users to define their own page styles, which were actually Lisp expressions. The company was eventually bought by Yahoo in the summer of 1998. The Viaweb stock became valuable, and the acquisition marked a significant turning point for its founders. The Viaweb logo featured a white "V" on a red circle, which later inspired the Y Combinator logo.
Loading from an Existing Graph¶
If you have an existing graph, we can connect to and use it!
In [ ]
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore
graph_store = NebulaPropertyGraphStore(
space="llamaindex_nebula_property_graph"
)
from llama_index.core.vector_stores.simple import SimpleVectorStore
vec_store = SimpleVectorStore.from_persist_path("./data/nebula_vec_store.json")
index = PropertyGraphIndex.from_existing(
property_graph_store=graph_store,
vector_store=vec_store,
)
From here, we can still insert more documents!
In [ ]
from llama_index.core import Document
document = Document(text="LlamaIndex is great!")
index.insert(document)
In [ ]
nodes = index.as_retriever(include_text=False).retrieve("LlamaIndex")
print(nodes[0].text)
Llamaindex -> Is -> Great