Redis Docstore+Index Store Demo¶

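This guide shows you how to use the Redis-backed DocumentStore and IndexStore abstractions directly. By putting nodes in the docstore, you can define multiple indexes over the same underlying docstore instead of duplicating data across indexes.

The indexes themselves are also stored in Redis, through the IndexStore.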
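
To make the idea of a shared docstore concrete, here is a toy model in plain Python. It is only a sketch: the dicts stand in for the Redis-backed docstore and index store, and the node IDs, texts, and `fetch` helper are hypothetical, not LlamaIndex internals.

```python
# Toy model only: plain dicts stand in for the Redis-backed stores.
docstore = {
    "node-1": "Paul Graham wrote essays.",
    "node-2": "He co-founded Y Combinator.",
}

# Each "index" keeps node IDs, never a second copy of the node text.
index_store = {
    "summary": {"node_ids": ["node-1", "node-2"]},
    "keyword": {"table": {"essays": ["node-1"], "combinator": ["node-2"]}},
}


def fetch(node_ids):
    """Resolve node IDs against the single shared docstore."""
    return [docstore[node_id] for node_id in node_ids]


summary_texts = fetch(index_store["summary"]["node_ids"])
keyword_texts = fetch(index_store["keyword"]["table"]["essays"])
print(summary_texts)
print(keyword_texts)
```

Both indexes resolve to text held once in `docstore`, which is the property the rest of this guide relies on.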

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

%pip install llama-index-storage-docstore-redis
%pip install llama-index-storage-index-store-redis
%pip install llama-index-llms-openai

!pip install llama-index

import nest_asyncio

nest_asyncio.apply()

import logging
import sys
import os

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex, SimpleKeywordTableIndex
from llama_index.core import SummaryIndex
from llama_index.core import ComposableGraph
from llama_index.llms.openai import OpenAI
from llama_index.core.response.notebook_utils import display_response
from llama_index.core import Settings

INFO:numexpr.utils:Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.

/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Download Data¶

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Documents¶

reader = SimpleDirectoryReader("./data/paul_graham/")
documents = reader.load_data()

Parse into Nodes¶

from llama_index.core.node_parser import SentenceSplitter

nodes = SentenceSplitter().get_nodes_from_documents(documents)

Add to Docstore¶

REDIS_HOST = os.getenv("REDIS_HOST", "127.0.0.1")
# Environment variables are strings, so coerce the port to int either way
REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))
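
If you want the connection settings validated before they reach the stores, a small helper along these lines may be useful. It is a sketch under stated assumptions: `read_redis_settings` is a hypothetical name, and the port range check is an extra safeguard that is not part of the original notebook.

```python
import os


def read_redis_settings(env=None):
    """Return (host, port) from an environment mapping, validating the port."""
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "127.0.0.1")
    # int() raises ValueError if REDIS_PORT is set to something non-numeric.
    port = int(env.get("REDIS_PORT", "6379"))
    if not 1 <= port <= 65535:
        raise ValueError(f"REDIS_PORT out of range: {port}")
    return host, port


print(read_redis_settings({"REDIS_PORT": "6380"}))
```

Passing a dict instead of reading `os.environ` directly keeps the helper easy to exercise without touching the real environment.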

from llama_index.storage.docstore.redis import RedisDocumentStore
from llama_index.storage.index_store.redis import RedisIndexStore

storage_context = StorageContext.from_defaults(
    docstore=RedisDocumentStore.from_host_and_port(
        host=REDIS_HOST, port=REDIS_PORT, namespace="llama_index"
    ),
    index_store=RedisIndexStore.from_host_and_port(
        host=REDIS_HOST, port=REDIS_PORT, namespace="llama_index"
    ),
)
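
The storage context above assumes a Redis server is listening at the configured host and port. As a quick pre-flight check, the sketch below uses only the standard library to test whether a TCP connection can be opened. `redis_reachable` is a hypothetical helper, and it only confirms that something accepts connections on that port, not that it is actually Redis.

```python
import socket


def redis_reachable(host="127.0.0.1", port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


print(redis_reachable("127.0.0.1", 6379, timeout=0.5))
```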

storage_context.docstore.add_documents(nodes)

len(storage_context.docstore.docs)

Define Multiple Indexes¶

Each index uses the same underlying nodes.

summary_index = SummaryIndex(nodes, storage_context=storage_context)

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token usage: 0 tokens

vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17050 tokens
> [build_index_from_nodes] Total embedding token usage: 17050 tokens

keyword_table_index = SimpleKeywordTableIndex(
    nodes, storage_context=storage_context
)

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token usage: 0 tokens

# NOTE: the docstore still has the same nodes
len(storage_context.docstore.docs)

Test Saving and Loading¶

# NOTE: the docstore and index_store are persisted in Redis by default
# NOTE: only the simple vector store needs to be persisted to disk here
storage_context.persist(persist_dir="./storage")

# note down index IDs
list_id = summary_index.index_id
vector_id = vector_index.index_id
keyword_id = keyword_table_index.index_id
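
The index IDs above live only in notebook variables, so a fresh process would not know which IDs to load. One possible approach is to write them to a small JSON file, as sketched below. The helper names, the file name, and the placeholder ID strings are all hypothetical; in practice you would pass the real `index_id` values.

```python
import json
import tempfile
from pathlib import Path


def save_index_ids(path, **index_ids):
    """Write a name -> index ID mapping to a JSON file."""
    Path(path).write_text(json.dumps(index_ids, indent=2), encoding="utf-8")


def load_index_ids(path):
    """Read the name -> index ID mapping back from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Placeholder IDs for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    ids_path = Path(tmp) / "index_ids.json"
    save_index_ids(
        ids_path,
        list_id="summary-id",
        vector_id="vector-id",
        keyword_id="keyword-id",
    )
    restored = load_index_ids(ids_path)

print(restored["vector_id"])
```

The restored values can then be passed as `index_id` when reloading each index.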

from llama_index.core import load_index_from_storage

# re-create storage context
storage_context = StorageContext.from_defaults(
    docstore=RedisDocumentStore.from_host_and_port(
        host=REDIS_HOST, port=REDIS_PORT, namespace="llama_index"
    ),
    index_store=RedisIndexStore.from_host_and_port(
        host=REDIS_HOST, port=REDIS_PORT, namespace="llama_index"
    ),
)

# load indices
summary_index = load_index_from_storage(
    storage_context=storage_context, index_id=list_id
)
vector_index = load_index_from_storage(
    storage_context=storage_context, index_id=vector_id
)
keyword_table_index = load_index_from_storage(
    storage_context=storage_context, index_id=keyword_id
)

INFO:llama_index.indices.loading:Loading indices with ids: ['24e98f9b-9586-4fc6-8341-8dce895e5bcc']
Loading indices with ids: ['24e98f9b-9586-4fc6-8341-8dce895e5bcc']
INFO:llama_index.indices.loading:Loading indices with ids: ['f7b2aeb3-4dad-4750-8177-78d5ae706284']
Loading indices with ids: ['f7b2aeb3-4dad-4750-8177-78d5ae706284']
INFO:llama_index.indices.loading:Loading indices with ids: ['9a9198b4-7cb9-4c96-97a7-5f404f43b9cd']
Loading indices with ids: ['9a9198b4-7cb9-4c96-97a7-5f404f43b9cd']

Test Some Queries¶

chatgpt = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = chatgpt
Settings.chunk_size = 1024

query_engine = summary_index.as_query_engine()
list_response = query_engine.query("What is a summary of this document?")

INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 26111 tokens
> [get_response] Total LLM token usage: 26111 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens

display_response(list_response)

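Final Response: This document recounts the author's journey from writing and programming in his youth to pursuing a career in art. He describes his experiences in high school, college, and graduate school, and how he eventually decided to make art his career. He applied to art schools and was eventually accepted by RISD and the Accademia di Belli Arti in Florence. He passed the Accademia's entrance exam and began studying art there. He then moved to New York, freelancing while writing a book about Lisp. He eventually started a company to put art galleries online, but it did not succeed. He then turned to building software for creating online stores, which eventually succeeded. He came up with the idea of running the software on a server and letting users control it by clicking links, which meant users needed only a browser. This software, known as an "internet storefront", eventually succeeded. He and his team worked to make the software user-friendly and inexpensive, and the company was eventually acquired by Yahoo. After the acquisition, he left to pursue his dream of painting and eventually found success in New York. He could afford luxuries such as taxis and restaurants, and he experimented with a new kind of still life. He also had the idea of creating a web app for making web apps, which he eventually carried out successfully. He then started the investment firm Y Combinator, with his own money and the help of his friends Robert and Trevor, to focus on helping startups. He wrote essays and books, invited undergraduates to apply to the Summer Founders Program, and eventually married Jessica Livingston. After his mother died, he decided to step back from Y Combinator and keep pursuing painting, but he eventually ran out of steam and went back to writing essays and working on Lisp. He wrote a new Lisp, called Bel, in Arc, which took four years to complete. During that time he worked to make the language user-friendly and precise while also enjoying time with his family. He ran into various obstacles along the way, such as customs that continued to constrain him even after the restrictions that caused them had disappeared, and he also had to deal with misinterpretations of his essays on forums. In the end, he succeeded in creating Bel and was able to pursue his dream of painting.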

query_engine = vector_index.as_query_engine()
vector_response = query_engine.query("What did the author do growing up?")

INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens
> [retrieve] Total embedding token usage: 8 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens
> [get_response] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens

display_response(vector_response)

Final Response: None

query_engine = keyword_table_index.as_query_engine()
keyword_response = query_engine.query(
    "What did the author do after his time at YC?"
)

INFO:llama_index.indices.keyword_table.retrievers:> Starting query: What did the author do after his time at YC?
> Starting query: What did the author do after his time at YC?
INFO:llama_index.indices.keyword_table.retrievers:query keywords: ['action', 'yc', 'after', 'time', 'author']
query keywords: ['action', 'yc', 'after', 'time', 'author']
INFO:llama_index.indices.keyword_table.retrievers:> Extracted keywords: ['yc', 'time']
> Extracted keywords: ['yc', 'time']
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 10216 tokens
> [get_response] Total LLM token usage: 10216 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens

display_response(keyword_response)

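Final Response: After YC, the author decided to keep pursuing painting and writing. He wanted to see how good he could get if he fully committed, so he started painting the day he stopped working on YC. He spent most of 2014 painting and got better than he had been before. He also wrote essays, and in March 2015 he started working on Lisp again. He then spent four years working on a new Lisp called Bel, which he wrote himself in Arc. For much of that time he had to ban himself from writing essays, and he moved to England in the summer of 2016. He also wrote a book about Lisp hacking, called On Lisp, which was published in 1993. Bel was finally finished in the fall of 2019. He also experimented with a new kind of still life and tried to build a web app for making web apps, which he named Aspra. Eventually, he decided to build a subset of this app as an open source project, which is the new Lisp dialect he called Arc.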
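
The keyword-table logs above show keywords being pulled from the query and matched against the index. The toy sketch below illustrates that general idea in plain Python. It is not the SimpleKeywordTableIndex implementation: the node texts, `tokenize`, and `retrieve` are hypothetical, and the toy does no stopword filtering and ranks only by raw match count.

```python
from collections import defaultdict

# Hypothetical node texts, for illustration only.
nodes = {
    "n1": "After YC the author started painting",
    "n2": "The author spent time working on Bel",
    "n3": "Viaweb was acquired by Yahoo",
}


def tokenize(text):
    """Lowercase the text and split it into a set of bare words."""
    return {word.strip(".,?!").lower() for word in text.split()}


# Build the table: keyword -> IDs of the nodes that contain it.
keyword_table = defaultdict(set)
for node_id, text in nodes.items():
    for word in tokenize(text):
        keyword_table[word].add(node_id)


def retrieve(query):
    """Rank node IDs by how many query keywords they share."""
    hits = defaultdict(int)
    for word in tokenize(query):
        for node_id in keyword_table.get(word, ()):
            hits[node_id] += 1
    return sorted(hits, key=lambda node_id: (-hits[node_id], node_id))


matches = retrieve("What did the author do after his time at YC?")
print(matches)
```

Only the table maps keywords to node IDs. The node text itself would still come from the shared docstore.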