Upstash Vector Store
This walkthrough shows how to use LlamaIndex to interact with Upstash Vector.
! pip install -q llama-index llama-index-vector-stores-upstash upstash-vector
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.upstash import UpstashVectorStore
from llama_index.core import StorageContext
import textwrap
import openai
# Setup the OpenAI API
openai.api_key = "sk-..."
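Hard-coding the key works for a quick demo, but it is easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key has been exported as an environment variable named OPENAI_API_KEY; the helper name `require_env` is hypothetical and not part of any library:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Usage (assumes OPENAI_API_KEY was exported before starting the notebook):
# openai.api_key = require_env("OPENAI_API_KEY")
```

Failing early with a clear message is usually easier to debug than an authentication error raised later, in the middle of indexing.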
# Download data
! mkdir -p 'data/paul_graham/'
! wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-02-03 20:04:25--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.01s

2024-02-03 20:04:25 (5.96 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
Now we can load the documents using LlamaIndex's SimpleDirectoryReader.
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print("# Documents:", len(documents))
# Documents: 1
To create an index on Upstash, go to https://console.upstash.com/vector and create an index with 1536 dimensions and the Cosine distance metric. Copy the URL and token into the code below.
vector_store = UpstashVectorStore(url="https://...", token="...")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
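The index settings chosen in the console matter because every stored vector must have the index's dimension, and matches are ranked with the Cosine metric. The dimension of 1536 is consistent with OpenAI's text-embedding-ada-002 model, which is assumed here to be the embedding model in use. As an illustration only (this is plain Python, not Upstash's implementation), cosine similarity between two vectors can be sketched as:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score 1.0, orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # prints 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # prints 0.0
```

Because the score depends only on direction, scaling a vector does not change its similarity to another vector.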
We have now created an index and populated it with vectors from the essay. The data takes a moment to index, after which it is ready to query.
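Since indexing is not instantaneous, a script that queries immediately may get empty results. One way to handle this is to poll until the data is visible. The sketch below is a generic helper under stated assumptions: `wait_until` is a hypothetical name, and the readiness check is supplied by the caller because the source does not specify how to detect that indexing has finished.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Call `predicate` repeatedly until it returns True or `timeout` seconds pass.

    Returns True if the predicate succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage: treat the index as ready once a retrieval returns any node.
# ready = wait_until(lambda: len(index.as_retriever().retrieve("author")) > 0)
```

Note that the usage shown in the comment would issue an embedding request on every attempt, so a longer interval may be preferable in practice.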
query_engine = index.as_query_engine()
res1 = query_engine.query("What did the author learn?")
print(textwrap.fill(str(res1), 100))
print("\n")
res2 = query_engine.query("What is the author's opinion on startups?")
print(textwrap.fill(str(res2), 100))
The author learned that the study of philosophy in college did not live up to their expectations. They found that other fields took up most of the space of ideas, leaving little room for what they perceived as the ultimate truths that philosophy was supposed to explore. As a result, they decided to switch to studying AI.

The author's opinion on startups is that they are in need of help and support, especially in the beginning stages. The author believes that founders of startups are often helpless and face various challenges, such as getting incorporated and understanding the intricacies of running a company. The author's investment firm, Y Combinator, aims to provide seed funding and comprehensive support to startups, offering them the guidance and resources they need to succeed.
Metadata Filtering
You can use MetadataFilters with your VectorStoreQuery to filter the nodes returned from the Upstash vector store.
import os
from llama_index.vector_stores.upstash import UpstashVectorStore
from llama_index.core.vector_stores.types import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)

vector_store = UpstashVectorStore(
    url=os.environ.get("UPSTASH_VECTOR_URL") or "",
    token=os.environ.get("UPSTASH_VECTOR_TOKEN") or "",
)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="author", value="Marie Curie", operator=FilterOperator.EQ
        )
    ],
)
retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")
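A filter only matches nodes whose metadata actually contains the key being tested. To make the semantics of the EQ filter above concrete, here is a plain-Python illustration over made-up records. It is not how LlamaIndex or Upstash evaluates filters internally; `filter_eq` and the sample data are hypothetical.

```python
from typing import Any


def filter_eq(nodes: list[dict[str, Any]], key: str, value: Any) -> list[dict[str, Any]]:
    """Keep only the nodes whose metadata has `key` equal to `value`."""
    return [n for n in nodes if n.get("metadata", {}).get(key) == value]


# Made-up sample records, for illustration only.
nodes = [
    {"text": "Notes on radioactivity", "metadata": {"author": "Marie Curie"}},
    {"text": "Notes on relativity", "metadata": {"author": "Albert Einstein"}},
    {"text": "Untitled fragment", "metadata": {}},
]

matches = filter_eq(nodes, "author", "Marie Curie")
print([n["text"] for n in matches])  # prints ['Notes on radioactivity']
```

The record without an "author" key is excluded rather than raising an error, which is the behavior one would typically want from an equality filter.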
We can also combine multiple filters in a MetadataFilters object using an AND or OR condition.
from llama_index.core.vector_stores import FilterOperator, FilterCondition
filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="theme",
            value=["Fiction", "Horror"],
            operator=FilterOperator.IN,
        ),
        MetadataFilter(key="year", value=1997, operator=FilterOperator.GT),
    ],
    condition=FilterCondition.AND,
)
retriever = index.as_retriever(filters=filters)
retriever.retrieve("Harry Potter?")
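The difference between the AND and OR conditions is easiest to see on a few records. The following plain-Python sketch mirrors the IN and GT filters above. All names (`is_in`, `greater_than`, `combine`) and the sample metadata are hypothetical, and the code illustrates the logic only, not the library's implementation.

```python
from typing import Any, Callable

Predicate = Callable[[dict[str, Any]], bool]


def is_in(key: str, allowed: list[Any]) -> Predicate:
    """Match when metadata[key] is one of the allowed values (like IN)."""
    return lambda meta: meta.get(key) in allowed


def greater_than(key: str, threshold: Any) -> Predicate:
    """Match when metadata[key] exists and is strictly greater (like GT)."""
    return lambda meta: key in meta and meta[key] > threshold


def combine(predicates: list[Predicate], condition: str = "and") -> Predicate:
    """Combine predicates so that all ("and") or any ("or") must match."""
    if condition not in ("and", "or"):
        raise ValueError("condition must be 'and' or 'or'")
    reducer = all if condition == "and" else any
    return lambda meta: reducer(p(meta) for p in predicates)


# Made-up metadata records, for illustration only.
records = [
    {"theme": "Fiction", "year": 2005},
    {"theme": "Fiction", "year": 1990},
    {"theme": "Biography", "year": 2010},
    {"theme": "Biography", "year": 1980},
]
preds = [is_in("theme", ["Fiction", "Horror"]), greater_than("year", 1997)]

both = [r for r in records if combine(preds, "and")(r)]
either = [r for r in records if combine(preds, "or")(r)]
print(len(both), len(either))  # prints 1 3
```

Because GT is a strict comparison, a record with year equal to 1997 would not pass the second filter.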