Vector Store Index usage examples¶
In this guide, we show how to use the vector store index with different vector store implementations.
We start with the default in-memory vector store and the default query configuration (just a few lines of code), then cover how to use a custom hosted vector store and advanced settings such as metadata filters.
from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load documents and build index
documents = SimpleDirectoryReader(
    "../../examples/data/paul_graham"
).load_data()
index = VectorStoreIndex.from_documents(documents)
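Under the hood, the index embeds each document chunk and, at query time, returns the chunks whose vectors are most similar to the query embedding. A minimal plain-Python sketch of that top-k similarity step (toy two-dimensional vectors with no LlamaIndex dependency; real embeddings have on the order of 1536 dimensions):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=2):
    # Rank stored entries by similarity to the query, keep the best k
    ranked = sorted(
        stored,
        key=lambda e: cosine_similarity(query_vec, e["vector"]),
        reverse=True,
    )
    return ranked[:k]

stored = [
    {"text": "doc A", "vector": [1.0, 0.0]},
    {"text": "doc B", "vector": [0.0, 1.0]},
    {"text": "doc C", "vector": [0.9, 0.1]},
]
results = top_k([1.0, 0.0], stored, k=2)
# most similar first: "doc A", then "doc C"
```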
Custom vector stores
You can use a custom vector store (in this case, PineconeVectorStore) as follows:
import pinecone
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores import PineconeVectorStore

# init pinecone
pinecone.init(api_key="<api_key>", environment="<environment>")
pinecone.create_index(
    "quickstart", dimension=1536, metric="euclidean", pod_type="p1"
)

# construct vector store and customize storage context
storage_context = StorageContext.from_defaults(
    vector_store=PineconeVectorStore(pinecone.Index("quickstart"))
)

# Load documents and build index
documents = SimpleDirectoryReader(
    "../../examples/data/paul_graham"
).load_data()
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
For more examples of how to initialize different vector stores, see Vector Store Integrations.
Connect to external vector stores (with existing embeddings)¶
If you have already computed embeddings and loaded them into an external vector store (e.g. Pinecone, Chroma), you can use it with LlamaIndex as follows:
vector_store = PineconeVectorStore(pinecone.Index("quickstart"))
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
Configure standard query settings
To configure query settings, you can pass them directly as keyword arguments when building the query engine:
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

query_engine = index.as_query_engine(
    similarity_top_k=3,
    vector_store_query_mode="default",
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(key="name", value="paul graham"),
        ]
    ),
    alpha=None,
    doc_ids=None,
)
response = query_engine.query("what did the author do growing up?")
Note that metadata filtering is applied against the metadata specified in Node.metadata.
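To make the filtering behavior concrete, here is a plain-Python sketch of exact-match filtering over node metadata (a toy stand-in, not LlamaIndex internals; each node's metadata is a plain dict mirroring Node.metadata):

```python
def apply_exact_match_filters(nodes, filters):
    # Keep only nodes whose metadata matches every (key, value) pair
    return [
        node
        for node in nodes
        if all(node["metadata"].get(key) == value for key, value in filters)
    ]

nodes = [
    {"id": 1, "metadata": {"name": "paul graham"}},
    {"id": 2, "metadata": {"name": "someone else"}},
]
filtered = apply_exact_match_filters(nodes, [("name", "paul graham")])
# only node 1 passes the filter
```

The retained nodes are then ranked by vector similarity as usual; the filter only narrows the candidate set.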
Alternatively, if you are using the lower-level compositional API:
from llama_index import get_response_synthesizer
from llama_index.indices.vector_store.retrievers import VectorIndexRetriever
from llama_index.query_engine.retriever_query_engine import (
    RetrieverQueryEngine,
)
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

# build retriever (filters must be wrapped in a MetadataFilters object)
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=3,
    vector_store_query_mode="default",
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key="name", value="paul graham")]
    ),
    alpha=None,
    doc_ids=None,
)

# build query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever, response_synthesizer=get_response_synthesizer()
)

# query
response = query_engine.query("what did the author do growing up?")
Configure vector store-specific keyword arguments
You can also customize keyword arguments unique to a specific vector store implementation by passing in vector_store_kwargs:
query_engine = index.as_query_engine(
    similarity_top_k=3,
    # only works for pinecone
    vector_store_kwargs={
        "filter": {"name": "paul graham"},
    },
)
response = query_engine.query("what did the author do growing up?")
Use an auto-retriever
You can also use an LLM to automatically decide the query settings for you! Right now, we support automatically setting exact-match metadata filters and the top-k parameter.
from llama_index import get_response_synthesizer
from llama_index.indices.vector_store.retrievers import (
    VectorIndexAutoRetriever,
)
from llama_index.query_engine.retriever_query_engine import (
    RetrieverQueryEngine,
)
from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo

vector_store_info = VectorStoreInfo(
    content_info="brief biography of celebrities",
    metadata_info=[
        MetadataInfo(
            name="category",
            type="str",
            description="Category of the celebrity, one of [Sports, Entertainment, Business, Music]",
        ),
        MetadataInfo(
            name="country",
            type="str",
            description="Country of the celebrity, one of [United States, Barbados, Portugal]",
        ),
    ],
)

# build retriever
retriever = VectorIndexAutoRetriever(
    index, vector_store_info=vector_store_info
)

# build query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever, response_synthesizer=get_response_synthesizer()
)

# query
response = query_engine.query(
    "Tell me about two celebrities from United States"
)
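Conceptually, the auto-retriever prompts the LLM to turn the natural-language query into a structured query spec (a rewritten query string, inferred metadata filters, and a top-k value), which is then executed against the vector store. A toy sketch of applying such a spec (plain Python; the JSON shape here is illustrative, not LlamaIndex's actual schema):

```python
import json

# Hypothetical structured output an LLM might produce for the query above
llm_output = """
{
  "query": "celebrities",
  "filters": [{"key": "country", "value": "United States"}],
  "top_k": 2
}
"""
spec = json.loads(llm_output)

nodes = [
    {"text": "bio of a US athlete", "metadata": {"country": "United States"}},
    {"text": "bio of a Portuguese athlete", "metadata": {"country": "Portugal"}},
]

# Apply the inferred exact-match filters before similarity ranking,
# then truncate to the inferred top-k
matches = [
    node
    for node in nodes
    if all(
        node["metadata"].get(f["key"]) == f["value"] for f in spec["filters"]
    )
][: spec["top_k"]]
# only the United States entry survives the inferred filter
```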