Alibaba Cloud OpenSearch Vector Store¶
Alibaba Cloud OpenSearch Vector Search Edition is a large-scale distributed search engine developed by Alibaba Group. It provides search services for the entire Alibaba Group — including Taobao, Tmall, Cainiao, and Youku — as well as for other e-commerce platforms serving customers outside mainland China. It is also the underlying engine of Alibaba Cloud OpenSearch. After years of development, it has come to meet business requirements for high availability, high timeliness, and cost-effectiveness, and it provides an automated operations and maintenance system on which you can build custom search services tailored to your business.
To run this notebook, you need an Alibaba Cloud OpenSearch Vector Search Edition instance.
Setup¶
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-vector-stores-alibabacloud-opensearch
%pip install llama-index
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
Please provide your OpenAI access key¶
To use OpenAI embeddings, you need to supply an OpenAI API key:
import getpass

import openai
OPENAI_API_KEY = getpass.getpass("OpenAI API Key:")
openai.api_key = OPENAI_API_KEY
Download data¶
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load documents¶
from llama_index.core import SimpleDirectoryReader
from IPython.display import Markdown, display
# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print(f"Total documents: {len(documents)}")
Total documents: 1
Create the Alibaba Cloud OpenSearch Vector Store object:¶
To run the next step, you should have an Alibaba Cloud OpenSearch Vector Service instance with a table configured.
# if running the following cells raises an asyncio exception, run this
import nest_asyncio
nest_asyncio.apply()
# initialize without metadata filter
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.alibabacloud_opensearch import (
    AlibabaCloudOpenSearchStore,
    AlibabaCloudOpenSearchConfig,
)

config = AlibabaCloudOpenSearchConfig(
    endpoint="*****",
    instance_id="*****",
    username="your_username",
    password="your_password",
    table_name="llama",
)

vector_store = AlibabaCloudOpenSearchStore(config)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
Query the index¶
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))
The author worked on writing and programming before college. They wrote short stories in 9th grade and tried writing programs on an IBM 1401 using an early version of Fortran.
Connecting to an existing store¶
Since this store is backed by Alibaba Cloud OpenSearch, it is persistent by definition. So, if you want to connect to a store that was created and populated previously, here is how:
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.alibabacloud_opensearch import (
    AlibabaCloudOpenSearchStore,
    AlibabaCloudOpenSearchConfig,
)

config = AlibabaCloudOpenSearchConfig(
    endpoint="***",
    instance_id="***",
    username="your_username",
    password="your_password",
    table_name="llama",
)
vector_store = AlibabaCloudOpenSearchStore(config)
# Create index from existing stored vectors
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine()
response = query_engine.query(
    "What did the author study prior to working on AI?"
)
display(Markdown(f"<b>{response}</b>"))
Metadata filtering¶
The Alibaba Cloud OpenSearch vector store supports metadata filtering at query time. The following cells, which work on a brand-new table, demonstrate this feature.
In this demo, for the sake of brevity, only a single source document is loaded (the ./data/paul_graham/paul_graham_essay.txt text file). Nevertheless, you will attach some custom metadata to the documents, to illustrate how queries can be restricted with conditions on the metadata attached to the documents.
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.alibabacloud_opensearch import (
    AlibabaCloudOpenSearchStore,
    AlibabaCloudOpenSearchConfig,
)

config = AlibabaCloudOpenSearchConfig(
    endpoint="****",
    instance_id="****",
    username="your_username",
    password="your_password",
    table_name="llama",
)

md_storage_context = StorageContext.from_defaults(
    vector_store=AlibabaCloudOpenSearchStore(config)
)
def my_file_metadata(file_name: str):
    """Depending on the input file name, associate a different metadata."""
    if "essay" in file_name:
        source_type = "essay"
    elif "dinosaur" in file_name:
        # this (unfortunately) will not happen in this demo
        source_type = "dinos"
    else:
        source_type = "other"
    return {"source_type": source_type}
# Load documents and build index
md_documents = SimpleDirectoryReader(
    "./data/paul_graham", file_metadata=my_file_metadata
).load_data()

md_index = VectorStoreIndex.from_documents(
    md_documents, storage_context=md_storage_context
)
Add filters to the query engine:
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters
md_query_engine = md_index.as_query_engine(
    filters=MetadataFilters(
        filters=[MetadataFilter(key="source_type", value="essay")]
    )
)

md_response = md_query_engine.query(
    "How long did it take the author to write his thesis?"
)
display(Markdown(f"<b>{md_response}</b>"))
To verify that the filtering really is at play, try changing the filter to use only "dinos" documents... this time there will be no answer :)
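The effect of such a filter can be illustrated with a plain-Python sketch that needs no OpenSearch instance; the node list and `filter_nodes` helper below are illustrative mocks, not LlamaIndex or OpenSearch internals:

```python
# Mock "stored nodes": each carries text plus the custom metadata that
# my_file_metadata would have attached. Only essay documents were ingested.
nodes = [
    {
        "text": "Before college I worked on writing and programming.",
        "metadata": {"source_type": "essay"},
    },
    {
        "text": "I wrote what beginning writers were supposed to write.",
        "metadata": {"source_type": "essay"},
    },
]


def filter_nodes(nodes, key, value):
    """Keep only the nodes whose metadata exactly matches key == value."""
    return [n for n in nodes if n["metadata"].get(key) == value]


print(len(filter_nodes(nodes, "source_type", "essay")))  # 2
print(len(filter_nodes(nodes, "source_type", "dinos")))  # 0
```

With `source_type="dinos"` no nodes survive the filter, so nothing reaches the LLM and the query engine has no material to answer from — which is exactly the empty result the real query produces above.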