Oracle AI Vector Search: Vector Store
Oracle AI Vector Search is designed for artificial intelligence (AI) workloads and lets you query data based on semantics rather than keywords. One of its biggest benefits is that semantic search on unstructured data can be combined with relational search on business data in a single system. This is not only powerful but also significantly more effective, because you don't need to add a specialized vector database, eliminating the pain of data fragmentation across multiple systems.
In addition, your vectors can benefit from the most powerful features of Oracle Database, such as:
- Partitioning support
- Real Application Clusters scalability
- Exadata smart scans
- Shard processing across geographically distributed databases
- Transactions
- Parallel SQL
- Disaster recovery
- Security
- Oracle Machine Learning
- Oracle Graph Database
- Oracle Spatial and Graph
- Oracle Blockchain
- JSON
This guide demonstrates how to use the vector capabilities within Oracle AI Vector Search.
If you are just starting with Oracle Database, consider exploring the free Oracle 23 AI, which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, create your own user for enhanced security and customization. For detailed steps on user creation, refer to our end-to-end guide, which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this in the official Oracle guide on administering user accounts and security.
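As a minimal, illustrative sketch (not part of the official guide), you can create such a dedicated user directly from Python with python-oracledb. The administrator credentials, user name, password, and the "users" tablespace below are placeholders; substitute values appropriate for your environment.
import oracledb

# Connect as an administrative user (placeholders -- replace with your own values)
admin_connection = oracledb.connect(
    user="<admin_user>", password="<admin_password>", dsn="<hostname>/<service_name>"
)
with admin_connection.cursor() as cursor:
    # Create a dedicated schema instead of working as SYSTEM
    cursor.execute('CREATE USER vector_demo IDENTIFIED BY "<strong_password>"')
    # Grant the basic privileges needed to connect and create tables
    cursor.execute("GRANT CONNECT, RESOURCE TO vector_demo")
    # Allow the new user to allocate space in a tablespace (name is an assumption)
    cursor.execute("ALTER USER vector_demo QUOTA UNLIMITED ON users")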
Prerequisites
Please install the Oracle Python client driver integration to use Llama Index with Oracle AI Vector Search.
%pip install llama-index-vector-stores-oracledb
Connect to Oracle AI Vector Search
The following sample code shows how to connect to Oracle Database. By default, python-oracledb runs in "Thin" mode, which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. python-oracledb is said to be in "Thick" mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following guide that discusses features supported in each mode. You might want to switch to Thick mode if you are unable to use Thin mode; a sketch of enabling Thick mode follows the connection example below.
import oracledb
# please update with your username, password, hostname and service_name
username = "<username>"
password = "<password>"
dsn = "<hostname>/<service_name>"
try:
    connection = oracledb.connect(user=username, password=password, dsn=dsn)
    print("Connection successful!")
except Exception as ex:
    print("Exception occurred while connecting to the database:", ex)
Import the required dependencies to use Oracle AI Vector Search
import sys
import os
from llama_index.core.schema import NodeRelationship, RelatedNodeInfo, TextNode
from llama_index.core.vector_stores.types import (
ExactMatchFilter,
MetadataFilters,
VectorStoreQuery,
)
from llama_index.vector_stores.oracledb import base as orallamavs
from llama_index.vector_stores.oracledb import OraLlamaVS, DistanceStrategy
Load Documents
# Define a list of documents (these dummy examples are 4 random documents)
text_json_list = [
{
"text": "If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.",
"id_": "cncpt_15.5.3.2.2_P4",
"embedding": [1.0, 0.0],
"relationships": "test-0",
"metadata": {
"weight": 1.0,
"rank": "a",
"url": "https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/logical-storage-structures.html#GUID-5387D7B2-C0CA-4C1E-811B-C7EB9B636442",
},
},
{
"text": "A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.",
"id_": "cncpt_15.5.5_P1",
"embedding": [0.0, 1.0],
"relationships": "test-1",
"metadata": {
"weight": 2.0,
"rank": "c",
"url": "https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/logical-storage-structures.html#GUID-D02B2220-E6F5-40D9-AFB5-BC69BCEF6CD4",
},
},
{
"text": "The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table.\nSometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.",
"id_": "cncpt_22.3.4.3.1_P2",
"embedding": [1.0, 1.0],
"relationships": "test-2",
"metadata": {
"weight": 3.0,
"rank": "d",
"url": "https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/concepts-for-database-developers.html#GUID-3C50EAB8-FC39-4BB3-B680-4EACCE49E866",
},
},
{
"text": "The LOB segment stores data in pieces called chunks. A chunk is a logically contiguous set of data blocks and is the smallest unit of allocation for a LOB. A row in the table stores a pointer called a LOB locator, which points to the LOB index. When the table is queried, the database uses the LOB index to quickly locate the LOB chunks.",
"id_": "cncpt_22.3.4.3.1_P3",
"embedding": [2.0, 1.0],
"relationships": "test-3",
"metadata": {
"weight": 4.0,
"rank": "e",
"url": "https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/concepts-for-database-developers.html#GUID-3C50EAB8-FC39-4BB3-B680-4EACCE49E866",
},
},
]
# Create Llama Text Nodes
text_nodes = []
for text_json in text_json_list:
    # Construct the relationships using RelatedNodeInfo
    relationships = {
        NodeRelationship.SOURCE: RelatedNodeInfo(node_id=text_json["relationships"])
    }

    # Prepare the metadata dictionary; you might want to exclude certain metadata fields if necessary
    metadata = {
        "weight": text_json["metadata"]["weight"],
        "rank": text_json["metadata"]["rank"],
    }

    # Create a TextNode instance
    text_node = TextNode(
        text=text_json["text"],
        id_=text_json["id_"],
        embedding=text_json["embedding"],
        relationships=relationships,
        metadata=metadata,
    )

    text_nodes.append(text_node)

print(text_nodes)
Create a bunch of vector stores with different distance strategies using AI Vector Search
First, we will create three vector stores, each with a different distance function. Since we have not yet created indices in them, they will just create tables for now. Later, we will use these vector stores to create HNSW indices.
You can connect to Oracle Database manually and you will see three tables: Documents_DOT, Documents_COSINE and Documents_EUCLIDEAN (a quick verification query is sketched after the ingestion code below).
We will then create three additional tables, Documents_DOT_IVF, Documents_COSINE_IVF and Documents_EUCLIDEAN_IVF, which will be used to create IVF indices on the tables instead of HNSW indices.
To learn more about the different types of indices supported by Oracle AI Vector Search, refer to the following guide.
# Ingest documents into Oracle Vector Store using different distance strategies
vector_store_dot = OraLlamaVS.from_documents(
text_nodes,
table_name="Documents_DOT",
client=connection,
distance_strategy=DistanceStrategy.DOT_PRODUCT,
)
vector_store_max = OraLlamaVS.from_documents(
text_nodes,
table_name="Documents_COSINE",
client=connection,
distance_strategy=DistanceStrategy.COSINE,
)
vector_store_euclidean = OraLlamaVS.from_documents(
text_nodes,
table_name="Documents_EUCLIDEAN",
client=connection,
distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)
# Ingest documents into Oracle Vector Store using different distance strategies
vector_store_dot_ivf = OraLlamaVS.from_documents(
text_nodes,
table_name="Documents_DOT_IVF",
client=connection,
distance_strategy=DistanceStrategy.DOT_PRODUCT,
)
vector_store_max_ivf = OraLlamaVS.from_documents(
text_nodes,
table_name="Documents_COSINE_IVF",
client=connection,
distance_strategy=DistanceStrategy.COSINE,
)
vector_store_euclidean_ivf = OraLlamaVS.from_documents(
text_nodes,
table_name="Documents_EUCLIDEAN_IVF",
client=connection,
distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)
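As a quick sanity check (a sketch, assuming the tables above were created with the default, upper-cased names), you can query the USER_TABLES data dictionary view through the same connection to confirm that all six tables exist:
# Verify that the vector store tables were created (sketch; names assumed upper-cased)
with connection.cursor() as cursor:
    cursor.execute(
        "SELECT table_name FROM user_tables WHERE table_name LIKE 'DOCUMENTS%'"
    )
    for (table_name,) in cursor:
        print(table_name)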
Demonstrating add and delete operations for texts, along with basic similarity search
def manage_texts(vector_stores):
    """
    Adds texts to each vector store, demonstrates error handling for duplicate additions,
    and performs deletion of texts. Showcases similarity searches and index creation for each vector store.

    Args:
    - vector_stores (list): A list of OraLlamaVS instances.
    """
    for i, vs in enumerate(vector_stores, start=1):
        # Adding texts
        try:
            vs.add_texts(text_nodes, metadata)
            print(f"\n\n\nAdd texts complete for vector store {i}\n\n\n")
        except Exception as ex:
            print(
                f"\n\n\nExpected error on duplicate add for vector store {i}\n\n\n"
            )

        # Deleting texts using the value of 'id'
        vs.delete("test-1")
        print(f"\n\n\nDelete texts complete for vector store {i}\n\n\n")

        # Similarity search
        query = VectorStoreQuery(query_embedding=[1.0, 1.0], similarity_top_k=3)
        results = vs.query(query=query)
        print(
            f"\n\n\nSimilarity search results for vector store {i}: {results}\n\n\n"
        )
vector_store_list = [
vector_store_dot,
vector_store_max,
vector_store_euclidean,
vector_store_dot_ivf,
vector_store_max_ivf,
vector_store_euclidean_ivf,
]
manage_texts(vector_store_list)
Demonstrating index creation with specific parameters for each distance strategy
def create_search_indices(connection):
    """
    Creates search indices for the vector stores, each with specific parameters
    tailored to their distance strategy.
    """
    # Index for DOT_PRODUCT strategy
    # Notice we are creating a HNSW index with default parameters
    # This will default to creating a HNSW index with 8 Parallel Workers and use the Default Accuracy used by Oracle AI Vector Search
    orallamavs.create_index(
        connection,
        vector_store_dot,
        params={"idx_name": "hnsw_idx1", "idx_type": "HNSW"},
    )

    # Index for COSINE strategy with specific parameters
    # Notice we are creating a HNSW index with parallel 16 and Target Accuracy Specification as 97 percent
    orallamavs.create_index(
        connection,
        vector_store_max,
        params={
            "idx_name": "hnsw_idx2",
            "idx_type": "HNSW",
            "accuracy": 97,
            "parallel": 16,
        },
    )

    # Index for EUCLIDEAN_DISTANCE strategy with specific parameters
    # Notice we are creating a HNSW index by specifying Power User Parameters which are neighbors = 64 and efConstruction = 100
    orallamavs.create_index(
        connection,
        vector_store_euclidean,
        params={
            "idx_name": "hnsw_idx3",
            "idx_type": "HNSW",
            "neighbors": 64,
            "efConstruction": 100,
        },
    )

    # Index for DOT_PRODUCT strategy with specific parameters
    # Notice we are creating an IVF index with default parameters
    # This will default to creating an IVF index with 8 Parallel Workers and use the Default Accuracy used by Oracle AI Vector Search
    orallamavs.create_index(
        connection,
        vector_store_dot_ivf,
        params={
            "idx_name": "ivf_idx1",
            "idx_type": "IVF",
        },
    )

    # Index for COSINE strategy with specific parameters
    # Notice we are creating an IVF index with parallel 32 and Target Accuracy Specification as 90 percent
    orallamavs.create_index(
        connection,
        vector_store_max_ivf,
        params={
            "idx_name": "ivf_idx2",
            "idx_type": "IVF",
            "accuracy": 90,
            "parallel": 32,
        },
    )

    # Index for EUCLIDEAN_DISTANCE strategy with specific parameters
    # Notice we are creating an IVF index by specifying Power User Parameters which is neighbor_part = 64
    orallamavs.create_index(
        connection,
        vector_store_euclidean_ivf,
        params={
            "idx_name": "ivf_idx3",
            "idx_type": "IVF",
            "neighbor_part": 64,
        },
    )

    print("Index creation complete.")
create_search_indices(connection)
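Similarly, as a hedged sketch (assuming the index names above are stored upper-cased in the data dictionary), you can confirm the indices were created by querying USER_INDEXES:
# Verify that the HNSW and IVF indices exist (sketch; names assumed upper-cased)
with connection.cursor() as cursor:
    cursor.execute(
        "SELECT index_name, index_type FROM user_indexes "
        "WHERE index_name LIKE 'HNSW_IDX%' OR index_name LIKE 'IVF_IDX%'"
    )
    for index_name, index_type in cursor:
        print(index_name, index_type)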
Now we will conduct a series of advanced searches on all six vector stores. Each of these searches has a with-filter and a without-filter version. The filters select only documents matching specific metadata values (for example, rank "c") and filter out everything else.
# Conduct advanced searches after creating the indices
def conduct_advanced_searches(vector_stores):
    # Constructing filters for direct comparison against document metadata
    # These filters aim to include only documents whose metadata matches exactly (e.g. rank 'c')
    for i, vs in enumerate(vector_stores, start=1):
        def query_without_filters_returns_all_rows_sorted_by_similarity():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search without a filter
            print("\nSimilarity search results without filter:")
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], similarity_top_k=3
            )
            print(vs.query(query=query))

        query_without_filters_returns_all_rows_sorted_by_similarity()

        def query_with_filters_returns_multiple_matches():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with filter:")
            filters = MetadataFilters(
                filters=[ExactMatchFilter(key="rank", value="c")]
            )
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], filters=filters, similarity_top_k=1
            )
            result = vs.query(query)
            print(result.ids)

        query_with_filters_returns_multiple_matches()

        def query_with_filter_applies_top_k():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with filter:")
            filters = MetadataFilters(
                filters=[ExactMatchFilter(key="rank", value="c")]
            )
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], filters=filters, similarity_top_k=1
            )
            result = vs.query(query)
            print(result.ids)

        query_with_filter_applies_top_k()

        def query_with_filter_applies_node_id_filter():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with filter:")
            filters = MetadataFilters(
                filters=[ExactMatchFilter(key="rank", value="c")]
            )
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0],
                filters=filters,
                similarity_top_k=3,
                node_ids=["452D24AB-F185-414C-A352-590B4B9EE51B"],
            )
            result = vs.query(query)
            print(result.ids)

        query_with_filter_applies_node_id_filter()

        def query_with_exact_filters_returns_single_match():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with filter:")
            filters = MetadataFilters(
                filters=[
                    ExactMatchFilter(key="rank", value="c"),
                    ExactMatchFilter(key="weight", value=2),
                ]
            )
            query = VectorStoreQuery(query_embedding=[1.0, 1.0], filters=filters)
            result = vs.query(query)
            print(result.ids)

        query_with_exact_filters_returns_single_match()

        def query_with_contradictive_filter_returns_no_matches():
            filters = MetadataFilters(
                filters=[
                    ExactMatchFilter(key="weight", value=2),
                    ExactMatchFilter(key="weight", value=3),
                ]
            )
            query = VectorStoreQuery(query_embedding=[1.0, 1.0], filters=filters)
            result = vs.query(query)
            print(result.ids)

        query_with_contradictive_filter_returns_no_matches()

        def query_with_filter_on_unknown_field_returns_no_matches():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with filter:")
            filters = MetadataFilters(
                filters=[ExactMatchFilter(key="unknown_field", value="c")]
            )
            query = VectorStoreQuery(query_embedding=[1.0, 1.0], filters=filters)
            result = vs.query(query)
            print(result.ids)

        query_with_filter_on_unknown_field_returns_no_matches()

        def delete_removes_document_from_query_results():
            vs.delete("test-1")
            query = VectorStoreQuery(query_embedding=[1.0, 1.0], similarity_top_k=2)
            result = vs.query(query)
            print(result.ids)

        delete_removes_document_from_query_results()
conduct_advanced_searches(vector_store_list)
End to End Demo
Please refer to our complete demo guide, the Oracle AI Vector Search End-to-End Demo Guide, to build an end-to-end RAG pipeline with the help of Oracle AI Vector Search.
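For orientation, here is a minimal sketch (an assumption, not code from the demo guide) of plugging one of the vector stores created above into a LlamaIndex retriever. It assumes you have installed an embedding integration such as llama-index-embeddings-huggingface; note that the toy two-dimensional embeddings used earlier are for illustration only, and a real pipeline must embed both documents and queries with the same model.
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# The embedding model choice is an example; use the model your documents were embedded with
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Wrap the Oracle-backed vector store in a LlamaIndex index
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store_dot, embed_model=embed_model
)

# Retrieve the nodes most similar to a natural-language question
retriever = index.as_retriever(similarity_top_k=3)
for node_with_score in retriever.retrieve("How does Oracle store LOB data?"):
    print(node_with_score.node.node_id, node_with_score.score)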