Vertex AI Search Retriever

This notebook walks you through setting up a Retriever that fetches data from a Vertex AI Search data store.

Prerequisites

Set up a Google Cloud project
Set up a Vertex AI Search data store
Enable the Vertex AI API

Install the library

%pip install llama-index-retrievers-vertexai-search

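Restart the current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.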

# Colab only
# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

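Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, you need to authenticate your environment. To do so, run the cell below. This step is not required if you are using Vertex AI Workbench.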

# Colab only
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

# If you're using a JupyterLab instance, uncomment and run the code below.
#!gcloud auth login

from llama_index.retrievers.vertexai_search import VertexAISearchRetriever

# Note: the module name vertexai_search is written with an underscore ('_')

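Set Google Cloud project information and initialize the Vertex AI SDK

To get started with Vertex AI, you must have an existing Google Cloud project and enable the Vertex AI API.

Learn more about setting up a project and a development environment.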

PROJECT_ID = "{your project id}"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}
import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)
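
The cell above ships with a placeholder project ID, and leaving it unchanged is an easy mistake to make. As a minimal sketch (the helper name check_setting and the placeholder pattern are illustrative assumptions, not part of Vertex AI or LlamaIndex), a small check can catch values that are empty or still wrapped in braces before they reach vertexai.init:

```python
import re

# Matches values that are still wrapped in braces, such as "{your project id}".
PLACEHOLDER = re.compile(r"^\{.*\}$")


def check_setting(name: str, value: str) -> str:
    """Return the value unchanged, or raise if it is empty or still a placeholder."""
    stripped = value.strip()
    if not stripped or PLACEHOLDER.match(stripped):
        raise ValueError(f"{name} is not set: replace {value!r} with a real value")
    return value


# Hypothetical usage: validate the settings before initializing the SDK.
try:
    check_setting("PROJECT_ID", "{your project id}")
except ValueError as err:
    print(err)
```

The same check could be applied to DATA_STORE_ID, which uses a similar placeholder later in this notebook.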

Test a structured data store

DATA_STORE_ID = "{your id}"  # @param {type:"string"}
LOCATION_ID = "global"

struct_retriever = VertexAISearchRetriever(
    project_id=PROJECT_ID,
    data_store_id=DATA_STORE_ID,
    location_id=LOCATION_ID,
    engine_data_type=1,
)

query = "harry potter"
retrieved_results = struct_retriever.retrieve(query)

print(retrieved_results[0])
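
Indexing with [0] raises an IndexError when the query returns nothing, and printing a single result hides the rest. The sketch below lists every result with its score instead. It assumes each result exposes a score attribute and a get_content() method, as LlamaIndex's NodeWithScore does. The helper summarize_results and the FakeResult stand-in are hypothetical and exist only so the example runs without a data store.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


def summarize_results(results: Iterable[Any], max_chars: int = 80) -> List[str]:
    """Return one line per result: rank, score, and the start of the text."""
    lines = []
    for rank, item in enumerate(results, start=1):
        score = getattr(item, "score", None)
        score_text = "n/a" if score is None else f"{score:.3f}"
        # Collapse newlines so each result stays on a single line.
        text = item.get_content().replace("\n", " ")
        lines.append(f"{rank}. score={score_text} {text[:max_chars]}")
    return lines


@dataclass
class FakeResult:
    """Stand-in for a retrieved result (assumption: mirrors score / get_content())."""

    text: str
    score: Optional[float] = None

    def get_content(self) -> str:
        return self.text


demo = [FakeResult("Harry Potter and the Philosopher's Stone", 0.91), FakeResult("No score")]
for line in summarize_results(demo):
    print(line)
```

With a real retriever, the call would be summarize_results(retrieved_results). An empty result list simply yields no lines.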

Test an unstructured data store

DATA_STORE_ID = "{your id}"
LOCATION_ID = "global"

unstruct_retriever = VertexAISearchRetriever(
    project_id=PROJECT_ID,
    data_store_id=DATA_STORE_ID,
    location_id=LOCATION_ID,
    engine_data_type=0,
)

query = "alphabet 2018 earning"
retrieved_results2 = unstruct_retriever.retrieve(query)

print(retrieved_results2[0])

Test a website data store

DATA_STORE_ID = "{your id}"
LOCATION_ID = "global"
website_retriever = VertexAISearchRetriever(
    project_id=PROJECT_ID,
    data_store_id=DATA_STORE_ID,
    location_id=LOCATION_ID,
    engine_data_type=2,
)
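
The three retrievers in this notebook differ only in the integer passed as engine_data_type: 1 in the structured section, 0 in the unstructured section, and 2 here for a website. A minimal sketch of naming those codes follows. The EngineDataType enum is a hypothetical convenience inferred from this notebook's section titles and is not provided by the library.

```python
from enum import IntEnum


class EngineDataType(IntEnum):
    """Hypothetical names for the integer codes used in this notebook."""

    UNSTRUCTURED = 0
    STRUCTURED = 1
    WEBSITE = 2


# IntEnum members compare equal to their integer values.
# int(...) converts explicitly where a plain int is required.
for member in EngineDataType:
    print(member.name, int(member))
```

When constructing a retriever, passing int(EngineDataType.WEBSITE) keeps the argument a plain integer, which avoids relying on how the constructor validates enum members.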

query = "what's diamaxol"
retrieved_results3 = website_retriever.retrieve(query)

print(retrieved_results3[0])

Use in a query engine

# import modules needed
from llama_index.core import Settings
from llama_index.llms.vertex import Vertex
from llama_index.embeddings.vertex import VertexTextEmbedding

vertex_gemini = Vertex(
    model="gemini-1.5-pro",
    temperature=0,
    context_window=100000,
    additional_kwargs={},
)
# Set the LLM used for indexing and querying
Settings.llm = vertex_gemini

from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(struct_retriever)

response = query_engine.query("Tell me about harry potter")
print(str(response))