Google Vertex AI Vector Search¶
This notebook demonstrates how to use the functionality related to the Google Cloud Vertex AI Vector Search vector database.

Google Vertex AI Vector Search, formerly known as Vertex AI Matching Engine, provides the industry's leading high-scale, low-latency vector database. These vector databases are commonly referred to as vector similarity-matching or approximate nearest neighbor (ANN) services.
Note: LlamaIndex expects a Vertex AI Vector Search endpoint and a deployed index to already exist. Creating an empty index can take up to a minute, and deploying an index to an endpoint can take up to 30 minutes.

To learn how to create an index, refer to the section Create Index and deploy it to an Endpoint.

If you already have an index deployed, skip to Create Vector Store from texts.
Installation¶
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
! pip install llama-index llama-index-vector-stores-vertexaivectorsearch llama-index-llms-vertex
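If you are running on Colab, you will also need to authenticate the runtime to Google Cloud before calling any Vertex AI APIs. A minimal sketch, assuming a Colab runtime (skip it in an environment that already has application default credentials):

import sys

# Authenticate the Colab runtime with your Google account (Colab only).
if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()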
Create Index and deploy it to an Endpoint¶
- This section demonstrates creating a new index and deploying it to an endpoint.
# TODO : Set values as per your requirements
# Project and Storage Constants
PROJECT_ID = "[your_project_id]"
REGION = "[your_region]"
GCS_BUCKET_NAME = "[your_gcs_bucket]"
GCS_BUCKET_URI = f"gs://{GCS_BUCKET_NAME}"
# The number of dimensions for the textembedding-gecko@003 model is 768
# If a different embedding model is used, the dimensions will likely need to change
VS_DIMENSIONS = 768
# Vertex AI Vector Search Index configuration
# parameter descriptions are documented here:
# https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndex#google_cloud_aiplatform_MatchingEngineIndex_create_tree_ah_index
VS_INDEX_NAME = "llamaindex-doc-index" # @param {type:"string"}
VS_INDEX_ENDPOINT_NAME = "llamaindex-doc-endpoint" # @param {type:"string"}
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=REGION)
Create a Cloud Storage bucket¶
# Create a bucket.
! gsutil mb -l $REGION -p $PROJECT_ID $GCS_BUCKET_URI
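As a quick optional check, you can confirm the bucket exists and was created in the expected location before moving on:

# Verify the bucket and print its metadata, including its location.
! gsutil ls -L -b $GCS_BUCKET_URI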
# NOTE : This operation can take up to 30 seconds
# check if index exists
index_names = [
index.resource_name
for index in aiplatform.MatchingEngineIndex.list(
filter=f"display_name={VS_INDEX_NAME}"
)
]
if len(index_names) == 0:
print(f"Creating Vector Search index {VS_INDEX_NAME} ...")
vs_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
display_name=VS_INDEX_NAME,
dimensions=VS_DIMENSIONS,
distance_measure_type="DOT_PRODUCT_DISTANCE",
shard_size="SHARD_SIZE_SMALL",
        index_update_method="STREAM_UPDATE",  # allowed values: BATCH_UPDATE, STREAM_UPDATE
)
print(
f"Vector Search index {vs_index.display_name} created with resource name {vs_index.resource_name}"
)
else:
vs_index = aiplatform.MatchingEngineIndex(index_name=index_names[0])
print(
f"Vector Search index {vs_index.display_name} exists with resource name {vs_index.resource_name}"
)
endpoint_names = [
endpoint.resource_name
for endpoint in aiplatform.MatchingEngineIndexEndpoint.list(
filter=f"display_name={VS_INDEX_ENDPOINT_NAME}"
)
]
if len(endpoint_names) == 0:
print(
f"Creating Vector Search index endpoint {VS_INDEX_ENDPOINT_NAME} ..."
)
vs_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
display_name=VS_INDEX_ENDPOINT_NAME, public_endpoint_enabled=True
)
print(
f"Vector Search index endpoint {vs_endpoint.display_name} created with resource name {vs_endpoint.resource_name}"
)
else:
vs_endpoint = aiplatform.MatchingEngineIndexEndpoint(
index_endpoint_name=endpoint_names[0]
)
print(
f"Vector Search index endpoint {vs_endpoint.display_name} exists with resource name {vs_endpoint.resource_name}"
)
# check if the index is already deployed to an endpoint
index_endpoints = [
(deployed_index.index_endpoint, deployed_index.deployed_index_id)
for deployed_index in vs_index.deployed_indexes
]
if len(index_endpoints) == 0:
print(
f"Deploying Vector Search index {vs_index.display_name} at endpoint {vs_endpoint.display_name} ..."
)
vs_deployed_index = vs_endpoint.deploy_index(
index=vs_index,
deployed_index_id=VS_INDEX_NAME,
display_name=VS_INDEX_NAME,
machine_type="e2-standard-16",
min_replica_count=1,
max_replica_count=1,
)
print(
f"Vector Search index {vs_index.display_name} is deployed at endpoint {vs_deployed_index.display_name}"
)
else:
vs_deployed_index = aiplatform.MatchingEngineIndexEndpoint(
index_endpoint_name=index_endpoints[0][0]
)
print(
f"Vector Search index {vs_index.display_name} is already deployed at endpoint {vs_deployed_index.display_name}"
)
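Deployment can take up to 30 minutes. If you want to confirm which indexes are live on the endpoint before moving on, the endpoint's deployed_indexes property lists them; a small optional check:

# List the indexes currently deployed to this endpoint.
for deployed in vs_endpoint.deployed_indexes:
    print(f"Deployed index id: {deployed.id}, index resource: {deployed.index}")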
Create Vector Store from texts¶
NOTE: If you have an existing Vertex AI Vector Search index and endpoint, you can assign them using the following code:
# TODO : replace 1234567890123456789 with your actual index ID
vs_index = aiplatform.MatchingEngineIndex(index_name="1234567890123456789")
# TODO : replace 1234567890123456789 with your actual endpoint ID
vs_endpoint = aiplatform.MatchingEngineIndexEndpoint(
index_endpoint_name="1234567890123456789"
)
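If you don't have the numeric IDs handy, one way to look them up is with the gcloud CLI (a sketch, assuming the Google Cloud SDK is installed and configured for the same project and region):

# List Vector Search indexes and endpoints with their resource names and display names.
! gcloud ai indexes list --project=$PROJECT_ID --region=$REGION --format="value(name,displayName)"
! gcloud ai index-endpoints list --project=$PROJECT_ID --region=$REGION --format="value(name,displayName)"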
# import modules needed
from llama_index.core import (
StorageContext,
Settings,
VectorStoreIndex,
SimpleDirectoryReader,
)
from llama_index.core.schema import TextNode
from llama_index.core.vector_stores.types import (
MetadataFilters,
MetadataFilter,
FilterOperator,
)
from llama_index.llms.vertex import Vertex
from llama_index.embeddings.vertex import VertexTextEmbedding
from llama_index.vector_stores.vertexaivectorsearch import VertexAIVectorStore
Create a simple vector store from plain text without metadata filters¶
# setup storage
vector_store = VertexAIVectorStore(
project_id=PROJECT_ID,
region=REGION,
index_id=vs_index.resource_name,
endpoint_id=vs_endpoint.resource_name,
gcs_bucket_name=GCS_BUCKET_NAME,
)
# set storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)
Use Vertex AI Embeddings as the embedding model¶
# configure embedding model
embed_model = VertexTextEmbedding(
model_name="textembedding-gecko@003",
project=PROJECT_ID,
location=REGION,
)
# setup the index/query process, ie the embedding model (and completion if used)
Settings.embed_model = embed_model
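Before indexing, it is worth verifying that the embedding dimensionality matches the VS_DIMENSIONS the index was created with; a mismatch will generally cause upserts to be rejected. A quick sanity check:

# The embedding length must match the index dimensions (768 for textembedding-gecko@003).
test_embedding = embed_model.get_text_embedding("hello world")
assert len(test_embedding) == VS_DIMENSIONS, "Embedding size does not match index dimensions"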
Add vectors and mapped text chunks to your vector store¶
# Input texts
texts = [
"The cat sat on",
"the mat.",
"I like to",
"eat pizza for",
"dinner.",
"The sun sets",
"in the west.",
]
nodes = [
TextNode(text=text, embedding=embed_model.get_text_embedding(text))
for text in texts
]
vector_store.add(nodes)
Run a similarity search¶
# define index from vector store
index = VectorStoreIndex.from_vector_store(
vector_store=vector_store, embed_model=embed_model
)
retriever = index.as_retriever()
response = retriever.retrieve("pizza")
for row in response:
print(f"Score: {row.get_score():.3f} Text: {row.get_text()}")
Score: 0.703 Text: eat pizza for
Score: 0.626 Text: dinner.
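By default the retriever returns the two most similar nodes. To widen the search, pass similarity_top_k to as_retriever (the same parameter is used with the metadata filters later in this notebook):

# Retrieve more candidates by increasing similarity_top_k.
retriever = index.as_retriever(similarity_top_k=5)
response = retriever.retrieve("pizza")
for row in response:
    print(f"Score: {row.get_score():.3f} Text: {row.get_text()}")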
Add documents with metadata attributes and use filters¶
# Input text with metadata
records = [
{
"description": "A versatile pair of dark-wash denim jeans."
"Made from durable cotton with a classic straight-leg cut, these jeans"
" transition easily from casual days to dressier occasions.",
"price": 65.00,
"color": "blue",
"season": ["fall", "winter", "spring"],
},
{
"description": "A lightweight linen button-down shirt in a crisp white."
" Perfect for keeping cool with breathable fabric and a relaxed fit.",
"price": 34.99,
"color": "white",
"season": ["summer", "spring"],
},
{
"description": "A soft, chunky knit sweater in a vibrant forest green. "
"The oversized fit and cozy wool blend make this ideal for staying warm "
"when the temperature drops.",
"price": 89.99,
"color": "green",
"season": ["fall", "winter"],
},
{
"description": "A classic crewneck t-shirt in a soft, heathered blue. "
"Made from comfortable cotton jersey, this t-shirt is a wardrobe essential "
"that works for every season.",
"price": 19.99,
"color": "blue",
"season": ["fall", "winter", "summer", "spring"],
},
{
"description": "A flowing midi-skirt in a delicate floral print. "
"Lightweight and airy, this skirt adds a touch of feminine style "
"to warmer days.",
"price": 45.00,
"color": "white",
"season": ["spring", "summer"],
},
]
nodes = []
for record in records:
text = record.pop("description")
embedding = embed_model.get_text_embedding(text)
metadata = {**record}
nodes.append(TextNode(text=text, embedding=embedding, metadata=metadata))
vector_store.add(nodes)
Run a similarity search with filters¶
# define index from vector store
index = VectorStoreIndex.from_vector_store(
vector_store=vector_store, embed_model=embed_model
)
# simple similarity search without filter
retriever = index.as_retriever()
response = retriever.retrieve("pants")
for row in response:
print(f"Text: {row.get_text()}")
print(f" Score: {row.get_score():.3f}")
print(f" Metadata: {row.metadata}")
Text: A pair of well-tailored dress pants in a neutral grey. Made from a wrinkle-resistant blend, these pants look sharp and professional for workwear or formal occasions.
   Score: 0.669
   Metadata: {'price': 69.99, 'color': 'grey', 'season': ['fall', 'winter', 'summer', 'spring']}
Text: A pair of tailored black trousers in a comfortable stretch fabric. Perfect for work or dressier events, these trousers provide a sleek, polished look.
   Score: 0.642
   Metadata: {'price': 59.99, 'color': 'black', 'season': ['fall', 'winter', 'spring']}
# similarity search with text filter
filters = MetadataFilters(filters=[MetadataFilter(key="color", value="blue")])
retriever = index.as_retriever(filters=filters, similarity_top_k=3)
response = retriever.retrieve("denims")
for row in response:
print(f"Text: {row.get_text()}")
print(f" Score: {row.get_score():.3f}")
print(f" Metadata: {row.metadata}")
Text: A versatile pair of dark-wash denim jeans. Made from durable cotton with a classic straight-leg cut, these jeans transition easily from casual days to dressier occasions.
   Score: 0.704
   Metadata: {'price': 65.0, 'color': 'blue', 'season': ['fall', 'winter', 'spring']}
Text: A denim jacket with a faded wash and distressed details. This wardrobe staple adds a touch of effortless cool to any outfit.
   Score: 0.667
   Metadata: {'price': 79.99, 'color': 'blue', 'season': ['fall', 'spring', 'summer']}
# similarity search with text and numeric filter
filters = MetadataFilters(
filters=[
MetadataFilter(key="color", value="blue"),
MetadataFilter(key="price", operator=FilterOperator.GT, value=70.0),
]
)
retriever = index.as_retriever(filters=filters, similarity_top_k=3)
response = retriever.retrieve("denims")
for row in response:
print(f"Text: {row.get_text()}")
print(f" Score: {row.get_score():.3f}")
print(f" Metadata: {row.metadata}")
Text: A denim jacket with a faded wash and distressed details. This wardrobe staple adds a touch of effortless cool to any outfit.
   Score: 0.667
   Metadata: {'price': 79.99, 'color': 'blue', 'season': ['fall', 'spring', 'summer']}
Parse, Index and Query PDFs using Vertex AI Vector Search and Gemini Pro¶
! mkdir -p ./data/arxiv/
! wget 'https://arxiv.org/pdf/1706.03762.pdf' -O ./data/arxiv/test.pdf
--2024-05-01 00:56:52--  https://arxiv.org/pdf/1706.03762.pdf
Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.195.42, 151.101.131.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://arxiv.org/pdf/1706.03762 [following]
--2024-05-01 00:56:52--  http://arxiv.org/pdf/1706.03762
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2215244 (2.1M) [application/pdf]
Saving to: ‘./data/arxiv/test.pdf’

./data/arxiv/test.p 100%[===================>]   2.11M  --.-KB/s    in 0.07s

2024-05-01 00:56:52 (31.5 MB/s) - ‘./data/arxiv/test.pdf’ saved [2215244/2215244]
# load documents
documents = SimpleDirectoryReader("./data/arxiv/").load_data()
print(f"# of documents = {len(documents)}")
# of documents = 15
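VectorStoreIndex.from_documents (used below) splits these documents into chunks with the default node parser. If you want explicit control over chunking, one option is to configure a SentenceSplitter globally before building the index; a sketch, with the chunk sizes here being illustrative:

# Optional: control how documents are chunked before they are embedded and indexed.
from llama_index.core.node_parser import SentenceSplitter

Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)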
# setup storage
vector_store = VertexAIVectorStore(
project_id=PROJECT_ID,
region=REGION,
index_id=vs_index.resource_name,
endpoint_id=vs_endpoint.resource_name,
gcs_bucket_name=GCS_BUCKET_NAME,
)
# set storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# configure embedding model
embed_model = VertexTextEmbedding(
model_name="textembedding-gecko@003",
project=PROJECT_ID,
location=REGION,
)
vertex_gemini = Vertex(
model="gemini-pro",
context_window=100000,
temperature=0,
additional_kwargs={},
)
# setup the index/query process, ie the embedding model (and completion if used)
Settings.llm = vertex_gemini
Settings.embed_model = embed_model
# define index from vector store
index = VectorStoreIndex.from_documents(
documents, storage_context=storage_context
)
query_engine = index.as_query_engine()
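as_query_engine combines the retriever with the LLM configured in Settings. It accepts the same retriever options; for example, if answers look incomplete you can ground them on more chunks (an optional tweak):

# Retrieve more context per query by raising similarity_top_k.
query_engine = index.as_query_engine(similarity_top_k=5)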
response = query_engine.query(
"who are the authors of paper Attention is All you need?"
)
print(f"Response:")
print("-" * 80)
print(response.response)
print("-" * 80)
print(f"Source Documents:")
print("-" * 80)
for source in response.source_nodes:
print(f"Sample Text: {source.text[:50]}")
print(f"Relevance score: {source.get_score():.3f}")
print(f"File Name: {source.metadata.get('file_name')}")
print(f"Page #: {source.metadata.get('page_label')}")
print(f"File Path: {source.metadata.get('file_path')}")
print("-" * 80)
Response:
--------------------------------------------------------------------------------
The authors of the paper "Attention Is All You Need" are:

* Ashish Vaswani
* Noam Shazeer
* Niki Parmar
* Jakob Uszkoreit
* Llion Jones
* Aidan N. Gomez
* Łukasz Kaiser
* Illia Polosukhin
--------------------------------------------------------------------------------
Source Documents:
--------------------------------------------------------------------------------
Sample Text: Provided proper attribution is provided, Google he
Relevance score: 0.720
File Name: test.pdf
Page #: 1
File Path: /home/jupyter/llama_index/docs/docs/examples/vector_stores/data/arxiv/test.pdf
--------------------------------------------------------------------------------
Sample Text: length nis smaller than the representation dimensi
Relevance score: 0.678
File Name: test.pdf
Page #: 7
File Path: /home/jupyter/llama_index/docs/docs/examples/vector_stores/data/arxiv/test.pdf
--------------------------------------------------------------------------------
Clean up¶
Please delete the Vertex AI Vector Search index and index endpoint after running your experiments to avoid incurring additional charges. Note that you will be charged as long as the endpoint is running.
⚠️ NOTE: Enabling the `CLEANUP_RESOURCES` flag deletes the Vector Search index, the index endpoint, and the Cloud Storage bucket. Run with caution.
CLEANUP_RESOURCES = False
- Undeploy indexes and delete the index endpoint
if CLEANUP_RESOURCES:
print(
f"Undeploying all indexes and deleting the index endpoint {vs_endpoint.display_name}"
)
vs_endpoint.undeploy_all()
vs_endpoint.delete()
- Delete the index
if CLEANUP_RESOURCES:
print(f"Deleting the index {vs_index.display_name}")
vs_index.delete()
- Delete contents from the Cloud Storage bucket
if CLEANUP_RESOURCES and "GCS_BUCKET_NAME" in globals():
print(f"Deleting contents from the Cloud Storage bucket {GCS_BUCKET_NAME}")
shell_output = ! gsutil du -ash gs://$GCS_BUCKET_NAME
print(shell_output)
print(
f"Size of the bucket {GCS_BUCKET_NAME} before deleting = {' '.join(shell_output[0].split()[:2])}"
)
# uncomment below line to delete contents of the bucket
# ! gsutil -m rm -r gs://$GCS_BUCKET_NAME