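TiDB Cloud is a comprehensive Database-as-a-Service (DBaaS) solution that offers both dedicated and serverless options. TiDB Serverless is integrating built-in vector search into the MySQL ecosystem, so you can develop AI applications on TiDB Serverless without adopting a new database or an additional technology stack. Create a free TiDB Serverless cluster and start using the vector search feature at https://pingcap.com/ai.
This notebook provides a detailed guide to using TiDB Vector Search in LlamaIndex.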
Setting up the environment
%pip install llama-index-vector-stores-tidbvector
%pip install llama-index
import textwrap
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.tidbvector import TiDBVectorStore
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass("Input your OpenAI API key:")
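Go to your TiDB Cloud cluster console and navigate to the Connect page.
- Select the option to connect using SQLAlchemy with PyMySQL, and copy the provided connection URL (without the password).
- Paste the connection URL into your code, replacing the value of the tidb_connection_string_template variable.
- Enter your password when prompted.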
# replace with your tidb connect string from tidb cloud console
tidb_connection_string_template = "mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true"
# type your tidb password
tidb_password = getpass.getpass("Input your TiDB password:")
tidb_connection_url = tidb_connection_string_template.replace(
    "<PASSWORD>", tidb_password
)
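One caveat with the plain `replace` call above: a password containing characters such as `@`, `/`, or `:` changes how the connection URL is parsed. The following is a minimal sketch of one way to guard against that, assuming the same `<PASSWORD>` placeholder. `build_connection_url` is a hypothetical helper, not part of TiDB Cloud or LlamaIndex, and the user, host, and database in the example are made up.

```python
from urllib.parse import quote_plus


def build_connection_url(template: str, password: str) -> str:
    """Substitute a percent-encoded password into the <PASSWORD> placeholder."""
    if "<PASSWORD>" not in template:
        raise ValueError("template has no <PASSWORD> placeholder")
    # quote_plus escapes reserved URL characters such as '@', '/' and ':'.
    return template.replace("<PASSWORD>", quote_plus(password))


# Hypothetical user, host, and database, for illustration only.
example_template = "mysql+pymysql://demo_user:<PASSWORD>@example-host:4000/test"
example_url = build_connection_url(example_template, "p@ss/w:rd")
print(example_url)
```

In the notebook, the same idea would amount to passing `quote_plus(tidb_password)` to `replace` instead of the raw password.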
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)
# Tag every document so the metadata filters used later have a value to match.
for document in documents:
    document.metadata = {"book": "paul_graham"}
Document ID: 86e12675-2e9a-4097-847c-8b981dd41806
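The code snippet below creates a table in TiDB that is optimized for vector search. The table takes its name from VECTOR_TABLE_NAME (here, paul_graham_test). After the code runs successfully, you can view and access that table directly in your TiDB database.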
VECTOR_TABLE_NAME = "paul_graham_test"
tidbvec = TiDBVectorStore(
    connection_string=tidb_connection_url,
    table_name=VECTOR_TABLE_NAME,
    distance_strategy="cosine",
    vector_dimension=1536,
    drop_existing_table=False,
)
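The store above is configured with `distance_strategy="cosine"` and `vector_dimension=1536`. The dimension has to match the embedding model in use; 1536 is the output size of OpenAI's `text-embedding-ada-002`. As a rough illustration of what a cosine distance measures, here is a small pure-Python sketch. It is not how TiDB computes distances internally, and `cosine_distance` is a hypothetical helper written for this example.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 for same direction, 1 for orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Two vectors pointing the same way, and two at right angles.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because the measure depends only on direction, vectors of different magnitude can still be at distance zero, which is why it is a common choice for comparing text embeddings.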
storage_context = StorageContext.from_defaults(vector_store=tidbvec)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)
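`VectorStoreIndex.from_documents` also accepts an `insert_batch_size` argument, which controls how many nodes are written to the vector store per batch. The sketch below shows the general idea of batching a list into fixed-size chunks. `iter_batches` and the node IDs are hypothetical and are not LlamaIndex's internal code.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start : start + batch_size])


# 2500 made-up node IDs split into batches of at most 1000.
node_ids = [f"node-{i}" for i in range(2500)]
batch_sizes = [len(batch) for batch in iter_batches(node_ids, 1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

A smaller batch size means more round trips to the database but a smaller payload per insert statement.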
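# Optional variant of the call above: insert_batch_size sets how many nodes are
# written per batch. Run only one of the two calls to avoid inserting twice.
storage_context = StorageContext.from_defaults(vector_store=tidbvec)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, insert_batch_size=1000, show_progress=True
)
Semantic similarity search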
This section covers the basics of vector search and shows how to refine results with metadata filters. Note that TiDB Vector supports only the default VectorStoreQueryMode.
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do?")
print(textwrap.fill(str(response), 100))
The author wrote a book.
Run a search with metadata filters to retrieve a fixed number of nearest-neighbor results that satisfy the applied filters.
from llama_index.core.vector_stores.types import (
    MetadataFilter,
    MetadataFilters,
)
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="book", value="paul_graham", operator="!="),
        ]
    ),
    similarity_top_k=2,
)
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))
Empty Response
from llama_index.core.vector_stores.types import (
    MetadataFilter,
    MetadataFilters,
)
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="book", value="paul_graham", operator="=="),
        ]
    ),
    similarity_top_k=2,
)
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))
The author learned valuable lessons from his experiences.
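The two results above differ because every document was tagged with `book = "paul_graham"`: the `!=` filter excludes all of them, so the query returns an empty response, while the `==` filter keeps them all. The sketch below mimics that behavior in plain Python, assuming the filter is applied before the top-k ranking. The real query runs inside TiDB; `matches`, `filter_then_top_k`, and the similarity scores are made up for illustration.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def matches(metadata: Dict[str, Any], key: str, value: Any, operator: str) -> bool:
    """Evaluate a single equality or inequality filter against node metadata."""
    if operator == "==":
        return metadata.get(key) == value
    if operator == "!=":
        return metadata.get(key) != value
    raise ValueError(f"unsupported operator in this sketch: {operator}")


def filter_then_top_k(
    nodes: List[Node], key: str, value: Any, operator: str, top_k: int
) -> List[Node]:
    """Keep nodes that pass the filter, then return the top_k highest scores."""
    kept = [n for n in nodes if matches(n["metadata"], key, value, operator)]
    kept.sort(key=lambda n: n["score"], reverse=True)
    return kept[:top_k]


# Made-up nodes: all carry the same metadata tag, as in the notebook.
nodes = [
    {"id": "a", "score": 0.91, "metadata": {"book": "paul_graham"}},
    {"id": "b", "score": 0.87, "metadata": {"book": "paul_graham"}},
    {"id": "c", "score": 0.42, "metadata": {"book": "paul_graham"}},
]
not_equal_ids = [n["id"] for n in filter_then_top_k(nodes, "book", "paul_graham", "!=", 2)]
equal_ids = [n["id"] for n in filter_then_top_k(nodes, "book", "paul_graham", "==", 2)]
print(not_equal_ids)  # []
print(equal_ids)      # ['a', 'b']
```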
# Delete every node derived from the first document, identified by its doc ID.
tidbvec.delete(documents[0].doc_id)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))
Empty Response
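The empty response is expected here: the essay was the only document indexed, and `delete` takes a source document ID and removes the nodes derived from that document. The toy class below illustrates that behavior under those assumptions. `InMemoryNodeStore` is a hypothetical stand-in written for this example and is not `TiDBVectorStore`.

```python
from typing import Dict, List


class InMemoryNodeStore:
    """Toy node store that tracks which source document each node came from."""

    def __init__(self) -> None:
        self._nodes: List[Dict[str, str]] = []

    def add(self, node_id: str, ref_doc_id: str) -> None:
        self._nodes.append({"node_id": node_id, "ref_doc_id": ref_doc_id})

    def delete(self, ref_doc_id: str) -> None:
        # Drop every node whose source document matches ref_doc_id.
        self._nodes = [n for n in self._nodes if n["ref_doc_id"] != ref_doc_id]

    def node_ids(self) -> List[str]:
        return [n["node_id"] for n in self._nodes]


store = InMemoryNodeStore()
store.add("chunk-1", "essay")
store.add("chunk-2", "essay")
store.delete("essay")
remaining = store.node_ids()
print(remaining)  # []
```

With nothing left to retrieve, a query engine has no context to answer from, which matches the output shown above.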