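Google Cloud SQL for PostgreSQL - `PostgresReader`¶

Cloud SQL is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It supports the MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences by leveraging Cloud SQL's LlamaIndex integrations.

This notebook goes over how to use Cloud SQL for PostgreSQL to retrieve data as documents with the PostgresReader class.

Learn more about the package on GitHub.

Before you begin¶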

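To run this notebook, you will need a Google Cloud project with a Cloud SQL for PostgreSQL instance, a database on that instance, and a database user. The sections below use all of these.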

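🦙 Library Installation¶

Install the integration library, llama-index-cloud-sql-pg.

**Colab only:** Uncomment the following cell to restart the kernel, or use the button to restart the kernel. For Vertex AI Workbench, you can restart the terminal using the button at the top.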

# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)
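
After the install and any kernel restart, it can be worth confirming that the package is importable before going further. The following is a minimal sketch using only the standard library. The helper name `is_installed` is hypothetical, and the module name `llama_index_cloud_sql_pg` is taken from the imports used later in this notebook.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# The result depends on your environment, so no expected output is shown here.
print("llama_index_cloud_sql_pg importable:", is_installed("llama_index_cloud_sql_pg"))
```

If this prints `False` right after installing, a kernel restart is the usual next thing to try.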

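🔐 Authentication¶

Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud project.

If you are using Colab to run this notebook, use the cell below and continue.
If you are using Vertex AI Workbench, check out the setup instructions here.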

from google.colab import auth

auth.authenticate_user()

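☁ Set Your Google Cloud Project¶

Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

Run gcloud config list.
Run gcloud projects list.
See the support page: Locate the project ID.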

# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "my-project-id"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}

Basic Usage¶

Set Cloud SQL database values¶

Find your database values in the Cloud SQL Instances page.

# @title Set Your Values Here { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}
INSTANCE = "my-primary"  # @param {type: "string"}
DATABASE = "my-database"  # @param {type: "string"}
TABLE_NAME = "reader_table"  # @param {type: "string"}
USER = "postgres"  # @param {type: "string"}
PASSWORD = "my-password"  # @param {type: "string"}
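
Outside of a Colab form, you may prefer not to hard-code these values, the password in particular. The sketch below reads them from environment variables instead. The variable names (`CLOUD_SQL_REGION` and so on) and the helper `load_db_settings` are hypothetical, chosen only for illustration; the defaults mirror the placeholder values above.

```python
import os


def load_db_settings(env=os.environ):
    """Read connection settings from a mapping of environment variables."""
    return {
        "region": env.get("CLOUD_SQL_REGION", "us-central1"),
        "instance": env.get("CLOUD_SQL_INSTANCE", "my-primary"),
        "database": env.get("CLOUD_SQL_DATABASE", "my-database"),
        "table_name": env.get("CLOUD_SQL_TABLE", "reader_table"),
        "user": env.get("CLOUD_SQL_USER", "postgres"),
        # No default for the password: a missing value raises KeyError
        # instead of silently falling back to a placeholder.
        "password": env["CLOUD_SQL_PASSWORD"],
    }


# Passing a plain dict here keeps the example independent of your real environment.
settings = load_db_settings({"CLOUD_SQL_PASSWORD": "example-only"})
print(settings["instance"])  # prints: my-primary
```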

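PostgresEngine Connection Pool¶

One of the requirements and arguments to establish Cloud SQL as a reader is a PostgresEngine object. The PostgresEngine configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.

To create a PostgresEngine using PostgresEngine.from_instance(), you need to provide only 4 things:

project_id : Project ID of the Google Cloud project where the Cloud SQL instance is located.
region : Region where the Cloud SQL instance is located.
instance : The name of the Cloud SQL instance.
database : The name of the database to connect to on the Cloud SQL instance.

By default, IAM database authentication is used as the method of database authentication. This library uses the IAM principal belonging to the Application Default Credentials (ADC) sourced from the environment.

For more information on IAM database authentication, see the Cloud SQL documentation on that topic.

Optionally, built-in database authentication using a username and password can also be used to access the Cloud SQL database. Just provide the optional user and password arguments to PostgresEngine.from_instance():

user : Database user to use for built-in database authentication and login.
password : Database password to use for built-in database authentication and login.

**Note:** This tutorial demonstrates the async interface, which is why the cell below calls PostgresEngine.afrom_instance() rather than PostgresEngine.from_instance(). All async methods have corresponding sync methods.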
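
The cells in this notebook use top-level `await`, which works in notebooks but not in a plain Python script. One common way to run the same async calls from a script is to wrap them in a coroutine and hand it to `asyncio.run`. The sketch below shows only that pattern; `load_documents` is a hypothetical stand-in for the real awaited calls and does not touch Cloud SQL.

```python
import asyncio


async def load_documents():
    """Stand-in for awaited notebook calls such as `await reader.aload_data()`."""
    await asyncio.sleep(0)  # placeholder for real asynchronous I/O
    return ["doc-1", "doc-2"]


async def main():
    # In a script, every `await` has to live inside a coroutine like this one.
    docs = await load_documents()
    return docs


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```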

from llama_index_cloud_sql_pg import PostgresEngine

engine = await PostgresEngine.afrom_instance(
    project_id=PROJECT_ID,
    region=REGION,
    instance=INSTANCE,
    database=DATABASE,
    user=USER,
    password=PASSWORD,
)
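
The cell above always passes `user` and `password`, which selects built-in database authentication. If you want one code path that can switch between built-in and IAM authentication, a small helper can assemble the keyword arguments. This is a sketch under assumptions: `engine_kwargs` is a hypothetical helper, and rejecting a lone `user` or `password` is a choice made here for safety, not documented library behavior.

```python
def engine_kwargs(project_id, region, instance, database, user=None, password=None):
    """Build keyword arguments for the engine factory.

    With both user and password, built-in authentication is requested.
    With neither, the arguments are left out so the default (IAM) applies.
    """
    kwargs = {
        "project_id": project_id,
        "region": region,
        "instance": instance,
        "database": database,
    }
    if (user is None) != (password is None):
        raise ValueError("Provide both user and password, or neither.")
    if user is not None:
        kwargs["user"] = user
        kwargs["password"] = password
    return kwargs


iam_kwargs = engine_kwargs("my-project-id", "us-central1", "my-primary", "my-database")
print(sorted(iam_kwargs))
# In the notebook, the result would be unpacked into the call shown above:
# engine = await PostgresEngine.afrom_instance(**iam_kwargs)
```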

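Create PostgresReader¶

When creating a PostgresReader to fetch data from Cloud SQL for PostgreSQL, you have two main options for specifying the data you want to load:

Using the table_name argument - When you specify the table_name argument, you are telling the reader to fetch all the data from the given table.
Using the query argument - When you specify the query argument, you can provide a custom SQL query to fetch the data. This gives you full control over the SQL query, including selecting specific columns, applying filters, sorting, joining tables, and so on.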
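
One way to picture the relationship between the two options is that a table name is shorthand for a "select everything" query. The sketch below illustrates that idea only; `resolve_query` is a hypothetical helper, and both the mutual exclusivity check and the `public` default schema are assumptions made here rather than confirmed library behavior.

```python
def resolve_query(table_name=None, query=None, schema_name="public"):
    """Return the SQL that would be run for either a table name or a custom query."""
    # Assumption: exactly one of the two options should be given.
    if (table_name is None) == (query is None):
        raise ValueError("Specify exactly one of table_name or query.")
    if query is not None:
        return query
    # A table name expands to a query that fetches every row and column.
    return f'SELECT * FROM "{schema_name}"."{table_name}";'


print(resolve_query(table_name="reader_table"))
print(resolve_query(query="SELECT id, product_name FROM products ORDER BY id;"))
```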

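Load Documents using the `table_name` argument¶

Load Documents via default table¶

The reader returns a list of Documents, one per row, using the first column as text and all other columns as metadata. The default table has the text in its first column and JSON metadata in its second column.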

from llama_index_cloud_sql_pg import PostgresReader

# Creating a basic PostgresReader object
reader = await PostgresReader.create(
    engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
)
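
To make the default table layout concrete, here is a self-contained sketch of the row-to-document mapping described above. It is an illustration, not the reader's implementation: an in-memory SQLite table stands in for the Cloud SQL table, and `SimpleDocument` and `rows_to_documents` are hypothetical names standing in for LlamaIndex documents and the reader's loading logic.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Minimal stand-in for a document: some text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(rows):
    """Map (text, json_metadata) rows to documents, one document per row."""
    return [SimpleDocument(text=row[0], metadata=json.loads(row[1])) for row in rows]


# A default-style table: first column is text, second column is JSON metadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reader_table (text TEXT, metadata TEXT)")
conn.executemany(
    "INSERT INTO reader_table VALUES (?, ?)",
    [
        ("A red bicycle", json.dumps({"id": 1})),
        ("A blue kettle", json.dumps({"id": 2})),
    ],
)

docs = rows_to_documents(conn.execute("SELECT text, metadata FROM reader_table ORDER BY rowid"))
print(len(docs), docs[0].text, docs[0].metadata)
```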

Load documents via custom table/metadata or custom page content columns¶

reader = await PostgresReader.create(
    engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
    content_columns=["product_name"],  # Optional
    metadata_columns=["id"],  # Optional
)
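
The effect of `content_columns` and `metadata_columns` can be sketched as a split of each row into two dictionaries. The helper `split_row` below is hypothetical and only illustrates the idea. Its fallback (first column as content, remaining columns as metadata) follows the default behavior this notebook describes; the error on unknown column names is an assumption added here.

```python
def split_row(row, content_columns=None, metadata_columns=None):
    """Split one row (a dict of column name to value) into content and metadata."""
    columns = list(row)
    if content_columns is None:
        content_columns = columns[:1]  # default: first column is the content
    if metadata_columns is None:
        # default: every column not used as content becomes metadata
        metadata_columns = [c for c in columns if c not in content_columns]
    missing = [c for c in [*content_columns, *metadata_columns] if c not in row]
    if missing:
        raise ValueError(f"Columns not found in row: {missing}")
    content = {c: row[c] for c in content_columns}
    metadata = {c: row[c] for c in metadata_columns}
    return content, metadata


row = {"product_name": "Widget", "id": 7, "price": 9.5}
print(split_row(row, content_columns=["product_name"], metadata_columns=["id"]))
print(split_row(row))
```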

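Load Documents using a SQL query¶

The query argument lets you specify a custom SQL query, which can include filters, to load specific documents from the database.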

table_name = "products"
content_columns = ["product_name", "description"]
metadata_columns = ["id", "content"]

reader = await PostgresReader.create(
    engine=engine,
    query=f"SELECT * FROM {table_name};",
    content_columns=content_columns,
    metadata_columns=metadata_columns,
)
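
To see how a filter in the query narrows the rows that would become documents, the sketch below runs a filtered query against an in-memory SQLite table. SQLite is only a stand-in for PostgreSQL here, and the `price` column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, product_name TEXT, description TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, "Widget", "A small tool", 9.5),
        (2, "Gadget", "A larger tool", 120.0),
        (3, "Gizmo", "A tiny tool", 4.0),
    ],
)

table_name = "products"
# Select only the needed columns and keep rows under a price threshold.
query = f"SELECT product_name, description, id FROM {table_name} WHERE price < 10 ORDER BY id;"

rows = conn.execute(query).fetchall()
print(rows)  # two rows match the filter; each would become one document
```

A query string of this shape is what you would pass as the `query` argument in place of `SELECT * FROM {table_name};`.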

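**Note**: If content_columns and metadata_columns are not specified, the reader automatically treats the first returned column as the document's text and all subsequent columns as metadata.

Set page content format¶

The reader returns a list of Documents, one per row, with the page content in the specified string format, for example text (space-separated concatenation), JSON, YAML, or CSV. The JSON and YAML formats include field headers, while text and CSV do not.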

reader = await PostgresReader.create(
    engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
    content_columns=["product_name", "description"],
    format="YAML",
)
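
The difference between the formats is easiest to see side by side. The sketch below is an approximation of the four formats as described above, not the reader's actual formatter: `format_content` is a hypothetical helper, and the exact separators and quoting the library emits may differ.

```python
import csv
import io
import json


def format_content(content, fmt="text"):
    """Render the content columns of one row as a single string."""
    fmt = fmt.lower()
    values = [str(v) for v in content.values()]
    if fmt == "text":
        # Values only, joined by spaces (no field headers).
        return " ".join(values)
    if fmt == "csv":
        # Values only, comma-separated (no field headers).
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="").writerow(values)
        return buffer.getvalue()
    if fmt == "json":
        # Field names are kept as JSON keys.
        return json.dumps(content)
    if fmt == "yaml":
        # Field names are kept as "key: value" lines.
        return "\n".join(f"{key}: {value}" for key, value in content.items())
    raise ValueError(f"Unsupported format: {fmt}")


content = {"product_name": "Widget", "description": "A small tool"}
for fmt in ("text", "CSV", "JSON", "YAML"):
    print(f"--- {fmt} ---")
    print(format_content(content, fmt))
```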

Load the documents¶

You can choose to load the documents in two ways:

Load all the data at once
Lazy load data

Load data all at once¶

docs = await reader.aload_data()

print(docs)

Lazy load the data¶

docs_iterable = reader.alazy_load_data()

docs = []
async for doc in docs_iterable:
    docs.append(doc)

print(docs)
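
Collecting every item into a list, as above, gives the same result as loading everything at once. Lazy loading pays off when you can stop early or process documents one by one. The sketch below shows that difference with a hypothetical `FakeReader` whose method names mirror `aload_data` and `alazy_load_data`. It yields plain strings and does not connect to a database.

```python
import asyncio


class FakeReader:
    """Stand-in reader that yields items from a list instead of a database."""

    def __init__(self, rows):
        self._rows = rows
        self.fetched = 0  # counts how many rows were actually produced

    async def alazy_load_data(self):
        for row in self._rows:
            await asyncio.sleep(0)  # placeholder for fetching the next row
            self.fetched += 1
            yield row

    async def aload_data(self):
        # Eager loading is just the lazy iterator drained into a list.
        return [doc async for doc in self.alazy_load_data()]


async def first_matching(reader, predicate):
    """Return the first document that satisfies predicate, stopping early."""
    async for doc in reader.alazy_load_data():
        if predicate(doc):
            return doc
    return None


if __name__ == "__main__":
    reader = FakeReader(["a", "bb", "ccc"])
    print(asyncio.run(first_matching(reader, lambda d: len(d) == 2)))
    print("rows fetched:", reader.fetched)
```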