OpenAI Agent + Query Engine Experiment Cookbook
In this notebook, we try out OpenAIAgent across a variety of query engine tools and datasets. We explore how OpenAIAgent can complement or replace workflows currently solved by our retrievers/query engines:
- Auto retrieval
- Joint SQL and vector search
NOTE: Any Text-to-SQL application should be aware that executing arbitrary SQL queries can be a security risk. It is recommended to take precautions as needed, such as using restricted roles, read-only databases, sandboxing, etc.
Auto Retrieval from a Vector Database
Our existing "auto retrieval" capabilities (in VectorIndexAutoRetriever) allow an LLM to infer the right query parameters for a vector database, including both the query string and metadata filters.
Since the OpenAI Function API can infer function parameters, we explore its capabilities in performing auto-retrieval here.
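For intuition, given a question like "Tell me about two celebrities from the United States", the model should emit a structured call along these lines (the argument names come from the tool we define below; this dict is an illustration only):

In [ ]
# Illustration only (not executed): the shape of the arguments the LLM infers
# for the auto-retrieval tool defined later in this notebook.
inferred_call = {
    "query": "celebrities from the United States",
    "filter_key_list": ["country"],
    "filter_value_list": ["United States"],
    "filter_operator_list": ["=="],
    "filter_condition": "AND",
    "top_k": 2,
}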
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]
%pip install llama-index
%pip install llama-index-llms-openai
%pip install llama-index-readers-wikipedia
%pip install llama-index-vector-stores-pinecone
In [ ]
import os
os.environ["PINECONE_API_KEY"] = "..."
os.environ["OPENAI_API_KEY"] = "..."
In [ ]
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
In [ ]
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
In [ ]
# dimensions are for text-embedding-3-small
pc.create_index(
    name="quickstart-index",
    dimension=1536,
    metric="euclidean",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
# may need to wait for index to be created
import time
time.sleep(10)
Out[ ]
{ "name": "quickstart-index", "metric": "euclidean", "host": "quickstart-index-c2e1535.svc.aped-4627-b74a.pinecone.io", "spec": { "serverless": { "cloud": "aws", "region": "us-east-1" } }, "status": { "ready": true, "state": "Ready" }, "vector_type": "dense", "dimension": 1536, "deletion_protection": "disabled", "tags": null }
In [ ]
index = pc.Index("quickstart-index")
In [ ]
# Optional: delete data in your pinecone index
# index.delete(deleteAll=True, namespace="test")
In [ ]
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
In [ ]
from llama_index.core.schema import TextNode
nodes = [
    TextNode(
        text=(
            "Michael Jordan is a retired professional basketball player,"
            " widely regarded as one of the greatest basketball players of all"
            " time."
        ),
        metadata={
            "category": "Sports",
            "country": "United States",
            "gender": "male",
            "born": 1963,
        },
    ),
    TextNode(
        text=(
            "Angelina Jolie is an American actress, filmmaker, and"
            " humanitarian. She has received numerous awards for her acting"
            " and is known for her philanthropic work."
        ),
        metadata={
            "category": "Entertainment",
            "country": "United States",
            "gender": "female",
            "born": 1975,
        },
    ),
    TextNode(
        text=(
            "Elon Musk is a business magnate, industrial designer, and"
            " engineer. He is the founder, CEO, and lead designer of SpaceX,"
            " Tesla, Inc., Neuralink, and The Boring Company."
        ),
        metadata={
            "category": "Business",
            "country": "United States",
            "gender": "male",
            "born": 1971,
        },
    ),
    TextNode(
        text=(
            "Rihanna is a Barbadian singer, actress, and businesswoman. She"
            " has achieved significant success in the music industry and is"
            " known for her versatile musical style."
        ),
        metadata={
            "category": "Music",
            "country": "Barbados",
            "gender": "female",
            "born": 1988,
        },
    ),
    TextNode(
        text=(
            "Cristiano Ronaldo is a Portuguese professional footballer who is"
            " considered one of the greatest football players of all time. He"
            " has won numerous awards and set multiple records during his"
            " career."
        ),
        metadata={
            "category": "Sports",
            "country": "Portugal",
            "gender": "male",
            "born": 1985,
        },
    ),
]
In [ ]
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core import StorageContext
vector_store = PineconeVectorStore(pinecone_index=index, namespace="test")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
In [ ]
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex(nodes, storage_context=storage_context)
Upserted vectors: 0%| | 0/5 [00:00<?, ?it/s]
Define Function Tool
Here we define the function interface that is passed to OpenAI to perform auto-retrieval.
We weren't able to get OpenAI to work with nested pydantic objects or tuples as arguments, so we flattened the metadata filter keys and values into lists for the function API to work with.
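For example, a nested filter spec like [("country", "==", "United States"), ("born", ">", 1960)] becomes three parallel lists, which auto_retrieve_fn below zips back into MetadataFilter objects:

In [ ]
# Hypothetical example of the flattened representation the function API receives.
filter_key_list = ["country", "born"]
filter_value_list = ["United States", 1960]
filter_operator_list = ["==", ">"]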
In [ ]
# define function tool
from llama_index.core.tools import FunctionTool
from llama_index.core.vector_stores import (
    VectorStoreInfo,
    MetadataInfo,
    MetadataFilter,
    MetadataFilters,
    FilterCondition,
    FilterOperator,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

from typing import List, Tuple, Any
from pydantic import BaseModel, Field

# define vector store info describing schema of vector store
vector_store_info = VectorStoreInfo(
    content_info="brief biography of celebrities",
    metadata_info=[
        MetadataInfo(
            name="category",
            type="str",
            description=(
                "Category of the celebrity, one of [Sports, Entertainment,"
                " Business, Music]"
            ),
        ),
        MetadataInfo(
            name="country",
            type="str",
            description=(
                "Country of the celebrity, one of [United States, Barbados,"
                " Portugal]"
            ),
        ),
        MetadataInfo(
            name="gender",
            type="str",
            description="Gender of the celebrity, one of [male, female]",
        ),
        MetadataInfo(
            name="born",
            type="int",
            description="Birth year of the celebrity; can be any integer",
        ),
    ],
)
Define the auto-retrieval function
In [ ]
from typing import Any, Annotated


async def auto_retrieve_fn(
    query: Annotated[str, "The natural language query/question to answer."],
    filter_key_list: Annotated[
        List[str], "List of metadata filter field names"
    ],
    filter_value_list: Annotated[
        List[Any],
        "List of metadata filter field values (corresponding to names in filter_key_list)",
    ],
    filter_operator_list: Annotated[
        List[str],
        "List of metadata filter operators (each one of <, <=, >, >=, ==, !=)",
    ],
    filter_condition: Annotated[
        str, "Condition for combining the metadata filters (AND or OR)"
    ],
    top_k: Annotated[
        int, "The number of results to return from the vector database."
    ],
):
    """Auto retrieval function.

    Performs auto-retrieval from a vector database, and then applies a set of filters.
    """
    query = query or "Query"

    metadata_filters = [
        MetadataFilter(key=k, value=v, operator=op)
        for k, v, op in zip(
            filter_key_list, filter_value_list, filter_operator_list
        )
    ]
    retriever = VectorIndexRetriever(
        index,
        filters=MetadataFilters(
            filters=metadata_filters, condition=filter_condition.lower()
        ),
        similarity_top_k=top_k,  # VectorIndexRetriever expects similarity_top_k, not top_k
    )
    query_engine = RetrieverQueryEngine.from_args(retriever)

    response = await query_engine.aquery(query)
    return str(response)


description = f"""\
Use this tool to look up biographical information about celebrities.
The vector database schema is given below:
<schema>
{vector_store_info.model_dump_json()}
</schema>
"""

auto_retrieve_tool = FunctionTool.from_defaults(
    auto_retrieve_fn,
    name="celebrity_bios",
    description=description,
)
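Before wiring the tool into an agent, you can sanity-check it directly with hand-written arguments (a sketch, assuming the Pinecone index above has been populated):

In [ ]
# Call the tool directly, bypassing the agent, to verify retrieval + filtering.
result = await auto_retrieve_tool.acall(
    query="celebrities from the United States",
    filter_key_list=["country"],
    filter_value_list=["United States"],
    filter_operator_list=["=="],
    filter_condition="AND",
    top_k=2,
)
print(str(result))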
Initialize Agent
In [ ]
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.workflow import Context
from llama_index.llms.openai import OpenAI
agent = FunctionAgent(
    tools=[auto_retrieve_tool],
    llm=OpenAI(model="gpt-4o"),
    system_prompt=(
        "You are a helpful assistant that can answer questions about celebrities by writing a filtered query to a vector database. "
        "Unless the user is asking to compare things, you generally only need to make one call to the retriever."
    ),
)
# hold the context/session state for the agent
ctx = Context(agent)
In [ ]
from llama_index.core.agent.workflow import (
    ToolCallResult,
    ToolCall,
    AgentStream,
    AgentInput,
    AgentOutput,
)

handler = agent.run(
    "Tell me about two celebrities from the United States. ", ctx=ctx
)

async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        print(
            f"\nCalled tool {ev.tool_name} with args {ev.tool_kwargs}, got response: {ev.tool_output}"
        )
    elif isinstance(ev, AgentStream):
        print(ev.delta, end="", flush=True)

response = await handler
Called tool celebrity_bios with args {'query': 'celebrities from the United States', 'filter_key_list': ['country'], 'filter_value_list': ['United States'], 'filter_operator_list': ['=='], 'filter_condition': 'AND', 'top_k': 2}, got response: Angelina Jolie and Elon Musk are notable celebrities from the United States. Here are two celebrities from the United States: 1. **Angelina Jolie**: She is a renowned actress, filmmaker, and humanitarian. Jolie has received numerous accolades, including an Academy Award and three Golden Globe Awards. She is also known for her humanitarian efforts, particularly her work with refugees as a Special Envoy for the United Nations High Commissioner for Refugees (UNHCR). 2. **Elon Musk**: He is a prominent entrepreneur and business magnate. Musk is the CEO and lead designer of SpaceX, CEO and product architect of Tesla, Inc., and has been involved in numerous other ventures, including Neuralink and The Boring Company. He is known for his ambitious vision of the future, including space exploration and sustainable energy.
In [ ]
handler = agent.run("Tell me about two celebrities born after 1980. ", ctx=ctx)

async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        print(
            f"\nCalled tool {ev.tool_name} with args {ev.tool_kwargs}, got response: {ev.tool_output}"
        )
    elif isinstance(ev, AgentStream):
        print(ev.delta, end="", flush=True)

response = await handler
handler = agent.run("告诉我关于出生在 1980 年后的两位名人的信息。 ", ctx=ctx) async for ev in handler.stream_events(): if isinstance(ev, ToolCallResult): print( f"\n调用了工具 {ev.tool_name},参数为 {ev.tool_kwargs},得到响应:{ev.tool_output}" ) elif isinstance(ev, AgentStream): print(ev.delta, end="", flush=True) response = await handler
Called tool celebrity_bios with args {'query': 'celebrities born after 1980', 'filter_key_list': ['born'], 'filter_value_list': [1980], 'filter_operator_list': ['>'], 'filter_condition': 'AND', 'top_k': 2}, got response: Rihanna, born in 1988, is a celebrity who fits the criteria of being born after 1980. Here is a celebrity born after 1980: - **Rihanna**: Born in 1988, Rihanna is a Barbadian singer, actress, and businesswoman. She gained worldwide fame with her music career, producing hits like "Umbrella," "Diamonds," and "Work." Beyond music, Rihanna has made a significant impact in the fashion and beauty industries with her Fenty brand, known for its inclusivity and innovation.
In [ ]
response = await agent.run(
    "Tell me about few celebrities under category business and born after 1950. ",
    ctx=ctx,
)
print(str(response))
response = await agent.run( "告诉我关于属于商业类别且出生在 1950 年后的几位名人的信息。 ", ctx=ctx, ) print(str(response))
Here is a celebrity in the business category who was born after 1950: - **Elon Musk**: He is a prominent entrepreneur and business magnate, born in 1971. Musk is the CEO and lead designer of SpaceX, CEO and product architect of Tesla, Inc., and has been involved in numerous other ventures, including Neuralink and The Boring Company. He is known for his ambitious vision of the future, including space exploration and sustainable energy.
Joint Text-to-SQL and Semantic Search
This is currently handled by our SQLAutoVectorQueryEngine.
Let's try implementing this by giving our OpenAIAgent access to two query tools: SQL and vector search.
NOTE: Any Text-to-SQL application should be aware that executing arbitrary SQL queries can be a security risk. It is recommended to take precautions as needed, such as using restricted roles, read-only databases, sandboxing, etc.
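As a minimal sketch of one such precaution (hypothetical and not used by this notebook, which builds an in-memory database below), you could hand the SQL engine a read-only SQLite connection:

In [ ]
from sqlalchemy import create_engine

# Minimal sketch: open an on-disk SQLite file in read-only mode via a URI
# connection string, so generated INSERT/UPDATE/DELETE/DROP statements fail
# at the database level. The file name city_stats.db is hypothetical.
readonly_engine = create_engine(
    "sqlite:///file:city_stats.db?mode=ro&uri=true", future=True
)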
Load and Index Structured Data
We load sample structured data points into a SQL database and index them.
In [ ]
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    select,
    column,
)

from llama_index.core import SQLDatabase
from llama_index.core.indices import SQLStructStoreIndex

engine = create_engine("sqlite:///:memory:", future=True)
metadata_obj = MetaData()
In [ ]
# create city SQL table
table_name = "city_stats"
city_stats_table = Table(
    table_name,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
metadata_obj.create_all(engine)
In [ ]
# print tables
metadata_obj.tables.keys()
Out[ ]
dict_keys(['city_stats'])
In [ ]
from sqlalchemy import insert
rows = [
    {"city_name": "Toronto", "population": 2930000, "country": "Canada"},
    {"city_name": "Tokyo", "population": 13960000, "country": "Japan"},
    {"city_name": "Berlin", "population": 3645000, "country": "Germany"},
]
for row in rows:
    stmt = insert(city_stats_table).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)
In [ ]
with engine.connect() as connection:
    cursor = connection.exec_driver_sql("SELECT * FROM city_stats")
    print(cursor.fetchall())
[('Toronto', 2930000, 'Canada'), ('Tokyo', 13960000, 'Japan'), ('Berlin', 3645000, 'Germany')]
In [ ]
sql_database = SQLDatabase(engine, include_tables=["city_stats"])
sql_database = SQLDatabase(engine, include_tables=["city_stats"])
In [ ]
已复制!
from llama_index.core.query_engine import NLSQLTableQueryEngine
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats"],
)
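As a quick sanity check, you can also query the Text-to-SQL engine directly before handing it to an agent:

In [ ]
# Ask a question that should compile down to a simple aggregate SQL query.
response = query_engine.query("Which city has the highest population?")
print(str(response))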
Load and Index Unstructured Data
We load unstructured data into a vector index backed by Pinecone.
In [ ]
# install wikipedia python package
%pip install wikipedia llama-index-readers-wikipedia
In [ ]
from llama_index.readers.wikipedia import WikipediaReader
cities = ["Toronto", "Berlin", "Tokyo"]
wiki_docs = WikipediaReader().load_data(pages=cities)
In [ ]
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
In [ ]
# dimensions are for text-embedding-3-small
pc.create_index(
    name="quickstart-sql",
    dimension=1536,
    metric="euclidean",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
# may need to wait for index to be created
import time
time.sleep(10)
In [ ]
# define pinecone index
index = pc.Index("quickstart-sql")
In [ ]
# OPTIONAL: delete all
index.delete(deleteAll=True)
In [ ]
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# define node parser and LLM
Settings.llm = OpenAI(temperature=0, model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = TokenTextSplitter(chunk_size=1024)
# define pinecone vector index
vector_store = PineconeVectorStore(
    pinecone_index=index, namespace="wiki_cities"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
vector_index = VectorStoreIndex([], storage_context=storage_context)
In [ ]
# Insert documents into vector index
# Each document has metadata of the city attached
for city, wiki_doc in zip(cities, wiki_docs):
    nodes = Settings.node_parser.get_nodes_from_documents([wiki_doc])
    # add metadata to each node
    for node in nodes:
        node.metadata = {"title": city}
    vector_index.insert_nodes(nodes)
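As an optional sanity check, confirm the wiki chunks are retrievable from the vector index (a sketch; the query text is arbitrary):

In [ ]
# Retrieve the single best-matching chunk and inspect its city metadata.
retrieved = vector_index.as_retriever(similarity_top_k=1).retrieve(
    "museums and galleries in Tokyo"
)
print(retrieved[0].node.metadata)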
Define Query Engines / Tools
In [ ]
from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.tools import QueryEngineTool
vector_store_info = VectorStoreInfo(
    content_info="articles about different cities",
    metadata_info=[
        MetadataInfo(
            name="title", type="str", description="The name of the city"
        ),
    ],
)

# pre-built auto-retriever, this works similarly to our custom auto-retriever above
vector_auto_retriever = VectorIndexAutoRetriever(
    vector_index, vector_store_info=vector_store_info
)

retriever_query_engine = RetrieverQueryEngine.from_args(
    vector_auto_retriever,
)
In [ ]
sql_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="sql_tool",
    description=(
        "Useful for translating a natural language query into a SQL query over"
        " a table containing: city_stats, containing the population/country of"
        " each city"
    ),
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=retriever_query_engine,
    name="vector_tool",
    description=(
        "Useful for answering semantic questions about different cities"
    ),
)
Initialize Agent
In [ ]
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.workflow import Context
agent = FunctionAgent(
    tools=[sql_tool, vector_tool],
    llm=OpenAI(model="gpt-4o"),
)
# hold the context/session state for the agent
ctx = Context(agent)
In [ ]
from llama_index.core.agent.workflow import (
    ToolCallResult,
    ToolCall,
    AgentStream,
    AgentInput,
    AgentOutput,
)

handler = agent.run(
    "Tell me about the arts and culture of the city with the highest population. ",
    ctx=ctx,
)

async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        print(
            f"\nCalled tool {ev.tool_name} with args {ev.tool_kwargs}, got response: {ev.tool_output}"
        )
    elif isinstance(ev, AgentStream):
        print(ev.delta, end="", flush=True)

response = await handler
Called tool sql_tool with args {'input': 'SELECT city FROM city_stats ORDER BY population DESC LIMIT 1;'}, got response: The city with the highest population is Tokyo. Called tool vector_tool with args {'input': 'Tell me about the arts and culture of Tokyo.'}, got response: Tokyo boasts a vibrant arts and culture scene, characterized by a diverse range of museums, galleries, and performance venues. Ueno Park is a cultural hub, housing the Tokyo National Museum, which specializes in traditional Japanese art, alongside the National Museum of Western Art, a UNESCO World Heritage site, and the National Museum of Nature and Science. The park also features Ueno Zoo, known for its giant pandas. The city is home to numerous notable museums, including the Artizon Museum, the National Museum of Emerging Science and Innovation, and the Edo-Tokyo Museum, which explores the city's history. Contemporary art is showcased at the Mori Art Museum and the Sumida Hokusai Museum, while the Sompo Museum of Art is recognized for its collection, including Van Gogh's "Sunflowers." The performing arts thrive in Tokyo, with venues like the National Noh Theatre and Kabuki-za dedicated to traditional Japanese theatre. The New National Theatre Tokyo hosts a variety of performances, including opera and ballet. Major concert venues such as the Nippon Budokan and Tokyo Dome frequently feature popular music acts. Tokyo's nightlife is vibrant, particularly in districts like Shibuya and Roppongi, which are filled with bars, clubs, and live music venues. The city is also known for its festivals, such as the Sannō Matsuri and the Sanja Festival, which celebrate traditional culture. Shopping districts like Ginza and Nihombashi offer a blend of high-end retail and cultural experiences, while areas like Jinbōchō are famous for their literary connections, featuring bookstores and cafes linked to renowned authors. Overall, Tokyo's arts and culture reflect a rich tapestry of traditional and contemporary influences, making it a dynamic city for cultural exploration. Tokyo, the city with the highest population, boasts a vibrant arts and culture scene. It features a diverse range of museums, galleries, and performance venues. Ueno Park serves as a cultural hub, housing the Tokyo National Museum, the National Museum of Western Art, and the National Museum of Nature and Science. The park also includes Ueno Zoo, known for its giant pandas. Notable museums in Tokyo include the Artizon Museum, the National Museum of Emerging Science and Innovation, and the Edo-Tokyo Museum, which explores the city's history. Contemporary art is showcased at the Mori Art Museum and the Sumida Hokusai Museum, while the Sompo Museum of Art is recognized for its collection, including Van Gogh's "Sunflowers." The performing arts thrive with venues like the National Noh Theatre and Kabuki-za dedicated to traditional Japanese theatre. The New National Theatre Tokyo hosts a variety of performances, including opera and ballet. Major concert venues such as the Nippon Budokan and Tokyo Dome frequently feature popular music acts. Tokyo's nightlife is vibrant, especially in districts like Shibuya and Roppongi, filled with bars, clubs, and live music venues. The city is also known for its festivals, such as the Sannō Matsuri and the Sanja Festival, celebrating traditional culture. 
Shopping districts like Ginza and Nihombashi offer a blend of high-end retail and cultural experiences, while areas like Jinbōchō are famous for their literary connections, featuring bookstores and cafes linked to renowned authors. Overall, Tokyo's arts and culture reflect a rich tapestry of traditional and contemporary influences, making it a dynamic city for cultural exploration.
In [ ]
handler = agent.run("Tell me about the history of Berlin", ctx=ctx)
async for ev in handler.stream_events():
if isinstance(ev, ToolCallResult):
print(
f"\nCalled tool {ev.tool_name} with args {ev.tool_kwargs}, got response: {ev.tool_output}"
)
elif isinstance(ev, AgentStream):
print(ev.delta, end="", flush=True)
response = await handler
handler = agent.run("告诉我关于柏林的历史。", ctx=ctx) async for ev in handler.stream_events(): if isinstance(ev, ToolCallResult): print( f"\n调用了工具 {ev.tool_name},参数为 {ev.tool_kwargs},得到响应:{ev.tool_output}" ) elif isinstance(ev, AgentStream): print(ev.delta, end="", flush=True) response = await handler
Called tool vector_tool with args {'input': 'Tell me about the history of Berlin.'}, got response: Berlin's history dates back to prehistoric times, with evidence of human settlements as early as 60,000 BC. The area saw the emergence of various cultures, including the Maglemosian culture around 9,000 BC and the Lusatian culture around 2,000 BC, as dense human settlements developed along the Spree and Havel rivers. By 500 BC, Germanic tribes began to settle in the region, followed by Slavic tribes in the 7th century. In the 12th century, the region came under German rule with the establishment of the Margraviate of Brandenburg. The first written records of towns in the area appear in the late 12th century, with Berlin's founding date considered to be 1237. The towns of Berlin and Cölln formed close economic ties and eventually merged, with the Hohenzollern family ruling the area from the 14th century until 1918. The Thirty Years' War in the 17th century devastated Berlin, leading to significant population loss. However, under Frederick William, known as the "Great Elector," the city experienced a revival through policies promoting immigration and religious tolerance. The establishment of the Kingdom of Prussia in 1701 marked a significant turning point, with Berlin becoming its capital. The 19th century brought the Industrial Revolution, transforming Berlin into a major economic center and leading to rapid population growth. By the late 19th century, Berlin was the capital of the newly founded German Empire. The city continued to grow and evolve through the 20th century, experiencing significant events such as World War II, the division into East and West Berlin during the Cold War, and reunification in 1990, when it once again became the capital of a unified Germany. Today, Berlin is recognized as a global city of culture, politics, media, and science, with a diverse economy and rich historical heritage. Berlin's history is rich and varied, dating back to prehistoric times with evidence of human settlements as early as 60,000 BC. The area saw the emergence of various cultures, including the Maglemosian culture around 9,000 BC and the Lusatian culture around 2,000 BC, with dense settlements along the Spree and Havel rivers. By 500 BC, Germanic tribes settled in the region, followed by Slavic tribes in the 7th century. In the 12th century, the region came under German rule with the establishment of the Margraviate of Brandenburg. Berlin's founding date is considered to be 1237, with the towns of Berlin and Cölln forming close economic ties and eventually merging. The Hohenzollern family ruled the area from the 14th century until 1918. The Thirty Years' War in the 17th century devastated Berlin, but it experienced a revival under Frederick William, the "Great Elector," through policies promoting immigration and religious tolerance. The establishment of the Kingdom of Prussia in 1701 marked a significant turning point, with Berlin becoming its capital. The 19th century brought the Industrial Revolution, transforming Berlin into a major economic center and leading to rapid population growth. By the late 19th century, Berlin was the capital of the newly founded German Empire. The city continued to evolve through the 20th century, experiencing significant events such as World War II, the division into East and West Berlin during the Cold War, and reunification in 1990, when it once again became the capital of a unified Germany. 
Today, Berlin is recognized as a global city of culture, politics, media, and science, with a diverse economy and rich historical heritage.
In [ ]
response = await agent.run(
    "Can you give me the country corresponding to each city?", ctx=ctx
)
print(str(response))
response = await agent.run( "你能告诉我每个城市对应的国家吗?", ctx=ctx ) print(str(response))
Here are the cities along with their corresponding countries: - Toronto is in Canada. - Tokyo is in Japan. - Berlin is in Germany.
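Optionally, delete the two demo indexes when you are done to avoid leaving serverless resources around (assuming you no longer need them):

In [ ]
# Optional cleanup of the demo Pinecone indexes created above.
# pc.delete_index("quickstart-index")
# pc.delete_index("quickstart-sql")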