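Starter Tutorial (Using Local LLMs)

This tutorial will show you how to get started building agents with LlamaIndex. We'll start with a basic example and then show how to add RAG (Retrieval-Augmented Generation) capabilities.

We will use BAAI/bge-base-en-v1.5 as our embedding model and llama3.1 8B served through Ollama.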

Tip

Make sure you've followed the installation steps first.

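Setup

Ollama is a tool that helps you set up LLMs locally with minimal configuration.

Follow the README to learn how to install it.

To download the Llama 3.1 model, just run ollama pull llama3.1.

NOTE: You will need a machine with at least ~32GB of RAM.

As explained in our installation guide, llama-index is actually a collection of packages. To run Ollama and Huggingface, we need to install those integrations: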

pip install llama-index-llms-ollama llama-index-embeddings-huggingface

The package names spell out the imports, which is very helpful for remembering how to import or install them!

from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
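The naming convention can be written down as a simple string mapping. The helper below, package_to_module, is hypothetical and not part of LlamaIndex; it is a minimal sketch that assumes the llama-index-<category>-<name> pattern with a single-word category such as llms or embeddings, which covers the two packages installed above.

```python
def package_to_module(package: str) -> str:
    """Map a llama-index integration package name to its import path.

    Hypothetical helper (not part of LlamaIndex). Assumes the pattern
    llama-index-<category>-<name> with a single-word <category>, e.g. "llms".
    """
    prefix = "llama-index-"
    if not package.startswith(prefix):
        raise ValueError(f"not a llama-index integration package: {package}")
    # Split off the category at the first hyphen; the rest is the integration name.
    category, _, name = package[len(prefix):].partition("-")
    # Hyphens are not allowed in Python module names, so they become underscores.
    return f"llama_index.{category}.{name.replace('-', '_')}"


print(package_to_module("llama-index-llms-ollama"))  # llama_index.llms.ollama
print(package_to_module("llama-index-embeddings-huggingface"))  # llama_index.embeddings.huggingface
```

A package whose category itself contains a hyphen, such as llama-index-vector-stores-chroma (imported as llama_index.vector_stores.chroma), would be mapped incorrectly by this sketch, so treat it as an illustration of the convention rather than a general rule.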

More integrations are listed on https://llamahub.ai.

Basic Agent Example

Let's start with a simple example: an agent that can perform basic multiplication by calling a tool. Create a file called starter.py:

import asyncio
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.ollama import Ollama


# Define a simple calculator tool
def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


# Create an agent workflow with our calculator tool
agent = FunctionAgent(
    tools=[multiply],
    llm=Ollama(model="llama3.1", request_timeout=360.0),
    system_prompt="You are a helpful assistant that can multiply two numbers.",
)


async def main():
    # Run the agent
    response = await agent.run("What is 1234 * 4567?")
    print(str(response))


# Run the agent
if __name__ == "__main__":
    asyncio.run(main())

This will output something like: The answer to 1234 * 4567 is: 5,635,678.
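Because the multiply tool is an ordinary Python function, you can check the expected result directly, without the LLM in the loop. This is a quick sanity check for the number the agent reports:

```python
def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


# Call the tool directly; integer inputs give an integer result.
print(multiply(1234, 4567))  # 5635678
```

The final wording of the agent's answer comes from the LLM, so formatting such as thousands separators may differ, but the number itself should match.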

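What happened?

The agent was given a question: What is 1234 * 4567?
Under the hood, this question, plus the schema of the tools (name, docstring, and arguments), was passed to the LLM.
The agent selected the multiply tool and wrote the arguments for it.
The agent received the result from the tool and interpolated it into the final response.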
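LlamaIndex builds the tool schema for you from the function itself. As a rough illustration of what "name, docstring, and arguments" means, the sketch below derives a JSON-Schema-style description of multiply using only the standard library inspect and typing modules. The function_to_schema helper and the exact output shape are assumptions for illustration, not LlamaIndex's internal format.

```python
import inspect
from typing import get_type_hints


def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


# Map Python annotations to JSON Schema type names (illustrative subset).
_JSON_TYPES = {float: "number", int: "integer", str: "string", bool: "boolean"}


def function_to_schema(fn) -> dict:
    """Hypothetical helper: describe a function the way a tool-calling LLM sees it."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    properties = {
        name: {"type": _JSON_TYPES.get(hints.get(name), "string")} for name in params
    }
    # Parameters without default values are marked as required.
    required = [
        name for name, p in params.items() if p.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


print(function_to_schema(multiply))
```

This is why a clear docstring and precise type hints matter: they are most of what the LLM knows about when and how to call the tool.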

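Tip

As you can see, we are using async Python functions. Many LLMs and models support async calls, and using async code is recommended to improve the performance of your application. To learn more about async code and Python, we recommend this short introduction to async + Python.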
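To see why async helps, consider several slow calls that mostly wait on I/O, as LLM requests do. The sketch below simulates them with asyncio.sleep (fake_llm_call is a hypothetical stand-in, not a LlamaIndex API) and runs them concurrently with asyncio.gather, so the total time should be close to the slowest single call rather than the sum of all of them.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, delay: float = 0.3) -> str:
    """Hypothetical stand-in for a network-bound LLM request."""
    await asyncio.sleep(delay)  # simulate waiting on the model server
    return f"answer to: {prompt}"


async def main() -> tuple[list[str], float]:
    prompts = ["question 1", "question 2", "question 3"]
    start = time.perf_counter()
    # Run all three calls concurrently; gather returns results in input order.
    results = await asyncio.gather(*(fake_llm_call(p) for p in prompts))
    elapsed = time.perf_counter() - start
    return list(results), elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Awaiting the three calls one after another would take about three times as long, because each await would block the next request from starting.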

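Adding Chat History

Agents such as FunctionAgent and AgentWorkflow can also remember previous messages. These are stored in the agent's Context.

If a Context is passed in, the agent will use it to continue the conversation.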

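import asyncio
from llama_index.core.workflow import Context


async def main():
    # Create a context for the agent defined above; it stores the chat history
    ctx = Context(agent)
    # Reuse the same context so the agent remembers the first message
    response = await agent.run("My name is Logan", ctx=ctx)
    response = await agent.run("What is my name?", ctx=ctx)
    print(str(response))


if __name__ == "__main__":
    asyncio.run(main())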
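Conceptually, the conversation state lives in the context object rather than in the agent, which is why reusing the same ctx lets the second question see the first message. The toy sketch below illustrates only that idea: ToyContext and toy_agent_run are hypothetical, they do not reflect how LlamaIndex's Context is implemented, and the "agent" simply scans earlier messages instead of calling an LLM.

```python
import asyncio
import re
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    """Hypothetical stand-in for a workflow context: it only stores chat history."""
    messages: list[str] = field(default_factory=list)


async def toy_agent_run(user_msg: str, ctx: ToyContext) -> str:
    """Hypothetical agent step that answers from the history stored in ctx."""
    ctx.messages.append(user_msg)
    if user_msg.strip().lower() == "what is my name?":
        # Look back through earlier messages for an introduction.
        for earlier in reversed(ctx.messages[:-1]):
            match = re.match(r"My name is (\w+)", earlier)
            if match:
                return f"Your name is {match.group(1)}."
        return "I don't know your name yet."
    return "Nice to meet you!"


async def main() -> None:
    shared = ToyContext()
    await toy_agent_run("My name is Logan", shared)
    print(await toy_agent_run("What is my name?", shared))  # Your name is Logan.
    # A fresh context has no history, so the name is unknown.
    print(await toy_agent_run("What is my name?", ToyContext()))  # I don't know your name yet.


if __name__ == "__main__":
    asyncio.run(main())
```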

Adding RAG Capabilities

Now let's enhance our agent by adding the ability to search through documents. First, let's get some example data using the terminal:

mkdir data
wget https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt -O data/paul_graham_essay.txt

Your directory structure should now look like this:

├── starter.py
└── data
    └── paul_graham_essay.txt

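Now we can create a tool for searching documents using LlamaIndex. By default, VectorStoreIndex uses OpenAI's text-embedding-ada-002 embeddings to embed and retrieve text, so in this local setup we set Settings.embed_model to the BAAI/bge-base-en-v1.5 model instead.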
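The ranking step behind vector retrieval can be sketched in a few lines: each chunk and the query are turned into vectors with the same embedding model, and chunks are ranked by cosine similarity to the query. In the sketch below, toy_embed is a hypothetical bag-of-words stand-in for a real embedding model such as BAAI/bge-base-en-v1.5, and the chunks are made-up examples; VectorStoreIndex itself uses real dense embeddings and a vector store.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Embed the query and every chunk, then return the top_k most similar chunks."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:top_k]


chunks = [
    "Ollama runs large language models locally.",
    "The multiply tool multiplies two numbers.",
    "A vector index stores embeddings for retrieval.",
]
print(retrieve("Which tool multiplies numbers?", chunks))
# ['The multiply tool multiplies two numbers.']
```

The query and the chunks must be embedded by the same model for the similarity scores to be meaningful, which is also why the same embed_model should be used when an index is reloaded later.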

Our modified starter.py should look like this:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import asyncio

# Settings control global defaults
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
Settings.llm = Ollama(model="llama3.1", request_timeout=360.0)

# Create a RAG tool using LlamaIndex
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    # we can optionally override the embed_model here
    # embed_model=Settings.embed_model,
)
query_engine = index.as_query_engine(
    # we can optionally override the llm here
    # llm=Settings.llm,
)


def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


async def search_documents(query: str) -> str:
    """Useful for answering natural language questions about a personal essay written by Paul Graham."""
    response = await query_engine.aquery(query)
    return str(response)


# Create an enhanced workflow with both tools
agent = AgentWorkflow.from_tools_or_functions(
    [multiply, search_documents],
    llm=Settings.llm,
    system_prompt="""You are a helpful assistant that can perform calculations
    and search through documents to answer questions.""",
)


# Now we can ask questions about the documents or do calculations
async def main():
    response = await agent.run(
        "What did the author do in college? Also, what's 7 * 8?"
    )
    print(response)


# Run the agent
if __name__ == "__main__":
    asyncio.run(main())

The agent can now seamlessly switch between the calculator and document search to answer questions.
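Note that the two tools differ: multiply is a regular function, while search_documents is async. A caller that supports both has to await only the coroutine functions. The sketch below illustrates that under stated assumptions: call_tool and the canned search result are hypothetical, and it does not show how AgentWorkflow dispatches tools internally.

```python
import asyncio
import inspect


def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


async def search_documents(query: str) -> str:
    """Hypothetical stand-in for the RAG tool: returns a canned string, no index involved."""
    await asyncio.sleep(0)  # pretend to wait on the query engine
    return f"(search results for: {query})"


# Tools are looked up by function name, the same name the LLM sees in the schema.
TOOLS = {fn.__name__: fn for fn in (multiply, search_documents)}


async def call_tool(name: str, **kwargs):
    """Call a tool by name, awaiting it only if it is an async function."""
    fn = TOOLS[name]
    if inspect.iscoroutinefunction(fn):
        return await fn(**kwargs)
    return fn(**kwargs)


async def main() -> None:
    # Each call mimics a tool call an LLM might request: a tool name plus arguments.
    print(await call_tool("multiply", a=7, b=8))  # 56
    print(await call_tool("search_documents", query="What did the author do in college?"))


if __name__ == "__main__":
    asyncio.run(main())
```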

Storing the RAG Index

To avoid reprocessing the documents every time, you can persist the index to disk:

# Save the index
index.storage_context.persist("storage")

# Later, load the index
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(
    storage_context,
    # we can optionally override the embed_model here
    # it's important to use the same embed_model as the one used to build the index
    # embed_model=Settings.embed_model,
)
query_engine = index.as_query_engine(
    # we can optionally override the llm here
    # llm=Settings.llm,
)
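A common way to use this is to build the index only on the first run and load it on later runs. The sketch below shows that load-or-build pattern with plain JSON files so it runs without LlamaIndex; load_or_build, the index.json file name, and the saved embed_model field are hypothetical. Storing the embedding model name next to the data is one way to catch the mismatch that the embed_model comment in the code above warns about. In a real script, the build step would be VectorStoreIndex.from_documents(...) followed by index.storage_context.persist(...), and the load step would be StorageContext.from_defaults(persist_dir=...) with load_index_from_storage(...).

```python
import json
import os
import tempfile
from typing import Callable


def load_or_build(persist_dir: str, embed_model: str, build: Callable[[], dict]) -> dict:
    """Load saved data from persist_dir, or build and save it on the first run.

    Hypothetical helper illustrating the pattern with JSON instead of LlamaIndex storage.
    """
    path = os.path.join(persist_dir, "index.json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            saved = json.load(f)
        # Vectors from a different embedding model are not comparable, so refuse them.
        if saved["embed_model"] != embed_model:
            raise ValueError(
                f"index was built with {saved['embed_model']!r}, not {embed_model!r}"
            )
        return saved["data"]
    data = build()  # the expensive step, e.g. embedding all documents
    os.makedirs(persist_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"embed_model": embed_model, "data": data}, f)
    return data


if __name__ == "__main__":
    calls = []

    def build() -> dict:
        calls.append(1)
        return {"chunks": ["hello world"]}

    with tempfile.TemporaryDirectory() as storage:
        load_or_build(storage, "BAAI/bge-base-en-v1.5", build)  # builds and saves
        load_or_build(storage, "BAAI/bge-base-en-v1.5", build)  # loads from disk
        print(len(calls))  # 1
```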

Tip

If you used a vector store integration other than the default, chances are you can just reload from the vector store:

index = VectorStoreIndex.from_vector_store(
    vector_store,
    # it's important to use the same embed_model as the one used to build the index
    # embed_model=Settings.embed_model,
)

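What's Next?

This is just the beginning of what you can do with LlamaIndex agents! You can:

Add more tools to your agent
Use different LLMs
Customize the agent's behavior with system prompts
Add streaming capabilities
Implement human-in-the-loop workflows
Use multiple agents to collaborate on tasks

Some helpful next links:

See more advanced agent examples in our Agent documentation
Learn more about the high-level concepts
Explore how to customize things
Check out the component guides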