Usage Pattern#

Getting Started#

Build a chat engine from an index:

chat_engine = index.as_chat_engine()

Tip

To learn how to build an index, see Indexing.

Have a conversation with your data:

response = chat_engine.chat("Tell me a joke.")

Reset the chat history to start a new conversation:

chat_engine.reset()

Enter an interactive chat REPL:

chat_engine.chat_repl()
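The calls above all act on one piece of state: the chat history. The sketch below is a toy illustration of that idea only. `ToyChatEngine` is a hypothetical stand-in, not a LlamaIndex class, and its reply is a stub where a real engine would call an LLM.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyChatEngine:
    """Hypothetical stand-in for a chat engine; not a LlamaIndex class."""

    history: List[Tuple[str, str]] = field(default_factory=list)

    def chat(self, message: str) -> str:
        # A real engine would call an LLM here; this stub only reports
        # how many earlier turns it can "see".
        reply = f"(reply to {message!r}, {len(self.history)} earlier turns)"
        self.history.append((message, reply))
        return reply

    def reset(self) -> None:
        # Forget all earlier turns so the next message starts a new conversation.
        self.history.clear()


engine = ToyChatEngine()
engine.chat("Tell me a joke.")
second = engine.chat("Another one, please.")  # sees one earlier turn
engine.reset()
after_reset = engine.chat("Hello again.")  # sees no earlier turns
```

After `reset()`, the next message is handled as the first turn of a fresh conversation.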

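Configuring a Chat Engine#

Configuring a chat engine is very similar to configuring a query engine.

High-Level API#

You can build and configure a chat engine directly from an index in one line of code: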

chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)

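Note: You can access different chat engines by passing chat_mode as a keyword argument. condense_question corresponds to CondenseQuestionChatEngine, react corresponds to ReActChatEngine, and context corresponds to ContextChatEngine.

Note: While the high-level API is optimized for ease of use, it does not expose the full range of configurability.

Available Chat Modes#

best - Turns the query engine into a tool for use with a ReAct data agent or an OpenAI data agent, depending on what your LLM supports. OpenAI data agents require gpt-3.5-turbo or gpt-4, because they use OpenAI's function calling API.
condense_question - Looks at the chat history and rewrites the user message into a query for the index. Returns the response after reading the response from the query engine.
context - Retrieves nodes from the index using every user message. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.
condense_plus_context - A combination of condense_question and context. Looks at the chat history and rewrites the user message into a retrieval query for the index. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.
simple - A simple chat with the LLM directly, with no query engine involved.
react - Same as best, but forces a ReAct data agent.
openai - Same as best, but forces an OpenAI data agent.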
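To make the difference between condense_question, context, and condense_plus_context concrete, here is a minimal sketch of the three control flows as described above. It is not the library's implementation: every function name is hypothetical, and the rewriter, retriever, query engine, and LLM are stubbed with plain Python callables. Passing the original message (rather than the rewritten one) to the LLM in the combined mode is an assumption of this sketch.

```python
from typing import Callable, List


def condense_question_turn(
    history: List[str],
    message: str,
    rewrite: Callable[[List[str], str], str],
    query_engine: Callable[[str], str],
) -> str:
    # Step 1: rewrite the message into a standalone query using the history.
    standalone = rewrite(history, message)
    # Step 2: the query engine's answer becomes the chat reply.
    return query_engine(standalone)


def context_turn(
    message: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str, str], str],
) -> str:
    # Step 1: retrieve text using the raw user message.
    retrieved = retrieve(message)
    # Step 2: insert the retrieved text into the system prompt, then ask the LLM.
    system_prompt = "Context:\n" + "\n".join(retrieved)
    return llm(system_prompt, message)


def condense_plus_context_turn(
    history: List[str],
    message: str,
    rewrite: Callable[[List[str], str], str],
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str, str], str],
) -> str:
    # Rewrite first, retrieve with the rewritten query, then answer via the LLM.
    standalone = rewrite(history, message)
    retrieved = retrieve(standalone)
    system_prompt = "Context:\n" + "\n".join(retrieved)
    return llm(system_prompt, message)


# Stubs standing in for real components (all hypothetical).
def fake_rewrite(history: List[str], message: str) -> str:
    return f"{message} [about: {history[-1]}]" if history else message


def fake_query_engine(query: str) -> str:
    return f"answer({query})"


def fake_retrieve(query: str) -> List[str]:
    return [f"doc for {query}"]


def fake_llm(system_prompt: str, message: str) -> str:
    return f"{message} | {system_prompt}"


history = ["Who is Paul Graham?"]
a = condense_question_turn(history, "Where did he study?", fake_rewrite, fake_query_engine)
b = context_turn("Where did he study?", fake_retrieve, fake_llm)
c = condense_plus_context_turn(
    history, "Where did he study?", fake_rewrite, fake_retrieve, fake_llm
)
```

In this sketch only the condense variants see the chat history when forming the query, which is why a follow-up such as "Where did he study?" is resolved against the earlier turn in `a` and `c` but not in `b`.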

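Low-Level Composition API#

If you need more granular control, you can use the low-level composition API. Concretely, you explicitly construct a ChatEngine object instead of calling index.as_chat_engine(...).

Note: You may need to consult the API reference or the example notebooks.

The following example:

configures the condense question prompt,
initializes the conversation with some existing history,
prints verbose debug messages.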

from llama_index.core import PromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.chat_engine import CondenseQuestionChatEngine

custom_prompt = PromptTemplate(
    """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""
)

# list of `ChatMessage` objects
custom_chat_history = [
    ChatMessage(
        role=MessageRole.USER,
        content="Hello assistant, we are having an insightful discussion about Paul Graham today.",
    ),
    ChatMessage(role=MessageRole.ASSISTANT, content="Okay, sounds good."),
]

query_engine = index.as_query_engine()
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
    chat_history=custom_chat_history,
    verbose=True,
)
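To see what the condense prompt looks like once its two placeholders are filled, the sketch below renders the same template text with plain `str.format`. This is only an illustration: the library's `PromptTemplate` does its own formatting, and the "role: content" serialization of the chat history and the follow-up question used here are assumptions.

```python
TEMPLATE = """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""

history = [
    ("user", "Hello assistant, we are having an insightful discussion about Paul Graham today."),
    ("assistant", "Okay, sounds good."),
]

# Assumed serialization: one "role: content" line per message.
history_text = "\n".join(f"{role}: {content}" for role, content in history)

rendered = TEMPLATE.format(
    chat_history=history_text,
    question="What did he do after college?",  # hypothetical follow-up message
)
print(rendered)
```

The rendered text ends at the `<Standalone question>` marker, which cues the LLM to complete the prompt with the rewritten question.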

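Streaming#

To enable streaming, call the stream_chat method instead of the chat method.

Warning

This is somewhat inconsistent with query engines, where you pass in a streaming=True flag. We are working on making the behavior more consistent!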

chat_engine = index.as_chat_engine()
streaming_response = chat_engine.stream_chat("Tell me a joke.")
for token in streaming_response.response_gen:
    print(token, end="")
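The loop above consumes a generator that yields the reply piece by piece. The sketch below imitates that pattern without an index or an LLM: `fake_response_gen` is a hypothetical stand-in for `streaming_response.response_gen`, and splitting on spaces is only a rough proxy for real tokens.

```python
from typing import Iterator


def fake_response_gen(text: str) -> Iterator[str]:
    # Hypothetical stand-in for `streaming_response.response_gen`:
    # yields the reply piece by piece instead of all at once.
    for word in text.split(" "):
        yield word + " "


chunks = []
for token in fake_response_gen("Why did the index cross the road?"):
    print(token, end="", flush=True)  # show each piece as soon as it arrives
    chunks.append(token)  # keep the pieces to rebuild the full reply

full_text = "".join(chunks).strip()
```

Collecting the pieces while printing them is one way to keep the complete reply available after the stream has been consumed.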

See the end-to-end tutorial.