UpTrain Callback Handler¶
UpTrain (github || website || docs) is an open-source platform to evaluate and improve GenAI applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root cause analysis on failure cases, and gives insights on how to resolve them.
This notebook showcases how to use the UpTrain callback handler to evaluate the different components of your RAG pipeline.
1. RAG Query Engine Evaluation:¶
The RAG query engine plays a crucial role in retrieving context and generating responses. To ensure its performance and response quality, we conduct the following evaluations:
- Context Relevance: checks if the retrieved context is relevant to the query.
- Factual Accuracy: checks whether the response is factually grounded in the retrieved context.
- Response Completeness: checks whether the response contains all the information requested by the query.
2. Sub-Question Query Generation Evaluation:¶
The SubQuestionQueryGeneration operator decomposes a question into sub-questions and generates responses for each using a RAG query engine. To measure its accuracy, we use:
- Sub Query Completeness: assures that the sub-questions accurately and comprehensively cover the original query.
3. Re-ranking Evaluation:¶
Re-ranking involves reordering nodes based on their relevance to the query and selecting the top-ranked nodes. Different evaluations are performed depending on the number of nodes returned after re-ranking.
a. Same number of nodes
- Context Reranking: checks whether the order of the re-ranked nodes is more relevant to the query than the original order.
b. Different number of nodes
- Context Conciseness: checks whether the reduced number of nodes still provides all the required information.
Together, these evaluations ensure the robustness and effectiveness of the RAG query engine, the SubQuestionQueryGeneration operator, and the re-ranking process in the LlamaIndex pipeline.
Note:¶
- We have performed the evaluations using a basic RAG query engine; the same evaluations can be performed using an advanced RAG query engine as well.
- The same is true for the re-ranking evaluations: we used SentenceTransformerRerank, but the same evaluations can be performed with other re-rankers as well.
Install Dependencies and Import Libraries¶
Install the notebook dependencies.
%pip install llama-index-readers-web
%pip install llama-index-callbacks-uptrain
%pip install -q html2text llama-index pandas tqdm uptrain torch sentence-transformers
Import the libraries.
from getpass import getpass
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core.callbacks import CallbackManager
from llama_index.callbacks.uptrain.base import UpTrainCallbackHandler
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.postprocessor import SentenceTransformerRerank
import os
Setup¶
UpTrain provides you with:
- A dashboard with advanced drill-down and filtering options
- Insights and common themes among failure cases
- Observability and real-time monitoring of production data
- Regression testing via seamless integration with your CI/CD pipelines
You can choose between the following options to evaluate with UpTrain:
1. UpTrain's Open-Source Software (OSS):¶
You can use the open-source evaluation service to evaluate your model. In this case, you will need to provide an OpenAI API key. You can get one here.
In order to view your evaluations in the UpTrain dashboard, you will need to set it up by running the following commands in your terminal:
git clone https://github.com/uptrain-ai/uptrain
cd uptrain
bash run_uptrain.sh
This will start the UpTrain dashboard on your local machine. You can access it at http://localhost:3000/dashboard.
Parameters:
- key_type="openai"
- api_key="OPENAI_API_KEY"
- project_name="PROJECT_NAME"
2. UpTrain Managed Service and Dashboards:¶
Alternatively, you can use UpTrain's managed service to evaluate your model. You can create a free UpTrain account here and get free trial credits. If you want more trial credits, book a call with the maintainers of UpTrain here.
The benefits of using the managed service are:
- No need to set up the UpTrain dashboard on your local machine.
- Access to many LLMs without needing their API keys.
Once you perform the evaluations, you can view them in the UpTrain dashboard at https://dashboard.uptrain.ai/dashboard.
Parameters:
- key_type="uptrain"
- api_key="UPTRAIN_API_KEY"
- project_name="PROJECT_NAME"
Note: project_name is the project name under which the evaluation results will appear in the UpTrain dashboard.
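The two setups above differ only in the keyword arguments passed to UpTrainCallbackHandler. As an illustrative sketch, a small helper (hypothetical, not part of the UpTrain API) can switch between them; it assumes the relevant API key is available as an environment variable:

```python
import os

# Hypothetical helper: builds UpTrainCallbackHandler keyword arguments
# from the two parameter lists above. Not part of the UpTrain API.
def uptrain_handler_kwargs(use_managed_service: bool, project_name: str) -> dict:
    if use_managed_service:
        # Managed service: results appear at https://dashboard.uptrain.ai/dashboard
        return {
            "key_type": "uptrain",
            "api_key": os.environ.get("UPTRAIN_API_KEY", ""),
            "project_name": project_name,
        }
    # OSS: evaluations run via your OpenAI key; dashboard is served locally.
    return {
        "key_type": "openai",
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "project_name": project_name,
    }

kwargs = uptrain_handler_kwargs(use_managed_service=False, project_name="uptrain_llamaindex")
# callback_handler = UpTrainCallbackHandler(**kwargs)
```

Unpacking the resulting dict with `**` into UpTrainCallbackHandler mirrors the explicit call shown in the next cell.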
Create the UpTrain Callback Handler¶
os.environ["OPENAI_API_KEY"] = getpass()
callback_handler = UpTrainCallbackHandler(
key_type="openai",
api_key=os.environ["OPENAI_API_KEY"],
project_name="uptrain_llamaindex",
)
Settings.callback_manager = CallbackManager([callback_handler])
Load and Parse Documents¶
Load documents from Paul Graham's essay "What I Worked On".
documents = SimpleWebPageReader().load_data(
[
"https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
]
)
Parse the documents into nodes.
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)
1. RAG Query Engine Evaluation¶
index = VectorStoreIndex.from_documents(
documents,
)
query_engine = index.as_query_engine()
max_characters_per_line = 80
queries = [
"What did Paul Graham do growing up?",
"When and how did Paul Graham's mother die?",
"What, in Paul Graham's opinion, is the most distinctive thing about YC?",
"When and how did Paul Graham meet Jessica Livingston?",
"What is Bel, and when and where was it written?",
]
for query in queries:
response = query_engine.query(query)
100%|██████████| 1/1 [00:01<00:00, 1.33s/it] 100%|██████████| 1/1 [00:01<00:00, 1.36s/it] 100%|██████████| 1/1 [00:03<00:00, 3.50s/it] 100%|██████████| 1/1 [00:01<00:00, 1.32s/it]
Question: What did Paul Graham do growing up?
Response: Growing up, Paul Graham worked on writing short stories and programming. He started programming on an IBM 1401 in 9th grade using an early version of Fortran. Later, he got a TRS-80 computer and wrote simple games, a rocket prediction program, and a word processor. Despite his interest in programming, he initially planned to study philosophy in college before eventually switching to AI.

Context Relevance Score: 0.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0
100%|██████████| 1/1 [00:01<00:00, 1.59s/it] 100%|██████████| 1/1 [00:00<00:00, 1.01it/s] 100%|██████████| 1/1 [00:01<00:00, 1.76s/it] 100%|██████████| 1/1 [00:01<00:00, 1.28s/it]
Question: When and how did Paul Graham's mother die?
Response: Paul Graham's mother died when he was 18 years old, from a brain tumor.

Context Relevance Score: 0.0
Factual Accuracy Score: 0.0
Response Completeness Score: 0.5
100%|██████████| 1/1 [00:01<00:00, 1.75s/it] 100%|██████████| 1/1 [00:01<00:00, 1.55s/it] 100%|██████████| 1/1 [00:03<00:00, 3.39s/it] 100%|██████████| 1/1 [00:01<00:00, 1.48s/it]
Question: What, in Paul Graham's opinion, is the most distinctive thing about YC?
Response: The most distinctive thing about Y Combinator, according to Paul Graham, is that instead of deciding for himself what to work on, the problems come to him. Every 6 months, a new batch of startups brings their problems, which then become the focus of YC. This engagement with a variety of startup problems and the direct involvement in solving them is what Graham finds most unique about Y Combinator.

Context Relevance Score: 1.0
Factual Accuracy Score: 0.3333333333333333
Response Completeness Score: 1.0
100%|██████████| 1/1 [00:01<00:00, 1.92s/it] 100%|██████████| 1/1 [00:00<00:00, 1.20it/s] 100%|██████████| 1/1 [00:02<00:00, 2.15s/it] 100%|██████████| 1/1 [00:01<00:00, 1.08s/it]
Question: When and how did Paul Graham meet Jessica Livingston?
Response: Paul Graham met Jessica Livingston at a big party at his house in October 2003.

Context Relevance Score: 1.0
Factual Accuracy Score: 0.5
Response Completeness Score: 1.0
100%|██████████| 1/1 [00:01<00:00, 1.82s/it] 100%|██████████| 1/1 [00:01<00:00, 1.14s/it] 100%|██████████| 1/1 [00:03<00:00, 3.19s/it] 100%|██████████| 1/1 [00:01<00:00, 1.50s/it]
Question: What is Bel, and when and where was it written?
Response: Bel is a new Lisp that was written in Arc. It was developed over a period of 4 years, from March 26, 2015 to October 12, 2019. The majority of Bel was written in England.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0

2. Sub-Question Query Generation Evaluation¶
# build index and query engine
vector_query_engine = VectorStoreIndex.from_documents(
documents=documents,
use_async=True,
).as_query_engine()
query_engine_tools = [
QueryEngineTool(
query_engine=vector_query_engine,
metadata=ToolMetadata(
name="documents",
description="Paul Graham essay on What I Worked On",
),
),
]
query_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=query_engine_tools,
use_async=True,
)
response = query_engine.query(
"How was Paul Grahams life different before, during, and after YC?"
)
Generated 3 sub questions.
[documents] Q: What did Paul Graham work on before YC?
[documents] Q: What did Paul Graham work on during YC?
[documents] Q: What did Paul Graham work on after YC?
[documents] A: After Y Combinator, Paul Graham decided to focus on painting as his next endeavor.
[documents] A: Paul Graham worked on writing essays and working on Y Combinator during YC.
[documents] A: Before Y Combinator, Paul Graham worked on projects with his colleagues Robert and Trevor.
100%|██████████| 3/3 [00:02<00:00, 1.47it/s] 100%|██████████| 3/3 [00:00<00:00, 3.28it/s] 100%|██████████| 3/3 [00:01<00:00, 1.68it/s] 100%|██████████| 3/3 [00:01<00:00, 2.28it/s]
Question: What did Paul Graham work on after YC?
Response: After Y Combinator, Paul Graham decided to focus on painting as his next endeavor.

Context Relevance Score: 0.0
Factual Accuracy Score: 0.0
Response Completeness Score: 0.5

Question: What did Paul Graham work on during YC?
Response: Paul Graham worked on writing essays and working on Y Combinator during YC.

Context Relevance Score: 0.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5

Question: What did Paul Graham work on before YC?
Response: Before Y Combinator, Paul Graham worked on projects with his colleagues Robert and Trevor.

Context Relevance Score: 0.0
Factual Accuracy Score: 0.0
Response Completeness Score: 0.5
100%|██████████| 1/1 [00:01<00:00, 1.24s/it]
Question: How was Paul Grahams life different before, during, and after YC?

Sub Query Completeness Score: 1.0
3. Re-ranking¶
Re-ranking is the process of reordering nodes based on their relevance to the query. LlamaIndex offers several re-ranking algorithms; in this example, we use SentenceTransformerRerank.
The re-ranker lets you choose the number of top nodes (top_n) to return after re-ranking. If this value is the same as the original number of nodes, the re-ranker will only reorder the nodes without changing their number. Otherwise, it will reorder the nodes and return the top n.
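To make the top_n behavior concrete, here is a minimal, library-free sketch. The node names, scores, and the rerank function are made up for illustration only (SentenceTransformerRerank scores nodes with a cross-encoder model rather than a fixed table):

```python
# Retrieved nodes with their retrieval scores (illustrative values).
retrieved = [("node_a", 0.81), ("node_b", 0.79), ("node_c", 0.75)]

# Hypothetical re-ranker scores, e.g. from a cross-encoder.
rerank_scores = {"node_a": 0.40, "node_b": 0.95, "node_c": 0.60}

def rerank(nodes, scores, top_n):
    """Reorder nodes by their re-rank score and keep the top_n."""
    reordered = sorted(nodes, key=lambda node: scores[node[0]], reverse=True)
    return reordered[:top_n]

# top_n == number of retrieved nodes: same nodes, new order
# (evaluated with the Context Reranking check).
print([name for name, _ in rerank(retrieved, rerank_scores, top_n=3)])
# -> ['node_b', 'node_c', 'node_a']

# top_n < number of retrieved nodes: fewer nodes are returned
# (evaluated with the Context Conciseness check).
print([name for name, _ in rerank(retrieved, rerank_scores, top_n=2)])
# -> ['node_b', 'node_c']
```

The two print calls correspond to the two configurations evaluated below: top_n equal to, and smaller than, similarity_top_k.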
We will perform different evaluations based on the number of nodes returned after re-ranking.

a. Same number of nodes¶
callback_handler = UpTrainCallbackHandler(
key_type="openai",
api_key=os.environ["OPENAI_API_KEY"],
project_name="uptrain_llamaindex",
)
Settings.callback_manager = CallbackManager([callback_handler])
rerank_postprocessor = SentenceTransformerRerank(
top_n=3, # number of nodes after reranking
keep_retrieval_score=True,
)
index = VectorStoreIndex.from_documents(
documents=documents,
)
query_engine = index.as_query_engine(
similarity_top_k=3, # number of nodes before reranking
node_postprocessors=[rerank_postprocessor],
)
response = query_engine.query(
"What did Sam Altman do in this essay?",
)
100%|██████████| 1/1 [00:01<00:00, 1.89s/it]
Question: What did Sam Altman do in this essay?

Context Reranking Score: 1.0
100%|██████████| 1/1 [00:01<00:00, 1.88s/it] 100%|██████████| 1/1 [00:01<00:00, 1.44s/it] 100%|██████████| 1/1 [00:02<00:00, 2.77s/it] 100%|██████████| 1/1 [00:01<00:00, 1.45s/it]
Question: What did Sam Altman do in this essay?
Response: Sam Altman was asked to become the president of Y Combinator after the original founders decided to step down and reorganize the company for long-term sustainability.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5

b. Different number of nodes¶
callback_handler = UpTrainCallbackHandler(
key_type="openai",
api_key=os.environ["OPENAI_API_KEY"],
project_name="uptrain_llamaindex",
)
Settings.callback_manager = CallbackManager([callback_handler])
rerank_postprocessor = SentenceTransformerRerank(
top_n=2, # Number of nodes after re-ranking
keep_retrieval_score=True,
)
index = VectorStoreIndex.from_documents(
documents=documents,
)
query_engine = index.as_query_engine(
similarity_top_k=5, # Number of nodes before re-ranking
node_postprocessors=[rerank_postprocessor],
)
# Use your advanced RAG
response = query_engine.query(
"What did Sam Altman do in this essay?",
)
100%|██████████| 1/1 [00:02<00:00, 2.22s/it]
Question: What did Sam Altman do in this essay?

Context Conciseness Score: 0.0
100%|██████████| 1/1 [00:01<00:00, 1.58s/it] 100%|██████████| 1/1 [00:00<00:00, 1.19it/s] 100%|██████████| 1/1 [00:01<00:00, 1.62s/it] 100%|██████████| 1/1 [00:01<00:00, 1.42s/it]
Question: What did Sam Altman do in this essay?
Response: Sam Altman offered unsolicited advice to the author during a visit to California for interviews.

Context Relevance Score: 0.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5