Benchmarking RAG Pipelines With a `LabelledRagDataset`¶

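The LabelledRagDataset is meant to be used for evaluating any given RAG pipeline, which may come in several configurations (for example, the choice of LLM and the values of similarity_top_k, chunk_size, and others). We liken this abstraction to a traditional machine learning dataset, where X features are used to predict a ground-truth label y. Here, the query and the retrieved contexts serve as the "features", and the answer to the query, called the reference_answer, serves as the ground-truth label.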

Such a dataset is, of course, composed of observations or examples. A LabelledRagDataset is made up of a set of LabelledRagDataExample objects.
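To make the analogy concrete, the following standard-library sketch treats a RAG pipeline as a "model" that maps a query to a predicted answer and scores it against the reference answers. `ToyExample`, `toy_pipeline`, and `exact_match_accuracy` are hypothetical names used only for illustration and are not part of llama-index; exact match is chosen purely for simplicity, not because it is the recommended metric.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyExample:
    """Hypothetical stand-in for one labelled example (not the llama-index class)."""

    query: str  # feature
    reference_contexts: List[str]  # feature (unused by this toy metric)
    reference_answer: str  # ground-truth label


def exact_match_accuracy(
    examples: List[ToyExample], pipeline: Callable[[str], str]
) -> float:
    """Fraction of examples whose predicted answer equals the reference answer."""
    hits = sum(
        pipeline(ex.query).strip().lower() == ex.reference_answer.strip().lower()
        for ex in examples
    )
    return hits / len(examples)


def toy_pipeline(query: str) -> str:
    """Placeholder for a real RAG pipeline; returns canned answers."""
    return "Yes it is." if query.endswith("is it not?") else "No idea."


examples = [
    ToyExample(
        "This is a test query, is it not?",
        ["This is a sample context"],
        "Yes it is.",
    ),
    ToyExample(
        "This is a test query, is it so?",
        ["This is a second sample context"],
        "I think yes, it is.",
    ),
]

accuracy = exact_match_accuracy(examples, toy_pipeline)
print(accuracy)  # one of the two predictions matches its label
```

Swapping `toy_pipeline` for a differently configured pipeline while keeping `examples` fixed is what makes the comparison a benchmark.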

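In this notebook, we show how to construct a LabelledRagDataset from scratch. Note that the alternative is to download a community-supplied LabelledRagDataset from llama-hub and evaluate or benchmark your own RAG pipeline on it.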

The `LabelledRagDataExample` Class¶

%pip install llama-index-llms-openai
%pip install llama-index-readers-wikipedia

from llama_index.core.llama_dataset import (
    LabelledRagDataExample,
    CreatedByType,
    CreatedBy,
)

# constructing a LabelledRagDataExample
query = "This is a test query, is it not?"
query_by = CreatedBy(type=CreatedByType.AI, model_name="gpt-4")
reference_answer = "Yes it is."
reference_answer_by = CreatedBy(type=CreatedByType.HUMAN)
reference_contexts = ["This is a sample context"]

rag_example = LabelledRagDataExample(
    query=query,
    query_by=query_by,
    reference_contexts=reference_contexts,
    reference_answer=reference_answer,
    reference_answer_by=reference_answer_by,
)

The LabelledRagDataExample is a Pydantic Model, so it can be converted to JSON or a dict and reconstructed from either.

print(rag_example.json())

{"query": "This is a test query, is it not?", "query_by": {"model_name": "gpt-4", "type": "ai"}, "reference_contexts": ["This is a sample context"], "reference_answer": "Yes it is.", "reference_answer_by": {"model_name": "", "type": "human"}}
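Because the serialized form is plain JSON, it can also be inspected without llama-index. The sketch below copies the string printed above and reads it with the standard `json` module; note that the `type` fields appear as the plain strings "ai" and "human".

```python
import json

# JSON string copied verbatim from the output above.
raw = (
    '{"query": "This is a test query, is it not?", '
    '"query_by": {"model_name": "gpt-4", "type": "ai"}, '
    '"reference_contexts": ["This is a sample context"], '
    '"reference_answer": "Yes it is.", '
    '"reference_answer_by": {"model_name": "", "type": "human"}}'
)

record = json.loads(raw)

# Nested fields are ordinary dicts and lists after parsing.
print(record["query_by"]["type"])
print(record["reference_contexts"])

# Dumping and re-parsing yields an equal payload.
round_tripped = json.loads(json.dumps(record))
print(round_tripped == record)
```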

LabelledRagDataExample.parse_raw(rag_example.json())

LabelledRagDataExample(query='This is a test query, is it not?', query_by=CreatedBy(model_name='gpt-4', type=<CreatedByType.AI: 'ai'>), reference_contexts=['This is a sample context'], reference_answer='Yes it is.', reference_answer_by=CreatedBy(model_name='', type=<CreatedByType.HUMAN: 'human'>))

rag_example.dict()

{'query': 'This is a test query, is it not?',
 'query_by': {'model_name': 'gpt-4', 'type': <CreatedByType.AI: 'ai'>},
 'reference_contexts': ['This is a sample context'],
 'reference_answer': 'Yes it is.',
 'reference_answer_by': {'model_name': '',
  'type': <CreatedByType.HUMAN: 'human'>}}

LabelledRagDataExample.parse_obj(rag_example.dict())

LabelledRagDataExample(query='This is a test query, is it not?', query_by=CreatedBy(model_name='gpt-4', type=<CreatedByType.AI: 'ai'>), reference_contexts=['This is a sample context'], reference_answer='Yes it is.', reference_answer_by=CreatedBy(model_name='', type=<CreatedByType.HUMAN: 'human'>))

Let's create a second example, so that we have a (slightly) more interesting LabelledRagDataset.

query = "This is a test query, is it so?"
reference_answer = "I think yes, it is."
reference_contexts = ["This is a second sample context"]

rag_example_2 = LabelledRagDataExample(
    query=query,
    query_by=query_by,
    reference_contexts=reference_contexts,
    reference_answer=reference_answer,
    reference_answer_by=reference_answer_by,
)

The `LabelledRagDataset` Class¶

from llama_index.core.llama_dataset import LabelledRagDataset

rag_dataset = LabelledRagDataset(examples=[rag_example, rag_example_2])

A convenience method, to_pandas, lets you view the dataset as a pandas.DataFrame.

rag_dataset.to_pandas()

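	query	reference_contexts	reference_answer	reference_answer_by	query_by
0	This is a test query, is it not?	[This is a sample context]	Yes it is.	human	ai (gpt-4)
1	This is a test query, is it so?	[This is a second sample context]	I think yes, it is.	human	ai (gpt-4)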

Serialization¶

To persist the dataset to disk and load it back, use the save_json and from_json methods.

rag_dataset.save_json("rag_dataset.json")

reload_rag_dataset = LabelledRagDataset.from_json("rag_dataset.json")

reload_rag_dataset.to_pandas()

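	query	reference_contexts	reference_answer	reference_answer_by	query_by
0	This is a test query, is it not?	[This is a sample context]	Yes it is.	human	ai (gpt-4)
1	This is a test query, is it so?	[This is a second sample context]	I think yes, it is.	human	ai (gpt-4)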
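The save-then-reload round trip can be pictured with a standard-library sketch. This is not the library's implementation: `save_examples` and `load_examples` are hypothetical helpers, and the `{"examples": [...]}` wrapper is an assumed file layout, since the notebook does not show the contents of rag_dataset.json.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_examples(examples: List[Dict[str, Any]], path: Path) -> None:
    """Hypothetical helper: write examples under an assumed 'examples' key."""
    path.write_text(json.dumps({"examples": examples}, indent=2), encoding="utf-8")


def load_examples(path: Path) -> List[Dict[str, Any]]:
    """Hypothetical helper: read the examples back from disk."""
    return json.loads(path.read_text(encoding="utf-8"))["examples"]


toy_records = [
    {"query": "This is a test query, is it not?", "reference_answer": "Yes it is."},
    {"query": "This is a test query, is it so?", "reference_answer": "I think yes, it is."},
]

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "toy_dataset.json"
    save_examples(toy_records, path)
    reloaded = load_examples(path)

# A lossless round trip returns records equal to the originals.
print(reloaded == toy_records)
```

Comparing the reloaded records with the originals, as in the last line, is the same sanity check that the two identical tables above perform visually.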

Building a synthetic `LabelledRagDataset` over Wikipedia¶

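In this section, we first create a LabelledRagDataset using a synthetic generator. The generator uses an LLM (gpt-3.5-turbo in the code below; a stronger model such as GPT-4 can be substituted) to produce both the query and the reference_answer of each synthetic LabelledRagDataExample.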

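NOTE: If you already have queries, reference answers, and contexts over a text corpus, data synthesis is not required; you can generate predictions and evaluate them against that existing data directly.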

import nest_asyncio

nest_asyncio.apply()

!pip install wikipedia -q

# wikipedia pages
from llama_index.readers.wikipedia import WikipediaReader
from llama_index.core import VectorStoreIndex

cities = [
    "San Francisco",
]

documents = WikipediaReader().load_data(
    pages=[f"History of {x}" for x in cities]
)
index = VectorStoreIndex.from_documents(documents)

A RagDatasetGenerator can be built over a set of documents to generate LabelledRagDataExample objects.

# generate questions against chunks
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.openai import OpenAI

# set context for llm provider
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.3)

# instantiate a DatasetGenerator
dataset_generator = RagDatasetGenerator.from_documents(
    documents,
    llm=llm,
    num_questions_per_chunk=2,  # set the number of questions per nodes
    show_progress=True,
)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]
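Conceptually, question generation over chunks can be thought of as a two-step loop: ask an LLM for a fixed number of questions per chunk, then ask it to answer each question using that chunk as context. The sketch below illustrates that idea with the standard library only. It is not the RagDatasetGenerator implementation; `generate_examples`, `fake_llm`, and the prompt wording are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def generate_examples(
    chunks: List[str],
    ask_llm: Callable[[str], str],
    num_questions_per_chunk: int = 2,
) -> List[Dict[str, object]]:
    """Sketch: per chunk, request questions, then answer each from that chunk."""
    examples: List[Dict[str, object]] = []
    for chunk in chunks:
        question_prompt = (
            f"Write {num_questions_per_chunk} questions, one per line, "
            f"about this text:\n{chunk}"
        )
        # Keep only non-empty lines and cap at the requested number.
        questions = [
            line.strip()
            for line in ask_llm(question_prompt).splitlines()
            if line.strip()
        ]
        for question in questions[:num_questions_per_chunk]:
            answer_prompt = f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer:"
            examples.append(
                {
                    "query": question,
                    "reference_contexts": [chunk],
                    "reference_answer": ask_llm(answer_prompt),
                }
            )
    return examples


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call (no network access)."""
    if prompt.startswith("Write"):
        return "Question one?\nQuestion two?"
    return "A canned answer."


synthetic = generate_examples(["chunk A", "chunk B"], fake_llm)
print(len(synthetic))  # 2 chunks x 2 questions per chunk = 4 examples
```

Under this scheme the dataset size is the number of chunks multiplied by the number of questions per chunk, and both questions from a chunk share the same reference context.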

len(dataset_generator.nodes)

13

# since there are 13 nodes, there should be a total of 26 questions
rag_dataset = dataset_generator.generate_dataset_from_nodes()

100%|███████████████████████████████████████████████████████| 13/13 [00:02<00:00,  5.04it/s]
100%|█████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.14s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.95s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:13<00:00,  6.55s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.89s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.66s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.85s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.03s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:08<00:00,  4.07s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:06<00:00,  3.48s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.34s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.50s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:08<00:00,  4.35s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:08<00:00,  4.34s/it]

rag_dataset.to_pandas()

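	query	reference_contexts	reference_answer	reference_answer_by	query_by
0	How did the gold rush of 1849 affect the development...	[The history of the city of San Francisco, California, ...	The gold rush of 1849 had a significant impact...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
1	What were the early European settlements established...	[The history of the city of San Francisco, California, ...	The early European settlements established in...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
2	How did the arrival of Europeans affect...	[== Arrival of Europeans and early settlement...	The arrival of Europeans had a significant impact...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
3	What were some of the challenges faced...	[== Arrival of Europeans and early settlement...	The early settlers of San Francisco faced several...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
4	How did the California gold rush affect...	[== 1848 gold rush ==\nThe California gold rush...	The California gold rush of the mid-19th century...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
5	Discuss the role of Chinese immigrants in...	[== 1848 gold rush ==\nThe California gold rush...	Chinese immigrants played a significant role in...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
6	How did San Francisco transform into a major city...	[== Paris of the West ==\n\nIt was during...	San Francisco transformed into a major city during...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
7	What were some significant developments and changes...	[== Paris of the West ==\n\nIt was during...	During the late 19th and early 20th centuries, ...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
8	How did Abe Ruef assist Eugene Schmitz...	[== Corruption and graft trials ==\n\nMayor Eu...	Abe Ruef contributed $16,000 to Eugene Schmitz...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
9	Describe the impact of the 1906 earthquake and...	[== Corruption and graft trials ==\n\nMayor Eu...	The 1906 earthquake and fire had a devastating...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
10	How did the 1906 San Francisco earthquake affect...	[=== Reconstruction ===\nAlmost immediately...	The 1906 San Francisco earthquake had a significant impact...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
11	What major events and developments took place...	[=== Reconstruction ===\nAlmost immediately...	During the 1930s and World War II, several major...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
12	How did the post-World War II era contribute to...	[== Post-World War II ==\nAfter World War II, ...	After World War II, many American military personnel...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
13	Discuss the impact of urban renewal plans...	[== Post-World War II ==\nAfter World War II, ...	M. Justin Herman led urban renewal plans...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
14	How did San Francisco become a center of counterculture...	[== 1960s – 1970s ==\n\n\n=== "Summer of Love" ...	San Francisco became a center of counterculture during...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
15	Explain the role of San Francisco as a "gay capital"...	[== 1960s – 1970s ==\n\n\n=== "Summer of Love" ...	In the 1960s and beyond, San Francisco became...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
16	How did the construction of BART and Muni affect...	[=== New public infrastructure ===\nThe 1970s...	The construction of BART and Muni in the 1970s...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
17	What were the major challenges faced by San Francisco...	[=== New public infrastructure ===\nThe 1970s...	In the 1980s, San Francisco faced several major...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
18	How did the 1989 Loma Prieta earthquake affect...	[=== 1989 Loma Prieta earthquake ===\n\nOn October...	The 1989 Loma Prieta earthquake had a significant impact...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
19	Discuss the impact of the dot-com boom in...	[=== 1989 Loma Prieta earthquake ===\n\nOn October...	The dot-com boom of the late 1990s had a significant impact...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
20	How did the redevelopment of the Mission Bay neighborhood...	[== 2010s ==\nThe early 2000s and into the 2010...	The redevelopment of the Mission Bay neighborhood...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
21	What significant events occurred in San Francisco...	[== 2010s ==\nThe early 2000s and into the 2010...	In 2010, the San Francisco Giants won their first...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
22	In the context of San Francisco's history, discuss...	[=== Cultural themes ===\nBerglund, Barbara (2...	The 1906 earthquake had a significant impact on...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
23	How did different ethnic and religious communities...	[=== Cultural themes ===\nBerglund, Barbara (2...	Two specific communities mentioned in the sources...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
24	In the context of San Francisco's history, what...	[=== Gold rush & early days ===\nHittell, John...	Some of the major events and developments during this period...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
25	How did politics shape the growth and transformation of...	[=== Gold rush & early days ===\nHittell, John...	The provided sources offer a comprehensive understanding...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)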

rag_dataset.save_json("rag_dataset.json")