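Guideline Evaluator
This notebook shows how to use GuidelineEvaluator to evaluate a question answering system
against user-specified guidelines.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.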
In [ ]
%pip install llama-index-llms-openai
In [ ]
!pip install llama-index
In [ ]
from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.llms.openai import OpenAI
# Needed for running async functions in Jupyter Notebook
import nest_asyncio
nest_asyncio.apply()
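The `nest_asyncio.apply()` call matters because a notebook kernel already runs an event loop, and standard asyncio refuses to start driving a coroutine on a loop that is already running. The notebook does not spell out the mechanism; presumably the synchronous `evaluate` call runs async code under the hood. The sketch below uses only the standard library to reproduce the underlying restriction. `fetch_answer` and `notebook_cell` are hypothetical names used for illustration and are not part of LlamaIndex.

```python
import asyncio


async def fetch_answer() -> str:
    """Hypothetical stand-in for an async call such as an LLM request."""
    await asyncio.sleep(0)
    return "done"


async def notebook_cell() -> str:
    """Simulates code executing inside an already-running event loop."""
    loop = asyncio.get_running_loop()
    coro = fetch_answer()
    try:
        # A synchronous wrapper would try to drive the coroutine on the
        # current loop; asyncio rejects this while the loop is running.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never started, so close it explicitly
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(notebook_cell())
print(message)
```

Patching the loop with `nest_asyncio` is what lets such nested calls go through in a notebook; in a plain Python script there is no outer loop, so the patch is typically unnecessary.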
In [ ]
GUIDELINES = [
    "The response should fully answer the query.",
    "The response should avoid being vague or ambiguous.",
    (
        "The response should be specific and use statistics or numbers when"
        " possible."
    ),
]
In [ ]
llm = OpenAI(model="gpt-4")

# One evaluator per guideline, so each guideline gets its own verdict
evaluators = [
    GuidelineEvaluator(llm=llm, guidelines=guideline)
    for guideline in GUIDELINES
]
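The cell above builds one evaluator per guideline, which yields a separate pass/fail verdict and feedback for each guideline. Because `guidelines` is passed as a plain string, an alternative is to join all guidelines into a single newline-separated string and use one evaluator, at the cost of getting only one combined verdict. The sketch below only builds that string; `combined_guidelines` is an illustrative name, and the evaluator construction is left as a comment since it needs the library and an API key.

```python
GUIDELINES = [
    "The response should fully answer the query.",
    "The response should avoid being vague or ambiguous.",
    "The response should be specific and use statistics or numbers when possible.",
]

# Join the guidelines into one newline-separated string (illustrative name).
combined_guidelines = "\n".join(GUIDELINES)
print(combined_guidelines)

# With the imports and `llm` from the cells above, a single evaluator
# could then be created like this:
# evaluator = GuidelineEvaluator(llm=llm, guidelines=combined_guidelines)
```

Separate evaluators cost one LLM call per guideline but make it obvious which guideline failed; a combined evaluator is cheaper but its feedback has to be read to find the failing guideline.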
In [ ]
sample_data = {
    "query": "Tell me about global warming.",
    "contexts": [
        (
            "Global warming refers to the long-term increase in Earth's"
            " average surface temperature due to human activities such as the"
            " burning of fossil fuels and deforestation."
        ),
        (
            "It is a major environmental issue with consequences such as"
            " rising sea levels, extreme weather events, and disruptions to"
            " ecosystems."
        ),
        (
            "Efforts to combat global warming include reducing carbon"
            " emissions, transitioning to renewable energy sources, and"
            " promoting sustainable practices."
        ),
    ],
    "response": (
        "Global warming is a critical environmental issue caused by human"
        " activities that lead to a rise in Earth's temperature. It has"
        " various adverse effects on the planet."
    ),
}
In [ ]
# Evaluate the same query/response pair against each guideline in turn
for guideline, evaluator in zip(GUIDELINES, evaluators):
    eval_result = evaluator.evaluate(
        query=sample_data["query"],
        contexts=sample_data["contexts"],
        response=sample_data["response"],
    )
    print("=====")
    print(f"Guideline: {guideline}")
    print(f"Pass: {eval_result.passing}")
    print(f"Feedback: {eval_result.feedback}")
=====
Guideline: The response should fully answer the query.
Pass: False
Feedback: The response does not fully answer the query. While it does provide a brief overview of global warming, it does not delve into the specifics of the causes, effects, or potential solutions to the problem. The response should be more detailed and comprehensive to fully answer the query.
=====
Guideline: The response should avoid being vague or ambiguous.
Pass: False
Feedback: The response is too vague and does not provide specific details about global warming. It should include more information about the causes, effects, and potential solutions to global warming.
=====
Guideline: The response should be specific and use statistics or numbers when possible.
Pass: False
Feedback: The response is too general and lacks specific details or statistics about global warming. It would be more informative if it included data such as the rate at which the Earth's temperature is rising, the main human activities contributing to global warming, or the specific adverse effects on the planet.
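When there are many guidelines, it can help to condense the per-guideline output into a short summary. The sketch below is one possible way to do that, not something the notebook itself provides. `ResultStub` is a hypothetical stand-in exposing only the `passing` and `feedback` fields read in the loop above, so the example runs without an API key; the feedback strings are placeholders, and the pass values mirror the output shown above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ResultStub:
    """Hypothetical stand-in with the two fields the loop above reads."""

    passing: Optional[bool]
    feedback: Optional[str]


def summarize(pairs: List[Tuple[str, ResultStub]]) -> Tuple[int, List[str]]:
    """Return the number of passing results and the guidelines that failed."""
    failed = [guideline for guideline, result in pairs if not result.passing]
    return len(pairs) - len(failed), failed


# Pass values copied from the run above; feedback text is a placeholder.
pairs = [
    ("The response should fully answer the query.",
     ResultStub(False, "(feedback omitted)")),
    ("The response should avoid being vague or ambiguous.",
     ResultStub(False, "(feedback omitted)")),
    ("The response should be specific and use statistics or numbers when possible.",
     ResultStub(False, "(feedback omitted)")),
]

passed, failed = summarize(pairs)
print(f"{passed}/{len(pairs)} guidelines passed")
for guideline in failed:
    print(f"FAILED: {guideline}")
```

In the notebook itself, the same `summarize` helper could be fed the real `(guideline, eval_result)` pairs collected inside the evaluation loop. A result whose `passing` is `None` is counted as failed here, which is a deliberate conservative choice.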