Self-Correcting Query Engines - Evaluation & Retry

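In this notebook, we showcase several advanced, self-correcting query engines.
They use an LLM's ability to evaluate its own output and then self-correct to give a better response.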

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

!pip install llama-index

# Uncomment to add your OpenAI API key
# import os
# os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"

# Uncomment for debug level logging
# import logging
# import sys

# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

Setup

First, we ingest the document.

from llama_index.core import VectorStoreIndex
from llama_index.core import SimpleDirectoryReader

# Needed for running async functions in Jupyter Notebook
import nest_asyncio

nest_asyncio.apply()

Download Data

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Data

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents)
query = "What did the author do growing up?"

Let's see what the default query engine's response looks like.

base_query_engine = index.as_query_engine()
response = base_query_engine.query(query)
print(response)

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also mentioned their interest in philosophy and AI.

Retry Query Engine

The retry query engine uses an evaluator to improve the response from a base query engine.

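It does the following:

1. Query the base query engine.
2. Use the evaluator to decide whether the response passes.
3. If the response passes, return it.
4. Otherwise, use the evaluation result (query, response, and feedback) to transform the original query into a new query.
5. Repeat, up to max_retries times.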
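
The steps above can be sketched in plain Python. This is a minimal illustration of the control flow only, not the library's implementation: `EvalResult`, `retry_query`, and the three toy callables are hypothetical names introduced here, and a keyword check stands in for the LLM evaluator.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    """Hypothetical stand-in for an evaluator's verdict."""

    passing: bool
    feedback: str


def retry_query(
    query: str,
    run_query: Callable[[str], str],
    evaluate: Callable[[str, str], EvalResult],
    transform: Callable[[str, str, EvalResult], str],
    max_retries: int = 3,
) -> str:
    """Query, evaluate, and retry with a transformed query until the
    evaluator passes the response or the retry budget is spent."""
    response = run_query(query)
    for _ in range(max_retries):
        result = evaluate(query, response)
        if result.passing:
            return response
        # Fold the failed response and the feedback into a new query.
        query = transform(query, response, result)
        response = run_query(query)
    # Budget exhausted: return the last response without re-evaluating it.
    return response


# Toy components (assumptions for illustration only).
def toy_engine(q: str) -> str:
    return "detailed answer" if "be specific" in q else "vague answer"


def toy_evaluator(q: str, r: str) -> EvalResult:
    ok = r.startswith("detailed")
    return EvalResult(passing=ok, feedback="" if ok else "too vague")


def toy_transform(q: str, r: str, result: EvalResult) -> str:
    return f"{q} (be specific; feedback: {result.feedback})"


answer = retry_query(
    "What did the author do?", toy_engine, toy_evaluator, toy_transform
)
print(answer)  # detailed answer
```

In this sketch the first response fails the check, the query is rewritten once, and the second response passes. With `max_retries=0` the first response is returned without evaluation.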

from llama_index.core.query_engine import RetryQueryEngine
from llama_index.core.evaluation import RelevancyEvaluator

query_response_evaluator = RelevancyEvaluator()
retry_query_engine = RetryQueryEngine(
    base_query_engine, query_response_evaluator
)
retry_response = retry_query_engine.query(query)
print(retry_response)

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer, a TRS-80, and started programming more extensively, including writing simple games and a word processor.

Retry Source Query Engine

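The source retry engine modifies the query's source nodes by filtering the existing source nodes based on an LLM evaluation of each node.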
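
The filtering idea can be sketched as follows. This is an illustration under stated assumptions rather than the engine's actual code: `filter_source_nodes` and `mentions_programming` are hypothetical names, source nodes are reduced to plain strings, and a keyword match stands in for the per-node LLM judgment.

```python
from typing import Callable, List


def filter_source_nodes(
    query: str,
    source_nodes: List[str],
    node_passes: Callable[[str, str], bool],
) -> List[str]:
    """Keep only the source nodes that the per-node evaluation accepts."""
    return [node for node in source_nodes if node_passes(query, node)]


# Toy per-node check: a keyword match stands in for an LLM judgment.
def mentions_programming(query: str, node: str) -> bool:
    return "program" in node.lower()


nodes = [
    "He wrote programs on an IBM 1401.",
    "The weather was mild that year.",
    "Later he programmed a TRS-80.",
]
kept = filter_source_nodes(
    "What did the author do growing up?", nodes, mentions_programming
)
print(kept)
```

A retry would then answer the same query from only the kept nodes, so the query stays fixed while its evidence changes.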

from llama_index.core.query_engine import RetrySourceQueryEngine

retry_source_query_engine = RetrySourceQueryEngine(
    base_query_engine, query_response_evaluator
)
retry_source_response = retry_source_query_engine.query(query)
print(retry_source_response)

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also mentioned their interest in philosophy and AI.

Retry Guideline Query Engine

This module uses guidelines to direct the evaluator's behavior. You can customize your own guidelines.

from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.core.evaluation.guideline import DEFAULT_GUIDELINES
from llama_index.core import Response
from llama_index.core.indices.query.query_transform.feedback_transform import (
    FeedbackQueryTransformation,
)
from llama_index.core.query_engine import RetryGuidelineQueryEngine

# Guideline eval
guideline_eval = GuidelineEvaluator(
    guidelines=DEFAULT_GUIDELINES
    + "\nThe response should not be overly long.\n"
    "The response should try to summarize where possible.\n"
)  # just for example
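
The guidelines argument above is a single newline-separated string: the two adjacent string literals are joined first, and the result is appended to the defaults. When there are more than a couple of custom rules, a list can be easier to maintain. A small sketch, where `BASE_GUIDELINES` is a hypothetical stand-in for `DEFAULT_GUIDELINES` so the snippet runs without the library:

```python
# Hypothetical stand-in for DEFAULT_GUIDELINES (the real default text differs).
BASE_GUIDELINES = "The response should fully answer the query.\n"

extra_rules = [
    "The response should not be overly long.",
    "The response should try to summarize where possible.",
]

# One rule per line, with a leading and a trailing newline as in the cell above.
guidelines = BASE_GUIDELINES + "\n" + "\n".join(extra_rules) + "\n"
print(guidelines)
```

The resulting string has the same shape as the one passed to `GuidelineEvaluator` above.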

Let's look at how it works under the hood.

typed_response = (
    response if isinstance(response, Response) else response.get_response()
)
# Named eval_result to avoid shadowing Python's built-in eval()
eval_result = guideline_eval.evaluate_response(query, typed_response)
print(f"Guideline eval evaluation result: {eval_result.feedback}")

feedback_query_transform = FeedbackQueryTransformation(resynthesize_query=True)
transformed_query = feedback_query_transform.run(
    query, {"evaluation": eval_result}
)
print(f"Transformed query: {transformed_query.query_str}")

Guideline eval evaluation result: The response partially answers the query but lacks specific statistics or numbers. It provides some details about the author's activities growing up, such as writing short stories and programming on different computers, but it could be more concise and focused. Additionally, the response does not mention any statistics or numbers to support the author's experiences.
Transformed query: Here is a previous bad answer.
The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also mentioned their interest in philosophy and AI.
Here is some feedback from the evaluator about the response given.
The response partially answers the query but lacks specific statistics or numbers. It provides some details about the author's activities growing up, such as writing short stories and programming on different computers, but it could be more concise and focused. Additionally, the response does not mention any statistics or numbers to support the author's experiences.
Now answer the question.
What were the author's activities and interests during their childhood and adolescence?
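
The layout of the transformed query printed above can be reproduced with a few lines of string formatting. The sketch below only mirrors that printed layout. `build_feedback_query` is a hypothetical helper, not a library function, and it takes the question as given instead of asking an LLM to rewrite it the way `resynthesize_query=True` does.

```python
def build_feedback_query(query: str, bad_answer: str, feedback: str) -> str:
    """Assemble a retry prompt in the same layout as the output shown above."""
    return (
        "Here is a previous bad answer.\n"
        f"{bad_answer}\n"
        "Here is some feedback from the evaluator about the response given.\n"
        f"{feedback}\n"
        "Now answer the question.\n"
        f"{query}"
    )


new_query = build_feedback_query(
    query="What did the author do growing up?",
    bad_answer="The author wrote and programmed.",
    feedback="The response could be more concise and focused.",
)
print(new_query)
```

Carrying the failed answer and the evaluator's feedback in the new query gives the next attempt something concrete to improve on.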

Now let's run the full query engine.

retry_guideline_query_engine = RetryGuidelineQueryEngine(
    base_query_engine, guideline_eval, resynthesize_query=True
)
retry_guideline_response = retry_guideline_query_engine.query(query)
print(retry_guideline_response)

During their childhood and adolescence, the author worked on writing short stories and programming. They mentioned that their short stories were not very good, lacking plot but focusing on characters with strong feelings. In terms of programming, they tried writing programs on the IBM 1401 computer in 9th grade using an early version of Fortran. However, they mentioned being puzzled by the 1401 and not being able to do much with it due to the limited input options. They also mentioned getting a microcomputer, a TRS-80, and starting to write simple games, a program to predict rocket heights, and a word processor.