Multi-Step Query Engine¶
We have a multi-step query engine that can decompose a complex query into sequential subquestions. This guide walks you through how to set it up!
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
In [ ]
%pip install llama-index-llms-openai
In [ ]
!pip install llama-index
Download Data¶
In [ ]
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load Documents, Build the VectorStoreIndex¶
In [ ]
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
In [ ]
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from IPython.display import Markdown, display
In [ ]
# LLM (gpt-3.5)
gpt35 = OpenAI(temperature=0, model="gpt-3.5-turbo")
# LLM (gpt-4)
gpt4 = OpenAI(temperature=0, model="gpt-4")
In [ ]
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
In [ ]
index = VectorStoreIndex.from_documents(documents)
Query Index¶
In [ ]
from llama_index.core.indices.query.query_transform.base import (
StepDecomposeQueryTransform,
)
# gpt-4
step_decompose_transform = StepDecomposeQueryTransform(llm=gpt4, verbose=True)
# gpt-3
step_decompose_transform_gpt3 = StepDecomposeQueryTransform(
llm=gpt35, verbose=True
)
In [ ]
index_summary = "Used to answer questions about the author"
In [ ]
# set Logging to DEBUG for more detailed outputs
from llama_index.core.query_engine import MultiStepQueryEngine
query_engine = index.as_query_engine(llm=gpt4)
query_engine = MultiStepQueryEngine(
query_engine=query_engine,
query_transform=step_decompose_transform,
index_summary=index_summary,
)
response_gpt4 = query_engine.query(
"Who was in the first batch of the accelerator program the author"
" started?",
)
> Current query: Who was in the first batch of the accelerator program the author started?
> New query: Who is the author of the accelerator program?
> Current query: Who was in the first batch of the accelerator program the author started?
> New query: Who was in the first batch of the accelerator program started by Paul Graham?
> Current query: Who was in the first batch of the accelerator program the author started?
> New query: None
In [ ]
display(Markdown(f"<b>{response_gpt4}</b>"))
The first batch of the accelerator program the author started included the founders of Reddit, Justin Kan and Emmett Shear who later founded Twitch, Aaron Swartz who had helped write the RSS spec and later became a martyr for open access, and Sam Altman who later became the second president of YC.
In [ ]
sub_qa = response_gpt4.metadata["sub_qa"]
tuples = [(t[0], t[1].response) for t in sub_qa]
print(tuples)
[('Who is the author of the accelerator program?', 'The author of the accelerator program is Paul Graham.'), ('Who was in the first batch of the accelerator program started by Paul Graham?', 'The first batch of the accelerator program started by Paul Graham included the founders of Reddit, Justin Kan and Emmett Shear who later founded Twitch, Aaron Swartz who had helped write the RSS spec and later became a martyr for open access, and Sam Altman who later became the second president of YC.')]
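The raw tuple dump above is hard to scan. A small hypothetical helper (not part of LlamaIndex) can render the same (sub-question, answer) pairs as a numbered transcript:

```python
def format_sub_qa(pairs: list[tuple[str, str]]) -> str:
    """Render (sub-question, answer) tuples as a numbered transcript."""
    lines = []
    for i, (question, answer) in enumerate(pairs, start=1):
        lines.append(f"{i}. Q: {question}")
        lines.append(f"   A: {answer}")
    return "\n".join(lines)


# Example with the first pair from the output above:
print(format_sub_qa([
    (
        "Who is the author of the accelerator program?",
        "The author of the accelerator program is Paul Graham.",
    ),
]))
```

Pass it the `tuples` list built from `response_gpt4.metadata["sub_qa"]` to see the full chain of intermediate steps.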
In [ ]
response_gpt4 = query_engine.query(
"In which city did the author found his first company, Viaweb?",
)
> Current query: In which city did the author found his first company, Viaweb?
> New query: Who is the author who founded Viaweb?
> Current query: In which city did the author found his first company, Viaweb?
> New query: In which city did Paul Graham found his first company, Viaweb?
> Current query: In which city did the author found his first company, Viaweb?
> New query: None
In [ ]
print(response_gpt4)
The author founded his first company, Viaweb, in Cambridge.
In [ ]
query_engine = index.as_query_engine(llm=gpt35)
query_engine = MultiStepQueryEngine(
query_engine=query_engine,
query_transform=step_decompose_transform_gpt3,
index_summary=index_summary,
)
response_gpt3 = query_engine.query(
"In which city did the author found his first company, Viaweb?",
)
> Current query: In which city did the author found his first company, Viaweb?
> New query: None
In [ ]
print(response_gpt3)
Empty Response

Here the gpt-3.5-turbo transform returned `None` on the very first step (see the log above), so the multi-step engine never issued a sub-query and produced an empty response. This illustrates why the stronger gpt-4 model is used for query decomposition in this guide.