%pip install llama-index-llms-openai
%pip install llama-index-retrievers-bm25
Setup
import os
import openai
os.environ["OPENAI_API_KEY"] = "sk-..."
openai.api_key = os.environ["OPENAI_API_KEY"]
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
Download Data
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-02-12 17:59:58-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8003::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8003::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>] 73.28K 327KB/s in 0.2s

2024-02-12 17:59:59 (327 KB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Next, we will set up a vector index over the documents.
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
splitter = SentenceSplitter(chunk_size=256)
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])
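Create a Hybrid Fusion Retriever
In this step, we fuse our index with a BM25-based retriever. This lets us capture both semantic relations and keywords in the input query.
Since both retrievers calculate a score, we can use the reciprocal rerank algorithm to re-sort the nodes without an additional model or excessive computation.
This setup also queries 4 times: once with your original query, and then with 3 more generated queries.
By default, it uses the following prompt to generate the extra queries: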
QUERY_GEN_PROMPT = (
    "You are a helpful assistant that generates multiple search queries based on a "
    "single input query. Generate {num_queries} search queries, one on each line, "
    "related to the following input query:\n"
    "Query: {query}\n"
    "Queries:\n"
)
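To make the template concrete, here is a minimal sketch of how it could be filled in and how a raw completion might be split back into individual queries. The `num_queries - 1` arithmetic follows the description above (one original query plus three generated ones). The helper names `build_query_gen_prompt` and `parse_generated_queries` are hypothetical and are not part of LlamaIndex.

```python
from typing import List

QUERY_GEN_PROMPT = (
    "You are a helpful assistant that generates multiple search queries based on a "
    "single input query. Generate {num_queries} search queries, one on each line, "
    "related to the following input query:\n"
    "Query: {query}\n"
    "Queries:\n"
)


def build_query_gen_prompt(query: str, num_queries: int = 4) -> str:
    """Fill the template, asking for num_queries - 1 extra queries (hypothetical helper)."""
    return QUERY_GEN_PROMPT.format(num_queries=num_queries - 1, query=query)


def parse_generated_queries(completion: str) -> List[str]:
    """Split a completion into one query per non-empty line (hypothetical helper)."""
    return [line.strip() for line in completion.splitlines() if line.strip()]


prompt = build_query_gen_prompt("What happened at Interleafe and Viaweb?")
print(prompt)
```

Overriding `query_gen_prompt` with a custom template would presumably need to keep the same `{num_queries}` and `{query}` placeholders.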
First, we create our retrievers. Each one retrieves the top 2 most similar nodes.
from llama_index.retrievers.bm25 import BM25Retriever
vector_retriever = index.as_retriever(similarity_top_k=2)
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore, similarity_top_k=2
)
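For intuition about what the BM25 side contributes, the following is a minimal, self-contained sketch of Okapi BM25 scoring over whitespace-tokenized text. It is not the code behind `BM25Retriever`; the tokenizer, the `k1` and `b` values, the sample documents, and the `bm25_scores` function name are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: str, docs: List[str], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Score each document against the query with Okapi BM25 (illustrative only)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1
            )
            tf = term_freq[term]
            # Longer-than-average documents are penalized via b.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical mini-corpus, not taken from the essay.
docs = [
    "we started a company called viaweb",
    "interleaf had a group called release engineering",
    "painting still lifes in new york",
]
scores = bm25_scores("viaweb company", docs)
print(scores)
```

Only documents that share exact terms with the query get a non-zero score, which is the keyword signal that complements the embedding-based retriever.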
Next, we can create our fusion retriever, which will return the top 2 most similar nodes from the 4 nodes returned by the retrievers.
from llama_index.core.retrievers import QueryFusionRetriever
retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=2,
    num_queries=4,  # set this to 1 to disable query generation
    mode="reciprocal_rerank",
    use_async=True,
    verbose=True,
    # query_gen_prompt="...",  # we could override the query generation prompt here
)
# apply nested async to run in a notebook
import nest_asyncio
nest_asyncio.apply()
nodes_with_scores = retriever.retrieve(
    "What happened at Interleafe and Viaweb?"
)
Generated queries:
1. What were the major events or milestones in the history of Interleafe and Viaweb?
2. Can you provide a timeline of the key developments and achievements of Interleafe and Viaweb?
3. What were the successes and failures of Interleafe and Viaweb as companies?
for node in nodes_with_scores:
    print(f"Score: {node.score:.2f} - {node.text}...\n-----\n")
Score: 0.03 - The UI was horrible, but it proved you could build a whole store through the browser, without any client software or typing anything into the command line on the server. Now we felt like we were really onto something. I had visions of a whole new generation of software working this way. You wouldn't need versions, or ports, or any of that crap. At Interleaf there had been a whole group called Release Engineering that seemed to be at least as big as the group that actually wrote the software. Now you could just update the software right on the server. We started a new company we called Viaweb, after the fact that our software worked via the web, and we got $10,000 in seed funding from Idelle's husband Julian. In return for that and doing the initial legal work and giving us business advice, we gave him 10% of the company. Ten years later this deal became the model for Y Combinator's. We knew founders needed something like this, because we'd needed it ourselves....
-----

Score: 0.03 - Now we felt like we were really onto something. I had visions of a whole new generation of software working this way. You wouldn't need versions, or ports, or any of that crap. At Interleaf there had been a whole group called Release Engineering that seemed to be at least as big as the group that actually wrote the software. Now you could just update the software right on the server. We started a new company we called Viaweb, after the fact that our software worked via the web, and we got $10,000 in seed funding from Idelle's husband Julian. In return for that and doing the initial legal work and giving us business advice, we gave him 10% of the company. Ten years later this deal became the model for Y Combinator's. We knew founders needed something like this, because we'd needed it ourselves. At this stage I had a negative net worth, because the thousand dollars or so I had in the bank was more than counterbalanced by what I owed the government in taxes. (Had I diligently set aside the proper proportion of the money I'd made consulting for Interleaf?...
-----
As we can see, both returned nodes correctly mention Viaweb and Interleaf!
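The small scores above are what reciprocal rank fusion typically produces: each result list contributes roughly 1/(k + rank) for a node, and the contributions are summed across lists. Below is a minimal sketch of that idea, assuming the commonly used constant k = 60 and 0-based ranks. The function name `reciprocal_rank_fusion` and the node IDs are hypothetical, and the exact constants inside `QueryFusionRetriever` may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(
    ranked_lists: List[List[str]], k: float = 60.0, top_n: int = 2
) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of node IDs into a single ranking."""
    fused: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, node_id in enumerate(ranking):  # rank 0 is the best hit
            fused[node_id] += 1.0 / (k + rank)
    # Highest fused score first; keep only the top_n entries.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical results from the vector and BM25 retrievers for one query.
vector_hits = ["node_a", "node_b"]
bm25_hits = ["node_b", "node_c"]

fused_results = reciprocal_rank_fusion([vector_hits, bm25_hits])
for node_id, score in fused_results:
    print(f"{node_id}: {score:.4f}")
```

A node that appears in both lists, such as `node_b` here, outranks nodes found by only one retriever. Only rank positions are used, so the raw vector and BM25 scores never need to be put on a common scale.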
Use in a Query Engine!
Now, we can plug our retriever into a query engine to synthesize natural language responses.
from llama_index.core.query_engine import RetrieverQueryEngine
query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("What happened at Interleafe and Viaweb?")
Generated queries:
1. What were the major events or milestones in the history of Interleafe and Viaweb?
2. Can you provide a timeline of the key developments and achievements of Interleafe and Viaweb?
3. What were the outcomes or impacts of Interleafe and Viaweb on the respective industries they operated in?
from llama_index.core.response.notebook_utils import display_response
display_response(response)
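Final Response:
At Interleaf, there was a group called Release Engineering that was as large as the group that actually wrote the software. This suggests that they placed great emphasis on managing software versions and ports. At Viaweb, however, the founders realized that they could update the software directly on the server, eliminating the need for versions and ports. They started Viaweb, a company that offered software services via the web. They received $10,000 in seed funding and gave 10% of the company to Julian, who provided the funding and business advice. This deal later became the model for Y Combinator's.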