mixedbread Rerank Cookbook¶
mixedbread.ai has released three fully open-source reranker models under the Apache 2.0 license. For more in-depth information, you can check out their detailed blog post. The three models are:
mxbai-rerank-xsmall-v1
mxbai-rerank-base-v1
mxbai-rerank-large-v1
In this notebook, we demonstrate how to use the mxbai-rerank-base-v1 model with the SentenceTransformerRerank module in LlamaIndex. This setup lets you seamlessly swap in any reranker model of your choice via the SentenceTransformerRerank module to enhance your RAG pipeline.
Installation¶
In [ ]
!pip install llama-index
!pip install sentence-transformers
Setup API Keys¶
In [ ]
import os
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"
In [ ]
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
)
from llama_index.core.postprocessor import SentenceTransformerRerank
Download Data¶
In [ ]
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-03-01 09:52:09--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.007s

2024-03-01 09:52:09 (9.86 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
Load Documents¶
In [ ]
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Build Index¶
In [ ]
index = VectorStoreIndex.from_documents(documents=documents)
Define the postprocessor for the mxbai-rerank-base-v1 reranker¶
In [ ]
from llama_index.core.postprocessor import SentenceTransformerRerank
postprocessor = SentenceTransformerRerank(
model="mixedbread-ai/mxbai-rerank-base-v1", top_n=2
)
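Because SentenceTransformerRerank is configured with just a Hugging Face model name, swapping in one of the other mixedbread rerankers is a one-line change. Below is a minimal sketch; it assumes the xsmall and large models are published under the same mixedbread-ai organization on Hugging Face, following the same id pattern as the base model.

# Sketch: swap in the smaller or larger mixedbread reranker by model id.
# The model ids below are assumed to mirror mxbai-rerank-base-v1's naming.
postprocessor_xsmall = SentenceTransformerRerank(
    model="mixedbread-ai/mxbai-rerank-xsmall-v1", top_n=2
)
postprocessor_large = SentenceTransformerRerank(
    model="mixedbread-ai/mxbai-rerank-large-v1", top_n=2
)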
Create Query Engine¶
We first retrieve the 10 most relevant nodes and then pick the top 2 using the postprocessor defined above.
In [ ]
query_engine = index.as_query_engine(
similarity_top_k=10,
node_postprocessors=[postprocessor],
)
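If you want to see what the reranker does before wiring it into a query engine, you can run the two stages by hand: retrieve 10 nodes with the index's retriever, then pass them through the postprocessor. A minimal sketch (the 120-character snippet length is an arbitrary choice for display):

# Sketch: run retrieval and reranking as separate steps to inspect the result.
retriever = index.as_retriever(similarity_top_k=10)
query = "Why did Paul Graham start YC?"

retrieved_nodes = retriever.retrieve(query)
reranked_nodes = postprocessor.postprocess_nodes(retrieved_nodes, query_str=query)

for node in reranked_nodes:
    # Each surviving node carries the cross-encoder relevance score.
    print(f"{node.score:.4f}", node.node.get_content()[:120])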
Test Queries¶
In [ ]
response = query_engine.query(
"Why did Sam Altman decline the offer of becoming president of Y Combinator?",
)
print(response)
response = query_engine.query( "Why did Sam Altman decline the offer of becoming president of Y Combinator?", ) print(response)
Sam Altman initially declined the offer of becoming president of Y Combinator because he wanted to start a startup focused on making nuclear reactors.
In [ ]
response = query_engine.query(
"Why did Paul Graham start YC?",
)
print(response)
response = query_engine.query( "Why did Paul Graham start YC?", ) print(response)
Paul Graham started YC because he and his partners wanted to create an investment firm where they could implement their own ideas and provide the kind of support to startups that they felt was lacking when they were founders themselves. They aimed to not only make seed investments but also assist startups with various aspects of setting up a company, similar to the help they had received from others in the past.
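To verify that only the two reranked nodes reached the LLM, you can inspect the response's source nodes and their reranker scores. A minimal sketch:

# Sketch: the query engine response keeps the post-reranking source nodes.
for source_node in response.source_nodes:
    print(f"{source_node.score:.4f}", source_node.node.get_content()[:120])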