mixedbread Rerank Cookbook¶
mixedbread.ai has released three fully open-source reranker models under the Apache 2.0 license. For more in-depth information, you can check out their detailed blog post. The three models are:
mxbai-rerank-xsmall-v1
mxbai-rerank-base-v1
mxbai-rerank-large-v1
In this notebook, we demonstrate how to use the mxbai-rerank-base-v1 model with the SentenceTransformerRerank module in LlamaIndex. This setup lets you seamlessly swap in any reranker model of your choice via the SentenceTransformerRerank module to enhance your RAG pipeline.
Installation¶
In [ ]
!pip install llama-index
!pip install sentence-transformers
Setup API Keys¶
In [ ]
import os
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"
In [ ]
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
)
from llama_index.core.postprocessor import SentenceTransformerRerank
Download Data¶
In [ ]
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-03-01 09:52:09--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.007s

2024-03-01 09:52:09 (9.86 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
Load Documents¶
In [ ]
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Build Index¶
In [ ]
index = VectorStoreIndex.from_documents(documents=documents)
Define the postprocessor for the mxbai-rerank-base-v1 reranker¶
In [ ]
from llama_index.core.postprocessor import SentenceTransformerRerank
postprocessor = SentenceTransformerRerank(
model="mixedbread-ai/mxbai-rerank-base-v1", top_n=2
)
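Because SentenceTransformerRerank is configured with just a Hugging Face model name, swapping in one of the other mixedbread rerankers is a one-line change. Below is a minimal sketch; it assumes the xsmall and large models are published under the same mixedbread-ai organization on Hugging Face, following the same id pattern as the base model.

# Sketch: swap in the smaller or larger mixedbread reranker by model id.
# The model ids below are assumed to mirror mxbai-rerank-base-v1's naming.
postprocessor_xsmall = SentenceTransformerRerank(
    model="mixedbread-ai/mxbai-rerank-xsmall-v1", top_n=2
)
postprocessor_large = SentenceTransformerRerank(
    model="mixedbread-ai/mxbai-rerank-large-v1", top_n=2
)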
Create Query Engine¶
We first retrieve the 10 most relevant nodes and then pick the top 2 using the postprocessor defined above.
In [ ]
query_engine = index.as_query_engine(
similarity_top_k=10,
node_postprocessors=[postprocessor],
)
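If you want to see what the reranker does before wiring it into a query engine, you can run the two stages by hand: retrieve 10 nodes with the index's retriever, then pass them through the postprocessor. A minimal sketch (the 120-character snippet length is an arbitrary choice for display):

# Sketch: run retrieval and reranking as separate steps to inspect the result.
retriever = index.as_retriever(similarity_top_k=10)
query = "Why did Paul Graham start YC?"

retrieved_nodes = retriever.retrieve(query)
reranked_nodes = postprocessor.postprocess_nodes(retrieved_nodes, query_str=query)

for node in reranked_nodes:
    # Each surviving node carries the cross-encoder relevance score.
    print(f"{node.score:.4f}", node.node.get_content()[:120])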
Test Queries¶
In [ ]
response = query_engine.query(
"Why did Sam Altman decline the offer of becoming president of Y Combinator?",
)
print(response)
response = query_engine.query( "Why did Sam Altman decline the offer of becoming president of Y Combinator?", ) print(response)
Sam Altman initially declined the offer of becoming president of Y Combinator because he wanted to start a startup focused on making nuclear reactors.
In [ ]
response = query_engine.query(
"Why did Paul Graham start YC?",
)
print(response)
response = query_engine.query( "Why did Paul Graham start YC?", ) print(response)
Paul Graham started YC because he and his partners wanted to create an investment firm where they could implement their own ideas and provide the kind of support to startups that they felt was lacking when they were founders themselves. They aimed to not only make seed investments but also assist startups with various aspects of setting up a company, similar to the help they had received from others in the past.
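To verify that only the two reranked nodes reached the LLM, you can inspect the response's source nodes and their reranker scores. A minimal sketch:

# Sketch: the query engine response keeps the post-reranking source nodes.
for source_node in response.source_nodes:
    print(f"{source_node.score:.4f}", source_node.node.get_content()[:120])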