BGEM3Demo
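In this notebook, we show how to use BGE-M3 with LlamaIndex.
BGE-M3 is a hybrid multilingual retrieval model that supports more than 100 languages and handles inputs of up to 8,192 tokens. The model can perform (i) dense retrieval, (ii) sparse retrieval, and (iii) multi-vector retrieval.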
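To make the three retrieval modes concrete, here is a minimal pure-Python sketch of how each style of score is commonly computed. It is an illustration only, not the model's actual implementation: the function names are hypothetical, the vectors and token weights are made-up toy values, and the multi-vector scoring assumes a late-interaction scheme in which each query token keeps its best match among the document tokens.

```python
# Toy illustration of three retrieval scoring styles.
# All names and numbers are hypothetical; this is not the BGE-M3 code.

def dense_score(q_vec, d_vec):
    """Dense retrieval: one vector per text, compared by inner product."""
    return sum(q * d for q, d in zip(q_vec, d_vec))


def sparse_score(q_weights, d_weights):
    """Sparse (lexical) retrieval: sum weight products over shared tokens."""
    return sum(w * d_weights[tok] for tok, w in q_weights.items() if tok in d_weights)


def multi_vector_score(q_vecs, d_vecs):
    """Multi-vector retrieval: for each query token vector, keep its best
    match among the document token vectors, then average over query tokens."""
    best = [max(dense_score(q, d) for d in d_vecs) for q in q_vecs]
    return sum(best) / len(best)


dense = dense_score([1.0, 0.0], [0.6, 0.8])
sparse = sparse_score({"bge": 0.5, "what": 0.1}, {"bge": 0.4, "model": 0.3})
multi = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[0.6, 0.8], [1.0, 0.0]])
print(dense, sparse, multi)
```

Only the token `"bge"` is shared between the toy query and document, so it alone contributes to the sparse score.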
Getting Started
%pip install llama-index-indices-managed-bge-m3
%pip install llama-index
Creating a BGEM3Index
from llama_index.core import Settings
from llama_index.core import Document
from llama_index.indices.managed.bge_m3 import BGEM3Index
Settings.chunk_size = 8192
# Build a small demo corpus
sentences = [
    "BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
    "BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document",
]
# Document IDs are strings, so convert the integer index explicitly
documents = [Document(doc_id=str(i), text=s) for i, s in enumerate(sentences)]
# Index the documents with the BGE-M3 model
index = BGEM3Index.from_documents(
    documents,
    weights_for_different_modes=[
        0.4,
        0.2,
        0.4,
    ],  # [dense_weight, sparse_weight, multi_vector_weight]
)
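The `weights_for_different_modes` list controls how much each retrieval mode contributes to the final ranking. The notebook does not show how the index combines the scores internally, so the sketch below assumes a normalized weighted sum. The helper name `combine_scores` and the three per-mode scores are hypothetical values chosen for illustration.

```python
def combine_scores(dense, sparse, multi_vector, weights):
    """Weighted average of three per-mode scores.

    weights is [dense_weight, sparse_weight, multi_vector_weight].
    Assumption: the combination is a weighted sum divided by the total weight.
    """
    w_dense, w_sparse, w_multi = weights
    total = w_dense + w_sparse + w_multi
    return (w_dense * dense + w_sparse * sparse + w_multi * multi_vector) / total


# Hypothetical per-mode scores for one query/document pair
score = combine_scores(0.6, 0.2, 0.9, [0.4, 0.2, 0.4])
print(round(score, 4))
```

Under this assumption, raising the sparse weight would favor documents that share exact terms with the query, while raising the dense or multi-vector weight would favor semantic matches.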
Retrieving Relevant Documents
retriever = index.as_retriever()
response = retriever.retrieve("What is BGE-M3?")
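The retriever returns a list of scored nodes, each exposing a `score` attribute and a `get_content()` method. The sketch below shows one way to inspect such a list. The helper `summarize_hits` and the `FakeHit` stand-in are hypothetical, added only so the example runs without downloading the model. It also assumes every score is a float, although a retriever may leave it unset.

```python
from dataclasses import dataclass


def summarize_hits(hits, max_chars=80):
    """Return (score, text preview) pairs, highest score first."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    return [(h.score, h.get_content()[:max_chars]) for h in ranked]


# Stand-in for a retrieved item, so the sketch runs without the model
@dataclass
class FakeHit:
    score: float
    text: str

    def get_content(self):
        return self.text


hits = [
    FakeHit(0.31, "BM25 is a bag-of-words retrieval function"),
    FakeHit(0.72, "BGE M3 is an embedding model"),
]
for score, preview in summarize_hits(hits):
    print(f"{score:.2f}  {preview}")
```

With the real index, the same helper could be called as `summarize_hits(response)` on the list returned by `retriever.retrieve(...)`.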
RAG with BGE-M3
query_engine = index.as_query_engine()
response = query_engine.query("What is BGE-M3?")
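The query engine passes the retrieved text to an LLM, so this step needs an LLM configured in `Settings`. As far as I know, LlamaIndex falls back to OpenAI when none is set, which requires an API key. The returned object carries the answer text in `response` and the supporting nodes in `source_nodes`. The sketch below formats both. The helper `render_answer` and the `SimpleNamespace` stand-ins are hypothetical, used so the example runs without an LLM.

```python
from types import SimpleNamespace


def render_answer(response, max_chars=60):
    """Format the answer text followed by its scored sources."""
    lines = [str(response.response)]
    for i, src in enumerate(response.source_nodes, start=1):
        lines.append(f"[{i}] score={src.score:.2f} {src.get_content()[:max_chars]}")
    return "\n".join(lines)


# Stand-ins that mimic the attributes used above (not real LlamaIndex objects)
fake_source = SimpleNamespace(
    score=0.72,
    get_content=lambda: "BGE M3 is an embedding model supporting dense retrieval",
)
fake_response = SimpleNamespace(
    response="BGE-M3 is an embedding model.",
    source_nodes=[fake_source],
)
print(render_answer(fake_response))
```

With a real query engine, `render_answer(response)` could be applied directly to the object returned by `query_engine.query(...)`.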