Optimized Embedding Model using Optimum-Intel

LlamaIndex supports loading quantized embedding models for Intel hardware using the Optimum-Intel library.

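Optimized models are smaller and faster, with minimal accuracy loss; see the documentation and an optimization guide that uses the IntelLabs/fastRAG library.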

The optimization relies on math instructions available in 4th generation or newer Xeon® processors.
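Before loading a quantized model, it can help to check whether the CPU advertises the relevant instruction sets. The sketch below is one possible check, not part of LlamaIndex or Optimum-Intel. It assumes a Linux host (it reads /proc/cpuinfo) and assumes that the instructions in question show up as the flags avx512_vnni, amx_tile, amx_int8 and amx_bf16; the text above does not name them, so treat the flag list and the helper detect_flags as illustrative.

```python
from pathlib import Path

# Assumed flag names (Linux /proc/cpuinfo spelling); the text above does not list them.
CANDIDATE_FLAGS = ("avx512_vnni", "amx_tile", "amx_int8", "amx_bf16")


def detect_flags(cpuinfo_text, wanted=CANDIDATE_FLAGS):
    """Return {flag: bool} for each wanted flag, using the first 'flags' line."""
    present = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            break
    return {flag: flag in present for flag in wanted}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        for flag, found in detect_flags(cpuinfo.read_text()).items():
            print(f"{flag}: {'yes' if found else 'no'}")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux.")
```

A missing flag does not necessarily mean the model cannot run; it only suggests that the speed-up may be smaller on that machine.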

To load and use the quantized models, install the required dependencies: pip install optimum[exporters] optimum-intel neural-compressor intel_extension_for_pytorch
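After installing, a quick sanity check can confirm that the packages are importable in the current environment. This is a minimal sketch using only the standard library. The import names optimum, neural_compressor and intel_extension_for_pytorch are assumptions about how the pip packages above expose themselves, and missing_modules is a hypothetical helper, not a LlamaIndex API.

```python
from importlib.util import find_spec

# Assumed top-level import names for the pip packages listed above.
# Note: finding "optimum" alone does not prove that optimum-intel is installed,
# since both distributions share the "optimum" namespace.
REQUIRED_MODULES = ("optimum", "neural_compressor", "intel_extension_for_pytorch")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

The check only looks modules up without importing them, so it stays fast even when the heavier packages are present.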

Loading is done with the IntelEmbedding class; usage is similar to any local HuggingFace embedding model. See the example below.


%pip install llama-index-embeddings-huggingface-optimum-intel


from llama_index.embeddings.huggingface_optimum_intel import IntelEmbedding

embed_model = IntelEmbedding("Intel/bge-small-en-v1.5-rag-int8-static")


embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])

384
[-0.0032782123889774084, -0.013396517373621464, 0.037944991141557693, -0.04642259329557419, 0.027709005400538445]
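The returned embedding is a plain list of floats, so it can be compared with another embedding using ordinary Python. The following is a minimal sketch of cosine similarity, a common way to compare embeddings. The helper cosine_similarity is written here for illustration and is not part of the IntelEmbedding API; toy vectors stand in for real embeddings so that the snippet runs without downloading the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors stand in for real embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

With the model loaded as above, the same helper could be applied to two results of embed_model.get_text_embedding(...) to score how close two texts are.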