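Multi-Modal Retrieval using Cohere Multi-Modal Embeddings¶
Cohere has released multi-modal embedding models. In this notebook, we demonstrate multi-modal retrieval using Cohere multi-modal embeddings.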
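Why are multi-modal embeddings important?
Multi-modal embeddings matter because they let AI systems understand and search images and text in a unified way. They map different kinds of content, such as text and images, into the same embedding space, so a single query can find relevant information across media types without separate text-search and image-search systems.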
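To make the idea of a shared embedding space concrete, here is a minimal sketch that ranks a mix of text and image items against one query by cosine similarity. The three-dimensional vectors and the item names are made up for illustration; they are not output from the Cohere model, which produces much larger vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_items(query_vec, items, top_k=2):
    """Return the top_k (name, score) pairs, best match first."""
    scored = [
        (name, cosine_similarity(query_vec, vec)) for name, vec in items.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: text and image items live in the same space.
items = {
    "text: Taycan aerodynamics": [0.9, 0.1, 0.0],
    "image: taycan_side.jpg": [0.8, 0.2, 0.1],
    "text: Mustang history": [0.1, 0.9, 0.2],
}
query_vec = [1.0, 0.0, 0.0]  # stand-in for an embedded query about Porsche

for name, score in rank_items(query_vec, items):
    print(f"{score:.3f}  {name}")
```

Because both media types are scored with the same function, one ranked list can contain text and images side by side.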
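This demonstration follows these steps:
- Download text and images from the relevant Wikipedia articles.
- Build a multi-modal index over the text and images using Cohere multi-modal embeddings.
- Use a multi-modal retriever to retrieve relevant text and images for a given query in a single call.
- Use a multi-modal query engine to generate a response for the given query.
Note: We use Anthropic's multi-modal LLM to generate responses, because Cohere does not yet offer a multi-modal LLM.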
Installation¶
We will use Cohere multi-modal embeddings for retrieval, a Qdrant vector store for storage, and the Anthropic multi-modal LLM for response generation.
%pip install llama-index-embeddings-cohere
%pip install llama-index-vector-stores-qdrant
%pip install llama-index-multi-modal-llms-anthropic
import os
os.environ["COHERE_API_KEY"] = "<YOUR COHERE API KEY>"
os.environ["ANTHROPIC_API_KEY"] = "<YOUR ANTHROPIC API KEY>"
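Utils¶
- get_wikipedia_images: Gets the image URLs from the Wikipedia page with the specified title.
- plot_images: Plots the images in the specified list of image paths.
- delete_large_images: Deletes images larger than 5 MB in the specified directory.
Note: The Cohere API accepts image files smaller than 5 MB.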
import requests
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path
import urllib.request
import os
def get_wikipedia_images(title):
    """
    Get the image URLs from the Wikipedia page with the specified title.
    """
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "imageinfo",
            "iiprop": "url|dimensions|mime",
            "generator": "images",
            "gimlimit": "50",
        },
    ).json()
    image_urls = []
    for page in response["query"]["pages"].values():
        image_info = page.get("imageinfo")
        if not image_info:
            # Skip entries that carry no image info instead of raising KeyError
            continue
        url = image_info[0]["url"]
        # Keep only .jpg and .png files
        if url.endswith((".jpg", ".png")):
            image_urls.append(url)
    return image_urls


def plot_images(image_paths):
    """
    Plot the images in the specified list of image paths.
    """
    images_shown = 0
    plt.figure(figsize=(16, 9))
    for img_path in image_paths:
        if os.path.isfile(img_path):
            image = Image.open(img_path)

            # Use a 3 x 3 grid so that all 9 images allowed by the loop fit
            plt.subplot(3, 3, images_shown + 1)
            plt.imshow(image)
            plt.xticks([])
            plt.yticks([])

            images_shown += 1
            if images_shown >= 9:
                break


def delete_large_images(folder_path):
    """
    Delete images larger than 5 MB in the specified directory.
    """
    # List to hold the names of deleted image files
    deleted_images = []

    # Iterate through each file in the directory
    for file_name in os.listdir(folder_path):
        if file_name.lower().endswith(
            (".png", ".jpg", ".jpeg", ".gif", ".bmp")
        ):
            # Construct the full file path
            file_path = os.path.join(folder_path, file_name)

            # Get the size of the file in bytes
            file_size = os.path.getsize(file_path)

            # Check if the file size is greater than 5 MB (5242880 bytes) and remove it
            if file_size > 5242880:
                os.remove(file_path)
                deleted_images.append(file_name)
                print(
                    f"Image: {file_name} was larger than 5 MB and has been deleted."
                )
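The parsing step inside get_wikipedia_images can be tried offline. The sketch below runs the same kind of filter over a hand-written dictionary shaped like the parsed JSON that function reads (`query` → `pages` → `imageinfo` → `url`). The function name `extract_image_urls`, the sample URLs, and the extra `.jpeg` and case-insensitive matching are assumptions added for illustration; they are not part of the notebook.

```python
def extract_image_urls(response_json, extensions=(".jpg", ".jpeg", ".png")):
    """Collect image URLs from a parsed `imageinfo` response dictionary."""
    pages = response_json.get("query", {}).get("pages", {})
    urls = []
    for page in pages.values():
        infos = page.get("imageinfo") or []
        if not infos:
            # No image info on this entry: skip it rather than raise KeyError.
            continue
        url = infos[0].get("url", "")
        # Compare in lower case so that "Car.JPG" is accepted as well.
        if url.lower().endswith(extensions):
            urls.append(url)
    return urls


# Hypothetical response fragment, not real API output.
sample = {
    "query": {
        "pages": {
            "-1": {"imageinfo": [{"url": "https://example.org/a/Car.JPG"}]},
            "-2": {"imageinfo": [{"url": "https://example.org/a/Logo.svg"}]},
            "-3": {"title": "File:Missing.png"},
        }
    }
}
print(extract_image_urls(sample))
```

Keeping the parsing in a pure function like this makes it possible to test the filter without a network call.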
image_uuid = 0
# image_metadata_dict stores images metadata including image uuid, filename and path
image_metadata_dict = {}
MAX_IMAGES_PER_WIKI = 10

wiki_titles = {
    "Audi e-tron",
    "Ford Mustang",
    "Porsche Taycan",
}

data_path = Path("mixed_wiki")
if not data_path.exists():
    Path.mkdir(data_path)

for title in wiki_titles:
    # Download the plain-text extract of the article
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

    # Write as UTF-8 so non-ASCII characters do not depend on the platform default
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)

    images_per_wiki = 0
    try:
        list_img_urls = get_wikipedia_images(title)

        for url in list_img_urls:
            if url.endswith((".jpg", ".png", ".svg")):
                image_uuid += 1
                # Every image is saved under a .jpg name, whatever its source format
                urllib.request.urlretrieve(
                    url, data_path / f"{image_uuid}.jpg"
                )
                images_per_wiki += 1
                # The check runs after the increment, so up to
                # MAX_IMAGES_PER_WIKI + 1 images are saved per page
                if images_per_wiki > MAX_IMAGES_PER_WIKI:
                    break
    except Exception:
        print(f"No images found for Wikipedia page: {title}")
        continue
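The download loop stores every image as `<id>.jpg`, even when the source file is a PNG. If keeping the original extension matters, a helper along the following lines could be used when building the target path. This is only a sketch: `local_image_path` is a hypothetical name, the example URLs are invented, and the notebook itself does not do this.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def local_image_path(data_dir, image_id, url):
    """Build a local path for an image that keeps the URL's own extension."""
    # urlparse(...).path drops any query string before the suffix is read.
    suffix = Path(unquote(urlparse(url).path)).suffix.lower()
    # Fall back to .jpg (the notebook's naming) when the URL has no extension.
    return Path(data_dir) / f"{image_id}{suffix or '.jpg'}"


print(local_image_path("mixed_wiki", 3, "https://example.org/x/Logo.PNG?width=200"))
print(local_image_path("mixed_wiki", 4, "https://example.org/x/photo"))
```

Note that later cells in this notebook refer to files such as `8.jpg`, so switching to this naming would change those file names.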
Delete Larger Image Files¶
The Cohere multi-modal embedding model accepts files smaller than 5 MB, so here we delete the larger image files.
delete_large_images(data_path)
Image: 8.jpg was larger than 5 MB and has been deleted.
Image: 13.jpg was larger than 5 MB and has been deleted.
Image: 11.jpg was larger than 5 MB and has been deleted.
Image: 21.jpg was larger than 5 MB and has been deleted.
Image: 23.jpg was larger than 5 MB and has been deleted.
Image: 32.jpg was larger than 5 MB and has been deleted.
Image: 19.jpg was larger than 5 MB and has been deleted.
Image: 4.jpg was larger than 5 MB and has been deleted.
Image: 5.jpg was larger than 5 MB and has been deleted.
Image: 7.jpg was larger than 5 MB and has been deleted.
Image: 6.jpg was larger than 5 MB and has been deleted.
Image: 1.jpg was larger than 5 MB and has been deleted.
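Since delete_large_images removes files permanently, it can be useful to preview what would be deleted first. The following is a minimal dry-run sketch using the same 5 MB threshold and the same file extensions; `find_large_images` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5242880 bytes, as in delete_large_images
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def find_large_images(folder_path, max_bytes=MAX_IMAGE_BYTES):
    """Return the sorted names of image files larger than max_bytes.

    Nothing is deleted; this only reports candidates.
    """
    folder = Path(folder_path)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and p.stat().st_size > max_bytes
    )
```

Calling `find_large_images("mixed_wiki")` before `delete_large_images(data_path)` would list the files that are about to be removed.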
Set Up the Embedding Model and LLM¶
The Cohere multi-modal embedding model is used for retrieval, and the Anthropic multi-modal LLM is used for response generation.
from llama_index.embeddings.cohere import CohereEmbedding
from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal
from llama_index.core import Settings

Settings.embed_model = CohereEmbedding(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-english-v3.0",  # current v3 models support multimodal embeddings
)

anthropic_multimodal_llm = AnthropicMultiModal(max_tokens=300)
Load the Data¶
We will load the downloaded text and image data.
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./mixed_wiki/").load_data()
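Before or after loading, it can help to check what is actually in the data folder, for example how many text files and images survived the size filter. The sketch below counts files by extension with the standard library only; `count_files_by_suffix` is a hypothetical helper and does not inspect the loaded documents themselves.

```python
from collections import Counter
from pathlib import Path


def count_files_by_suffix(folder_path):
    """Count the files directly inside folder_path, grouped by lower-case suffix."""
    folder = Path(folder_path)
    return Counter(p.suffix.lower() for p in folder.iterdir() if p.is_file())


# Only report when the folder from the download step exists.
if Path("mixed_wiki").is_dir():
    print(count_files_by_suffix("mixed_wiki"))
```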
Set Up the Qdrant Vector Store¶
We will use a Qdrant vector store to hold the image and text embeddings along with their associated metadata.
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import StorageContext
import qdrant_client
# Create a local Qdrant vector store
client = qdrant_client.QdrantClient(path="qdrant_mm_db")
text_store = QdrantVectorStore(
    client=client, collection_name="text_collection"
)
image_store = QdrantVectorStore(
    client=client, collection_name="image_collection"
)
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)
Create the MultiModalVectorStoreIndex¶
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    image_embed_model=Settings.embed_model,
)
WARNING:root:Payload indexes have no effect in the local Qdrant. Please use server Qdrant if you need payload indexes.
WARNING:root:Payload indexes have no effect in the local Qdrant. Please use server Qdrant if you need payload indexes.
Test the Retrieval¶
Here we create a retriever and test it.
retriever_engine = index.as_retriever(
    similarity_top_k=4, image_similarity_top_k=4
)
query = "Which models of Porsche are discussed here?"
retrieval_results = retriever_engine.retrieve(query)
Inspect the Retrieval Results¶
from llama_index.core.response.notebook_utils import display_source_node
from llama_index.core.schema import ImageNode

# Collect image paths for plotting and display text nodes inline
retrieved_image = []
for res_node in retrieval_results:
    if isinstance(res_node.node, ImageNode):
        retrieved_image.append(res_node.node.metadata["file_path"])
    else:
        display_source_node(res_node, source_length=200)

plot_images(retrieved_image)
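Node ID: ac3e92f1-e192-4aa5-bbc6-45674654d96f
Similarity: 0.49435770203542906
Text: === Aerodynamics === The Taycan Turbo has a drag coefficient of Cd=0.22, which the manufacturer claims is the lowest of any current Porsche model. The Turbo S model has a slightly higher...
Node ID: 045cde7c-963f-46cd-b820-9cabe07f1ab5
Similarity: 0.4804621315897337
Text: The Porsche Taycan is a battery electric luxury sports sedan and shooting brake produced by the German automobile manufacturer Porsche. The concept version of the Taycan was named the Porsche Mission E...
Node ID: e14475d1-7bd4-48f3-a085-f712d5bc7e5a
Similarity: 0.46787589674504015
Text: === Porsche Mission E Cross Turismo === The Porsche Mission E Cross Turismo previewed the Taycan Cross Turismo and was shown at the 2018 Geneva Motor Show. The design language of the Mission E...
Node ID: a25b3aea-2fdd-4ae2-b5bc-55eef453fe82
Similarity: 0.4370399571869162
Text: == Specifications ==
=== Chassis === The Taycan's body is mainly made of steel and aluminium joined by different bonding techniques. The body's B pillars, side roof frame and seat cross member are made from...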
Test the Multi-Modal QueryEngine¶
We will create a QueryEngine from the MultiModalVectorStoreIndex built above.
from llama_index.core import PromptTemplate
qa_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_tmpl = PromptTemplate(qa_tmpl_str)

query_engine = index.as_query_engine(
    llm=anthropic_multimodal_llm, text_qa_template=qa_tmpl
)
query = "Which models of Porsche are discussed here?"
response = query_engine.query(query)
print(str(response))
Based on the context provided, the Porsche models discussed are:
- Porsche Taycan - a battery electric luxury sports sedan. It is offered in several variants at different performance levels, including the Taycan Turbo and Turbo S high-performance AWD models, the mid-range Taycan 4S, and a base RWD model.
- Porsche Taycan Cross Turismo - a lifted shooting brake/wagon version of the Taycan with crossover-like features and styling.
- Porsche Taycan Sport Turismo - shares the shooting brake profile with the Cross Turismo but without the crossover styling elements. A RWD version is available as the base Taycan Sport Turismo.
- Porsche Mission E - the concept car unveiled in 2015 that previewed the design and technology of the production Taycan models.
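To see roughly what the LLM receives, the QA template string used above can be filled by hand. The sketch below uses plain `str.format` on the same template text; it only approximates what the query engine does internally, and `preview_prompt` plus the sample context are hypothetical.

```python
qa_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def preview_prompt(context_str, query_str):
    """Fill the two template slots with plain string formatting."""
    return qa_tmpl_str.format(context_str=context_str, query_str=query_str)


# Made-up context snippet standing in for the retrieved text nodes.
prompt = preview_prompt(
    context_str="The Porsche Taycan is a battery electric sports sedan.",
    query_str="Which models of Porsche are discussed here?",
)
print(prompt)
```

The retrieved text fills `{context_str}` and the user question fills `{query_str}`; the retrieved images are passed to the multi-modal LLM separately rather than through this text template.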
Inspect the Sources¶
from llama_index.core.response.notebook_utils import display_source_node

# Show the text nodes that the response was grounded on
for text_node in response.metadata["text_nodes"]:
    display_source_node(text_node, source_length=200)

# Plot the images that were retrieved for the same query
plot_images(
    [n.metadata["file_path"] for n in response.metadata["image_nodes"]]
)
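Node ID: ac3e92f1-e192-4aa5-bbc6-45674654d96f
Similarity: 0.49435770203542906
Text: === Aerodynamics === The Taycan Turbo has a drag coefficient of Cd=0.22, which the manufacturer claims is the lowest of any current Porsche model. The Turbo S model has a slightly higher...
Node ID: 045cde7c-963f-46cd-b820-9cabe07f1ab5
Similarity: 0.4804621315897337
Text: The Porsche Taycan is a battery electric luxury sports sedan and shooting brake produced by the German automobile manufacturer Porsche. The concept version of the Taycan was named the Porsche Mission E...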