Multimodal Retrieval with Cohere Multimodal Embeddings

Cohere has released multimodal embedding models. In this notebook, we demonstrate multimodal retrieval using Cohere multimodal embeddings.

Why are multimodal embeddings important?

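Multimodal embeddings matter because they let AI systems understand and search images and text in a unified way. They map different types of content, such as text and images, into the same embedding space, so a single query can surface relevant information across media types without separate text-search and image-search systems.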
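
To make the shared-space idea concrete, here is a minimal sketch with made-up 3-dimensional vectors (hypothetical values, not real Cohere embeddings). Because text and image items live in the same space, a single cosine-similarity ranking covers both kinds of content.

```python
import math

# Toy vectors standing in for real embeddings (hypothetical values;
# real embedding vectors have many more dimensions)
corpus = {
    ("text", "Taycan aerodynamics paragraph"): [0.9, 0.1, 0.0],
    ("image", "photo_of_taycan.jpg"): [0.8, 0.2, 0.1],
    ("text", "Mustang history paragraph"): [0.1, 0.9, 0.2],
    ("image", "photo_of_mustang.jpg"): [0.0, 0.8, 0.3],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, k=2):
    """Rank text and image items together, since they share one space."""
    scored = [(cosine(query_vec, vec), key) for key, vec in corpus.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


query_vec = [0.85, 0.15, 0.05]  # toy embedding of a "Porsche Taycan" query
print(search(query_vec))
```

In this notebook the vectors are stored and searched by Qdrant rather than by a Python loop, but the point is the same: one query vector is compared against text and image vectors alike.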

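The steps in this demo are:

Download text and images from related Wikipedia articles.
Build a multimodal index for the text and images using Cohere multimodal embeddings.
Use a multimodal retriever to fetch relevant text and images for a given query.
Use a multimodal query engine to generate a response for a given query.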

Note: We use Anthropic's multimodal LLM to generate responses, because Cohere does not yet offer a multimodal LLM.

Installation

We use Cohere multimodal embeddings for retrieval, the Qdrant vector store for storage, and Anthropic's multimodal LLM for response generation.

%pip install llama-index-embeddings-cohere
%pip install llama-index-vector-stores-qdrant
%pip install llama-index-multi-modal-llms-anthropic

Set Up API Keys

Cohere: multimodal retrieval.

Anthropic: multimodal LLM.

import os

os.environ["COHERE_API_KEY"] = "<YOUR COHERE API KEY>"

os.environ["ANTHROPIC_API_KEY"] = "<YOUR ANTHROPIC API KEY>"
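
The cell above leaves placeholder strings in place until you paste real keys. The helper below is a hypothetical sanity check (not part of LlamaIndex) that reports which of the two keys are unset, empty, or still hold a `<...>` placeholder, so a missing key shows up before any API call fails.

```python
import os

REQUIRED_KEYS = ("COHERE_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=os.environ):
    """Return required keys that are unset, empty, or still a placeholder."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        # Treat "<YOUR ... API KEY>"-style placeholders as not configured
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing


missing = missing_api_keys()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```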

Utilities

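get_wikipedia_images: get the image URLs from the Wikipedia page with the specified title.
plot_images: plot the images from the given list of image paths.
delete_large_images: delete images larger than 5 MB from the specified directory.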

Note: The Cohere API accepts only image files smaller than 5 MB.
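
The filtering step inside get_wikipedia_images can be pulled out as a pure function and tried offline. This sketch assumes the response shape used by the request in the next cell (a `query.pages` mapping whose entries may carry an `imageinfo` list); the sample data is hand-written and hypothetical, and unlike the original check it compares extensions case-insensitively.

```python
def extract_image_urls(response, extensions=(".jpg", ".png")):
    """
    Pull image URLs out of a MediaWiki `generator=images&prop=imageinfo`
    response, keeping only the given file extensions.
    """
    urls = []
    for page in response.get("query", {}).get("pages", {}).values():
        info = page.get("imageinfo")
        if not info:  # entries without image info are skipped
            continue
        url = info[0]["url"]
        if url.lower().endswith(extensions):
            urls.append(url)
    return urls


# Hand-written sample shaped like the API response (hypothetical data)
sample = {
    "query": {
        "pages": {
            "-1": {"title": "File:Logo.svg",
                   "imageinfo": [{"url": "https://upload.wikimedia.org/example/Logo.svg"}]},
            "-2": {"title": "File:Car.jpg",
                   "imageinfo": [{"url": "https://upload.wikimedia.org/example/Car.jpg"}]},
            "-3": {"title": "File:NoInfo.png"},
        }
    }
}
print(extract_image_urls(sample))
```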

import os
import urllib.request
from pathlib import Path

import matplotlib.pyplot as plt
import requests
from PIL import Image


def get_wikipedia_images(title):
    """
    Get the image URLs from the Wikipedia page with the specified title.
    """
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "imageinfo",
            "iiprop": "url|dimensions|mime",
            "generator": "images",
            "gimlimit": "50",
        },
    ).json()
    image_urls = []
    for page in response["query"]["pages"].values():
        # Skip entries that come back without image info
        image_info = page.get("imageinfo")
        if not image_info:
            continue
        url = image_info[0]["url"]
        # Keep only JPEG and PNG images
        if url.endswith((".jpg", ".png")):
            image_urls.append(url)
    return image_urls


def plot_images(image_paths):
    """
    Plot up to 6 images from the specified list of image paths in a 2x3 grid.
    """
    images_shown = 0
    plt.figure(figsize=(16, 9))
    for img_path in image_paths:
        if os.path.isfile(img_path):
            image = Image.open(img_path)

            plt.subplot(2, 3, images_shown + 1)
            plt.imshow(image)
            plt.xticks([])
            plt.yticks([])

            images_shown += 1
            # A 2x3 grid has only 6 slots; a 7th subplot would raise an error
            if images_shown >= 6:
                break


def delete_large_images(folder_path):
    """
    Delete images larger than 5 MB in the specified directory and
    return the names of the deleted files.
    """
    # List to hold the names of deleted image files
    deleted_images = []

    # Iterate through each file in the directory
    for file_name in os.listdir(folder_path):
        if file_name.lower().endswith(
            (".png", ".jpg", ".jpeg", ".gif", ".bmp")
        ):
            # Construct the full file path
            file_path = os.path.join(folder_path, file_name)
            # Get the size of the file in bytes
            file_size = os.path.getsize(file_path)
            # Check if the file size is greater than 5 MB (5242880 bytes) and remove it
            if file_size > 5242880:
                os.remove(file_path)
                deleted_images.append(file_name)
                print(
                    f"Image: {file_name} was larger than 5 MB and has been deleted."
                )

    return deleted_images

Download Text and Images from Wikipedia

We will download the text and images from the following Wikipedia pages:

Audi e-tron
Ford Mustang
Porsche Taycan
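
The download loop in the next cell reads `page["extract"]` directly, which raises a KeyError if a title does not exist. As far as we know, the MediaWiki API marks such titles with a `missing` key instead of an extract; the hypothetical helper below handles that case and is tested against hand-written sample responses, not live API output.

```python
def extract_page_text(response):
    """
    Return the plain-text extract from a MediaWiki `prop=extracts` response,
    or None if there is no page or the title does not exist.
    """
    pages = response.get("query", {}).get("pages", {})
    if not pages:
        return None
    page = next(iter(pages.values()))
    # Missing titles are flagged with a "missing" key and carry no extract
    if "missing" in page:
        return None
    return page.get("extract")


# Hand-written sample responses (hypothetical data)
found = {"query": {"pages": {"12345": {"title": "Porsche Taycan",
                                       "extract": "Example extract text."}}}}
not_found = {"query": {"pages": {"-1": {"title": "No Such Car", "missing": ""}}}}
print(extract_page_text(found), extract_page_text(not_found))
```

Inside the loop you could call `extract_page_text(response)` and skip the title when it returns None.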

image_uuid = 0
# image_metadata_dict stores images metadata including image uuid, filename and path
image_metadata_dict = {}
MAX_IMAGES_PER_WIKI = 10

# A list (rather than a set) keeps the download order deterministic
wiki_titles = [
    "Audi e-tron",
    "Ford Mustang",
    "Porsche Taycan",
]


data_path = Path("mixed_wiki")
data_path.mkdir(exist_ok=True)

for title in wiki_titles:
    # Fetch the plain-text extract of the article
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)

    images_per_wiki = 0
    try:
        list_img_urls = get_wikipedia_images(title)

        for url in list_img_urls:
            # get_wikipedia_images already keeps only .jpg/.png URLs
            if url.endswith((".jpg", ".png")):
                image_uuid += 1
                # Keep the original extension so PNGs are not saved as .jpg
                suffix = Path(url).suffix
                urllib.request.urlretrieve(
                    url, data_path / f"{image_uuid}{suffix}"
                )
                images_per_wiki += 1
                # Stop once MAX_IMAGES_PER_WIKI images have been downloaded
                if images_per_wiki >= MAX_IMAGES_PER_WIKI:
                    break
    except Exception as e:
        print(f"Could not download images for Wikipedia page {title}: {e}")

Delete Large Image Files

The Cohere multimodal embedding model accepts only files smaller than 5 MB, so we delete the larger image files here.
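
Deletion is irreversible, so you may want a dry run first. This hypothetical helper uses the same 5 MB threshold (5 × 1024 × 1024 = 5242880 bytes) and the same extension list as delete_large_images, but only reports what it finds.

```python
import os

# 5 MB in bytes, the same threshold delete_large_images uses
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5242880
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".bmp")


def find_large_images(folder_path, limit=MAX_IMAGE_BYTES):
    """List (name, size) for images above `limit` bytes without deleting anything."""
    large = []
    for file_name in sorted(os.listdir(folder_path)):
        if file_name.lower().endswith(IMAGE_EXTENSIONS):
            size = os.path.getsize(os.path.join(folder_path, file_name))
            if size > limit:
                large.append((file_name, size))
    return large


if os.path.isdir("mixed_wiki"):
    for name, size in find_large_images("mixed_wiki"):
        print(f"{name}: {size / 1024 / 1024:.1f} MB")
```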

delete_large_images(data_path)

Image: 8.jpg was larger than 5 MB and has been deleted.
Image: 13.jpg was larger than 5 MB and has been deleted.
Image: 11.jpg was larger than 5 MB and has been deleted.
Image: 21.jpg was larger than 5 MB and has been deleted.
Image: 23.jpg was larger than 5 MB and has been deleted.
Image: 32.jpg was larger than 5 MB and has been deleted.
Image: 19.jpg was larger than 5 MB and has been deleted.
Image: 4.jpg was larger than 5 MB and has been deleted.
Image: 5.jpg was larger than 5 MB and has been deleted.
Image: 7.jpg was larger than 5 MB and has been deleted.
Image: 6.jpg was larger than 5 MB and has been deleted.
Image: 1.jpg was larger than 5 MB and has been deleted.

Set Up the Embedding Model and LLM

We use the Cohere multimodal embedding model for retrieval and Anthropic's multimodal LLM for response generation.

from llama_index.embeddings.cohere import CohereEmbedding
from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal
from llama_index.core import Settings

Settings.embed_model = CohereEmbedding(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-english-v3.0",  # current v3 models support multimodal embeddings
)

anthropic_multimodal_llm = AnthropicMultiModal(max_tokens=300)

Load Data

We load the downloaded text and image data.
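
Before loading, it can help to check what the folder actually contains. This sketch only approximates how the files will be split: the image suffix set below is an assumption for illustration, not SimpleDirectoryReader's authoritative reader mapping.

```python
from collections import Counter
from pathlib import Path

# Suffixes this sketch treats as images (an assumption, not the reader's
# authoritative mapping)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def preview_modalities(folder):
    """Count files in `folder` as "image", "text" (.txt), or "other"."""
    counts = Counter()
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue  # ignore subdirectories
        suffix = path.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            counts["image"] += 1
        elif suffix == ".txt":
            counts["text"] += 1
        else:
            counts["other"] += 1
    return counts


if Path("mixed_wiki").is_dir():
    print(dict(preview_modalities("mixed_wiki")))
```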

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./mixed_wiki/").load_data()

Set Up the Qdrant Vector Store

We use a Qdrant vector store to hold the text and image embeddings along with their metadata.

from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import StorageContext

import qdrant_client

# Create a local Qdrant vector store
client = qdrant_client.QdrantClient(path="qdrant_mm_db")

text_store = QdrantVectorStore(
    client=client, collection_name="text_collection"
)
image_store = QdrantVectorStore(
    client=client, collection_name="image_collection"
)
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

Create the MultiModalVectorStoreIndex

index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    image_embed_model=Settings.embed_model,
)

WARNING:root:Payload indexes have no effect in the local Qdrant. Please use server Qdrant if you need payload indexes.
WARNING:root:Payload indexes have no effect in the local Qdrant. Please use server Qdrant if you need payload indexes.

Test Retrieval

Here we create a retriever and test it.
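
The retriever takes two budgets: similarity_top_k for text nodes and image_similarity_top_k for image nodes. The toy below mimics that idea with precomputed, hypothetical scores; it is a conceptual sketch of separate per-modality top-k selection, not a reproduction of LlamaIndex internals.

```python
import heapq
from operator import itemgetter


def retrieve_per_modality(text_scores, image_scores,
                          similarity_top_k=4, image_similarity_top_k=4):
    """Pick top-k text hits and top-k image hits independently, then combine."""
    top_text = heapq.nlargest(similarity_top_k, text_scores.items(), key=itemgetter(1))
    top_images = heapq.nlargest(image_similarity_top_k, image_scores.items(), key=itemgetter(1))
    return ([("text", name, score) for name, score in top_text]
            + [("image", name, score) for name, score in top_images])


# Hypothetical similarity scores for illustration only
text_scores = {"t1": 0.49, "t2": 0.48, "t3": 0.47, "t4": 0.44, "t5": 0.30}
image_scores = {"i1": 0.35, "i2": 0.20}
for kind, name, score in retrieve_per_modality(text_scores, image_scores):
    print(kind, name, score)
```

Because each modality has its own budget, an image can be returned even when its score is lower than every text hit.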

retriever_engine = index.as_retriever(
    similarity_top_k=4, image_similarity_top_k=4
)

query = "Which models of Porsche are discussed here?"
retrieval_results = retriever_engine.retrieve(query)

Inspect the Retrieval Results

from llama_index.core.response.notebook_utils import display_source_node
from llama_index.core.schema import ImageNode

retrieved_image = []
for res_node in retrieval_results:
    if isinstance(res_node.node, ImageNode):
        retrieved_image.append(res_node.node.metadata["file_path"])
    else:
        display_source_node(res_node, source_length=200)

plot_images(retrieved_image)

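Node ID: ac3e92f1-e192-4aa5-bbc6-45674654d96f
Similarity: 0.49435770203542906
Text: === Aerodynamics === The Taycan Turbo has a drag coefficient of Cd=0.22, which the manufacturer claims is the lowest of any current Porsche model. The Turbo S model has a slightly higher drag coefficient...

Node ID: 045cde7c-963f-46cd-b820-9cabe07f1ab5
Similarity: 0.4804621315897337
Text: The Porsche Taycan is a battery electric luxury sports sedan and shooting brake produced by German automobile manufacturer Porsche. The concept version of the Taycan, named the Porsche Mission E...

Node ID: e14475d1-7bd4-48f3-a085-f712d5bc7e5a
Similarity: 0.46787589674504015
Text: === Porsche Mission E Cross Turismo === The Porsche Mission E Cross Turismo previewed the Taycan Cross Turismo and was presented at the 2018 Geneva Motor Show. The design language of the Mission E...

Node ID: a25b3aea-2fdd-4ae2-b5bc-55eef453fe82
Similarity: 0.4370399571869162
Text: == Specifications ==

=== Chassis === The body of the Taycan is made mainly of steel and aluminium joined by different joining techniques. The B-pillars, side roof frames, and seat cross-member of the body are made of...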


Test the Multimodal QueryEngine

We create a QueryEngine from the MultiModalVectorStoreIndex built above.

from llama_index.core import PromptTemplate

qa_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_tmpl = PromptTemplate(qa_tmpl_str)

query_engine = index.as_query_engine(
    llm=anthropic_multimodal_llm, text_qa_template=qa_tmpl
)
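
To see roughly what text the LLM receives, the template above can be filled with plain `str.format`, since its placeholders are `{context_str}` and `{query_str}`. The context chunks here are hypothetical, and joining them with blank lines is an assumption about how retrieved nodes are assembled; this only covers the text part of the prompt.

```python
qa_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Hypothetical retrieved text chunks, joined with blank lines (an assumption)
context_chunks = [
    "The Taycan Turbo has a drag coefficient of Cd=0.22.",
    "The Porsche Mission E was the concept version of the Taycan.",
]
prompt = qa_tmpl_str.format(
    context_str="\n\n".join(context_chunks),
    query_str="Which models of Porsche are discussed here?",
)
print(prompt)
```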

query = "Which models of Porsche are discussed here?"
response = query_engine.query(query)

print(str(response))

Based on the context provided, the Porsche models discussed are:

- Porsche Taycan - a battery electric luxury sports sedan. It is offered in several variants at different performance levels, including the Taycan Turbo and Turbo S high-performance AWD models, the mid-range Taycan 4S, and a base RWD model.

- Porsche Taycan Cross Turismo - a lifted shooting brake/wagon version of the Taycan with crossover-like features and styling. 

- Porsche Taycan Sport Turismo - shares the shooting brake profile with the Cross Turismo but without the crossover styling elements. A RWD version is available as the base Taycan Sport Turismo.

- Porsche Mission E - the concept car unveiled in 2015 that previewed the design and technology of the production Taycan models.

Inspect the Sources

from llama_index.core.response.notebook_utils import display_source_node

for text_node in response.metadata["text_nodes"]:
    display_source_node(text_node, source_length=200)
plot_images(
    [n.metadata["file_path"] for n in response.metadata["image_nodes"]]
)

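Node ID: ac3e92f1-e192-4aa5-bbc6-45674654d96f
Similarity: 0.49435770203542906
Text: === Aerodynamics === The Taycan Turbo has a drag coefficient of Cd=0.22, which the manufacturer claims is the lowest of any current Porsche model. The Turbo S model has a slightly higher drag coefficient...

Node ID: 045cde7c-963f-46cd-b820-9cabe07f1ab5
Similarity: 0.4804621315897337
Text: The Porsche Taycan is a battery electric luxury sports sedan and shooting brake produced by German automobile manufacturer Porsche. The concept version of the Taycan, named the Porsche Mission E...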