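Multimodal RAG with Nomic Embed and Anthropic
In this notebook, we show how to build a multimodal RAG system using LlamaIndex, Nomic Embed, and Anthropic.
Wikipedia text embedding index: Nomic Embed Text v1.5
Wikipedia image embedding index: Nomic Embed Vision v1.5
Query encoder:
- Encode the query text for the text index using Nomic Embed Text
- Encode the query text for the image index using Nomic Embed Vision
Framework: LlamaIndex
Steps:
- Download the raw text and image files of the Wikipedia articles
- Build the text index in the vector store using Nomic Embed Text embeddings
- Build the image index in the vector store using Nomic Embed Vision embeddings
- Retrieve relevant text and images at the same time, using a separate query embedding and vector store for each
- Pass the retrieved text and images to Claude 3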
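The retrieval step can be pictured as two independent nearest-neighbor searches, one per index, each using its own query embedding. The following is a minimal, library-free sketch of that idea. The vectors, file names, and the helpers `cosine` and `top_k` are made up for illustration; they are not part of the LlamaIndex or Nomic APIs, and a real system would delegate the search to the vector store.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k):
    """Return the k (id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings standing in for the outputs of the two models.
text_index = {"bts.txt": [1.0, 0.0], "batman.txt": [0.0, 1.0]}
image_index = {"1.jpg": [0.9, 0.1], "2.jpg": [0.1, 0.9], "3.jpg": [0.7, 0.7]}

# The query is encoded once per index; this toy example reuses one vector.
query_for_text_index = [1.0, 0.1]
query_for_image_index = [1.0, 0.1]

text_hits = top_k(query_for_text_index, text_index, k=1)
image_hits = top_k(query_for_image_index, image_index, k=2)

# Both result lists are handed to the multimodal LLM together.
results = text_hits + image_hits
print([doc_id for doc_id, _ in results])
```

Keeping the two searches separate lets each index have its own `k`, which mirrors the `similarity_top_k` and `image_similarity_top_k` arguments used later in this notebook.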
%pip install llama-index-vector-stores-qdrant llama-index-multi-modal-llms-anthropic llama-index-embeddings-nomic
%pip install llama_index ftfy regex tqdm
%pip install matplotlib scikit-image
%pip install -U qdrant_client
%pip install wikipedia
Load and download a multimodal dataset of Wikipedia text and images
Parse the Wikipedia articles and save them to a local folder.
from pathlib import Path
import requests
wiki_titles = [
    "batman",
    "Vincent van Gogh",
    "San Francisco",
    "iPhone",
    "Tesla Model S",
    "BTS",
]

data_path = Path("data_wiki")
# Create the output folder once, before downloading anything
data_path.mkdir(exist_ok=True)

for title in wiki_titles:
    # Ask the MediaWiki API for the plain-text extract of the article
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

    # Write as UTF-8 so non-ASCII article text is saved on every platform
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)
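The loop above indexes `page["extract"]` directly, which raises a `KeyError` when a title does not resolve to an article. A more defensive variant could look like the sketch below. The helper name `extract_page_text` and the two sample payloads are hypothetical; the payload shape follows the API's default JSON format, in which a nonexistent page is returned with a `missing` key and no `extract`.

```python
def extract_page_text(api_response):
    """Return the plain-text extract of the first page, or None if absent."""
    pages = api_response.get("query", {}).get("pages", {})
    if not pages:
        return None
    page = next(iter(pages.values()))
    # A title with no article comes back with a "missing" key and no "extract".
    if "missing" in page:
        return None
    return page.get("extract")


# Hypothetical payloads shaped like the API's JSON responses.
found = {"query": {"pages": {"4335": {"title": "Batman", "extract": "Batman is..."}}}}
not_found = {"query": {"pages": {"-1": {"title": "No such page", "missing": ""}}}}

print(extract_page_text(found))
print(extract_page_text(not_found))
```

In the download loop, a `None` result could then be skipped with a short message instead of stopping the whole run.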
Parse Wikipedia images and text, and save them to a local folder
import wikipedia
import urllib.request
from pathlib import Path
import time
image_path = Path("data_wiki")
image_uuid = 0
# image_metadata_dict stores images metadata including image uuid, filename and path
image_metadata_dict = {}
MAX_IMAGES_PER_WIKI = 30
wiki_titles = [
    "San Francisco",
    "Batman",
    "Vincent van Gogh",
    "iPhone",
    "Tesla Model S",
    "BTS band",
]

# create folder for images only
image_path.mkdir(exist_ok=True)

# Download images for wiki pages
# Assign UUID for each image
for title in wiki_titles:
    images_per_wiki = 0
    print(title)
    try:
        page_py = wikipedia.page(title)
        list_img_urls = page_py.images
        for url in list_img_urls:
            if url.endswith(".jpg") or url.endswith(".png"):
                image_uuid += 1
                image_file_name = title + "_" + url.split("/")[-1]

                # img_path could be s3 path pointing to the raw image file in the future
                image_metadata_dict[image_uuid] = {
                    "filename": image_file_name,
                    "img_path": "./" + str(image_path / f"{image_uuid}.jpg"),
                }

                # Create a request with a valid User-Agent header
                req = urllib.request.Request(
                    url,
                    data=None,
                    headers={
                        "User-Agent": "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Mobile Safari/537.36"
                    },
                )

                # Open the URL and save the image
                with urllib.request.urlopen(req) as response, open(
                    image_path / f"{image_uuid}.jpg", "wb"
                ) as out_file:
                    out_file.write(response.read())

                images_per_wiki += 1
                # Limit the number of images downloaded per wiki page to MAX_IMAGES_PER_WIKI
                if images_per_wiki >= MAX_IMAGES_PER_WIKI:
                    break

                # Add a delay between requests to avoid overwhelming the server
                time.sleep(1)  # Adjust the delay as needed

    except Exception as e:
        print(e)
        print(f"{images_per_wiki=}")
        continue
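The loop above saves every download as `{image_uuid}.jpg`, including PNG files, and its case-sensitive `endswith` check skips URLs ending in `.JPG` or `.PNG`. One possible refinement is to derive the extension from the URL itself. The helper below is a sketch under that assumption; `image_suffix` and its `allowed` parameter are hypothetical names, and the sample URLs are placeholders.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def image_suffix(url, allowed=(".jpg", ".jpeg", ".png")):
    """Return the lower-cased extension of an image URL, or None if not allowed."""
    # urlparse drops any query string, so "?width=200" does not affect the suffix.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix if suffix in allowed else None


# Placeholder URLs for illustration only.
print(image_suffix("https://example.org/images/Golden_Gate.PNG"))
print(image_suffix("https://example.org/images/logo.svg"))
```

The returned suffix could replace the hard-coded `.jpg` in both the saved file name and the `img_path` metadata entry, so the two stay consistent.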
San Francisco Batman Vincent van Gogh iPhone Tesla Model S BTS band
import os
os.environ["NOMIC_API_KEY"] = ""
os.environ["ANTHROPIC_API_KEY"] = ""
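The cell above sets both keys to empty strings, so it has to be edited by hand before the notebook can call either service. As an alternative, the keys could be read from the environment and requested interactively only when they are missing. This is a sketch; `ensure_api_key` is a hypothetical helper, not part of any of the libraries used here.

```python
import os
from getpass import getpass


def ensure_api_key(name, prompt=getpass):
    """Return env var `name`, prompting for it if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # getpass hides the typed key so it does not end up in the cell output.
        value = prompt(f"Enter {name}: ").strip()
        os.environ[name] = value
    return value


# Usage in the notebook (commented out here because it may prompt for input):
# ensure_api_key("NOMIC_API_KEY")
# ensure_api_key("ANTHROPIC_API_KEY")
```

This keeps secrets out of the saved notebook while leaving the rest of the code unchanged, since both libraries are configured here through environment variables.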
Build a multimodal vector store with text and image embeddings in separate collections
import qdrant_client
from llama_index.core import SimpleDirectoryReader
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.embeddings.nomic import NomicEmbedding
# Create a local Qdrant vector store
client = qdrant_client.QdrantClient(path="qdrant_db")
text_store = QdrantVectorStore(
    client=client, collection_name="text_collection"
)
image_store = QdrantVectorStore(
    client=client, collection_name="image_collection"
)
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)
embedding_model = NomicEmbedding(
    model_name="nomic-embed-text-v1.5",
    vision_model_name="nomic-embed-vision-v1.5",
)

# Create the MultiModal index
documents = SimpleDirectoryReader("./data_wiki/").load_data()
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embedding_model,
    image_embed_model=embedding_model,
)
/Users/zach/Library/Caches/pypoetry/virtualenvs/llama-index-cFuSqcva-py3.12/lib/python3.12/site-packages/PIL/Image.py:3218: DecompressionBombWarning: Image size (101972528 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack. warnings.warn(
Plot the images downloaded from Wikipedia
from PIL import Image
import matplotlib.pyplot as plt
import os
def plot_images(image_metadata_dict):
    original_images_urls = []
    images_shown = 0
    for image_id in image_metadata_dict:
        img_path = image_metadata_dict[image_id]["img_path"]
        if os.path.isfile(img_path):
            filename = image_metadata_dict[image_id]["filename"]
            image = Image.open(img_path).convert("RGB")

            plt.subplot(9, 9, len(original_images_urls) + 1)
            plt.imshow(image)
            plt.xticks([])
            plt.yticks([])

            original_images_urls.append(filename)
            images_shown += 1
            if images_shown >= 81:
                break

    plt.tight_layout()


plot_images(image_metadata_dict)
def plot_images(image_paths):
    images_shown = 0
    plt.figure(figsize=(16, 9))
    for img_path in image_paths:
        if os.path.isfile(img_path):
            image = Image.open(img_path)

            plt.subplot(2, 3, images_shown + 1)
            plt.imshow(image)
            plt.xticks([])
            plt.yticks([])

            images_shown += 1
            # Stop once the 2 x 3 grid is full
            if images_shown >= 6:
                break
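The helper above uses a fixed subplot grid. If the number of images to show varies, the grid could instead be derived from the image count. The function below is a small sketch of that calculation; `grid_shape` and `max_cols` are hypothetical names, not matplotlib parameters.

```python
import math


def grid_shape(n_images, max_cols=3):
    """Return (rows, cols) for a subplot grid that fits n_images."""
    if n_images <= 0:
        return (0, 0)
    cols = min(n_images, max_cols)
    rows = math.ceil(n_images / cols)
    return (rows, cols)


print(grid_shape(5))
```

The resulting pair could be passed as the first two arguments of `plt.subplot`, so the grid always has at least as many cells as images.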
Get multimodal retrieval results for some example queries
test_query = "Who are the band members in BTS?"
# generate retrieval results
retriever = index.as_retriever(similarity_top_k=3, image_similarity_top_k=5)
retrieval_results = retriever.retrieve(test_query)
from llama_index.core.response.notebook_utils import display_source_node
from llama_index.core.schema import ImageNode
retrieved_image = []
for res_node in retrieval_results:
    if isinstance(res_node.node, ImageNode):
        retrieved_image.append(res_node.node.metadata["file_path"])
    else:
        display_source_node(res_node, source_length=200)
plot_images(retrieved_image)
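**Node ID:** 57e904ab-803b-4bf0-8d39-d4c07b80fa7a
**Similarity:** 0.8063886499053818
**Text:** BTS (Korean: 방탄소년단; RR: Bangtan Sonyeondan; lit. Bulletproof Boy Scouts), also known as the Bangtan Boys, is a South Korean boy band formed in 2010. The band consists of Jin, Suga, J-Hope, RM, Jimi...
**Node ID:** 2deb16e2-d4a6-4725-9a9d-e72c910885c3
**Similarity:** 0.7790615531161136
**Text:** === Philanthropy ===
BTS are known for their philanthropic work. Several members of the band have been inducted into prestigious donation clubs, such as the UNICEF Honors Club and the Green N...
**Node ID:** d80dd35c-be67-4226-b0b8-fbff4981a3cf
**Similarity:** 0.7593813810748964
**Text:** == Name == BTS stands for the Korean phrase Bangtan Sonyeondan (Korean: 방탄소년단; Hanja: 防弹少年团), which translates literally to "Bulletproof Boy Scouts". According to member J-Hope, the name signifies...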
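The loop that separates image nodes from text nodes is repeated for every query in this section. It could be factored into a small helper, sketched below. The function `split_retrieval_results` is hypothetical; it takes the image node class as a parameter (in this notebook that would be `ImageNode`) so the sketch itself does not depend on LlamaIndex, and the two stand-in classes exist only to make the example runnable on its own.

```python
def split_retrieval_results(results, image_node_type):
    """Split results into image file paths and the remaining (text) results."""
    image_paths, text_results = [], []
    for res in results:
        if isinstance(res.node, image_node_type):
            image_paths.append(res.node.metadata["file_path"])
        else:
            text_results.append(res)
    return image_paths, text_results


# Stand-in classes for illustration only; they mimic the attributes used above.
class FakeImageNode:
    def __init__(self, file_path):
        self.metadata = {"file_path": file_path}


class FakeTextNode:
    metadata = {}


class FakeResult:
    def __init__(self, node):
        self.node = node


demo_results = [
    FakeResult(FakeTextNode()),
    FakeResult(FakeImageNode("data_wiki/1.jpg")),
    FakeResult(FakeImageNode("data_wiki/2.jpg")),
]
demo_paths, demo_texts = split_retrieval_results(demo_results, FakeImageNode)
print(demo_paths)
```

In the notebook, the call would be `split_retrieval_results(retrieval_results, ImageNode)`, followed by `display_source_node` for each text result and `plot_images` for the image paths.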
test_query = "What are Vincent van Gogh's famous paintings"
# generate retrieval results
retriever = index.as_retriever(similarity_top_k=3, image_similarity_top_k=5)
retrieval_results = retriever.retrieve(test_query)
retrieved_image = []
for res_node in retrieval_results:
    if isinstance(res_node.node, ImageNode):
        retrieved_image.append(res_node.node.metadata["file_path"])
    else:
        display_source_node(res_node, source_length=200)
plot_images(retrieved_image)
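**Node ID:** e385577c-b150-4ead-9758-039461125962
**Similarity:** 0.83218262953011
**Text:** Vincent Willem van Gogh (Dutch: [ˈvɪnsɛnt ˈʋɪləɱ‿vɑŋ‿ˈɣɔx]; 30 March 1853 – 29 July 1890) was a Dutch Post-Impressionist painter who is among the most famous and influential figures in history...
**Node ID:** a3edf96b-47ca-48ec-969f-d3a47febd539
**Similarity:** 0.8288469749568774
**Text:** The novel and the 1956 film further enhanced his fame, especially in the United States, where Stone surmised that only a few hundred people had heard of Van Gogh before his surprise best-selling book....
**Node ID:** 4e8de603-dac6-4ead-8851-85b4526ac8ca
**Similarity:** 0.8060470396548032
**Text:** In January 1890, ten paintings were shown at the Société des Artistes Indépendants in Brussels. French president Marie François Sadi Carnot was said to have been impressed by Van Gogh's work. After...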
test_query = "What are the popular tourist attraction in San Francisco"
# generate retrieval results
retriever = index.as_retriever(similarity_top_k=3, image_similarity_top_k=5)
retrieval_results = retriever.retrieve(test_query)
retrieved_image = []
for res_node in retrieval_results:
    if isinstance(res_node.node, ImageNode):
        retrieved_image.append(res_node.node.metadata["file_path"])
    else:
        display_source_node(res_node, source_length=200)
plot_images(retrieved_image)
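**Node ID:** c2b89622-c61a-4b70-bbc1-1b3708464426
**Similarity:** 0.7699549146961432
**Text:** As of September 2023, San Francisco was ranked fifth in the world and second in the United States on the Global Financial Centres Index. Despite a continuing exodus of businesses from the downtown area, ...
**Node ID:** 0363c291-80d0-4766-85b6-02407b46e8e1
**Similarity:** 0.7672793963976988
**Text:** However, by 2016, San Francisco was rated low by small businesses in a Business Friendliness Survey.
Like many U.S. cities, San Francisco once had a significant manufacturing sector employing nearly...
**Node ID:** 676c2719-7da8-4044-aa70-f84b8e45281e
**Similarity:** 0.7605001448191087
**Text:** == Parks and recreation ==
Several of San Francisco's parks and nearly all of its beaches are part of the Golden Gate National Recreation Area, one of the most visited national recreation areas...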
test_query = "Which company makes Tesla"
# generate retrieval results
retriever = index.as_retriever(similarity_top_k=3, image_similarity_top_k=5)
retrieval_results = retriever.retrieve(test_query)
retrieved_image = []
for res_node in retrieval_results:
    if isinstance(res_node.node, ImageNode):
        retrieved_image.append(res_node.node.metadata["file_path"])
    else:
        display_source_node(res_node, source_length=200)
plot_images(retrieved_image)
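**Node ID:** 63c77d12-3420-4c1c-bc35-edcf968238c0
**Similarity:** 0.7183866127180777
**Text:** The Tesla Model S is a battery-electric executive sedan with a liftback body style, built by Tesla, Inc. since 2012. The Model S features a battery-powered dual-motor, all-wheel-drive layout, although...
**Node ID:** 6e95a173-44b6-4837-b424-86ce223ce801
**Similarity:** 0.7103282638750231
**Text:** === Retail sales model ===
Tesla sells its cars directly to consumers without a dealer network, unlike other manufacturers and contrary to the legislation of many states, which requires one. To support its sales model...
**Node ID:** 30fe5ba5-7790-44d4-a1ac-17d5ffff6e70
**Similarity:** 0.7057133871456653
**Text:** === Sales by country ===
==== Asia/Pacific ====
The first nine Australian Model S units were delivered in Sydney on December 9, 2014. Tesla opened its first store and service centre in St Leonards, and...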
test_query = "what is the main character in Batman"
# generate retrieval results
retriever = index.as_retriever(similarity_top_k=3, image_similarity_top_k=5)
retrieval_results = retriever.retrieve(test_query)
retrieved_image = []
for res_node in retrieval_results:
    if isinstance(res_node.node, ImageNode):
        retrieved_image.append(res_node.node.metadata["file_path"])
    else:
        display_source_node(res_node, source_length=200)
plot_images(retrieved_image)
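**Node ID:** 9df946c8-2d86-43ef-ad49-52d02fc9ca9f
**Similarity:** 0.813633584027285
**Text:** Batman is a superhero appearing in American comic books published by DC Comics. The character was created by artist Bob Kane and writer Bill Finger, and debuted in the 27th issue of the comic book...
**Node ID:** cd23d57f-1baa-4b64-98e8-f137437f1977
**Similarity:** 0.8057558559295224
**Text:** ==== Personality ==== Batman's primary character traits can be summarized as "wealth; physical prowess; deductive abilities and obsession". The details and tone of Batman comic books have varied...
**Node ID:** 5e49c94a-54de-493b-a31e-5cf3567a96cb
**Similarity:** 0.7948625863921873
**Text:** == Characterization ==
=== Bruce Wayne ===
Batman's secret identity is Bruce Wayne, a wealthy American industrialist. As a child, Bruce witnessed his parents, Dr. Thomas Wayne and...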
Multimodal RAG with Claude 3
With Nomic Embed and Claude 3, we can now perform multimodal RAG! The retrieved images and text are passed to Claude 3 for reasoning.
from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal
query_engine = index.as_query_engine(
    llm=AnthropicMultiModal(), similarity_top_k=2, image_similarity_top_k=1
)
response = query_engine.query(
    "What are Vincent van Gogh's famous paintings and popular subjects?"
)
print(str(response))
Based on the provided context, some of Vincent van Gogh's most famous paintings and popular subjects include:
- Landscapes, still lifes, portraits, and self-portraits characterized by bold colors and dramatic brushwork. This contributed to the rise of expressionism in modern art.
- In his early works, he depicted mostly still lifes and peasant laborers.
- After moving to Arles in southern France in 1888, his paintings grew brighter and he turned his attention to depicting the natural world, including local olive groves, wheat fields and sunflowers.
- Some of his most expensive paintings that have sold for over $100 million (in today's equivalent prices) include Portrait of Dr Gachet, Portrait of Joseph Roulin, and Irises.
- The Metropolitan Museum of Art acquired his painting Wheat Field with Cypresses in 1993 for $57 million.
So in summary, Van Gogh is especially well-known for his vibrant, expressive landscapes of places he lived like Arles, portraits, and still life paintings of subjects like sunflowers, olive groves and wheat fields. His bold use of color and thick, dramatic brushstrokes were highly influential on later art movements.