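Multi-Modal LLM using Mistral `pixtral-large` for image reasoning¶
In this notebook, we show how to use the MistralAI MultiModal LLM class/abstraction for image understanding and reasoning.
We demonstrate the following functions supported by the MistralAI Pixtral multi-modal LLM:
- `complete` (sync and async): for a single prompt and a list of images
- `stream_complete` (sync and async): for streaming the output of `complete`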
%pip install llama-index-multi-modal-llms-mistralai
%pip install matplotlib
import os
from IPython.display import Markdown, display
os.environ["MISTRAL_API_KEY"] = "<YOUR API KEY>"  # Your MistralAI API token here
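A key written directly into a notebook cell is easy to commit by accident. The following is a minimal sketch, assuming the key has already been exported in the shell that launched the notebook. `require_api_key` is a hypothetical helper written for this illustration; it is not part of LlamaIndex or the Mistral SDK.

```python
import os


def require_api_key(name="MISTRAL_API_KEY", env=None):
    """Return the API key stored under `name`, or raise a clear error.

    Hypothetical helper for illustration. `env` defaults to os.environ and
    is a parameter only so the function can be exercised without touching
    the real environment.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # Treat the notebook's placeholder string as "not set" as well.
    if not key or key == "<YOUR API KEY>":
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key
```

Calling `require_api_key()` near the top of the notebook fails early with a readable message instead of surfacing later as an authentication error.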
Initialize MistralAIMultiModal¶
from llama_index.multi_modal_llms.mistralai import MistralAIMultiModal
mistralai_mm_llm = MistralAIMultiModal(
    model="pixtral-large-latest", max_new_tokens=1000
)
Load images from URLs¶
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
image_urls = [
    "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg",
    "https://cdn.statcdn.com/Infographic/images/normal/30322.jpeg",
]
image_documents = load_image_urls(image_urls)
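A malformed entry in the URL list only shows up later as a failed model call, so it can help to screen the list first. This is a rough sketch using only the standard library. `looks_like_image_url` and the extension list are assumptions made for this example, not checks that LlamaIndex or the Mistral API perform, and the third URL is a made-up negative case.

```python
from urllib.parse import urlparse

# Extensions treated as images here; an assumption for this sketch, not a
# list taken from LlamaIndex or the Mistral API.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def looks_like_image_url(url):
    """Return True for http(s) URLs whose path ends in a known image extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)


candidate_urls = [
    "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg",
    "https://cdn.statcdn.com/Infographic/images/normal/30322.jpeg",
    "ftp://example.com/not-supported.png",  # hypothetical bad entry
]
valid_urls = [u for u in candidate_urls if looks_like_image_url(u)]
print(valid_urls)
```

The check is purely syntactic: it does not confirm that the URL is reachable or that the server actually returns an image.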
First image¶
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}
img_response = requests.get(image_urls[0], headers=headers)
print(image_urls[0])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)
https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg
<matplotlib.image.AxesImage at 0x2c2566520>
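Some hosts answer unfamiliar clients with an HTML error page, in which case `Image.open` fails with a less obvious message. A small sketch of a pre-check on the downloaded bytes follows. `sniff_image_format` is a hypothetical helper written for this example, and it recognises only a handful of common formats by their leading bytes.

```python
def sniff_image_format(data):
    """Guess the image format from the leading bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

In the cell above, this could be called as `sniff_image_format(img_response.content)` before handing the bytes to `Image.open`; a `None` result suggests the response was not an image.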
Second image¶
img_response = requests.get(image_urls[1], headers=headers)
print(image_urls[1])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)
https://cdn.statcdn.com/Infographic/images/normal/30322.jpeg
<matplotlib.image.AxesImage at 0x2c26940a0>
Complete a prompt with a set of images¶
complete_response = mistralai_mm_llm.complete(
    prompt="Describe the images as an alternative text in a few words",
    image_documents=image_documents,
)
display(Markdown(f"{complete_response}"))
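The first image shows the snow-covered Eiffel Tower in Paris, with the trees and pathway also covered in snow.
The second image is an infographic titled "France's Social Divide," comparing socio-economic indicators between disadvantaged areas and France as a whole. Key points include:
- 33.5% of people in disadvantaged areas are part of the working class, compared to 14.5% in the whole of France.
- The unemployment rate is 18.1% in disadvantaged areas versus 7.3% nationally.
- 25.2% of 16-25 year-olds in disadvantaged areas are not in school and unemployed, compared to 12.9% nationally.
- The median monthly income is €1,168 in disadvantaged areas versus €1,822 nationally.
- The poverty rate is 43.3% in disadvantaged areas versus 15.5% nationally.
- 22.0% of households in disadvantaged areas live in overcrowded housing, compared to 8.7% nationally.
The data sources include Insee, ONPV, DARES, and Observatoire des Inégalités, with the latest available data from 2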
Stream-complete a prompt with a set of images¶
stream_complete_response = mistralai_mm_llm.stream_complete(
    prompt="give me more context for this images in a few words",
    image_documents=image_documents,
)
for r in stream_complete_response:
    print(r.delta, end="")
The images highlight the socio-economic disparities in France, particularly between disadvantaged areas and the country as a whole. The first image shows the Eiffel Tower in a snowy landscape, symbolizing France. The second image is an infographic comparing various socio-economic indicators between disadvantaged areas and the entire nation. Key points include:
1. **Working-Class Participation**: Only 33.5% of people in disadvantaged areas are part of the working class, compared to 45.5% nationally.
2. **Unemployment Rate**: The unemployment rate in disadvantaged areas is significantly higher at 18.1%, compared to 7.3% nationally.
3. **Youth Not in School or Employed**: 25.2% of youth aged 16-25 in disadvantaged areas are neither in school nor employed, compared to 12.9% nationally.
4. **Median Monthly Income**: The median monthly income in disadvantaged areas is €1,168, much lower than the national median of €1,822.
5. **Poverty Rate**: The poverty rate in disadvantaged areas is 43.3%, compared to 15.5% nationally.
6. **Overcrowded Housing**: 22% of
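The loop above prints each piece as it arrives but does not keep the full text. A minimal sketch of collecting the deltas while still printing them is shown here. `FakeChunk` and `fake_stream` are stand-ins invented for this example so that it runs without an API key; the only thing assumed about real chunks is the `.delta` attribute already used in the loop above.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Stand-in for a streamed response chunk; only `.delta` is modelled."""

    delta: str


def fake_stream():
    """Hypothetical generator that mimics a streaming response."""
    for piece in ["The first image ", "shows the ", "Eiffel Tower."]:
        yield FakeChunk(delta=piece)


def collect_stream(chunks):
    """Print each delta as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        piece = chunk.delta or ""  # guard against an empty or missing delta
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

With a real model, `collect_stream(stream_complete_response)` would be the equivalent call, assuming the chunks expose `.delta` as they do in this notebook.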
Async complete¶
response_acomplete = await mistralai_mm_llm.acomplete(
    prompt="Describe the images as an alternative text in a few words",
    image_documents=image_documents,
)
display(Markdown(f"{response_acomplete}"))
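The first image shows the snow-covered Eiffel Tower in Paris, France, with the trees and pathway also covered in snow.
The second image is an infographic titled "France's Social Divide," comparing socio-economic indicators between disadvantaged areas and France as a whole. Key points include:
- Percentage of people who are part of the working class: 33.5% in disadvantaged areas, 14.5% in the whole of France.
- Unemployment rate: 18.1% in disadvantaged areas, 7.3% in the whole of France.
- Percentage of 16-25 year-olds not in school and unemployed: 25.2% in disadvantaged areas, 12.9% in the whole of France.
- Median monthly income: €1,168 in disadvantaged areas, €1,822 in the whole of France.
- Poverty rate: 43.3% in disadvantaged areas, 15.5% in the whole of France.
- Households living in overcrowded housing: 22.0% in disadvantaged areas, 8.7% in the whole of France.
The data sources are Insee, ONPV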
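Top-level `await` works in a notebook because IPython runs the cell inside an event loop. In a plain Python script the coroutine has to be driven explicitly, for example with `asyncio.run`. The sketch below shows that pattern and also issues two requests concurrently with `asyncio.gather`. `fake_acomplete` is a hypothetical stand-in for an async completion call, used so the example runs offline.

```python
import asyncio


async def fake_acomplete(prompt):
    """Hypothetical stand-in for an async completion call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"response to: {prompt}"


async def main():
    prompts = ["Describe the first image", "Describe the second image"]
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_acomplete(p) for p in prompts))


results = asyncio.run(main())
print(results)
```

Note that `asyncio.run` cannot be called from inside an already running loop, so this form is meant for scripts rather than notebook cells.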
Async stream complete¶
response_astream_complete = await mistralai_mm_llm.astream_complete(
    prompt="Describe the images as an alternative text in a few words",
    image_documents=image_documents,
)
async for delta in response_astream_complete:
    print(delta.delta, end="")
The first image shows the Eiffel Tower in Paris, France, covered in snow with trees and a pathway also covered in snow.
The second image is an infographic titled "France's Social Divide," comparing socio-economic indicators between disadvantaged areas and France as a whole. Key points include:
- 33.5% of people in disadvantaged areas are part of the working class, compared to 14.5% in the whole of France.
- The unemployment rate is 18.1% in disadvantaged areas versus 7.3% in the whole of France.
- 25.2% of 16-25 year-olds in disadvantaged areas are not in school and unemployed, compared to 12.9% in the whole of France.
- The median monthly income is €1,168 in disadvantaged areas and €1,822 in the whole of France.
- The poverty rate is 43.3% in disadvantaged areas and 15.5% in the whole of France.
- 22.0% of households in disadvantaged areas live in overcrowded housing, compared to 8.7% in the whole of France.
The data sources include Insee, ONPV, DARES, and Observatoire des In
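The async streaming loop can be wrapped in a coroutine that returns the assembled text, which is convenient when the result is needed after streaming ends. This is a sketch under the assumption that each streamed item exposes a `.delta` string, as in the loop above. `fake_astream` is a hypothetical async generator that replaces the real model call so the example runs offline.

```python
import asyncio
from types import SimpleNamespace


async def fake_astream():
    """Hypothetical async generator mimicking streamed chunks with a `.delta`."""
    for piece in ["The first image ", "shows the Eiffel Tower."]:
        await asyncio.sleep(0)  # simulate waiting for the next chunk
        yield SimpleNamespace(delta=piece)


async def collect_astream(chunks):
    """Consume an async stream and return the concatenated deltas."""
    parts = []
    async for chunk in chunks:
        parts.append(chunk.delta or "")
    return "".join(parts)


async_text = asyncio.run(collect_astream(fake_astream()))
print(async_text)
```

Inside a notebook, `await collect_astream(response_astream_complete)` would take the place of the `asyncio.run` call.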
Complete with two images¶
image_urls = [
    "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg",
    "https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg",
]
img_response = requests.get(image_urls[0], headers=headers)
print(image_urls[0])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)
https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg
<matplotlib.image.AxesImage at 0x2c4d4c280>
Second image¶
img_response = requests.get(image_urls[1], headers=headers)
print(image_urls[1])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)
https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg
<matplotlib.image.AxesImage at 0x2c510d850>
image_documents_compare = load_image_urls(image_urls)
response_multi = mistralai_mm_llm.complete(
    prompt="What are the differences between two images?",
    image_documents=image_documents_compare,
)
display(Markdown(f"{response_multi}"))
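The first image shows the Eiffel Tower covered in snow in winter, with snow on the ground and trees, while the second image shows a tennis court with a crowd of people watching a match. There are no people in the first image, while there are many in the second. The first image was taken during the day, while the second was taken at night. The first image has a park bench, while the second has a tennis court. The first image has a fence, while the second has a stadium.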
Load images from local files¶
!wget 'https://www.boredpanda.com/blog/wp-content/uploads/2022/11/interesting-receipts-102-6364c8d181c6a__700.jpg' -O 'receipt.jpg'
--2024-11-18 21:51:11-- https://www.boredpanda.com/blog/wp-content/uploads/2022/11/interesting-receipts-102-6364c8d181c6a__700.jpg Resolving www.boredpanda.com (www.boredpanda.com)... 18.161.111.92, 18.161.111.28, 18.161.111.66, ... Connecting to www.boredpanda.com (www.boredpanda.com)|18.161.111.92|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 112631 (110K) [image/jpeg] Saving to: ‘receipt.jpg’ receipt.jpg 100%[===================>] 109.99K 450KB/s in 0.2s 2024-11-18 21:51:16 (450 KB/s) - ‘receipt.jpg’ saved [112631/112631]
from PIL import Image
import matplotlib.pyplot as plt
img = Image.open("./receipt.jpg")
plt.imshow(img)
<matplotlib.image.AxesImage at 0x2c51a5a60>
from llama_index.core import SimpleDirectoryReader
# put the path(s) to your local image file(s) here
image_documents = SimpleDirectoryReader(
    input_files=["./receipt.jpg"]
).load_data()
response = mistralai_mm_llm.complete(
    prompt="Transcribe the text in the image",
    image_documents=image_documents,
)
display(Markdown(f"{response}"))
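Sure, here is the transcription of the text in the image:
Dine-In
Cashier: Raul April 2, 2022 5:01:56P
1 Empanada - Beef $3.00
1 Empanada - Cheese $3.00
1 Empanada - Chicken $3.00
1 Tallarin Huancaina Lomo Saltado $19.99
1 1/2 Pisco Sour $15.00
Subtotal $43.99
Local Taxes 5.5% $2.42
Total $46.41
Immigrants make America great. They also cooked your food and served you today. God bless you.
Online: https://clover.com/r/D0BQZ3R656MDC
Order D0BQZ3R656MDC
Clover Privacy Policy https://clover.com/privacy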
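A quick arithmetic check is one way to sanity-test a transcription like this, since a misread digit usually breaks the totals. The sketch below recomputes the subtotal, tax, and total from the line items as transcribed. Rounding the tax half-up to the cent is an assumption; the receipt does not state its rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP

# Line-item prices as transcribed from the receipt.
items = [
    Decimal("3.00"),   # Empanada - Beef
    Decimal("3.00"),   # Empanada - Cheese
    Decimal("3.00"),   # Empanada - Chicken
    Decimal("19.99"),  # Tallarin Huancaina Lomo Saltado
    Decimal("15.00"),  # Pisco Sour
]

subtotal = sum(items, Decimal("0"))
# 5.5% local tax, rounded half-up to the cent (rounding rule is assumed).
tax = (subtotal * Decimal("0.055")).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
total = subtotal + tax

print(subtotal, tax, total)  # prints: 43.99 2.42 46.41
```

The recomputed values match the transcribed subtotal, tax, and total, which suggests the amounts were read consistently.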