Multi-Modal LLM using Mistral `pixtral-large` for image reasoning¶

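In this notebook, we show how to use the MistralAI MultiModal LLM class/abstraction for image understanding/reasoning.

We demonstrate the following functions supported by the MistralAI Pixtral multi-modal LLM:

- complete (both sync and async): for a single prompt and a list of images
- stream complete (both sync and async): for streaming the output of complete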

In [ ]

%pip install llama-index-multi-modal-llms-mistralai
%pip install matplotlib

In [ ]

import os
from IPython.display import Markdown, display

os.environ[
    "MISTRAL_API_KEY"
] = "<YOUR API KEY>"  # Your MistralAI API token here
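
If the placeholder above is left in place, the failure only shows up later as an authentication error on the first request. The following is a minimal sketch of an early check, not part of the library: `require_api_key` is a hypothetical helper name, and treating the literal placeholder string as "not set" is our assumption.

```python
import os


def require_api_key(name: str = "MISTRAL_API_KEY") -> str:
    """Return the key stored in the environment, or fail with a clear message.

    Hypothetical helper for illustration; it is not provided by llama-index.
    """
    value = os.environ.get(name, "").strip()
    # Treat both an empty value and the notebook's placeholder as "not set".
    if not value or value == "<YOUR API KEY>":
        raise RuntimeError(
            f"Set the {name} environment variable before creating the LLM."
        )
    return value
```

Calling `require_api_key()` once before constructing the model makes a missing key fail fast and with an explicit message.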

Initialize `MistralAIMultiModal`¶

In [ ]

from llama_index.multi_modal_llms.mistralai import MistralAIMultiModal

mistralai_mm_llm = MistralAIMultiModal(
    model="pixtral-large-latest", max_new_tokens=1000
)

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/ravithejad/Desktop/llamaindex/lib/python3.9/sit
[nltk_data]     e-packages/llama_index/core/_static/nltk_cache...
[nltk_data]   Package stopwords is already up-to-date!
/Users/ravithejad/Desktop/llamaindex/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(

Load Images from URLs¶

In [ ]

from llama_index.core.multi_modal_llms.generic_utils import load_image_urls

image_urls = [
    "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg",
    "https://cdn.statcdn.com/Infographic/images/normal/30322.jpeg",
]

image_documents = load_image_urls(image_urls)

First Image¶

In [ ]

from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}
img_response = requests.get(image_urls[0], headers=headers)

print(image_urls[0])

img = Image.open(BytesIO(img_response.content))
plt.imshow(img)

https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg

Out [ ]

<matplotlib.image.AxesImage at 0x2c2566520>
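
The cell above hands whatever the server returned straight to `Image.open`. If the host rejects the request and sends back an HTML error page, the failure appears as a PIL decoding error rather than a download error. Below is a minimal sketch of a sanity check on the downloaded bytes; `looks_like_image` is a hypothetical helper, and it only recognises the JPEG and PNG signatures.

```python
# File signatures ("magic numbers") for the two formats checked here.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def looks_like_image(data: bytes) -> bool:
    """Return True if the payload starts with a JPEG or PNG signature.

    Hypothetical helper for illustration; other formats are not recognised.
    """
    return data.startswith(JPEG_MAGIC) or data.startswith(PNG_MAGIC)


# Example with in-memory payloads (no network access needed):
print(looks_like_image(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # JPEG-like header
print(looks_like_image(b"<html><body>403 Forbidden</body></html>"))
```

In the notebook, this could be combined with `img_response.raise_for_status()` from `requests` and a check such as `looks_like_image(img_response.content)` before calling `Image.open`.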

Second Image¶

In [ ]

img_response = requests.get(image_urls[1], headers=headers)

print(image_urls[1])

img = Image.open(BytesIO(img_response.content))
plt.imshow(img)

https://cdn.statcdn.com/Infographic/images/normal/30322.jpeg

Out [ ]

<matplotlib.image.AxesImage at 0x2c26940a0>

Complete a prompt with a set of images¶

In [ ]

complete_response = mistralai_mm_llm.complete(
    prompt="Describe the images as an alternative text in a few words",
    image_documents=image_documents,
)

In [ ]

display(Markdown(f"{complete_response}"))

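The first image shows the Eiffel Tower in Paris covered in snow, with the trees and pathway also covered in snow.

The second image is an infographic titled "France's Social Divide," comparing socio-economic indicators between disadvantaged areas and France as a whole. Key points include:

- 33.5% of people in disadvantaged areas are part of the working class, compared to 14.5% in the whole of France.
- The unemployment rate is 18.1% in disadvantaged areas versus 7.3% nationally.
- 25.2% of 16-25 year-olds in disadvantaged areas are not in school and unemployed, compared to 12.9% nationally.
- The median monthly income is €1,168 in disadvantaged areas and €1,822 nationally.
- The poverty rate is 43.3% in disadvantaged areas and 15.5% nationally.
- 22.0% of households in disadvantaged areas live in overcrowded housing, compared to 8.7% nationally.

The data sources include Insee, ONPV, DARES, and Observatoire des Inégalités, with the most recent available data from 2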

Stream complete a prompt with a set of images¶

In [ ]

stream_complete_response = mistralai_mm_llm.stream_complete(
    prompt="give me more context for this images in a few words",
    image_documents=image_documents,
)

In [ ]

for r in stream_complete_response:
    print(r.delta, end="")

The images highlight the socio-economic disparities in France, particularly between disadvantaged areas and the country as a whole. The first image shows the Eiffel Tower in a snowy landscape, symbolizing France. The second image is an infographic comparing various socio-economic indicators between disadvantaged areas and the entire nation. Key points include:

1. **Working-Class Participation**: Only 33.5% of people in disadvantaged areas are part of the working class, compared to 45.5% nationally.
2. **Unemployment Rate**: The unemployment rate in disadvantaged areas is significantly higher at 18.1%, compared to 7.3% nationally.
3. **Youth Not in School or Employed**: 25.2% of youth aged 16-25 in disadvantaged areas are neither in school nor employed, compared to 12.9% nationally.
4. **Median Monthly Income**: The median monthly income in disadvantaged areas is €1,168, much lower than the national median of €1,822.
5. **Poverty Rate**: The poverty rate in disadvantaged areas is 43.3%, compared to 15.5% nationally.
6. **Overcrowded Housing**: 22% of
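
The loop above prints each delta as it arrives but does not keep the text. If the full response is needed afterwards, the deltas can be accumulated while printing. The following is a minimal sketch that runs without an API call: `FakeChunk` and `fake_stream` are hypothetical stand-ins that model only the `delta` attribute used in the loop, and guarding against a `None` delta is our assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class FakeChunk:
    """Stand-in for a streamed response chunk; only `delta` is modelled."""

    delta: Optional[str]


def fake_stream() -> Iterator[FakeChunk]:
    """Hypothetical replacement for the generator returned by stream_complete."""
    for piece in ["The images ", "highlight ", "disparities."]:
        yield FakeChunk(delta=piece)


def collect_stream(chunks: Iterable[FakeChunk]) -> str:
    """Print deltas as they arrive and return the concatenated text."""
    parts = []
    for chunk in chunks:
        text = chunk.delta or ""  # tolerate a missing delta
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

With the real model, `collect_stream(stream_complete_response)` would take the place of the bare printing loop.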

Async Complete¶

In [ ]

response_acomplete = await mistralai_mm_llm.acomplete(
    prompt="Describe the images as an alternative text in a few words",
    image_documents=image_documents,
)

In [ ]

display(Markdown(f"{response_acomplete}"))

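The first image shows the Eiffel Tower in Paris, France, covered in snow, with the trees and pathway also covered in snow.

The second image is an infographic titled "France's Social Divide," comparing socio-economic indicators between disadvantaged areas and France as a whole. Key points include:

- Percentage of people who are part of the working class: 33.5% in disadvantaged areas versus 14.5% in the whole of France.
- Unemployment rate: 18.1% in disadvantaged areas versus 7.3% in the whole of France.
- Percentage of 16-25 year-olds in disadvantaged areas who are not in school and unemployed: 25.2%, versus 12.9% in the whole of France.
- Median monthly income: €1,168 in disadvantaged areas versus €1,822 in the whole of France.
- Poverty rate: 43.3% in disadvantaged areas versus 15.5% in the whole of France.
- Households living in overcrowded housing: 22.0% in disadvantaged areas versus 8.7% in the whole of France.

The data sources are Insee, ONPV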
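
The bare `await` in the cell above works because a notebook already runs an event loop. In a plain Python script, the coroutine has to be driven explicitly, for example with `asyncio.run`. The sketch below shows that pattern and also issues two requests concurrently with `asyncio.gather`. `fake_acomplete` is a hypothetical stand-in for `mistralai_mm_llm.acomplete(...)`, so no network call is made.

```python
import asyncio
from typing import List


async def fake_acomplete(prompt: str) -> str:
    """Hypothetical stand-in for an async completion call."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    return f"response to: {prompt}"


async def main() -> List[str]:
    # gather runs both coroutines concurrently and keeps the input order.
    return await asyncio.gather(
        fake_acomplete("Describe the first image"),
        fake_acomplete("Describe the second image"),
    )


# In a script (not a notebook), asyncio.run starts and closes the event loop.
results = asyncio.run(main())
for line in results:
    print(line)
```

Inside a notebook, `await main()` would be used instead, since `asyncio.run` cannot be called from an already running event loop.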

Async Stream Complete¶

In [ ]

response_astream_complete = await mistralai_mm_llm.astream_complete(
    prompt="Describe the images as an alternative text in a few words",
    image_documents=image_documents,
)

In [ ]

async for delta in response_astream_complete:
    print(delta.delta, end="")

The first image shows the Eiffel Tower in Paris, France, covered in snow with trees and a pathway also covered in snow.

The second image is an infographic titled "France's Social Divide," comparing socio-economic indicators between disadvantaged areas and France as a whole. Key points include:
- 33.5% of people in disadvantaged areas are part of the working class, compared to 14.5% in the whole of France.
- The unemployment rate is 18.1% in disadvantaged areas versus 7.3% in the whole of France.
- 25.2% of 16-25 year-olds in disadvantaged areas are not in school and unemployed, compared to 12.9% in the whole of France.
- The median monthly income is €1,168 in disadvantaged areas and €1,822 in the whole of France.
- The poverty rate is 43.3% in disadvantaged areas and 15.5% in the whole of France.
- 22.0% of households in disadvantaged areas live in overcrowded housing, compared to 8.7% in the whole of France.

The data sources include Insee, ONPV, DARES, and Observatoire des In
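
The async variant has two steps: the call is awaited to obtain an async generator, and that generator is then consumed with `async for`. The following is a minimal sketch of the same two-step shape that also keeps the streamed text. `FakeChunk` and `fake_astream_complete` are hypothetical stand-ins that model only the `delta` attribute.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class FakeChunk:
    """Stand-in for a streamed response chunk; only `delta` is modelled."""

    delta: Optional[str]


async def fake_astream_complete() -> AsyncIterator[FakeChunk]:
    """Hypothetical stand-in: awaiting it returns an async generator."""

    async def gen() -> AsyncIterator[FakeChunk]:
        for piece in ["The first image ", "shows ", "the Eiffel Tower."]:
            await asyncio.sleep(0)
            yield FakeChunk(delta=piece)

    return gen()


async def collect_async() -> str:
    stream = await fake_astream_complete()  # step 1: obtain the generator
    parts = []
    async for chunk in stream:  # step 2: consume it chunk by chunk
        parts.append(chunk.delta or "")
    return "".join(parts)


async_text = asyncio.run(collect_async())
print(async_text)
```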

Complete with Two Images¶

In [ ]

image_urls = [
    "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg",
    "https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg",
]

Let's inspect the images.¶

First Image¶

In [ ]

img_response = requests.get(image_urls[0], headers=headers)

print(image_urls[0])

img = Image.open(BytesIO(img_response.content))
plt.imshow(img)

https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg

Out [ ]

<matplotlib.image.AxesImage at 0x2c4d4c280>

Second Image¶

In [ ]

img_response = requests.get(image_urls[1], headers=headers)

print(image_urls[1])

img = Image.open(BytesIO(img_response.content))
plt.imshow(img)

https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg

Out [ ]

<matplotlib.image.AxesImage at 0x2c510d850>

In [ ]

image_documents_compare = load_image_urls(image_urls)

response_multi = mistralai_mm_llm.complete(
    prompt="What are the differences between two images?",
    image_documents=image_documents_compare,
)

In [ ]

display(Markdown(f"{response_multi}"))

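The first image shows the Eiffel Tower covered in snow in winter, with snow on the ground and trees; the second image shows a tennis court with a crowd of people watching a match. There are no people in the first image, while there are many in the second. The first image was taken during the day, while the second was taken at night. The first image has a park bench, while the second has a tennis court. The first image has a fence, while the second has a stadium.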

Load Images from Local Files¶

In [ ]

!wget 'https://www.boredpanda.com/blog/wp-content/uploads/2022/11/interesting-receipts-102-6364c8d181c6a__700.jpg' -O 'receipt.jpg'

--2024-11-18 21:51:11--  https://www.boredpanda.com/blog/wp-content/uploads/2022/11/interesting-receipts-102-6364c8d181c6a__700.jpg
Resolving www.boredpanda.com (www.boredpanda.com)... 18.161.111.92, 18.161.111.28, 18.161.111.66, ...
Connecting to www.boredpanda.com (www.boredpanda.com)|18.161.111.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 112631 (110K) [image/jpeg]
Saving to: ‘receipt.jpg’

receipt.jpg         100%[===================>] 109.99K   450KB/s    in 0.2s    

2024-11-18 21:51:16 (450 KB/s) - ‘receipt.jpg’ saved [112631/112631]

In [ ]

from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("./receipt.jpg")
plt.imshow(img)

Out [ ]

<matplotlib.image.AxesImage at 0x2c51a5a60>

In [ ]

from llama_index.core import SimpleDirectoryReader

# list the path(s) to your local image file(s) here
image_documents = SimpleDirectoryReader(
    input_files=["./receipt.jpg"]
).load_data()

response = mistralai_mm_llm.complete(
    prompt="Transcribe the text in the image",
    image_documents=image_documents,
)
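
The cell above lists a single file explicitly. When a folder holds several images, the list for `input_files` can be built programmatically. Below is a minimal sketch using only the standard library; `find_images` and the set of accepted suffixes are our assumptions, not part of llama-index.

```python
from pathlib import Path
from typing import List

# Suffixes treated as images in this sketch (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(directory: str) -> List[str]:
    """Return the sorted paths of image files directly inside `directory`.

    Hypothetical helper for illustration; subdirectories are not searched.
    """
    root = Path(directory)
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

The result could then be passed as `SimpleDirectoryReader(input_files=find_images("./images"))`, where `./images` is a placeholder directory name.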

In [ ]

display(Markdown(f"{response}"))

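Sure, here is the transcription of the text in the image:

Dine-In

Cashier: Raul April 2, 2022 5:01:56P

1 Empanada - Beef $3.00
1 Empanada - Cheese $3.00
1 Empanada - Chicken $3.00
1 Tallarin Huancaina Lomo Saltado $19.99
1 1/2 Pisco Sour $15.00

Subtotal $43.99
Local Taxes 5.5% $2.42

Total $46.41

Immigrants make America great. They also cooked your food and served you today. God bless you.

Online: https://clover.com/r/D0BQZ3R656MDC

Order D0BQZ3R656MDC

Clover Privacy Policy https://clover.com/privacy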
