使用 OpenAI GPT-4V 模型进行图像推理¶
在本 Notebook 中,我们将展示如何使用带有 GPT4V 的 OpenAI
LLM 抽象进行图像理解/推理。
我们还将展示在与 GPT4V 一起使用时,OpenAI
LLM 类当前支持的几个函数。
complete
(同步和异步): 用于单个提示和图像列表chat
(同步和异步): 用于多个聊天消息stream complete
(同步和异步): 用于流式输出 completestream chat
(同步和异步): 用于流式输出 chat
In [ ]
已复制!
%pip install llama-index-llms-openai matplotlib
%pip install llama-index-llms-openai matplotlib
使用 GPT4V 理解来自 URL 的图像¶
In [ ]
已复制!
import os
OPENAI_API_KEY = "sk-..." # Your OpenAI API token here
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
import os OPENAI_API_KEY = "sk-..." # 您的 OpenAI API token os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
初始化 OpenAIMultiModal
并从 URL 加载图像¶
In [ ]
已复制!
from llama_index.llms.openai import OpenAI
image_urls = [
"https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg",
"https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg",
"https://i2-prod.mirror.co.uk/incoming/article7160664.ece/ALTERNATES/s1200d/FIFA-Ballon-dOr-Gala-2015.jpg",
]
openai_llm = OpenAI(model="gpt-4o", max_new_tokens=300)
from llama_index.llms.openai import OpenAI image_urls = [ "https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg", "https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg", "https://i2-prod.mirror.co.uk/incoming/article7160664.ece/ALTERNATES/s1200d/FIFA-Ballon-dOr-Gala-2015.jpg", ] openai_llm = OpenAI(model="gpt-4o", max_new_tokens=300)
In [ ]
已复制!
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
img_response = requests.get(image_urls[0])
print(image_urls[0])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)
from PIL import Image import requests from io import BytesIO import matplotlib.pyplot as plt img_response = requests.get(image_urls[0]) print(image_urls[0]) img = Image.open(BytesIO(img_response.content)) plt.imshow(img)
https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg
Out[ ]
<matplotlib.image.AxesImage at 0x11a2dc920>
要求模型描述它所看到的内容¶
In [ ]
已复制!
from llama_index.core.llms import (
ChatMessage,
ImageBlock,
TextBlock,
MessageRole,
)
msg = ChatMessage(
role=MessageRole.USER,
blocks=[
TextBlock(text="Describe the images as an alternative text"),
ImageBlock(url=image_urls[0]),
ImageBlock(url=image_urls[1]),
],
)
response = openai_llm.chat(messages=[msg])
from llama_index.core.llms import ( ChatMessage, ImageBlock, TextBlock, MessageRole, ) msg = ChatMessage( role=MessageRole.USER, blocks=[ TextBlock(text="将图像描述为替代文本"), ImageBlock(url=image_urls[0]), ImageBlock(url=image_urls[1]), ], ) response = openai_llm.chat(messages=[msg])
In [ ]
已复制!
print(response)
print(response)
assistant: **Image 1:** The Colosseum in Rome is illuminated at night with the colors of the Italian flag: green, white, and red. The ancient amphitheater stands prominently against a deep blue sky, with some clouds visible. The foreground shows a construction area with barriers and a few people walking nearby. **Image 2:** A line graph titled "The U.S. Mortgage Rate Surge" compares the U.S. 30-year fixed-rate mortgage (in red) with existing home sales (in blue) from 2014 to 2023. The mortgage rate line shows a significant increase, reaching its highest level in over 20 years. Existing home sales fluctuate, with a notable decline in recent years. A text box highlights that in 2023, high mortgage rates and rising home prices have led to the lowest housing affordability since 1989.
我们也可以异步地流式传输模型响应
In [ ]
已复制!
async_resp = await openai_llm.astream_chat(messages=[msg])
async for delta in async_resp:
print(delta.delta, end="")
async_resp = await openai_llm.astream_chat(messages=[msg]) async for delta in async_resp: print(delta.delta, end="")
**Image 1:** The Colosseum in Rome is illuminated at night with the colors of the Italian flag: green, white, and red. The ancient structure stands prominently against a deep blue sky, with some clouds visible. The lower part of the image shows a construction area with barriers and a few people walking nearby. **Image 2:** A line graph titled "The U.S. Mortgage Rate Surge" compares the U.S. 30-year fixed-rate mortgage (in red) with existing home sales (in blue) from 2014 to 2023. The graph shows mortgage rates rising sharply in recent years, while home sales fluctuate. A note highlights that in 2023, high mortgage rates and rising home prices have led to the lowest housing affordability since 1989.
使用 GPT4V 理解来自本地文件的图像¶
In [ ]
已复制!
%pip install llama-index-readers-file
%pip install llama-index-readers-file
In [ ]
已复制!
from pathlib import Path
import shutil
import requests
img_path = Path().resolve() / "image.jpg"
response = requests.get(image_urls[-1])
with open(img_path, "wb") as file:
file.write(response.content)
msg = ChatMessage(
role=MessageRole.USER,
blocks=[
TextBlock(text="Describe the image as an alternative text"),
ImageBlock(path=img_path, image_mimetype="image/jpeg"),
],
)
response = openai_llm.chat(messages=[msg])
from pathlib import Path import shutil import requests img_path = Path().resolve() / "image.jpg" response = requests.get(image_urls[-1]) with open(img_path, "wb") as file: file.write(response.content) msg = ChatMessage( role=MessageRole.USER, blocks=[ TextBlock(text="将图像描述为替代文本"), ImageBlock(path=img_path, image_mimetype="image/jpeg"), ], ) response = openai_llm.chat(messages=[msg])
In [ ]
已复制!
print(response)
print(response)
assistant: A person in a black tuxedo and bow tie is holding a golden soccer ball trophy on a stage. The background is a warm yellow color with spotlights shining upwards.