Retrieval-Augmented Image Captioning¶
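In this example, we show how to use LLaVA + Replicate for image understanding/captioning and, based on that image understanding, retrieve relevant unstructured text and embedded tables from the Tesla 10-K filing.
- LLaVA provides image understanding based on a user prompt.
- We use Unstructured to parse out the tables, and use LlamaIndex recursive retrieval to index/retrieve tables and text.
- We use the image understanding from step 1 to retrieve relevant information from the knowledge base generated in step 2 (indexed by LlamaIndex).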
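The three steps above can be sketched as a small pipeline. This is only an illustration of the control flow, not LlamaIndex code: `describe_image` and `query_knowledge_base` are hypothetical stand-ins for the LLaVA call and the query engine built later in this notebook, and the fake implementations exist only so the sketch runs without API access. The prompt strings are the ones used later in the notebook.

```python
from typing import Callable


def caption_then_retrieve(
    image_path: str,
    image_prompt: str,
    describe_image: Callable[[str, str], str],
    query_knowledge_base: Callable[[str], str],
    prompt_template: str = "please provide relevant information about: ",
) -> str:
    """Chain image understanding into retrieval (steps 1 and 3 above)."""
    # Step 1: ask the multi-modal model what the image shows.
    caption = describe_image(image_prompt, image_path)
    # Step 3: use the caption as the query against the indexed knowledge base.
    return query_knowledge_base(prompt_template + caption)


# Hypothetical stand-ins so the sketch runs without any API access.
def fake_describe_image(prompt: str, image_path: str) -> str:
    return "a Tesla charging station"


def fake_query_knowledge_base(query: str) -> str:
    return f"[answer for] {query}"


print(
    caption_then_retrieve(
        "./tesla_supercharger.jpg",
        "what is the main object for tesla in the image?",
        fake_describe_image,
        fake_query_knowledge_base,
    )
)
```

Passing the two stages in as callables keeps the flow testable before any API token is configured.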
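Context for LLaVA: Large Language and Vision Assistant
For LlamaIndex: LLaVA + Replicate lets us run image understanding through a hosted API and combine the multi-modal knowledge with our RAG knowledge base system.
TODO: Waiting for llama-cpp-python to support the LLaVA model in its Python wrapper, so that LlamaIndex can use the LlamaCPP class to serve the LLaVA model directly/locally.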
Using Replicate to serve the LLaVA model through LlamaIndex¶
Build and run the LLaVA model locally through Llama.cpp (deprecated)¶
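- git clone https://github.com/ggerganov/llama.cpp.git
- cd llama.cpp. Check out the llama.cpp repository for more details.
- make
- Download the LLaVA models, including ggml-model-* and mmproj-model-*, from this Hugging Face repository. Select one model based on your local configuration.
- Run ./llava to check whether LLaVA is running locally.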
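Before running ./llava it can help to confirm that the downloaded files are in place. The following is a minimal sketch, assuming both a ggml-model-* file and an mmproj-model-* file are needed, as the download step suggests. `find_llava_files` and `is_complete` are hypothetical helpers, and the file name in the demo is made up.

```python
import tempfile
from pathlib import Path


def find_llava_files(model_dir: str) -> dict:
    """Return the ggml-model-* and mmproj-model-* file names found in model_dir."""
    root = Path(model_dir)
    return {
        "model": sorted(p.name for p in root.glob("ggml-model-*")),
        "projector": sorted(p.name for p in root.glob("mmproj-model-*")),
    }


def is_complete(found: dict) -> bool:
    """True when at least one file of each kind is present."""
    return bool(found["model"]) and bool(found["projector"])


# Demo in a throwaway directory containing only one of the two kinds of file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ggml-model-example.gguf").touch()  # made-up file name
    found = find_llava_files(tmp)
    print(found, "complete:", is_complete(found))
```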
%pip install llama-index-readers-file
%pip install llama-index-multi-modal-llms-replicate
%load_ext autoreload
%autoreload 2
!pip install unstructured
from unstructured.partition.html import partition_html
import pandas as pd
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
pd.set_option("display.max_colwidth", None)
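Perform Data Extraction from the Tesla 10-K File¶
In these sections, we use Unstructured to parse out the table and non-table elements.
Extract Elements¶
We use Unstructured to extract table and non-table elements from the 10-K filing.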
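To make the table/non-table distinction concrete, here is a minimal sketch of splitting parsed elements by kind. It assumes each element exposes a `category` attribute whose value is "Table" for tables, which is how Unstructured labels its elements as far as we know. `split_elements` is a hypothetical helper, and the sample elements are stand-ins rather than real parser output.

```python
from types import SimpleNamespace


def split_elements(elements):
    """Separate table elements from everything else, preserving order."""
    tables = [el for el in elements if el.category == "Table"]
    others = [el for el in elements if el.category != "Table"]
    return tables, others


# Stand-in elements; in practice these would come from
# unstructured.partition.html.partition_html(filename="tesla_2021_10k.htm").
sample = [
    SimpleNamespace(category="Title", text="Supply Chain"),
    SimpleNamespace(
        category="NarrativeText",
        text="Our products use thousands of purchased parts.",
    ),
    SimpleNamespace(category="Table", text="(table cell text)"),
]

tables, others = split_elements(sample)
print(len(tables), "table(s),", len(others), "non-table element(s)")
```

In the notebook itself this split happens inside UnstructuredElementNodeParser, so the helper is only for inspecting what the parser will see.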
!wget "https://www.dropbox.com/scl/fi/mlaymdy1ni1ovyeykhhuk/tesla_2021_10k.htm?rlkey=qf9k4zn0ejrbm716j0gg7r802&dl=1" -O tesla_2021_10k.htm
!wget "https://docs.google.com/uc?export=download&id=1THe1qqM61lretr9N3BmINc_NWDvuthYf" -O shanghai.jpg
!wget "https://docs.google.com/uc?export=download&id=1PDVCf_CzLWXNnNoRV8CFgoJxv6U0sHAO" -O tesla_supercharger.jpg
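Where wget is not available, or to avoid downloading the same files on every run, the downloads can be done from Python. This is a minimal sketch using only the standard library. `download_if_missing` is a hypothetical helper, and whether the hosting services above serve the files to a plain Python client is an assumption we have not verified.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url: str, dest: str) -> Path:
    """Download url to dest unless dest already exists and is non-empty."""
    path = Path(dest)
    if path.exists() and path.stat().st_size > 0:
        # Reuse the local copy instead of fetching it again.
        return path
    urlretrieve(url, str(path))
    return path


# Usage (URL taken from the cell above, not executed here):
# download_if_missing("<URL of the 10-K file>", "tesla_2021_10k.htm")
```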
from llama_index.readers.file import FlatReader
from pathlib import Path
reader = FlatReader()
docs_2021 = reader.load_data(Path("tesla_2021_10k.htm"))
from llama_index.core.node_parser import UnstructuredElementNodeParser
node_parser = UnstructuredElementNodeParser()
import os
REPLICATE_API_TOKEN = "..." # Your Replicate API token here
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
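Hardcoding a token in a notebook makes it easy to commit by accident. A minimal sketch of the alternative, assuming the token was exported in the shell before the notebook was started; `require_env` is a hypothetical helper, not part of LlamaIndex or Replicate.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing loudly if unset."""
    value = os.environ.get(name, "").strip()
    # Treat the "..." placeholder used in this notebook as "not set".
    if not value or value == "...":
        raise RuntimeError(f"Set {name} before running the notebook")
    return value


# Usage (raises RuntimeError if the variable is missing):
# REPLICATE_API_TOKEN = require_env("REPLICATE_API_TOKEN")
```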
import openai
OPENAI_API_KEY = "sk-..."
openai.api_key = OPENAI_API_KEY # add your openai api key here
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
import os
import pickle

# Parse the 10-K once and cache the nodes on disk; later runs reload the cache.
if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    with open("2021_nodes.pkl", "wb") as f:
        pickle.dump(raw_nodes_2021, f)
else:
    with open("2021_nodes.pkl", "rb") as f:
        raw_nodes_2021 = pickle.load(f)
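The compute-once, reload-afterwards pattern in the cell above can be factored into a small helper. This is a minimal sketch; `load_or_compute` is a hypothetical name, not a LlamaIndex function. Pickle files should only be loaded when they come from a trusted source, such as a previous run of the same notebook.

```python
import os
import pickle
from typing import Any, Callable


def load_or_compute(path: str, compute: Callable[[], Any]) -> Any:
    """Return the pickled object at path, computing and saving it if absent."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


# Usage with the names from this notebook (not executed here):
# raw_nodes_2021 = load_or_compute(
#     "2021_nodes.pkl",
#     lambda: node_parser.get_nodes_from_documents(docs_2021),
# )
```

Deleting the cache file forces the nodes to be parsed again, which is needed whenever the source document or parser settings change.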
nodes_2021, objects_2021 = node_parser.get_nodes_and_objects(raw_nodes_2021)
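Set Up a Composable Retriever¶
Now that we have extracted the tables and their summaries, we can set up a composable retriever in LlamaIndex to query these tables.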
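The idea behind retrieving through table summaries can be shown with a toy example: the query is matched against top-level nodes, and when the best match is a summary that references a table, the table itself is returned. This sketch uses plain word overlap instead of embeddings and is not how LlamaIndex implements it. The node texts and the `ref` field are made up for illustration.

```python
import re


def tokens(text: str) -> set:
    """Lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, text: str) -> int:
    """Number of distinct words shared by query and text."""
    return len(tokens(query) & tokens(text))


def recursive_retrieve(query, nodes, objects):
    """Pick the best top-level node; if it references a table, return the table."""
    best = max(nodes, key=lambda n: overlap(query, n["text"]))
    ref = best.get("ref")
    return objects[ref] if ref is not None else best["text"]


# Made-up nodes: one plain text node and one table summary pointing at a table.
nodes = [
    {"text": "Narrative text about supply chain and raw materials."},
    {
        "text": "Summary: table of vehicle production and deliveries by model.",
        "ref": "table_1",
    },
]
objects = {"table_1": "Model | Production | Deliveries (full table contents)"}

print(recursive_retrieve("vehicle deliveries by model", nodes, objects))
```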
Construct the Retrievers¶
from llama_index.core import VectorStoreIndex
# construct top-level vector index + query engine
vector_index = VectorStoreIndex(nodes=nodes_2021, objects=objects_2021)
query_engine = vector_index.as_query_engine(similarity_top_k=2, verbose=True)
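The `similarity_top_k=2` setting means that only the two best-matching nodes are passed on for each query. A minimal sketch of that ranking step, assuming cosine similarity between embedding vectors; the two-dimensional vectors are toy values, and real embeddings come from the configured embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, node_vecs, k=2):
    """Return the ids of the k nodes most similar to the query vector."""
    ranked = sorted(
        node_vecs,
        key=lambda node_id: cosine(query_vec, node_vecs[node_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors keyed by made-up node ids.
node_vecs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(top_k([1.0, 0.1], node_vecs, k=2))
```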
from PIL import Image
import matplotlib.pyplot as plt
imageUrl = "./tesla_supercharger.jpg"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)
Using Replicate to run the LLaVA model for image understanding through LlamaIndex¶
from llama_index.multi_modal_llms.replicate import ReplicateMultiModal
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.replicate.base import (
    REPLICATE_MULTI_MODAL_LLM_MODELS,
)
multi_modal_llm = ReplicateMultiModal(
    model=REPLICATE_MULTI_MODAL_LLM_MODELS["llava-13b"],
    max_new_tokens=200,
    temperature=0.1,
)
prompt = "what is the main object for tesla in the image?"
llava_response = multi_modal_llm.complete(
    prompt=prompt,
    image_documents=[ImageDocument(image_path=imageUrl)],
)
Retrieve relevant information from the LlamaIndex knowledge base according to the LLaVA image understanding¶
prompt_template = "please provide relevant information about: "
rag_response = query_engine.query(prompt_template + llava_response.text)
Retrieval entering id_1836_table: TextNode
Retrieving from object TextNode with query please provide relevant information about: The main object for Tesla in the image is a red and white electric car charging station.
Retrieval entering id_431_table: TextNode
Retrieving from object TextNode with query please provide relevant information about: The main object for Tesla in the image is a red and white electric car charging station.
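As the log shows, the whole LLaVA caption is appended to the template and used as the retrieval query. A minimal sketch of building that query with a little cleanup first; `build_rag_query` and its `max_chars` limit are hypothetical additions, not part of the notebook or of LlamaIndex.

```python
def build_rag_query(
    caption: str,
    template: str = "please provide relevant information about: ",
    max_chars: int = 500,
) -> str:
    """Normalize whitespace in the caption, cap its length, and prepend the template."""
    cleaned = " ".join(caption.split())
    if len(cleaned) > max_chars:
        # Cut at the last space inside the limit to avoid splitting a word.
        cleaned = cleaned[:max_chars].rsplit(" ", 1)[0]
    return template + cleaned


print(build_rag_query("  a red and white\nelectric car charging station "))
```

Capping the caption is a judgment call: a long caption carries more context, but it also adds descriptive detail that has no counterpart in the 10-K text.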
Show the final RAG image caption results from LlamaIndex¶
print(str(rag_response))
The main object for Tesla in the image is a red and white electric car charging station.
from PIL import Image
import matplotlib.pyplot as plt
imageUrl = "./shanghai.jpg"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)
Retrieve relevant information from LlamaIndex for a new image¶
prompt = "which Tesla factory is shown in the image?"
llava_response = multi_modal_llm.complete(
    prompt=prompt,
    image_documents=[ImageDocument(image_path=imageUrl)],
)
prompt_template = "please provide relevant information about: "
rag_response = query_engine.query(prompt_template + llava_response.text)
Retrieving with query id None: please provide relevant information about: a large Tesla factory with a white roof, located in Shanghai, China. The factory is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. The scene gives an impression of a busy and well-organized facility, likely producing electric vehicles for the global market
Retrieved node with id, entering: id_431_table
Retrieving with query id id_431_table: please provide relevant information about: a large Tesla factory with a white roof, located in Shanghai, China. The factory is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. The scene gives an impression of a busy and well-organized facility, likely producing electric vehicles for the global market
Retrieving text node: We continue to increase the degree of localized procurement and manufacturing there. Gigafactory Shanghai is representative of our plan to iteratively improve our manufacturing operations as we establish new factories, as we implemented the learnings from our Model 3 and Model Y ramp at the Fremont Factory to commence and ramp our production at Gigafactory Shanghai quickly and cost-effectively. Other Manufacturing Generally, we continue to expand production capacity at our existing facilities. We also intend to further increase cost-competitiveness in our significant markets by strategically adding local manufacturing, including at Gigafactory Berlin in Germany and Gigafactory Texas in Austin, Texas, which will begin production in 2022. Supply Chain Our products use thousands of purchased parts that are sourced from hundreds of suppliers across the world.
We have developed close relationships with vendors of key parts such as battery cells, electronics and complex vehicle assemblies. Certain components purchased from these suppliers are shared or are similar across many product lines, allowing us to take advantage of pricing efficiencies from economies of scale. As is the case for most automotive companies, most of our procured components and systems are sourced from single suppliers. Where multiple sources are available for certain key components, we work to qualify multiple suppliers for them where it is sensible to do so in order to minimize production risks owing to disruptions in their supply. We also mitigate risk by maintaining safety stock for key parts and assemblies and die banks for components with lengthy procurement lead times. Our products use various raw materials including aluminum, steel, cobalt, lithium, nickel and copper. Pricing for these materials is governed by market conditions and may fluctuate due to various factors outside of our control, such as supply and demand and market speculation. We strive to execute long-term supply contracts for such materials at competitive pricing when feasible, and we currently believe that we have adequate access to raw materials supplies in order to meet the needs of our operations. Governmental Programs, Incentives and Regulations Globally, both the operation of our business by us and the ownership of our products by our customers are impacted by various government programs, incentives and other arrangements. Our business and products are also subject to numerous governmental regulations that vary among jurisdictions. 
Programs and Incentives California Alternative Energy and Advanced Transportation Financing Authority Tax Incentives We have agreements with the California Alternative Energy and Advanced Transportation Financing Authority that provide multi-year sales tax exclusions on purchases of manufacturing equipment that will be used for specific purposes, including the expansion and ongoing development of electric vehicles and powertrain production in California, thus reducing our cost basis in the related assets in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K. Gigafactory Nevada—Nevada Tax Incentives In connection with the construction of Gigafactory Nevada, we entered into agreements with the State of Nevada and Storey County in Nevada that provide abatements for specified taxes, discounts to the base tariff energy rates and transferable tax credits in consideration of capital investment and hiring targets that were met at Gigafactory Nevada. These incentives are available until June 2024 or June 2034, depending on the incentive and primarily offset related costs in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K. Gigafactory New York—New York State Investment and Lease We have a lease through the Research Foundation for the State University of New York (the “SUNY Foundation”) with respect to Gigafactory New York. Under the lease and a related research and development agreement, we are continuing to designate further buildouts at the facility. We are required to comply with certain covenants, including hiring and cumulative investment targets. This incentive offsets the related lease costs of the facility in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K. 
As we temporarily suspended most of our manufacturing operations at Gigafactory New York pursuant to a New York State executive order issued in March 2020 as a result of the COVID-19 pandemic, we were granted a deferral of our obligation to be compliant with our applicable targets through December 31, 2021 in an amendment memorialized in August 2021. As of December 31, 2021, we are in excess of such targets relating to investments and personnel in the State of New York and Buffalo. Gigafactory Shanghai—Land Use Rights and Economic Benefits We have an agreement with the local government of Shanghai for land use rights at Gigafactory Shanghai. Under the terms of the arrangement, we are required to meet a cumulative capital expenditure target and an annual tax revenue target starting at the end of 2023. In addition, the Shanghai government has granted to our Gigafactory Shanghai subsidiary certain incentives to be used in connection with eligible capital investments at Gigafactory Shanghai.
Show the final RAG image caption results from LlamaIndex¶
print(rag_response)
The Gigafactory Shanghai in Shanghai, China is a large Tesla factory that produces electric vehicles for the global market. The factory has a white roof and is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. This scene gives an impression of a busy and well-organized facility.
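To see which retrieved passages support an answer like the one above, the source nodes attached to the response can be inspected. This is a minimal sketch, assuming the response object exposes `source_nodes` whose items have a `score` and a `node` with `get_content()`, as LlamaIndex responses do to our knowledge. `summarize_sources` is a hypothetical helper, and the fake response with its made-up score exists only so the sketch runs without the query engine.

```python
from types import SimpleNamespace


def summarize_sources(response, preview_chars: int = 80):
    """Return (score, text preview) pairs for the nodes behind a response."""
    rows = []
    for item in response.source_nodes:
        text = " ".join(item.node.get_content().split())
        rows.append((item.score, text[:preview_chars]))
    return rows


# Stand-in response; the text is a sentence from the retrieved 10-K passage above.
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            score=0.5,  # made-up score
            node=SimpleNamespace(
                get_content=lambda: (
                    "We have an agreement with the local government of Shanghai "
                    "for land use rights at Gigafactory Shanghai."
                )
            ),
        )
    ]
)

for score, preview in summarize_sources(fake_response):
    print(score, preview)
```

With the real query engine, the call would be summarize_sources(rag_response).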