NVIDIA NIMs
The llama-index-llms-nvidia package contains LlamaIndex integrations for building applications with models on NVIDIA NIM inference microservices. NIM supports models across domains like chat, embedding, and re-ranking, from the community as well as from NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA-accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt container that deploys anywhere on NVIDIA-accelerated infrastructure with a single command.
NVIDIA-hosted deployments of NIMs are available for testing on the NVIDIA API catalog. After testing, NIMs can be exported from the API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, giving enterprises ownership and full control of their IP and AI applications.
NIMs are packaged as container images on a per-model basis and distributed as NGC container images through the NVIDIA NGC catalog. At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.
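As a side illustration of that "familiar API" (not part of this notebook's flow), the hosted API catalog exposes an OpenAI-compatible endpoint, so a NIM can be called directly with the openai Python client. This is a minimal sketch: it assumes you have run pip install openai and set NVIDIA_API_KEY as described below.

# Sketch: calling an API-catalog NIM through its OpenAI-compatible endpoint.
# Assumes `pip install openai` and that NVIDIA_API_KEY is set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)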
!pip install llama-index-core
!pip install llama-index-readers-file
!pip install llama-index-llms-nvidia
!pip install llama-index-embeddings-nvidia
!pip install llama-index-postprocessor-nvidia-rerank
Bring in a test dataset, a PDF about housing construction in San Francisco in 2021.
!mkdir data
!wget "https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0" -O "data/housing_data.pdf"
--2024-05-28 17:42:44--  https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18
Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
...
HTTP request sent, awaiting response... 200 OK
Length: 4808625 (4.6M) [application/pdf]
Saving to: ‘data/housing_data.pdf’

2024-05-28 17:42:47 (8.26 MB/s) - ‘data/housing_data.pdf’ saved [4808625/4808625]
Setup
Import our dependencies and set up our NVIDIA API key from the API catalog, https://build.nvidia.com, for the two models hosted in the catalog (the embedding and re-ranking models).
Getting started:

- Create a free account with NVIDIA, which hosts NVIDIA AI Foundation models.
- Click on your model of choice.
- Under Input select the Python tab, and click Get API Key. Then click Generate Key.
- Copy and save the generated key as NVIDIA_API_KEY. From there, you should have access to the endpoints.
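The next cell uses Colab's userdata helper to supply the key. If you are not running in Colab, a minimal sketch for setting NVIDIA_API_KEY interactively would be:

# Sketch: set the key outside Colab; only prompts if it isn't already exported.
import os
from getpass import getpass

if "NVIDIA_API_KEY" not in os.environ:
    os.environ["NVIDIA_API_KEY"] = getpass("Enter your NVIDIA API key: ")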
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.llms.nvidia import NVIDIA
from llama_index.core.node_parser import SentenceSplitter
from google.colab import userdata
import os
os.environ["NVIDIA_API_KEY"] = userdata.get("nvidia-api-key")
Let's use an NVIDIA-hosted NIM for the embedding model.
NVIDIA's default embeddings only embed the first 512 tokens, so we've set our chunk size to 500 to maximize the accuracy of our embeddings.
Settings.text_splitter = SentenceSplitter(chunk_size=500)
documents = SimpleDirectoryReader("./data").load_data()
We set our embedding model to NVIDIA's default. If a chunk exceeds the number of tokens the model can encode, the default behavior is to throw an error, so we set truncate="END" to instead discard tokens that go over the limit (hopefully not too many of them, because of the chunk size we set above).
Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")
index = VectorStoreIndex.from_documents(documents)
Now we've embedded our data and indexed it in memory, we'll set up our LLM, which can be self-hosted locally. NIMs can be hosted locally using Docker in 5 minutes by following this NIM quick start guide.
Below, we show how to:

- use Meta's open-source meta/llama3-8b-instruct model as a local NIM, and
- use NVIDIA-hosted meta/llama3-70b-instruct as a NIM from the API catalog.

If you are using a local NIM, make sure you change the base_url to your deployed NIM's URL! (A quick connectivity check is sketched below.)
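Before pointing base_url at a self-hosted NIM, it can help to confirm the container is actually serving. This is a sketch only: the host address is a placeholder, and it simply lists the models exposed by the NIM's OpenAI-compatible /v1/models endpoint.

# Optional sanity check for a self-hosted NIM (placeholder host address).
import requests

resp = requests.get("http://your-nim-host-address:8000/v1/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])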
We'll retrieve the top 20 most relevant chunks to answer our question.
# self-hosted NIM: if you want to use a self-hosted NIM uncomment the line below
# and comment the line using the API catalog
# Settings.llm = NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
# api catalog NIM: if you're using a self-hosted NIM comment the line below
# and un-comment the line using local NIM above
Settings.llm = NVIDIA(model="meta/llama3-70b-instruct")
query_engine = index.as_query_engine(similarity_top_k=20)
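As an aside (not part of the original notebook), you can peek at what the index retrieves for a query before anything is sent to the LLM, using the standard retriever API on the same index:

# Optional: inspect the retrieved chunks and their similarity scores.
retriever = index.as_retriever(similarity_top_k=5)
for node_with_score in retriever.retrieve(
    "How many new housing units were built in San Francisco in 2021?"
):
    print(node_with_score.score, node_with_score.node.get_content()[:80])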
Let's ask a simple question whose answer we know appears in one place in the document (page 18).
response = query_engine.query(
"How many new housing units were built in San Francisco in 2021?"
)
print(response)
There was a net addition of 4,649 units to the City’s housing stock in 2021.
Now let's ask a more complicated question that requires reading a table (page 41 of the document):
response = query_engine.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
There is no specific information about the net gain in housing units in the Mission in 2021. The provided data is about the city's overall housing stock and production, but it does not provide a breakdown by neighborhood, including the Mission.
That didn't work! The model fell back to the citywide net-addition figures rather than the Mission-specific number we wanted. Let's try a more advanced PDF parser, LlamaParse.
!pip install llama-parse
from llama_parse import LlamaParse
# in a notebook, LlamaParse requires this to work
import nest_asyncio
nest_asyncio.apply()
# you can get a key at cloud.llamaindex.ai
os.environ["LLAMA_CLOUD_API_KEY"] = userdata.get("llama-cloud-key")
# set up parser
parser = LlamaParse(
result_type="markdown" # "markdown" and "text" are available
)
# use SimpleDirectoryReader to parse our file
file_extractor = {".pdf": parser}
documents2 = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
Started parsing the file under job_id 84cb91f7-45ec-4b99-8281-0f4beef6a892
index2 = VectorStoreIndex.from_documents(documents2)
query_engine2 = index2.as_query_engine(similarity_top_k=20)
response = query_engine2.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
The net gain in housing units in the Mission in 2021 was 1,305 units.
Perfect! With the better parser, the LLM was able to answer the question.
Now let's try an even trickier question:
response = query_engine2.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
Repeat: 110
The LLM is getting confused; this appears to be the percentage increase in housing units.
Let's try giving the LLM more context (40 chunks instead of 20) and then ranking those chunks with a re-ranker. We'll use NVIDIA's re-ranker for this.
from llama_index.postprocessor.nvidia_rerank import NVIDIARerank
query_engine3 = index2.as_query_engine(
similarity_top_k=40, node_postprocessors=[NVIDIARerank(top_n=10)]
)
response = query_engine3.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
1,495
Excellent! The number is now correct (it's on page 35, in case you were wondering).