NVIDIA NIMs
The llama-index-llms-nvidia package contains LlamaIndex integrations for building applications with models on NVIDIA NIM inference microservices. NIM supports models across domains like chat, embedding, and re-ranking, from the community as well as from NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA-accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt container that deploys anywhere on NVIDIA-accelerated infrastructure with a single command.
NVIDIA-hosted deployments of NIMs are available for testing on the NVIDIA API catalog. After testing, NIMs can be exported from the API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, giving enterprises ownership and full control of their IP and AI applications.
NIMs are packaged as container images on a per-model basis and distributed as NGC container images through the NVIDIA NGC catalog. At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.
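As a side illustration of that "familiar API" (not part of this notebook's flow), the hosted API catalog exposes an OpenAI-compatible endpoint, so a NIM can be called directly with the openai Python client. This is a minimal sketch: it assumes you have run pip install openai and set NVIDIA_API_KEY as described below.

# Sketch: calling an API-catalog NIM through its OpenAI-compatible endpoint.
# Assumes `pip install openai` and that NVIDIA_API_KEY is set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)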
!pip install llama-index-core
!pip install llama-index-readers-file
!pip install llama-index-llms-nvidia
!pip install llama-index-embeddings-nvidia
!pip install llama-index-postprocessor-nvidia-rerank
Bring in a test dataset, a PDF about housing construction in San Francisco in 2021.
!mkdir data
!wget "https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0" -O "data/housing_data.pdf"
--2024-05-28 17:42:44--  https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18
Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
...
HTTP request sent, awaiting response... 200 OK
Length: 4808625 (4.6M) [application/pdf]
Saving to: ‘data/housing_data.pdf’

2024-05-28 17:42:47 (8.26 MB/s) - ‘data/housing_data.pdf’ saved [4808625/4808625]
Setup
Import our dependencies and set up our NVIDIA API key from the API catalog, https://build.nvidia.com, for the two models hosted in the catalog (the embedding and re-ranking models).
Getting started:

- Create a free account with NVIDIA, which hosts NVIDIA AI Foundation models.
- Click on your model of choice.
- Under Input select the Python tab, and click Get API Key. Then click Generate Key.
- Copy and save the generated key as NVIDIA_API_KEY. From there, you should have access to the endpoints.
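The next cell uses Colab's userdata helper to supply the key. If you are not running in Colab, a minimal sketch for setting NVIDIA_API_KEY interactively would be:

# Sketch: set the key outside Colab; only prompts if it isn't already exported.
import os
from getpass import getpass

if "NVIDIA_API_KEY" not in os.environ:
    os.environ["NVIDIA_API_KEY"] = getpass("Enter your NVIDIA API key: ")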
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.llms.nvidia import NVIDIA
from llama_index.core.node_parser import SentenceSplitter
from google.colab import userdata
import os
os.environ["NVIDIA_API_KEY"] = userdata.get("nvidia-api-key")
Let's use an NVIDIA-hosted NIM for the embedding model.
NVIDIA's default embeddings only embed the first 512 tokens, so we've set our chunk size to 500 to maximize the accuracy of our embeddings.
Settings.text_splitter = SentenceSplitter(chunk_size=500)
documents = SimpleDirectoryReader("./data").load_data()
We set our embedding model to NVIDIA's default. If a chunk exceeds the number of tokens the model can encode, the default behavior is to throw an error, so we set truncate="END" to instead discard tokens that go over the limit (hopefully not too many of them, because of the chunk size we set above).
Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")
index = VectorStoreIndex.from_documents(documents)
Now we've embedded our data and indexed it in memory, we'll set up our LLM, which can be self-hosted locally. NIMs can be hosted locally using Docker in 5 minutes by following this NIM quick start guide.
Below, we show how to:

- use Meta's open-source meta/llama3-8b-instruct model as a local NIM, and
- use NVIDIA-hosted meta/llama3-70b-instruct as a NIM from the API catalog.

If you are using a local NIM, make sure you change the base_url to your deployed NIM's URL! (A quick connectivity check is sketched below.)
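Before pointing base_url at a self-hosted NIM, it can help to confirm the container is actually serving. This is a sketch only: the host address is a placeholder, and it simply lists the models exposed by the NIM's OpenAI-compatible /v1/models endpoint.

# Optional sanity check for a self-hosted NIM (placeholder host address).
import requests

resp = requests.get("http://your-nim-host-address:8000/v1/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])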
We'll retrieve the top 20 most relevant chunks to answer our question.
# self-hosted NIM: if you want to use a self-hosted NIM uncomment the line below
# and comment the line using the API catalog
# Settings.llm = NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
# api catalog NIM: if you're using a self-hosted NIM comment the line below
# and un-comment the line using local NIM above
Settings.llm = NVIDIA(model="meta/llama3-70b-instruct")
query_engine = index.as_query_engine(similarity_top_k=20)
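As an aside (not part of the original notebook), you can peek at what the index retrieves for a query before anything is sent to the LLM, using the standard retriever API on the same index:

# Optional: inspect the retrieved chunks and their similarity scores.
retriever = index.as_retriever(similarity_top_k=5)
for node_with_score in retriever.retrieve(
    "How many new housing units were built in San Francisco in 2021?"
):
    print(node_with_score.score, node_with_score.node.get_content()[:80])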
Let's ask a simple question whose answer we know appears in one place in the document (page 18).
response = query_engine.query(
"How many new housing units were built in San Francisco in 2021?"
)
print(response)
There was a net addition of 4,649 units to the City’s housing stock in 2021.
Now let's ask a more complicated question that requires reading a table (page 41 of the document):
response = query_engine.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
There is no specific information about the net gain in housing units in the Mission in 2021. The provided data is about the city's overall housing stock and production, but it does not provide a breakdown by neighborhood, including the Mission.
That didn't work! The model fell back to the citywide net-addition figures rather than the Mission-specific number we wanted. Let's try a more advanced PDF parser, LlamaParse.
!pip install llama-parse
from llama_parse import LlamaParse
# in a notebook, LlamaParse requires this to work
import nest_asyncio
nest_asyncio.apply()
# you can get a key at cloud.llamaindex.ai
os.environ["LLAMA_CLOUD_API_KEY"] = userdata.get("llama-cloud-key")
# set up parser
parser = LlamaParse(
result_type="markdown" # "markdown" and "text" are available
)
# use SimpleDirectoryReader to parse our file
file_extractor = {".pdf": parser}
documents2 = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
Started parsing the file under job_id 84cb91f7-45ec-4b99-8281-0f4beef6a892
index2 = VectorStoreIndex.from_documents(documents2)
query_engine2 = index2.as_query_engine(similarity_top_k=20)
response = query_engine2.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
The net gain in housing units in the Mission in 2021 was 1,305 units.
Perfect! With the better parser, the LLM was able to answer the question.
Now let's try an even trickier question:
response = query_engine2.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
Repeat: 110
The LLM is getting confused; this appears to be the percentage increase in housing units.
Let's try giving the LLM more context (40 chunks instead of 20) and then ranking those chunks with a re-ranker. We'll use NVIDIA's re-ranker for this.
from llama_index.postprocessor.nvidia_rerank import NVIDIARerank
query_engine3 = index2.as_query_engine(
similarity_top_k=40, node_postprocessors=[NVIDIARerank(top_n=10)]
)
response = query_engine3.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
1,495
Excellent! The number is now correct (it's on page 35, in case you were wondering).