NVIDIA's LLM Text Completion API¶
Extending the NVIDIA class to support the /completions API for the models below:
- bigcode/starcoder2-7b
- bigcode/starcoder2-15b
Installation¶
In [ ]
!pip install --force-reinstall llama_index-llms-nvidia
In [ ]
!which python
In [ ]
import getpass
import os

# del os.environ['NVIDIA_API_KEY'] ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith(
        "nvapi-"
    ), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
In [ ]
os.environ["NVIDIA_API_KEY"]
In [ ]
# the async methods below (e.g. .acomplete) require nest_asyncio to run inside a notebook
import nest_asyncio

nest_asyncio.apply()
In [ ]
from llama_index.llms.nvidia import NVIDIA

llm = NVIDIA(model="bigcode/starcoder2-15b", use_chat_completions=False)
Available Models¶
The available text-completion models can be listed using is_chat_model (completion models have is_chat_model set to False).
In [ ]
print([model for model in llm.available_models if not model.is_chat_model])
Working with NVIDIA NIMs¶
In addition to connecting to hosted NVIDIA NIMs, this connector can be used to connect to local microservice instances. This helps you take your applications local when necessary.
For instructions on how to set up local microservice instances, see https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/
In [ ]
from llama_index.llms.nvidia import NVIDIA

# connect to a NIM running at localhost:8080, which serves a specific model
llm = NVIDIA(base_url="http://localhost:8080/v1")
In [ ]
print(llm.complete("# Function that does quicksort:"))
As expected by LlamaIndex, we get a CompletionResponse in response.
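As a quick illustration, here is a minimal sketch of inspecting that object; text is the standard CompletionResponse attribute holding the generated string.
In [ ]
response = llm.complete("# Function that does quicksort:")
print(type(response).__name__)  # CompletionResponse
print(response.text)  # the generated completion text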
Async Complete: .acomplete()¶
There is also an async implementation which can be leveraged in the same way!
In [ ]
await llm.acomplete("# Function that does quicksort:")
Streaming¶
In [ ]
x = llm.stream_complete(prompt="# Reverse string in python:", max_tokens=512)
In [ ]
for t in x:
    print(t.delta, end="")
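If you want the full completion as a single string rather than printing incrementally, a minimal sketch is to accumulate the deltas from the stream; each yielded item carries the newly generated chunk in its delta attribute.
In [ ]
# join the streamed deltas into one string (guard against a None delta defensively)
full_text = "".join(
    t.delta or ""
    for t in llm.stream_complete(prompt="# Reverse string in python:", max_tokens=512)
)
print(full_text)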
Async Streaming¶
In [ ]
x = await llm.astream_complete(
    prompt="# Reverse program in python:", max_tokens=512
)
In [ ]
async for t in x:
    print(t.delta, end="")