RunGPT¶
RunGPT is an open-source, cloud-native framework for serving large-scale multimodal models (LMMs). It aims to simplify deploying and managing large language models on distributed GPU clusters. RunGPT is designed to be a one-stop solution: a centralized, easily accessible place that collects techniques for optimizing large-scale multimodal models and makes them available to everyone. RunGPT additionally supports many LLMs, such as LLaMA, Pythia, StableLM, Vicuna, and MOSS, as well as large multimodal models (LMMs) like MiniGPT-4 and OpenFlamingo.
Setup¶
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]
%pip install llama-index-llms-rungpt
In [ ]
!pip install llama-index
You will need to install the rungpt package in your Python environment with pip install.
In [ ]
!pip install rungpt
After a successful installation, models supported by RunGPT can be deployed with a one-line command. This option downloads the target language model from an open-source platform and deploys it as a service on a localhost port, which can be accessed via http or grpc requests. We recommend running this command from the command line rather than inside a Jupyter notebook.
In [ ]
!rungpt serve decapoda-research/llama-7b-hf --precision fp16 --device_map balanced
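Before querying the deployed model, it can be useful to confirm that the local service is actually up. The sketch below is a minimal reachability check; the host and port (51002) are assumptions made for illustration — read the real address from the output of the `rungpt serve` command.

```python
# Minimal reachability sketch. HOST and PORT are hypothetical placeholders;
# check the `rungpt serve` output for the actual address of your service.
import socket

HOST, PORT = "localhost", 51002  # assumed default; verify against serve output


def endpoint_url(host: str, port: int) -> str:
    """Build the base HTTP URL for the locally deployed service."""
    return f"http://{host}:{port}"


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the service can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(endpoint_url(HOST, PORT))
print("service reachable:", is_reachable(HOST, PORT))
```

If the check prints `False`, the server is not listening yet (or is on a different port), and the `complete`/`chat` calls below will fail to connect.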
In [ ]
from llama_index.llms.rungpt import RunGptLLM

llm = RunGptLLM()
prompt = "What public transportation might be available in a city?"
response = llm.complete(prompt)
In [ ]
print(response)
I don't want to go to work, so what should I do? I have a job interview on Monday. What can I wear that will make me look professional but not too stuffy or boring?
Call chat with a list of messages¶
In [ ]
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.rungpt import RunGptLLM
messages = [
ChatMessage(
role=MessageRole.USER,
content="Now, I want you to do some math for me.",
),
ChatMessage(
role=MessageRole.ASSISTANT, content="Sure, I would like to help you."
),
ChatMessage(
role=MessageRole.USER,
content="How many points determine a straight line?",
),
]
llm = RunGptLLM()
response = llm.chat(messages=messages, temperature=0.8, max_tokens=15)
In [ ]
print(response)
Streaming¶
Using the stream_complete endpoint¶
In [ ]
prompt = "What public transportation might be available in a city?"
response = RunGptLLM().stream_complete(prompt)
for item in response:
    print(item.text)
Using the stream_chat endpoint¶
In [ ]
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.rungpt import RunGptLLM
messages = [
ChatMessage(
role=MessageRole.USER,
content="Now, I want you to do some math for me.",
),
ChatMessage(
role=MessageRole.ASSISTANT, content="Sure, I would like to help you."
),
ChatMessage(
role=MessageRole.USER,
content="How many points determine a straight line?",
),
]
response = RunGptLLM().stream_chat(messages=messages)
In [ ]
for item in response:
print(item.message)