OpenVINO GenAI LLMs¶
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. The OpenVINO™ Runtime can run the same optimized model on a variety of hardware devices, accelerating your deep learning performance across use cases such as language/LLMs, computer vision, automatic speech recognition, and more.
OpenVINOGenAILLM
is a wrapper around the OpenVINO-GenAI API. Through this LlamaIndex wrapper, OpenVINO models can be run locally.
In the lines below, we install the packages required for this demo.
%pip install llama-index-llms-openvino-genai
%pip install optimum[openvino]
Now that we're set up, let's play around!
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
!pip install llama-index
from llama_index.llms.openvino_genai import OpenVINOGenAILLM
/home2/ethan/intel/llama_index/llama_test/lib/python3.10/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field "model_path" in OpenVINOGenAILLM has conflict with protected namespace "model_". You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`. warnings.warn(
Export your model to the OpenVINO IR format with optimum-cli, compressing the weights to int4:
!optimum-cli export openvino --model microsoft/Phi-3-mini-4k-instruct --task text-generation-with-past --weight-format int4 model_path
Alternatively, you can download an optimized IR model from the OpenVINO model hub on Hugging Face.
import huggingface_hub as hf_hub
model_id = "OpenVINO/Phi-3-mini-4k-instruct-int4-ov"
model_path = "Phi-3-mini-4k-instruct-int4-ov"
hf_hub.snapshot_download(model_id, local_dir=model_path)
Fetching 17 files: 0%| | 0/17 [00:00<?, ?it/s]
'/home2/ethan/intel/llama_index/docs/docs/examples/llm/Phi-3-mini-4k-instruct-int4-ov'
ov_llm = OpenVINOGenAILLM(
model_path=model_path,
device="CPU",
)
You can pass generation config parameters through ov_llm.config. The supported parameters are listed in openvino_genai.GenerationConfig.
ov_llm.config.max_new_tokens = 100
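Beyond max_new_tokens, the same config object exposes other decoding controls; the field names follow openvino_genai.GenerationConfig. A minimal sketch, assuming ov_llm has been constructed as above (the specific values here are illustrative, not recommendations):

```python
# Illustrative sampling settings on the wrapped openvino_genai GenerationConfig.
ov_llm.config.do_sample = True    # sample instead of greedy decoding
ov_llm.config.temperature = 0.7   # soften the token distribution
ov_llm.config.top_p = 0.9         # nucleus sampling cutoff
```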
response = ov_llm.complete("What is the meaning of life?")
print(str(response))
# Answer The meaning of life is a profound and complex question that has been debated by philosophers, theologians, scientists, and thinkers throughout history. Different cultures, religions, and individuals have their own interpretations and beliefs about what gives life purpose and significance. From a philosophical standpoint, existentialists like Jean-Paul Sartre and Albert Camus have argued that life inherently has no meaning, and it is
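For multi-turn prompts without streaming, the standard LlamaIndex chat endpoint works as well. A minimal sketch, assuming ov_llm is the model constructed above:

```python
from llama_index.core.llms import ChatMessage

# Non-streaming chat: pass a list of ChatMessage objects and print the reply.
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is OpenVINO?"),
]
resp = ov_llm.chat(messages)
print(resp.message.content)
```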
Streaming¶
Using the stream_complete endpoint:
response = ov_llm.stream_complete("Who is Paul Graham?")
for r in response:
print(r.delta, end="")
Paul Graham is a computer scientist and entrepreneur who is best known for founding the startup accelerator program Y Combinator. He is also the founder of the web development company Viaweb, which was acquired by PayPal for $497 million in 1raneworks. What is Y Combinator? Y Combinator is a startup accelerator program that provides funding, mentorship, and resources to early-stage start
Using the stream_chat endpoint:
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = ov_llm.stream_chat(messages)
for r in resp:
print(r.delta, end="")
I'm Phi, Microsoft's AI assistant. How can I assist you today?