Hugging Face LLMs
There are many ways to interface with LLMs from Hugging Face. Hugging Face itself provides several Python packages to enable access, which LlamaIndex wraps into LLM entities:

- The transformers package: use llama_index.llms.HuggingFaceLLM
- The Hugging Face Inference API, wrapped by huggingface_hub[inference]: use llama_index.llms.HuggingFaceInferenceAPI
There are many possible permutations of these two, so this notebook just details a few of them. Let's use Hugging Face's Text Generation task as our example.
In the lines below we install the packages necessary for this demo:

- HuggingFaceLLM requires transformers[torch]
- HuggingFaceInferenceAPI requires huggingface_hub[inference]
- The quotes are needed for Z shell (zsh)
In [ ]
%pip install llama-index-llms-huggingface
%pip install llama-index-llms-huggingface-api
In [ ]
!pip install "transformers[torch]" "huggingface_hub[inference]"
!pip install "transformers[torch]" "huggingface_hub[inference]"
Now that we're set up, let's play around!

If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]
!pip install llama-index
In [ ]
import os
from typing import List, Optional
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
# SEE: https://huggingface.co/docs/hub/security-tokens
# We just need a token with read permissions for this demo
HF_TOKEN: Optional[str] = os.getenv("HUGGING_FACE_TOKEN")
# NOTE: None default will fall back on Hugging Face's token storage
# when this token gets used within HuggingFaceInferenceAPI
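Either source of the token works here. Below is a minimal sketch of the two options (illustrative commands, not part of the original notebook):
In [ ]
# Option 1: set the environment variable this notebook reads
# (replace hf_... with your actual token):
# %env HUGGING_FACE_TOKEN=hf_...
#
# Option 2: log in once so huggingface_hub caches a token locally;
# HuggingFaceInferenceAPI falls back on that cache when token is None:
# !huggingface-cli login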
In [ ]
# This uses https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
# downloaded (if first invocation) to the local Hugging Face model cache,
# and actually runs the model on your local machine's hardware
locally_run = HuggingFaceLLM(model_name="HuggingFaceH4/zephyr-7b-alpha")
# This will use the same model, but run remotely on Hugging Face's servers,
# accessed via the Hugging Face Inference API
# Note that using your token will not charge you money,
# the Inference API is free, it just has rate limits
remotely_run = HuggingFaceInferenceAPI(
model_name="HuggingFaceH4/zephyr-7b-alpha", token=HF_TOKEN
)
# Or you can skip providing a token, using Hugging Face Inference API anonymously
remotely_run_anon = HuggingFaceInferenceAPI(
model_name="HuggingFaceH4/zephyr-7b-alpha"
)
# If you don't provide a model_name to the HuggingFaceInferenceAPI,
# Hugging Face's recommended model gets used (thanks to huggingface_hub)
remotely_run_recommended = HuggingFaceInferenceAPI(token=HF_TOKEN)
Underlying a completion with HuggingFaceInferenceAPI is Hugging Face's Text Generation task.
In [ ]
completion_response = remotely_run_recommended.complete("To infinity, and")
print(completion_response)
beyond! The Infinity Wall Clock is a unique and stylish way to keep track of time. The clock is made of a durable, high-quality plastic and features a bright LED display. The Infinity Wall Clock is powered by batteries and can be mounted on any wall. It is a great addition to any home or office.
If you are modifying the LLM, you should also change the global tokenizer to match!
In [ ]
from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer
set_global_tokenizer(
AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha").encode
)
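As a quick sanity check (a minimal sketch, not part of the original notebook), you can call the same encode function directly to see how many tokens a prompt consumes; this is the callable LlamaIndex will now use for internal token counting:
In [ ]
from transformers import AutoTokenizer

# Count tokens the same way LlamaIndex now does internally
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
print(len(tokenizer.encode("To infinity, and beyond!")))  # number of tokens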
If you're curious, other Hugging Face Inference API tasks that are wrapped include:

- llama_index.llms.HuggingFaceInferenceAPI.chat: conversational task
- llama_index.embeddings.HuggingFaceInferenceAPIEmbedding: feature-extraction task

And yes, Hugging Face embedding models are supported with:

- transformers[torch]: wrapped by HuggingFaceEmbedding
- huggingface_hub[inference]: wrapped by HuggingFaceInferenceAPIEmbedding

Both of the above subclass llama_index.embeddings.base.BaseEmbedding. A brief usage sketch of the chat and embedding wrappers follows.
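Here is a minimal sketch of both (not part of the original notebook; it assumes the llama-index-embeddings-huggingface-api package is installed, and BAAI/bge-small-en-v1.5 is an arbitrary example model):
In [ ]
from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.embeddings.huggingface_api import (
    HuggingFaceInferenceAPIEmbedding,
)

# Conversational task, via the remotely_run instance created above
chat_response = remotely_run.chat(
    messages=[ChatMessage(role=MessageRole.USER, content="Who is Paul Graham?")]
)
print(chat_response)

# Feature-extraction task: embed a string and inspect the vector size
embed_model = HuggingFaceInferenceAPIEmbedding(
    model_name="BAAI/bge-small-en-v1.5", token=HF_TOKEN
)
print(len(embed_model.get_text_embedding("To infinity, and beyond!")))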
Use Hugging Face text-generation-inference

The new TextGenerationInference class allows interfacing with endpoints running text-generation-inference (TGI). In addition to blazingly fast inference, it supports tool usage starting from version 2.0.1.
In [ ]
%pip install llama-index-llms-text-generation-inference
To initialize an instance of TextGenerationInference, you need to provide the endpoint URL (either a self-hosted instance of TGI, or a public Inference Endpoint created on Hugging Face with TGI). For a private Inference Endpoint, you also need to provide your HF token (either as an initialization argument or as an environment variable).
In [ ]
import os
from typing import List, Optional
from llama_index.llms.text_generation_inference import (
TextGenerationInference,
)
URL = "your_tgi_endpoint"
model = TextGenerationInference(
model_url=URL, token=False
) # set token to False in case of public endpoint
completion_response = model.complete("To infinity, and")
print(completion_response)
beyond! This phrase is a reference to the famous line from the movie "Toy Story" when Buzz Lightyear, a toy astronaut, exclaims "To infinity and beyond!" as he soars through space. It has since become a catchphrase for reaching for the stars and striving for greatness. However, if you meant to ask a mathematical question, "To infinity" refers to a very large, infinite number, and "and beyond" could be interpreted as continuing infinitely in a certain direction. For example, "2 to the power of infinity" would represent a very large, infinite number.
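Like other LlamaIndex LLMs, the class also supports streaming completions; a minimal sketch (assuming the same model instance as above):
In [ ]
# Stream the completion token by token instead of waiting for the full text
for chunk in model.stream_complete("To infinity, and"):
    print(chunk.delta, end="", flush=True)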
To use tools with TextGenerationInference, you can use an already existing tool or define your own:
In [ ]
from typing import List, Literal
from llama_index.core.bridge.pydantic import BaseModel, Field
from llama_index.core.tools import FunctionTool
from llama_index.core.base.llms.types import (
ChatMessage,
MessageRole,
)
def get_current_weather(location: str, format: str):
"""Get the current weather
Args:
location (str): The city and state, e.g. San Francisco, CA
format (str): The temperature unit to use ('celsius' or 'fahrenheit'). Infer this from the users location.
"""
...
class WeatherArgs(BaseModel):
location: str = Field(
description="The city and region, e.g. Paris, Ile-de-France"
)
format: Literal["fahrenheit", "celsius"] = Field(
description="The temperature unit to use ('fahrenheit' or 'celsius'). Infer this from the location.",
)
weather_tool = FunctionTool.from_defaults(
fn=get_current_weather,
name="get_current_weather",
description="Get the current weather",
fn_schema=WeatherArgs,
)
def get_current_weather_n_days(location: str, format: str, num_days: int):
"""Get the weather forecast for the next N days
Args:
location (str): The city and state, e.g. San Francisco, CA
format (str): The temperature unit to use ('celsius' or 'fahrenheit'). Infer this from the users location.
num_days (int): The number of days for the weather forecast.
"""
...
class ForecastArgs(BaseModel):
location: str = Field(
description="The city and region, e.g. Paris, Ile-de-France"
)
format: Literal["fahrenheit", "celsius"] = Field(
description="The temperature unit to use ('fahrenheit' or 'celsius'). Infer this from the location.",
)
num_days: int = Field(
description="The duration for the weather forecast in days.",
)
forecast_tool = FunctionTool.from_defaults(
fn=get_current_weather_n_days,
name="get_current_weather_n_days",
description="Get the current weather for n days",
fn_schema=ForecastArgs,
)
usr_msg = ChatMessage(
role=MessageRole.USER,
content="What's the weather like in Paris over next week?",
)
response = model.chat_with_tools(
user_msg=usr_msg,
tools=[
weather_tool,
forecast_tool,
],
tool_choice="get_current_weather_n_days",
)
print(response.message.additional_kwargs)
{'tool_calls': [{'id': 0, 'type': 'function', 'function': {'description': None, 'name': 'get_current_weather_n_days', 'arguments': {'format': 'celsius', 'location': 'Paris, Ile-de-France', 'num_days': 7}}}]}
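Note that the model only selects a tool and fills in its arguments; actually running it is left to the caller. A minimal dispatch sketch (assuming the dict-shaped tool_calls payload printed above; the stub functions defined earlier simply return None):
In [ ]
# Map tool names back to the Python callables defined above
tools_by_name = {
    "get_current_weather": get_current_weather,
    "get_current_weather_n_days": get_current_weather_n_days,
}

for tool_call in response.message.additional_kwargs.get("tool_calls", []):
    fn = tool_call["function"]
    print(fn["name"], fn["arguments"])
    result = tools_by_name[fn["name"]](**fn["arguments"])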