Hugging Face LLMs
There are many ways to interface with LLMs from Hugging Face. Hugging Face itself provides several Python packages to enable access, which LlamaIndex wraps into LLM entities:

- The transformers package: use llama_index.llms.HuggingFaceLLM
- The Hugging Face Inference API, wrapped by huggingface_hub[inference]: use llama_index.llms.HuggingFaceInferenceAPI
There are many possible permutations of these two, so this notebook just details a few of them. Let's use Hugging Face's Text Generation task as our example.
In the lines below we install the packages necessary for this demo:

- HuggingFaceLLM requires transformers[torch]
- HuggingFaceInferenceAPI requires huggingface_hub[inference]
- The quotes are needed for Z shell (zsh)
In [ ]
%pip install llama-index-llms-huggingface
%pip install llama-index-llms-huggingface-api
In [ ]
!pip install "transformers[torch]" "huggingface_hub[inference]"
!pip install "transformers[torch]" "huggingface_hub[inference]"
Now that we're set up, let's play around!

If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]
!pip install llama-index
In [ ]
import os
from typing import List, Optional
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
# SEE: https://huggingface.co/docs/hub/security-tokens
# We just need a token with read permissions for this demo
HF_TOKEN: Optional[str] = os.getenv("HUGGING_FACE_TOKEN")
# NOTE: None default will fall back on Hugging Face's token storage
# when this token gets used within HuggingFaceInferenceAPI
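Either source of the token works here. Below is a minimal sketch of the two options (illustrative commands, not part of the original notebook):
In [ ]
# Option 1: set the environment variable this notebook reads
# (replace hf_... with your actual token):
# %env HUGGING_FACE_TOKEN=hf_...
#
# Option 2: log in once so huggingface_hub caches a token locally;
# HuggingFaceInferenceAPI falls back on that cache when token is None:
# !huggingface-cli login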
In [ ]
# This uses https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
# downloaded (if first invocation) to the local Hugging Face model cache,
# and actually runs the model on your local machine's hardware
locally_run = HuggingFaceLLM(model_name="HuggingFaceH4/zephyr-7b-alpha")
# This will use the same model, but run remotely on Hugging Face's servers,
# accessed via the Hugging Face Inference API
# Note that using your token will not charge you money,
# the Inference API is free, it just has rate limits
remotely_run = HuggingFaceInferenceAPI(
model_name="HuggingFaceH4/zephyr-7b-alpha", token=HF_TOKEN
)
# Or you can skip providing a token, using Hugging Face Inference API anonymously
remotely_run_anon = HuggingFaceInferenceAPI(
model_name="HuggingFaceH4/zephyr-7b-alpha"
)
# If you don't provide a model_name to the HuggingFaceInferenceAPI,
# Hugging Face's recommended model gets used (thanks to huggingface_hub)
remotely_run_recommended = HuggingFaceInferenceAPI(token=HF_TOKEN)
Underlying a completion with HuggingFaceInferenceAPI is Hugging Face's Text Generation task.
In [ ]
completion_response = remotely_run_recommended.complete("To infinity, and")
print(completion_response)
beyond! The Infinity Wall Clock is a unique and stylish way to keep track of time. The clock is made of a durable, high-quality plastic and features a bright LED display. The Infinity Wall Clock is powered by batteries and can be mounted on any wall. It is a great addition to any home or office.
If you are modifying the LLM, you should also change the global tokenizer to match!
In [ ]
from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer
set_global_tokenizer(
AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha").encode
)
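As a quick sanity check (a minimal sketch, not part of the original notebook), you can call the same encode function directly to see how many tokens a prompt consumes; this is the callable LlamaIndex will now use for internal token counting:
In [ ]
from transformers import AutoTokenizer

# Count tokens the same way LlamaIndex now does internally
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
print(len(tokenizer.encode("To infinity, and beyond!")))  # number of tokens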
If you're curious, other Hugging Face Inference API tasks that are wrapped include:

- llama_index.llms.HuggingFaceInferenceAPI.chat: conversational task
- llama_index.embeddings.HuggingFaceInferenceAPIEmbedding: feature-extraction task

And yes, Hugging Face embedding models are supported with:

- transformers[torch]: wrapped by HuggingFaceEmbedding
- huggingface_hub[inference]: wrapped by HuggingFaceInferenceAPIEmbedding

Both of the above subclass llama_index.embeddings.base.BaseEmbedding. A brief usage sketch of the chat and embedding wrappers follows.
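Here is a minimal sketch of both (not part of the original notebook; it assumes the llama-index-embeddings-huggingface-api package is installed, and BAAI/bge-small-en-v1.5 is an arbitrary example model):
In [ ]
from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.embeddings.huggingface_api import (
    HuggingFaceInferenceAPIEmbedding,
)

# Conversational task, via the remotely_run instance created above
chat_response = remotely_run.chat(
    messages=[ChatMessage(role=MessageRole.USER, content="Who is Paul Graham?")]
)
print(chat_response)

# Feature-extraction task: embed a string and inspect the vector size
embed_model = HuggingFaceInferenceAPIEmbedding(
    model_name="BAAI/bge-small-en-v1.5", token=HF_TOKEN
)
print(len(embed_model.get_text_embedding("To infinity, and beyond!")))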
Use Hugging Face text-generation-inference

The new TextGenerationInference class allows interfacing with endpoints running text-generation-inference (TGI). In addition to blazingly fast inference, it supports tool usage starting from version 2.0.1.
In [ ]
%pip install llama-index-llms-text-generation-inference
To initialize an instance of TextGenerationInference, you need to provide the endpoint URL (either a self-hosted instance of TGI, or a public Inference Endpoint created on Hugging Face with TGI). For a private Inference Endpoint, you also need to provide your HF token (either as an initialization argument or as an environment variable).
In [ ]
import os
from typing import List, Optional
from llama_index.llms.text_generation_inference import (
TextGenerationInference,
)
URL = "your_tgi_endpoint"
model = TextGenerationInference(
model_url=URL, token=False
) # set token to False in case of public endpoint
completion_response = model.complete("To infinity, and")
print(completion_response)
beyond! This phrase is a reference to the famous line from the movie "Toy Story" when Buzz Lightyear, a toy astronaut, exclaims "To infinity and beyond!" as he soars through space. It has since become a catchphrase for reaching for the stars and striving for greatness. However, if you meant to ask a mathematical question, "To infinity" refers to a very large, infinite number, and "and beyond" could be interpreted as continuing infinitely in a certain direction. For example, "2 to the power of infinity" would represent a very large, infinite number.
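Like other LlamaIndex LLMs, the class also supports streaming completions; a minimal sketch (assuming the same model instance as above):
In [ ]
# Stream the completion token by token instead of waiting for the full text
for chunk in model.stream_complete("To infinity, and"):
    print(chunk.delta, end="", flush=True)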
To use tools with TextGenerationInference, you can use an already existing tool or define your own:
In [ ]
from typing import List, Literal
from llama_index.core.bridge.pydantic import BaseModel, Field
from llama_index.core.tools import FunctionTool
from llama_index.core.base.llms.types import (
ChatMessage,
MessageRole,
)
def get_current_weather(location: str, format: str):
"""Get the current weather
Args:
location (str): The city and state, e.g. San Francisco, CA
format (str): The temperature unit to use ('celsius' or 'fahrenheit'). Infer this from the users location.
"""
...
class WeatherArgs(BaseModel):
location: str = Field(
description="The city and region, e.g. Paris, Ile-de-France"
)
format: Literal["fahrenheit", "celsius"] = Field(
description="The temperature unit to use ('fahrenheit' or 'celsius'). Infer this from the location.",
)
weather_tool = FunctionTool.from_defaults(
fn=get_current_weather,
name="get_current_weather",
description="Get the current weather",
fn_schema=WeatherArgs,
)
def get_current_weather_n_days(location: str, format: str, num_days: int):
"""Get the weather forecast for the next N days
Args:
location (str): The city and state, e.g. San Francisco, CA
format (str): The temperature unit to use ('celsius' or 'fahrenheit'). Infer this from the users location.
num_days (int): The number of days for the weather forecast.
"""
...
class ForecastArgs(BaseModel):
location: str = Field(
description="The city and region, e.g. Paris, Ile-de-France"
)
format: Literal["fahrenheit", "celsius"] = Field(
description="The temperature unit to use ('fahrenheit' or 'celsius'). Infer this from the location.",
)
num_days: int = Field(
description="The duration for the weather forecast in days.",
)
forecast_tool = FunctionTool.from_defaults(
fn=get_current_weather_n_days,
name="get_current_weather_n_days",
description="Get the current weather for n days",
fn_schema=ForecastArgs,
)
usr_msg = ChatMessage(
role=MessageRole.USER,
content="What's the weather like in Paris over next week?",
)
response = model.chat_with_tools(
user_msg=usr_msg,
tools=[
weather_tool,
forecast_tool,
],
tool_choice="get_current_weather_n_days",
)
print(response.message.additional_kwargs)
{'tool_calls': [{'id': 0, 'type': 'function', 'function': {'description': None, 'name': 'get_current_weather_n_days', 'arguments': {'format': 'celsius', 'location': 'Paris, Ile-de-France', 'num_days': 7}}}]}
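Note that the model only selects a tool and fills in its arguments; actually running it is left to the caller. A minimal dispatch sketch (assuming the dict-shaped tool_calls payload printed above; the stub functions defined earlier simply return None):
In [ ]
# Map tool names back to the Python callables defined above
tools_by_name = {
    "get_current_weather": get_current_weather,
    "get_current_weather_n_days": get_current_weather_n_days,
}

for tool_call in response.message.additional_kwargs.get("tool_calls", []):
    fn = tool_call["function"]
    print(fn["name"], fn["arguments"])
    result = tools_by_name[fn["name"]](**fn["arguments"])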