Cleanlab Trustworthy Language Model
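This notebook shows how to use Cleanlab's Trustworthy Language Model (TLM) and trustworthiness scores.
TLM is a more reliable LLM that gives high-quality outputs and indicates when it is unsure of the answer to a question. This makes it suitable for applications where unchecked hallucinations are a serious problem.
The trustworthiness score quantifies how confident you can be that a response is good (higher values indicate greater trustworthiness). These scores combine estimates of both aleatoric and epistemic uncertainty to provide an overall measure of trustworthiness.
Read more about the TLM API in the Cleanlab Studio documentation. For more advanced usage, refer to the quickstart tutorial.
Visit https://cleanlab.ai and sign up to get a free API key.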
Setup
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-cleanlab
%pip install llama-index
from llama_index.llms.cleanlab import CleanlabTLM
# Set the API key either as an environment variable or when constructing the LLM
# import os
# os.environ["CLEANLAB_API_KEY"] = "your api key"
llm = CleanlabTLM(api_key="your_api_key")
resp = llm.complete("Who is Paul Graham?")
print(resp)
Paul Graham is an American computer scientist, entrepreneur, and venture capitalist. He is best known as the co-founder of the startup accelerator Y Combinator, which has helped launch numerous successful companies including Dropbox, Airbnb, and Reddit. Graham is also a prolific writer and essayist, known for his insightful and thought-provoking essays on topics ranging from startups and entrepreneurship to technology and society. He has been influential in the tech industry and is highly regarded for his expertise and contributions to the startup ecosystem.
You can also get the trustworthiness score of the above response from additional_kwargs. TLM automatically computes this score for every <prompt, response> pair.
print(resp.additional_kwargs)
{'trustworthiness_score': 0.8659043183923533}
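If you consume the score programmatically, a small accessor keeps the rest of your code independent of the dictionary layout. The sketch below assumes only the shape printed above; extract_trust_score is a hypothetical helper, not part of llama_index or Cleanlab, and returning None for a missing key is a design choice here rather than documented behavior.

```python
from typing import Any, Mapping, Optional


def extract_trust_score(additional_kwargs: Mapping[str, Any]) -> Optional[float]:
    """Return the trustworthiness score from additional_kwargs, or None if absent.

    Hypothetical helper: it only assumes the dict shape printed above,
    i.e. {'trustworthiness_score': <float>}.
    """
    score = additional_kwargs.get("trustworthiness_score")
    return None if score is None else float(score)


# The dictionary printed above; with a live model, pass resp.additional_kwargs.
printed_kwargs = {"trustworthiness_score": 0.8659043183923533}
print(extract_trust_score(printed_kwargs))
print(extract_trust_score({}))
```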
A high score indicates that the LLM's response can be trusted. Let's look at another example.
resp = llm.complete(
"What was the horsepower of the first automobile engine used in a commercial truck in the United States?"
)
print(resp)
The first automobile engine used in a commercial truck in the United States was the 1899 Winton Motor Carriage Company Model 10, which had a 2-cylinder engine with 20 horsepower.
print(resp.additional_kwargs)
{'trustworthiness_score': 0.5820799504369166}
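A low score indicates that the LLM's response should not be trusted.
From these two simple examples, we can see that the LLM responses with the highest scores are direct, accurate, and appropriately detailed.
On the other hand, LLM responses with low trustworthiness scores convey unhelpful or factually inaccurate answers, sometimes referred to as hallucinations.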
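These examples suggest a simple policy: accept high-scoring responses and flag low-scoring ones for review or a fallback answer. The sketch below illustrates this under an assumed cutoff of 0.7. That threshold and the names TRUST_THRESHOLD and triage are hypothetical, not values recommended by Cleanlab, and a real cutoff should be tuned on your own data. The two scores are the ones printed above.

```python
# Hypothetical cutoff for illustration only; tune it on your own data.
TRUST_THRESHOLD = 0.7


def triage(score: float, threshold: float = TRUST_THRESHOLD) -> str:
    """Label a response 'trusted' if its score meets the threshold, else 'flagged'."""
    return "trusted" if score >= threshold else "flagged"


# Scores printed for the two examples above.
scores = {
    "Who is Paul Graham?": 0.8659043183923533,
    "Horsepower of the first US commercial truck engine": 0.5820799504369166,
}

for prompt, score in scores.items():
    print(f"{triage(score)}: {prompt} (score={score:.3f})")
```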
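Streaming
Cleanlab's TLM does not natively support streaming both the response and the trustworthiness score. However, there is an alternative approach available that provides low-latency streaming responses for your application.
Detailed information about this approach, along with example code, is available here.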
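As a rough illustration of how streaming and scoring can be decoupled, the sketch below shows text chunks to the user as they arrive and computes a trustworthiness score only once the full response is available. This is an assumption about the general shape of such a workaround, not necessarily the approach in the linked guide. fake_stream and fake_score are hypothetical stand-ins for your LLM's streaming output and for a TLM call that scores a <prompt, response> pair, and 0.9 is a placeholder, so the example runs offline.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def stream_then_score(
    prompt: str,
    chunks: Iterable[str],
    score_fn: Callable[[str, str], float],
    emit: Callable[[str], None] = lambda text: print(text, end="", flush=True),
) -> Tuple[str, float]:
    """Show each chunk as soon as it arrives, then score the full response."""
    parts: List[str] = []
    for chunk in chunks:
        emit(chunk)  # the user sees text with low latency
        parts.append(chunk)
    response = "".join(parts)
    # Scoring starts only after the stream ends, so it does not delay
    # the first visible token; the score simply arrives later.
    return response, score_fn(prompt, response)


def fake_stream() -> Iterator[str]:
    # Hypothetical stand-in for a streaming LLM response.
    yield from ["Paul Graham ", "co-founded ", "Y Combinator."]


def fake_score(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a TLM scoring call; 0.9 is a placeholder.
    return 0.9


text, score = stream_then_score("Who is Paul Graham?", fake_stream(), fake_score)
print(f"\ntrustworthiness_score={score}")
```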
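Advanced use of TLM
TLM can be configured with the following options:
- model: the underlying LLM to use
- max_tokens: the maximum number of tokens to generate in the response
- num_candidate_responses: the number of alternative candidate responses generated internally by TLM
- num_consistency_samples: the amount of internal sampling used to evaluate the consistency of LLM responses
- use_self_reflection: whether the LLM is asked to reflect on the response it generated and evaluate that response itself
- log: additional metadata to return. Include "explanation" here to get an explanation of why a response received a low trustworthiness score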
These options are passed as a dictionary to the CleanlabTLM object when it is initialized.
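Collecting the options dictionary in one place makes it easy to catch misspelled key names before the object is created. The sketch below checks keys against the option names listed above. build_tlm_options is a hypothetical helper; it does not validate value types or allowed model names, and because the list in this notebook may not be exhaustive, a real option missing from it would be wrongly rejected.

```python
from typing import Any, Dict

# Option names taken from the list above (possibly not exhaustive).
KNOWN_TLM_OPTIONS = {
    "model",
    "max_tokens",
    "num_candidate_responses",
    "num_consistency_samples",
    "use_self_reflection",
    "log",
}


def build_tlm_options(**options: Any) -> Dict[str, Any]:
    """Return an options dict, raising ValueError on unrecognized keys."""
    unknown = set(options) - KNOWN_TLM_OPTIONS
    if unknown:
        raise ValueError(f"Unknown TLM option(s): {sorted(unknown)}")
    return dict(options)


options = build_tlm_options(model="gpt-4", max_tokens=128, log=["explanation"])
print(options)
# The dict would then be passed as CleanlabTLM(api_key=..., options=options).
```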
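For more details on these options, refer to Cleanlab's API documentation; some use cases for these options are explored in this notebook.
Let's consider an example where the application needs the gpt-4 model and 128 output tokens.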
options = {
"model": "gpt-4",
"max_tokens": 128,
}
llm = CleanlabTLM(api_key="your_api_key", options=options)
resp = llm.complete("Who is Paul Graham?")
print(resp)
Paul Graham is a British-born American computer scientist, entrepreneur, venture capitalist, author, and essayist. He is best known for co-founding Viaweb, which was sold to Yahoo in 1998 for over $49 million and became Yahoo Store. He also co-founded the influential startup accelerator and seed capital firm Y Combinator, which has launched over 2,000 companies including Dropbox, Airbnb, Stripe, and Reddit. Graham is also known for his essays on startup companies and programming languages.
To understand why TLM estimated low trustworthiness for the earlier horsepower question, include "explanation" in the log option when initializing TLM.
options = {
"log": ["explanation"],
}
llm = CleanlabTLM(api_key="your_api_key", options=options)
resp = llm.complete(
"What was the horsepower of the first automobile engine used in a commercial truck in the United States?"
)
print(resp)
The first automobile engine used in a commercial truck in the United States was in the 1899 "Motor Truck" built by the American company, the "GMC Truck Company." This early truck was equipped with a 2-horsepower engine. However, it's important to note that the development of commercial trucks evolved rapidly, and later models featured significantly more powerful engines.
print(resp.additional_kwargs["explanation"])
The proposed answer incorrectly attributes the first commercial truck in the United States to the GMC Truck Company and states that it was built in 1899 with a 2-horsepower engine. In reality, the first commercial truck is generally recognized as the "Motor Truck" built by the American company, the "GMC Truck Company," but it was actually produced by the "GMC" brand, which was established later. The first commercial truck is often credited to the "Benz Velo" or similar early models, which had varying horsepower ratings. The specific claim of a 2-horsepower engine is also misleading, as early trucks typically had more powerful engines. Therefore, the answer contains inaccuracies regarding both the manufacturer and the specifications of the engine. This response is untrustworthy due to lack of consistency in possible responses from the model. Here's one inconsistent alternate response that the model considered (which may not be accurate either): The horsepower of the first automobile engine used in a commercial truck in the United States was 6 horsepower.
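Since the explanation is requested through the log option, code that reads it should not assume the key is always present; indexing additional_kwargs["explanation"] directly would raise a KeyError otherwise. The sketch below falls back to a short hint when no explanation was logged. describe_response is a hypothetical helper, the first sample dictionary abbreviates the explanation printed above, and the second reuses the dictionary printed for the first example, which did not set log.

```python
from typing import Any, Mapping


def describe_response(additional_kwargs: Mapping[str, Any]) -> str:
    """Return TLM's explanation if one was logged, else a short hint."""
    explanation = additional_kwargs.get("explanation")
    if explanation:
        return str(explanation)
    return "No explanation available; add 'explanation' to the 'log' option."


# Abbreviated from the explanation printed above.
with_log = {"explanation": "The proposed answer incorrectly attributes ..."}
# Dictionary printed for the first example, which did not request a log.
without_log = {"trustworthiness_score": 0.8659043183923533}

print(describe_response(with_log))
print(describe_response(without_log))
```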