Analyze and Debug LlamaIndex Applications with PostHog and Langfuse

In this guide, we show you how to build a RAG application with LlamaIndex, observe each of its steps with Langfuse, and analyze the resulting data in PostHog.

What is Langfuse?

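Langfuse is an open-source LLM engineering platform that helps engineers understand and optimize how users interact with their language model applications. It provides tools for tracing, debugging, and improving LLM performance in real-world use cases. Langfuse is available as a managed cloud service and can also be run locally or self-hosted.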

What is PostHog?

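PostHog is a popular choice for product analytics. Combining Langfuse's LLM analytics with PostHog's product analytics makes it easy to:

Analyze user engagement: Determine how often users interact with specific LLM features and understand their overall activity patterns.
Correlate feedback with behavior: See how the user feedback captured in Langfuse relates to user behavior in PostHog.
Monitor LLM performance: Track and analyze metrics such as model cost, latency, and user feedback to optimize LLM performance.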
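To make the first two points concrete, here is a minimal sketch in plain Python. It assumes each Langfuse trace has been reduced to a hypothetical record with a `user_id` and an optional feedback score; these field names are illustrative and are not the schema Langfuse exports to PostHog. The sketch counts interactions per user and pairs that count with the user's average feedback.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trace records; field names are illustrative only.
traces = [
    {"user_id": "u1", "feedback": 1.0},
    {"user_id": "u1", "feedback": 0.5},
    {"user_id": "u1", "feedback": None},  # interaction without feedback
    {"user_id": "u2", "feedback": 0.0},
    {"user_id": "u3", "feedback": None},
]


def engagement_and_feedback(records):
    """Return {user_id: (interaction_count, mean_feedback or None)}."""
    counts = defaultdict(int)
    scores = defaultdict(list)
    for record in records:
        counts[record["user_id"]] += 1
        if record["feedback"] is not None:
            scores[record["user_id"]].append(record["feedback"])
    return {
        user: (counts[user], mean(scores[user]) if scores[user] else None)
        for user in counts
    }


summary = engagement_and_feedback(traces)
for user, (count, avg) in sorted(summary.items()):
    print(user, count, avg)
```

In PostHog the same question is answered with insights over exported events rather than hand-written code; the sketch only shows what "engagement plus feedback per user" means.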

What is LlamaIndex?

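LlamaIndex (GitHub) is a data framework for connecting LLMs to external data sources. It helps you structure, index, and query data efficiently, which makes it easier for developers to build advanced LLM applications.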

How to build a simple RAG application with LlamaIndex and Mistral

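In this tutorial, we build a chat application that answers questions about hedgehog care. We vectorize a hedgehog care guide with LlamaIndex and Mistral's embedding model, answer questions with Mistral's Mixtral 8x22B model, and trace every model generation with Langfuse's LlamaIndex integration.

Finally, the PostHog integration lets you view detailed analytics about the hedgehog app directly in PostHog.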

Step 1: Set up LlamaIndex and Mistral

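First, we set our Mistral API key as an environment variable. If you don't have an account yet, sign up for Mistral, then subscribe to a free trial or a paid plan; after that you can generate an API key. (💡 You can use any other model supported by LlamaIndex; this guide simply uses Mistral.)

Next, we use LlamaIndex to initialize Mistral's language model and embedding model, and register both on LlamaIndex's Settings object.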

%pip install llama-index llama-index-llms-mistralai llama-index-embeddings-mistralai nest_asyncio --upgrade

# Set the Mistral API key
import os

os.environ["MISTRAL_API_KEY"] = "***Your-Mistral-API-Key***"

# Ensures that sync and async code can be used together without issues
import nest_asyncio

nest_asyncio.apply()

# Import and set up llama index
from llama_index.llms.mistralai import MistralAI
from llama_index.embeddings.mistralai import MistralAIEmbedding
from llama_index.core import Settings

# Define your LLM and embedding model
llm = MistralAI(model="open-mixtral-8x22b", temperature=0.1)
embed_model = MistralAIEmbedding(model_name="mistral-embed")

# Set the LLM and embedding model in the Settings object
Settings.llm = llm
Settings.embed_model = embed_model
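The placeholder key above is easy to forget, and the resulting error only shows up at the first model call. A small fail-fast check can catch it earlier. This is a minimal sketch; `require_env` is a hypothetical helper and is not part of LlamaIndex or the Mistral SDK.

```python
import os


def require_env(name, placeholder_marker="***"):
    """Return an environment variable, raising if it is unset or still a placeholder."""
    value = os.environ.get(name, "")
    if not value or placeholder_marker in value:
        raise RuntimeError(f"Set {name} to a real value before running the notebook.")
    return value


# Example: stop early if the Mistral key was never filled in.
# mistral_key = require_env("MISTRAL_API_KEY")
```

The default marker matches the `***Your-Mistral-API-Key***` placeholder used in this guide; adjust it if you use a different placeholder convention.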

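Step 2: Initialize Langfuse

Next, we initialize the Langfuse client. If you don't have an account yet, sign up for Langfuse. Copy the API keys from your project settings and add them to your environment.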

%pip install langfuse

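import os

# Get keys for your project from https://cloud.langfuse.com
# Langfuse() and the LlamaIndex callback handler read these environment variables.
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # 🇪🇺 EU region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com"  # 🇺🇸 US region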

from langfuse import Langfuse

langfuse = Langfuse()

Finally, at the root of the application, we register Langfuse's LlamaIndexCallbackHandler with LlamaIndex's Settings.callback_manager.

To learn more about Langfuse's LlamaIndex integration, see the Langfuse documentation.

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler

langfuse_callback_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_callback_handler])
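Conceptually, a callback manager fans out start and end events to every registered handler. That is how a tracing handler can observe each LlamaIndex step without any change to the query code. The sketch below illustrates the pattern with hypothetical `ToyCallbackManager` and `RecordingHandler` classes. It is not LlamaIndex's actual `CallbackManager`, whose handlers implement a richer interface (event IDs, parent IDs, trace boundaries).

```python
class RecordingHandler:
    """Hypothetical handler that records every event it receives."""

    def __init__(self):
        self.events = []

    def on_event_start(self, event_type, payload):
        self.events.append(("start", event_type, payload))

    def on_event_end(self, event_type, payload):
        self.events.append(("end", event_type, payload))


class ToyCallbackManager:
    """Dispatches start/end events to all registered handlers."""

    def __init__(self, handlers):
        self.handlers = list(handlers)

    def run(self, event_type, payload, step):
        # Notify every handler, run the wrapped step, then report its result.
        for handler in self.handlers:
            handler.on_event_start(event_type, payload)
        result = step(payload)
        for handler in self.handlers:
            handler.on_event_end(event_type, {"result": result})
        return result


recorder = RecordingHandler()
manager = ToyCallbackManager([recorder])
answer = manager.run(
    "query",
    {"query": "Which hedgehogs require help?"},
    lambda payload: payload["query"].upper(),  # stand-in for a real pipeline step
)
print(recorder.events)
```

Because handlers only see events, you can register several of them (for example, a tracer and a logger) side by side. This is why the guide passes a list to `CallbackManager`.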

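Step 3: Download the data

We download the file to use for RAG. In this example, we use a PDF hedgehog care guide so that the language model can answer questions about caring for hedgehogs 🦔.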

!wget 'https://www.pro-igel.de/downloads/merkblaetter_engl/wildtier_engl.pdf' -O './hedgehog.pdf'

--2024-09-20 13:16:39--  https://www.pro-igel.de/downloads/merkblaetter_engl/wildtier_engl.pdf
Resolving www.pro-igel.de (www.pro-igel.de)... 152.53.23.200
Connecting to www.pro-igel.de (www.pro-igel.de)|152.53.23.200|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1160174 (1.1M) [application/pdf]
Saving to: ‘./hedgehog.pdf’

./hedgehog.pdf      100%[===================>]   1.11M  2.03MB/s    in 0.5s    

2024-09-20 13:16:40 (2.03 MB/s) - ‘./hedgehog.pdf’ saved [1160174/1160174]
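`!wget` only works if the `wget` binary is installed in the notebook environment. If it isn't, the Python standard library can fetch the same file. This is a minimal sketch using `urllib.request.urlretrieve`; `download` is a hypothetical helper name.

```python
import os
import urllib.request

PDF_URL = "https://www.pro-igel.de/downloads/merkblaetter_engl/wildtier_engl.pdf"


def download(url, dest):
    """Fetch url into dest unless dest already exists; return the file size in bytes."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return os.path.getsize(dest)


# download(PDF_URL, "./hedgehog.pdf")
```

Skipping the request when the file already exists makes the cell safe to re-run. Delete `./hedgehog.pdf` if you need a fresh copy.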

Next, we load the PDF file with LlamaIndex's SimpleDirectoryReader.

from llama_index.core import SimpleDirectoryReader

hedgehog_docs = SimpleDirectoryReader(
    input_files=["./hedgehog.pdf"]
).load_data()

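Step 4: Build RAG over the hedgehog documents

Next, we use VectorStoreIndex to create vector embeddings of the hedgehog documents, then turn the index into a query engine that retrieves relevant information for each query. With similarity_top_k=5, the engine retrieves the five most similar chunks per query.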

from llama_index.core import VectorStoreIndex

hedgehog_index = VectorStoreIndex.from_documents(hedgehog_docs)
hedgehog_query_engine = hedgehog_index.as_query_engine(similarity_top_k=5)
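To see what `similarity_top_k` controls, consider the ranking step on its own: the retriever scores every stored chunk against the query embedding and keeps the k best. The sketch below shows that step with cosine similarity over tiny hand-made vectors. It is a simplified illustration. In the real pipeline the vectors come from `mistral-embed`, and LlamaIndex's vector store handles chunking, storage, and scoring for you.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k):
    """Return the k (text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real mistral-embed vectors are much larger.
chunks = [
    ("Sick or injured hedgehogs need help.", [0.9, 0.1, 0.0]),
    ("Hedgehogs hibernate in winter.", [0.1, 0.9, 0.0]),
    ("Orphaned hoglets need care.", [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]
for text, score in top_k(query, chunks, k=2):
    print(f"{score:.3f}  {text}")
```

A larger k gives the LLM more context to work with, but it also adds tokens (and cost) and can pull in less relevant chunks.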

Finally, to tie everything together, we query the engine and print the response.

response = hedgehog_query_engine.query("Which hedgehogs require help?")
print(response)

Hedgehogs that require help are those that are sick, injured, and helpless, such as orphaned hoglets. These hedgehogs in need may be temporarily taken into human care and must be released into the wild as soon as they can survive there independently.
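After retrieval, the query engine sends the retrieved chunks and the question to the LLM, which synthesizes an answer like the one above. The sketch below shows the general idea with a hypothetical `build_prompt` function and template. LlamaIndex uses its own prompt templates and response modes, so the exact text sent to Mixtral is different. The context strings here are illustrative, not quotes from the PDF.

```python
def build_prompt(question, context_chunks):
    """Assemble a simple RAG prompt: numbered context chunks, then the question."""
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "Which hedgehogs require help?",
    [
        "Sick, injured, and helpless hedgehogs may be taken into care.",
        "Orphaned hoglets need help.",
    ],
)
print(prompt)
```

Grounding the instruction in "only the context below" is what lets the answer stay tied to the hedgehog guide rather than the model's general knowledge.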

All steps of the LLM chain are now traced in Langfuse.

Example trace in Langfuse: https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/367db23d-5b03-446b-bc73-36e289596c00

Example trace in the Langfuse UI

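Step 5: (Optional) Implement user feedback to monitor application performance

To monitor the quality of your hedgehog chat application, you can use Langfuse scores to store user feedback (for example, thumbs up/down or comments). You can then analyze these scores in PostHog.

Scores evaluate either a single observation or an entire trace. You can create them through the annotation workflow in the Langfuse UI, by running model-based evaluations, or, as in this example, by ingesting them through the SDK.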
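Thumbs-up/down feedback has to be turned into a score value before it can be sent. This is a minimal sketch that assumes a binary widget where thumbs up maps to 1.0 and thumbs down to 0.0. `feedback_to_score` is a hypothetical helper. The keys of the dict it returns mirror the keyword arguments passed to `langfuse.score()` in this step's code.

```python
def feedback_to_score(trace_id, thumbs_up, comment=None):
    """Map binary thumbs feedback to keyword arguments for a numeric score."""
    kwargs = {
        "trace_id": trace_id,
        "name": "user-explicit-feedback",
        "value": 1.0 if thumbs_up else 0.0,  # assumed mapping: up -> 1.0, down -> 0.0
        "data_type": "NUMERIC",
    }
    if comment:
        kwargs["comment"] = comment  # comments are optional
    return kwargs


# With the Langfuse client from Step 2 (not executed here):
# langfuse.score(**feedback_to_score(trace_id, thumbs_up=True, comment="Good to know!"))
print(feedback_to_score("example-trace-id", thumbs_up=False))
```

Keeping the score name fixed ("user-explicit-feedback") makes it easy to filter and average the same metric later in PostHog.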

To access the context of the current observation, we apply the observe() decorator to the hedgehog_helper() function.

from langfuse.decorators import langfuse_context, observe


# Langfuse observe() decorator to automatically create a trace for the top-level function and spans for any nested functions.
@observe()
def hedgehog_helper(user_message):
    response = hedgehog_query_engine.query(user_message)
    trace_id = langfuse_context.get_current_trace_id()

    print(response)

    return trace_id


trace_id = hedgehog_helper("Can I keep the hedgehog as a pet?")

# Score the trace, e.g. to add user feedback using the trace_id
langfuse.score(
    trace_id=trace_id,
    name="user-explicit-feedback",
    value=0.9,
    data_type="NUMERIC",  # optional, inferred if not provided
    comment="Good to know!",  # optional
)

Based on the provided context, it is not recommended to keep wild hedgehogs as pets. The Federal Nature Conservation Act protects hedgehogs as a native mammal species, making it illegal to chase, catch, injure, kill, or take their nesting and refuge places. Exceptions apply only to sick, injured, and helpless hedgehogs, which may be temporarily taken into human care and released into the wild as soon as they can survive independently. It is important to respect the natural habitats and behaviors of wild animals, including hedgehogs.

<langfuse.client.StatefulClient at 0x7c7cd656e2f0>
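`langfuse_context.get_current_trace_id()` can return an ID inside `hedgehog_helper()` because the decorator opens a trace before the function body runs and keeps its ID in context for the duration of the call. The sketch below imitates that idea with `contextvars` and a hypothetical `toy_observe` decorator. It is a simplified imitation: it sends nothing to Langfuse and is not the Langfuse SDK's implementation.

```python
import contextvars
import functools
import uuid

_current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def toy_observe(fn):
    """Open a new trace ID for top-level calls; nested calls reuse the outer one."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if _current_trace_id.get() is not None:
            return fn(*args, **kwargs)  # nested call: stay in the parent trace
        token = _current_trace_id.set(str(uuid.uuid4()))
        try:
            return fn(*args, **kwargs)
        finally:
            _current_trace_id.reset(token)  # the trace ends with the top-level call

    return wrapper


def get_current_trace_id():
    return _current_trace_id.get()


@toy_observe
def inner():
    return get_current_trace_id()


@toy_observe
def outer():
    return get_current_trace_id(), inner()


outer_id, inner_id = outer()
print(outer_id == inner_id)  # True: the nested function shares the top-level trace
```

This mirrors how nested functions become spans inside a single trace: the top-level call owns the trace, and anything it calls inherits the same context.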

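Step 6: View your data in PostHog

Finally, we connect PostHog to our Langfuse account. Here is a summary of the steps (see the documentation for full details):

If you don't have a PostHog account yet, sign up for a free one.
Copy your project API key and host from your project settings.
In your Langfuse dashboard, click Settings and scroll down to the Integrations section to find the PostHog integration.
Click Configure, then paste in your PostHog host and project API key (you can find both in your PostHog project settings).
Click Enable, then click Save.

Langfuse will then start exporting your data to PostHog once a day.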

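Using the Langfuse dashboard template

Once the integration is installed, dashboard templates help you quickly set up the relevant insights.

For our hedgehog chat application, we use the template dashboard shown below. It lets you analyze model costs, user feedback, and latency in PostHog.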
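The dashboard aggregates per-trace metrics. To illustrate the kind of numbers it reports, the sketch below computes total cost, mean user feedback, and a nearest-rank 95th-percentile latency from hypothetical trace records. The field names and values are invented for this example and do not reflect the schema of the data Langfuse exports to PostHog, nor how PostHog computes its insights.

```python
import math

# Hypothetical per-trace records (latency in seconds, cost in USD).
traces = [
    {"latency": 1.2, "cost": 0.004, "feedback": 1.0},
    {"latency": 0.8, "cost": 0.003, "feedback": None},
    {"latency": 2.5, "cost": 0.006, "feedback": 0.0},
    {"latency": 1.0, "cost": 0.002, "feedback": 1.0},
]


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


total_cost = sum(t["cost"] for t in traces)
feedback = [t["feedback"] for t in traces if t["feedback"] is not None]
mean_feedback = sum(feedback) / len(feedback)  # traces without feedback are skipped
p95_latency = percentile([t["latency"] for t in traces], 95)
print(f"cost=${total_cost:.3f} feedback={mean_feedback:.2f} p95={p95_latency}s")
```

A tail percentile such as p95 is usually more informative than the mean for latency, because a few slow generations are what users notice.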

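Create your own dashboard from a template

Go to the Dashboards tab in PostHog.
Click the New dashboard button in the top-right corner.
Select LLM metrics – Langfuse from the list of templates.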

PostHog dashboard showing user feedback and number of traces

PostHog dashboard showing latency and costs

Feedback

If you have any feedback or requests, please open an issue on GitHub or share your ideas with the community on Discord.