MLflow Tracing and LlamaIndex End-to-End Integration

Welcome to this interactive tutorial on integrating LlamaIndex with MLflow. It walks you through the core features of LlamaIndex and MLflow with hands-on examples.

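Why Use LlamaIndex with MLflow?

Integrating LlamaIndex with MLflow provides a seamless experience for developing and managing LlamaIndex applications:

MLflow Tracing is a powerful observability tool for monitoring and debugging what happens inside LlamaIndex models, helping you quickly identify potential bottlenecks or issues.
MLflow Experiment lets you track your indices/engines/workflows in MLflow and manage the many moving parts of a LlamaIndex project, such as prompts, LLMs, tools, and global configuration.
MLflow Model packages your LlamaIndex application together with all dependency versions, its input and output interfaces, and other essential metadata.
MLflow Evaluate helps you evaluate the performance of your LlamaIndex application efficiently, supporting robust performance analysis and fast iteration.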

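What You Will Learn

By the end of this tutorial, you will be able to:

Create an MVP VectorStoreIndex in LlamaIndex.
Run inference using the index as a query engine and inspect it with MLflow Tracing.
Log the index to an MLflow Experiment.
Explore the MLflow UI to learn how MLflow Model packages your LlamaIndex application.

These basics will familiarize you with the LlamaIndex user journey in MLflow. For integration with more advanced use cases, such as tool-calling agents, see the advanced tutorial.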

Setup

Install MLflow and LlamaIndex

# Quote the version specifiers so the shell does not treat ">" as output redirection
%pip install "mlflow>=2.18" "llama-index>=0.10.44" -q

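If the MLflow UI is not already running, open a separate terminal and run mlflow ui --port 5000 to start it. If you are running this notebook in a cloud environment, see the How to Run Tutorials guide for the different ways to set up MLflow.

Create an MLflow Experiment and connect the notebook to it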

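import mlflow

# Set the tracking URI first so the experiment is created on this server,
# not in the default local tracking store
mlflow.set_tracking_uri(
    "http://localhost:5000"
)  # Or your remote tracking server URI
mlflow.set_experiment("llama-index-tutorial")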
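Because set_experiment contacts the tracking server, it fails if the server is not running. A quick reachability check, sketched below with the standard library only, makes that failure easier to diagnose. It assumes the server answers GET requests on a `/health` endpoint with HTTP 200 (the server started by `mlflow ui` is expected to; a managed or proxied server may differ), and `tracking_server_is_up` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def tracking_server_is_up(tracking_uri: str, timeout: float = 2.0) -> bool:
    # Hypothetical helper: assumes the server exposes GET /health and returns HTTP 200 when healthy
    url = tracking_uri.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error, or a malformed URI all count as "not up"
        return False


if not tracking_server_is_up("http://localhost:5000"):
    print("MLflow tracking server not reachable; start it with: mlflow ui --port 5000")
```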

Set your OpenAI API key as an environment variable. If you use a different LLM provider, set the corresponding environment variable instead.

import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
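Re-running the cell above prompts for the key every time. The variant sketched below only prompts when the variable is not already set. `ensure_env_var` is a hypothetical helper; the prompt function is a parameter so it can be swapped out (for example, in tests).

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env_var(name: str, prompt: Callable[[str], str] = getpass) -> str:
    # Hypothetical helper: reuse an existing non-empty value, otherwise ask for one and export it
    value = os.environ.get(name)
    if not value:
        value = prompt(f"Enter your {name}: ")
        os.environ[name] = value
    return value


# Usage in the notebook (prompts only if OPENAI_API_KEY is unset):
# ensure_env_var("OPENAI_API_KEY")
```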

Enable MLflow Tracing

A single line of code enables MLflow Tracing for LlamaIndex.

mlflow.llama_index.autolog()

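Create an Index

Vector store indices are one of the core components of LlamaIndex. They contain the embedding vectors of ingested document chunks (and sometimes the chunks themselves). These vectors can be used for inference tasks through different engine types in LlamaIndex:

Query Engine: performs direct queries to retrieve relevant information based on a user's question. Ideal for fetching concise answers or documents that match a specific query, much like a search engine.
Chat Engine: handles conversational AI tasks that require maintaining context and history across multiple interactions. Suited to interactive applications such as customer-support bots or virtual assistants, where conversational context matters.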
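To make the idea of an index of embedding vectors concrete, the toy sketch below stores chunks with their vectors and ranks them against a query by cosine similarity. It is an illustration only, not LlamaIndex's implementation: the bag-of-words `embed` function is a hypothetical stand-in for a real embedding model, and `ToyVectorIndex` and `top_k` are assumed names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag-of-words term-count vector
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors stored as Counters
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    def __init__(self, chunks):
        # "Ingest" chunks: store each chunk alongside its embedding vector
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, top_k: int = 2):
        # Rank stored chunks by similarity to the query embedding and keep the best top_k
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


toy_index = ToyVectorIndex([
    "llama index connects llms to your data",
    "mlflow tracks experiments and models",
    "spark runs distributed data processing",
])
print(toy_index.retrieve("what does mlflow track", top_k=1))
# ['mlflow tracks experiments and models']
```

A query engine would then pass the retrieved chunks plus the question to an LLM in a single step, while a chat engine additionally carries the conversation history across turns.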

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.llms import ChatMessage

# Create an index with a single dummy document
llama_index_example_document = Document.example()
index = VectorStoreIndex.from_documents([llama_index_example_document])

Query the Index

Let's use this index to run inference through a query engine.

query_response = index.as_query_engine().query("What is llama_index?")
print(query_response)

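In addition to the printed response, you should see the MLflow Trace UI in the output cell. It provides a detailed, intuitive visualization of the query engine's execution flow, helping you understand its inner workings and debug any issues that arise.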
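Conceptually, the trace shown in the output cell is a tree of timed spans, one per execution step (for example retrieval and the LLM call). The toy recorder below illustrates that structure with the standard library only; it is not MLflow's tracing API, and `ToyTracer` and the span names are hypothetical.

```python
import time
from contextlib import contextmanager


class ToyTracer:
    # Toy illustration of span nesting and timing, not MLflow's tracing API
    def __init__(self):
        self.spans = []  # finished spans, appended in the order they end
        self._stack = []  # names of currently open spans

    @contextmanager
    def span(self, name: str):
        # The innermost open span (if any) becomes this span's parent
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            self._stack.pop()
            self.spans.append({"name": name, "parent": parent, "duration_s": time.perf_counter() - start})


tracer = ToyTracer()
with tracer.span("query"):
    with tracer.span("retrieve"):
        pass  # e.g. vector search over the index
    with tracer.span("synthesize"):
        pass  # e.g. LLM call that writes the answer

for s in tracer.spans:
    print(s["name"], "<-", s["parent"])
```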

This time, let's send another query through a chat engine and see how the execution flow differs.

chat_response = index.as_chat_engine().chat(
    "What is llama_index?",
    chat_history=[
        ChatMessage(role="system", content="You are an expert on RAG!")
    ],
)
print(chat_response)

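As the traces show, the main difference is that the query engine executes a static workflow (RAG), whereas the chat engine uses an agentic workflow to dynamically pull the necessary context from the index.

You can also view the logged traces in the MLflow UI by navigating to the experiment created earlier and selecting the Traces tab. If you prefer not to render traces in the output cells and only want them logged to MLflow, run mlflow.tracing.disable_notebook_display() in the notebook.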

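Save the Index with MLflow

The code below logs the LlamaIndex model with MLflow, recording its parameters and an input example, and registers it in the MLflow Model Registry; the returned model_uri uniquely identifies the logged model. This keeps model management consistent and reproducible across development, testing, and production environments, and simplifies deployment and sharing.

Key parameters:

engine_type: the engine type ("query", "chat", or "retriever") that the model uses when loaded for pyfunc and spark_udf inference
input_example: an example input; MLflow infers the input signature from it and infers the output signature by running a prediction on it
registered_model_name: the name under which the model is registered in the MLflow Model Registry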

with mlflow.start_run() as run:
    model_info = mlflow.llama_index.log_model(
        index,
        artifact_path="llama_index",
        engine_type="query",
        input_example="hi",
        registered_model_name="my_llama_index_vector_store",
    )
    model_uri = model_info.model_uri
    print(f"Model identifier for loading: {model_uri}")
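The printed model_uri is what the load functions in the next section take. With the MLflow 2.x versions this tutorial targets, it typically has the form runs:/<run_id>/<artifact_path>, and because registered_model_name was set, the same model can also be addressed through the registry as models:/<name>/<version>. The helpers below are hypothetical conveniences that only build and split such strings; they are not MLflow APIs.

```python
# Hypothetical helpers: they only format and split URI strings, no MLflow calls involved


def runs_uri(run_id: str, artifact_path: str = "llama_index") -> str:
    # URI of a model logged under a run's artifact path
    return f"runs:/{run_id}/{artifact_path}"


def registry_uri(name: str, version: int) -> str:
    # URI of a specific version of a registered model
    return f"models:/{name}/{version}"


def split_runs_uri(uri: str) -> tuple:
    # Split "runs:/<run_id>/<artifact_path>" into (run_id, artifact_path)
    prefix = "runs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a runs:/ URI: {uri!r}")
    run_id, _, artifact_path = uri[len(prefix):].partition("/")
    return run_id, artifact_path


# Usage (assumes version 1 of the registered model exists):
# mlflow.pyfunc.load_model(registry_uri("my_llama_index_vector_store", 1))
print(registry_uri("my_llama_index_vector_store", 1))
```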

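Reload the Index and Run Inference

The code below demonstrates the three core types of inference you can perform with the loaded model.

Load and run inference via LlamaIndex: This approach loads the model with mlflow.llama_index.load_model and runs direct queries, chats, or retrieval. It is ideal when you want the full capabilities of the underlying LlamaIndex object.
Load and run inference via MLflow PyFunc: This approach loads the model with mlflow.pyfunc.load_model, which exposes prediction through the generic PyFunc interface using the engine type specified at logging time. It is useful for evaluating the model with mlflow.evaluate or deploying it for serving.
Load and run inference via MLflow Spark UDF: This approach uses mlflow.pyfunc.spark_udf to load the model as a Spark UDF, enabling distributed inference over large datasets in Spark DataFrames. It is well suited to large-scale data processing and, like PyFunc inference, supports only the engine type defined at logging time.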

print("\n------------- Inference via LlamaIndex -------------")
# Load the native LlamaIndex object and query it directly
index = mlflow.llama_index.load_model(model_uri)
query_response = index.as_query_engine().query("hi")
print(query_response)

print("\n------------- Inference via MLflow PyFunc -------------")
# Load the generic PyFunc wrapper; predict() uses the engine_type set at logging time ("query")
pyfunc_model = mlflow.pyfunc.load_model(model_uri)
query_response = pyfunc_model.predict("hi")
print(query_response)

# Optional: Spark UDF inference (requires pyspark)
show_spark_udf_inference = False
if show_spark_udf_inference:
    print("\n------------- Inference via MLflow Spark UDF -------------")
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Wrap the logged model as a Spark UDF that returns one string per input row
    udf = mlflow.pyfunc.spark_udf(spark, model_uri, result_type="string")
    df = spark.createDataFrame([("hi",), ("hello",)], ["text"])
    # An expression inside an if-block is not auto-displayed, so print the collected result
    print(df.withColumn("response", udf("text")).toPandas())

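Explore the MLflow Experiment UI

Finally, let's explore the MLflow UI to see what we have logged so far. You can access the UI by opening http://localhost:5000 in your browser, or run the following cell to display it inside the notebook.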

from IPython.display import IFrame

# Render the MLflow UI directly inside the notebook for easy browsing
IFrame(src="http://localhost:5000", width=1000, height=600)

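Navigate to the Experiments tab at the top left of the screen and click on our most recent run, as shown in the image below.

The Run page shows the overall metadata of your experiment run. You can navigate further to the Artifacts tab to view the logged artifacts (the model).

MLflow logs artifacts related to your model and its environment during the run. Most of the logged files, such as conda.yaml, python_env.yaml, and requirements.txt, are standard for all MLflow logging and help ensure reproducibility across environments. However, two sets of artifacts are specific to LlamaIndex:

index: a directory that stores the serialized vector store. For more details, see LlamaIndex's serialization documentation.
settings.json: the serialized llama_index.core.Settings service context. For more details, see LlamaIndex's Settings documentation.

By storing these objects, MLflow can recreate the environment in which you logged the model.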
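If you download the model artifacts locally (for example with mlflow.artifacts.download_artifacts(model_uri)), you can separate the LlamaIndex-specific pieces from the standard MLflow files. The sketch below does this with pathlib only; `classify_artifacts` is a hypothetical helper, and it assumes the layout described above, with the index directory and settings.json at the top level of the model directory.

```python
from pathlib import Path

# Standard MLflow model files named in the text, plus the MLmodel descriptor every MLflow model has
STANDARD_FILES = {"MLmodel", "conda.yaml", "python_env.yaml", "requirements.txt"}
# LlamaIndex-specific artifacts (assumed to sit at the top level of the model directory)
LLAMA_INDEX_ARTIFACTS = {"index", "settings.json"}


def classify_artifacts(model_dir: str) -> dict:
    # Hypothetical helper: sort top-level entries of a model directory into three groups
    groups = {"llama_index": [], "standard": [], "other": []}
    for entry in sorted(Path(model_dir).iterdir(), key=lambda p: p.name):
        if entry.name in LLAMA_INDEX_ARTIFACTS:
            groups["llama_index"].append(entry.name)
        elif entry.name in STANDARD_FILES:
            groups["standard"].append(entry.name)
        else:
            groups["other"].append(entry.name)
    return groups


# Usage in the notebook (downloads the logged model first):
# local_dir = mlflow.artifacts.download_artifacts(model_uri)
# print(classify_artifacts(local_dir))
```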

Important: MLflow does not serialize API keys. They must be set as environment variables in the environment where the model is loaded.
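Because keys are not stored with the model, a missing variable usually surfaces only when the first LLM or embedding call fails. A fail-fast check before loading, sketched below, gives a clearer error. `require_env_vars` is a hypothetical helper, and which variables are required depends on your LLM and embedding providers (OPENAI_API_KEY in this tutorial).

```python
import os


def require_env_vars(*names: str) -> None:
    # Hypothetical helper: raise one error listing every required variable that is unset or empty
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Set these environment variables before loading the model: " + ", ".join(missing)
        )


# Usage before loading the logged model:
# require_env_vars("OPENAI_API_KEY")
# index = mlflow.llama_index.load_model(model_uri)
```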

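Finally, you can view a list of all traces logged during this tutorial by navigating to the Traces tab. Clicking a row opens a detailed trace view similar to the ones shown in the earlier output cells.

Customization and Next Steps

When working with production systems, users often rely on a customized service context, which can be configured through LlamaIndex's Settings object.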