Structured Prediction

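Structured prediction gives you more granular control over how your application calls the LLM and uses Pydantic. We will use the same Invoice class as in the previous example, load the PDF, and use OpenAI as before. Instead of creating a structured LLM, we will call structured_predict directly on the LLM itself; this is a method of every LLM class.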

structured_predict takes a Pydantic class and a prompt template as arguments, along with keyword arguments for any variables in the prompt template.

from llama_index.core.prompts import PromptTemplate

prompt = PromptTemplate(
    "Extract an invoice from the following text. If you cannot find an invoice ID, use the company name '{company_name}' and the date as the invoice ID: {text}"
)

response = llm.structured_predict(
    Invoice, prompt, text=text, company_name="Uber"
)

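As you can see, this lets us include additional prompt instructions that tell the LLM what to do when the Pydantic schema alone is not enough to parse the data correctly. In this case, the response object is the Pydantic object itself. If we want, we can get the output as JSON: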

import json

# Serialize the Pydantic object, then re-indent the JSON for readability
json_output = response.model_dump_json()
print(json.dumps(json.loads(json_output), indent=2))

{
    "invoice_id": "Uber-2024-10-10",
    "date": "2024-10-10T19:49:00",
    "line_items": [
        {"item_name": "Trip fare", "price": 12.18},
        {"item_name": "Access for All Fee", "price": 0.1},
        ...,
    ],
}
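For readers who do not have the earlier example at hand, the sketch below shows one plausible shape for the Invoice class. It is inferred only from the JSON output above: the LineItem class name and the field types are assumptions, not the tutorial's actual definition. It also shows that model_dump_json() and model_validate_json() round-trip the object.

```python
import json
from datetime import datetime
from typing import List

from pydantic import BaseModel


class LineItem(BaseModel):
    # Hypothetical class name; fields mirror the JSON output shown above.
    item_name: str
    price: float


class Invoice(BaseModel):
    # Assumed field types, inferred from the sample output.
    invoice_id: str
    date: datetime
    line_items: List[LineItem]


# Build an object by hand in place of a real LLM response.
invoice = Invoice(
    invoice_id="Uber-2024-10-10",
    date="2024-10-10T19:49:00",
    line_items=[{"item_name": "Trip fare", "price": 12.18}],
)

# Serialize to JSON text, then parse it back into a validated object.
json_output = invoice.model_dump_json()
restored = Invoice.model_validate_json(json_output)
print(json.dumps(json.loads(json_output), indent=2))
```

Because the response is an ordinary Pydantic object, it can be stored or sent elsewhere as JSON and validated again on the way back in.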

structured_predict has several variants for different use cases, including asynchronous (astructured_predict) and streaming (stream_structured_predict, astream_structured_predict).
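The asynchronous variant is mainly useful for running several extractions concurrently. The sketch below illustrates only the calling pattern: FakeLLM is a hypothetical stub standing in for a real LLM object, the dataclass Invoice is a simplified stand-in for the Pydantic class, and the example assumes astructured_predict accepts the same arguments as structured_predict.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Invoice:
    # Simplified stand-in for the Pydantic Invoice class (hypothetical).
    invoice_id: str


class FakeLLM:
    """Hypothetical stub; a real LLM object would call a model here."""

    async def astructured_predict(self, output_cls, prompt, **prompt_args):
        await asyncio.sleep(0)  # simulate awaiting network I/O
        return output_cls(invoice_id=f"{prompt_args['company_name']}-demo")


async def extract_all(llm, prompt, documents):
    # Start one extraction per document and await them concurrently.
    tasks = [
        llm.astructured_predict(Invoice, prompt, text=doc, company_name="Uber")
        for doc in documents
    ]
    return await asyncio.gather(*tasks)


invoices = asyncio.run(
    extract_all(FakeLLM(), "prompt placeholder", ["doc one", "doc two"])
)
print([inv.invoice_id for inv in invoices])
```

With a real LLM, the same gather pattern lets the network calls overlap instead of running one after another.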

Under the hood

Depending on which LLM you use, structured_predict uses one of two different classes to call the LLM and parse the output.

FunctionCallingProgram

If the LLM you are using has a function calling API, FunctionCallingProgram will:

1. Convert the Pydantic object into a tool
2. Prompt the LLM while forcing it to use this tool
3. Return the generated Pydantic object

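This is generally the more reliable method and is used in preference whenever it is available. However, some LLMs are text-only, and they use the other method.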
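To make the first step more concrete, here is a minimal sketch of how a Pydantic class can be described as a tool. It is not FunctionCallingProgram's actual internals: pydantic_to_tool is a hypothetical helper, the dictionary follows the OpenAI-style function tool layout, and the simplified Invoice fields are assumptions. The last lines simulate the final step, where the arguments returned by the model are validated into the Pydantic object.

```python
from typing import List

from pydantic import BaseModel


class LineItem(BaseModel):
    item_name: str
    price: float


class Invoice(BaseModel):
    """A parsed invoice."""

    invoice_id: str
    line_items: List[LineItem]


def pydantic_to_tool(model_cls):
    # Hypothetical helper: wrap the model's JSON schema in an
    # OpenAI-style function tool definition.
    schema = model_cls.model_json_schema()
    return {
        "type": "function",
        "function": {
            "name": schema["title"],
            "description": schema.get("description", ""),
            "parameters": schema,
        },
    }


tool = pydantic_to_tool(Invoice)

# Simulated tool-call arguments, as a function calling API might return them.
tool_arguments = (
    '{"invoice_id": "Uber-2024-10-10", '
    '"line_items": [{"item_name": "Trip fare", "price": 12.18}]}'
)
invoice = Invoice.model_validate_json(tool_arguments)
print(tool["function"]["name"], invoice.invoice_id)
```

Because the model is constrained to fill in the tool's parameters, the reply is far more likely to match the schema than free-form text would be.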

LLMTextCompletionProgram

If the LLM is text-only, LLMTextCompletionProgram will:

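1. Output the Pydantic schema as JSON
2. Send the schema and the data to the LLM, with prompt instructions asking it to respond in a form that conforms to the schema
3. Call model_validate_json() on the Pydantic object, passing in the raw text returned by the LLM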

This approach is notably less reliable, but it is supported by all text-based LLMs.
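The three steps above can be sketched with Pydantic alone. This is an illustration under stated assumptions rather than the library's implementation: build_prompt and parse_response are hypothetical helpers, the Invoice fields are simplified, and the two reply strings are hand-written stand-ins for LLM output. The second reply shows why this route is more fragile: any text around the JSON makes validation fail.

```python
import json

from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    # Simplified, hypothetical fields for illustration.
    invoice_id: str
    total: float


def build_prompt(model_cls, text):
    # Steps 1 and 2: render the schema as JSON and embed it in the prompt.
    schema_json = json.dumps(model_cls.model_json_schema(), indent=2)
    return (
        "Extract the data from the text below.\n"
        f"Respond only with JSON matching this schema:\n{schema_json}\n\n"
        f"Text:\n{text}"
    )


def parse_response(model_cls, raw_text):
    # Step 3: validate the raw reply; return None if it does not parse.
    try:
        return model_cls.model_validate_json(raw_text)
    except ValidationError:
        return None


prompt = build_prompt(Invoice, "Invoice Uber-2024-10-10, total 12.28")

good_reply = '{"invoice_id": "Uber-2024-10-10", "total": 12.28}'
chatty_reply = "Sure! Here is the JSON: " + good_reply

print(parse_response(Invoice, good_reply))
print(parse_response(Invoice, chatty_reply))
```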

Calling prediction classes directly

In practice, structured_predict should work well for any LLM, but if you need lower-level control you can call FunctionCallingProgram and LLMTextCompletionProgram directly and further customize what happens:

from llama_index.core.program import LLMTextCompletionProgram

textCompletion = LLMTextCompletionProgram.from_defaults(
    output_cls=Invoice,
    llm=llm,
    prompt=PromptTemplate(
        "Extract an invoice from the following text. If you cannot find an invoice ID, use the company name '{company_name}' and the date as the invoice ID: {text}"
    ),
)

output = textCompletion(company_name="Uber", text=text)

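The code above is identical to calling structured_predict on an LLM that has no function calling API and, like structured_predict, it returns a Pydantic object. However, you can customize how the output is parsed by subclassing PydanticOutputParser: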

from llama_index.core.output_parsers import PydanticOutputParser


class MyOutputParser(PydanticOutputParser):
    def get_pydantic_object(self, text: str):
        # do something more clever than this
        # output_cls is the Pydantic class passed to the parser's constructor
        return self.output_cls.model_validate_json(text)


textCompletion = LLMTextCompletionProgram.from_defaults(
    llm=llm,
    prompt=PromptTemplate(
        "Extract an invoice from the following text. If you cannot find an invoice ID, use the company name '{company_name}' and the date as the invoice ID: {text}"
    ),
    output_parser=MyOutputParser(output_cls=Invoice),
)

This is useful if you are using a less capable LLM that needs help with parsing.
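As one idea of what "something more clever" might look like, the sketch below cleans up a reply that wraps its JSON in extra prose or a Markdown code fence, which weaker models often do. extract_json_object is a hypothetical helper, not part of LlamaIndex; a custom parser could call it on the raw text before handing the result to model_validate_json().

```python
import json


def extract_json_object(text: str) -> str:
    # Keep only the span from the first "{" to the last "}".
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    candidate = text[start : end + 1]
    # Raises json.JSONDecodeError (a ValueError) if the span is not valid JSON.
    json.loads(candidate)
    return candidate


fence = "`" * 3  # a Markdown code fence, built here to keep this example readable
raw = (
    "Sure, here is the invoice:\n"
    + fence
    + "json\n"
    + '{"invoice_id": "Uber-2024-10-10"}'
    + "\n"
    + fence
    + "\nLet me know if you need more."
)

cleaned = extract_json_object(raw)
print(cleaned)
```

This heuristic is deliberately simple: it assumes the reply contains exactly one top-level JSON object and will reject replies where stray braces appear in the surrounding prose.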

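In the final section, we will look at extracting structured data in an even lower-level way, including extracting multiple structures in a single call.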