LLM Pydantic Program
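This guide shows how to generate structured data with our LLMTextCompletionProgram. Given an LLM and an output Pydantic class, the program generates a structured Pydantic object.
For the target object, you can either specify output_cls directly, or specify a PydanticOutputParser or any other BaseOutputParser that produces a Pydantic object.
In the examples below, we show different ways of extracting data into an Album object (which can contain a list of Song objects).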
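To make the flow concrete, here is a minimal, library-free sketch of the idea: fill in a prompt template, call an LLM, and parse the returned text into a typed object. It is not how LlamaIndex implements LLMTextCompletionProgram. The names run_text_completion_program and fake_llm are hypothetical, dataclasses stand in for the Pydantic models, and the "LLM" is a stub that returns fixed JSON.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Song:
    title: str
    length_seconds: int


@dataclass
class Album:
    name: str
    artist: str
    songs: List[Song]


def run_text_completion_program(
    prompt_template: str,
    llm: Callable[[str], str],
    **prompt_args: str,
) -> Album:
    """Format the prompt, call the LLM, and parse its JSON text into an Album."""
    prompt = prompt_template.format(**prompt_args)
    raw_text = llm(prompt)
    data = json.loads(raw_text)
    songs = [Song(**song) for song in data["songs"]]
    return Album(name=data["name"], artist=data["artist"], songs=songs)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; always returns the same JSON."""
    return json.dumps(
        {
            "name": "The Overlook",
            "artist": "Jack Torrance",
            "songs": [{"title": "Redrum", "length_seconds": 240}],
        }
    )


album = run_text_completion_program(
    "Generate an example album inspired by the movie {movie_name}.",
    fake_llm,
    movie_name="The Shining",
)
print(album)
```

In the real program, the output class or output parser plays the role of the parsing step shown here.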
Extracting into the Album class
This is a simple example of parsing output into the Album schema, which can contain multiple songs.
Just pass Album as the output_cls argument when initializing LLMTextCompletionProgram.
If you are opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
!pip install llama-index
from pydantic import BaseModel
from typing import List
from llama_index.core.program import LLMTextCompletionProgram
Define the output schema
class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]
Define the LLM Pydantic program
from llama_index.core.program import LLMTextCompletionProgram
prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)
Run the program to get structured output.
output = program(movie_name="The Shining")
The output is a valid Pydantic object that we can then use to call functions/APIs.
output
Out[ ]
Album(name='The Overlook', artist='Jack Torrance', songs=[Song(title='Redrum', length_seconds=240), Song(title="Here's Johnny", length_seconds=180), Song(title='Room 237', length_seconds=300), Song(title='All Work and No Play', length_seconds=210), Song(title='The Maze', length_seconds=270)])
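Because the result is an ordinary Pydantic object, downstream code can treat it like any other typed value. The sketch below assumes Pydantic v2 (where model_dump is available). The helper total_runtime_seconds is hypothetical, and the album is built by hand from part of the output above rather than returned by the program, so the snippet runs without an LLM.

```python
from typing import List

from pydantic import BaseModel


class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]


def total_runtime_seconds(album: Album) -> int:
    """Hypothetical downstream function that consumes the structured output."""
    return sum(song.length_seconds for song in album.songs)


# Hand-built stand-in for the object returned by program(...).
album = Album(
    name="The Overlook",
    artist="Jack Torrance",
    songs=[
        Song(title="Redrum", length_seconds=240),
        Song(title="Here's Johnny", length_seconds=180),
    ],
)

runtime = total_runtime_seconds(album)

# A plain dict (Pydantic v2 API), e.g. to send as a JSON body to another service.
payload = album.model_dump()
```

Typed attribute access (album.songs, song.length_seconds) is the main benefit over working with raw LLM text.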
Initializing with a Pydantic Output Parser
The approach above is equivalent to defining a Pydantic output parser and passing it in, instead of passing output_cls directly.
from llama_index.core.output_parsers import PydanticOutputParser

program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=Album),
    prompt_template_str=prompt_template_str,
    verbose=True,
)
output = program(movie_name="Lord of the Rings")
output
Out[ ]
Album(name='The Fellowship of the Ring', artist='Middle-earth Ensemble', songs=[Song(title='The Shire', length_seconds=240), Song(title='Concerning Hobbits', length_seconds=180), Song(title='The Ring Goes South', length_seconds=300), Song(title='A Knife in the Dark', length_seconds=270), Song(title='Flight to the Ford', length_seconds=210), Song(title='Many Meetings', length_seconds=240), Song(title='The Council of Elrond', length_seconds=330), Song(title='The Great Eye', length_seconds=180), Song(title='The Breaking of the Fellowship', length_seconds=360)])
Defining a Custom Output Parser
Sometimes you may want to parse the output into a JSON object in your own way.
from llama_index.core.output_parsers import ChainableOutputParser


class CustomAlbumOutputParser(ChainableOutputParser):
    """Custom Album output parser.

    Assume first line is name and artist.

    Assume each subsequent line is the song.

    """

    def __init__(self, verbose: bool = False):
        self.verbose = verbose

    def parse(self, output: str) -> Album:
        """Parse output."""
        if self.verbose:
            print(f"> Raw output: {output}")
        lines = output.split("\n")
        name, artist = lines[0].split(",")
        songs = []
        for i in range(1, len(lines)):
            title, length_seconds = lines[i].split(",")
            songs.append(Song(title=title, length_seconds=length_seconds))

        return Album(name=name, artist=artist, songs=songs)
prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
Return answer in following format.
The first line is:
<album_name>, <album_artist>
Every subsequent line is a song with format:
<song_title>, <song_length_seconds>
"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=CustomAlbumOutputParser(verbose=True),
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)
output = program(movie_name="The Dark Knight")
> Raw output: Gotham's Reckoning, The Dark Knight
A Dark Knight Rises, 240
The Joker's Symphony, 180
Harvey Dent's Lament, 210
Gotham's Guardian, 195
The Batmobile Chase, 225
The Dark Knight's Theme, 150
The Joker's Mind Games, 180
Rachel's Tragedy, 210
Gotham's Last Stand, 240
The Dark Knight's Triumph, 180
output
Out[ ]
Album(name="Gotham's Reckoning", artist=' The Dark Knight', songs=[Song(title='A Dark Knight Rises', length_seconds=240), Song(title="The Joker's Symphony", length_seconds=180), Song(title="Harvey Dent's Lament", length_seconds=210), Song(title="Gotham's Guardian", length_seconds=195), Song(title='The Batmobile Chase', length_seconds=225), Song(title="The Dark Knight's Theme", length_seconds=150), Song(title="The Joker's Mind Games", length_seconds=180), Song(title="Rachel's Tragedy", length_seconds=210), Song(title="Gotham's Last Stand", length_seconds=240), Song(title="The Dark Knight's Triumph", length_seconds=180)])
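Note that the parsed artist above keeps a leading space (' The Dark Knight'), because the parser splits each line on "," without stripping whitespace. It would also fail on a trailing blank line or on a song title that itself contains a comma. As a hedged sketch of a more defensive parsing step, the hypothetical helper parse_album_text below uses only the standard library and returns a plain dict. It assumes the same line format requested in the prompt and is one possible variant, not part of LlamaIndex.

```python
from typing import Any, Dict, List


def parse_album_text(raw: str) -> Dict[str, Any]:
    """Parse '<name>, <artist>' followed by '<title>, <seconds>' lines."""
    # Strip each line and drop blank ones (e.g. a trailing newline).
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        raise ValueError("LLM output is empty")

    # Split at the LAST comma so commas inside a name or title are preserved.
    name, artist = (part.strip() for part in lines[0].rsplit(",", 1))

    songs: List[Dict[str, Any]] = []
    for line in lines[1:]:
        title, length = line.rsplit(",", 1)
        songs.append({"title": title.strip(), "length_seconds": int(length)})

    return {"name": name, "artist": artist, "songs": songs}


raw_output = (
    "Gotham's Reckoning, The Dark Knight\n"
    "A Dark Knight Rises, 240\n"
    "Hello, Goodbye, 200\n"
    "\n"
)
parsed = parse_album_text(raw_output)
print(parsed)
```

Inside a custom parse method, the resulting dict could then be turned into the schema object with Album(**parsed), since Pydantic converts the nested song dicts into Song objects.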