Llama Dataset Metadata

LlamaDatasetMetadataPack

Bases: BaseLlamaPack

A llamapack for creating and saving the metadata files required to submit a llamadataset: card.json and README.md.

Source code in llama-index-packs/llama-index-packs-llama-dataset-metadata/llama_index/packs/llama_dataset_metadata/base.py
class LlamaDatasetMetadataPack(BaseLlamaPack):
    """
    A llamapack for creating and saving the necessary metadata files for
    submitting a llamadataset: card.json and README.md.
    """

    def run(
        self,
        index: BaseIndex,
        benchmark_df: pd.DataFrame,
        rag_dataset: "LabelledRagDataset",
        name: str,
        description: str,
        baseline_name: str,
        source_urls: Optional[List[str]] = None,
        code_url: Optional[str] = None,
    ):
        """
        Main usage for a llamapack. This will build the card.json and README.md
        and save them to local disk.

        Args:
            index (BaseIndex): the index from which query_engine is derived and
                used in the rag evaluation.
            benchmark_df (pd.DataFrame): the benchmark dataframe after using
                RagEvaluatorPack
            rag_dataset (LabelledRagDataset): the LabelledRagDataset used for
                evaluations
            name (str): The name of the new dataset e.g., "Paul Graham Essay Dataset"
            baseline_name (str): The name of the baseline e.g., "llamaindex"
            description (str): The description of the new dataset.
            source_urls (Optional[List[str]], optional): _description_. Defaults to None.
            code_url (Optional[str], optional): _description_. Defaults to None.

        """
        readme_obj = Readme(name=name)
        card_obj = DatasetCard.from_rag_evaluation(
            index=index,
            benchmark_df=benchmark_df,
            rag_dataset=rag_dataset,
            name=name,
            description=description,
            baseline_name=baseline_name,
            source_urls=source_urls,
            code_url=code_url,
        )

        # save card.json
        with open("card.json", "w") as f:
            json.dump(card_obj.dict(by_alias=True), f)

        # save README.md
        with open("README.md", "w") as f:
            f.write(readme_obj.create_readme())

run

run(index: BaseIndex, benchmark_df: DataFrame, rag_dataset: LabelledRagDataset, name: str, description: str, baseline_name: str, source_urls: Optional[List[str]] = None, code_url: Optional[str] = None)

Main usage for a llamapack. This builds card.json and README.md and saves them to local disk, in the current working directory.

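Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| index | BaseIndex | The index from which the query_engine is derived and which is used in the RAG evaluation. | required |
| benchmark_df | DataFrame | The benchmark DataFrame obtained after using RagEvaluatorPack. | required |
| rag_dataset | LabelledRagDataset | The LabelledRagDataset used for evaluations. | required |
| name | str | The name of the new dataset, e.g. "Paul Graham Essay Dataset". | required |
| baseline_name | str | The name of the baseline, e.g. "llamaindex". | required |
| description | str | The description of the new dataset. | required |
| source_urls | Optional[List[str]] | No description provided. Defaults to None. | None |
| code_url | Optional[str] | No description provided. Defaults to None. | None |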
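
Because `run` opens `card.json` and `README.md` by bare file name, both files land in whatever directory the process is currently in. The sketch below shows one way to direct them into a dedicated folder. The helper `run_pack_in_dir` and the folder name `submission` are hypothetical names introduced for illustration and are not part of the pack; the commented import path is inferred from the source file location given on this page.

```python
import os
from pathlib import Path


def run_pack_in_dir(pack, out_dir, **run_kwargs):
    """Call pack.run(**run_kwargs) with out_dir as the working directory.

    Hypothetical helper: `pack` is any object exposing the run(...) method
    documented above, which writes card.json and README.md to the cwd.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    previous = Path.cwd()
    os.chdir(out_path)
    try:
        pack.run(**run_kwargs)
    finally:
        # Always restore the original working directory, even on failure.
        os.chdir(previous)

    return out_path / "card.json", out_path / "README.md"


# Intended usage, assuming `index`, `benchmark_df`, and `rag_dataset`
# already exist from a prior RagEvaluatorPack run:
#
# from llama_index.packs.llama_dataset_metadata.base import (
#     LlamaDatasetMetadataPack,
# )
# run_pack_in_dir(
#     LlamaDatasetMetadataPack(),
#     "submission",
#     index=index,
#     benchmark_df=benchmark_df,
#     rag_dataset=rag_dataset,
#     name="Paul Graham Essay Dataset",
#     description="A labelled RAG dataset built from Paul Graham's essay.",
#     baseline_name="llamaindex",
# )
```

Changing directory is process-wide, so this approach is only suitable for single-threaded scripts.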

Source code in llama-index-packs/llama-index-packs-llama-dataset-metadata/llama_index/packs/llama_dataset_metadata/base.py
def run(
    self,
    index: BaseIndex,
    benchmark_df: pd.DataFrame,
    rag_dataset: "LabelledRagDataset",
    name: str,
    description: str,
    baseline_name: str,
    source_urls: Optional[List[str]] = None,
    code_url: Optional[str] = None,
):
    """
    Main usage for a llamapack. This will build the card.json and README.md
    and save them to local disk.

    Args:
        index (BaseIndex): the index from which query_engine is derived and
            used in the rag evaluation.
        benchmark_df (pd.DataFrame): the benchmark dataframe after using
            RagEvaluatorPack
        rag_dataset (LabelledRagDataset): the LabelledRagDataset used for
            evaluations
        name (str): The name of the new dataset e.g., "Paul Graham Essay Dataset"
        baseline_name (str): The name of the baseline e.g., "llamaindex"
        description (str): The description of the new dataset.
        source_urls (Optional[List[str]], optional): _description_. Defaults to None.
        code_url (Optional[str], optional): _description_. Defaults to None.

    """
    readme_obj = Readme(name=name)
    card_obj = DatasetCard.from_rag_evaluation(
        index=index,
        benchmark_df=benchmark_df,
        rag_dataset=rag_dataset,
        name=name,
        description=description,
        baseline_name=baseline_name,
        source_urls=source_urls,
        code_url=code_url,
    )

    # save card.json
    with open("card.json", "w") as f:
        json.dump(card_obj.dict(by_alias=True), f)

    # save README.md
    with open("README.md", "w") as f:
        f.write(readme_obj.create_readme())
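
Since the method writes `card.json` with `json.dump` and `README.md` as plain text, a quick sanity check before submitting might look like the sketch below. The function `check_metadata_files` is a hypothetical helper, not part of the pack. It assumes nothing about the fields inside `card.json` beyond it being a JSON object.

```python
import json
from pathlib import Path


def check_metadata_files(directory="."):
    """Verify card.json and README.md exist in `directory`; return the parsed card."""
    base = Path(directory)
    card_path = base / "card.json"
    readme_path = base / "README.md"

    for path in (card_path, readme_path):
        if not path.is_file():
            raise FileNotFoundError(f"missing metadata file: {path}")

    # run() serialises a dict with json.dump, so the file should hold a JSON object.
    card = json.loads(card_path.read_text(encoding="utf-8"))
    if not isinstance(card, dict):
        raise ValueError("card.json does not contain a JSON object")

    if not readme_path.read_text(encoding="utf-8").strip():
        raise ValueError("README.md is empty")

    return card
```

Calling `check_metadata_files()` with no argument inspects the current working directory, which is where `run` writes by default.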