Box

BoxAIExtractToolSpec #

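Bases: BaseToolSpec

Extracts AI-generated content from a Box file.

Parameters

Name Type Description Default
box_client BoxClient

A BoxClient instance for interacting with the Box API.

required

Attributes

Name Type Description
spec_functions list

A list of supported functions.

_box_client BoxClient

A BoxClient instance for interacting with the Box API.

Methods

Name Description
ai_extract(file_id, ai_prompt)

Extracts AI-generated content from a Box file.

Parameters

Name Type Description Default
file_id str

The ID of the Box file.

required
ai_prompt str

The AI prompt to use for extraction.

required

Returns

Name Type Description
Document

A Document object containing the extracted AI content.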
Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/ai_extract/base.py
class BoxAIExtractToolSpec(BaseToolSpec):
    """
    Extracts AI generated content from a Box file.

    Args:
        box_client (BoxClient): A BoxClient instance for interacting with Box API.

    Attributes:
        spec_functions (list): A list of supported functions.
        _box_client (BoxClient): An instance of BoxClient for interacting with Box API.

    Methods:
        ai_extract(file_id, ai_prompt): Extracts AI generated content from a Box file.

    Args:
        file_id (str): The ID of the Box file.
        ai_prompt (str): The AI prompt to use for extraction.

    Returns:
        Document: A Document object containing the extracted AI content.

    """

    spec_functions = ["ai_extract"]

    _box_client: BoxClient

    def __init__(self, box_client: BoxClient) -> None:
        """
        Initializes the BoxAIExtractToolSpec with a BoxClient instance.

        Args:
            box_client (BoxClient): The BoxClient instance to use for interacting with the Box API.

        """
        self._box_client = add_extra_header_to_box_client(box_client)

    def ai_extract(
        self,
        file_id: str,
        ai_prompt: str,
    ) -> Document:
        """
        Extracts AI generated content from a Box file using the provided AI prompt.

        Args:
            file_id (str): The ID of the Box file to process.
            ai_prompt (str): The AI prompt to use for content extraction.

        Returns:
            Document: A Document object containing the extracted AI content,
            including metadata about the original Box file.

        """
        # Connect to Box
        box_check_connection(self._box_client)

        # get payload information
        box_file = get_box_files_details(
            box_client=self._box_client, file_ids=[file_id]
        )[0]

        box_file = get_files_ai_extract_data(
            box_client=self._box_client,
            box_files=[box_file],
            ai_prompt=ai_prompt,
        )[0]

        doc = box_file_to_llama_document(box_file)
        doc.text = box_file.ai_response if box_file.ai_response else ""
        doc.metadata["ai_prompt"] = box_file.ai_prompt
        doc.metadata["ai_response"] = box_file.ai_response

        return doc

ai_extract #

ai_extract(file_id: str, ai_prompt: str) -> Document

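Extracts AI-generated content from a Box file using the provided AI prompt.

Parameters

Name Type Description Default
file_id str

The ID of the Box file to process.

required
ai_prompt str

The AI prompt to use for content extraction.

required

Returns

Name Type Description
Document Document

A Document object containing the extracted AI content, including metadata about the original Box file.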
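
The last steps of `ai_extract` can be illustrated without a Box connection. The following is a minimal sketch, not the library's code: `FakeBoxFile` and `FakeDocument` are hypothetical stand-ins for the file object returned by the Box helpers and for `Document`, and `fill_document` is a hypothetical name. Only the three assignments inside it mirror the source shown for this method.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FakeBoxFile:
    """Hypothetical stand-in for the file object returned by the Box helpers."""

    ai_prompt: str
    ai_response: Optional[str] = None


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a llama-index Document."""

    text: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


def fill_document(box_file: FakeBoxFile) -> FakeDocument:
    """Mirror the final assignments of ai_extract.

    The text falls back to an empty string, while metadata["ai_response"]
    keeps the raw value, which may be None.
    """
    doc = FakeDocument()
    doc.text = box_file.ai_response if box_file.ai_response else ""
    doc.metadata["ai_prompt"] = box_file.ai_prompt
    doc.metadata["ai_response"] = box_file.ai_response
    return doc


# One file with an AI response and one without.
answered = fill_document(FakeBoxFile("Extract the invoice total", '{"total": "42.00"}'))
unanswered = fill_document(FakeBoxFile("Extract the invoice total"))
```

Under these assumptions, `doc.text` is always a string, but a caller reading `doc.metadata["ai_response"]` should be prepared for `None` when no response came back.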

Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/ai_extract/base.py
def ai_extract(
    self,
    file_id: str,
    ai_prompt: str,
) -> Document:
    """
    Extracts AI generated content from a Box file using the provided AI prompt.

    Args:
        file_id (str): The ID of the Box file to process.
        ai_prompt (str): The AI prompt to use for content extraction.

    Returns:
        Document: A Document object containing the extracted AI content,
        including metadata about the original Box file.

    """
    # Connect to Box
    box_check_connection(self._box_client)

    # get payload information
    box_file = get_box_files_details(
        box_client=self._box_client, file_ids=[file_id]
    )[0]

    box_file = get_files_ai_extract_data(
        box_client=self._box_client,
        box_files=[box_file],
        ai_prompt=ai_prompt,
    )[0]

    doc = box_file_to_llama_document(box_file)
    doc.text = box_file.ai_response if box_file.ai_response else ""
    doc.metadata["ai_prompt"] = box_file.ai_prompt
    doc.metadata["ai_response"] = box_file.ai_response

    return doc

BoxAIPromptToolSpec #

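Bases: BaseToolSpec

Generates AI prompts based on a Box file.

Parameters

Name Type Description Default
box_client BoxClient

A BoxClient instance for interacting with the Box API.

required

Attributes

Name Type Description
spec_functions list

A list of supported functions.

_box_client BoxClient

A BoxClient instance for interacting with the Box API.

Methods

Name Description
ai_prompt(file_id, ai_prompt)

Generates an AI prompt based on a Box file.

Parameters

Name Type Description Default
file_id str

The ID of the Box file.

required
ai_prompt str

The base AI prompt to use.

required

Returns

Name Type Description
Document

A Document object containing the generated AI prompt.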

Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/ai_prompt/base.py
class BoxAIPromptToolSpec(BaseToolSpec):
    """
    Generates AI prompts based on a Box file.

    Args:
        box_client (BoxClient): A BoxClient instance for interacting with Box API.

    Attributes:
        spec_functions (list): A list of supported functions.
        _box_client (BoxClient): An instance of BoxClient for interacting with Box API.

    Methods:
        ai_prompt(file_id, ai_prompt): Generates an AI prompt based on a Box file.

    Args:
        file_id (str): The ID of the Box file.
        ai_prompt (str): The base AI prompt to use.

    Returns:
        Document: A Document object containing the generated AI prompt.

    """

    spec_functions = ["ai_prompt"]

    _box_client: BoxClient

    def __init__(self, box_client: BoxClient) -> None:
        """
        Initializes the BoxAIPromptToolSpec with a BoxClient instance.

        Args:
            box_client (BoxClient): The BoxClient instance to use for interacting with the Box API.

        """
        self._box_client = add_extra_header_to_box_client(box_client)

    def ai_prompt(
        self,
        file_id: str,
        ai_prompt: str,
    ) -> Document:
        """
        Generates an AI prompt based on a Box file.

        Retrieves the specified Box file, constructs an AI prompt using the provided base prompt,
        and returns a Document object containing the generated prompt and file metadata.

        Args:
            file_id (str): The ID of the Box file to process.
            ai_prompt (str): The base AI prompt to use as a template.

        Returns:
            Document: A Document object containing the generated AI prompt and file metadata.

        """
        # Connect to Box
        box_check_connection(self._box_client)

        # get box files information
        box_file = get_box_files_details(
            box_client=self._box_client, file_ids=[file_id]
        )[0]

        box_file = get_ai_response_from_box_files(
            box_client=self._box_client,
            box_files=[box_file],
            ai_prompt=ai_prompt,
        )[0]

        doc = box_file_to_llama_document(box_file)
        doc.text = box_file.ai_response if box_file.ai_response else ""
        doc.metadata["ai_prompt"] = box_file.ai_prompt
        doc.metadata["ai_response"] = (
            box_file.ai_response if box_file.ai_response else ""
        )

        return doc

ai_prompt #

ai_prompt(file_id: str, ai_prompt: str) -> Document

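Generates an AI prompt based on a Box file.

Retrieves the specified Box file, constructs an AI prompt using the provided base prompt, and returns a Document object containing the generated prompt and file metadata.

Parameters

Name Type Description Default
file_id str

The ID of the Box file to process.

required
ai_prompt str

The base AI prompt to use as a template.

required

Returns

Name Type Description
Document Document

A Document object containing the generated AI prompt and file metadata.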
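
Because `ai_prompt` takes a single `file_id` per call, putting the same prompt to several files means looping over them. The sketch below shows one way to do that under stated assumptions: `PromptTool`, `ask_each`, and `StubTool` are hypothetical names introduced for illustration, and the stub replaces a real `BoxAIPromptToolSpec` so the example runs without Box credentials. Per the source shown for this method, the returned Document's `text` holds the `ai_response`.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Protocol


class PromptTool(Protocol):
    """Anything exposing ai_prompt(file_id, ai_prompt), as the tool spec does."""

    def ai_prompt(self, file_id: str, ai_prompt: str) -> Any: ...


def ask_each(tool: PromptTool, file_ids: Iterable[str], prompt: str) -> Dict[str, str]:
    """Call ai_prompt once per file and collect the response text by file ID."""
    answers: Dict[str, str] = {}
    for file_id in file_ids:
        doc = tool.ai_prompt(file_id=file_id, ai_prompt=prompt)
        answers[file_id] = doc.text
    return answers


class StubTool:
    """Hypothetical stand-in that fakes a response instead of calling Box."""

    def ai_prompt(self, file_id: str, ai_prompt: str) -> Any:
        return SimpleNamespace(
            text=f"answer for {file_id}",
            metadata={"ai_prompt": ai_prompt},
        )


answers = ask_each(StubTool(), ["111", "222"], "Summarize this file")
```

With the real class, the first argument would be a `BoxAIPromptToolSpec` built from an authenticated `BoxClient`; each iteration then performs its own connection check and file lookup, as the method source shows.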

Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/ai_prompt/base.py
def ai_prompt(
    self,
    file_id: str,
    ai_prompt: str,
) -> Document:
    """
    Generates an AI prompt based on a Box file.

    Retrieves the specified Box file, constructs an AI prompt using the provided base prompt,
    and returns a Document object containing the generated prompt and file metadata.

    Args:
        file_id (str): The ID of the Box file to process.
        ai_prompt (str): The base AI prompt to use as a template.

    Returns:
        Document: A Document object containing the generated AI prompt and file metadata.

    """
    # Connect to Box
    box_check_connection(self._box_client)

    # get box files information
    box_file = get_box_files_details(
        box_client=self._box_client, file_ids=[file_id]
    )[0]

    box_file = get_ai_response_from_box_files(
        box_client=self._box_client,
        box_files=[box_file],
        ai_prompt=ai_prompt,
    )[0]

    doc = box_file_to_llama_document(box_file)
    doc.text = box_file.ai_response if box_file.ai_response else ""
    doc.metadata["ai_prompt"] = box_file.ai_prompt
    doc.metadata["ai_response"] = (
        box_file.ai_response if box_file.ai_response else ""
    )

    return doc

BoxSearchByMetadataToolSpec #

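Bases: BaseToolSpec

Provides functionality for searching Box resources based on metadata.

This class allows you to search for Box resources based on metadata specified using the BoxSearchByMetadataOptions class. It uses the Box API search functionality and returns a list of Document objects containing information about the found resources.

Attributes

Name Type Description
spec_functions list

A list of supported functions (always "search").

_box_client BoxClient

A BoxClient instance for interacting with the Box API.

_options BoxSearchByMetadataOptions

A BoxSearchByMetadataOptions instance containing search options.

Methods

Name Description
search(query_params: Optional[str] = None) -> List[Document]

Performs a search for Box resources based on the configured metadata options and optional query parameters. Returns a list of Document objects representing the found resources.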

Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/search_by_metadata/base.py
class BoxSearchByMetadataToolSpec(BaseToolSpec):
    """
    Provides functionalities for searching Box resources based on metadata.

    This class allows you to search for Box resources based on metadata specified
    using the `BoxSearchByMetadataOptions` class. It utilizes the Box API search
    functionality and returns a list of `Document` objects containing information
    about the found resources.

    Attributes:
        spec_functions (list): A list of supported functions (always "search").
        _box_client (BoxClient): An instance of BoxClient for interacting with Box API.
        _options (BoxSearchByMetadataOptions): An instance of BoxSearchByMetadataOptions
            containing search options.

    Methods:
        search(query_params: Optional[str] = None) -> List[Document]:
            Performs a search for Box resources based on the configured metadata options
            and optional query parameters. Returns a list of `Document` objects representing
            the found resources.

    """

    spec_functions = ["search"]

    _box_client: BoxClient
    _options: BoxSearchByMetadataOptions

    def __init__(
        self, box_client: BoxClient, options: BoxSearchByMetadataOptions
    ) -> None:
        """
        Initializes a `BoxSearchByMetadataToolSpec` instance.

        Args:
            box_client (BoxClient): An authenticated Box API client.
            options (BoxSearchByMetadataOptions): An instance of `BoxSearchByMetadataOptions` containing search options.

        """
        self._box_client = add_extra_header_to_box_client(box_client)
        self._options = options

    def search(
        self,
        query_params: Optional[str] = None,
    ) -> List[Document]:
        """
        Searches for Box resources based on metadata and returns a list of documents.

        This method leverages the configured metadata options (`self._options`) to
        search for Box resources. It converts the provided JSON string (`query_params`)
        into a dictionary and uses it to refine the search based on additional
        metadata criteria. It retrieves matching Box files and then converts them
        into `Document` objects containing relevant information.

        Args:
            query_params (Optional[str]): An optional JSON string representing additional
                query parameters for filtering by metadata.

        Returns:
            List[Document]: A list of `Document` objects representing the found Box resources.

        """
        box_check_connection(self._box_client)

        # Box API accepts a dictionary of query parameters as a string, so we need to
        # convert the provided JSON string to a dictionary.
        params_dict = json.loads(query_params)

        box_files = search_files_by_metadata(
            box_client=self._box_client,
            from_=self._options.from_,
            ancestor_folder_id=self._options.ancestor_folder_id,
            query=self._options.query,
            query_params=params_dict,
            limit=self._options.limit,
        )

        box_files = get_box_files_details(
            box_client=self._box_client, file_ids=[file.id for file in box_files]
        )

        docs: List[Document] = []

        for file in box_files:
            doc = box_file_to_llama_document(file)
            docs.append(doc)

        return docs

search #

search(query_params: Optional[str] = None) -> List[Document]

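Searches for Box resources based on metadata and returns a list of documents.

This method uses the configured metadata options (self._options) to search for Box resources. It converts the provided JSON string (query_params) into a dictionary and uses it to refine the search based on additional metadata criteria. It retrieves the matching Box files and then converts them into Document objects containing relevant information.

Parameters

Name Type Description Default
query_params Optional[str]

An optional JSON string representing additional query parameters for filtering by metadata.

None

Returns

Type Description
List[Document]

List[Document]: A list of Document objects representing the found Box resources.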
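
Although `query_params` is declared optional, the source shown for this method passes it straight to `json.loads`, which raises `TypeError` for `None`. A cautious caller can therefore always pass a JSON string. The sketch below is an illustration only: `to_query_params` is a hypothetical helper, and the `documentType` key is an invented example field, not part of any real metadata template.

```python
import json
from typing import Any, Dict, Optional


def to_query_params(params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize filter values to the JSON string that search() parses.

    None is mapped to an empty JSON object so json.loads always gets a string.
    """
    return json.dumps(params or {})


# Hypothetical metadata field used purely as an example.
encoded = to_query_params({"documentType": "Invoice"})
decoded = json.loads(encoded)

# Demonstrate why None should not reach json.loads directly.
try:
    json.loads(None)
    none_is_accepted = True
except TypeError:
    none_is_accepted = False
```

Whether an empty object is a meaningful value depends on the query configured in the options; the point of the sketch is only that the argument reaching `search` is a valid JSON string.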

Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/search_by_metadata/base.py
def search(
    self,
    query_params: Optional[str] = None,
) -> List[Document]:
    """
    Searches for Box resources based on metadata and returns a list of documents.

    This method leverages the configured metadata options (`self._options`) to
    search for Box resources. It converts the provided JSON string (`query_params`)
    into a dictionary and uses it to refine the search based on additional
    metadata criteria. It retrieves matching Box files and then converts them
    into `Document` objects containing relevant information.

    Args:
        query_params (Optional[str]): An optional JSON string representing additional
            query parameters for filtering by metadata.

    Returns:
        List[Document]: A list of `Document` objects representing the found Box resources.

    """
    box_check_connection(self._box_client)

    # Box API accepts a dictionary of query parameters as a string, so we need to
    # convert the provided JSON string to a dictionary.
    params_dict = json.loads(query_params)

    box_files = search_files_by_metadata(
        box_client=self._box_client,
        from_=self._options.from_,
        ancestor_folder_id=self._options.ancestor_folder_id,
        query=self._options.query,
        query_params=params_dict,
        limit=self._options.limit,
    )

    box_files = get_box_files_details(
        box_client=self._box_client, file_ids=[file.id for file in box_files]
    )

    docs: List[Document] = []

    for file in box_files:
        doc = box_file_to_llama_document(file)
        docs.append(doc)

    return docs

BoxSearchToolSpec #

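Bases: BaseToolSpec

Provides functionality for searching Box resources.

This class allows you to search for Box resources based on various criteria specified using the BoxSearchOptions class. It uses the Box API search functionality and returns a list of Document objects containing information about the found resources.

Attributes

Name Type Description
spec_functions list

A list of supported functions (always "box_search").

_box_client BoxClient

A BoxClient instance for interacting with the Box API.

_options BoxSearchOptions

A BoxSearchOptions instance containing search options.

Methods

Name Description
box_search(query: str) -> List[Document]

Performs a search for Box resources based on the provided query and the configured search options. Returns a list of Document objects representing the found resources.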

Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/search/base.py
class BoxSearchToolSpec(BaseToolSpec):
    """
    Provides functionalities for searching Box resources.

    This class allows you to search for Box resources based on various criteria
    specified using the `BoxSearchOptions` class. It utilizes the Box API search
    functionality and returns a list of `Document` objects containing information
    about the found resources.

    Attributes:
        spec_functions (list): A list of supported functions (always "box_search").
        _box_client (BoxClient): An instance of BoxClient for interacting with Box API.
        _options (BoxSearchOptions): An instance of BoxSearchOptions containing search options.

    Methods:
        box_search(query: str) -> List[Document]:
            Performs a search for Box resources based on the provided query and configured
            search options. Returns a list of `Document` objects representing the found resources.

    """

    spec_functions = ["box_search"]

    _box_client: BoxClient
    _options: BoxSearchOptions

    def __init__(
        self, box_client: BoxClient, options: BoxSearchOptions = BoxSearchOptions()
    ) -> None:
        """
        Initializes a `BoxSearchToolSpec` instance.

        Args:
            box_client (BoxClient): An authenticated Box API client.
            options (BoxSearchOptions, optional): An instance of `BoxSearchOptions` containing search options.
                Defaults to `BoxSearchOptions()`.

        """
        self._box_client = add_extra_header_to_box_client(box_client)
        self._options = options

    def box_search(
        self,
        query: str,
    ) -> List[Document]:
        """
        Searches for Box resources based on the provided query and configured search options.

        This method utilizes the Box API search functionality to find resources matching the provided
        query and search options specified in the `BoxSearchOptions` object. It returns a list of
        `Document` objects containing information about the found resources.

        Args:
            query (str): The search query to use for searching Box resources.

        Returns:
            List[Document]: A list of `Document` objects representing the found Box resources.

        """
        box_check_connection(self._box_client)

        box_files = search_files(
            box_client=self._box_client,
            query=query,
            scope=self._options.scope,
            file_extensions=self._options.file_extensions,
            created_at_range=self._options.created_at_range,
            updated_at_range=self._options.updated_at_range,
            size_range=self._options.size_range,
            owner_user_ids=self._options.owner_user_ids,
            recent_updater_user_ids=self._options.recent_updater_user_ids,
            ancestor_folder_ids=self._options.ancestor_folder_ids,
            content_types=self._options.content_types,
            limit=self._options.limit,
            offset=self._options.offset,
        )

        box_files = get_box_files_details(
            box_client=self._box_client, file_ids=[file.id for file in box_files]
        )

        docs: List[Document] = []

        for file in box_files:
            doc = box_file_to_llama_document(file)
            docs.append(doc)

        return docs
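
box_search #

box_search(query: str) -> List[Document]

Searches for Box resources based on the provided query and the configured search options.

This method uses the Box API search functionality to find resources matching the provided query and the search options specified in the BoxSearchOptions object. It returns a list of Document objects containing information about the found resources.

Parameters

Name Type Description Default
query str

The search query to use for searching Box resources.

required

Returns

Type Description
List[Document]

List[Document]: A list of Document objects representing the found Box resources.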
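
`box_search` combines the per-call `query` with the options fixed when the tool spec was constructed. The sketch below illustrates that merge under stated assumptions: `SearchOptionsSketch` is a hypothetical stand-in for `BoxSearchOptions` that copies a subset of the field names from the `search_files` call in the source, while the field types and defaults are guesses. `search_kwargs` is likewise a hypothetical helper.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SearchOptionsSketch:
    """Hypothetical stand-in for BoxSearchOptions.

    Field names follow the search_files call in the source;
    types and defaults are assumptions.
    """

    scope: Optional[str] = None
    file_extensions: Optional[List[str]] = None
    ancestor_folder_ids: Optional[List[str]] = None
    limit: Optional[int] = None
    offset: Optional[int] = None


def search_kwargs(query: str, options: SearchOptionsSketch) -> Dict[str, Any]:
    """Merge the per-call query with the options fixed at construction time."""
    return {"query": query, **asdict(options)}


kwargs = search_kwargs(
    "invoice",
    SearchOptionsSketch(file_extensions=["pdf"], limit=5),
)
```

Since the options live on the instance, an agent calling the tool supplies only `query`; searching with different filters means constructing another `BoxSearchToolSpec` with different options.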

Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/search/base.py
def box_search(
    self,
    query: str,
) -> List[Document]:
    """
    Searches for Box resources based on the provided query and configured search options.

    This method utilizes the Box API search functionality to find resources matching the provided
    query and search options specified in the `BoxSearchOptions` object. It returns a list of
    `Document` objects containing information about the found resources.

    Args:
        query (str): The search query to use for searching Box resources.

    Returns:
        List[Document]: A list of `Document` objects representing the found Box resources.

    """
    box_check_connection(self._box_client)

    box_files = search_files(
        box_client=self._box_client,
        query=query,
        scope=self._options.scope,
        file_extensions=self._options.file_extensions,
        created_at_range=self._options.created_at_range,
        updated_at_range=self._options.updated_at_range,
        size_range=self._options.size_range,
        owner_user_ids=self._options.owner_user_ids,
        recent_updater_user_ids=self._options.recent_updater_user_ids,
        ancestor_folder_ids=self._options.ancestor_folder_ids,
        content_types=self._options.content_types,
        limit=self._options.limit,
        offset=self._options.offset,
    )

    box_files = get_box_files_details(
        box_client=self._box_client, file_ids=[file.id for file in box_files]
    )

    docs: List[Document] = []

    for file in box_files:
        doc = box_file_to_llama_document(file)
        docs.append(doc)

    return docs

BoxTextExtractToolSpec #

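Bases: BaseToolSpec

Box Text Extraction Tool Specification.

This class provides a specification for extracting text content from Box files and creating Document objects. It uses the Box API to retrieve the text representation (if available) of the specified Box files.

Attributes

Name Type Description
_box_client BoxClient

An instance of the Box client for interacting with the Box API.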

Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/text_extract/base.py
class BoxTextExtractToolSpec(BaseToolSpec):
    """
    Box Text Extraction Tool Specification.

    This class provides a specification for extracting text content from Box files
    and creating Document objects. It leverages the Box API to retrieve the
    text representation (if available) of specified Box files.

    Attributes:
        _box_client (BoxClient): An instance of the Box client for interacting
            with the Box API.

    """

    spec_functions = ["extract"]
    _box_client: BoxClient

    def __init__(self, box_client: BoxClient) -> None:
        """
        Initializes the Box Text Extraction Tool Specification with the
        provided Box client instance.

        Args:
            box_client (BoxClient): The Box client instance.

        """
        self._box_client = add_extra_header_to_box_client(box_client)

    def extract(
        self,
        file_id: str,
    ) -> Document:
        """
        Extracts text content from Box files and creates Document objects.

        This method utilizes the Box API to retrieve the text representation
        (if available) of the specified Box files. It then creates Document
        objects containing the extracted text and file metadata.

        Args:
            file_id (str): The ID of the Box file
                to extract text from.

        Returns:
            Document: A Document object containing the extracted
                text content and file metadata.

        """
        # Connect to Box
        box_check_connection(self._box_client)

        # get payload information
        box_file = get_box_files_details(
            box_client=self._box_client, file_ids=[file_id]
        )[0]

        box_file = get_text_representation(
            box_client=self._box_client,
            box_files=[box_file],
        )[0]

        doc = box_file_to_llama_document(box_file)
        doc.text = box_file.text_representation if box_file.text_representation else ""
        return doc

extract #

extract(file_id: str) -> Document

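Extracts text content from a Box file and creates a Document object.

This method uses the Box API to retrieve the text representation (if available) of the specified Box file. It then creates a Document object containing the extracted text and file metadata.

Parameters

Name Type Description Default
file_id str

The ID of the Box file to extract text from.

required

Returns

Type Description
Document

Document: A Document object containing the extracted text content and file metadata.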
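
When no text representation is available, the source shown for `extract` sets `doc.text` to an empty string rather than raising. A caller that feeds the results into an index may want to drop those empty documents. The sketch below is one possible approach, not part of the library: `TextExtractor`, `extract_non_empty`, and `StubExtractor` are hypothetical names, and the stub replaces a real `BoxTextExtractToolSpec` so the example runs without a Box connection.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Protocol


class TextExtractor(Protocol):
    """Anything exposing extract(file_id), as the tool spec does."""

    def extract(self, file_id: str) -> Any: ...


def extract_non_empty(tool: TextExtractor, file_ids: Iterable[str]) -> List[Any]:
    """Extract each file and keep only documents that actually carry text."""
    docs: List[Any] = []
    for file_id in file_ids:
        doc = tool.extract(file_id=file_id)
        if doc.text:  # an empty string means no text representation came back
            docs.append(doc)
    return docs


class StubExtractor:
    """Hypothetical stand-in returning canned text instead of calling Box."""

    _texts = {"111": "hello from Box", "222": ""}

    def extract(self, file_id: str) -> Any:
        return SimpleNamespace(text=self._texts[file_id], metadata={"file_id": file_id})


docs = extract_non_empty(StubExtractor(), ["111", "222"])
```

Filtering on the caller's side keeps the tool's behavior unchanged while avoiding empty documents downstream.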

Source code in llama-index-integrations/tools/llama-index-tools-box/llama_index/tools/box/text_extract/base.py
def extract(
    self,
    file_id: str,
) -> Document:
    """
    Extracts text content from Box files and creates Document objects.

    This method utilizes the Box API to retrieve the text representation
    (if available) of the specified Box files. It then creates Document
    objects containing the extracted text and file metadata.

    Args:
        file_id (str): The ID of the Box file
            to extract text from.

    Returns:
        Document: A Document object containing the extracted
            text content and file metadata.

    """
    # Connect to Box
    box_check_connection(self._box_client)

    # get payload information
    box_file = get_box_files_details(
        box_client=self._box_client, file_ids=[file_id]
    )[0]

    box_file = get_text_representation(
        box_client=self._box_client,
        box_files=[box_file],
    )[0]

    doc = box_file_to_llama_document(box_file)
    doc.text = box_file.text_representation if box_file.text_representation else ""
    return doc