Steamship

SteamshipFileReader

Bases: BaseReader

Reads persistent Steamship Files and converts them to Documents.

Parameters

Name	Type	Description	Default
`api_key`	`Optional[str]`	Steamship API key. If not provided, defaults to the value of the STEAMSHIP_API_KEY environment variable.	`None`

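Note

Requires the `steamship` package and an active Steamship API key. To get a Steamship API key, visit: https://steamship.com/account/api. Once you have an API key, expose it through an environment variable named STEAMSHIP_API_KEY or pass it as an init argument (`api_key`).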
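The lookup order described in the note can be sketched as follows. `resolve_api_key` is a hypothetical helper written for illustration and is not part of llama-index or steamship; the key values are placeholders, not real credentials.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical helper: an explicit key wins, else fall back to the environment."""
    if api_key is not None:
        return api_key
    # Returns None when STEAMSHIP_API_KEY is not set.
    return env.get("STEAMSHIP_API_KEY")


# A plain dict stands in for the process environment so the example has no side effects.
fake_env = {"STEAMSHIP_API_KEY": "key-from-env"}

key_from_env = resolve_api_key(env=fake_env)
key_from_arg = resolve_api_key("key-from-arg", env=fake_env)
key_missing = resolve_api_key(env={})
```

Passing the key explicitly may be convenient in tests or when several keys are in play; the environment variable keeps credentials out of source code.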

Source code in llama-index-integrations/readers/llama-index-readers-steamship/llama_index/readers/steamship/base.py

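from typing import List, Optional

from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class SteamshipFileReader(BaseReader):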
    """
    Reads persistent Steamship Files and converts them to Documents.

    Args:
        api_key: Steamship API key. Defaults to STEAMSHIP_API_KEY value if not provided.

    Note:
        Requires install of `steamship` package and an active Steamship API Key.
        To get a Steamship API Key, visit: https://steamship.com/account/api.
        Once you have an API Key, expose it via an environment variable named
        `STEAMSHIP_API_KEY` or pass it as an init argument (`api_key`).

    """

    def __init__(self, api_key: Optional[str] = None) -> None:
        """Initialize the Reader."""
        try:
            import steamship  # noqa

            self.api_key = api_key
        except ImportError:
            raise ImportError(
                "`steamship` must be installed to use the SteamshipFileReader.\n"
                "Please run `pip install --upgrade steamship`."
            )

    def load_data(
        self,
        workspace: str,
        query: Optional[str] = None,
        file_handles: Optional[List[str]] = None,
        collapse_blocks: bool = True,
        join_str: str = "\n\n",
    ) -> List[Document]:
        """
        Load data from persistent Steamship Files into Documents.

        Args:
            workspace: the handle for a Steamship workspace
                (see: https://docs.steamship.com/workspaces/index.html)
            query: a Steamship tag query for retrieving files
                (ex: 'filetag and value("import-id")="import-001"')
            file_handles: a list of Steamship File handles
                (ex: `smooth-valley-9kbdr`)
            collapse_blocks: whether to merge individual File Blocks into a
                single Document, or separate them.
            join_str: when collapse_blocks is True, this is how the block texts
                will be concatenated.

        Note:
            The collection of Files from both `query` and `file_handles` will be
            combined. There is no (current) support for deconflicting the collections
            (meaning that if a file appears both in the result set of the query and
            as a handle in file_handles, it will be loaded twice).

        """
        from steamship import File, Steamship

        client = Steamship(workspace=workspace, api_key=self.api_key)
        files = []
        if query:
            files_from_query = File.query(client=client, tag_filter_query=query).files
            files.extend(files_from_query)

        if file_handles:
            files.extend([File.get(client=client, handle=h) for h in file_handles])

        docs = []
        for file in files:
            metadata = {"source": file.handle}

            for tag in file.tags:
                metadata[tag.kind] = tag.value

            if collapse_blocks:
                text = join_str.join([b.text for b in file.blocks])
                docs.append(Document(text=text, id_=file.handle, metadata=metadata))
            else:
                docs.extend(
                    [
                        Document(text=b.text, id_=file.handle, metadata=metadata)
                        for b in file.blocks
                    ]
                )

        return docs
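
As the loop in `load_data` shows, each Document's metadata starts with the file handle under `source` and then gains one entry per tag, keyed by `tag.kind`. The stand-alone sketch below reproduces that mapping; `Tag` is a hypothetical stand-in dataclass (not the Steamship `Tag` class), `build_metadata` is a hypothetical helper, and the tag values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Tag:
    """Hypothetical stand-in exposing only the two attributes the reader uses."""

    kind: str
    value: Optional[Any] = None


def build_metadata(handle: str, tags: List[Tag]) -> Dict[str, Any]:
    """Mirror the reader's loop: one metadata entry per tag kind."""
    metadata: Dict[str, Any] = {"source": handle}
    for tag in tags:
        # A later tag with the same kind replaces the earlier value.
        metadata[tag.kind] = tag.value
    return metadata


metadata = build_metadata(
    "smooth-valley-9kbdr",
    [Tag("import-id", "import-001"), Tag("lang", "en"), Tag("lang", "fr")],
)
```

One consequence of keying by `kind` is that tags sharing a kind overwrite each other, so only the last value survives, and a tag whose kind is `source` would replace the file handle.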

load_data

load_data(workspace: str, query: Optional[str] = None, file_handles: Optional[List[str]] = None, collapse_blocks: bool = True, join_str: str = '\n\n') -> List[Document]

Load data from persistent Steamship Files into Documents.
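
A minimal usage sketch, assuming a reader object that exposes the `load_data` signature above. `load_import` and `FakeReader` are hypothetical names introduced for illustration; `FakeReader` only records its arguments so the snippet runs without the `steamship` package or network access, and `my-workspace` is a placeholder handle.

```python
from typing import Any, Dict, List


def load_import(reader: Any, workspace: str, import_id: str) -> List[Any]:
    """Hypothetical helper: load every file tagged with the given import-id."""
    # Same query shape as the example given for the `query` parameter.
    query = f'filetag and value("import-id")="{import_id}"'
    return reader.load_data(workspace=workspace, query=query)


class FakeReader:
    """Stand-in for a real reader; records calls instead of contacting Steamship."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def load_data(self, **kwargs: Any) -> List[Any]:
        self.calls.append(kwargs)
        return []


fake = FakeReader()
docs = load_import(fake, workspace="my-workspace", import_id="import-001")
```

With the real package installed and an API key configured, a `SteamshipFileReader` instance would be passed in place of `FakeReader()`.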

Parameters

Name	Type	Description	Default
`workspace`	`str`	The handle for a Steamship workspace (see: https://docs.steamship.com/workspaces/index.html)	required
`query`	`Optional[str]`	A Steamship tag query for retrieving files (e.g. 'filetag and value("import-id")="import-001"')	`None`
`file_handles`	`Optional[List[str]]`	A list of Steamship File handles (e.g. `smooth-valley-9kbdr`)	`None`
`collapse_blocks`	`bool`	Whether to merge a File's individual Blocks into a single Document or keep them as separate Documents.	`True`
`join_str`	`str`	The separator used to concatenate block texts when collapse_blocks is True.	`'\n\n'`
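
The effect of `collapse_blocks` and `join_str` can be sketched on plain strings. `texts_for_file` is a hypothetical helper that mirrors the branching in `load_data` but works on a list of block texts rather than Steamship `Block` objects.

```python
from typing import List


def texts_for_file(
    block_texts: List[str],
    collapse_blocks: bool = True,
    join_str: str = "\n\n",
) -> List[str]:
    """Return the text of each Document that one file would produce."""
    if collapse_blocks:
        # One Document per file: all block texts joined with join_str.
        return [join_str.join(block_texts)]
    # One Document per block: join_str is not used.
    return list(block_texts)


blocks = ["First block.", "Second block."]

collapsed = texts_for_file(blocks)
separate = texts_for_file(blocks, collapse_blocks=False)
custom = texts_for_file(blocks, join_str=" | ")
```

In the separated case, the reader's source assigns `id_=file.handle` to every Document from the same file, so block-level Documents share an ID.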

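Note

The Files returned by `query` and those named in `file_handles` are combined. Deconflicting the two collections is not currently supported: if a file appears both in the query's result set and as a handle in file_handles, it is loaded twice.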
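
Because duplicates are not removed, one possible caller-side workaround is to call `load_data` twice (once with only `query`, once with only `file_handles`) and drop handle-based results whose file already came back from the query. The sketch below assumes each Document carries its file handle under `metadata["source"]`, as the reader's source does; `Doc` and `merge_without_duplicates` are hypothetical names used for illustration, with `Doc` standing in for the real Document class.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document with text and metadata."""

    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def merge_without_duplicates(from_query: List[Doc], from_handles: List[Doc]) -> List[Doc]:
    """Keep all query results, then add handle results for files not yet seen."""
    seen = {doc.metadata["source"] for doc in from_query}
    extra = [doc for doc in from_handles if doc.metadata["source"] not in seen]
    return from_query + extra


from_query = [Doc("alpha", {"source": "file-a"}), Doc("beta", {"source": "file-b"})]
from_handles = [Doc("beta", {"source": "file-b"}), Doc("gamma", {"source": "file-c"})]

merged = merge_without_duplicates(from_query, from_handles)
merged_sources = [doc.metadata["source"] for doc in merged]
```

Filtering by file rather than by individual Document keeps every block of a file together when `collapse_blocks` is False.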
