Scrapegraph

ScrapegraphToolSpec

Bases: BaseToolSpec

scrapegraph tool specification for web scraping operations.

Source code in llama-index-integrations/tools/llama-index-tools-scrapegraph/llama_index/tools/scrapegraph/base.py

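from typing import Dict, List, Optional

from llama_index.core.tools.tool_spec.base import BaseToolSpec
from pydantic import BaseModel
from scrapegraph_py import Client


class ScrapegraphToolSpec(BaseToolSpec):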
    """scrapegraph tool specification for web scraping operations."""

    spec_functions = [
        "scrapegraph_smartscraper",
        "scrapegraph_markdownify",
        "scrapegraph_search",
    ]

    def scrapegraph_smartscraper(
        self,
        prompt: str,
        url: str,
        api_key: str,
        schema: Optional[List[BaseModel]] = None,
    ) -> List[Dict]:
        """
        Perform synchronous web scraping using scrapegraph.

        Args:
            prompt (str): User prompt describing the scraping task
            url (str): Target website URL to scrape
            api_key (str): scrapegraph API key
            schema (Optional[List[BaseModel]]): Pydantic models defining the output structure

        Returns:
            List[Dict]: Scraped data matching the provided schema

        """
        client = Client(api_key=api_key)

        # Basic usage
        return client.smartscraper(
            website_url=url, user_prompt=prompt, output_schema=schema
        )

    def scrapegraph_markdownify(self, url: str, api_key: str) -> str:
        """
        Convert webpage content to markdown format using scrapegraph.

        Args:
            url (str): Target website URL to convert
            api_key (str): scrapegraph API key

        Returns:
            str: Markdown representation of the webpage content

        """
        client = Client(api_key=api_key)

        return client.markdownify(website_url=url)

    def scrapegraph_search(self, query: str, api_key: str) -> str:
        """
        Perform a search query using scrapegraph.

        Args:
            query (str): Search query to execute
            api_key (str): scrapegraph API key

        Returns:
            str: Search results from scrapegraph

        """
        client = Client(api_key=api_key)
        return client.search(query=query)
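
As a usage sketch: because the class derives from BaseToolSpec, the entries in `spec_functions` can be turned into agent tools with `to_tool_list()`. The helper names `load_spec` and `list_tool_names` are made up for illustration, and the import is deferred inside `load_spec` so the second helper can be tried without the package installed.

```python
from typing import Any, List


def load_spec() -> Any:
    """Build the real tool spec; imported lazily so the rest runs without the package."""
    from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec

    return ScrapegraphToolSpec()


def list_tool_names(spec: Any) -> List[str]:
    """Return the names of the tools that the spec's to_tool_list() builds."""
    return [tool.metadata.name for tool in spec.to_tool_list()]
```

With the package installed, `list_tool_names(load_spec())` should list one tool per entry in `spec_functions`.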

scrapegraph_smartscraper

scrapegraph_smartscraper(prompt: str, url: str, api_key: str, schema: Optional[List[BaseModel]] = None) -> List[Dict]

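Perform synchronous web scraping using scrapegraph.

Parameters

Name	Type	Description	Default
`prompt`	`str`	User prompt describing the scraping task	required
`url`	`str`	Target website URL to scrape	required
`api_key`	`str`	scrapegraph API key	required
`schema`	`Optional[List[BaseModel]]`	Pydantic models defining the output structure	`None`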

Returns

Type	Description
`List[Dict]`	Scraped data matching the provided schema

Source code in llama-index-integrations/tools/llama-index-tools-scrapegraph/llama_index/tools/scrapegraph/base.py

def scrapegraph_smartscraper(
    self,
    prompt: str,
    url: str,
    api_key: str,
    schema: Optional[List[BaseModel]] = None,
) -> List[Dict]:
    """
    Perform synchronous web scraping using scrapegraph.

    Args:
        prompt (str): User prompt describing the scraping task
        url (str): Target website URL to scrape
        api_key (str): scrapegraph API key
        schema (Optional[List[BaseModel]]): Pydantic models defining the output structure

    Returns:
        List[Dict]: Scraped data matching the provided schema

    """
    client = Client(api_key=api_key)

    # Basic usage
    return client.smartscraper(
        website_url=url, user_prompt=prompt, output_schema=schema
    )
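
The signature above takes the API key on every call. A minimal sketch of one way to supply it is to read it from the environment before calling the method. The helper name `scrape_with_env_key` and the variable name `SGAI_API_KEY` are assumptions of this sketch, not something the tool reads itself, and `spec` is expected to be a `ScrapegraphToolSpec` instance.

```python
import os
from typing import Any


def scrape_with_env_key(
    spec: Any, prompt: str, url: str, env_var: str = "SGAI_API_KEY"
) -> Any:
    """Call scrapegraph_smartscraper with the API key taken from the environment.

    env_var is a name chosen for this sketch (an assumption).
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    # schema is left out, so it falls back to its default of None.
    return spec.scrapegraph_smartscraper(prompt=prompt, url=url, api_key=api_key)
```

Passing the arguments by keyword keeps the call independent of the parameter order (`prompt`, `url`, `api_key`, `schema`).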

scrapegraph_markdownify

scrapegraph_markdownify(url: str, api_key: str) -> str

Convert webpage content to Markdown format using scrapegraph.

Parameters

Name	Type	Description	Default
`url`	`str`	Target website URL to convert	required
`api_key`	`str`	scrapegraph API key	required

Returns

Type	Description
`str`	Markdown representation of the webpage content

Source code in llama-index-integrations/tools/llama-index-tools-scrapegraph/llama_index/tools/scrapegraph/base.py

def scrapegraph_markdownify(self, url: str, api_key: str) -> str:
    """
    Convert webpage content to markdown format using scrapegraph.

    Args:
        url (str): Target website URL to convert
        api_key (str): scrapegraph API key

    Returns:
        str: Markdown representation of the webpage content

    """
    client = Client(api_key=api_key)

    return client.markdownify(website_url=url)
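
Since the method converts one URL per call, handling several pages means looping. The following is a small sketch under that assumption. `markdownify_many` is a hypothetical helper, and `spec` is expected to be a `ScrapegraphToolSpec` instance.

```python
from typing import Any, Dict, Iterable


def markdownify_many(spec: Any, urls: Iterable[str], api_key: str) -> Dict[str, Any]:
    """Convert each URL in turn and key the results by URL."""
    results: Dict[str, Any] = {}
    for url in urls:
        # One scrapegraph_markdownify call (and one Client) per URL.
        results[url] = spec.scrapegraph_markdownify(url=url, api_key=api_key)
    return results
```

The calls run sequentially, and an exception raised for one URL stops the loop, so callers that need partial results would have to add their own error handling.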

scrapegraph_search

scrapegraph_search(query: str, api_key: str) -> str

Perform a search query using scrapegraph.

Parameters

Name	Type	Description	Default
`query`	`str`	Search query to execute	required
`api_key`	`str`	scrapegraph API key	required

Returns

Type	Description
`str`	Search results from scrapegraph

Source code in llama-index-integrations/tools/llama-index-tools-scrapegraph/llama_index/tools/scrapegraph/base.py

def scrapegraph_search(self, query: str, api_key: str) -> str:
    """
    Perform a search query using scrapegraph.

    Args:
        query (str): Search query to execute
        api_key (str): scrapegraph API key

    Returns:
        str: Search results from scrapegraph

    """
    client = Client(api_key=api_key)
    return client.search(query=query)
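
When many queries share one key, the `api_key` argument can be bound once with `functools.partial`. This is only a sketch: `make_search` is a hypothetical helper, and `spec` is expected to be a `ScrapegraphToolSpec` instance.

```python
from functools import partial
from typing import Any, Callable


def make_search(spec: Any, api_key: str) -> Callable[[str], Any]:
    """Return a one-argument search function with api_key already bound."""
    return partial(spec.scrapegraph_search, api_key=api_key)
```

The returned callable takes only the query, for example `search = make_search(spec, key)` followed by `search("llama index tools")`.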