Multimodal RAG Pipeline with Guardrails Provided by LLM-Guard¶
This guide walks through a robust multimodal Retrieval-Augmented Generation (RAG) pipeline that integrates LLM-Guard guardrails to produce safe, reliable, and contextually accurate responses. The pipeline handles multimodal inputs such as text, tables, images, and charts, while using guardrails to monitor and validate both inputs and outputs. For more detail, see my README.md: https://github.com/vntuananhbui/MultimodalRAG-LlamaIndex-Guardrail/blob/main/README.md
Note:¶
The pipeline uses the Gemini 1.5 Flash model for inference via a free API, making it accessible and cost-effective for development and experimentation.
Extension:¶
You can also use other guardrail frameworks, such as Guardrails AI.
Pipeline Overview¶
The multimodal RAG pipeline is designed to overcome the limitations of traditional text-only RAG systems by natively handling diverse document layouts and modalities. It uses both text and image embeddings to retrieve and synthesize context-aware answers.
Key Features:¶
Multimodal input handling:
- Processes text, images, and complex layouts directly.
- Converts document content into robust embeddings for retrieval.
Guardrail integration:
- Adds input/output scanners to enforce safety and quality.
- Dynamically validates queries and responses for risks such as toxicity or token overflow.
Custom query engine:
- Designed to integrate guardrails into query processing.
- Dynamically blocks, sanitizes, or validates inputs and outputs based on scanner results.
Cost-effective implementation:
- Uses Gemini 1.5 Flash via a free API, minimizing cost while maintaining strong performance.
Why Add Guardrails to Multimodal RAG?¶
Although multimodal RAG pipelines are powerful, they are vulnerable to risks such as inappropriate inputs, hallucinated outputs, or exceeding token limits. Guardrails act as safeguards that ensure:
- Safety: prevent harmful or offensive queries and outputs.
- Reliability: validate the integrity of responses.
- Scalability: enable the pipeline to handle complex scenarios dynamically.
Architecture Overview¶
1. Input Scanners¶
Input scanners validate incoming queries before they are processed. For example:
- Toxicity scanner: detects and blocks harmful language.
- Token limit scanner: ensures queries do not exceed processing limits.
2. Custom Query Engine¶
The query engine integrates retrieval and synthesis while applying guardrails at several stages:
- Pre-processing: validate the input query with the scanners.
- Processing: retrieve relevant nodes using multimodal embeddings.
- Post-processing: sanitize and validate the output.
3. Multimodal LLM¶
The pipeline uses a multimodal LLM (e.g., Gemini 1.5 Flash) capable of understanding and generating context-aware text and image-grounded outputs. Its free API access makes it suitable for development without incurring significant cost.
Guardrail Workflow¶
Input Validation¶
- Scan incoming queries with the predefined scanners.
- Block or sanitize queries based on the scanner results.
Retrieval¶
- Fetch the relevant text and image nodes.
- Convert the content into embeddings for synthesis.
Output Validation¶
- Analyze the generated response with the output scanners.
- Block or sanitize the output based on thresholds (e.g., toxicity). A conceptual sketch of the full flow follows.
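Putting the three stages together, the end-to-end flow looks roughly like the sketch below. This is a conceptual sketch only: `retriever` and `synthesize` are placeholders for the retrieval and multimodal LLM calls, and the scanner lists follow the `"activated"` result format defined later in this guide.
def guarded_query(query_str, retriever, synthesize, input_scanners, output_scanners):
    # 1. Input validation: block the query if any input scanner triggers
    if any(scanner(query_str)["activated"] for scanner in input_scanners):
        return "I'm sorry, but I can't help with that."
    # 2. Retrieval: fetch relevant nodes and build a context string
    nodes = retriever.retrieve(query_str)
    context_str = "\n\n".join(n.get_content() for n in nodes)
    # 3. Synthesis, then output validation on the generated response
    response = str(synthesize(query_str, context_str, nodes))
    if any(scanner(response)["activated"] for scanner in output_scanners):
        return "I'm sorry, but I can't help with that."
    return response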
Benefits of Multimodal RAG with Guardrails¶
- Improved safety: queries and responses are validated to mitigate risk.
- Greater robustness: multimodal inputs retain their context throughout processing.
- Dynamic control: guardrails provide the flexibility to handle diverse inputs and outputs.
- Cost-effectiveness: applying input/output validation selectively optimizes resources, while the free Gemini 1.5 Flash API keeps operating costs low.
This pipeline demonstrates how to build a natively multimodal RAG system, strengthened with guardrails, that delivers safe, reliable, and high-quality results in complex document settings while remaining cost-effective through the use of a free API.
Setup¶
import nest_asyncio
nest_asyncio.apply()
Set Up Observability¶
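This step is optional. As a minimal sketch, you can turn on LlamaIndex's simple global handler to print LLM calls and query events to stdout; swap in your preferred observability integration if you use one.
from llama_index.core import set_global_handler

# Print basic traces of LLM and query events to stdout (optional).
set_global_handler("simple")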
Load Data¶
Here we load the ConocoPhillips 2023 investor meeting presentation.
!mkdir data
!mkdir data_images
!wget "https://static.conocophillips.com/files/2023-conocophillips-aim-presentation.pdf" -O data/conocophillips.pdf
mkdir: data: File exists mkdir: data_images: File exists zsh:1: command not found: wget
Install Dependencies¶
!pip install llama-index
!pip install llama-parse
!pip install llama-index-llms-langchain
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-gemini
!pip install llama-index-multi-modal-llms-gemini
!pip install litellm
!pip install llm-guard
Model Setup¶
Set up the models that will be used for downstream orchestration.
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
import os
LlamaCloud_API_KEY = ""
MultiGeminiKey = ""
GOOGLE_API_KEY = ""
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
os.environ["GEMINI_API_KEY"] = GOOGLE_API_KEY
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
gemini_multimodal = GeminiMultiModal(
model_name="models/gemini-1.5-flash", api_key=MultiGeminiKey
)
api_key = GOOGLE_API_KEY
llamaAPI_KEY = LlamaCloud_API_KEY
llm = Gemini(model="models/gemini-1.5-flash", api_key=api_key)
Settings.llm = llm
Parse Text and Images with LlamaParse¶
In this example, we use LlamaParse to parse both the text and the images in the document.
We parse the text in two ways:
- in regular `text` mode, using the default text layout algorithm
- in `markdown` mode with GPT-4o (`gpt4o_mode=True`), which also lets us capture page screenshots
from llama_parse import LlamaParse
parser_text = LlamaParse(result_type="text", api_key=llamaAPI_KEY)
parser_gpt4o = LlamaParse(
result_type="markdown", gpt4o_mode=True, api_key=llamaAPI_KEY
)
print(f"Parsing text...")
docs_text = parser_text.load_data("data/conocophillips.pdf")
print(f"Parsing PDF file...")
md_json_objs = parser_gpt4o.get_json_result("data/conocophillips.pdf")
md_json_list = md_json_objs[0]["pages"]
Parsing text... Started parsing the file under job_id e79a470b-e8d3-4f55-a048-d1b3d81b6d1e Parsing PDF file... Started parsing the file under job_id 84943607-b630-45bd-bf89-8470840e73b5
print(md_json_list[10]["md"])
# Commitment to Disciplined Reinvestment Rate

| Period | Description | Reinvestment Rate | WTI Average |
|--------------|------------------------------------|-------------------|-------------|
| 2012-2016 | Industry Growth Focus | >100% | ~$75/BBL |
| 2017-2022 | ConocoPhillips Strategy Reset | <60% | ~$63/BBL |
| 2023E | | | at $80/BBL |
| 2024-2028 | Disciplined Reinvestment Rate | ~50% | at $60/BBL |
| 2029-2032 | | ~6% CFO CAGR | at $60/BBL |

- **Historic Reinvestment Rate**: Shown in gray.
- **Reinvestment Rate at $60/BBL WTI**: Shown in blue.
- **Reinvestment Rate at $80/BBL WTI**: Shown with dashed lines.

**Note**: Reinvestment rate and cash from operations (CFO) are non-GAAP measures. Definitions and reconciliations are included in the Appendix.
image_dicts = parser_gpt4o.get_images(
md_json_objs, download_path="data_images"
)
Build the Multimodal Index¶
In this section we build the multimodal index over the parsed document set.
We do this by creating text nodes from the document that carry metadata referencing the original image paths.
In this example, we index the text nodes for retrieval. Each text node references both the parsed text and the page screenshot.
Get Text Nodes¶
from llama_index.core.schema import TextNode
from typing import Optional
# get pages loaded through llamaparse
import re
from pathlib import Path
def get_page_number(file_name):
match = re.search(r"-page-(\d+)\.jpg$", str(file_name))
if match:
return int(match.group(1))
return 0
def _get_sorted_image_files(image_dir):
"""Get image files sorted by page."""
raw_files = [f for f in list(Path(image_dir).iterdir()) if f.is_file()]
sorted_files = sorted(raw_files, key=get_page_number)
return sorted_files
# Assuming TextNode class is defined somewhere else in your code
# Attach image metadata to the text nodes
def get_text_nodes(docs, image_dir=None, json_dicts=None):
"""Split docs into nodes, by separator."""
nodes = []
# Get image files (if provided)
image_files = (
_get_sorted_image_files(image_dir) if image_dir is not None else None
)
# Get markdown texts (if provided)
md_texts = (
[d["md"] for d in json_dicts] if json_dicts is not None else None
)
# Split docs into chunks by separator
doc_chunks = [c for d in docs for c in d.text.split("---")]
# Handle both single-page and multi-page cases
for idx, doc_chunk in enumerate(doc_chunks):
chunk_metadata = {"page_num": idx + 1}
# Check if there are image files and handle the single-page case
if image_files is not None:
# Use the first image file if there's only one
image_file = (
image_files[idx] if idx < len(image_files) else image_files[0]
)
chunk_metadata["image_path"] = str(image_file)
# Check if there are markdown texts and handle the single-page case
if md_texts is not None:
# Use the first markdown text if there's only one
parsed_text_md = (
md_texts[idx] if idx < len(md_texts) else md_texts[0]
)
chunk_metadata["parsed_text_markdown"] = parsed_text_md
# Add the chunk text as metadata
chunk_metadata["parsed_text"] = doc_chunk
# Create the TextNode with the parsed text and metadata
node = TextNode(
text="",
metadata=chunk_metadata,
)
nodes.append(node)
return nodes
from pathlib import Path
# this will split into pages
text_nodes = get_text_nodes(
docs_text,
image_dir="/Users/macintosh/TA-DOCUMENT/StudyZone/ComputerScience/Artificial Intelligence/Llama_index/llama_index/docs/docs/examples/rag_guardrail/data_images",
json_dicts=md_json_list,
)
print(text_nodes[0].get_content(metadata_mode="all"))
page_num: 1 image_path: /Users/macintosh/TA-DOCUMENT/StudyZone/ComputerScience/Artificial Intelligence/Llama_index/llama_index/docs/docs/examples/rag_guardrail/data_images/84943607-b630-45bd-bf89-8470840e73b5-page_51.jpg parsed_text_markdown: NO_CONTENT_HERE parsed_text: ConocoPhillips 2023 Analyst & Investor Meeting
Build the Index¶
Once the text nodes are ready, we feed them into our vector store index abstraction, which will index them into a simple in-memory vector store (though you should definitely check out our 40+ vector store integrations!).
import os
from llama_index.core import (
StorageContext,
VectorStoreIndex,
load_index_from_storage,
)
if not os.path.exists("storage_nodes"):
index = VectorStoreIndex(text_nodes, embed_model=embed_model)
# save index to disk
index.set_index_id("vector_index")
index.storage_context.persist("./storage_nodes")
else:
# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="storage_nodes")
# load index
index = load_index_from_storage(storage_context, index_id="vector_index")
retriever = index.as_retriever()
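Before adding guardrails, you can sanity-check retrieval on its own. The query below is illustrative; the metadata keys (`page_num`, `image_path`) are the ones attached by `get_text_nodes` above.
# Quick retrieval sanity check (illustrative query).
retrieved = retriever.retrieve("What is ConocoPhillips' reinvestment rate plan?")
for n in retrieved:
    print(n.score, n.metadata["page_num"], n.metadata["image_path"])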
Build the Guardrails¶
Define a global output format shared by all guardrail checks; an illustrative call follows the definition below.
def result_response(
guardrail_type,
activated,
guard_output,
is_valid,
risk_score,
threshold,
response_text,
):
"""
Standardizes the result format for all guardrail checks.
"""
return {
"guardrail_type": guardrail_type,
"activated": activated,
"guardrail_detail": {
"guard_output": guard_output,
"is_valid": is_valid,
"risk_score/threshold": f"{risk_score}/{threshold}",
"response_text": response_text,
},
}
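For reference, a dry-run call shows the shape that every scanner below returns; the values here are purely illustrative.
# Illustrative dry run of the shared result format.
example = result_response(
    guardrail_type="Toxicity",
    activated=False,
    guard_output="What is the reinvestment rate?",
    is_valid=True,
    risk_score=0.0,
    threshold=0.5,
    response_text="(LLM response text would go here)",
)
print(example)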
Add Guardrail Scanners¶
You can browse the available scanners at https://llm-guard.com. Below are two example scanners that follow the result format above. Here we use Gemini 1.5 Flash to generate the scanner's response text, but you can swap in an LLM of your choice, such as ChatGPT.
from llm_guard.input_scanners import Toxicity
from llm_guard.input_scanners.toxicity import MatchType
from litellm import completion
def guardrail_toxicLanguage(prompt):
# Interact with the LLM to generate a response
print(f"Prompt: {prompt}")
# Generate the response using the LLM (Gemini-1.5-flash)
response = completion(
model="gemini/gemini-1.5-flash",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt},
],
)
response_text = response.choices[0].message.content
# Define the threshold and scan for toxicity
threshold = 0.5
toxic_scanner = Toxicity(threshold=threshold, match_type=MatchType.FULL)
sanitized_output, is_valid, risk_score = toxic_scanner.scan(prompt)
return result_response(
guardrail_type="Toxicity",
activated=not is_valid,
guard_output=sanitized_output,
is_valid=is_valid,
risk_score=risk_score,
threshold=threshold,
response_text=response_text,
)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
from llm_guard.input_scanners import TokenLimit
from llm_guard import scan_output
from litellm import completion
def guardrail_tokenlimit(prompt):
threshold = 400
response = completion(
model="gemini/gemini-1.5-flash",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt},
],
)
response_text = response.choices[0].message.content
scanner = TokenLimit(limit=threshold, encoding_name="cl100k_base")
sanitized_output, is_valid, risk_score = scanner.scan(prompt)
# Use the global rail to format the result
result = result_response(
guardrail_type="Token limit",
activated=not is_valid,
guard_output=sanitized_output,
is_valid=is_valid,
risk_score=risk_score,
threshold=threshold,
response_text=response_text,
)
return result
`InputScanner` - `OutputScanner` Functions¶
The `InputScanner` function runs a list of scanners over a given input query and evaluates whether any of them detect a threat. It returns a boolean indicating whether a threat was detected, along with the list of results from the scanners that reported a positive detection.
Parameters:¶
- `query` (str): the input to scan for potential threats.
- `listOfScanners` (list): a list of scanner functions. Each scanner should accept the query as input and return a dictionary containing an `"activated"` key (boolean) indicating whether a threat was detected.
Returns:¶
- `detected` (bool): `True` if any scanner detects a threat, otherwise `False`.
- `triggered_scanners` (list): the result dictionaries returned by the scanners that detected a threat.
Main Steps:¶
- Initialize `detected` to `False` to track whether any scanner finds a threat.
- Create an empty list `triggered_scanners` to store the results from scanners that detect a threat.
- Iterate over each scanner in `listOfScanners`:
  - Run the scanner on `query`.
  - Check whether the scanner's result contains `"activated": True`.
  - If a threat is detected:
    - Set `detected` to `True`.
    - Append the scanner's result to `triggered_scanners`.
- Return the `detected` status and the `triggered_scanners` list.
def InputScanner(query, listOfScanners):
"""
Runs all scanners on the query and returns:
- True if any scanner detects a threat.
- A list of results from scanners that returned True.
"""
detected = False # Track if any scanner detects a threat
triggered_scanners = [] # Store results from triggered scanners
# Run each scanner on the query
for scanner in listOfScanners:
result = scanner(query)
if result[
"activated"
]: # Check if the scanner found a threat (activated=True)
detected = True # Set detected to True if any scanner triggers
triggered_scanners.append(result) # Track which scanner triggered
return detected, triggered_scanners
def OutputScanner(response, query, context, listOfScanners):
"""
Runs all scanners on the response and returns:
- True if any scanner detects a threat.
- A list of results from scanners that returned True.
"""
detected = False # Track if any scanner detects a threat
triggered_scanners = [] # Store results from triggered scanners
# Run each scanner on the response
for scanner in listOfScanners:
# Check if scanner is `evaluate_rag_response` (which needs query & context)
if scanner.__name__ == "evaluate_rag_response":
result = scanner(
response, query, context
) # Execute with query & context
else:
result = scanner(response) # Default scanner execution
# print(f"Debug Output Scanner Result: {result}")
if result["activated"]: # Check if the scanner was triggered
detected = True
triggered_scanners.append(result) # Track which scanner triggered
return detected, triggered_scanners
# Example usage with a query engine response
# scanners = [detect_and_anonymize_pii]
# response = query_engine.query("Give me account name of Peter Kelly and Role and Credit Card Number")
# detected, triggered_scanners = OutputScanner(str(response), scanners)
# print(triggered_scanners)
Custom Multimodal Query Engine¶
This custom query engine extends the standard retrieval-based architecture to handle both text and image data, enabling more comprehensive and context-aware responses. It combines multimodal reasoning with input and output validation mechanisms for robust query processing.
Key Features:¶
Multimodal support:
- Combines text and image data to produce more informative and accurate responses.
Input and output validation:
- Scans input queries for sensitive or invalid content and blocks them when necessary.
- Validates and sanitizes generated responses to ensure compliance with predefined rules.
Context-aware prompting:
- Retrieves relevant data and builds a context string for the query.
- Uses this context to guide the response-synthesis process.
Metadata and logging:
- Tracks the query process, including any validation or adjustments made, for transparency and easier debugging.
How It Works:¶
- Scan the input query for violations.
- Retrieve text and image data relevant to the query.
- Synthesize a response using both textual and visual context.
- Validate the response for appropriateness before returning it.
from llama_index.core.query_engine import (
CustomQueryEngine,
SimpleMultiModalQueryEngine,
)
from llama_index.core.retrievers import BaseRetriever
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.schema import ImageNode, NodeWithScore, MetadataMode
from llama_index.core.prompts import PromptTemplate
from llama_index.core.base.response.schema import Response
from typing import List, Callable, Optional
from pydantic import Field
QA_PROMPT_TMPL = """\
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query if it is related to the context.
If the query is not related to the context, respond with:
"I'm sorry, but I can't help with that."
Query: {query_str}
Answer: """
QA_PROMPT = PromptTemplate(QA_PROMPT_TMPL)
class MultimodalQueryEngine(CustomQueryEngine):
"""Custom multimodal Query Engine.
Takes in a retriever to retrieve a set of document nodes.
Also takes in a prompt template and multimodal model.
"""
qa_prompt: PromptTemplate
retriever: BaseRetriever
multi_modal_llm: GeminiMultiModal
input_scanners: List[Callable[[str], dict]] = Field(default_factory=list)
output_scanners: List[Callable[[str], dict]] = Field(default_factory=list)
def __init__(
self, qa_prompt: Optional[PromptTemplate] = None, **kwargs
) -> None:
"""Initialize."""
super().__init__(qa_prompt=qa_prompt or QA_PROMPT, **kwargs)
def custom_query(self, query_str: str):
query_metadata = {
"input_scanners": [],
"output_scanners": [],
"retrieved_nodes": [],
"response_status": "success",
}
input_detected, input_triggered = InputScanner(
query_str, self.input_scanners
)
if input_triggered:
# print("Triggered Input Scanners:", input_triggered)
# Log triggered input scanners in metadata
query_metadata["input_scanners"] = input_triggered
# If input contains sensitive information, block the query
if input_detected:
return Response(
response="I'm sorry, but I can't help with that.",
source_nodes=[],
metadata={
"guardrail": "Input Scanner",
"triggered_scanners": input_triggered,
"response_status": "blocked",
},
)
# retrieve text nodes
nodes = self.retriever.retrieve(query_str)
# create ImageNode items from text nodes
image_nodes = [
NodeWithScore(node=ImageNode(image_path=n.metadata["image_path"]))
for n in nodes
]
# create context string from text nodes, dump into the prompt
context_str = "\n\n".join(
[r.get_content(metadata_mode=MetadataMode.LLM) for r in nodes]
)
fmt_prompt = self.qa_prompt.format(
context_str=context_str, query_str=query_str
)
# synthesize an answer from formatted text and images
llm_response = self.multi_modal_llm.complete(
prompt=fmt_prompt,
image_documents=[image_node.node for image_node in image_nodes],
)
# Step 5: Run Output Scanners
output_detected, output_triggered = OutputScanner(
str(llm_response),
str(query_str),
str(context_str),
self.output_scanners,
)
if output_triggered:
# print("Triggered Output Scanners:", output_triggered)
query_metadata[
"output_scanners"
] = output_triggered # Store output scanner info
final_response = str(llm_response)
if output_detected:
final_response = "I'm sorry, but I can't help with that."
query_metadata["response_status"] = "sanitized"
# Return the response with detailed metadata
return Response(
response=final_response,
source_nodes=nodes,
metadata=query_metadata,
)
Input and Output Scanner Configuration¶
Add whichever scanners you need to protect your RAG pipeline; a sketch of adding a further custom scanner appears after the engine setup below.
input_scanners = [guardrail_toxicLanguage, guardrail_tokenlimit]
output_scanners = [guardrail_toxicLanguage]
query_engine = MultimodalQueryEngine(
retriever=index.as_retriever(similarity_top_k=9),
multi_modal_llm=gemini_multimodal,
input_scanners=input_scanners,
output_scanners=output_scanners,
)
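If you need to guard against other risks, additional scanners can be added in the same format. As a sketch, here is a hypothetical prompt-injection scanner built on llm-guard's `PromptInjection` input scanner and wrapped to return the shared result format (constructor arguments may vary across llm-guard versions).
from llm_guard.input_scanners import PromptInjection

def guardrail_prompt_injection(prompt):
    # Hypothetical extra scanner: flags likely prompt-injection attempts.
    threshold = 0.5
    scanner = PromptInjection(threshold=threshold)
    sanitized_output, is_valid, risk_score = scanner.scan(prompt)
    return result_response(
        guardrail_type="Prompt injection",
        activated=not is_valid,
        guard_output=sanitized_output,
        is_valid=is_valid,
        risk_score=risk_score,
        threshold=threshold,
        response_text=prompt,  # this check does not need an LLM call
    )

# It could then be appended to the existing lists, e.g.:
# input_scanners.append(guardrail_prompt_injection)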
Try Queries¶
Let's try some queries.
query = "Tell me about the diverse geographies where Conoco Phillips has a production base"
response = query_engine.query(query)
Prompt: Tell me about the diverse geographies where Conoco Phillips has a production base 2024-12-03 17:43:08 [debug ] Initialized classification model device=device(type='mps') model=Model(path='unitary/unbiased-toxic-roberta', subfolder='', revision='36295dd80b422dc49f40052021430dae76241adc', onnx_path='ProtectAI/unbiased-toxic-roberta-onnx', onnx_revision='34480fa958f6657ad835c345808475755b6974a7', onnx_subfolder='', onnx_filename='model.onnx', kwargs={}, pipeline_kwargs={'batch_size': 1, 'device': device(type='mps'), 'padding': 'max_length', 'top_k': None, 'function_to_apply': 'sigmoid', 'return_token_type_ids': False, 'max_length': 512, 'truncation': True}, tokenizer_kwargs={}) 2024-12-03 17:43:09 [debug ] Not toxicity found in the text results=[[{'label': 'toxicity', 'score': 0.00041448281263001263}, {'label': 'male', 'score': 0.00018738119979389012}, {'label': 'insult', 'score': 0.00011956175148952752}, {'label': 'female', 'score': 0.00011725842341547832}, {'label': 'psychiatric_or_mental_illness', 'score': 8.512590284226462e-05}, {'label': 'white', 'score': 7.451862620655447e-05}, {'label': 'christian', 'score': 5.6545581173850223e-05}, {'label': 'muslim', 'score': 5.644273551297374e-05}, {'label': 'black', 'score': 3.8606172893196344e-05}, {'label': 'obscene', 'score': 3.222753730369732e-05}, {'label': 'identity_attack', 'score': 3.1757666874909773e-05}, {'label': 'threat', 'score': 2.8462023692554794e-05}, {'label': 'jewish', 'score': 2.7872381906490773e-05}, {'label': 'homosexual_gay_or_lesbian', 'score': 2.5694836949696764e-05}, {'label': 'sexual_explicit', 'score': 1.859129588410724e-05}, {'label': 'severe_toxicity', 'score': 1.0931341876130318e-06}]] 2024-12-03 17:43:15 [debug ] Prompt fits the maximum tokens num_tokens=15 threshold=400 Prompt: ConocoPhillips has a diverse production base across several geographic locations. These include: * **Alaska:** The company has a significant presence in Alaska's conventional basins, including the Prudhoe Bay area, with a long history of production and existing infrastructure. The Willow project is also located in Alaska. * **Lower 48 (United States):** ConocoPhillips operates extensively in the Lower 48 states, focusing on unconventional plays in the Permian Basin (Delaware and Midland Basins), Eagle Ford, and Bakken. * **International:** The company has operations in other international locations, including Qatar (LNG), and previously had operations in the UK, Australia, Indonesia, and Canada (though some of these have been divested). They also have a global marketing presence with offices in London, Singapore, Houston, Calgary, Beijing, and Tokyo. 
2024-12-03 17:43:36 [debug ] Initialized classification model device=device(type='mps') model=Model(path='unitary/unbiased-toxic-roberta', subfolder='', revision='36295dd80b422dc49f40052021430dae76241adc', onnx_path='ProtectAI/unbiased-toxic-roberta-onnx', onnx_revision='34480fa958f6657ad835c345808475755b6974a7', onnx_subfolder='', onnx_filename='model.onnx', kwargs={}, pipeline_kwargs={'batch_size': 1, 'device': device(type='mps'), 'padding': 'max_length', 'top_k': None, 'function_to_apply': 'sigmoid', 'return_token_type_ids': False, 'max_length': 512, 'truncation': True}, tokenizer_kwargs={}) 2024-12-03 17:43:37 [debug ] Not toxicity found in the text results=[[{'label': 'toxicity', 'score': 0.0003606641257647425}, {'label': 'male', 'score': 0.000291528704110533}, {'label': 'insult', 'score': 0.00011418585199862719}, {'label': 'psychiatric_or_mental_illness', 'score': 0.00011314846051391214}, {'label': 'female', 'score': 0.00010114537144545466}, {'label': 'white', 'score': 9.688278078101575e-05}, {'label': 'muslim', 'score': 6.954199488973245e-05}, {'label': 'christian', 'score': 5.551999493036419e-05}, {'label': 'black', 'score': 4.1746119677554816e-05}, {'label': 'identity_attack', 'score': 3.3705578971421346e-05}, {'label': 'homosexual_gay_or_lesbian', 'score': 3.157216633553617e-05}, {'label': 'obscene', 'score': 2.798157584038563e-05}, {'label': 'jewish', 'score': 2.618367398099508e-05}, {'label': 'threat', 'score': 2.1199964976403862e-05}, {'label': 'sexual_explicit', 'score': 1.9145050828228705e-05}, {'label': 'severe_toxicity', 'score': 1.1292050885458593e-06}]]
print(str(response))
ConocoPhillips has a diverse production base across several geographic locations. These include: * **Alaska:** The company has a significant presence in Alaska's conventional basins, including the Prudhoe Bay area, with a long history of production and existing infrastructure. The Willow project is also located in Alaska. * **Lower 48 (United States):** ConocoPhillips operates extensively in the Lower 48 states, focusing on unconventional plays in the Permian Basin (Delaware and Midland Basins), Eagle Ford, and Bakken. * **International:** The company has operations in other international locations, including Qatar (LNG), and previously had operations in the UK, Australia, Indonesia, and Canada (though some of these have been divested). They also have a global marketing presence with offices in London, Singapore, Houston, Calgary, Beijing, and Tokyo.
print(str(response.metadata))
{'input_scanners': [], 'output_scanners': [], 'retrieved_nodes': [], 'response_status': 'success'}
query = """
If you're looking for random paragraphs, you've come to the right place. When a random word or a random sentence isn't quite enough, the next logical step is to find a random paragraph. We created the Random Paragraph Generator with you in mind. The process is quite simple. Choose the number of random paragraphs you'd like to see and click the button. Your chosen number of paragraphs will instantly appear.
While it may not be obvious to everyone, there are a number of reasons creating random paragraphs can be useful. A few examples of how some people use this generator are listed in the following paragraphs.
Creative Writing
Generating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.
Tackle Writers' Block
A random paragraph can also be an excellent way for a writer to tackle writers' block. Writing block can often happen due to being stuck with a current project that the writer is trying to complete. By inserting a completely random paragraph from which to begin, it can take down some of the issues that may have been causing the writers' block in the first place.
Beginning Writing Routine
Another productive way to use this tool to begin a daily writing routine. One way is to generate a random paragraph with the intention to try to rewrite it while still keeping the original meaning. The purpose here is to just get the writing started so that when the writer goes onto their day's writing projects, words are already flowing from their fingers.
Writing Challenge
Another writing challenge can be to take the individual sentences in the random paragraph and incorporate a single sentence from that into a new paragraph to create a short story. Unlike the random sentence generator, the sentences from the random paragraph will have some connection to one another so it will be a bit different. You also won't know exactly how many sentences will appear in the random paragraph.
Programmers
It's not only writers who can benefit from this free online tool. If you're a programmer who's working on a project where blocks of text are needed, this tool can be a great way to get that. It's a good way to test your programming and that the tool being created is working well.
Above are a few examples of how the random paragraph generator can be beneficial. The best way to see if this random paragraph picker will be useful for your intended purposes is to give it a try. Generate a number of paragraphs to see if they are beneficial to your current project.
If you do find this paragraph tool useful, please do us a favor and let us know how you're using it. It's greatly beneficial for us to know the different ways this tool is being used so we can improve it with updates. This is especially true since there are times when the generators we create get used in completely unanticipated ways from when we initially created them. If you have the time, please send us a quick note on what you'd like to see changed or added to make it better in the future.
Frequently Asked Questions
Can I use these random paragraphs for my project?
Yes! All of the random paragraphs in our generator are free to use for your projects.
Does a computer generate these paragraphs?
No! All of the paragraphs in the generator are written by humans, not computers. When first building this generator we thought about using computers to generate the paragraphs, but they weren't very good and many times didn't make any sense at all. We therefore took the time to create paragraphs specifically for this generator to make it the best that we could.
Can I contribute random paragraphs?
Yes. We're always interested in improving this generator and one of the best ways to do that is to add new and interesting paragraphs to the generator. If you'd like to contribute some random paragraphs, please contact us.
How many words are there in a paragraph?
There are usually about 200 words in a paragraph, but this can vary widely. Most paragraphs focus on a single idea that's expressed with an introductory sentence, then followed by two or more supporting sentences about the idea. A short paragraph may not reach even 50 words while long paragraphs can be over 400 words long, but generally speaking they tend to be approximately 200 words in length.
"""
response = query_engine.query(query)
Prompt: If you're looking for random paragraphs, you've come to the right place. When a random word or a random sentence isn't quite enough, the next logical step is to find a random paragraph. We created the Random Paragraph Generator with you in mind. The process is quite simple. Choose the number of random paragraphs you'd like to see and click the button. Your chosen number of paragraphs will instantly appear. While it may not be obvious to everyone, there are a number of reasons creating random paragraphs can be useful. A few examples of how some people use this generator are listed in the following paragraphs. Creative Writing Generating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing. Tackle Writers' Block A random paragraph can also be an excellent way for a writer to tackle writers' block. Writing block can often happen due to being stuck with a current project that the writer is trying to complete. By inserting a completely random paragraph from which to begin, it can take down some of the issues that may have been causing the writers' block in the first place. Beginning Writing Routine Another productive way to use this tool to begin a daily writing routine. One way is to generate a random paragraph with the intention to try to rewrite it while still keeping the original meaning. The purpose here is to just get the writing started so that when the writer goes onto their day's writing projects, words are already flowing from their fingers. Writing Challenge Another writing challenge can be to take the individual sentences in the random paragraph and incorporate a single sentence from that into a new paragraph to create a short story. Unlike the random sentence generator, the sentences from the random paragraph will have some connection to one another so it will be a bit different. You also won't know exactly how many sentences will appear in the random paragraph. Programmers It's not only writers who can benefit from this free online tool. If you're a programmer who's working on a project where blocks of text are needed, this tool can be a great way to get that. It's a good way to test your programming and that the tool being created is working well. Above are a few examples of how the random paragraph generator can be beneficial. The best way to see if this random paragraph picker will be useful for your intended purposes is to give it a try. Generate a number of paragraphs to see if they are beneficial to your current project. If you do find this paragraph tool useful, please do us a favor and let us know how you're using it. It's greatly beneficial for us to know the different ways this tool is being used so we can improve it with updates. This is especially true since there are times when the generators we create get used in completely unanticipated ways from when we initially created them. 
If you have the time, please send us a quick note on what you'd like to see changed or added to make it better in the future. Frequently Asked Questions Can I use these random paragraphs for my project? Yes! All of the random paragraphs in our generator are free to use for your projects. Does a computer generate these paragraphs? No! All of the paragraphs in the generator are written by humans, not computers. When first building this generator we thought about using computers to generate the paragraphs, but they weren't very good and many times didn't make any sense at all. We therefore took the time to create paragraphs specifically for this generator to make it the best that we could. Can I contribute random paragraphs? Yes. We're always interested in improving this generator and one of the best ways to do that is to add new and interesting paragraphs to the generator. If you'd like to contribute some random paragraphs, please contact us. How many words are there in a paragraph? There are usually about 200 words in a paragraph, but this can vary widely. Most paragraphs focus on a single idea that's expressed with an introductory sentence, then followed by two or more supporting sentences about the idea. A short paragraph may not reach even 50 words while long paragraphs can be over 400 words long, but generally speaking they tend to be approximately 200 words in length. 2024-12-03 17:43:42 [debug ] Initialized classification model device=device(type='mps') model=Model(path='unitary/unbiased-toxic-roberta', subfolder='', revision='36295dd80b422dc49f40052021430dae76241adc', onnx_path='ProtectAI/unbiased-toxic-roberta-onnx', onnx_revision='34480fa958f6657ad835c345808475755b6974a7', onnx_subfolder='', onnx_filename='model.onnx', kwargs={}, pipeline_kwargs={'batch_size': 1, 'device': device(type='mps'), 'padding': 'max_length', 'top_k': None, 'function_to_apply': 'sigmoid', 'return_token_type_ids': False, 'max_length': 512, 'truncation': True}, tokenizer_kwargs={}) 2024-12-03 17:43:42 [debug ] Not toxicity found in the text results=[[{'label': 'toxicity', 'score': 0.0011976422974839807}, {'label': 'insult', 'score': 0.00045695927110500634}, {'label': 'male', 'score': 0.00018701529188547283}, {'label': 'psychiatric_or_mental_illness', 'score': 0.00014795312017668039}, {'label': 'white', 'score': 9.39662495511584e-05}, {'label': 'female', 'score': 7.459904009010643e-05}, {'label': 'obscene', 'score': 6.114380084909499e-05}, {'label': 'threat', 'score': 5.259696990833618e-05}, {'label': 'muslim', 'score': 4.745226033264771e-05}, {'label': 'identity_attack', 'score': 3.541662226780318e-05}, {'label': 'black', 'score': 3.5083121474599466e-05}, {'label': 'christian', 'score': 3.272023604949936e-05}, {'label': 'sexual_explicit', 'score': 3.164245936204679e-05}, {'label': 'jewish', 'score': 1.5377421732409857e-05}, {'label': 'homosexual_gay_or_lesbian', 'score': 1.5361225450760685e-05}, {'label': 'severe_toxicity', 'score': 1.3027844261159771e-06}]] 2024-12-03 17:43:46 [warning ] Prompt is too big. Splitting into chunks chunks=["\n If you're looking for random paragraphs, you've come to the right place. When a random word or a random sentence isn't quite enough, the next logical step is to find a random paragraph. We created the Random Paragraph Generator with you in mind. The process is quite simple. Choose the number of random paragraphs you'd like to see and click the button. 
Your chosen number of paragraphs will instantly appear.\n\nWhile it may not be obvious to everyone, there are a number of reasons creating random paragraphs can be useful. A few examples of how some people use this generator are listed in the following paragraphs.\n\nCreative Writing\nGenerating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.\n\nTackle Writers' Block\nA random paragraph can also be an excellent way for a writer to tackle writers' block. Writing block can often happen due to being stuck with a current project that the writer is trying to complete. By inserting a completely random paragraph from which to begin, it can take down some of the issues that may have been causing the writers' block in the first place.\n\nBeginning Writing Routine\nAnother productive way to use this tool to begin a daily writing routine. One way is to generate a random paragraph with the intention to try to rewrite it while still keeping the original meaning. The purpose here is to just get the writing started so that when the writer goes onto their day's writing", " projects, words are already flowing from their fingers.\n\nWriting Challenge\nAnother writing challenge can be to take the individual sentences in the random paragraph and incorporate a single sentence from that into a new paragraph to create a short story. Unlike the random sentence generator, the sentences from the random paragraph will have some connection to one another so it will be a bit different. You also won't know exactly how many sentences will appear in the random paragraph.\n\nProgrammers\nIt's not only writers who can benefit from this free online tool. If you're a programmer who's working on a project where blocks of text are needed, this tool can be a great way to get that. It's a good way to test your programming and that the tool being created is working well.\n\nAbove are a few examples of how the random paragraph generator can be beneficial. The best way to see if this random paragraph picker will be useful for your intended purposes is to give it a try. Generate a number of paragraphs to see if they are beneficial to your current project.\n\nIf you do find this paragraph tool useful, please do us a favor and let us know how you're using it. It's greatly beneficial for us to know the different ways this tool is being used so we can improve it with updates. This is especially true since there are times when the generators we create get used in completely unanticipated ways from when we initially created them. If you have the time, please send us a quick note on what you'd like to see changed or added to make it better in the future.\n\nFrequently Asked Questions\n\nCan I use these random paragraphs for my project?\n\nYes! All of the random paragraphs in our generator are free to use for your projects.\n\nDoes a computer generate these paragraphs?\n\nNo! 
All of the paragraphs in the generator are written by humans, not computers. When first building this generator we thought about using computers to generate the paragraphs, but they weren't very good and many times didn't make any sense at all", ". We therefore took the time to create paragraphs specifically for this generator to make it the best that we could.\n\nCan I contribute random paragraphs?\n\nYes. We're always interested in improving this generator and one of the best ways to do that is to add new and interesting paragraphs to the generator. If you'd like to contribute some random paragraphs, please contact us.\n\nHow many words are there in a paragraph?\n\nThere are usually about 200 words in a paragraph, but this can vary widely. Most paragraphs focus on a single idea that's expressed with an introductory sentence, then followed by two or more supporting sentences about the idea. A short paragraph may not reach even 50 words while long paragraphs can be over 400 words long, but generally speaking they tend to be approximately 200 words in length.\n "] num_tokens=961
print(str(response))
I'm sorry, but I can't help with that.
print(str(response.metadata))
{'guardrail': 'Input Scanner', 'triggered_scanners': [{'guardrail_type': 'Token limit', 'activated': True, 'guardrail_detail': {'guard_output': "\n If you're looking for random paragraphs, you've come to the right place. When a random word or a random sentence isn't quite enough, the next logical step is to find a random paragraph. We created the Random Paragraph Generator with you in mind. The process is quite simple. Choose the number of random paragraphs you'd like to see and click the button. Your chosen number of paragraphs will instantly appear.\n\nWhile it may not be obvious to everyone, there are a number of reasons creating random paragraphs can be useful. A few examples of how some people use this generator are listed in the following paragraphs.\n\nCreative Writing\nGenerating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.\n\nTackle Writers' Block\nA random paragraph can also be an excellent way for a writer to tackle writers' block. Writing block can often happen due to being stuck with a current project that the writer is trying to complete. By inserting a completely random paragraph from which to begin, it can take down some of the issues that may have been causing the writers' block in the first place.\n\nBeginning Writing Routine\nAnother productive way to use this tool to begin a daily writing routine. One way is to generate a random paragraph with the intention to try to rewrite it while still keeping the original meaning. The purpose here is to just get the writing started so that when the writer goes onto their day's writing", 'is_valid': False, 'risk_score/threshold': '1.0/400', 'response_text': "This text describes a random paragraph generator and its various uses. Here's a summary broken down by section:\n\n**Introduction:** The text introduces a random paragraph generator, highlighting its simplicity and ease of use.\n\n**Uses of the Generator:** The core of the text details how the generator can be beneficial in several contexts:\n\n* **Creative Writing:** It aids writers in overcoming writer's block, sparking creativity, and providing starting points or endings for short stories. 
Three specific challenges are suggested: using the paragraph as the beginning, middle, or end of a story.\n\n* **Tackling Writer's Block:** The random paragraph acts as a disruption to overcome creative stagnation.\n\n* **Beginning a Writing Routine:** It helps initiate the writing process by providing a text to rewrite or use as inspiration.\n\n* **Writing Challenges:** The paragraph's sentences can be individually incorporated into new writing projects.\n\n* **Programmers:** The generator provides useful blocks of text for testing purposes in software development.\n\n\n**Call to Action & Feedback:** The authors encourage users to try the generator, provide feedback, and contribute to its improvement by suggesting additions or changes.\n\n**Frequently Asked Questions (FAQ):** The FAQ section addresses common questions about the generator, clarifying:\n\n* **Usage rights:** The paragraphs are free to use.\n* **Paragraph generation:** Human-written paragraphs are used, not computer-generated ones.\n* **Contribution:** Users can contribute their own paragraphs.\n* **Paragraph length:** Paragraphs are roughly 200 words but can vary significantly.\n\n\nIn essence, the text is a well-structured promotional piece for a random paragraph generator, emphasizing its versatility and usefulness for both writers and programmers, while encouraging user engagement and participation.\n"}}], 'response_status': 'blocked'}