JSON 查询引擎¶

JSON 查询引擎适用于查询符合 JSON schema 的 JSON 文档。

然后，此 JSON schema 在提示的上下文中使用，将自然语言查询转换为结构化的 JSON Path 查询。然后使用此 JSON Path 查询来检索数据，以回答给定的问题。

如果您在 colab 上打开此 Notebook，您可能需要安装 LlamaIndex 🦙。

In [ ]

已复制！

%pip install llama-index-llms-openai
%pip install llama-index-llms-openai

In [ ]

已复制！

!pip install llama-index
!pip install llama-index

In [ ]

已复制！

# First, install the jsonpath-ng package which is used by default to parse & execute the JSONPath queries.
!pip install jsonpath-ng
# 首先，安装 jsonpath-ng 包，该包默认用于解析和执行 JSONPath 查询。 !pip install jsonpath-ng

Requirement already satisfied: jsonpath-ng in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (1.5.3)
Requirement already satisfied: ply in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from jsonpath-ng) (3.11)
Requirement already satisfied: six in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from jsonpath-ng) (1.16.0)
Requirement already satisfied: decorator in /Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages (from jsonpath-ng) (5.1.1)
WARNING: You are using pip version 21.2.4; however, version 23.2.1 is available.
You should consider upgrading via the '/Users/loganmarkewich/llama_index/llama-index/bin/python3 -m pip install --upgrade pip' command.

In [ ]

已复制！

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
import logging import sys logging.basicConfig(stream=sys.stdout, level=logging.INFO) logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [ ]

已复制！

import os
import openai

os.environ["OPENAI_API_KEY"] = "YOUR_KEY_HERE"
import os import openai os.environ["OPENAI_API_KEY"] = "YOUR_KEY_HERE"

In [ ]

已复制！

from IPython.display import Markdown, display
from IPython.display import Markdown, display

让我们从一个玩具 JSON 开始¶

一个非常简单的 JSON 对象，包含来自博客网站的用户评论数据。

我们还将提供一个 JSON schema（通过给 ChatGPT 提供 JSON 示例生成）。

建议¶

请确保为 JSON schema 中的每个字段提供了有用的 "description" 值。

如您在给定的示例中所见，"username" 字段的描述提到了用户名是小写的。您会发现这有助于 LLM 生成正确的 JSON path 查询。

In [ ]

已复制！





# Test on some sample data
json_value = {
    "blogPosts": [
        {
            "id": 1,
            "title": "First blog post",
            "content": "This is my first blog post",
        },
        {
            "id": 2,
            "title": "Second blog post",
            "content": "This is my second blog post",
        },
    ],
    "comments": [
        {
            "id": 1,
            "content": "Nice post!",
            "username": "jerry",
            "blogPostId": 1,
        },
        {
            "id": 2,
            "content": "Interesting thoughts",
            "username": "simon",
            "blogPostId": 2,
        },
        {
            "id": 3,
            "content": "Loved reading this!",
            "username": "simon",
            "blogPostId": 2,
        },
    ],
}

# JSON Schema object that the above JSON value conforms to
json_schema = {
    "$schema": "https://json-schema.fullstack.org.cn/draft-07/schema#",
    "description": "Schema for a very simple blog post app",
    "type": "object",
    "properties": {
        "blogPosts": {
            "description": "List of blog posts",
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {
                        "description": "Unique identifier for the blog post",
                        "type": "integer",
                    },
                    "title": {
                        "description": "Title of the blog post",
                        "type": "string",
                    },
                    "content": {
                        "description": "Content of the blog post",
                        "type": "string",
                    },
                },
                "required": ["id", "title", "content"],
            },
        },
        "comments": {
            "description": "List of comments on blog posts",
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {
                        "description": "Unique identifier for the comment",
                        "type": "integer",
                    },
                    "content": {
                        "description": "Content of the comment",
                        "type": "string",
                    },
                    "username": {
                        "description": (
                            "Username of the commenter (lowercased)"
                        ),
                        "type": "string",
                    },
                    "blogPostId": {
                        "description": (
                            "Identifier for the blog post to which the comment"
                            " belongs"
                        ),
                        "type": "integer",
                    },
                },
                "required": ["id", "content", "username", "blogPostId"],
            },
        },
    },
    "required": ["blogPosts", "comments"],
}
# 测试一些示例数据 json_value = { "blogPosts": [ { "id": 1, "title": "First blog post", "content": "This is my first blog post", }, { "id": 2, "title": "Second blog post", "content": "This is my second blog post", }, ], "comments": [ { "id": 1, "content": "Nice post!", "username": "jerry", "blogPostId": 1, }, { "id": 2, "content": "Interesting thoughts", "username": "simon", "blogPostId": 2, }, { "id": 3, "content": "Loved reading this!", "username": "simon", "blogPostId": 2, }, ], } # 上述 JSON 值所符合的 JSON Schema 对象 json_schema = { "$schema": "https://json-schema.fullstack.org.cn/draft-07/schema#", "description": "Schema for a very simple blog post app", "type": "object", "properties": { "blogPosts": { "description": "List of blog posts", "type": "array", "items": { "type": "object", "properties": { "id": { "description": "Unique identifier for the blog post", "type": "integer", }, "title": { "description": "Title of the blog post", "type": "string", }, "content": { "description": "Content of the blog post", "type": "string", }, }, "required": ["id", "title", "content"], }, }, "comments": { "description": "List of comments on blog posts", "type": "array", "items": { "type": "object", "properties": { "id": { "description": "Unique identifier for the comment", "type": "integer", }, "content": { "description": "Content of the comment", "type": "string", }, "username": { "description": ( "评论者的用户名（已小写）" ), "type": "string", }, "blogPostId": { "description": ( "评论所属博客文章的标识符" ), "type": "integer", }, }, "required": ["id", "content", "username", "blogPostId"], }, }, }, "required": ["blogPosts", "comments"], }

In [ ]

已复制！





from llama_index.llms.openai import OpenAI
from llama_index.core.indices.struct_store import JSONQueryEngine

llm = OpenAI(model="gpt-4")

nl_query_engine = JSONQueryEngine(
    json_value=json_value,
    json_schema=json_schema,
    llm=llm,
)
raw_query_engine = JSONQueryEngine(
    json_value=json_value,
    json_schema=json_schema,
    llm=llm,
    synthesize_response=False,
)
from llama_index.llms.openai import OpenAI from llama_index.core.indices.struct_store import JSONQueryEngine llm = OpenAI(model="gpt-4") nl_query_engine = JSONQueryEngine( json_value=json_value, json_schema=json_schema, llm=llm, ) raw_query_engine = JSONQueryEngine( json_value=json_value, json_schema=json_schema, llm=llm, synthesize_response=False, )

In [ ]

已复制！





nl_response = nl_query_engine.query(
    "What comments has Jerry been writing?",
)
raw_response = raw_query_engine.query(
    "What comments has Jerry been writing?",
)
nl_response = nl_query_engine.query( "What comments has Jerry been writing?", # Jerry 写了哪些评论？ ) raw_response = raw_query_engine.query( "What comments has Jerry been writing?", # Jerry 写了哪些评论？ )

In [ ]

已复制！





display(
    Markdown(f"<h1>Natural language Response</h1><br><b>{nl_response}</b>")
)
display(Markdown(f"<h1>Raw JSON Response</h1><br><b>{raw_response}</b>"))
display( Markdown(f"自然语言响应
{nl_response}") ) display(Markdown(f"原始 JSON 响应
{raw_response}"))

自然语言响应

Jerry 写了评论 "Nice post!"。

原始 JSON 响应

["Nice post!"]

In [ ]

已复制！

# get the json path query string. Same would apply to raw_response
print(nl_response.metadata["json_path_response_str"])
# 获取 json path 查询字符串。raw_response 也同样适用 print(nl_response.metadata["json_path_response_str"])

$.comments[?(@.username=='jerry')].content