使用 JSONalyze，在底层会创建一个内存中的 SQLite 表，并加载 JSON 列表。查询引擎能够对数据执行 SQL 查询，并将查询结果作为分析问题的答案返回。

这是一个如何使用 JSONalyze 查询引擎的非常简单的示例。

首先，让我们安装 llama-index。

In [ ]

已复制！

%pip install llama-index-llms-openai

%pip install llama-index

%pip install llama-index-llms-openai
# JSONalyze Query Engine rely on sqlite-utils %pip install sqlite-utils

%pip install llama-index-llms-openai

%pip install llama-index

%pip install llama-index
import logging import sys logging.basicConfig(stream=sys.stdout, level=logging.INFO) logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

%pip install llama-index-llms-openai

%pip install llama-index

# JSONalyze Query Engine rely on sqlite-utils
%pip install sqlite-utils
import os import openai os.environ["OPENAI_API_KEY"] = "YOUR_KEY_HERE" openai.api_key = os.environ["OPENAI_API_KEY"]

%pip install llama-index-llms-openai

%pip install llama-index

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from IPython.display import Markdown, display

%pip install llama-index-llms-openai

%pip install llama-index

import os
import openai

os.environ["OPENAI_API_KEY"] = "YOUR_KEY_HERE"
openai.api_key = os.environ["OPENAI_API_KEY"]
假设我们有一个 JSON 列表（已加载为 Dict 列表），如下所示：

%pip install llama-index-llms-openai

%pip install llama-index

from IPython.display import Markdown, display
json_list = [ { "name": "John Doe", "age": 25, "major": "Computer Science", "email": "[email protected]", "address": "123 Main St", "city": "New York", "state": "NY", "country": "USA", "phone": "+1 123-456-7890", "occupation": "Software Engineer", }, { "name": "Jane Smith", "age": 30, "major": "Business Administration", "email": "[email protected]", "address": "456 Elm St", "city": "San Francisco", "state": "CA", "country": "USA", "phone": "+1 234-567-8901", "occupation": "Marketing Manager", }, { "name": "Michael Johnson", "age": 35, "major": "Finance", "email": "[email protected]", "address": "789 Oak Ave", "city": "Chicago", "state": "IL", "country": "USA", "phone": "+1 345-678-9012", "occupation": "Financial Analyst", }, { "name": "Emily Davis", "age": 28, "major": "Psychology", "email": "[email protected]", "address": "234 Pine St", "city": "Los Angeles", "state": "CA", "country": "USA", "phone": "+1 456-789-0123", "occupation": "Psychologist", }, { "name": "Alex Johnson", "age": 27, "major": "Engineering", "email": "[email protected]", "address": "567 Cedar Ln", "city": "Seattle", "state": "WA", "country": "USA", "phone": "+1 567-890-1234", "occupation": "Civil Engineer", }, { "name": "Jessica Williams", "age": 32, "major": "Biology", "email": "[email protected]", "address": "890 Walnut Ave", "city": "Boston", "state": "MA", "country": "USA", "phone": "+1 678-901-2345", "occupation": "Biologist", }, { "name": "Matthew Brown", "age": 26, "major": "English Literature", "email": "[email protected]", "address": "123 Peach St", "city": "Atlanta", "state": "GA", "country": "USA", "phone": "+1 789-012-3456", "occupation": "Writer", }, { "name": "Olivia Wilson", "age": 29, "major": "Art", "email": "[email protected]", "address": "456 Plum Ave", "city": "Miami", "state": "FL", "country": "USA", "phone": "+1 890-123-4567", "occupation": "Artist", }, { "name": "Daniel Thompson", "age": 31, "major": "Physics", "email": "[email protected]", "address": "789 Apple St", "city": "Denver", "state": "CO", "country": "USA", "phone": "+1 901-234-5678", "occupation": "Physicist", }, { "name": "Sophia Clark", "age": 27, "major": "Sociology", "email": "[email protected]", "address": "234 Orange Ln", "city": "Austin", "state": "TX", "country": "USA", "phone": "+1 012-345-6789", "occupation": "Social Worker", }, { "name": "Christopher Lee", "age": 33, "major": "Chemistry", "email": "[email protected]", "address": "567 Mango St", "city": "San Diego", "state": "CA", "country": "USA", "phone": "+1 123-456-7890", "occupation": "Chemist", }, { "name": "Ava Green", "age": 28, "major": "History", "email": "[email protected]", "address": "890 Cherry Ave", "city": "Philadelphia", "state": "PA", "country": "USA", "phone": "+1 234-567-8901", "occupation": "Historian", }, { "name": "Ethan Anderson", "age": 30, "major": "Business", "email": "[email protected]", "address": "123 Lemon Ln", "city": "Houston", "state": "TX", "country": "USA", "phone": "+1 345-678-9012", "occupation": "Entrepreneur", }, { "name": "Isabella Carter", "age": 28, "major": "Mathematics", "email": "[email protected]", "address": "456 Grape St", "city": "Phoenix", "state": "AZ", "country": "USA", "phone": "+1 456-789-0123", "occupation": "Mathematician", }, { "name": "Andrew Walker", "age": 32, "major": "Economics", "email": "[email protected]", "address": "789 Berry Ave", "city": "Portland", "state": "OR", "country": "USA", "phone": "+1 567-890-1234", "occupation": "Economist", }, { "name": "Mia Evans", "age": 29, "major": "Political Science", "email": "[email protected]", "address": "234 Lime St", "city": "Washington", "state": "DC", "country": "USA", "phone": "+1 678-901-2345", "occupation": "Political Analyst", }, ]

然后，我们可以创建一个 JSONalyze 查询引擎实例，将 JSON 列表作为输入。

%pip install llama-index-llms-openai

%pip install llama-index





json_list = [
    {
        "name": "John Doe",
        "age": 25,
        "major": "Computer Science",
        "email": "[email protected]",
        "address": "123 Main St",
        "city": "New York",
        "state": "NY",
        "country": "USA",
        "phone": "+1 123-456-7890",
        "occupation": "Software Engineer",
    },
    {
        "name": "Jane Smith",
        "age": 30,
        "major": "Business Administration",
        "email": "[email protected]",
        "address": "456 Elm St",
        "city": "San Francisco",
        "state": "CA",
        "country": "USA",
        "phone": "+1 234-567-8901",
        "occupation": "Marketing Manager",
    },
    {
        "name": "Michael Johnson",
        "age": 35,
        "major": "Finance",
        "email": "[email protected]",
        "address": "789 Oak Ave",
        "city": "Chicago",
        "state": "IL",
        "country": "USA",
        "phone": "+1 345-678-9012",
        "occupation": "Financial Analyst",
    },
    {
        "name": "Emily Davis",
        "age": 28,
        "major": "Psychology",
        "email": "[email protected]",
        "address": "234 Pine St",
        "city": "Los Angeles",
        "state": "CA",
        "country": "USA",
        "phone": "+1 456-789-0123",
        "occupation": "Psychologist",
    },
    {
        "name": "Alex Johnson",
        "age": 27,
        "major": "Engineering",
        "email": "[email protected]",
        "address": "567 Cedar Ln",
        "city": "Seattle",
        "state": "WA",
        "country": "USA",
        "phone": "+1 567-890-1234",
        "occupation": "Civil Engineer",
    },
    {
        "name": "Jessica Williams",
        "age": 32,
        "major": "Biology",
        "email": "[email protected]",
        "address": "890 Walnut Ave",
        "city": "Boston",
        "state": "MA",
        "country": "USA",
        "phone": "+1 678-901-2345",
        "occupation": "Biologist",
    },
    {
        "name": "Matthew Brown",
        "age": 26,
        "major": "English Literature",
        "email": "[email protected]",
        "address": "123 Peach St",
        "city": "Atlanta",
        "state": "GA",
        "country": "USA",
        "phone": "+1 789-012-3456",
        "occupation": "Writer",
    },
    {
        "name": "Olivia Wilson",
        "age": 29,
        "major": "Art",
        "email": "[email protected]",
        "address": "456 Plum Ave",
        "city": "Miami",
        "state": "FL",
        "country": "USA",
        "phone": "+1 890-123-4567",
        "occupation": "Artist",
    },
    {
        "name": "Daniel Thompson",
        "age": 31,
        "major": "Physics",
        "email": "[email protected]",
        "address": "789 Apple St",
        "city": "Denver",
        "state": "CO",
        "country": "USA",
        "phone": "+1 901-234-5678",
        "occupation": "Physicist",
    },
    {
        "name": "Sophia Clark",
        "age": 27,
        "major": "Sociology",
        "email": "[email protected]",
        "address": "234 Orange Ln",
        "city": "Austin",
        "state": "TX",
        "country": "USA",
        "phone": "+1 012-345-6789",
        "occupation": "Social Worker",
    },
    {
        "name": "Christopher Lee",
        "age": 33,
        "major": "Chemistry",
        "email": "[email protected]",
        "address": "567 Mango St",
        "city": "San Diego",
        "state": "CA",
        "country": "USA",
        "phone": "+1 123-456-7890",
        "occupation": "Chemist",
    },
    {
        "name": "Ava Green",
        "age": 28,
        "major": "History",
        "email": "[email protected]",
        "address": "890 Cherry Ave",
        "city": "Philadelphia",
        "state": "PA",
        "country": "USA",
        "phone": "+1 234-567-8901",
        "occupation": "Historian",
    },
    {
        "name": "Ethan Anderson",
        "age": 30,
        "major": "Business",
        "email": "[email protected]",
        "address": "123 Lemon Ln",
        "city": "Houston",
        "state": "TX",
        "country": "USA",
        "phone": "+1 345-678-9012",
        "occupation": "Entrepreneur",
    },
    {
        "name": "Isabella Carter",
        "age": 28,
        "major": "Mathematics",
        "email": "[email protected]",
        "address": "456 Grape St",
        "city": "Phoenix",
        "state": "AZ",
        "country": "USA",
        "phone": "+1 456-789-0123",
        "occupation": "Mathematician",
    },
    {
        "name": "Andrew Walker",
        "age": 32,
        "major": "Economics",
        "email": "[email protected]",
        "address": "789 Berry Ave",
        "city": "Portland",
        "state": "OR",
        "country": "USA",
        "phone": "+1 567-890-1234",
        "occupation": "Economist",
    },
    {
        "name": "Mia Evans",
        "age": 29,
        "major": "Political Science",
        "email": "[email protected]",
        "address": "234 Lime St",
        "city": "Washington",
        "state": "DC",
        "country": "USA",
        "phone": "+1 678-901-2345",
        "occupation": "Political Analyst",
    },
]
from llama_index.llms.openai import OpenAI from llama_index.core.query_engine import JSONalyzeQueryEngine llm = OpenAI(model="gpt-3.5-turbo") json_stats_query_engine = JSONalyzeQueryEngine( list_of_dict=json_list, llm=llm, verbose=True, )

为了演示查询引擎，我们首先创建一个统计问题列表，然后可以使用该查询引擎来回答这些问题。

%pip install llama-index-llms-openai

%pip install llama-index

from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import JSONalyzeQueryEngine

llm = OpenAI(model="gpt-3.5-turbo")

json_stats_query_engine = JSONalyzeQueryEngine(
    list_of_dict=json_list,
    llm=llm,
    verbose=True,
)
questions = [ "What is the average age of the individuals in the dataset?", "What is the maximum age among the individuals?", "What is the minimum age among the individuals?", "How many individuals have a major in Psychology?", "What is the most common major among the individuals?", "What is the percentage of individuals residing in California (CA)?", "How many individuals have an occupation related to science or engineering?", "What is the average length of the email addresses in the dataset?", "How many individuals have a phone number starting with '+1 234'?", "What is the distribution of ages among the individuals?", ]

INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.

假设我们想知道列表中人员的平均年龄，可以使用以下查询：

%pip install llama-index-llms-openai

%pip install llama-index





questions = [
    "What is the average age of the individuals in the dataset?",
    "What is the maximum age among the individuals?",
    "What is the minimum age among the individuals?",
    "How many individuals have a major in Psychology?",
    "What is the most common major among the individuals?",
    "What is the percentage of individuals residing in California (CA)?",
    "How many individuals have an occupation related to science or engineering?",
    "What is the average length of the email addresses in the dataset?",
    "How many individuals have a phone number starting with '+1 234'?",
    "What is the distribution of ages among the individuals?",
]
display( Markdown("> Question: {}".format(questions[0])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[0]))), )

问题：数据集中个体的平均年龄是多少？

%pip install llama-index-llms-openai

%pip install llama-index

display(
    Markdown("> Question: {}".format(questions[0])),
    Markdown("Answer: {}".format(json_stats_query_engine.query(questions[0]))),
)
答案：数据集中个体的平均年龄是 29.375。

Query: What is the average age of the individuals in the dataset?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
SQL Query: SELECT AVG(age) FROM items
Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>}
SQL Response: [{'AVG(age)': 29.375}]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Response: The average age of the individuals in the dataset is 29.375.

我们还可以使用查询引擎回答其他问题

display( Markdown("> Question: {}".format(questions[4])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[4]))), )

问题：个体中最常见的专业是什么？

%pip install llama-index-llms-openai

%pip install llama-index

display(
    Markdown("> Question: {}".format(questions[4])),
    Markdown("Answer: {}".format(json_stats_query_engine.query(questions[4]))),
)
答案：个体中最常见的专业是社会学。

Query: What is the most common major among the individuals?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
SQL Query: SELECT major, COUNT(*) as count
FROM items
GROUP BY major
ORDER BY count DESC
LIMIT 1;
Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>}
SQL Response: [{'major': 'Sociology', 'count': 1}]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Response: The most common major among the individuals is Sociology.

display( Markdown("> Question: {}".format(questions[7])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[7]))), )

问题：数据集中电子邮件地址的平均长度是多少？

%pip install llama-index-llms-openai

%pip install llama-index

display(
    Markdown("> Question: {}".format(questions[7])),
    Markdown("Answer: {}".format(json_stats_query_engine.query(questions[7]))),
)
答案：数据集中电子邮件地址的平均长度是 24.5 个字符。

Query: What is the average length of the email addresses in the dataset?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
SQL Query: SELECT AVG(LENGTH(email)) FROM items
Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>}
SQL Response: [{'AVG(LENGTH(email))': 24.5}]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Response: The average length of the email addresses in the dataset is 24.5 characters.

display( Markdown("> Question: {}".format(questions[5])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[5]))), )

问题：居住在加利福尼亚州 (CA) 的个体百分比是多少？

%pip install llama-index-llms-openai

%pip install llama-index

display(
    Markdown("> Question: {}".format(questions[5])),
    Markdown("Answer: {}".format(json_stats_query_engine.query(questions[5]))),
)
答案：居住在加利福尼亚州 (CA) 的个体百分比是 18.75%。

Query: What is the percentage of individuals residing in California (CA)?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
SQL Query: SELECT (COUNT(*) * 100.0 / (SELECT COUNT(*) FROM items)) AS percentage
FROM items
WHERE state = 'CA'
Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>}
SQL Response: [{'percentage': 18.75}]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Response: The percentage of individuals residing in California (CA) is 18.75%.

display( Markdown("> Question: {}".format(questions[9])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[9]))), )

问题：个体年龄分布是怎样的？

%pip install llama-index-llms-openai

%pip install llama-index

display(
    Markdown("> Question: {}".format(questions[9])),
    Markdown("Answer: {}".format(json_stats_query_engine.query(questions[9]))),
)
答案：个体年龄分布如下：

Query: What is the distribution of ages among the individuals?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
SQL Query: SELECT age, COUNT(*) as count
FROM items
GROUP BY age
Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>}
SQL Response: [{'age': 25, 'count': 1}, {'age': 26, 'count': 1}, {'age': 27, 'count': 2}, {'age': 28, 'count': 3}, {'age': 29, 'count': 2}, {'age': 30, 'count': 2}, {'age': 31, 'count': 1}, {'age': 32, 'count': 2}, {'age': 33, 'count': 1}, {'age': 35, 'count': 1}]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Response: The distribution of ages among the individuals is as follows:
- 1 individual is 25 years old
- 1 individual is 26 years old
- 2 individuals are 27 years old
- 3 individuals are 28 years old
- 2 individuals are 29 years old
- 2 individuals are 30 years old
- 1 individual is 31 years old
- 2 individuals are 32 years old
- 1 individual is 33 years old
- 1 individual is 35 years old

1 个人 25 岁

1 个人 26 岁

2 个人 27 岁
3 个人 28 岁
2 个人 29 岁
2 个人 30 岁
1 个人 31 岁
2 个人 32 岁
1 个人 33 岁
1 个人 35 岁
# e2e test async json_stats_aquery_engine = JSONalyzeQueryEngine( list_of_dict=json_list, llm=llm, verbose=True, use_async=True, )
回到顶部

%pip install llama-index-llms-openai

%pip install llama-index

# e2e test async

json_stats_aquery_engine = JSONalyzeQueryEngine(
    list_of_dict=json_list,
    llm=llm,
    verbose=True,
    use_async=True,
)
  
 上一个  带自定义检索器的检索器查询引擎 - 简单混合搜索 
  下一个  联合问答摘要查询引擎

%pip install llama-index-llms-openai

%pip install llama-index

display(
    Markdown("> Question: {}".format(questions[7])),
    Markdown("Answer: {}".format(json_stats_query_engine.query(questions[7]))),
)
答案：数据集中电子邮件地址的平均长度是 24.5 个字符。

Query: What is the average length of the email addresses in the dataset?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
SQL Query: SELECT AVG(LENGTH(email)) FROM items
Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>}
SQL Response: [{'AVG(LENGTH(email))': 24.5}]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Response: The average length of the email addresses in the dataset is 24.5 characters.

display( Markdown("> Question: {}".format(questions[5])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[5]))), )

问题：居住在加利福尼亚州 (CA) 的个体百分比是多少？