这是一个如何使用 JSONalyze 查询引擎的非常简单的示例。
首先,让我们安装 llama-index。
In [ ]
已复制!
%pip install llama-index-llms-openai
%pip install llama-index
# JSONalyze Query Engine rely on sqlite-utils
%pip install sqlite-utils
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
import os
import openai
os.environ["OPENAI_API_KEY"] = "YOUR_KEY_HERE"
openai.api_key = os.environ["OPENAI_API_KEY"]
from IPython.display import Markdown, display
然后,我们可以创建一个 JSONalyze 查询引擎实例,将 JSON 列表作为输入。
json_list = [
{
"name": "John Doe",
"age": 25,
"major": "Computer Science",
"email": "[email protected]",
"address": "123 Main St",
"city": "New York",
"state": "NY",
"country": "USA",
"phone": "+1 123-456-7890",
"occupation": "Software Engineer",
},
{
"name": "Jane Smith",
"age": 30,
"major": "Business Administration",
"email": "[email protected]",
"address": "456 Elm St",
"city": "San Francisco",
"state": "CA",
"country": "USA",
"phone": "+1 234-567-8901",
"occupation": "Marketing Manager",
},
{
"name": "Michael Johnson",
"age": 35,
"major": "Finance",
"email": "[email protected]",
"address": "789 Oak Ave",
"city": "Chicago",
"state": "IL",
"country": "USA",
"phone": "+1 345-678-9012",
"occupation": "Financial Analyst",
},
{
"name": "Emily Davis",
"age": 28,
"major": "Psychology",
"email": "[email protected]",
"address": "234 Pine St",
"city": "Los Angeles",
"state": "CA",
"country": "USA",
"phone": "+1 456-789-0123",
"occupation": "Psychologist",
},
{
"name": "Alex Johnson",
"age": 27,
"major": "Engineering",
"email": "[email protected]",
"address": "567 Cedar Ln",
"city": "Seattle",
"state": "WA",
"country": "USA",
"phone": "+1 567-890-1234",
"occupation": "Civil Engineer",
},
{
"name": "Jessica Williams",
"age": 32,
"major": "Biology",
"email": "[email protected]",
"address": "890 Walnut Ave",
"city": "Boston",
"state": "MA",
"country": "USA",
"phone": "+1 678-901-2345",
"occupation": "Biologist",
},
{
"name": "Matthew Brown",
"age": 26,
"major": "English Literature",
"email": "[email protected]",
"address": "123 Peach St",
"city": "Atlanta",
"state": "GA",
"country": "USA",
"phone": "+1 789-012-3456",
"occupation": "Writer",
},
{
"name": "Olivia Wilson",
"age": 29,
"major": "Art",
"email": "[email protected]",
"address": "456 Plum Ave",
"city": "Miami",
"state": "FL",
"country": "USA",
"phone": "+1 890-123-4567",
"occupation": "Artist",
},
{
"name": "Daniel Thompson",
"age": 31,
"major": "Physics",
"email": "[email protected]",
"address": "789 Apple St",
"city": "Denver",
"state": "CO",
"country": "USA",
"phone": "+1 901-234-5678",
"occupation": "Physicist",
},
{
"name": "Sophia Clark",
"age": 27,
"major": "Sociology",
"email": "[email protected]",
"address": "234 Orange Ln",
"city": "Austin",
"state": "TX",
"country": "USA",
"phone": "+1 012-345-6789",
"occupation": "Social Worker",
},
{
"name": "Christopher Lee",
"age": 33,
"major": "Chemistry",
"email": "[email protected]",
"address": "567 Mango St",
"city": "San Diego",
"state": "CA",
"country": "USA",
"phone": "+1 123-456-7890",
"occupation": "Chemist",
},
{
"name": "Ava Green",
"age": 28,
"major": "History",
"email": "[email protected]",
"address": "890 Cherry Ave",
"city": "Philadelphia",
"state": "PA",
"country": "USA",
"phone": "+1 234-567-8901",
"occupation": "Historian",
},
{
"name": "Ethan Anderson",
"age": 30,
"major": "Business",
"email": "[email protected]",
"address": "123 Lemon Ln",
"city": "Houston",
"state": "TX",
"country": "USA",
"phone": "+1 345-678-9012",
"occupation": "Entrepreneur",
},
{
"name": "Isabella Carter",
"age": 28,
"major": "Mathematics",
"email": "[email protected]",
"address": "456 Grape St",
"city": "Phoenix",
"state": "AZ",
"country": "USA",
"phone": "+1 456-789-0123",
"occupation": "Mathematician",
},
{
"name": "Andrew Walker",
"age": 32,
"major": "Economics",
"email": "[email protected]",
"address": "789 Berry Ave",
"city": "Portland",
"state": "OR",
"country": "USA",
"phone": "+1 567-890-1234",
"occupation": "Economist",
},
{
"name": "Mia Evans",
"age": 29,
"major": "Political Science",
"email": "[email protected]",
"address": "234 Lime St",
"city": "Washington",
"state": "DC",
"country": "USA",
"phone": "+1 678-901-2345",
"occupation": "Political Analyst",
},
]
为了演示查询引擎,我们首先创建一个统计问题列表,然后可以使用该查询引擎来回答这些问题。
from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import JSONalyzeQueryEngine
llm = OpenAI(model="gpt-3.5-turbo")
json_stats_query_engine = JSONalyzeQueryEngine(
list_of_dict=json_list,
llm=llm,
verbose=True,
)
INFO:numexpr.utils:NumExpr defaulting to 8 threads. NumExpr defaulting to 8 threads.
假设我们想知道列表中人员的平均年龄,可以使用以下查询:
questions = [
"What is the average age of the individuals in the dataset?",
"What is the maximum age among the individuals?",
"What is the minimum age among the individuals?",
"How many individuals have a major in Psychology?",
"What is the most common major among the individuals?",
"What is the percentage of individuals residing in California (CA)?",
"How many individuals have an occupation related to science or engineering?",
"What is the average length of the email addresses in the dataset?",
"How many individuals have a phone number starting with '+1 234'?",
"What is the distribution of ages among the individuals?",
]
问题:数据集中个体的平均年龄是多少?
display(
Markdown("> Question: {}".format(questions[0])),
Markdown("Answer: {}".format(json_stats_query_engine.query(questions[0]))),
)
Query: What is the average age of the individuals in the dataset? INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" SQL Query: SELECT AVG(age) FROM items Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>} SQL Response: [{'AVG(age)': 29.375}] INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" Response: The average age of the individuals in the dataset is 29.375.
我们还可以使用查询引擎回答其他问题
display( Markdown("> Question: {}".format(questions[4])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[4]))), )
问题:个体中最常见的专业是什么?
display(
Markdown("> Question: {}".format(questions[4])),
Markdown("Answer: {}".format(json_stats_query_engine.query(questions[4]))),
)
Query: What is the most common major among the individuals? INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" SQL Query: SELECT major, COUNT(*) as count FROM items GROUP BY major ORDER BY count DESC LIMIT 1; Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>} SQL Response: [{'major': 'Sociology', 'count': 1}] INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" Response: The most common major among the individuals is Sociology.
display( Markdown("> Question: {}".format(questions[7])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[7]))), )
问题:数据集中电子邮件地址的平均长度是多少?
display(
Markdown("> Question: {}".format(questions[7])),
Markdown("Answer: {}".format(json_stats_query_engine.query(questions[7]))),
)
Query: What is the average length of the email addresses in the dataset? INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" SQL Query: SELECT AVG(LENGTH(email)) FROM items Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>} SQL Response: [{'AVG(LENGTH(email))': 24.5}] INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" Response: The average length of the email addresses in the dataset is 24.5 characters.
display( Markdown("> Question: {}".format(questions[5])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[5]))), )
问题:居住在加利福尼亚州 (CA) 的个体百分比是多少?
display(
Markdown("> Question: {}".format(questions[5])),
Markdown("Answer: {}".format(json_stats_query_engine.query(questions[5]))),
)
Query: What is the percentage of individuals residing in California (CA)? INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" SQL Query: SELECT (COUNT(*) * 100.0 / (SELECT COUNT(*) FROM items)) AS percentage FROM items WHERE state = 'CA' Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>} SQL Response: [{'percentage': 18.75}] INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" Response: The percentage of individuals residing in California (CA) is 18.75%.
display( Markdown("> Question: {}".format(questions[9])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[9]))), )
问题:个体年龄分布是怎样的?
display(
Markdown("> Question: {}".format(questions[9])),
Markdown("Answer: {}".format(json_stats_query_engine.query(questions[9]))),
)
Query: What is the distribution of ages among the individuals? INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" SQL Query: SELECT age, COUNT(*) as count FROM items GROUP BY age Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>} SQL Response: [{'age': 25, 'count': 1}, {'age': 26, 'count': 1}, {'age': 27, 'count': 2}, {'age': 28, 'count': 3}, {'age': 29, 'count': 2}, {'age': 30, 'count': 2}, {'age': 31, 'count': 1}, {'age': 32, 'count': 2}, {'age': 33, 'count': 1}, {'age': 35, 'count': 1}] INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" Response: The distribution of ages among the individuals is as follows: - 1 individual is 25 years old - 1 individual is 26 years old - 2 individuals are 27 years old - 3 individuals are 28 years old - 2 individuals are 29 years old - 2 individuals are 30 years old - 1 individual is 31 years old - 2 individuals are 32 years old - 1 individual is 33 years old - 1 individual is 35 years old
1 个人 25 岁
1 个人 26 岁
- 2 个人 27 岁
- 3 个人 28 岁
- 2 个人 29 岁
- 2 个人 30 岁
- 1 个人 31 岁
- 2 个人 32 岁
- 1 个人 33 岁
- 1 个人 35 岁
- # e2e test async json_stats_aquery_engine = JSONalyzeQueryEngine( list_of_dict=json_list, llm=llm, verbose=True, use_async=True, )
- 回到顶部
display(
Markdown("> Question: {}".format(questions[7])),
Markdown("Answer: {}".format(json_stats_query_engine.query(questions[7]))),
)
Query: What is the average length of the email addresses in the dataset? INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" SQL Query: SELECT AVG(LENGTH(email)) FROM items Table Schema: {'name': <class 'str'>, 'age': <class 'int'>, 'major': <class 'str'>, 'email': <class 'str'>, 'address': <class 'str'>, 'city': <class 'str'>, 'state': <class 'str'>, 'country': <class 'str'>, 'phone': <class 'str'>, 'occupation': <class 'str'>} SQL Response: [{'AVG(LENGTH(email))': 24.5}] INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" Response: The average length of the email addresses in the dataset is 24.5 characters.
display( Markdown("> Question: {}".format(questions[5])), Markdown("Answer: {}".format(json_stats_query_engine.query(questions[5]))), )
问题:居住在加利福尼亚州 (CA) 的个体百分比是多少?