Token Counting Handler¶
This notebook walks through how to use the TokenCountingHandler and how it can be used to track your prompt, completion, and embedding token usage over time.

If you're opening this notebook on Colab, you will likely need to install LlamaIndex 🦙.
In [ ]
%pip install llama-index-llms-openai
In [ ]
!pip install llama-index
Setup¶

Here, we set up the callback and the global settings. Using the global Settings means we don't have to worry about passing the callback manager into every index and query.
In [ ]
import os

os.environ["OPENAI_API_KEY"] = "sk-..."
In [ ]
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2)
Settings.callback_manager = CallbackManager([token_counter])
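The tokenizer argument is just a callable that maps a string to a list of tokens, so you can sanity-check it directly before wiring it into the handler. A minimal sketch (the sample string is arbitrary):

tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo").encode
# encode() returns a list of token IDs; its length is the token count
print(len(tokenizer("Hello, token counting!")))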
Download Data¶
In [ ]
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
In [ ]
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham").load_data()
In [ ]
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
In [ ]
print(token_counter.total_embedding_token_count)
20723
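A raw count like this is mainly useful for estimating cost. As a rough sketch, you could convert it to dollars as below; the price here is a made-up placeholder, not a real quote, so check your provider's current pricing:

# Hypothetical embedding price in USD per 1K tokens -- placeholder only
PRICE_PER_1K_EMBED_TOKENS = 0.0001

print(
    "Estimated embedding cost: $",
    token_counter.total_embedding_token_count / 1000 * PRICE_PER_1K_EMBED_TOKENS,
)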
That looks right! Before we go any further, let's reset the counts.
In [ ]
token_counter.reset_counts()
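After a reset, every counter should read zero; a quick sanity check:

assert token_counter.total_llm_token_count == 0
assert token_counter.total_embedding_token_count == 0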
LLM + Embedding Token Usage¶

Next, let's test out a query and see what the counts look like.
In [ ]
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("What did the author do growing up?")
In [ ]
print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)
Embedding Tokens: 8
LLM Prompt Tokens: 4518
LLM Completion Tokens: 45
Total LLM Token Count: 4563
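Note that the total is simply the prompt and completion counts added together: 4518 + 45 = 4563.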
In [ ]
token_counter.reset_counts()

query_engine = index.as_query_engine(similarity_top_k=4, streaming=True)
response = query_engine.query("What happened at Interleaf?")

# finish the stream
for token in response.response_gen:
    # print(token, end="", flush=True)
    continue
In [ ]
print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)
Embedding Tokens: 6
LLM Prompt Tokens: 4563
LLM Completion Tokens: 123
Total LLM Token Count: 4686
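Note that with a streaming response, the completion token count is only final once the stream has been fully consumed, which is why the cell above drains response_gen before printing the counts.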
Advanced Usage¶

The token counter tracks each token usage event in an object called a TokenCountingEvent. This object has the following attributes:
- prompt -> The prompt string sent to the LLM or Embedding model
- prompt_token_count -> The token count of the LLM prompt
- completion -> The string completion received from the LLM (not used for embeddings)
- completion_token_count -> The token count of the LLM completion (not used for embeddings)
- total_token_count -> The total prompt + completion tokens for the event
- event_id -> A string ID for the event, which aligns with other callback handlers
These events are tracked on the token counter in two lists:

- llm_token_counts
- embedding_token_counts

Let's explore what these look like!
In [ ]
print("Num LLM token count events: ", len(token_counter.llm_token_counts))
print(
    "Num Embedding token count events: ",
    len(token_counter.embedding_token_counts),
)
Num LLM token count events: 2
Num Embedding token count events: 1
That makes sense! The previous query embedded the query text, and then made two LLM calls (since the top k was 4 and the default chunk size is 1024, two separate calls are needed so the LLM can read all the retrieved text).
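Since each event carries its own counts, you can also reconcile the per-event totals against the aggregate counters. A minimal sketch, using only the attributes listed above:

# The aggregate LLM counter should equal the sum of the per-event totals
per_event_total = sum(
    event.total_token_count for event in token_counter.llm_token_counts
)
print(per_event_total, token_counter.total_llm_token_count)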
Next, let's quickly see what a single event looks like.
In [ ]
print("prompt: ", token_counter.llm_token_counts[0].prompt[:100], "...\n")
print(
    "prompt token count: ",
    token_counter.llm_token_counts[0].prompt_token_count,
    "\n",
)
print(
    "completion: ", token_counter.llm_token_counts[0].completion[:100], "...\n"
)
print(
    "completion token count: ",
    token_counter.llm_token_counts[0].completion_token_count,
    "\n",
)
print("total token count", token_counter.llm_token_counts[0].total_token_count)
prompt: system: You are an expert Q&A system that is trusted around the world. Always answer the query using ...
prompt token count: 3873
completion: assistant: At Interleaf, the company had added a scripting language inspired by Emacs and made it a ...
completion token count: 95
total token count 3968
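Finally, to track usage over time (as promised at the top of the notebook), one simple pattern is to snapshot the counters after each query and then reset them. This is a hedged sketch rather than part of the original notebook; it reuses the non-streaming query engine pattern from earlier:

usage_log = []  # one snapshot per query

query_engine = index.as_query_engine(similarity_top_k=4)
for question in [
    "What did the author do growing up?",
    "What happened at Interleaf?",
]:
    token_counter.reset_counts()
    _ = query_engine.query(question)
    usage_log.append(
        {
            "query": question,
            "embedding_tokens": token_counter.total_embedding_token_count,
            "llm_tokens": token_counter.total_llm_token_count,
        }
    )

print(usage_log)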