Token Counting - Migration Guide
The existing token counting implementation has been deprecated.
We know token counting is important to many users, so this guide was created to make the transition (hopefully) painless.
Previously, token counting was tracked directly on the llm_predictor and embed_model objects, and optionally printed to the console. This implementation used a static tokenizer (GPT-2) for token counting, and the last_token_usage and total_token_usage attributes were not always tracked correctly.
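For reference, the deprecated pattern looked roughly like the sketch below. This is a reconstruction for illustration only; LLMPredictor, ServiceContext, and their import paths are assumptions here, and the exact constructor arguments varied across old llama_index versions.

# DEPRECATED pattern (sketch only): counts lived on the predictor object itself.
# Import paths and constructors are assumptions; they changed across versions.
from llama_index import LLMPredictor, ServiceContext, SimpleDirectoryReader, VectorStoreIndex

llm_predictor = LLMPredictor()
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

response = index.as_query_engine().query("What did the author do growing up?")

# usage was read directly off the object (counted with a static gpt-2
# tokenizer) and was not always tracked correctly
print(llm_predictor.last_token_usage)
print(llm_predictor.total_token_usage)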
Going forward, token counting has moved into a callback. Using the TokenCountingHandler callback, you now have more options for how tokens are counted, the lifetime of a token counter, and even creating separate token counters for different indexes (see the sketch after the example below).
Here is a minimal example of using the new TokenCountingHandler with an OpenAI model:
import tiktoken
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core import Settings
# you can set a tokenizer directly, or optionally let it default
# to the same tokenizer that was used previously for token counting
# NOTE: The tokenizer should be a function that takes in text and returns a list of tokens
token_counter = TokenCountingHandler(
tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
    verbose=False,  # set to True to see usage printed to the console
)
Settings.callback_manager = CallbackManager([token_counter])
documents = SimpleDirectoryReader("./data").load_data()
# if verbose is turned on, you will see embedding token usage printed
index = VectorStoreIndex.from_documents(
documents,
)
# otherwise, you can access the count directly
print(token_counter.total_embedding_token_count)
# reset the counts at your discretion!
token_counter.reset_counts()
# also track prompt, completion, and total LLM tokens, in addition to embeddings
response = index.as_query_engine().query("What did the author do growing up?")
print(
"Embedding Tokens: ",
token_counter.total_embedding_token_count,
"\n",
"LLM Prompt Tokens: ",
token_counter.prompt_llm_token_count,
"\n",
"LLM Completion Tokens: ",
token_counter.completion_llm_token_count,
"\n",
"Total LLM Token Count: ",
token_counter.total_llm_token_count,
)
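As mentioned above, you can also give each index its own counter instead of sharing the global Settings.callback_manager. Here is a minimal sketch, assuming from_documents accepts a callback_manager argument; the ./data_a and ./data_b folders are hypothetical placeholders for your own data:

import tiktoken
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo").encode

# one handler per index, each wrapped in its own callback manager
counter_a = TokenCountingHandler(tokenizer=tokenizer)
counter_b = TokenCountingHandler(tokenizer=tokenizer)

index_a = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data_a").load_data(),
    callback_manager=CallbackManager([counter_a]),
)
index_b = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data_b").load_data(),
    callback_manager=CallbackManager([counter_b]),
)

# each counter only sees events from its own index
print("Index A embedding tokens: ", counter_a.total_embedding_token_count)
print("Index B embedding tokens: ", counter_b.total_embedding_token_count)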