Token Counting - Migration Guide
The existing token counting implementation has been deprecated.
We know token counting is important to many users, so this guide was created to make the transition (hopefully) painless.
Previously, token counting was tracked directly on the llm_predictor and embed_model objects, and optionally printed to the console. This implementation used a static tokenizer (GPT-2) for token counting, and the last_token_usage and total_token_usage attributes were not always tracked correctly.
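For reference, the deprecated pattern looked roughly like the sketch below. This is a reconstruction for illustration only; LLMPredictor, ServiceContext, and their import paths are assumptions here, and the exact constructor arguments varied across old llama_index versions.

# DEPRECATED pattern (sketch only): counts lived on the predictor object itself.
# Import paths and constructors are assumptions; they changed across versions.
from llama_index import LLMPredictor, ServiceContext, SimpleDirectoryReader, VectorStoreIndex

llm_predictor = LLMPredictor()
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

response = index.as_query_engine().query("What did the author do growing up?")

# usage was read directly off the object (counted with a static gpt-2
# tokenizer) and was not always tracked correctly
print(llm_predictor.last_token_usage)
print(llm_predictor.total_token_usage)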
Going forward, token counting has moved into a callback. Using the TokenCountingHandler callback, you now have more options for how tokens are counted, the lifetime of a token counter, and even creating separate token counters for different indexes (see the sketch after the example below).
Here is a minimal example of using the new TokenCountingHandler with an OpenAI model:
import tiktoken
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core import Settings
# you can set a tokenizer directly, or optionally let it default
# to the same tokenizer that was used previously for token counting
# NOTE: The tokenizer should be a function that takes in text and returns a list of tokens
token_counter = TokenCountingHandler(
tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
    verbose=False,  # set to True to see usage printed to the console
)
Settings.callback_manager = CallbackManager([token_counter])
documents = SimpleDirectoryReader("./data").load_data()
# if verbose is turned on, you will see embedding token usage printed
index = VectorStoreIndex.from_documents(
documents,
)
# otherwise, you can access the count directly
print(token_counter.total_embedding_token_count)
# reset the counts at your discretion!
token_counter.reset_counts()
# also track prompt, completion, and total LLM tokens, in addition to embeddings
response = index.as_query_engine().query("What did the author do growing up?")
print(
"Embedding Tokens: ",
token_counter.total_embedding_token_count,
"\n",
"LLM Prompt Tokens: ",
token_counter.prompt_llm_token_count,
"\n",
"LLM Completion Tokens: ",
token_counter.completion_llm_token_count,
"\n",
"Total LLM Token Count: ",
token_counter.total_llm_token_count,
)
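As mentioned above, you can also give each index its own counter instead of sharing the global Settings.callback_manager. Here is a minimal sketch, assuming from_documents accepts a callback_manager argument; the ./data_a and ./data_b folders are hypothetical placeholders for your own data:

import tiktoken
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo").encode

# one handler per index, each wrapped in its own callback manager
counter_a = TokenCountingHandler(tokenizer=tokenizer)
counter_b = TokenCountingHandler(tokenizer=tokenizer)

index_a = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data_a").load_data(),
    callback_manager=CallbackManager([counter_a]),
)
index_b = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data_b").load_data(),
    callback_manager=CallbackManager([counter_b]),
)

# each counter only sees events from its own index
print("Index A embedding tokens: ", counter_a.total_embedding_token_count)
print("Index B embedding tokens: ", counter_b.total_embedding_token_count)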