Memory in LlamaIndex¶
The Memory class in LlamaIndex is used to store and retrieve both short-term and long-term memory.
You can use it on its own and orchestrate it within custom workflows, or use it within an existing agent.
By default, short-term memory is represented as a FIFO queue of ChatMessage objects. Once the queue exceeds a certain size, the last X messages within a flush size are archived and, optionally, flushed to the long-term memory blocks.
Long-term memory is represented as Memory Block objects. These objects receive the messages that are flushed from short-term memory, and optionally process them to extract information. Then, when memory is retrieved, the short-term and long-term memories are merged together.
%pip install llama-index-core llama-index-llms-openai llama-index-embeddings-openai llama-index-vector-stores-chroma
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
from llama_index.core.memory import Memory
memory = Memory.from_defaults(
session_id="my_session",
token_limit=50, # Normally you would set this to be closer to the LLM context window (i.e. 75,000, etc.)
token_flush_size=10,
chat_history_token_ratio=0.7,
)
Let's review the configuration we used and what each setting means:

- session_id: a unique identifier for the session. Used to mark chat messages in the SQL database as belonging to a specific session.
- token_limit: the maximum number of tokens that can be stored in short-term + long-term memory.
- chat_history_token_ratio: the ratio of tokens in the short-term chat history to the total token limit. Here, this means that 50*0.7 = 35 tokens are allocated to short-term memory, and the rest to long-term memory.
- token_flush_size: the number of tokens to flush to long-term memory when the token limit is exceeded. Note that we did not configure long-term memory blocks here, so these messages are simply archived in the database and removed from the short-term memory.
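If you want the archived messages and chat history to persist across processes, you can point the memory at an external SQL database instead of the default in-memory one. The snippet below is a minimal sketch, assuming Memory.from_defaults accepts an async_database_uri argument taking an async SQLAlchemy connection string (verify the parameter name against your installed version):

from llama_index.core.memory import Memory

# Sketch only: async_database_uri is assumed to be the parameter for pointing
# the memory at a persistent database (e.g. sqlite+aiosqlite or postgresql+asyncpg).
# Without it, an in-memory SQLite database is used.
persistent_memory = Memory.from_defaults(
    session_id="my_session",
    token_limit=50,
    token_flush_size=10,
    chat_history_token_ratio=0.7,
    async_database_uri="sqlite+aiosqlite:///memory.db",
)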
Using our memory, we can manually add some messages and observe how it works.
from llama_index.core.llms import ChatMessage
# Simulate a long conversation
for i in range(100):
    await memory.aput_messages(
        [
            ChatMessage(role="user", content="Hello, world!"),
            ChatMessage(role="assistant", content="Hello, world to you too!"),
            ChatMessage(role="user", content="What is the capital of France?"),
            ChatMessage(
                role="assistant", content="The capital of France is Paris."
            ),
        ]
    )
Since our token limit is small, we will only see the last 4 messages in the short-term memory (since that is what fits within the 50*0.7 limit).
current_chat_history = await memory.aget()
for msg in current_chat_history:
    print(msg)
user: Hello, world!
assistant: Hello, world to you too!
user: What is the capital of France?
assistant: The capital of France is Paris.
If we retrieve all messages, we will find all 400 of them.
all_messages = await memory.aget_all()
print(len(all_messages))
400
We can also clear the memory at any time to start over.
await memory.areset()
all_messages = await memory.aget_all()
print(len(all_messages))
0
Long-Term Memory¶
Long-term memory is represented as Memory Block objects. These objects receive the messages that are flushed from short-term memory, and optionally process them to extract information. Then, when memory is retrieved, the short-term and long-term memories are merged together.
LlamaIndex provides 3 prebuilt memory blocks:

- StaticMemoryBlock: a memory block that stores a static piece of information.
- FactExtractionMemoryBlock: a memory block that extracts facts from the chat history.
- VectorMemoryBlock: a memory block that stores and retrieves batches of chat messages from a vector database.
Each block has a priority, which is used when the combined long-term + short-term memory exceeds the token limit. Priority 0 means the block is always kept in memory, priority 1 means the block will be temporarily disabled, and so on.
from llama_index.core.memory import (
    StaticMemoryBlock,
    FactExtractionMemoryBlock,
    VectorMemoryBlock,
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
llm = OpenAI(model="gpt-4.1-mini")
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
client = chromadb.EphemeralClient()
vector_store = ChromaVectorStore(
    chroma_collection=client.get_or_create_collection("test_collection")
)
blocks = [
    StaticMemoryBlock(
        name="core_info",
        static_content="My name is Logan, and I live in Saskatoon. I work at LlamaIndex.",
        priority=0,
    ),
    FactExtractionMemoryBlock(
        name="extracted_info",
        llm=llm,
        max_facts=50,
        priority=1,
    ),
    VectorMemoryBlock(
        name="vector_memory",
        # required: pass in a vector store like qdrant, chroma, weaviate, milvus, etc.
        vector_store=vector_store,
        priority=2,
        embed_model=embed_model,
        # The top-k message batches to retrieve
        # similarity_top_k=2,
        # optional: How many previous messages to include in the retrieval query
        # retrieval_context_window=5
        # optional: pass optional node-postprocessors for things like similarity threshold, etc.
        # node_postprocessors=[...],
    ),
]
With our blocks created, we can pass them into the Memory class.
from llama_index.core.memory import Memory
memory = Memory.from_defaults(
session_id="my_session",
token_limit=30000,
# Setting a extremely low ratio so that more tokens are flushed to long-term memory
chat_history_token_ratio=0.02,
token_flush_size=500,
memory_blocks=blocks,
# insert into the latest user message, can also be "system"
insert_method="user",
)
With that, we can simulate a conversation with an agent and inspect the long-term memory.
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
agent = FunctionAgent(
    tools=[],
    llm=llm,
)
user_msgs = [
"Hi! My name is Logan",
"What is your opinion on minature shnauzers?",
"Do they shed a lot?",
"What breeds are comparable in size?",
"What is your favorite breed?",
"Would you recommend owning a dog?",
"What should I buy to prepare for owning a dog?",
]
for user_msg in user_msgs:
    _ = await agent.run(user_msg=user_msg, memory=memory)
Now, let's check the latest user message and see what the memory inserts into it.
Note that we pass in at least one chat message so that the vector memory actually runs retrieval.
chat_history = await memory.aget()
print(len(chat_history))
2
Great, we can see that the current FIFO queue holds only 2 messages (as expected, since we set the chat history token ratio to 0.02).
Now, let's inspect the long-term memory content that was inserted into the latest user message.
for block in chat_history[-2].blocks:
    print(block.text)
<memory> <core_info> My name is Logan, and I live in Saskatoon. I work at LlamaIndex. </core_info> <extracted_info> <fact>User's name is Logan</fact> <fact>User lives in Saskatoon</fact> <fact>User works at LlamaIndex</fact> <fact>User is interested in Miniature Schnauzers</fact> </extracted_info> <vector_memory> <message role='user'>Hi! My name is Logan</message> <message role='assistant'>Hi Logan! Nice to meet you. How can I assist you today?</message> <message role='user'>What is your opinion on minature shnauzers?</message> <message role='assistant'>Hi Logan! Miniature Schnauzers are wonderful dogs—they're known for being intelligent, friendly, and energetic. They often make great companions because they're loyal and good with families. Plus, their distinctive beard and eyebrows give them a charming, expressive look. Do you have one, or are you thinking about getting one?</message> <message role='user'>Do they shed a lot?</message> <message role='assistant'>Hi Logan! Miniature Schnauzers are actually known for being low shedders. They have a wiry double coat that doesn't shed much, which makes them a good choice for people who prefer a cleaner home or have mild allergies. However, their coat does require regular grooming and trimming to keep it looking its best. Since you’re in Saskatoon, the grooming routine might also help keep them comfortable through the changing seasons. Are you considering getting one as a pet?</message> <message role='user'>What breeds are comparable in size?</message> <message role='assistant'>Hi Logan! Miniature Schnauzers typically weigh between 11 to 20 pounds (5 to 9 kg) and stand about 12 to 14 inches (30 to 36 cm) tall at the shoulder. Breeds comparable in size include: - **Cairn Terrier** - **West Highland White Terrier (Westie)** - **Scottish Terrier** - **Pomeranian** (though usually a bit smaller) - **Beagle** (on the smaller side of the breed) - **French Bulldog** (a bit stockier but similar in height) These breeds are similar in size and can have comparable energy levels and grooming needs, depending on the breed. If you’re thinking about a dog that fits well with your lifestyle in Saskatoon and your work at LlamaIndex, I’d be happy to help you explore options!</message> <message role='user'>What is your favorite breed?</message> <message role='assistant'>Hi Logan! I don't have personal preferences, but I really appreciate breeds like the Miniature Schnauzer because of their intelligence, friendly nature, and low-shedding coat. They seem like great companions, especially for someone living in a place with changing seasons like Saskatoon. Do you have a favorite breed, or one you’re particularly interested in?</message> <message role='user'>Would you recommend owning a dog?</message> <message role='assistant'>Hi Logan! Owning a dog can be a wonderful experience, offering companionship, exercise, and even stress relief. Since you live in Saskatoon, where the seasons can be quite distinct, a dog can be a great motivator to get outside and enjoy the fresh air year-round. That said, it’s important to consider your lifestyle and work schedule at LlamaIndex. Dogs require time, attention, and care—regular walks, playtime, grooming, and vet visits. If you have the time and energy to commit, a dog can be a fantastic addition to your life. Breeds like Miniature Schnauzers, which are adaptable and relatively low-maintenance in terms of shedding, might be a good fit. 
If you’re unsure, maybe start by volunteering at a local animal shelter or fostering a dog to see how it fits with your routine. Would you like tips on how to prepare for dog ownership or suggestions on breeds that suit your lifestyle?</message> </vector_memory> </memory> What should I buy to prepare for owning a dog?
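You can also look at the long-term memory blocks themselves rather than the rendered text. The snippet below is a minimal sketch, assuming the blocks passed to Memory.from_defaults remain accessible via the memory_blocks attribute (verify against your installed version):

# Sketch only: memory_blocks is assumed to expose the configured blocks;
# name and priority are the fields we set when constructing each block.
for block in memory.memory_blocks:
    print(block.name, block.priority)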
To use this memory outside of an agent, and to further highlight how it works, you can do something like the following:
new_user_msg = ChatMessage(
role="user", content="What kind of dog was I asking about?"
)
await memory.aput(new_user_msg)
# Get the new chat history
new_chat_history = await memory.aget()
resp = await llm.achat(new_chat_history)
await memory.aput(resp.message)
print(resp.message.content)
You were asking about Miniature Schnauzers.