Chapter 8: Memory Management

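Effective Memory Management is crucial for intelligent Agents to retain information. Like humans, Agents need different types of memory to operate efficiently. This chapter explores Memory Management in depth, focusing on the immediate (short-term) and persistent (long-term) memory needs of Agents.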

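In agent systems, Memory refers to an Agent's ability to retain and use information from past interactions, observations, and learning experiences. This ability lets the Agent make informed decisions, maintain conversational context, and improve over time. Agent Memory is usually divided into two main types: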

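Short-Term Memory (contextual memory): Similar to working memory, it holds information that is currently being processed or was recently accessed. For Agents built on large language models (LLMs), short-term memory lives mainly in the context window. This window contains recent user messages, Agent replies, Tool Use results, and Agent reflections (Reflection) from the current interaction, all of which inform the LLM's subsequent responses and actions. The context window has limited capacity, which caps how much recent information the Agent can access directly. Efficient short-term memory management means keeping the most relevant information within this limited space, for example by summarizing older conversation segments or emphasizing key details. Models with "long context" windows only enlarge this short-term memory, letting more information fit into a single interaction. This context is still ephemeral, however: it is lost once the session ends, and processing it on every call can be costly and inefficient. Agents therefore need a separate type of memory to achieve true persistence, recall information from past interactions, and build a lasting knowledge base.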
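Long-Term Memory (persistent memory): This acts as a repository for information that an Agent needs to retain across interactions, tasks, or extended periods, much like a long-term knowledge base. The data is typically stored outside the Agent's immediate processing environment, often in databases, knowledge graphs, or vector databases. In a vector database, information is converted into numerical vectors and stored, which lets the Agent retrieve data by semantic similarity rather than exact keyword matches, a process known as semantic search. When the Agent needs information from long-term memory, it queries the external store, retrieves the relevant data, and integrates it into the short-term context for immediate use, combining prior knowledge with the current interaction.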
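To make the two tiers concrete, here is a minimal, framework-free sketch built on simplifying assumptions. ShortTermMemory, LongTermMemory, and build_context are hypothetical names. The summarization step is a placeholder that keeps the first few words of each evicted turn, where a real Agent would ask an LLM to summarize. A bag-of-words vector stands in for a learned embedding, so retrieval here reflects only word overlap, not true semantic similarity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ShortTermMemory:
    """Context-window tier: recent turns kept verbatim, older turns folded into a summary."""

    def __init__(self, max_turns: int = 4):
        self.max_turns = max_turns
        self.turns: list[str] = []
        self.summary_parts: list[str] = []

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        while len(self.turns) > self.max_turns:
            evicted = self.turns.pop(0)
            # Placeholder for an LLM summarization call: keep the first five words.
            self.summary_parts.append(" ".join(evicted.split()[:5]))


class LongTermMemory:
    """Persistent tier: facts stored with vectors and retrieved by similarity."""

    def __init__(self):
        self.facts: list[tuple[str, Counter]] = []

    def store(self, fact: str) -> None:
        self.facts.append((fact, embed(fact)))

    def search(self, query: str, top_k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.facts, key=lambda item: cosine(q, item[1]), reverse=True)
        return [fact for fact, vec in ranked[:top_k] if cosine(q, vec) > 0]


def build_context(short: ShortTermMemory, long_term: LongTermMemory, user_msg: str) -> str:
    """Assemble the prompt: recalled facts, summary of old turns, recent turns, new message."""
    lines = []
    recalled = long_term.search(user_msg)
    if recalled:
        lines.append("Relevant memories: " + "; ".join(recalled))
    if short.summary_parts:
        lines.append("Earlier (summarized): " + " / ".join(short.summary_parts))
    lines.extend(short.turns)
    lines.append("User: " + user_msg)
    return "\n".join(lines)


stm, ltm = ShortTermMemory(max_turns=2), LongTermMemory()
ltm.store("User prefers window seats on flights")
ltm.store("User is allergic to peanuts")
for turn in ["User: Hi there", "Agent: Hello! How can I help?", "User: I need a flight to Paris"]:
    stm.add(turn)
print(build_context(stm, ltm, "Which seat do I like on flights?"))
```

In a real system, the long-term tier would live in an external vector database and the embeddings would come from a model. Only the flow carries over: evict old turns into a summary, retrieve relevant facts, and splice both into the prompt.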

Practical Applications & Use Cases

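Memory Management is essential for Agents to track information and perform tasks intelligently over time. It is what lets Agents go beyond basic question answering. Applications include: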

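Chatbots and conversational AI: Maintaining conversational flow relies on short-term memory. Chatbots need to remember earlier user inputs to respond coherently. Long-term memory lets chatbots recall user preferences, past issues, or earlier discussions, enabling personalized and continuous interactions.
Task-oriented Agents: Agents that manage multi-step tasks need short-term memory to track previous steps, current progress, and the overall goal. This information may reside in the task's context or in temporary storage. Long-term memory is crucial for accessing user-specific data that is not in the immediate context.
Personalized experiences: Agents that offer tailored interactions use long-term memory to store and retrieve user preferences, past behaviors, and personal information. This lets them adapt their responses and recommendations.
Learning and improvement: Agents can improve their performance by learning from past interactions. Successful strategies, mistakes, and new information are stored in long-term memory to support future adaptation. Reinforcement learning Agents store learned policies or knowledge in this way.
Information retrieval (RAG): Agents designed to answer questions access their knowledge base, that is, their long-term memory, typically through Retrieval Augmented Generation (RAG). The Agent retrieves relevant documents or data to ground its responses.
Autonomous systems: Robots and self-driving cars need to remember maps, routes, object locations, and learned behaviors. This involves short-term memory for the immediate surroundings and long-term memory for general knowledge of the environment.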

Memory enables Agents to keep a history, learn, personalize interactions, and handle complex, time-dependent problems.

Hands-On Code: Memory Management in Google Agent Developer Kit (ADK)

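The Google Agent Developer Kit (ADK) offers a structured approach to managing context and Memory, including components for practical applications. A solid grasp of ADK's Session, State, and Memory is essential for building Agents that need to retain information.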

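Just as in human interactions, Agents need to recall earlier exchanges to hold coherent, natural conversations. ADK simplifies context management through three core concepts and their associated services.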

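Each interaction with an Agent can be viewed as a distinct conversation thread. Agents may need to access data from earlier interactions. ADK structures this as follows: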

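Session: An individual chat thread that logs the messages and actions (Events) of that specific interaction and also stores temporary data (State) relevant to that conversation.
State (session.state): Data stored within a Session, containing only information relevant to the current, active chat thread.
Memory: A searchable repository of information drawn from various past chats or external sources, serving as a resource for retrieving data beyond the immediate conversation.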

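ADK provides dedicated services to manage the key components needed to build complex, stateful, context-aware Agents. SessionService manages chat threads (Session objects) by handling their creation, logging, and termination, while MemoryService handles the storage and retrieval of long-term knowledge (Memory).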

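Both SessionService and MemoryService offer various configuration options, so users can choose a storage method that fits their application's needs. In-memory options are available for testing, but their data does not persist across restarts. For persistent storage and scalability, ADK also supports database and cloud-based services.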

Session: Keeping Track of Each Chat

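The Session object in ADK is designed to track and manage an individual chat thread. When a conversation with an Agent starts, the SessionService generates a Session object, represented as google.adk.sessions.Session. This object encapsulates all data related to a specific conversation thread. That includes unique identifiers (id, app_name, user_id), a chronological record of events as Event objects, a storage area for session-specific temporary data called state, and a timestamp of the last update (last_update_time). Developers usually interact with Session objects indirectly, through the SessionService. The SessionService manages the lifecycle of conversation sessions: creating new sessions, resuming previous ones, recording session activity (including state updates), identifying active sessions, and deleting session data. ADK provides several SessionService implementations with different storage mechanisms for session history and temporary data. One example is InMemorySessionService, which is suitable for testing but does not persist data across application restarts.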

python

# Example: Using InMemorySessionService
# This is suitable for local development and testing where data
# persistence across application restarts is not required.
from google.adk.sessions import InMemorySessionService

session_service = InMemorySessionService()

If you need reliable persistence to a database that you manage, there is DatabaseSessionService.

python

# Example: Using DatabaseSessionService
# This is suitable for production or development requiring persistent storage.
# You need to configure a database URL (e.g., for SQLite, PostgreSQL, etc.).
# Requires: pip install google-adk[sqlalchemy] and a database driver (e.g., psycopg2 for PostgreSQL)
from google.adk.sessions import DatabaseSessionService

# Example using a local SQLite file:
db_url = "sqlite:///./my_agent_data.db"
session_service = DatabaseSessionService(db_url=db_url)

There is also VertexAiSessionService, which uses Vertex AI infrastructure for scalable production deployments on Google Cloud.

python

# Example: Using VertexAiSessionService
# This is suitable for scalable production on Google Cloud Platform, leveraging
# Vertex AI infrastructure for session management.
# Requires: pip install google-adk[vertexai] and GCP setup/authentication
from google.adk.sessions import VertexAiSessionService

PROJECT_ID = "your-gcp-project-id"  # Replace with your GCP project ID
LOCATION = "us-central1"  # Replace with your desired GCP location
# The app_name used with this service should correspond to the Reasoning Engine ID or name
REASONING_ENGINE_APP_NAME = "projects/your-gcp-project-id/locations/us-central1/reasoningEngines/your-engine-id"  # Replace with your Reasoning Engine resource name
session_service = VertexAiSessionService(project=PROJECT_ID, location=LOCATION)
# When using this service, pass REASONING_ENGINE_APP_NAME to service methods:
# session_service.create_session(app_name=REASONING_ENGINE_APP_NAME, ...)
# session_service.get_session(app_name=REASONING_ENGINE_APP_NAME, ...)
# session_service.append_event(session, event, app_name=REASONING_ENGINE_APP_NAME)
# session_service.delete_session(app_name=REASONING_ENGINE_APP_NAME, ...)

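Choosing the right SessionService is crucial, because it determines how the Agent's interaction history and temporary data are stored and how long they persist.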

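Each message exchange follows a cycle:

1. A message is received.
2. The Runner retrieves or creates the Session using the SessionService.
3. The Agent processes the message using the Session's context (its state and interaction history).
4. The Agent generates a response and may update the state.
5. The Runner packages this as an Event.
6. The session_service.append_event method records the new event and updates the state in storage.

The Session then waits for the next message. Ideally, the delete_session method is used to terminate the session when the interaction ends. This process shows how the SessionService maintains continuity by managing Session-specific history and temporary data.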
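The cycle above can be illustrated with a deliberately simplified, pure-Python model. Everything in it is hypothetical and is not ADK's API: ToySessionService, toy_agent, handle_message, and the minimal Event and Session dataclasses. The real services are asynchronous and carry more metadata, and in ADK the Runner, not user code, drives these steps. The sketch only shows the shape of the loop: get or create the Session, append each turn as an Event, let appended events carry the state changes, and delete the Session at the end.

```python
import copy
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Event:
    author: str
    text: str
    state_delta: dict[str, Any] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


@dataclass
class Session:
    id: str
    app_name: str
    user_id: str
    state: dict[str, Any] = field(default_factory=dict)
    events: list[Event] = field(default_factory=list)
    last_update_time: float = field(default_factory=time.time)


class ToySessionService:
    """Hypothetical in-memory service mirroring the get/create/append/delete lifecycle."""

    def __init__(self):
        self._store: dict[tuple[str, str, str], Session] = {}

    def create_session(self, app_name: str, user_id: str, session_id: str) -> Session:
        self._store[(app_name, user_id, session_id)] = Session(session_id, app_name, user_id)
        return self.get_session(app_name, user_id, session_id)

    def get_session(self, app_name: str, user_id: str, session_id: str) -> Optional[Session]:
        stored = self._store.get((app_name, user_id, session_id))
        return copy.deepcopy(stored) if stored else None  # callers get a copy

    def append_event(self, session: Session, event: Event) -> None:
        stored = self._store[(session.app_name, session.user_id, session.id)]
        for target in (stored, session):  # persist, and keep the caller's copy in sync
            target.events.append(event)
            target.state.update(event.state_delta)
            target.last_update_time = event.timestamp

    def delete_session(self, app_name: str, user_id: str, session_id: str) -> None:
        self._store.pop((app_name, user_id, session_id), None)


def toy_agent(session: Session, message: str) -> tuple[str, dict[str, Any]]:
    """Stand-in for an LLM Agent: reads state, returns a reply and a state change."""
    turn = session.state.get("turn_count", 0) + 1
    return f"Reply #{turn} to: {message}", {"turn_count": turn}


def handle_message(service: ToySessionService, app: str, user: str, sid: str, message: str) -> str:
    # 1. The Runner retrieves the Session, or creates it on first contact.
    session = service.get_session(app, user, sid) or service.create_session(app, user, sid)
    # 2. The incoming user message is recorded as an Event.
    service.append_event(session, Event(author="user", text=message))
    # 3. The Agent uses the Session's state and history to respond.
    reply, delta = toy_agent(session, message)
    # 4. The reply and its state change are wrapped in an Event and appended.
    service.append_event(session, Event(author="agent", text=reply, state_delta=delta))
    return reply


service = ToySessionService()
print(handle_message(service, "demo_app", "u1", "s1", "Hello"))         # Reply #1 to: Hello
print(handle_message(service, "demo_app", "u1", "s1", "Still there?"))  # Reply #2 to: Still there?
stored = service.get_session("demo_app", "u1", "s1")
print(stored.state, len(stored.events))  # {'turn_count': 2} 4
service.delete_session("demo_app", "u1", "s1")  # end of the interaction
```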

State: The Session's Scratchpad

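In ADK, each Session, which represents a chat thread, includes a state component that acts as the Agent's temporary working memory for the duration of that conversation. While session.events logs the entire chat history, session.state stores and updates the dynamic data points relevant to the active chat.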

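Fundamentally, session.state works as a dictionary, storing data as key-value pairs. Its core function is to let the Agent retain and manage details that a coherent conversation depends on, such as user preferences, task progress, incremental data collection, or conditional flags that influence the Agent's subsequent actions.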

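State consists of string keys paired with values of serializable Python types: strings, numbers, booleans, lists, and dictionaries containing these basic types. State is dynamic and evolves throughout the conversation. Whether these changes persist depends on the configured SessionService.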

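State can be organized with key prefixes that define the scope and persistence of the data. Keys without a prefix are specific to the session.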

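user: prefix associates the data with a user ID, across all of that user's sessions.
app: prefix designates data shared among all users of the application.
temp: prefix marks data that is valid only for the current processing turn and is not persisted.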

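The Agent accesses all state data through a single session.state dictionary. The SessionService handles data retrieval, merging, and persistence. State should be updated when an Event is added to the session history through session_service.append_event(). This ensures accurate tracking, correct saving in persistent services, and safe handling of state changes.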
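One way to picture how a SessionService could apply these prefixes is the hypothetical ScopedStateStore below. It is a sketch under stated assumptions, not ADK's implementation:

- JSON-serializability stands in for "serializable Python types".
- temp: keys are simply never stored.
- The merged view is what the Agent would read as one session.state dictionary.

```python
import json
from typing import Any


class ScopedStateStore:
    """Hypothetical store that routes a flat state_delta by key prefix."""

    def __init__(self):
        self.app_state: dict[str, Any] = {}
        self.user_state: dict[str, dict[str, Any]] = {}
        self.session_state: dict[str, dict[str, Any]] = {}

    def apply_delta(self, user_id: str, session_id: str, delta: dict[str, Any]) -> None:
        for value in delta.values():
            json.dumps(value)  # raises TypeError for non-serializable values
        for key, value in delta.items():
            if key.startswith("temp:"):
                continue  # valid for the current turn only; never persisted
            if key.startswith("app:"):
                self.app_state[key] = value  # shared by all users
            elif key.startswith("user:"):
                self.user_state.setdefault(user_id, {})[key] = value  # all of this user's sessions
            else:
                self.session_state.setdefault(session_id, {})[key] = value  # this session only

    def merged_view(self, user_id: str, session_id: str) -> dict[str, Any]:
        """The single dictionary an Agent would read as session.state."""
        return {**self.app_state,
                **self.user_state.get(user_id, {}),
                **self.session_state.get(session_id, {})}


store = ScopedStateStore()
store.apply_delta("alice", "s1", {"app:theme": "dark", "user:lang": "en",
                                  "draft": "hi", "temp:raw": "x"})
store.apply_delta("alice", "s2", {"draft": "other"})
print(store.merged_view("alice", "s2"))  # {'app:theme': 'dark', 'user:lang': 'en', 'draft': 'other'}
print(store.merged_view("bob", "s3"))    # {'app:theme': 'dark'}
```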

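The simple way: using output_key (for Agent text replies). This is the easiest method if you just want to save the Agent's final text response directly into state. When you set up your LlmAgent, tell it which output_key to use. The Runner sees this setting and, when it appends the event, automatically creates the actions needed to save the response to state. The following code example demonstrates a state update via output_key.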

python

# Import necessary classes from the Google Agent Developer Kit (ADK)
import asyncio

from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai.types import Content, Part

# Define an LlmAgent with an output_key.
greeting_agent = LlmAgent(
    name="Greeter",
    model="gemini-2.0-flash",
    instruction="Generate a short, friendly greeting.",
    output_key="last_greeting"
)

# --- Setup Runner and Session ---
app_name, user_id, session_id = "state_app", "user1", "session1"
session_service = InMemorySessionService()
runner = Runner(
    agent=greeting_agent,
    app_name=app_name,
    session_service=session_service
)


async def main():
    # Recent ADK releases expose the SessionService methods as coroutines.
    session = await session_service.create_session(
        app_name=app_name,
        user_id=user_id,
        session_id=session_id
    )
    print(f"Initial state: {session.state}")

    # --- Run the Agent ---
    user_message = Content(role="user", parts=[Part(text="Hello")])
    print("\n--- Running the agent ---")
    async for event in runner.run_async(
        user_id=user_id,
        session_id=session_id,
        new_message=user_message
    ):
        if event.is_final_response():
            print("Agent responded.")

    # --- Check Updated State ---
    # Read the state *after* the runner has finished processing all events.
    # get_session takes keyword arguments.
    updated_session = await session_service.get_session(
        app_name=app_name, user_id=user_id, session_id=session_id
    )
    print(f"\nState after agent run: {updated_session.state}")
    # The state should now contain a "last_greeting" key written via output_key.


asyncio.run(main())

Behind the scenes, the Runner sees your output_key and automatically creates the necessary actions, with a state_delta, when it calls append_event.
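Conceptually, the output_key shortcut amounts to something like the following. This is a hypothetical, dictionary-based sketch of the idea, not ADK's Runner code; build_final_event and append_event here are illustrative names, and the real Runner works with Event and EventActions objects.

```python
from typing import Any, Optional


def build_final_event(author: str, final_text: str, output_key: Optional[str]) -> dict[str, Any]:
    """Hypothetical: wrap the Agent's final text, adding a state_delta if output_key is set."""
    state_delta = {output_key: final_text} if output_key else {}
    return {"author": author, "text": final_text, "actions": {"state_delta": state_delta}}


def append_event(state: dict[str, Any], event: dict[str, Any]) -> None:
    """Apply the event's state_delta, as a SessionService does when the event is appended."""
    state.update(event["actions"]["state_delta"])


state: dict[str, Any] = {}
event = build_final_event("Greeter", "Hello there!", output_key="last_greeting")
append_event(state, event)
print(state)  # {'last_greeting': 'Hello there!'}
```

The point of the design is that the state change travels inside the event. The change is therefore recorded in the history and persisted by the same append_event call.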

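The standard way: using EventActions.state_delta (for more complex updates). Sometimes you need more than the simple way offers: updating several keys at once, saving something other than text, targeting specific scopes such as user: or app:, or making updates unrelated to the Agent's final text reply. In those cases, you manually build a dictionary of state changes (the state_delta) and include it in the EventActions of the Event you are appending. Inside a tool, you rarely build this dictionary by hand, because writes to ToolContext.state are recorded in the state_delta for you. Let's look at an example that takes this tool-based route: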

python

import asyncio
import time

from google.adk.agents import LlmAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
from google.adk.sessions import InMemorySessionService
from google.adk.tools.tool_context import ToolContext

# --- Define the Recommended Tool-Based Approach ---
def log_user_login(tool_context: ToolContext) -> dict:
    """
    Updates the session state upon a user login event.
    This tool encapsulates all state changes related to a user login.
    Args:
        tool_context: Automatically provided by ADK, gives access to session state.
    Returns:
        A dictionary confirming the action was successful.
    """
    # Access the state directly through the provided context.
    state = tool_context.state
    # Get current values or defaults, then update the state.
    # This is much cleaner and co-locates the logic.
    login_count = state.get("user:login_count", 0) + 1
    state["user:login_count"] = login_count
    state["task_status"] = "active"
    state["user:last_login_ts"] = time.time()
    state["temp:validation_needed"] = True
    print("State updated from within the `log_user_login` tool.")
    return {
        "status": "success",
        "message": f"User login tracked. Total logins: {login_count}."
    }

# --- Demonstration of Usage ---
# In a real application, an LLM Agent would decide to call this tool, and the
# ADK Runner would build the contexts and append the resulting Event itself.
# Here, we simulate those steps by hand for demonstration purposes only.

async def main():
    # 1. Setup
    session_service = InMemorySessionService()
    app_name, user_id, session_id = "state_app_tool", "user3", "session3"
    session = await session_service.create_session(
        app_name=app_name,
        user_id=user_id,
        session_id=session_id,
        state={"user:login_count": 0, "task_status": "idle"}
    )
    print(f"Initial state: {session.state}")

    # 2. Build a ToolContext manually, just for this standalone example.
    agent = LlmAgent(name="LoginAgent", model="gemini-2.0-flash", tools=[log_user_login])
    invocation_context = InvocationContext(
        invocation_id="inv_login_demo",
        agent=agent,
        session=session,
        session_service=session_service,
    )
    mock_context = ToolContext(invocation_context)

    # 3. Execute the tool. Writes to tool_context.state are also recorded
    # in the context's EventActions.state_delta.
    log_user_login(mock_context)

    # 4. Persist the change the way the Runner would: append an Event that
    # carries the recorded actions (including the state_delta).
    await session_service.append_event(
        session,
        Event(invocation_id="inv_login_demo", author=agent.name, actions=mock_context.actions),
    )

    # 5. Check the stored state
    updated_session = await session_service.get_session(
        app_name=app_name, user_id=user_id, session_id=session_id
    )
    print(f"State after tool execution: {updated_session.state}")
    # The stored state should show the incremented user:login_count and
    # task_status "active", with all login-related state logic kept in one tool.


asyncio.run(main())

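This code demonstrates a tool-based approach to managing user session state in an application. It defines a function, log_user_login, that acts as a tool responsible for updating the session state when a user logs in.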

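The function takes a ToolContext object, provided by ADK, through which it accesses and modifies the session's state dictionary. Inside the tool, it increments user:login_count, sets task_status to "active", records user:last_login_ts (a timestamp), and adds a temporary flag, temp:validation_needed.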

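The demonstration part of the code simulates how this tool would be used. It sets up an in-memory session service and creates an initial session with some predefined state. It then manually creates a ToolContext to mimic the environment in which the ADK Runner executes tools, and calls log_user_login with this mock context. Finally, the code retrieves the session again to show that the tool's execution has updated the state. The goal is to show that encapsulating state changes in a tool makes the code cleaner and better organized than manipulating state directly outside of tools.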

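Note that directly modifying the session.state dictionary after retrieving a session is strongly discouraged, because it bypasses the standard event-processing mechanism. Such direct changes:

- are not recorded in the session's event history;
- may not be persisted by the chosen SessionService;
- can lead to concurrency issues;
- do not update essential metadata such as timestamps.

There are two recommended ways to update session state. Use the output_key parameter on LlmAgent, which is specifically for the Agent's final text response. Otherwise, include the state changes in EventActions.state_delta when appending an event via session_service.append_event(). session.state should be used primarily for reading existing data.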
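Why direct edits can be lost is easiest to see with a service that returns copies, as a database-backed SessionService naturally does. CopyingSessionService below is a hypothetical stand-in, not an ADK class. It contrasts mutating a retrieved copy with routing the same change through an append_event-style call, which records the change in the history and persists it.

```python
import copy
from typing import Any


class CopyingSessionService:
    """Hypothetical service that hands out copies, as a database-backed one would."""

    def __init__(self, initial_state: dict[str, Any]):
        self._state = dict(initial_state)
        self._events: list[dict[str, Any]] = []

    def get_session(self) -> dict[str, Any]:
        # Callers receive copies; editing them does not touch stored data.
        return {"state": copy.deepcopy(self._state), "events": list(self._events)}

    def append_event(self, session: dict[str, Any], state_delta: dict[str, Any]) -> None:
        event = {"state_delta": dict(state_delta)}
        self._events.append(event)         # recorded in the history
        self._state.update(state_delta)    # persisted
        session["events"].append(event)    # caller's copy kept in sync
        session["state"].update(state_delta)


service = CopyingSessionService({"task_status": "idle"})

# Anti-pattern: mutate the retrieved copy directly.
session = service.get_session()
session["state"]["task_status"] = "active"
print(service.get_session()["state"])  # {'task_status': 'idle'}: the change was lost

# Recommended: route the change through append_event as a state_delta.
session = service.get_session()
service.append_event(session, {"task_status": "active"})
print(service.get_session()["state"])        # {'task_status': 'active'}
print(len(service.get_session()["events"]))  # 1
```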

To recap: keep state simple, use basic data types, give keys clear names and use prefixes correctly, avoid deep nesting, and always update state through the append_event flow.

Memory: Long-Term Knowledge Management with MemoryService

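In agent systems, the Session component keeps a record of the current chat history (events) and of temporary data specific to a single conversation (state). For an Agent to retain information across multiple interactions or to access external data, however, long-term knowledge management is needed. The MemoryService provides it.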

python

# Example: Using InMemoryMemoryService
# This is suitable for local development and testing where data
# persistence across application restarts is not required.
# Memory content is lost when the app stops.
from google.adk.memory import InMemoryMemoryService

memory_service = InMemoryMemoryService()

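Session and State can be thought of as short-term memory for a single chat session, whereas the long-term knowledge managed by the MemoryService acts as a persistent, searchable repository. This repository can hold information from many past interactions or from external sources. The MemoryService, defined by the BaseMemoryService interface, sets the standard for managing this searchable long-term knowledge. Its two main functions are:

- adding information: the add_session_to_memory method extracts content from a session and stores it;
- retrieving information: the search_memory method lets an Agent query the store and receive relevant data.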
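The add/search contract of a MemoryService can be sketched with a hypothetical keyword-based class. KeywordMemoryService and ToySession are illustrative names only, and their method parameters are assumptions rather than ADK signatures. Matching on any shared word is a deliberately naive retrieval rule: a common word such as "is" would also match. Production services such as VertexAiRagMemoryService rely on semantic search instead.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToySession:
    app_name: str
    user_id: str
    events: list[str] = field(default_factory=list)  # message texts only


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordMemoryService:
    """Hypothetical MemoryService: stores session text per (app, user), searches by shared words."""

    def __init__(self):
        self._memories: dict[tuple[str, str], list[str]] = {}

    def add_session_to_memory(self, session: ToySession) -> None:
        bucket = self._memories.setdefault((session.app_name, session.user_id), [])
        bucket.extend(session.events)

    def search_memory(self, app_name: str, user_id: str, query: str) -> list[str]:
        query_words = words(query)
        return [text for text in self._memories.get((app_name, user_id), [])
                if query_words & words(text)]


memory_service = KeywordMemoryService()
past = ToySession("travel_app", "sam", ["User: My favourite airline is Lufthansa", "Agent: Noted!"])
memory_service.add_session_to_memory(past)
print(memory_service.search_memory("travel_app", "sam", "Which airline do I like?"))
# ['User: My favourite airline is Lufthansa']
print(memory_service.search_memory("travel_app", "alex", "airline"))  # []
```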

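ADK offers several implementations of this long-term knowledge store. InMemoryMemoryService provides temporary storage suitable for testing, but its data is not preserved across application restarts. Production environments typically use VertexAiRagMemoryService. This service builds on Google Cloud's Retrieval Augmented Generation (RAG) service to provide scalable, persistent storage with semantic search (see also Chapter 14 on RAG).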

python

# Example: Using VertexAiRagMemoryService
# This is suitable for scalable production on GCP, leveraging
# Vertex AI RAG (Retrieval Augmented Generation) for persistent,
# searchable memory.
# Requires: pip install google-adk[vertexai], GCP setup/authentication,
# and a Vertex AI RAG Corpus.
from google.adk.memory import VertexAiRagMemoryService

# The resource name of your Vertex AI RAG Corpus
RAG_CORPUS_RESOURCE_NAME = "projects/your-gcp-project-id/locations/us-central1/ragCorpora/your-corpus-id"  # Replace with your Corpus resource name

# Optional configuration for retrieval behavior
SIMILARITY_TOP_K = 5  # Number of top results to retrieve
VECTOR_DISTANCE_THRESHOLD = 0.7  # Only results with a vector distance below this value are returned

memory_service = VertexAiRagMemoryService(
    rag_corpus=RAG_CORPUS_RESOURCE_NAME,
    similarity_top_k=SIMILARITY_TOP_K,
    vector_distance_threshold=VECTOR_DISTANCE_THRESHOLD
)

# When using this service, methods like add_session_to_memory
# and search_memory will interact with the specified Vertex AI
# RAG Corpus.
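The two retrieval parameters above can be illustrated in plain Python. This sketch makes three assumptions. It uses cosine distance (1 minus cosine similarity). It reads vector_distance_threshold as an upper bound on distance, so more distant chunks are dropped. And retrieve is a hypothetical helper, not part of ADK. The distance measure Vertex AI actually uses depends on how the corpus is configured.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def retrieve(query_vec, corpus, similarity_top_k=5, vector_distance_threshold=0.7):
    """Return up to similarity_top_k (text, distance) pairs closer than the threshold."""
    scored = sorted((cosine_distance(query_vec, vec), text) for text, vec in corpus.items())
    return [(text, round(dist, 3)) for dist, text in scored[:similarity_top_k]
            if dist < vector_distance_threshold]


corpus = {
    "likes hiking": [1.0, 0.0],
    "likes trail running": [0.9, 0.1],
    "likes museums": [0.0, 1.0],
}
query = [1.0, 0.0]
print(retrieve(query, corpus))                      # museums is filtered out by the threshold
print(retrieve(query, corpus, similarity_top_k=1))  # only the closest chunk
```

Because the results are sorted by distance, capping at similarity_top_k and then filtering by threshold gives the same set as filtering first.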

Hands-On Code: Memory Management in LangChain and LangGraph

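In LangChain and LangGraph, Memory is a key component for building intelligent conversational applications that feel natural. It lets AI Agents remember information from past interactions, learn from feedback, and adapt to user preferences. LangChain's memory feature lays the foundation by enriching the current prompt with stored history and then recording the latest exchange for future use. As Agents take on more complex tasks, this capability becomes essential for efficiency and user satisfaction.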

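Short-term memory: This is thread-scoped, meaning it tracks the ongoing conversation within a single session or thread. It provides immediate context, but a full history can strain the LLM's context window, potentially leading to errors or degraded performance. LangGraph manages short-term memory as part of the Agent's state and persists it with a checkpointer, so a thread can be resumed at any time.
Long-term memory: This stores user-specific or application-level data across sessions and is shared between conversation threads. It is saved in custom "namespaces" and can be recalled at any time, from any thread. LangGraph provides stores to save and recall long-term memories, enabling Agents to retain knowledge indefinitely.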
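The difference in scope can be sketched with two hypothetical classes. ToyCheckpointer and ToyStore are not LangGraph's checkpointer or store APIs. They only model the idea that short-term messages are keyed by a thread ID, while long-term documents live under a namespace tuple that any thread can read.

```python
from collections import defaultdict
from typing import Any, Optional


class ToyCheckpointer:
    """Short-term memory: the message list of each thread, keyed by thread_id."""

    def __init__(self):
        self._threads: dict[str, list[str]] = defaultdict(list)

    def append(self, thread_id: str, message: str) -> None:
        self._threads[thread_id].append(message)

    def load(self, thread_id: str) -> list[str]:
        return list(self._threads.get(thread_id, []))


class ToyStore:
    """Long-term memory: JSON-like documents under (namespace, key), visible to every thread."""

    def __init__(self):
        self._docs: dict[tuple[str, ...], dict[str, dict[str, Any]]] = defaultdict(dict)

    def put(self, namespace: tuple[str, ...], key: str, value: dict[str, Any]) -> None:
        self._docs[namespace][key] = value

    def get(self, namespace: tuple[str, ...], key: str) -> Optional[dict[str, Any]]:
        return self._docs.get(namespace, {}).get(key)


checkpointer, store = ToyCheckpointer(), ToyStore()
# Thread 1: the user states a preference; it lands in both memories.
checkpointer.append("thread-1", "User: I'm vegetarian.")
store.put(("user-42", "preferences"), "diet", {"value": "vegetarian"})
# Thread 2: a new conversation for the same user.
print(checkpointer.load("thread-2"))                  # [] (no short-term history)
print(store.get(("user-42", "preferences"), "diet"))  # {'value': 'vegetarian'}
```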

LangChain provides several tools for managing conversation history, from manual control to automated integration within chains.

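ChatMessageHistory: Manual Memory Management. For direct, simple control over conversation history outside a formal chain, the ChatMessageHistory class is ideal. It lets you track conversation exchanges manually.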

python

from langchain.memory import ChatMessageHistory

# Initialize the history object
history = ChatMessageHistory()

# Add user and AI messages
history.add_user_message("I'm heading to New York next week.")
history.add_ai_message("Great! It's a fantastic city.")

# Access the list of messages
print(history.messages)

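ConversationBufferMemory: Automated Memory for Chains. To integrate Memory directly into chains, ConversationBufferMemory is a common choice. It holds a buffer of the conversation and makes it available to your prompt. Its behavior can be customized with two key parameters: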

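memory_key: A string specifying the prompt variable that will hold the chat history. It defaults to "history".
return_messages: A boolean that determines the format of the history.
- If False (the default), it returns a single formatted string, which suits standard LLMs.
- If True, it returns a list of message objects, which is the recommended format for Chat Models.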

python

from langchain.memory import ConversationBufferMemory

# Initialize memory
memory = ConversationBufferMemory()

# Save a conversation turn
memory.save_context({"input": "What's the weather like?"}, {"output": "It's sunny today."})

# Load the memory as a string
print(memory.load_memory_variables({}))
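To make the effect of memory_key and return_messages concrete, here is a hypothetical, dependency-free ToyBufferMemory that reproduces the two output shapes described above. It is not LangChain's implementation, and the "Human"/"AI" prefixes are an assumption chosen to mirror a typical string transcript.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Message:
    role: str  # "human" or "ai"
    content: str


class ToyBufferMemory:
    """Hypothetical buffer memory illustrating memory_key and return_messages."""

    def __init__(self, memory_key: str = "history", return_messages: bool = False):
        self.memory_key = memory_key
        self.return_messages = return_messages
        self.messages: list[Message] = []

    def save_context(self, inputs: dict[str, str], outputs: dict[str, str]) -> None:
        self.messages.append(Message("human", inputs["input"]))
        self.messages.append(Message("ai", outputs["output"]))

    def load_memory_variables(self) -> dict[str, Any]:
        if self.return_messages:  # list of message objects, suited to chat models
            return {self.memory_key: list(self.messages)}
        prefix = {"human": "Human", "ai": "AI"}
        transcript = "\n".join(f"{prefix[m.role]}: {m.content}" for m in self.messages)
        return {self.memory_key: transcript}  # one string, suited to plain LLM prompts


as_text = ToyBufferMemory()
as_msgs = ToyBufferMemory(memory_key="chat_history", return_messages=True)
for mem in (as_text, as_msgs):
    mem.save_context({"input": "What's the weather like?"}, {"output": "It's sunny today."})
print(as_text.load_memory_variables())
# {'history': "Human: What's the weather like?\nAI: It's sunny today."}
print(as_msgs.load_memory_variables()["chat_history"][0].role)  # human
```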

Integrating this Memory into an LLMChain lets the model access the conversation history and respond in a contextually relevant way.

python

from langchain_openai import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory

# 1. Define LLM and Prompt
llm = OpenAI(temperature=0)
template = """You are a helpful travel agent.
Previous conversation:
{history}
New question: {question}
Response:"""
prompt = PromptTemplate.from_template(template)

# 2. Configure Memory
# The memory_key "history" matches the variable in the prompt
memory = ConversationBufferMemory(memory_key="history")

# 3. Build the Chain
conversation = LLMChain(llm=llm, prompt=prompt, memory=memory)

# 4. Run the Conversation
response = conversation.predict(question="I want to book a flight.")
print(response)
response = conversation.predict(question="My name is Sam, by the way.")
print(response)
response = conversation.predict(question="What was my name again?")
print(response)

For better results with chat models, set return_messages=True so that the history is passed as a structured list of message objects.

python

from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# 1. Define Chat Model and Prompt
llm = ChatOpenAI()
prompt = ChatPromptTemplate(
    messages=[
        SystemMessagePromptTemplate.from_template("You are a friendly assistant."),
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{question}")
    ]
)

# 2. Configure Memory
# return_messages=True is essential for chat models
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# 3. Build the Chain
conversation = LLMChain(llm=llm, prompt=prompt, memory=memory)

# 4. Run the Conversation
response = conversation.predict(question="Hi, I'm Jane.")
print(response)
response = conversation.predict(question="Do you remember my name?")
print(response)

Types of Long-Term Memory

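Long-term memory lets systems retain information across different conversations, providing deeper context and personalization. By analogy with human memory, it can be divided into three types: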

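Semantic Memory: Remembering Facts. This involves retaining specific facts and concepts, such as user preferences or domain knowledge. It is used to ground the Agent's responses, leading to more personalized and relevant interactions. This information can be managed as a continuously updated user "profile" (a JSON document) or as a "collection" of individual factual documents.
Episodic Memory: Remembering Experiences. This involves recalling past events or actions. For AI Agents, episodic memory is often used to remember how to accomplish tasks. In practice, it is frequently implemented through few-shot example prompting, where the Agent learns from past successful interaction sequences to perform tasks correctly.
Procedural Memory: Remembering Rules. This is memory of how to perform tasks: the Agent's core instructions and behaviors, often contained in its system prompt. Agents commonly modify their own system prompts to adapt and improve. One effective technique is "Reflection", where the Agent is given its current instructions and recent interactions and then asked to refine its own instructions.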
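The three types can be sketched as three small, hypothetical functions:

- semantic: a profile dictionary updated fact by fact;
- episodic: past successful input/output pairs reused as few-shot examples;
- procedural: a reflection step that rewrites the instructions.

The llm argument is an assumed callable that takes a prompt string and returns text. Without it, the sketch falls back to appending the feedback as extra rules, so the example runs offline.

```python
from typing import Any, Callable, Optional


def update_profile(profile: dict[str, Any], new_facts: dict[str, Any]) -> dict[str, Any]:
    """Semantic memory: a user profile document; newer facts override older ones."""
    return {**profile, **new_facts}


def few_shot_prompt(examples: list[tuple[str, str]], task: str, k: int = 2) -> str:
    """Episodic memory: reuse the k most recent successful examples as demonstrations."""
    shots = "\n\n".join(f"Input: {inp}\nOutput: {out}" for inp, out in examples[-k:])
    return f"{shots}\n\nInput: {task}\nOutput:"


def reflect(instructions: str, feedback: list[str],
            llm: Optional[Callable[[str], str]] = None) -> str:
    """Procedural memory: rewrite the system instructions based on feedback."""
    if llm is not None:
        prompt = (f"Current instructions:\n{instructions}\n\nFeedback:\n"
                  + "\n".join(feedback) + "\n\nRewrite the instructions.")
        return llm(prompt)
    # Offline fallback: append each piece of feedback as an extra rule.
    return instructions + "".join(f"\n- {item}" for item in feedback)


print(update_profile({"name": "Sam", "seat": "aisle"}, {"seat": "window"}))
print(few_shot_prompt([("2+2", "4"), ("3+3", "6"), ("5+5", "10")], "4+4"))
print(reflect("Be helpful.", ["Keep answers short."]))
```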

Below is pseudocode showing how an Agent might use Reflection to update its procedural memory, stored in a LangGraph BaseStore:

python

from langgraph.store.base import BaseStore

# Assumed to be defined elsewhere: State (the graph's state schema),
# llm (a chat model), reflection_prompt_template and prompt_template.
INSTRUCTIONS_NAMESPACE = ("agent_instructions",)
INSTRUCTIONS_KEY = "agent_a"

# Node that updates the agent's instructions
def update_instructions(state: State, store: BaseStore):
    # Get the current instructions from the store
    # (read and write the same namespace and key)
    current_instructions = store.get(INSTRUCTIONS_NAMESPACE, INSTRUCTIONS_KEY)

    # Create a prompt to ask the LLM to reflect on the conversation
    # and generate new, improved instructions
    prompt = reflection_prompt_template.format(
        instructions=current_instructions.value["instructions"],
        conversation=state["messages"]
    )
    # Get the new instructions from the LLM (assumes plain-text output)
    output = llm.invoke(prompt)
    new_instructions = output.content
    # Save the updated instructions back to the store
    store.put(INSTRUCTIONS_NAMESPACE, INSTRUCTIONS_KEY, {"instructions": new_instructions})

# Node that uses the instructions to generate a response
def call_model(state: State, store: BaseStore):
    # Retrieve the latest instructions; store.get returns a single Item, not a list
    instructions = store.get(INSTRUCTIONS_NAMESPACE, INSTRUCTIONS_KEY)
    # Use the retrieved instructions to format the prompt
    prompt = prompt_template.format(instructions=instructions.value["instructions"])
    # ... application logic continues

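LangGraph stores long-term memories as JSON documents in a store. Each memory is organized under a custom namespace (like a folder) and a distinct key (like a file name). This hierarchical structure makes information easy to organize and retrieve. The following code shows how to use InMemoryStore to put, get, and search memories.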

python

from langgraph.store.memory import InMemoryStore

# A placeholder for a real embedding function
def embed(texts: list[str]) -> list[list[float]]:
    # In a real application, use a proper embedding model
    return [[1.0, 2.0] for _ in texts]

# Initialize an in-memory store. For production, use a database-backed store.
store = InMemoryStore(index={"embed": embed, "dims": 2})

# Define a namespace for a specific user and application context
user_id = "my-user"
application_context = "chitchat"
namespace = (user_id, application_context)

# 1. Put a memory into the store
store.put(
    namespace,
    "a-memory",  # The key for this memory
    {
        "rules": [
            "User likes short, direct language",
            "User only speaks English & python",
        ],
        "my-key": "my-value",
    },
)

# 2. Get the memory by its namespace and key
item = store.get(namespace, "a-memory")
print("Retrieved Item:", item)

# 3. Search for memories within the namespace, filtering by content
# and sorting by vector similarity to the query.
items = store.search(
    namespace,
    filter={"my-key": "my-value"},
    query="language preferences"
)
print("Search Results:", items)

Vertex Memory Bank

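Memory Bank is a managed service in Vertex AI Agent Engine that gives Agents persistent long-term memory. The service uses Gemini models to analyze conversation histories asynchronously and extract key facts and user preferences.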

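This information is stored persistently, organized by defined scopes such as user ID, and updated intelligently to consolidate new data and resolve contradictions. When a new session starts, the Agent retrieves relevant memories either by fetching all stored data or by similarity search over embeddings. This lets the Agent maintain continuity across sessions and personalize its responses based on recalled information.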
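The scoping and consolidation behavior described above can be approximated in a few lines, with one loud caveat. Memory Bank uses Gemini models to extract and reconcile memories, whereas this hypothetical ToyMemoryBank substitutes a trivial latest-wins rule per topic and uses a substring match in place of embedding similarity search. Only the scoping by (app name, user ID) and the two retrieval modes carry over.

```python
from typing import Optional


class ToyMemoryBank:
    """Hypothetical sketch: memories scoped by (app_name, user_id), keyed by topic."""

    def __init__(self):
        self._scopes: dict[tuple[str, str], dict[str, str]] = {}

    def consolidate(self, app_name: str, user_id: str, extracted: dict[str, str]) -> None:
        facts = self._scopes.setdefault((app_name, user_id), {})
        # Latest-wins stand-in for LLM-based consolidation: a new fact on an
        # existing topic replaces the older, contradicting one.
        facts.update(extracted)

    def retrieve(self, app_name: str, user_id: str, hint: Optional[str] = None) -> list[str]:
        facts = self._scopes.get((app_name, user_id), {})
        if hint is None:  # "fetch everything" retrieval mode
            return [f"{topic}: {value}" for topic, value in facts.items()]
        # Crude stand-in for embedding similarity search: substring match on topic.
        return [f"{topic}: {value}" for topic, value in facts.items() if hint in topic]


bank = ToyMemoryBank()
bank.consolidate("travel", "sam", {"home_city": "Berlin", "seat": "aisle"})  # session 1
bank.consolidate("travel", "sam", {"seat": "window"})                        # session 2 contradicts
print(bank.retrieve("travel", "sam"))          # ['home_city: Berlin', 'seat: window']
print(bank.retrieve("travel", "sam", "seat"))  # ['seat: window']
print(bank.retrieve("travel", "alex"))         # []
```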

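The Agent's Runner interacts with a VertexAiMemoryBankService, which is initialized first. This service automatically stores the memories generated during the Agent's conversations. Each memory is tagged with a unique USER_ID and APP_NAME, so that it can be retrieved accurately later.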

python

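from google.adk.memory import VertexAiMemoryBankService

# Assumes `agent_engine` is an Agent Engine instance created earlier, and that
# `session_service`, `app_name`, and an existing `session` come from your app.
agent_engine_id = agent_engine.api_resource.name.split("/")[-1]
memory_service = VertexAiMemoryBankService(
    project="PROJECT_ID",
    location="LOCATION",
    agent_engine_id=agent_engine_id
)

# Run inside an async function (or a notebook cell that supports top-level await).
session = await session_service.get_session(
    app_name=app_name,
    user_id="USER_ID",
    session_id=session.id
)
# Send the session's conversation to Memory Bank so memories can be extracted.
await memory_service.add_session_to_memory(session)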

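Memory Bank integrates seamlessly with Google ADK and works out of the box. Users of other Agent frameworks, such as LangGraph and CrewAI, can also use Memory Bank through direct API calls. Online code examples demonstrating these integrations are available for interested readers.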

At a Glance

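What: Agent systems need to remember information from past interactions to perform complex tasks and provide coherent experiences. Without a Memory mechanism, Agents are stateless: they cannot maintain conversational context, learn from experience, or personalize responses for users. This fundamentally limits them to simple, one-shot interactions and leaves them unable to handle multi-step processes or evolving user needs. The core problem is how to manage effectively both the immediate, temporary information of a single conversation and the vast, persistent knowledge accumulated over time.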

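Why: The standardized solution is a two-component Memory system that distinguishes short-term from long-term storage. Short-term contextual memory holds recent interaction data in the LLM's context window to maintain conversational flow. For information that must persist, long-term memory solutions use external databases, often vector databases, for efficient semantic retrieval. Agent frameworks such as Google ADK provide specific components to manage this, such as Session for the conversation thread and State for its temporary data. A dedicated MemoryService interacts with the long-term knowledge base, letting the Agent retrieve relevant past information and incorporate it into the current context.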

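Rule of thumb: Use this pattern when an Agent needs to do more than answer a single question. It is essential for Agents that must maintain context throughout a conversation, track progress on multi-step tasks, or personalize interactions by recalling user preferences and history. Implement Memory Management whenever the Agent is expected to learn or adapt based on past successes, failures, or newly acquired information.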

Visual Summary

Fig. 1: Memory Management design pattern

Key Takeaways

A quick recap of the main points about Memory Management:

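Memory is essential for Agents to keep track of things, learn, and personalize interactions.
Conversational AI relies on both short-term memory, for immediate context within a single chat, and long-term memory, for persistent knowledge across multiple sessions.
Short-term memory (what is immediately at hand) is temporary and is usually limited by the LLM's context window or by how the framework passes context.
Long-term memory (what persists) uses external storage such as vector databases to keep information across different chats, and it is accessed by searching.
Frameworks like ADK provide specific parts to manage Memory: Session (the chat thread), State (temporary chat data), and MemoryService (searchable long-term knowledge).
ADK's SessionService handles the entire lifecycle of a chat session, including its history (events) and temporary data (state).
ADK's session.state is a dictionary for temporary chat data. Prefixes (user:, app:, temp:) indicate the scope the data belongs to and whether it is persisted.
In ADK, update state through EventActions.state_delta or output_key when adding events, rather than by changing the state dictionary directly.
ADK's MemoryService puts information into long-term storage and lets Agents search it, often through tools.
LangChain offers practical tools such as ConversationBufferMemory, which automatically injects the history of a single conversation into the prompt so the Agent can recall immediate context.
LangGraph enables advanced long-term Memory by using a store to save and retrieve semantic facts, episodic experiences, and even updatable procedural rules across different user sessions.
Memory Bank is a managed service that gives Agents persistent long-term memory by automatically extracting, storing, and recalling user-specific information. This enables personalized, continuous conversations in frameworks such as Google ADK, LangGraph, and CrewAI.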

Conclusion

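This chapter explored the essential work of Memory Management in agent systems, drawing the distinction between short-lived context and persistent knowledge. We discussed how these types of Memory are set up and where they are used in building smarter Agents that can remember things. We looked in detail at how Google ADK provides specific components such as Session, State, and MemoryService to handle this. Having covered how Agents remember things, both short-term and long-term, we can move on to how they learn and adapt. The next pattern, "Learning and Adaptation", is about an Agent changing how it thinks, how it acts, or what it knows, based on new experiences or data.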

References

ADK Memory, https://google.github.io/adk-docs/sessions/memory/
LangGraph Memory, https://langchain-ai.github.io/langgraph/concepts/memory/
Vertex AI Agent Engine Memory Bank, https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-memory-bank-in-public-preview