Memory

Ash memory keeps conversation context usable across long chats and stores reusable facts when extraction is enabled.

Memory in 30 Seconds

Memory in Ash has three layers:

  • recent conversation context (always)
  • structured memory entries in the graph store
  • semantic retrieval via embeddings (optional)

Use defaults first. Tune only if context quality or extraction behavior needs adjustment.

Quick Start

Start with this baseline:

[memory]
context_token_budget = 100000
recency_window = 10
compaction_enabled = true
extraction_enabled = true
extraction_confidence_threshold = 0.7
extraction_context_messages = 8
extraction_verification_enabled = true

Verify memory tools are working:

uv run ash memory stats
uv run ash memory list
uv run ash memory search "project ideas"

Configure Memory

Common options:

[memory]
max_context_messages = 20 # Hard cap for recent message count
context_token_budget = 100000 # Target token budget for assembled context
recency_window = 10 # Always keep this many most-recent messages
system_prompt_buffer = 8000 # Reserved budget for prompt/tools metadata
compaction_enabled = true # Summarize older history when over budget
auto_gc = true # Run cleanup on startup
max_entries = null # Optional cap for total stored memories
# Background extraction
extraction_enabled = true
extraction_model = null # Optional model alias override
extraction_min_message_length = 20
extraction_debounce_seconds = 30
extraction_context_messages = 8
extraction_confidence_threshold = 0.7
# Verification pass for extracted facts
extraction_verification_enabled = true
extraction_verification_model = null # alias or provider model; null -> default
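
These knobs interact during context assembly. A minimal sketch of one plausible selection rule (function names and the token estimate are illustrative, not Ash's actual implementation): always keep the recency_window newest messages, add older ones while the total stays under context_token_budget minus system_prompt_buffer, and never exceed max_context_messages.

```python
# Illustrative context selection; not Ash's actual implementation.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)

def select_context(messages, *, recency_window=10, max_context_messages=20,
                   context_token_budget=100_000, system_prompt_buffer=8_000):
    """Keep the newest messages that fit the budget; newest message last."""
    budget = context_token_budget - system_prompt_buffer
    selected, used = [], 0
    for i, msg in enumerate(reversed(messages)):  # walk newest to oldest
        if len(selected) >= max_context_messages:
            break  # hard cap on recent message count
        cost = estimate_tokens(msg)
        if i >= recency_window and used + cost > budget:
            break  # an older message would exceed the budget
        selected.append(msg)
        used += cost
    return list(reversed(selected))

msgs = [f"message {i} " * 50 for i in range(40)]
kept = select_context(msgs)
```

With these defaults the message cap usually dominates long before the token budget does; it is when long transcripts push past the budget that compaction (summarizing older history) would come into play.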

Extraction Model Patterns

Use a cheaper extractor with stronger verification:

[models.default]
provider = "openai"
model = "gpt-5.2"

[models.fast]
provider = "openai"
model = "gpt-5.2-mini"

[memory]
extraction_model = "fast"
extraction_verification_enabled = true
extraction_verification_model = "default"

Use a provider-qualified verification model directly:

[memory]
extraction_verification_enabled = true
extraction_verification_model = "openai:gpt-5.2"

Use an unqualified provider model name (resolved against the default provider):

[memory]
extraction_verification_enabled = true
extraction_verification_model = "gpt-5.2"
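
Taken together, the three spellings imply a resolution order: configured alias first, then provider-qualified name, then bare model name resolved against the default provider. A hypothetical resolver (not Ash's actual code) makes that order concrete:

```python
# Hypothetical resolver for the three model-name spellings; not Ash's real code.

def resolve_model(name, aliases, default_provider):
    """Resolve a model setting to a (provider, model) pair."""
    if name in aliases:                       # alias, e.g. "fast"
        entry = aliases[name]
        return entry["provider"], entry["model"]
    if ":" in name:                           # provider-qualified, e.g. "openai:gpt-5.2"
        provider, model = name.split(":", 1)
        return provider, model
    return default_provider, name             # bare name -> default provider

aliases = {
    "default": {"provider": "openai", "model": "gpt-5.2"},
    "fast": {"provider": "openai", "model": "gpt-5.2-mini"},
}
print(resolve_model("fast", aliases, "openai"))
print(resolve_model("openai:gpt-5.2", aliases, "openai"))
print(resolve_model("gpt-5.2", aliases, "openai"))
```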

Enable Semantic Retrieval (Embeddings)

[embeddings]
provider = "openai"
model = "text-embedding-3-small"
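
Once embeddings are configured, memory search can rank stored entries by vector similarity instead of exact keywords. A toy sketch of that ranking step, with hand-written vectors standing in for real provider embeddings:

```python
import math

# Toy semantic retrieval by cosine similarity; real vectors come from the
# configured embedding provider, not hand-written lists.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, memories, k=2):
    """memories: list of (text, vector); return the k most similar texts."""
    scored = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in scored[:k]]

memories = [
    ("user prefers dark mode", [0.9, 0.1, 0.0]),
    ("project idea: CLI notes app", [0.1, 0.9, 0.2]),
    ("user's timezone is UTC+2", [0.0, 0.2, 0.9]),
]
print(top_k([0.2, 0.8, 0.1], memories, k=1))  # → ['project idea: CLI notes app']
```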

Troubleshooting

Memory search returns nothing

uv run ash memory list
uv run ash memory search "your query"
uv run ash memory stats

If the store is empty, add a test fact (uv run ash memory add -q "test fact") and search again.

Extraction seems too noisy

Raise the confidence threshold or reduce extraction frequency:

[memory]
extraction_confidence_threshold = 0.85
extraction_debounce_seconds = 60
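
Both settings act as simple gates. A sketch of the effect (hypothetical helper names, not Ash's internals): a higher threshold drops low-confidence candidate facts, and a longer debounce waits for a quiet period before extraction runs at all.

```python
# Illustrative gating of extracted facts; values mirror the config above.

def keep_fact(fact: str, confidence: float, *, threshold=0.85) -> bool:
    """Discard candidate facts below the confidence threshold."""
    return confidence >= threshold

def should_extract(last_message_at: float, now: float, *, debounce_seconds=60) -> bool:
    """Run extraction only after the conversation has been quiet long enough."""
    return (now - last_message_at) >= debounce_seconds

print(keep_fact("user likes tea", 0.9))      # True: above threshold
print(keep_fact("maybe a cat person", 0.6))  # False: dropped as noise
print(should_extract(100.0, 170.0))          # True: 70s quiet >= 60s debounce
```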

Important context gets dropped

Increase the token budget and recency window:

[memory]
context_token_budget = 140000
recency_window = 15

Memory quality regresses after model changes

Pin the extraction and verification models explicitly (using aliases defined under [models]):

[memory]
extraction_model = "fast"
extraction_verification_model = "default"

Reference (Advanced)

High-level runtime flow:

  1. Incoming message is written to context.
  2. History is selected with recency and token-budget rules.
  3. Semantic retrieval (if enabled) injects relevant stored memories.
  4. Background extraction can write new structured memories.
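
The four steps can be strung together as a toy pipeline (every name below is a placeholder; keyword overlap stands in for semantic retrieval, and a message-length check stands in for the extraction gate):

```python
# Placeholder pipeline mirroring the four runtime steps; not Ash's real API.

def handle_message(msg, history, store):
    history.append(msg)                    # 1. write incoming message to context
    context = history[-10:]                # 2. recency/budget selection (simplified)
    retrieved = [m for m in store          # 3. retrieval (keyword stand-in for embeddings)
                 if any(word in m for word in msg.split())]
    if len(msg) >= 20:                     # 4. background extraction (simplified gate)
        store.append(f"fact from: {msg}")
    return context, retrieved

history, store = [], ["user prefers dark mode"]
ctx, hits = handle_message("remind me about dark mode settings", history, store)
print(hits)  # → ['user prefers dark mode']
```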

CLI surface:

uv run ash memory list
uv run ash memory search "..."
uv run ash memory show <id>
uv run ash memory add -q "..."
uv run ash memory remove <id>
uv run ash memory gc
uv run ash memory stats