Agent Memory Ultimate
Cut your agent's token usage by 60-80%. Stop reading entire files into context โ store memories once, recall only what's relevant. Vector search, knowledge graphs, automatic decay, and RAPTOR hierarchies give your agent human-like memory that actually scales.
Zero cloud APIs. Zero costs. Everything runs locally.
Why Token Savings Matter
Every time your agent reads MEMORY.md or a daily log, that's thousands of tokens burned on context. Most of it is irrelevant to the current question.
| Approach | Tokens per recall | Precision |
|---|---|---|
| Read entire MEMORY.md | 3,000-10,000 | ~5% relevant |
| Read daily log + MEMORY.md | 5,000-20,000 | ~10% relevant |
mem.py recall "query" |
200-800 | ~80% relevant |
mem.py primed-recall (context-aware) |
300-1,000 | ~90% relevant |
On a typical agent doing 50 recalls/day:
- Old way: ~500K tokens/day on memory reads
- With v3: ~30K tokens/day
- Savings: ~470K tokens/day (~$7-14/day on API, or significant Claude Max headroom)
The agent gets better answers with less context pollution.
What's New in v3
v2 was markdown files + FTS5 keyword search. v3 adds a full cognitive architecture:
| Feature | v2 | v3 |
|---|---|---|
| Storage | Markdown files | SQLite + sqlite-vec embeddings |
| Search | FTS5 keyword only | Hybrid (vector + keyword + BM25) |
| Recall | Read entire files | Precise snippets via 6 strategies |
| Associations | None | Knowledge graph with multi-hop traversal |
| Hierarchy | Flat files | RAPTOR tree (L0โL1โL2โL3 abstraction) |
| Decay | Manual cleanup | Automatic strength decay + half-life |
| Consolidation | Cron text prompt | Clustering + merging + hierarchy rebuild |
| Context priming | None | Spreading activation from conversation |
| Sharing | None | Cross-agent with sensitivity gates |
| Token cost | High (full file reads) | Low (targeted retrieval) |
v2 still works. Markdown files, daily logs, MEMORY.md โ all unchanged. v3 adds a parallel cognitive layer that dramatically reduces token waste.
Architecture
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Agent Session โ
โ (reads MEMORY.md, daily logs) โ โ v2 (unchanged)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ mem.py CLI โ โ v3 (NEW)
โ store / recall / consolidate โ
โโโโโโโโโโโโฌโโโโโโโโโโโฌโโโโโโโโโโโโโโโโค
โ sqlite-vecโ FTS5 โ Association โ
โ (vectors) โ(keyword)โ Graph โ
โโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโโโโโโค
โ memory.db โ
โ memories, associations, hierarchy โ
โ shares, embeddings (384-dim) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
- Embedding model:
all-MiniLM-L6-v2(384 dimensions, ~80MB, ONNX) - Vector search:
sqlite-vecโ no Pinecone, no Weaviate, no cloud - Database:
db/memory.db(one file, portable, backupable)
Quick Start
# 1. Initialize database
python3 scripts/mem.py init
# 2. Start the local embedding server (port 9999)
bash scripts/start-memory.sh
# 3. Store a memory
python3 scripts/mem.py store "Oscar prefers wired home automation" --type semantic --importance 0.8
# 4. Recall it (200 tokens instead of 10,000)
python3 scripts/mem.py recall "home automation preferences"
That's it. Your agent now has precise memory retrieval.
6 Recall Strategies
Not all queries are the same. Use the right strategy:
| Strategy | Best For | Command | When |
|---|---|---|---|
| Hybrid (default) | General recall | mem.py recall "query" |
Most queries |
| Vector | Semantic similarity | recall --strategy vector |
Fuzzy, conceptual |
| Keyword | Exact terms, IDs | recall --strategy keyword |
File names, codes |
| Adaptive | Auto-selects detail | mem.py recall-adaptive "query" |
Exploratory |
| Graph | Follow connections | mem.py recall-assoc "query" --hops 2 |
Related concepts |
| Primed | Context-aware | mem.py primed-recall "q" --context "..." |
Mid-conversation |
Primed recall is the killer feature โ pass the last 2-3 user messages as context and results are biased toward what's conversationally relevant. This is how human memory works.
CLI Reference
Core Commands
| Command | Description |
|---|---|
mem.py init |
Create database schema |
mem.py migrate |
Import existing documents from jarvis.db |
mem.py store <content> [--type TYPE] [--source SRC] [--importance N] |
Store a memory |
mem.py recall <query> [--strategy hybrid|vector|keyword] [--limit N] |
Search memories |
mem.py forget <id|--query QUERY> |
Soft-delete (strength โ 0) |
mem.py hard-delete <id> |
Permanently remove |
mem.py stats |
Database statistics |
Memory types: episodic (events), semantic (facts), procedural (how-to), preference (user preferences)
Knowledge Graph
| Command | Description |
|---|---|
mem.py associate <src_id> <tgt_id> [--type TYPE] [--weight N] |
Link two memories |
mem.py links <memory_id> |
Show all associations |
mem.py recall-assoc <query> [--hops N] [--limit N] |
Multi-hop traversal |
mem.py graph-stats |
Graph statistics |
Edge types: related, caused_by, part_of, contradicts, temporal, supports
RAPTOR Hierarchy
| Command | Description |
|---|---|
mem.py build-hierarchy [--levels N] |
Build abstraction tree |
mem.py recall-adaptive <query> [--detail auto|broad|specific|0-3] |
Recall at right abstraction level |
mem.py hierarchy-stats |
Show hierarchy structure |
Broad queries ("what do I know about AI?") match high-level summaries. Specific queries ("sqlite-vec version") match leaf nodes. Adaptive auto-selects.
Spreading Activation
| Command | Description |
|---|---|
mem.py primed-recall <query> [--context 'text1' 'text2'] [--limit N] |
Context-primed recall |
Cross-Agent Sharing
| Command | Description |
|---|---|
mem.py share <memory_id> --with <agent> [--sensitivity N] |
Share a memory |
mem.py shared [--from AGENT] [--to AGENT] |
List shares |
mem.py revoke <share_id> | --memory <id> |
Revoke access |
Sensitivity levels: 0 (public) โ 5 (top secret). Default gate: โค 3. Both agents must consent.
Maintenance
| Command | Description |
|---|---|
mem.py consolidate [--days N] [--decay-only] |
Full consolidation cycle |
Runs: decay โ cluster โ merge duplicates โ rebuild hierarchy.
Daily Cycle
Wake Up
1. Read SOUL.md, USER.md (identity โ small, always load)
2. Read today's daily log (recent context)
3. Use `mem.py primed-recall "session start"` for relevant memories
โ Gets 200-800 tokens of precisely relevant context
โ Instead of 10,000+ tokens from reading MEMORY.md cover-to-cover
During Day
# Store important facts as they come up
mem.py store "Client meeting moved to March 1" --type episodic --importance 0.7
# Before answering from memory โ recall, don't read files
mem.py recall "client meeting schedule"
# Link related memories when you notice connections
mem.py associate 42 87 --type related
Sleep Cycle (Nightly Consolidation)
{
"schedule": { "kind": "cron", "expr": "0 3 * * *", "tz": "America/Los_Angeles" },
"payload": {
"kind": "agentTurn",
"message": "Run memory consolidation: python3 scripts/mem.py consolidate --days 7"
},
"sessionTarget": "isolated"
}
What consolidation does:
- Decay โ Unaccessed memories lose strength (half-life: 30 days)
- Cluster โ Groups similar memories
- Merge โ Combines near-duplicates (saves storage + tokens)
- Rebuild hierarchy โ Updates RAPTOR tree for better adaptive recall
File Structure
workspace/
โโโ MEMORY.md # Long-term curated memory (v2, still works)
โโโ SOUL.md # Identity & personality
โโโ USER.md # Human profile
โโโ db/
โ โโโ agent.db # Contacts/history (v2)
โ โโโ memory.db # Cognitive memory (v3)
โโโ bank/
โ โโโ entities/ # People profiles
โ โโโ contacts.md # Quick contact reference
โ โโโ opinions.md # Preferences, beliefs
โโโ memory/
โโโ YYYY-MM-DD.md # Daily logs
โโโ projects/ # Project notes
โโโ knowledge/ # Topic docs
Comparison: Why This Over Alternatives
| Feature | Read files (default) | Basic RAG | Agent Memory Ultimate v3 |
|---|---|---|---|
| Token cost per recall | 3K-20K | 500-2K | 200-800 |
| Precision | ~5-10% | ~50% | ~80-90% |
| Associations | โ | โ | โ Knowledge graph |
| Abstraction levels | โ | โ | โ RAPTOR hierarchy |
| Context priming | โ | โ | โ Spreading activation |
| Automatic decay | โ | โ | โ Configurable half-life |
| Cross-agent sharing | โ | โ | โ With sensitivity gates |
| Cloud dependency | None | Usually | None โ fully local |
| Setup complexity | Zero | High | mem.py init + one script |
Data Sources (v2 Importers)
| Source | Command |
|---|---|
| WhatsApp contacts/groups | python3 scripts/sync_whatsapp.py |
| ChatGPT conversations | python3 scripts/init_db.py (auto-detects chatgpt-export/) |
| Phone contacts (VCF) | python3 scripts/import_vcf.py contacts.vcf |
| Existing markdown files | python3 scripts/mem.py migrate |
Dependencies
pip3 install scipy tokenizers onnxruntime numpy sqlite-vec
That's it. No Pinecone. No Weaviate. No Docker. No API keys. ~80MB embedding model downloaded once.
Cognitive Science
This isn't arbitrary architecture โ it's modeled on how human memory actually works:
| Human Process | Agent Equivalent |
|---|---|
| Working memory | primed-recall (conversation-biased) |
| Long-term declarative | store + vector embeddings |
| Episodic memory | Daily logs + episodic memories |
| Semantic memory | MEMORY.md + semantic memories |
| Sleep consolidation | consolidate (decay + cluster + merge) |
| Forgetting curve | Strength decay with half-life |
| Association | Knowledge graph with typed edges |
| Abstraction | RAPTOR hierarchy (concrete โ abstract) |
| Spreading activation | Context primes related memories |
Tips
- Importance scores matter โ 0.9+ for core identity, 0.5 for routine facts. Higher = survives decay longer.
- Associate liberally โ More edges = better graph traversal. Link when you notice connections.
- Let decay work โ Forgetting is a feature. Unaccessed memories fading keeps recall precise.
- Primed recall for conversations โ Pass last 2-3 messages as context. Dramatically better results.
- Rebuild hierarchy weekly โ
build-hierarchyafter storing many new memories. - Start with hybrid โ Only switch to pure vector/keyword when hybrid misses.
Credits
Created by Oscar Serra with the help of Claude (Anthropic).
Because waking fresh each session shouldn't mean burning 20K tokens to remember who you are.