Agent Memory Ultimate

Cut your agent's token usage by 60-80%. Stop reading entire files into context — store memories once, recall only what's relevant. Vector search, knowledge graphs, automatic decay, and RAPTOR hierarchies give your agent human-like memory that actually scales.

Zero cloud APIs. Zero costs. Everything runs locally.

Why Token Savings Matter

Every time your agent reads MEMORY.md or a daily log, that's thousands of tokens burned on context. Most of it is irrelevant to the current question.

Approach	Tokens per recall	Precision
Read entire MEMORY.md	3,000-10,000	~5% relevant
Read daily log + MEMORY.md	5,000-20,000	~10% relevant
`mem.py recall "query"`	200-800	~80% relevant
`mem.py primed-recall` (context-aware)	300-1,000	~90% relevant

On a typical agent doing 50 recalls/day:

Old way: ~500K tokens/day on memory reads
With v3: ~30K tokens/day
Savings: ~470K tokens/day (~$7-14/day on API, or significant Claude Max headroom)

The agent gets better answers with less context pollution.
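The daily figures above can be reproduced with a few lines of arithmetic. This is a rough sketch: the per-recall token counts come from the table, while the price points (about $15 and $30 per million tokens) are assumptions chosen to match the quoted dollar range, not published rates.

```python
# Back-of-the-envelope estimate of daily memory-read cost.
# Token counts per recall come from the table above; the prices
# below are ASSUMED illustrative values, not quoted API rates.

RECALLS_PER_DAY = 50


def daily_tokens(tokens_per_recall: int, recalls: int = RECALLS_PER_DAY) -> int:
    """Tokens spent per day on memory reads."""
    return tokens_per_recall * recalls


def daily_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of `tokens` at a given price per million tokens."""
    return tokens / 1_000_000 * usd_per_million


old = daily_tokens(10_000)  # full MEMORY.md read, high end of the range
new = daily_tokens(600)     # mid-range `mem.py recall`
saved = old - new

print(f"old={old:,} new={new:,} saved={saved:,}")
for price in (15.0, 30.0):  # hypothetical price points
    print(f"${price:.0f}/M tokens: ${daily_cost_usd(saved, price):.2f}/day saved")
```

Plug in your own recall volume and model pricing to see where your agent lands.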

What's New in v3

v2 was markdown files + FTS5 keyword search. v3 adds a full cognitive architecture:

Feature	v2	v3
Storage	Markdown files	SQLite + sqlite-vec embeddings
Search	FTS5 keyword only	Hybrid (vector + keyword + BM25)
Recall	Read entire files	Precise snippets via 6 strategies
Associations	None	Knowledge graph with multi-hop traversal
Hierarchy	Flat files	RAPTOR tree (L0→L1→L2→L3 abstraction)
Decay	Manual cleanup	Automatic strength decay + half-life
Consolidation	Cron text prompt	Clustering + merging + hierarchy rebuild
Context priming	None	Spreading activation from conversation
Sharing	None	Cross-agent with sensitivity gates
Token cost	High (full file reads)	Low (targeted retrieval)

v2 still works. Markdown files, daily logs, MEMORY.md — all unchanged. v3 adds a parallel cognitive layer that dramatically reduces token waste.

Architecture

┌─────────────────────────────────────┐
│         Agent Session               │
│  (reads MEMORY.md, daily logs)      │  ← v2 (unchanged)
├─────────────────────────────────────┤
│         mem.py CLI                  │  ← v3 (NEW)
│  store / recall / consolidate       │
├───────────┬─────────┬───────────────┤
│ sqlite-vec│  FTS5   │ Association   │
│ (vectors) │(keyword)│   Graph       │
├───────────┴─────────┴───────────────┤
│            memory.db                │
│  memories, associations, hierarchy  │
│  shares, embeddings (384-dim)       │
└─────────────────────────────────────┘

Embedding model: all-MiniLM-L6-v2 (384 dimensions, ~80MB, ONNX)
Vector search: sqlite-vec — no Pinecone, no Weaviate, no cloud
Database: db/memory.db (one file, portable, backupable)
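The diagram shows the vector index and the keyword index sitting side by side, but this README does not say how hybrid search merges their results. One common approach is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; the function name, the constant `k=60`, and the sample IDs are hypothetical and may not match what `mem.py` does internally.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several best-first lists of memory IDs into one ranking.

    An ID earns 1 / (k + rank) for each list it appears in, so memories
    ranked well by both searches rise to the top.
    """
    scores: dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda mid: (-scores[mid], mid))


vector_hits = [12, 7, 42, 3]   # hypothetical IDs from the vector index
keyword_hits = [42, 12, 99]    # hypothetical IDs from the FTS5 index

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # [12, 42, 7, 99, 3]
```

Memories 12 and 42 appear in both lists, so they outrank anything found by only one search.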

Quick Start

# 1. Initialize database
python3 scripts/mem.py init

# 2. Start the local embedding server (port 9999)
bash scripts/start-memory.sh

# 3. Store a memory
python3 scripts/mem.py store "Oscar prefers wired home automation" --type semantic --importance 0.8

# 4. Recall it (200 tokens instead of 10,000)
python3 scripts/mem.py recall "home automation preferences"

That's it. Your agent now has precise memory retrieval.

6 Recall Strategies

Not all queries are the same. Use the right strategy:

Strategy	Best For	Command	When
Hybrid (default)	General recall	`mem.py recall "query"`	Most queries
Vector	Semantic similarity	`recall --strategy vector`	Fuzzy, conceptual
Keyword	Exact terms, IDs	`recall --strategy keyword`	File names, codes
Adaptive	Auto-selects detail	`mem.py recall-adaptive "query"`	Exploratory
Graph	Follow connections	`mem.py recall-assoc "query" --hops 2`	Related concepts
Primed	Context-aware	`mem.py primed-recall "q" --context "..."`	Mid-conversation

Primed recall is the killer feature: pass the last 2-3 user messages as context, and results are biased toward what's conversationally relevant. This mirrors priming in human memory, where recent context makes related memories easier to retrieve.
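To make the idea of context bias concrete, here is a minimal sketch of spreading activation over an association graph. It assumes that memories matched by the conversation context start with some activation, that activation spreads along weighted edges and halves at each hop, and that the result is added as a bonus to each candidate's base relevance. The function names, the decay factor, and the boost weight are all illustrative assumptions, not the actual `mem.py` internals.

```python
def spread_activation(
    edges: dict[int, list[tuple[int, float]]],
    seeds: dict[int, float],
    decay: float = 0.5,
    hops: int = 2,
) -> dict[int, float]:
    """Propagate activation from seed memories along weighted edges.

    edges maps a memory ID to (neighbor ID, edge weight) pairs.
    seeds maps memories matched by the conversation context to their
    starting activation. Each hop passes on energy * weight * decay.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        next_frontier: dict[int, float] = {}
        for node, energy in frontier.items():
            for neighbor, weight in edges.get(node, []):
                passed = energy * weight * decay
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


def primed_scores(
    base: dict[int, float], activation: dict[int, float], boost: float = 0.3
) -> dict[int, float]:
    """Add a context bonus to each candidate's base relevance score."""
    return {mid: score + boost * activation.get(mid, 0.0) for mid, score in base.items()}


# Hypothetical data: memory 1 matched the conversation context.
edges = {1: [(2, 1.0)], 2: [(3, 0.5)]}
activation = spread_activation(edges, seeds={1: 1.0})

base = {2: 0.60, 4: 0.65}  # query relevance without any context
ranked = sorted(primed_scores(base, activation).items(), key=lambda kv: -kv[1])
print(ranked)
```

Without context, memory 4 would rank first. Because memory 2 is linked to something the conversation just touched, it overtakes memory 4.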

CLI Reference

Core Commands

Command	Description
`mem.py init`	Create database schema
`mem.py migrate`	Import existing documents from jarvis.db
`mem.py store <content> [--type TYPE] [--source SRC] [--importance N]`	Store a memory
`mem.py recall <query> [--strategy hybrid|vector|keyword] [--limit N]`	Search memories
`mem.py forget <id|--query QUERY>`	Soft-delete (strength → 0)
`mem.py hard-delete <id>`	Permanently remove
`mem.py stats`	Database statistics

Memory types: episodic (events), semantic (facts), procedural (how-to), preference (user preferences)

Knowledge Graph

Command	Description
`mem.py associate <src_id> <tgt_id> [--type TYPE] [--weight N]`	Link two memories
`mem.py links <memory_id>`	Show all associations
`mem.py recall-assoc <query> [--hops N] [--limit N]`	Multi-hop traversal
`mem.py graph-stats`	Graph statistics

Edge types: related, caused_by, part_of, contradicts, temporal, supports
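Multi-hop traversal can be pictured as a breadth-first search that stops at a hop limit. The sketch below works on an in-memory dictionary rather than the real database, treats links as directed, and uses made-up memory IDs; it illustrates what `--hops N` bounds, under the assumption that traversal is a plain BFS.

```python
from collections import deque


def neighbors_within(
    edges: dict[int, list[tuple[int, str]]], start: int, hops: int
) -> dict[int, int]:
    """Return {memory_id: hop distance} for memories reachable in <= hops.

    edges maps a memory ID to (target ID, edge type) pairs.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not expand past the hop limit
        for target, _edge_type in edges.get(node, []):
            if target not in distance:
                distance[target] = distance[node] + 1
                queue.append(target)
    del distance[start]  # the starting memory is not its own neighbor
    return distance


# Hypothetical graph using edge types from the list above.
edges = {
    42: [(87, "related"), (90, "caused_by")],
    87: [(101, "part_of")],
    101: [(200, "supports")],
}
reachable = neighbors_within(edges, start=42, hops=2)
print(reachable)  # {87: 1, 90: 1, 101: 2}
```

Memory 200 sits three hops away, so it only appears once `hops` is raised to 3.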

RAPTOR Hierarchy

Command	Description
`mem.py build-hierarchy [--levels N]`	Build abstraction tree
`mem.py recall-adaptive <query> [--detail auto|broad|specific|0-3]`	Recall at right abstraction level
`mem.py hierarchy-stats`	Show hierarchy structure

Broad queries ("what do I know about AI?") match high-level summaries. Specific queries ("sqlite-vec version") match leaf nodes. Adaptive auto-selects.

Spreading Activation

Command	Description
`mem.py primed-recall <query> [--context 'text1' 'text2'] [--limit N]`	Context-primed recall
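If your agent drives the CLI from Python, a small helper can assemble the `primed-recall` call from the conversation history. This sketch only builds the argument list, using the flags documented above and the `scripts/mem.py` path from Quick Start; the helper name and the sample messages are hypothetical.

```python
import shlex
from typing import Optional


def primed_recall_argv(
    query: str, messages: list[str], keep: int = 3, limit: Optional[int] = None
) -> list[str]:
    """Build the argument list for `mem.py primed-recall`.

    Only the last `keep` conversation messages are passed as context.
    """
    argv = ["python3", "scripts/mem.py", "primed-recall", query]
    recent = messages[-keep:] if keep > 0 else []
    if recent:
        argv += ["--context", *recent]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


history = [
    "Can we move the client call?",
    "Sure, which day works?",
    "They asked for March 1",
    "What time is the meeting now?",
]
argv = primed_recall_argv("meeting time", history, limit=5)
print(shlex.join(argv))
# To actually run it:
#   subprocess.run(argv, capture_output=True, text=True, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when messages contain quotes or spaces.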

Cross-Agent Sharing

Command	Description
`mem.py share <memory_id> --with <agent> [--sensitivity N]`	Share a memory
`mem.py shared [--from AGENT] [--to AGENT]`	List shares
`mem.py revoke <share_id> | --memory <id>`	Revoke access

Sensitivity levels: 0 (public) → 5 (top secret). Default gate: ≤ 3. Both agents must consent.
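The gate described above boils down to a simple check. The sketch below is one reading of those rules: a share goes through only when the memory's sensitivity is at or below the gate and both agents have consented. The function and parameter names are hypothetical.

```python
DEFAULT_GATE = 3  # from the text above: sensitivity <= 3 passes by default


def can_share(
    sensitivity: int,
    sender_consents: bool,
    receiver_consents: bool,
    gate: int = DEFAULT_GATE,
) -> bool:
    """Return True if a memory may be shared under the stated rules."""
    if not 0 <= sensitivity <= 5:
        raise ValueError("sensitivity must be between 0 (public) and 5 (top secret)")
    return sensitivity <= gate and sender_consents and receiver_consents


print(can_share(2, True, True))   # True: below the gate, both consent
print(can_share(4, True, True))   # False: above the default gate
print(can_share(1, True, False))  # False: receiver has not consented
```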

Maintenance

Command	Description
`mem.py consolidate [--days N] [--decay-only]`	Full consolidation cycle

Runs: decay → cluster → merge duplicates → rebuild hierarchy.

Daily Cycle

Wake Up

1. Read SOUL.md, USER.md (identity — small, always load)
2. Read today's daily log (recent context)
3. Use `mem.py primed-recall "session start"` for relevant memories
   → Gets 200-800 tokens of precisely relevant context
   → Instead of 10,000+ tokens from reading MEMORY.md cover-to-cover

During Day

# Store important facts as they come up
python3 scripts/mem.py store "Client meeting moved to March 1" --type episodic --importance 0.7

# Before answering from memory: recall, don't read files
python3 scripts/mem.py recall "client meeting schedule"

# Link related memories when you notice connections
python3 scripts/mem.py associate 42 87 --type related

Sleep Cycle (Nightly Consolidation)

{
  "schedule": { "kind": "cron", "expr": "0 3 * * *", "tz": "America/Los_Angeles" },
  "payload": {
    "kind": "agentTurn",
    "message": "Run memory consolidation: python3 scripts/mem.py consolidate --days 7"
  },
  "sessionTarget": "isolated"
}

What consolidation does:

Decay — Unaccessed memories lose strength (half-life: 30 days)
Cluster — Groups similar memories
Merge — Combines near-duplicates (saves storage + tokens)
Rebuild hierarchy — Updates RAPTOR tree for better adaptive recall
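A 30-day half-life means an untouched memory keeps half its strength after 30 days, a quarter after 60, and so on. The sketch below assumes plain exponential decay, which is the usual meaning of "half-life". How `mem.py` folds importance or access counts into the calculation is not covered here, and the function name is illustrative.

```python
def decayed_strength(
    strength: float, days_since_access: float, half_life_days: float = 30.0
) -> float:
    """Exponential decay: strength halves every `half_life_days`."""
    return strength * 0.5 ** (days_since_access / half_life_days)


for days in (0, 30, 60, 90):
    print(days, decayed_strength(1.0, days))
# 0 1.0
# 30 0.5
# 60 0.25
# 90 0.125
```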

File Structure

workspace/
├── MEMORY.md              # Long-term curated memory (v2, still works)
├── SOUL.md                # Identity & personality
├── USER.md                # Human profile
├── db/
│   ├── agent.db           # Contacts/history (v2)
│   └── memory.db          # Cognitive memory (v3)
├── bank/
│   ├── entities/          # People profiles
│   ├── contacts.md        # Quick contact reference
│   └── opinions.md        # Preferences, beliefs
└── memory/
    ├── YYYY-MM-DD.md      # Daily logs
    ├── projects/          # Project notes
    └── knowledge/         # Topic docs

Comparison: Why This Over Alternatives

Feature	Read files (default)	Basic RAG	Agent Memory Ultimate v3
Token cost per recall	3K-20K	500-2K	200-800
Precision	~5-10%	~50%	~80-90%
Associations	❌	❌	✅ Knowledge graph
Abstraction levels	❌	❌	✅ RAPTOR hierarchy
Context priming	❌	❌	✅ Spreading activation
Automatic decay	❌	❌	✅ Configurable half-life
Cross-agent sharing	❌	❌	✅ With sensitivity gates
Cloud dependency	None	Usually	None — fully local
Setup complexity	Zero	High	`mem.py init` + one script

Data Sources (v2 Importers)

Source	Command
WhatsApp contacts/groups	`python3 scripts/sync_whatsapp.py`
ChatGPT conversations	`python3 scripts/init_db.py` (auto-detects `chatgpt-export/`)
Phone contacts (VCF)	`python3 scripts/import_vcf.py contacts.vcf`
Existing markdown files	`python3 scripts/mem.py migrate`

Dependencies

pip3 install scipy tokenizers onnxruntime numpy sqlite-vec

That's it. No Pinecone. No Weaviate. No Docker. No API keys. ~80MB embedding model downloaded once.

Cognitive Science

This isn't arbitrary architecture — it's modeled on how human memory actually works:

Human Process	Agent Equivalent
Working memory	`primed-recall` (conversation-biased)
Long-term declarative	`store` + vector embeddings
Episodic memory	Daily logs + episodic memories
Semantic memory	MEMORY.md + semantic memories
Sleep consolidation	`consolidate` (decay + cluster + merge)
Forgetting curve	Strength decay with half-life
Association	Knowledge graph with typed edges
Abstraction	RAPTOR hierarchy (concrete → abstract)
Spreading activation	Context primes related memories

Tips

Importance scores matter: use 0.9+ for core identity and 0.5 for routine facts. Higher-importance memories survive decay longer.
Associate liberally: more edges mean better graph traversal. Link memories when you notice connections.
Let decay work: forgetting is a feature. Letting unaccessed memories fade keeps recall precise.
Primed recall for conversations: pass the last 2-3 messages as context for noticeably better results.
Rebuild hierarchy weekly: run `mem.py build-hierarchy` after storing many new memories.
Start with hybrid: only switch to pure vector or keyword search when hybrid misses.

Credits

Created by Oscar Serra with the help of Claude (Anthropic).

Because waking fresh each session shouldn't mean burning 20K tokens to remember who you are.
