Swarm: Cut Your LLM Costs by 200x
Turn your expensive model into an affordable daily driver. Offload the boring stuff to Gemini Flash workers (parallel, batch, research) at a fraction of the cost.
At a Glance
| 30 tasks via | Time | Cost |
|---|---|---|
| Opus (sequential) | ~30s | ~$0.50 |
| Swarm (parallel) | ~1s | ~$0.003 |
When to Use
Swarm is ideal for:
- 3+ independent tasks (research, summaries, comparisons)
- Comparing or researching multiple subjects
- Multiple URLs to fetch/analyze
- Batch processing (documents, entities, facts)
- Complex analysis needing multiple perspectives → use chain
Quick Reference
# Check daemon (do this every session)
swarm status
# Start if not running
swarm start
# Parallel prompts
swarm parallel "What is X?" "What is Y?" "What is Z?"
# Research multiple subjects
swarm research "OpenAI" "Anthropic" "Mistral" --topic "AI safety"
# Discover capabilities
swarm capabilities
Execution Modes
Parallel (v1.0)
N prompts → N workers simultaneously. Best for independent tasks.
swarm parallel "prompt1" "prompt2" "prompt3"
Research (v1.1)
Multi-phase: search → fetch → analyze. Uses Google Search grounding.
swarm research "Buildertrend" "Jobber" --topic "pricing 2026"
Chain (v1.3): Refinement Pipelines
Data flows through multiple stages, each with a different perspective/filter. Stages run in sequence; tasks within a stage run in parallel.
Stage modes:
- parallel: N inputs → N workers (same perspective)
- single: merged input → 1 worker
- fan-out: 1 input → N workers with DIFFERENT perspectives
- reduce: N inputs → 1 synthesized output
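The four stage modes can be sketched as plain functions over a list of inputs. This is only an illustration of the data flow, not the daemon's implementation: `worker`, `stageModes`, and `runChain` are hypothetical names, and the stub worker just tags its input with the perspective instead of calling a model.

```javascript
// Hypothetical stand-in for a Flash worker call: tags the input with the perspective.
async function worker(perspective, input) {
  return `[${perspective}] ${input}`;
}

// Each entry maps a stage's inputs to its outputs for one stage mode.
const stageModes = {
  // N inputs -> N workers, all sharing one perspective
  parallel: (inputs, [p]) => Promise.all(inputs.map((x) => worker(p, x))),
  // merged input -> 1 worker
  single: async (inputs, [p]) => [await worker(p, inputs.join('\n'))],
  // 1 input -> N workers, each with a different perspective
  'fan-out': (inputs, perspectives) =>
    Promise.all(perspectives.map((p) => worker(p, inputs[0]))),
  // N inputs -> 1 synthesized output
  reduce: async (inputs, [p]) => [await worker(p, inputs.join(' + '))],
};

// Stages run in sequence; each stage's outputs become the next stage's inputs.
async function runChain(stages, inputs) {
  let current = inputs;
  for (const { mode, perspectives } of stages) {
    current = await stageModes[mode](current, perspectives);
  }
  return current;
}
```

For example, a `parallel` stage with the `extractor` perspective followed by a `reduce` stage with `synthesizer` turns two inputs into one combined output.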
Auto-chain: describe what you want, get an optimal pipeline:
curl -X POST http://localhost:9999/chain/auto \
-d '{"task":"Find business opportunities","data":"...market data...","depth":"standard"}'
Manual chain:
swarm chain pipeline.json
# or
echo '{"stages":[...]}' | swarm chain --stdin
Depth presets: quick (2 stages), standard (4), deep (6), exhaustive (8)
Built-in perspectives: extractor, filter, enricher, analyst, synthesizer, challenger, optimizer, strategist, researcher, critic
Preview without executing:
curl -X POST http://localhost:9999/chain/preview \
-d '{"task":"...","depth":"standard"}'
Benchmark (v1.3)
Compare single vs parallel vs chain on the same task with LLM-as-judge scoring.
curl -X POST http://localhost:9999/benchmark \
-d '{"task":"Analyze X","data":"...","depth":"standard"}'
Scores on 6 FLASK dimensions: accuracy (2x weight), depth (1.5x), completeness, coherence, actionability (1.5x), nuance.
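To make the weighting concrete, here is a minimal sketch that combines six per-dimension scores into one number. It assumes the dimensions without a stated weight count 1x and that the final score is a weighted mean; the benchmark's actual aggregation is not documented here and may differ. `WEIGHTS` and `weightedScore` are hypothetical names.

```javascript
// Weights from the text; dimensions without a stated weight are assumed to be 1x.
const WEIGHTS = {
  accuracy: 2,
  depth: 1.5,
  completeness: 1,
  coherence: 1,
  actionability: 1.5,
  nuance: 1,
};

// Weighted mean of the per-dimension scores (assumed aggregation).
function weightedScore(scores) {
  let total = 0;
  let weightSum = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    total += weight * scores[dimension];
    weightSum += weight;
  }
  return total / weightSum;
}

const example = {
  accuracy: 9, depth: 7, completeness: 8,
  coherence: 8, actionability: 6, nuance: 7,
};
console.log(weightedScore(example).toFixed(2)); // prints "7.56"
```

With these weights, accuracy contributes a quarter of the total (2 out of a weight sum of 8), so a weak accuracy score drags the result down more than a weak nuance score.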
Capabilities Discovery (v1.3)
Lets the orchestrator discover what execution modes are available:
swarm capabilities
# or
curl http://localhost:9999/capabilities
Prompt Cache (v1.3.2)
LRU cache for LLM responses. 212x speedup on cache hits (parallel), 514x on chains.
- Keyed by hash of instruction + input + perspective
- 500 entries max, 1 hour TTL
- Skips web search tasks (need fresh data)
- Persists to disk across daemon restarts
- Per-task bypass: set task.cache = false
# View cache stats
curl http://localhost:9999/cache
# Clear cache
curl -X DELETE http://localhost:9999/cache
Cache stats show in swarm status.
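The cache behaviour described above (hashed key, 500 entries, 1 hour TTL, least-recently-used eviction) can be sketched in a few lines. This is an illustration only: `cacheKey` and `PromptCache` are hypothetical names, SHA-256 is an assumed hash choice, and disk persistence and the web-search bypass are left out.

```javascript
const crypto = require('crypto');

// Cache key: hash of instruction + input + perspective (hash choice is an assumption).
function cacheKey(instruction, input, perspective) {
  return crypto
    .createHash('sha256')
    .update(JSON.stringify([instruction, input, perspective]))
    .digest('hex');
}

class PromptCache {
  constructor(maxEntries = 500, ttlMs = 60 * 60 * 1000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.entries = new Map(); // Map keeps insertion order, oldest first
  }

  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now - entry.storedAt > this.ttlMs) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value, now = Date.now()) {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: now });
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in the Map).
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}
```

A caller would compute `cacheKey(...)` before dispatching a task, return the cached value on a hit, and call `set` after a fresh response.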
Stage Retry (v1.3.2)
If tasks fail within a chain stage, only the failed tasks get retried (not the whole stage). Default: 1 retry. Configurable per-phase via phase.retries or globally via options.stageRetries.
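A minimal sketch of task-level retry, assuming each task is a function that returns a promise. `runStage` is a hypothetical name, and the `retries` parameter plays the role of `phase.retries` from the text. Only the tasks that rejected are run again; successful results are kept as they are.

```javascript
// Run all tasks; retry only the ones that failed, for up to `retries` extra rounds.
async function runStage(tasks, retries = 1) {
  const results = new Array(tasks.length);
  let pending = tasks.map((_, i) => i); // indexes that still need to run

  for (let attempt = 0; attempt <= retries && pending.length > 0; attempt++) {
    const settled = await Promise.allSettled(
      // Promise.resolve().then(...) also turns synchronous throws into rejections.
      pending.map((i) => Promise.resolve().then(() => tasks[i]()))
    );
    const failed = [];
    settled.forEach((outcome, k) => {
      const i = pending[k];
      results[i] = outcome; // keep the latest outcome for this task
      if (outcome.status === 'rejected') failed.push(i);
    });
    pending = failed;
  }
  return results; // one Promise.allSettled-style outcome per task
}
```

With the default of one retry, a task that fails once and then succeeds ends up `fulfilled`, while a task that fails both times stays `rejected` without forcing the rest of the stage to rerun.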
Cost Tracking (v1.3.1)
All endpoints return cost data in their complete event:
- session: current daemon session totals
- daily: persisted across restarts, accumulates all day
swarm status # Shows session + daily cost
swarm savings # Monthly savings report
Web Search (v1.1)
Workers search the live web via Google Search grounding (Gemini only, no extra cost).
# Research uses web search by default
swarm research "Subject" --topic "angle"
# Parallel with web search
curl -X POST http://localhost:9999/parallel \
-d '{"prompts":["Current price of X?"],"options":{"webSearch":true}}'
JavaScript API
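const os = require('os');
const path = require('path');
// require() does not expand "~", so resolve the home directory explicitly
const lib = path.join(os.homedir(), 'clawd/skills/node-scaling/lib');
const { parallel, research } = require(lib);
const { SwarmClient } = require(path.join(lib, 'client'));
// await needs an async function in CommonJS; prompts, subjects, topic, task, data, depth are your own values
async function main() {
  // Simple parallel
  const result = await parallel(['prompt1', 'prompt2', 'prompt3']);
  // Client with streaming
  const client = new SwarmClient();
  for await (const event of client.parallel(prompts)) { console.log(event); }
  for await (const event of client.research(subjects, topic)) { console.log(event); }
  // Chain
  const chainResult = await client.chainSync({ task, data, depth });
}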
Daemon Management
swarm start # Start daemon (background)
swarm stop # Stop daemon
swarm status # Status, cost, cache stats
swarm restart # Restart daemon
swarm savings # Monthly savings report
swarm logs [N] # Last N lines of daemon log
Performance (v1.3.2)
| Mode | Tasks | Time | Notes |
|---|---|---|---|
| Parallel (simple) | 5 | ~700ms | 142ms/task effective |
| Parallel (stress) | 10 | ~1.2s | 123ms/task effective |
| Chain (standard) | 5 | ~14s | 3-stage multi-perspective |
| Chain (quick) | 2 | ~3s | 2-stage extract+synthesize |
| Cache hit | any | ~3-5ms | 200-500x speedup |
| Research (web) | 2 | ~15s | Google grounding latency |
Config
Location: ~/.config/clawdbot/node-scaling.yaml
node_scaling:
enabled: true
limits:
max_nodes: 16
max_concurrent_api: 16
provider:
name: gemini
model: gemini-2.0-flash
web_search:
enabled: true
parallel_default: false
cost:
max_daily_spend: 10.00
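To show what a cap like max_concurrent_api means in practice, here is a generic concurrency limiter. It is a sketch of the general technique, not the daemon's scheduler: `runLimited` is a hypothetical name, and tasks are assumed to be functions returning promises.

```javascript
// Run tasks with at most `maxConcurrent` in flight; results keep the input order.
async function runLimited(tasks, maxConcurrent = 16) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start

  // Each lane pulls the next unstarted task until none are left.
  async function lane() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const laneCount = Math.min(maxConcurrent, tasks.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Lowering the cap trades wall-clock time for fewer simultaneous API calls, which is the lever the rate-limit fix in Troubleshooting relies on.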
Troubleshooting
| Issue | Fix |
|---|---|
| Daemon not running | swarm start |
| No API key | Set GEMINI_API_KEY or run npm run setup |
| Rate limited | Lower max_concurrent_api in config |
| Web search not working | Ensure provider is gemini + web_search.enabled |
| Cache stale results | curl -X DELETE http://localhost:9999/cache |
| Chain too slow | Use depth: "quick" or check context size |
Structured Output (v1.3.7)
Force JSON output with schema validation: zero parse failures on structured tasks.
# With built-in schema
curl -X POST http://localhost:9999/structured \
-d '{"prompt":"Extract entities from: Tim Cook announced iPhone 17","schema":"entities"}'
# With custom schema
curl -X POST http://localhost:9999/structured \
-d '{"prompt":"Classify this text","data":"...","schema":{"type":"object","properties":{"category":{"type":"string"}}}}'
# JSON mode (no schema, just force JSON)
curl -X POST http://localhost:9999/structured \
-d '{"prompt":"Return a JSON object with name, age, city for a fictional person"}'
# List available schemas
curl http://localhost:9999/structured/schemas
Built-in schemas: entities, summary, comparison, actions, classification, qa
Uses Gemini's native response_mime_type: application/json + responseSchema for guaranteed JSON output. Includes schema validation on the response.
Majority Voting (v1.3.7)
Same prompt → N parallel executions → pick the best answer. Higher accuracy on factual/analytical tasks.
# Judge strategy (LLM picks best, most reliable)
curl -X POST http://localhost:9999/vote \
-d '{"prompt":"What are the key factors in SaaS pricing?","n":3,"strategy":"judge"}'
# Similarity strategy (consensus, zero extra cost)
curl -X POST http://localhost:9999/vote \
-d '{"prompt":"What year was Python released?","n":3,"strategy":"similarity"}'
# Longest strategy (heuristic, zero extra cost)
curl -X POST http://localhost:9999/vote \
-d '{"prompt":"Explain recursion","n":3,"strategy":"longest"}'
Strategies:
- judge: LLM scores all candidates on accuracy/completeness/clarity/actionability, picks winner (N+1 calls)
- similarity: Jaccard word-set similarity, picks consensus answer (N calls, zero extra cost)
- longest: Picks longest response as heuristic for thoroughness (N calls, zero extra cost)
When to use: Factual questions, critical decisions, or any task where accuracy > speed.
| Strategy | Calls | Extra Cost | Quality |
|---|---|---|---|
| similarity | N | $0 | Good (consensus) |
| longest | N | $0 | Decent (heuristic) |
| judge | N+1 | ~$0.0001 | Best (LLM-scored) |
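The two zero-cost strategies need no model call and can be sketched directly. The version below is an illustration under assumptions: words are lowercased alphanumeric tokens, and the consensus answer is the candidate with the highest total Jaccard similarity to the others. `wordSet`, `jaccard`, `pickConsensus`, and `pickLongest` are hypothetical names.

```javascript
// Lowercased word set of a response (tokenization is an assumption).
function wordSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  let intersection = 0;
  for (const word of a) if (b.has(word)) intersection++;
  const union = a.size + b.size - intersection;
  return union === 0 ? 1 : intersection / union;
}

// Similarity strategy: pick the candidate most similar to all the others.
function pickConsensus(candidates) {
  const sets = candidates.map(wordSet);
  let best = 0;
  let bestScore = -1;
  sets.forEach((s, i) => {
    let score = 0;
    sets.forEach((t, j) => {
      if (i !== j) score += jaccard(s, t);
    });
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return candidates[best];
}

// Longest strategy: the longest response wins.
function pickLongest(candidates) {
  return candidates.reduce((a, b) => (b.length > a.length ? b : a));
}
```

Given two answers that agree on 1991 and one outlier that says 1989, `pickConsensus` returns one of the agreeing answers, because the outlier shares almost no words with the rest.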
Self-Reflection (v1.3.5)
Optional critic pass after chain/skeleton output. Scores 5 dimensions, auto-refines if below threshold.
# Add reflect:true to any chain or skeleton request
curl -X POST http://localhost:9999/chain/auto \
-d '{"task":"Analyze the AI chip market","data":"...","reflect":true}'
curl -X POST http://localhost:9999/skeleton \
-d '{"task":"Write a market analysis","reflect":true}'
Proven: improved weak output from 5.0 → 7.6 avg score. Skeleton + reflect scored 9.4/10.
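The critic pass amounts to a score-then-refine loop. The sketch below shows that control flow only. The threshold value, the number of rounds, the use of a plain average, and the names `average`, `reflect`, `critic`, and `refine` are all assumptions, since the section does not specify them.

```javascript
// Mean of the per-dimension scores returned by the critic.
function average(scores) {
  const values = Object.values(scores);
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Score the draft; if it falls below the threshold, refine and score again.
// `critic(output)` returns an object of dimension -> score;
// `refine(output, scores)` returns an improved output. Both are caller-supplied.
async function reflect(draft, { critic, refine, threshold = 7, maxRounds = 1 }) {
  let output = draft;
  let scores = await critic(output);
  for (let round = 0; round < maxRounds && average(scores) < threshold; round++) {
    output = await refine(output, scores);
    scores = await critic(output);
  }
  return { output, score: average(scores) };
}
```

If the first critic pass already meets the threshold, `refine` is never called, so the extra cost of reflection is a single scoring call.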
Skeleton-of-Thought (v1.3.6)
Generate outline → expand each section in parallel → merge into coherent document. Best for long-form content.
curl -X POST http://localhost:9999/skeleton \
-d '{"task":"Write a comprehensive guide to SaaS pricing","maxSections":6,"reflect":true}'
Performance: 14,478 chars in 21s (675 chars/sec), 5.1x more content than chain at 2.9x higher throughput.
| Metric | Chain | Skeleton-of-Thought | Winner |
|---|---|---|---|
| Output size | 2,856 chars | 14,478 chars | SoT (5.1x) |
| Throughput | 234 chars/sec | 675 chars/sec | SoT (2.9x) |
| Duration | 12s | 21s | Chain (faster) |
| Quality (w/ reflect) | ~7-8/10 | 9.4/10 | SoT |
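The outline, expand, and merge steps can be sketched as follows. This is a simplified illustration, not the daemon's code: `skeletonOfThought`, `outline`, and `expand` are hypothetical names, both callbacks stand in for model calls, and the merge step here is plain concatenation rather than a model-written merge.

```javascript
// outline(task) -> array of section headings
// expand(task, heading) -> text for that section
async function skeletonOfThought(task, { outline, expand, maxSections = 6 }) {
  // Step 1: one call produces the skeleton, capped at maxSections.
  const headings = (await outline(task)).slice(0, maxSections);
  // Step 2: every section is expanded by its own worker, all in parallel.
  const bodies = await Promise.all(headings.map((h) => expand(task, h)));
  // Step 3: merge in outline order (simple concatenation in this sketch).
  return headings.map((h, i) => `${h}\n${bodies[i]}`).join('\n\n');
}
```

Because the sections are expanded concurrently, total time is roughly one outline call plus the slowest section, which is why output size grows much faster than duration in the table above.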
When to use what:
- SoT: long-form content, reports, guides, docs (anything with natural sections)
- Chain: analysis, research, adversarial review (anything needing multiple perspectives)
- Parallel: independent tasks, batch processing
- Structured: entity extraction, classification, any task needing reliable JSON
- Voting: factual accuracy, critical decisions, consensus-building
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /status | Detailed status + cost + cache |
| GET | /capabilities | Discover execution modes |
| POST | /parallel | Execute N prompts in parallel |
| POST | /research | Multi-phase web research |
| POST | /skeleton | Skeleton-of-Thought (outline → expand → merge) |
| POST | /chain | Manual chain pipeline |
| POST | /chain/auto | Auto-build + execute chain |
| POST | /chain/preview | Preview chain without executing |
| POST | /chain/template | Execute pre-built template |
| POST | /structured | Forced JSON with schema validation |
| GET | /structured/schemas | List built-in schemas |
| POST | /vote | Majority voting (best-of-N) |
| POST | /benchmark | Quality comparison test |
| GET | /templates | List chain templates |
| GET | /cache | Cache statistics |
| DELETE | /cache | Clear cache |
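For callers that prefer code over curl, the POST endpoints can be wrapped in a small request builder. This is a sketch under assumptions: `buildRequest` is a hypothetical helper, the `Content-Type` header is an addition (the curl examples send none explicitly), and the response format is not covered here, so the actual `fetch` call is left as a comment.

```javascript
const BASE_URL = 'http://localhost:9999'; // host and port taken from the curl examples

// Build the arguments for fetch() for any POST endpoint in the table above.
function buildRequest(path, body) {
  return {
    url: `${BASE_URL}${path}`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' }, // assumption, see lead-in
      body: JSON.stringify(body),
    },
  };
}

// Same payload as the web-search curl example.
const { url, init } = buildRequest('/parallel', {
  prompts: ['Current price of X?'],
  options: { webSearch: true },
});
console.log(url, init.body);
// With Node 18+ the request could then be sent with: fetch(url, init)
```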
Cost Comparison
| Model | Cost per 1M tokens | Relative |
|---|---|---|
| Claude Opus 4 | ~$15 input / $75 output | 1x |
| GPT-4o | ~$2.50 input / $10 output | ~7x cheaper |
| Gemini Flash | ~$0.075 input / $0.30 output | 200x cheaper |
Cache hits are essentially free (~3-5ms, no API call).
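The ratios in the table follow from the listed per-token prices. A small calculator makes the arithmetic explicit. The prices are copied from the table above; `PRICES`, `callCost`, and `timesCheaper` are hypothetical names for illustration.

```javascript
// Prices in USD per 1M tokens, copied from the table above.
const PRICES = {
  'Claude Opus 4': { input: 15, output: 75 },
  'GPT-4o': { input: 2.5, output: 10 },
  'Gemini Flash': { input: 0.075, output: 0.3 },
};

// Cost in USD of one workload with the given token counts.
function callCost(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// How many times cheaper `model` is than `baseline` for the same workload.
function timesCheaper(baseline, model, inputTokens, outputTokens) {
  return (
    callCost(baseline, inputTokens, outputTokens) /
    callCost(model, inputTokens, outputTokens)
  );
}

// Input-only workload: 15 / 0.075
console.log(timesCheaper('Claude Opus 4', 'Gemini Flash', 1e6, 0).toFixed(0)); // prints "200"
// Output-only workload: 75 / 0.30
console.log(timesCheaper('Claude Opus 4', 'Gemini Flash', 0, 1e6).toFixed(0)); // prints "250"
```

The 200x figure is the input-price ratio; for output-heavy workloads the gap is slightly larger at these prices.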