โ† Back to Web & Frontend Development
Web & Frontend Development by @richginsberg

ralph-mode

Autonomous development loops with iteration and backpressure gates


Ralph Mode - Autonomous Development Loops

Ralph Mode implements the Ralph Wiggum technique adapted for OpenClaw: autonomous task completion through continuous iteration with backpressure gates, completion criteria, and structured planning.

When to Use

Use Ralph Mode when:

  • You are building features that require multiple iterations and refinement
  • You are working on complex projects with acceptance criteria to validate
  • You need automated testing, linting, or typecheck gates
  • You want to track progress across many iterations systematically
  • You prefer autonomous loops over manual turn-by-turn guidance

Core Principles

Three-Phase Workflow

Phase 1: Requirements Definition

  • Document specs in specs/ (one file per topic of concern)
  • Define acceptance criteria (observable, verifiable outcomes)
  • Create implementation plan with prioritized tasks

Phase 2: Planning

  • Gap analysis: compare specs against existing code
  • Generate IMPLEMENTATION_PLAN.md with prioritized tasks
  • No implementation during this phase

Phase 3: Building (Iterative)

  • Pick one task from plan per iteration
  • Implement, validate, update plan, commit
  • Continue until all tasks complete or criteria met

Backpressure Gates

Reject incomplete work automatically through validation (a minimal gate-runner sketch follows the lists below):

Programmatic Gates (Always use these):

  • Tests: [test command] - Must pass before committing
  • Typecheck: [typecheck command] - Catch type errors early
  • Lint: [lint command] - Enforce code quality
  • Build: [build command] - Verify integration

Subjective Gates (Use for UX, design, quality):

  • LLM-as-judge reviews for tone, aesthetics, usability
  • Binary pass/fail - converges through iteration
  • Only add after programmatic gates work reliably
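
The sketch below (TypeScript/Node) runs the programmatic gates in sequence and rejects the iteration on the first failure. The npm script names are assumptions; substitute whatever commands AGENTS.md documents for your project.

// gate-runner.ts - programmatic backpressure: any failing gate rejects the iteration.
import { spawnSync } from 'node:child_process';

const gates = [
  { name: 'Tests', cmd: 'npm run test' },
  { name: 'Typecheck', cmd: 'npm run typecheck' },
  { name: 'Lint', cmd: 'npm run lint' },
  { name: 'Build', cmd: 'npm run build' },
];

for (const gate of gates) {
  const result = spawnSync(gate.cmd, { shell: true, stdio: 'inherit' });
  if (result.status !== 0) {
    // Non-zero exit = backpressure: do not commit this iteration.
    console.error(`Gate failed: ${gate.name} (${gate.cmd})`);
    process.exit(1);
  }
}
console.log('All gates passed - safe to commit.');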

Context Efficiency

  • One task per iteration = fresh context each time
  • Spawn sub-agents for exploration, not main context
  • Lean prompts keep context in the smart zone (~40-60% utilization)
  • Plans are disposable - regenerating is cheaper than salvaging

File Structure

Create this structure for each Ralph Mode project:

project-root/
├── IMPLEMENTATION_PLAN.md     # Shared state, updated each iteration
├── AGENTS.md                  # Build/test/lint commands (~60 lines)
├── specs/                     # Requirements (one file per topic)
│   ├── topic-a.md
│   └── topic-b.md
├── src/                       # Application code
└── src/lib/                   # Shared utilities

IMPLEMENTATION_PLAN.md

Priority task list - single source of truth. Format:

# Implementation Plan

## In Progress
- [ ] Task name (iteration N)
  - Notes: discoveries, bugs, blockers

## Completed
- [x] Task name (iteration N)

## Backlog
- [ ] Future task

Topic Scope Test

Can you describe the topic in one sentence without "and"?

  • ✅ "User authentication with JWT sessions" → one topic
  • ❌ "Auth, profiles, and billing" → 3 topics

AGENTS.md - Operational Guide

Succinct guide for running the project. Keep under 60 lines:

# Project Operations

## Build Commands
npm run dev        # Development server
npm run build      # Production build

## Validation
npm run test       # All tests
npm run lint       # ESLint
npm run typecheck  # TypeScript
npm run e2e        # E2E tests

## Operational Notes
- Tests must pass before committing
- Typecheck failures block commits
- Use existing utilities from src/lib over ad-hoc copies

Hats (Personas)

Specialized roles for different tasks:

Hat: Architect (@architect)

  • High-level design, data modeling, API contracts
  • Focus: patterns, scalability, maintainability

Hat: Implementer (@implementer)

  • Write code, implement features, fix bugs
  • Focus: correctness, performance, test coverage

Hat: Tester (@tester)

  • Test authoring, validation, edge cases
  • Focus: coverage, reliability, reproducibility

Hat: Reviewer (@reviewer)

  • Code reviews, PR feedback, quality assessment
  • Focus: style, readability, adherence to specs

Usage:

"Spawn a sub-agent with @architect hat to design the data model"

Loop Mechanics

Outer Loop (You coordinate)

Your job as the main agent: engineer the setup, observe, and course-correct.

  1. Don't allocate work to main context - Spawn sub-agents
  2. Let Ralph Ralph - the LLM will self-identify and self-correct issues
  3. Use protection - Sandbox is your security boundary
  4. Plan is disposable - Regenerate when wrong/stale
  5. Move outside the loop - Sit and watch, don't micromanage

Inner Loop (Sub-agent executes)

Each sub-agent iteration:

  1. Study - Read plan, specs, relevant code
  2. Select - Pick most important uncompleted task
  3. Implement - Write code, one task only
  4. Validate - Run tests, lint, typecheck (backpressure)
  5. Update - Mark task done, note discoveries, commit
  6. Exit - Next iteration starts fresh

Stopping Conditions

Loop ends when:

  • ✅ All IMPLEMENTATION_PLAN.md tasks completed
  • ✅ All acceptance criteria met
  • ✅ Tests passing, no blocking issues
  • ⚠️ Max iterations reached (configure limit)
  • 🛑 Manual stop (Ctrl+C)
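
For illustration, here is a sketch of an outer loop that enforces these stopping conditions. planHasOpenTasks and spawnSubAgentIteration are hypothetical placeholders for however you actually parse IMPLEMENTATION_PLAN.md and spawn sub-agents; they are not part of any real API.

// outer-loop.ts - illustrative only; the two declared helpers are placeholders.
declare function planHasOpenTasks(planPath: string): boolean;
declare function spawnSubAgentIteration(opts: { iteration: number }): Promise<{ status: 'COMPLETE' | 'BLOCKED' }>;

const MAX_ITERATIONS = 25; // the configured iteration limit

async function runRalph(): Promise<void> {
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    if (!planHasOpenTasks('IMPLEMENTATION_PLAN.md')) {
      console.log('All tasks complete - stopping.');
      return;
    }
    // One task per iteration, fresh context each time.
    const outcome = await spawnSubAgentIteration({ iteration: i });
    if (outcome.status === 'BLOCKED') {
      console.error('Iteration blocked - see PROGRESS.md before continuing.');
      return;
    }
  }
  console.warn('Max iterations reached - review the plan and re-scope.');
}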

Completion Criteria

Define success upfront - avoid "seems done" ambiguity.

Programmatic (Measurable)

  • All tests pass: [test_command] returns 0
  • Typecheck passes: No TypeScript errors
  • Build succeeds: Production bundle created
  • Coverage threshold: e.g., 80%+

Subjective (LLM-as-Judge)

For quality criteria that resist automation:

## Completion Check - UX Quality
Criteria: Navigation is intuitive, primary actions are discoverable
Test: User can complete core flow without confusion

## Completion Check - Design Quality
Criteria: Visual hierarchy is clear, brand consistency maintained
Test: Layout follows established patterns

Run LLM-as-judge sub-agent for binary pass/fail.

Technology-Specific Patterns

Next.js Full Stack

specs/
├── authentication.md
├── database.md
└── api-routes.md

src/
├── app/            # App Router
├── components/     # React components
├── lib/            # Utilities (db, auth, helpers)
└── types/          # TypeScript types

AGENTS.md:
  Build: npm run dev
  Test: npm run test
  Typecheck: npx tsc --noEmit
  Lint: npm run lint

Python (Scripts/Notebooks/FastAPI)

specs/
├── data-pipeline.md
├── model-training.md
└── api-endpoints.md

src/
├── pipeline.py
├── models/
├── api/
└── tests/

AGENTS.md:
  Build: python -m src.main
  Test: pytest
  Typecheck: mypy src/
  Lint: ruff check src/

GPU Workloads

specs/
├── model-architecture.md
├── training-data.md
└── inference-pipeline.md

src/
├── models/
├── training/
├── inference/
└── utils/

AGENTS.md:
  Train: python train.py
  Test: pytest tests/
  Lint: ruff check src/
  GPU Check: nvidia-smi

Quick Start Command

Start a Ralph Mode session:

"Start Ralph Mode for my project at ~/projects/my-app. I want to implement user authentication with JWT.

I will:

  1. Create IMPLEMENTATION_PLAN.md with prioritized tasks
  2. Spawn sub-agents for iterative implementation
  3. Apply backpressure gates (test, lint, typecheck)
  4. Track progress and announce completion

Operational Learnings

As patterns emerge during Ralph sessions, record them in AGENTS.md:

## Discovered Patterns

- When adding API routes, also add to OpenAPI spec
- Use existing db utilities from src/lib/db over direct calls
- Test files must be co-located with implementation

Escape Hatches

When trajectory goes wrong:

  • Ctrl+C - Stop loop immediately
  • Regenerate plan - "Discard IMPLEMENTATION_PLAN.md and re-plan"
  • Reset - "Git reset to last known good state"
  • Scope down - Create smaller scoped plan for specific work

Advanced: LLM-as-Judge Fixture

For subjective criteria (tone, aesthetics, UX):

Create src/lib/llm-review.ts:

export interface ReviewResult {
  pass: boolean;
  feedback?: string;
}

// Signature only - implement with your LLM client of choice.
export declare function createReview(config: {
  criteria: string;
  artifact: string; // text or screenshot path
}): Promise<ReviewResult>;

Sub-agents discover and use this pattern for binary pass/fail checks.
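
A usage sketch, assuming the fixture above is exported from src/lib/llm-review.ts and using an illustrative screenshot path:

// Inside an iteration: treat a failed subjective gate like any other backpressure signal.
import { createReview } from './lib/llm-review'; // resolves to src/lib/llm-review.ts when run from src/

const review = await createReview({
  criteria: 'Navigation is intuitive, primary actions are discoverable',
  artifact: 'screenshots/home.png', // illustrative path
});

if (!review.pass) {
  console.error('UX review failed:', review.feedback);
  process.exit(1);
}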

Critical Operational Requirements

Based on empirical usage, enforce these practices to avoid silent failures:

1. Mandatory Progress Logging

Ralph MUST write to PROGRESS.md after EVERY iteration. This is non-negotiable.

Create PROGRESS.md in project root at start:

# Ralph: [Task Name]

## Iteration [N] - [Timestamp]

### Status
- [ ] In Progress | [ ] Blocked | [ ] Complete

### What Was Done
- [Item 1]
- [Item 2]

### Blockers
- None | [Description]

### Next Step
[Specific next task from IMPLEMENTATION_PLAN.md]

### Files Changed
- `path/to/file.ts` - [brief description]

Why: External observers (parent agents, crons, humans) can tail one file instead of scanning directories or inferring state from session logs.
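
For illustration, a minimal Node helper that appends one such entry to PROGRESS.md (field names follow the template above; adapt as needed):

// progress-log.ts - append one iteration entry to PROGRESS.md.
import fs from 'node:fs';

export function logIteration(entry: {
  iteration: number;
  status: 'In Progress' | 'Blocked' | 'Complete';
  done: string[];
  blockers: string;
  nextStep: string;
  filesChanged: string[];
}): void {
  const block = [
    `## Iteration ${entry.iteration} - ${new Date().toISOString()}`,
    '',
    `### Status: ${entry.status}`,
    '',
    '### What Was Done',
    ...entry.done.map((d) => `- ${d}`),
    '',
    '### Blockers',
    `- ${entry.blockers}`,
    '',
    '### Next Step',
    entry.nextStep,
    '',
    '### Files Changed',
    ...entry.filesChanged.map((f) => `- \`${f}\``),
    '',
  ].join('\n');
  fs.appendFileSync('PROGRESS.md', block + '\n');
}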

2. Session Isolation & Cleanup

Before spawning a new Ralph session:

  • Check for existing Ralph sub-agents via sessions_list
  • Kill or verify completion of previous sessions
  • Do NOT spawn overlapping Ralph sessions on same codebase

Anti-pattern: Spawning Ralph v2 while v1 is still running = file conflicts, race conditions, lost work.
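
One low-tech guard is a lock file in the project root. The sketch below assumes file-based coordination and complements, rather than replaces, checking sessions_list:

// ralph-lock.ts - refuse to start a second Ralph loop on the same codebase.
import fs from 'node:fs';

const LOCK_PATH = '.ralph.lock';

export function acquireLock(): void {
  if (fs.existsSync(LOCK_PATH)) {
    const owner = fs.readFileSync(LOCK_PATH, 'utf8');
    throw new Error(`Another Ralph session appears active (${owner.trim()}). Verify it finished or kill it first.`);
  }
  fs.writeFileSync(LOCK_PATH, `pid=${process.pid} started=${new Date().toISOString()}`);
}

export function releaseLock(): void {
  if (fs.existsSync(LOCK_PATH)) fs.unlinkSync(LOCK_PATH);
}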

3. Explicit Path Verification

Never assume directory structure. At start of each iteration:

import fs from 'node:fs';

// Verify current working directory
const cwd = process.cwd();
console.log(`Working in: ${cwd}`);

// Verify expected paths exist
if (!fs.existsSync('./src/app')) {
  console.error('Expected ./src/app, found:', fs.readdirSync('.'));
  // Adapt or fail explicitly
}

Why: Ralph may be spawned from different contexts with different working directories.

4. Completion Signal Protocol

When done, Ralph MUST:

  1. Write final PROGRESS.md with "## Status: COMPLETE"
  2. List all created/modified files
  3. Exit cleanly (no hanging processes)

Example completion PROGRESS.md:

# Ralph: Influencer Detail Page

## Status: COMPLETE ✅

**Finished:** [ISO timestamp]

### Final Verification
- [x] TypeScript: Pass
- [x] Tests: Pass  
- [x] Build: Pass

### Files Created
- `src/app/feature/page.tsx`
- `src/app/api/feature/route.ts`

### Testing Instructions
1. Run: `npm run dev`
2. Visit: `http://localhost:3000/feature`
3. Verify: [specific checks]

5. Error Handling Requirements

If Ralph encounters unrecoverable errors:

  1. Log to PROGRESS.md with "## Status: BLOCKED"
  2. Describe blocker in detail
  3. List attempted solutions
  4. Exit cleanly (don't hang)

Do not silently fail. A Ralph that stops iterating with no progress log is indistinguishable from one still working.

6. Iteration Time Limits

Set explicit iteration timeouts:

## Operational Parameters
- Max iteration time: 10 minutes
- Total session timeout: 60 minutes
- If iteration exceeds limit: Log blocker, exit

Why: Prevents infinite loops on stuck tasks, allows parent agent to intervene.
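
A sketch of enforcing the per-iteration limit in code; runIteration is a hypothetical stand-in for one inner-loop pass:

// iteration-timeout.ts - fail loudly instead of hanging past the limit.
declare function runIteration(): Promise<void>;

const MAX_ITERATION_MS = 10 * 60 * 1000; // 10 minutes

export async function runWithTimeout(): Promise<void> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('Iteration exceeded time limit')), MAX_ITERATION_MS),
  );
  try {
    await Promise.race([runIteration(), timeout]);
  } catch (err) {
    // Log the blocker (e.g., to PROGRESS.md) and exit instead of hanging.
    console.error(String(err));
    process.exit(1);
  }
}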

Memory Updates

After each Ralph Mode session, document:

## [Date] Ralph Mode Session

**Project:** [project-name]
**Duration:** [iterations]
**Outcome:** success / partial / blocked
**Learnings:**
- What worked well
- What needs adjustment
- Patterns to add to AGENTS.md

Appendix: Hall of Failures

Common anti-patterns observed:

Each entry lists anti-pattern → consequence → prevention:

  • No progress logging → Parent agent cannot determine status → Mandatory PROGRESS.md
  • Silent failure → Work lost, time wasted → Explicit error logging
  • Overlapping sessions → File conflicts, corrupt state → Check/cleanup before spawn
  • Path assumptions → Wrong directory, wrong files → Explicit verification
  • No completion signal → Parent waits indefinitely → Clear COMPLETE status
  • Infinite iteration → Resource waste, no progress → Time limits + blockers
  • Complex initial prompts → Sub-agent never starts (empty session logs) → SIMPLIFY instructions

NEW: Session Initialization Best Practices (2025-02-07)

Problem: Sub-agents spawn but don't execute

Evidence: Empty session logs (2 bytes), no tool calls, 0 tokens used

Root Causes

  1. Instructions too complex - Overwhelms isolated session initialization
  2. No clear execution trigger - Agent doesn't know to start
  3. Branching logic - "If X do Y, if Z do W" confuses task selection
  4. Multiple files mentioned - Can't decide which to start with

Fix: SIMPLIFIED Ralph Task Template

## Task: [ONE specific thing]

**File:** exact/path/to/file.ts
**What:** Exact description of change
**Validate:** Exact command to run
**Then:** Update PROGRESS.md and exit

## Rules
1. Do NOT look at other files
2. Do NOT "check first"
3. Make the change, validate, exit

BEFORE (Bad - causes stalls):

Fix all TypeScript errors across these files:
- lib/db.ts has 2 errors
- lib/proposal-service.ts has 5 errors
- route.ts has errors
Check which ones to fix first, then...

AFTER (Good - executes):

Fix lib/db.ts line 27:
Change: PoolClient to pg.PoolClient
Validate: npm run typecheck
Exit immediately after

CRITICAL: Single File Rule

Each Ralph iteration gets ONE file. Not "all errors", not "check then decide". ONE file, ONE change, validate, exit.

CRITICAL: Update PROGRESS.md

MANDATORY: After EVERY iteration, update PROGRESS.md with:

## Iteration [N] - [Timestamp]

### Status: Complete ✅ | Blocked ⛔ | Failed ❌

### What Was Done
- [Specific changes made]

### Validation
- [Test/lint/typecheck results]

### Next Step
- [What should happen next]

Why this matters: Cron job reads PROGRESS.md for status updates. If not updated, status appears stale/repetitive.

Debugging Ralph Stalls

If Ralph stalls:

  1. Check session logs (should show tool calls within 60s)
  2. If empty after spawn → instructions too complex
  3. Reduce: ONE file, ONE line number, ONE change
  4. Shorter timeout forces smaller tasks (300s not 600s)
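
A small detection sketch an outer agent or cron could run; the log path shown is hypothetical, so point it at wherever your sub-agent session logs actually live:

// stall-check.ts - flag sub-agents that spawned but never executed.
import fs from 'node:fs';

export function looksStalled(logPath: string, maxQuietMs = 60_000): boolean {
  if (!fs.existsSync(logPath)) return true;       // never started
  const stat = fs.statSync(logPath);
  if (stat.size <= 2) return true;                // empty log = no tool calls
  return Date.now() - stat.mtimeMs > maxQuietMs;  // no recent writes
}

if (looksStalled('.sessions/ralph-latest.log')) {
  console.warn('Ralph looks stalled - simplify the task: ONE file, ONE change.');
}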

Fixing Stale Status Reports

If cron reports same status repeatedly:

  1. Check PROGRESS.md was updated by sub-agent
  2. If not updated → sub-agent skipped documentation step
  3. Update skill: Add "MANDATORY PROGRESS.md update" to prompt
  4. Manual fix: Update PROGRESS.md to reflect actual state

Summary

Ralph works when: Single file focus + explicit change + validate + exit
Ralph stalls when: Complex decisions + multiple files + conditional logic