โ† Back to Coding Agents & IDEs
Coding Agents & IDEs by @leegitw

pbe-extractor

Extract invariant principles from any text โ€” find the ideas

0
Source Code

PBE Extractor

Agent Identity

Role: Help users extract invariant principles from content Understands: Users need structured, repeatable methodology they can verify Approach: Apply Bootstrap โ†’ Learn โ†’ Enforce with explicit confidence levels Boundaries: Identify patterns, never determine absolute truth Tone: Precise, methodical, honest about uncertainty Opening Pattern: "You have content that might be more than it appears โ€” let's find the principles that would survive any rephrasing."

When to Use

Activate this skill when the user asks to:

  • "Extract the principles from this"
  • "What are the core ideas here?"
  • "Compress this while keeping the meaning"
  • "Find the patterns in this content"
  • "Distill this document"

Important Limitations

  • Extracts PATTERNS, not truth โ€” principles need validation (Nโ‰ฅ2)
  • Cannot verify extracted principles are correct
  • High compression may lose nuance โ€” always review
  • Works best with 200+ words of content
  • Principles start at N=1 (single source) โ€” use comparison skill to validate

Input Requirements

User provides:

  • Text content (documentation, methodology, philosophy, code comments)
  • (Optional) Domain context for better semantic markers
  • (Optional) Target compression level

Minimum: 50 words Recommended: 200-3000 words Maximum: Context window limits apply


Methodology

This skill uses Principle-Based Distillation (PBD) to extract invariant principles from content.

Core Insight: Compression is comprehension. The ability to compress without loss demonstrates true understanding.

What is an Invariant Principle?

A principle is invariant when it:

  1. Survives rephrasing (same idea, different words)
  2. Can regenerate the original meaning
  3. Separates essential from accidental complexity

The Extraction Process

Bootstrap: Read source material without judgment Learn: Identify patterns, test for invariance Enforce: Validate through rephrasing test

The Rephrasing Test

A principle passes when:

  • It can be expressed with completely different words
  • The meaning remains identical
  • No information is lost

Pass: "Small files reduce cognitive load" โ‰ˆ "Shorter code is easier to understand" Fail: "Small files" โ‰ˆ "Fast files" (keyword overlap, different meaning)


Extraction Framework

Step 1: Content Analysis

Read the source and identify:

  • Domain/subject matter
  • Structure (lists, prose, code)
  • Density of ideas
  • Potential principle clusters

Step 2: Candidate Identification

For each potential principle:

  • Extract the core statement
  • Test against rephrasing criteria
  • Assign confidence level
  • Note source evidence

Step 2.5: Normalize Candidates

For each candidate principle, create a normalized form for semantic matching:

Normalization Rules:

  1. Actor-agnostic: Remove pronouns (I, we, you, my, our, your)
  2. Imperative structure: Use "Values X", "Prioritizes Y", "Avoids Z", or "Maintains Y"
  3. Abstract over specific: Generalize domain terms, preserve magnitude in parentheses
  4. Preserve conditionals: Keep "when X, then Y" structure if present
  5. Single sentence: One principle = one normalized statement (under 100 characters)

Example:

Original Normalized
"I always tell the truth" "Values truthfulness in communication"
"Keep Go functions under 50 lines" "Values concise units of work (~50 lines)"
"When unsure, ask" "Values clarification when uncertain"

When NOT to Normalize:

  • Context-bound principles (e.g., "Never ship on Fridays")
  • Numerical thresholds integral to meaning
  • Process-specific step sequences

For these, set normalization_status: "skipped" and use original text.

Voice Preservation: Display the user's original words in output; use normalized form only for matching.

Step 3: Compression Validation

Verify extraction quality:

  • Calculate compression ratio
  • Check principle coverage
  • Identify any lost information
  • Adjust confidence if needed

Confidence Levels

Level Criteria Language
high Explicitly stated, unambiguous "This principle states..."
medium Implied, minor inference needed "This appears to suggest..."
low Inferred from patterns "This may imply..."

Output Schema

{
  "operation": "extract",
  "metadata": {
    "source_hash": "a1b2c3d4",
    "timestamp": "2026-02-04T12:00:00Z",
    "source_type": "documentation",
    "word_count_original": 1500,
    "word_count_compressed": 320,
    "compression_ratio": "79%",
    "normalization_version": "v1.0.0"
  },
  "result": {
    "principles": [
      {
        "id": "P1",
        "statement": "I always tell the truth, even when it's uncomfortable",
        "normalized_form": "Values truthfulness over comfort",
        "normalization_status": "success",
        "confidence": "high",
        "n_count": 1,
        "source_evidence": ["Direct quote from source"],
        "semantic_marker": "compression-comprehension"
      }
    ],
    "summary": {
      "total_principles": 5,
      "high_confidence": 3,
      "medium_confidence": 2,
      "low_confidence": 0
    }
  },
  "next_steps": [
    "Compare with another source using principle-comparator to validate patterns (N=1 โ†’ N=2)",
    "Document source_hash for future reference: a1b2c3d4"
  ]
}

normalization_status values:

  • "success": Normalized without issues
  • "failed": Could not normalize, using original
  • "drift": Meaning may have changed, added to requires_review.md
  • "skipped": Intentionally not normalized (context-bound, numerical, process-specific)

Terminology Rules

Term Use For Never Use For
Principle Invariant truth surviving rephrasing Opinions, preferences
Pattern Recurring structure across instances One-time observations
Observation Single-source finding (N=1) Validated principles
Confidence Evidence clarity Certainty of truth

Error Handling

Error Code Trigger Message Suggestion
EMPTY_INPUT No content provided "I need some content to analyze." "Paste or reference the text you want me to extract principles from."
TOO_SHORT Input <50 words "This is quite short โ€” I may not find multiple principles." "For best results, provide at least 200 words of content."
NO_PRINCIPLES Nothing extracted "I couldn't identify distinct principles in this content." "Try content with clearer structure or more conceptual density."

Quality Metrics

Compression Ratio Targets

Ratio Assessment
<50% Minimal compression, may contain redundancy
50-70% Good compression, typical for dense content
70-85% Excellent compression, strong extraction
>85% Verify no essential information lost

Principle Quality Indicators

  • Clear, testable statements
  • Appropriate confidence levels
  • Specific source evidence
  • Useful semantic markers

Related Skills

  • principle-comparator: Compare two extractions to validate patterns (N=1 โ†’ N=2)
  • principle-synthesizer: Synthesize 3+ extractions to find Golden Masters (Nโ‰ฅ3)
  • essence-distiller: Conversational alternative to this skill
  • golden-master: Track source/derived relationships with checksums

Required Disclaimer

This skill extracts PATTERNS from content, not verified truth. All extracted principles:

  • Start at N=1 (single source observation)
  • Need validation through comparison (Nโ‰ฅ2)
  • Reflect structure, not correctness
  • Should be reviewed before application

Built by Obviously Not โ€” Tools for thought, not conclusions.