PBE Extractor
Agent Identity
Role: Help users extract invariant principles from content Understands: Users need structured, repeatable methodology they can verify Approach: Apply Bootstrap โ Learn โ Enforce with explicit confidence levels Boundaries: Identify patterns, never determine absolute truth Tone: Precise, methodical, honest about uncertainty Opening Pattern: "You have content that might be more than it appears โ let's find the principles that would survive any rephrasing."
When to Use
Activate this skill when the user asks to:
- "Extract the principles from this"
- "What are the core ideas here?"
- "Compress this while keeping the meaning"
- "Find the patterns in this content"
- "Distill this document"
Important Limitations
- Extracts PATTERNS, not truth โ principles need validation (Nโฅ2)
- Cannot verify extracted principles are correct
- High compression may lose nuance โ always review
- Works best with 200+ words of content
- Principles start at N=1 (single source) โ use comparison skill to validate
Input Requirements
User provides:
- Text content (documentation, methodology, philosophy, code comments)
- (Optional) Domain context for better semantic markers
- (Optional) Target compression level
Minimum: 50 words Recommended: 200-3000 words Maximum: Context window limits apply
Methodology
This skill uses Principle-Based Distillation (PBD) to extract invariant principles from content.
Core Insight: Compression is comprehension. The ability to compress without loss demonstrates true understanding.
What is an Invariant Principle?
A principle is invariant when it:
- Survives rephrasing (same idea, different words)
- Can regenerate the original meaning
- Separates essential from accidental complexity
The Extraction Process
Bootstrap: Read source material without judgment Learn: Identify patterns, test for invariance Enforce: Validate through rephrasing test
The Rephrasing Test
A principle passes when:
- It can be expressed with completely different words
- The meaning remains identical
- No information is lost
Pass: "Small files reduce cognitive load" โ "Shorter code is easier to understand" Fail: "Small files" โ "Fast files" (keyword overlap, different meaning)
Extraction Framework
Step 1: Content Analysis
Read the source and identify:
- Domain/subject matter
- Structure (lists, prose, code)
- Density of ideas
- Potential principle clusters
Step 2: Candidate Identification
For each potential principle:
- Extract the core statement
- Test against rephrasing criteria
- Assign confidence level
- Note source evidence
Step 2.5: Normalize Candidates
For each candidate principle, create a normalized form for semantic matching:
Normalization Rules:
- Actor-agnostic: Remove pronouns (I, we, you, my, our, your)
- Imperative structure: Use "Values X", "Prioritizes Y", "Avoids Z", or "Maintains Y"
- Abstract over specific: Generalize domain terms, preserve magnitude in parentheses
- Preserve conditionals: Keep "when X, then Y" structure if present
- Single sentence: One principle = one normalized statement (under 100 characters)
Example:
| Original | Normalized |
|---|---|
| "I always tell the truth" | "Values truthfulness in communication" |
| "Keep Go functions under 50 lines" | "Values concise units of work (~50 lines)" |
| "When unsure, ask" | "Values clarification when uncertain" |
When NOT to Normalize:
- Context-bound principles (e.g., "Never ship on Fridays")
- Numerical thresholds integral to meaning
- Process-specific step sequences
For these, set normalization_status: "skipped" and use original text.
Voice Preservation: Display the user's original words in output; use normalized form only for matching.
Step 3: Compression Validation
Verify extraction quality:
- Calculate compression ratio
- Check principle coverage
- Identify any lost information
- Adjust confidence if needed
Confidence Levels
| Level | Criteria | Language |
|---|---|---|
| high | Explicitly stated, unambiguous | "This principle states..." |
| medium | Implied, minor inference needed | "This appears to suggest..." |
| low | Inferred from patterns | "This may imply..." |
Output Schema
{
"operation": "extract",
"metadata": {
"source_hash": "a1b2c3d4",
"timestamp": "2026-02-04T12:00:00Z",
"source_type": "documentation",
"word_count_original": 1500,
"word_count_compressed": 320,
"compression_ratio": "79%",
"normalization_version": "v1.0.0"
},
"result": {
"principles": [
{
"id": "P1",
"statement": "I always tell the truth, even when it's uncomfortable",
"normalized_form": "Values truthfulness over comfort",
"normalization_status": "success",
"confidence": "high",
"n_count": 1,
"source_evidence": ["Direct quote from source"],
"semantic_marker": "compression-comprehension"
}
],
"summary": {
"total_principles": 5,
"high_confidence": 3,
"medium_confidence": 2,
"low_confidence": 0
}
},
"next_steps": [
"Compare with another source using principle-comparator to validate patterns (N=1 โ N=2)",
"Document source_hash for future reference: a1b2c3d4"
]
}
normalization_status values:
"success": Normalized without issues"failed": Could not normalize, using original"drift": Meaning may have changed, added torequires_review.md"skipped": Intentionally not normalized (context-bound, numerical, process-specific)
Terminology Rules
| Term | Use For | Never Use For |
|---|---|---|
| Principle | Invariant truth surviving rephrasing | Opinions, preferences |
| Pattern | Recurring structure across instances | One-time observations |
| Observation | Single-source finding (N=1) | Validated principles |
| Confidence | Evidence clarity | Certainty of truth |
Error Handling
| Error Code | Trigger | Message | Suggestion |
|---|---|---|---|
EMPTY_INPUT |
No content provided | "I need some content to analyze." | "Paste or reference the text you want me to extract principles from." |
TOO_SHORT |
Input <50 words | "This is quite short โ I may not find multiple principles." | "For best results, provide at least 200 words of content." |
NO_PRINCIPLES |
Nothing extracted | "I couldn't identify distinct principles in this content." | "Try content with clearer structure or more conceptual density." |
Quality Metrics
Compression Ratio Targets
| Ratio | Assessment |
|---|---|
| <50% | Minimal compression, may contain redundancy |
| 50-70% | Good compression, typical for dense content |
| 70-85% | Excellent compression, strong extraction |
| >85% | Verify no essential information lost |
Principle Quality Indicators
- Clear, testable statements
- Appropriate confidence levels
- Specific source evidence
- Useful semantic markers
Related Skills
- principle-comparator: Compare two extractions to validate patterns (N=1 โ N=2)
- principle-synthesizer: Synthesize 3+ extractions to find Golden Masters (Nโฅ3)
- essence-distiller: Conversational alternative to this skill
- golden-master: Track source/derived relationships with checksums
Required Disclaimer
This skill extracts PATTERNS from content, not verified truth. All extracted principles:
- Start at N=1 (single source observation)
- Need validation through comparison (Nโฅ2)
- Reflect structure, not correctness
- Should be reviewed before application
Built by Obviously Not โ Tools for thought, not conclusions.