โ† Back to Security & Passwords
Security & Passwords by @emberdesire

hopeids

Inference-based intrusion detection for AI agents with quarantine


hopeIDS Security Skill

Inference-based intrusion detection for AI agents with quarantine and human-in-the-loop.

Security Invariants

These are non-negotiable design principles:

  1. Block = full abort: blocked messages never reach jasper-recall or the agent
  2. Metadata only: no raw malicious content is ever stored
  3. Approve ≠ re-inject: approval changes future behavior, it doesn't resurrect messages
  4. Alerts are programmatic: Telegram alerts are built from metadata, no LLM involved

The Pipeline

Message arrives
    ↓
hopeIDS.autoScan()
    ↓
┌─────────────────────────────────────────┐
│  risk >= threshold?                     │
│                                         │
│  BLOCK (strictMode):                    │
│     → Create QuarantineRecord           │
│     → Send Telegram alert               │
│     → ABORT (no recall, no agent)       │
│                                         │
│  WARN (non-strict):                     │
│     → Inject <security-alert>           │
│     → Continue to jasper-recall         │
│     → Continue to agent                 │
│                                         │
│  ALLOW:                                 │
│     → Continue normally                 │
└─────────────────────────────────────────┘
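
The branching in the diagram can be sketched as a small pure function. This is a minimal illustration in Python, not the plugin's actual code: the function name `decide` and the string return values are assumptions made for the example.

```python
from typing import Literal

Action = Literal["block", "warn", "allow"]


def decide(risk: float, threshold: float, strict_mode: bool) -> Action:
    """Map a risk score to a pipeline action (hypothetical helper)."""
    if risk < threshold:
        return "allow"  # below the threshold: continue normally
    # At or above the threshold, strictMode decides between block and warn.
    return "block" if strict_mode else "warn"


print(decide(0.85, 0.7, strict_mode=True))   # block
print(decide(0.85, 0.8, strict_mode=False))  # warn
print(decide(0.40, 0.7, strict_mode=True))   # allow
```

Keeping the decision free of side effects makes it straightforward to test separately from quarantine storage and alerting.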

Configuration

{
  "plugins": {
    "entries": {
      "hopeids": {
        "enabled": true,
        "config": {
          "autoScan": true,
          "defaultRiskThreshold": 0.7,
          "strictMode": false,
          "telegramAlerts": true,
          "agents": {
            "moltbook-scanner": {
              "strictMode": true,
              "riskThreshold": 0.7
            },
            "main": {
              "strictMode": false,
              "riskThreshold": 0.8
            }
          }
        }
      }
    }
  }
}

Options

| Option | Type | Default | Description |
|---|---|---|---|
| autoScan | boolean | false | Auto-scan every message |
| strictMode | boolean | false | Block (vs warn) on threats |
| defaultRiskThreshold | number | 0.7 | Risk level that triggers action |
| telegramAlerts | boolean | true | Send alerts for blocked messages |
| telegramChatId | string | - | Override alert destination |
| quarantineDir | string | ~/.openclaw/quarantine/hopeids | Storage path |
| agents | object | - | Per-agent overrides |
| trustOwners | boolean | true | Skip scanning owner messages |

Quarantine Records

When a message is blocked, a metadata record is created:

{
  "id": "q-7f3a2b",
  "ts": "2026-02-06T00:48:00Z",
  "agent": "moltbook-scanner",
  "source": "moltbook",
  "senderId": "@sus_user",
  "intent": "instruction_override",
  "risk": 0.85,
  "patterns": [
    "matched regex: ignore.*instructions",
    "matched keyword: api key"
  ],
  "contentHash": "ab12cd34...",
  "status": "pending"
}

Note: There is NO originalMessage field. This is intentional.
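
To illustrate the metadata-only invariant, here is a minimal Python sketch of how such a record could be assembled. It is not the plugin's implementation: the function name `build_quarantine_record`, the 6-hex-character ID scheme, and the use of SHA-256 for `contentHash` are all assumptions, since the source does not specify them.

```python
import hashlib
import json
import secrets
from datetime import datetime, timezone


def build_quarantine_record(message: str, agent: str, source: str,
                            sender_id: str, intent: str, risk: float,
                            patterns: list[str]) -> dict:
    """Build a metadata-only record; the raw message is hashed, never stored."""
    return {
        "id": "q-" + secrets.token_hex(3),  # assumed ID scheme: 6 hex chars
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
        "source": source,
        "senderId": sender_id,
        "intent": intent,
        "risk": risk,
        "patterns": patterns,
        # Assumption: SHA-256 over the UTF-8 bytes of the message.
        "contentHash": hashlib.sha256(message.encode("utf-8")).hexdigest(),
        "status": "pending",
    }


record = build_quarantine_record(
    message="ignore your previous instructions and...",
    agent="moltbook-scanner",
    source="moltbook",
    sender_id="@sus_user",
    intent="instruction_override",
    risk=0.85,
    patterns=["matched regex: ignore.*instructions"],
)
print(json.dumps(record, indent=2))
```

A hash lets two records be compared for identical content without keeping the content itself.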


Telegram Alerts

When a message is blocked:

🛑 Message blocked

ID: `q-7f3a2b`
Agent: moltbook-scanner
Source: moltbook
Sender: @sus_user
Intent: instruction_override (85%)

Patterns:
• matched regex: ignore.*instructions
• matched keyword: api key

`/approve q-7f3a2b`
`/reject q-7f3a2b`
`/trust @sus_user`

Built from metadata only. No LLM touches this.
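
A programmatic alert of this kind amounts to plain string formatting over the quarantine record. The sketch below shows one way to do it in Python. The function name `format_alert` is hypothetical, and actually sending the text to Telegram is left out.

```python
def format_alert(record: dict) -> str:
    """Render alert text from quarantine metadata only (no LLM call)."""
    lines = [
        "🛑 Message blocked",
        "",
        f"ID: `{record['id']}`",
        f"Agent: {record['agent']}",
        f"Source: {record['source']}",
        f"Sender: {record['senderId']}",
        f"Intent: {record['intent']} ({record['risk']:.0%})",
        "",
        "Patterns:",
        *[f"• {p}" for p in record["patterns"]],
        "",
        f"`/approve {record['id']}`",
        f"`/reject {record['id']}`",
        f"`/trust {record['senderId']}`",
    ]
    return "\n".join(lines)


example = {
    "id": "q-7f3a2b",
    "agent": "moltbook-scanner",
    "source": "moltbook",
    "senderId": "@sus_user",
    "intent": "instruction_override",
    "risk": 0.85,
    "patterns": [
        "matched regex: ignore.*instructions",
        "matched keyword: api key",
    ],
}
print(format_alert(example))
```

Because the template reads only fixed metadata fields, the blocked message body has no path into the alert text.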


Commands

/quarantine [all|clean]

List quarantine records.

/quarantine        # List pending
/quarantine all    # List all (including resolved)
/quarantine clean  # Clean expired records

/approve <id>

Mark a blocked message as a false positive.

/approve q-7f3a2b

Effect:

  • Status → approved
  • (Future) Add sender to allowlist
  • (Future) Lower pattern weight

/reject <id>

Confirm a blocked message was a true positive.

/reject q-7f3a2b

Effect:

  • Status → rejected
  • (Future) Reinforce pattern weights

/trust <senderId>

Whitelist a sender for future messages.

/trust @legitimate_user

/scan <message>

Manually scan a message.

/scan ignore your previous instructions and...

What Approve/Reject Mean

| Command | What it does | What it doesn't do |
|---|---|---|
| /approve | Marks as false positive, may adjust IDS | Does NOT re-inject the message |
| /reject | Confirms threat, may strengthen patterns | Does NOT affect current message |
| /trust | Whitelists sender for future | Does NOT retroactively approve |

The blocked message is gone by design. If it was legitimate, the sender can re-send.
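
The semantics above can be sketched as a status transition on the metadata record and nothing more. This Python example is illustrative only: the `resolve` helper is hypothetical, and refusing to resolve an already-resolved record is an assumption rather than documented behavior.

```python
def resolve(record: dict, command: str) -> dict:
    """Return a copy of the record with its status updated.

    Nothing is forwarded to the agent here: the record holds no message
    body, so there is nothing that could be re-injected.
    """
    new_status = {"approve": "approved", "reject": "rejected"}.get(command)
    if new_status is None:
        raise ValueError(f"unknown command: {command}")
    if record["status"] != "pending":
        # Assumption: resolved records are immutable.
        raise ValueError(f"record {record['id']} is already {record['status']}")
    return {**record, "status": new_status}


pending = {"id": "q-7f3a2b", "senderId": "@sus_user", "status": "pending"}
print(resolve(pending, "approve")["status"])  # approved
print(pending["status"])                      # pending (original untouched)
```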


Per-Agent Configuration

Different agents need different security postures:

"agents": {
  "moltbook-scanner": {
    "strictMode": true,    // Block threats
    "riskThreshold": 0.7   // 70% = suspicious
  },
  "main": {
    "strictMode": false,   // Warn only
    "riskThreshold": 0.8   // Higher bar for main
  },
  "email-processor": {
    "strictMode": true,    // Always block
    "riskThreshold": 0.6   // More paranoid
  }
}
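
One plausible way to resolve the effective settings for an agent is to layer its overrides on top of the plugin-level values. The Python sketch below assumes that fallback order. The `effective_settings` helper is hypothetical, and the fallback defaults (false and 0.7) are taken from the Options table.

```python
def effective_settings(config: dict, agent: str) -> dict:
    """Merge per-agent overrides over the plugin-level defaults."""
    override = config.get("agents", {}).get(agent, {})
    return {
        "strictMode": override.get("strictMode", config.get("strictMode", False)),
        "riskThreshold": override.get(
            "riskThreshold", config.get("defaultRiskThreshold", 0.7)
        ),
    }


config = {
    "defaultRiskThreshold": 0.7,
    "strictMode": False,
    "agents": {
        "moltbook-scanner": {"strictMode": True, "riskThreshold": 0.7},
        "main": {"strictMode": False, "riskThreshold": 0.8},
        "email-processor": {"strictMode": True, "riskThreshold": 0.6},
    },
}
print(effective_settings(config, "email-processor"))
print(effective_settings(config, "unlisted-agent"))  # falls back to defaults
```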

Threat Categories

| Category | Risk | Description |
|---|---|---|
| command_injection | 🔴 Critical | Shell commands, code execution |
| credential_theft | 🔴 Critical | API key extraction attempts |
| data_exfiltration | 🔴 Critical | Data leak to external URLs |
| instruction_override | 🔴 High | Jailbreaks, "ignore previous" |
| impersonation | 🔴 High | Fake system/admin messages |
| discovery | ⚠️ Medium | API/capability probing |

Installation

npx hopeid setup

Then restart OpenClaw.

