โ† Back to Coding Agents & IDEs
Coding Agents & IDEs by @thewulf7

avatar-video-messages

Generate and send video messages

Video Message

Generate avatar video messages from text or audio. The output is sent as a Telegram video note (circular format).

Installation

npm install -g openclaw-avatarcam

Configuration

Configure in TOOLS.md:

### Video Message (avatarcam)
- avatar: default.vrm
- background: #00FF00

Settings Reference

Setting | Default | Description
avatar | default.vrm | VRM avatar file path
background | #00FF00 | Color (hex) or image path
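
A minimal sketch of reading these settings from the shell, assuming TOOLS.md uses the layout shown above (a heading containing "(avatarcam)" followed by "- key: value" lines). The read_setting helper is hypothetical and not part of avatarcam.

```sh
# Hypothetical helper: print one avatarcam setting from TOOLS.md,
# falling back to a default when the file or the key is missing.
# Usage: read_setting <file> <key> <default>
read_setting() {
  file=$1 key=$2 default=$3
  value=$(awk -v key="$key" '
    # Track whether we are inside the "(avatarcam)" section.
    /^### / { in_section = ($0 ~ /\(avatarcam\)/); next }
    in_section && index($0, "- " key ":") == 1 {
      sub(/^- [^:]*:[ \t]*/, "")   # strip the "- key: " prefix
      print
      exit
    }
  ' "$file" 2>/dev/null) || value=""
  printf '%s\n' "${value:-$default}"
}
```

For example, read_setting TOOLS.md background "#00FF00" prints the configured background, or the default when none is set.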

Prerequisites

System Dependencies

Platform | Command
macOS | brew install ffmpeg
Linux | sudo apt-get install -y xvfb xauth ffmpeg
Windows | Install ffmpeg and add to PATH
Docker | See Docker section below

Note: macOS and Windows don't need xvfb; they have native display support.
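
Before the first run it can help to confirm the dependencies are on PATH. The sketch below is one way to do that; the function names are hypothetical, and the Linux list simply mirrors the apt-get command above.

```sh
# Print the commands avatarcam depends on for a given OS name (as from `uname -s`).
required_cmds() {
  if [ "$1" = "Linux" ]; then
    echo "ffmpeg xvfb-run xauth"   # xvfb-run comes from the xvfb package
  else
    echo "ffmpeg"                  # macOS and Windows use the native display
  fi
}

# Print each named command that cannot be found on PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Usage (not run here):
#   missing_cmds $(required_cmds "$(uname -s)")
```

An empty result means every listed command was found.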

Docker Users

Add to OPENCLAW_DOCKER_APT_PACKAGES:

build-essential procps curl file git ca-certificates xvfb xauth libgbm1 libxss1 libatk1.0-0 libatk-bridge2.0-0 libgdk-pixbuf2.0-0 libgtk-3-0 libasound2 libnss3 ffmpeg
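
As a sketch, and assuming OPENCLAW_DOCKER_APT_PACKAGES is an environment variable holding a space-separated package list that the OpenClaw Docker build reads (the source does not say how it is set), the list could be exported like this:

```sh
# Assumption: the OpenClaw Docker build picks this variable up from the environment.
# The backslash-newlines only wrap the line; the value is one space-separated list.
OPENCLAW_DOCKER_APT_PACKAGES="build-essential procps curl file git ca-certificates \
xvfb xauth libgbm1 libxss1 libatk1.0-0 libatk-bridge2.0-0 libgdk-pixbuf2.0-0 \
libgtk-3-0 libasound2 libnss3 ffmpeg"
export OPENCLAW_DOCKER_APT_PACKAGES
```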

Usage

# With color background
avatarcam --audio voice.mp3 --output video.mp4 --background "#00FF00"

# With image background
avatarcam --audio voice.mp3 --output video.mp4 --background "./bg.png"

# With custom avatar
avatarcam --audio voice.mp3 --output video.mp4 --avatar "./custom.vrm"
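
The examples above pass one optional flag at a time. A small wrapper, sketched below, passes all of them in one call and falls back to the documented defaults. It assumes the flags can be combined, which the workflow implies but does not state; render_avatar_video and the AVATARCAM_BIN override are hypothetical names, not avatarcam features.

```sh
# Hypothetical wrapper: run avatarcam with every documented flag.
# Usage: render_avatar_video <audio> <output> [avatar] [background]
render_avatar_video() {
  audio=$1 output=$2 avatar=${3:-default.vrm} background=${4:-"#00FF00"}
  # AVATARCAM_BIN only exists so the wrapper can be dry-run with another command.
  "${AVATARCAM_BIN:-avatarcam}" \
    --audio "$audio" --output "$output" \
    --avatar "$avatar" --background "$background"
}
```

Setting AVATARCAM_BIN=echo prints the arguments instead of rendering, which is a quick way to check what would be run.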

Sending as Video Note

Use OpenClaw's message tool with asVideoNote:

message action=send filePath=/tmp/video.mp4 asVideoNote=true

Workflow

  1. Read config from TOOLS.md (avatar, background)
  2. Generate TTS if given text: tts text="..." → audio path
  3. Run avatarcam with audio + settings → MP4 output
  4. Send as video note via message action=send filePath=... asVideoNote=true
  5. Return NO_REPLY after sending

Example Flow

User: "Send me a video message saying hello"

# 1. TTS
tts text="Hello! How are you today?" → /tmp/voice.mp3

# 2. Generate video
avatarcam --audio /tmp/voice.mp3 --output /tmp/video.mp4 --background "#00FF00"

# 3. Send as video note
message action=send filePath=/tmp/video.mp4 asVideoNote=true

# 4. Reply
NO_REPLY

Technical Details

Setting | Value
Resolution | 384x384 (square)
Frame rate | 30 fps constant
Max duration | 60 seconds
Video codec | H.264 (libx264)
Audio codec | AAC
Quality | CRF 18 (high quality)
Container | MP4
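
To confirm that a generated file matches the resolution and duration listed above, the values can be read with ffprobe (installed alongside FFmpeg) and compared. This is a sketch only; both function names are hypothetical.

```sh
# Succeed only if the values match the documented output specs.
# Arguments: width height duration_seconds (duration may be fractional).
meets_note_specs() {
  width=$1 height=$2 duration=$3
  [ "$width" -eq 384 ] && [ "$height" -eq 384 ] &&
    awk -v d="$duration" 'BEGIN { exit !(d <= 60) }'
}

# Read width, height and duration from a file with ffprobe, then check them.
probe_and_check() {
  file=$1
  dims=$(ffprobe -v error -select_streams v:0 \
    -show_entries stream=width,height -of csv=p=0 "$file") || return 1
  secs=$(ffprobe -v error -show_entries format=duration \
    -of csv=p=0 "$file") || return 1
  # dims looks like "384,384": split on the comma.
  meets_note_specs "${dims%%,*}" "${dims##*,}" "$secs"
}

# Usage (not run here):
#   probe_and_check /tmp/video.mp4 && echo "specs OK"
```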

Processing Pipeline

  1. Electron renders VRM avatar with lip sync at 1280x720
  2. WebM captured via canvas.captureStream(30)
  3. FFmpeg processes: crop → fps normalize → scale → encode
  4. Message tool sends via Telegram sendVideoNote API
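
The FFmpeg stage in step 3 can be pictured as a single filter chain. The sketch below builds one for a 1280x720 capture. It assumes a centered square crop, since the source does not state the crop region; build_note_filter and capture.webm are illustrative names, and this is not necessarily the command avatarcam runs internally.

```sh
# Build a -vf filter chain: centered square crop, constant fps, scale to target size.
# Usage: build_note_filter <src_width> <src_height> [size] [fps]
build_note_filter() {
  src_w=$1 src_h=$2 size=${3:-384} fps=${4:-30}
  # The square side is the shorter source dimension.
  if [ "$src_w" -lt "$src_h" ]; then side=$src_w; else side=$src_h; fi
  x=$(( (src_w - side) / 2 ))
  y=$(( (src_h - side) / 2 ))
  printf 'crop=%s:%s:%s:%s,fps=%s,scale=%s:%s\n' \
    "$side" "$side" "$x" "$y" "$fps" "$size" "$size"
}

# Example invocation (not run here), using the codec settings from Technical Details:
#   ffmpeg -i capture.webm -vf "$(build_note_filter 1280 720)" -t 60 \
#     -c:v libx264 -crf 18 -c:a aac video.mp4
```

For a 1280x720 source the crop is 720x720 starting at x = (1280 - 720) / 2 = 280, y = 0.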

Platform Support

Platform | Display | Notes
macOS | Native Quartz | No extra deps
Linux | xvfb (headless) | apt install xvfb
Windows | Native | No extra deps

Headless Rendering

Avatarcam auto-detects headless environments:

  • Uses xvfb-run when $DISPLAY is not set (Linux only)
  • macOS/Windows use native display
  • GPU stall warnings are safe to ignore
  • Generation time: ~1.5x realtime (20s audio ≈ 30s processing)
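
The detection rule and the timing estimate above can be sketched in shell as follows. This illustrates the described behavior and is not avatarcam's actual implementation; both function names are hypothetical.

```sh
# Print the launcher prefix needed for a headless run: "xvfb-run -a" on Linux
# when no display is set, nothing otherwise.
# Arguments default to the live values: [os_name] [display]
display_prefix() {
  os=${1-$(uname -s)}
  display=${2-${DISPLAY-}}
  if [ "$os" = "Linux" ] && [ -z "$display" ]; then
    printf '%s\n' "xvfb-run -a"   # -a picks a free X server number
  fi
}

# Rough processing time from the ~1.5x realtime figure (integer seconds).
estimate_processing_seconds() {
  echo $(( $1 * 3 / 2 ))
}
```

With these, estimate_processing_seconds 60 gives 90, so a maximum-length clip would take roughly a minute and a half to generate.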

Notes

  • Config is read from TOOLS.md
  • Clean up temp files after sending: rm /tmp/video*.mp4
  • For regular video (not circular), omit asVideoNote=true
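
As an alternative to the manual rm above, a per-run temporary directory with an exit trap avoids touching other files in /tmp. The helper below is a sketch with a hypothetical name.

```sh
# Run a command with a private temp directory appended as its last argument.
# The directory is removed when the command finishes, even if it fails.
with_temp_dir() {
  (
    dir=$(mktemp -d) || exit 1
    trap 'rm -rf "$dir"' EXIT   # fires when this subshell exits
    "$@" "$dir"
  )
}
```

The command passed to with_temp_dir would write its audio and video into that directory and send the message before returning, since the files are gone afterwards.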