# TubeScribe 🎬
Turn any YouTube video into a polished document + audio summary.
Drop a YouTube link and get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.
## 💸 Free & No Paid APIs
- No subscriptions or API keys: works out of the box
- Local processing: transcription, speaker detection, and TTS run on your machine
- Network access: fetching from YouTube (captions, metadata, comments) requires internet
- No data uploaded: nothing is sent to external services; all processing stays on your machine
- Safe sub-agent: the spawned sub-agent has strict instructions (no software installation, no network calls beyond YouTube)
## ✨ Features
- Transcript with summary and key quotes: export as DOCX, HTML, or Markdown
- Smart Speaker Detection: automatically identifies participants
- Audio Summaries: listen to key points (MP3/WAV)
- Clickable Timestamps: every quote links directly to that moment in the video
- YouTube Comments: viewer sentiment analysis and best comments
- Queue Support: send multiple links and they are processed in order
- Non-Blocking Workflow: the conversation continues while the video processes in the background
## 🎬 Works With Any Video
- Interviews & podcasts (multi-speaker detection)
- Lectures & tutorials (single speaker)
- Music videos (lyrics extraction)
- News & documentaries
- Any YouTube content with captions
## Quick Start
When the user sends a YouTube URL:
- Spawn a sub-agent with the full pipeline task immediately
- Reply: "🎬 TubeScribe is processing - I'll let you know when it's ready!"
- Continue the conversation (don't wait!)
- The sub-agent notification will announce completion with title and details
DO NOT BLOCK: spawn and move on instantly.
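The trigger above depends on recognizing a YouTube URL in the user's message. A minimal sketch of that check follows. The helper `find_video_id` is hypothetical (not part of TubeScribe), and the pattern assumes 11-character video IDs and covers only the common `watch?v=`, `youtu.be/`, and `shorts/` URL shapes.

```python
import re
from typing import Optional

# Hypothetical pattern: common YouTube URL shapes, capturing the 11-character video ID.
_YOUTUBE_RE = re.compile(
    r"https?://(?:www\.|m\.)?"
    r"(?:youtube\.com/(?:watch\?(?:\S*?&)?v=|shorts/)|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)


def find_video_id(message: str) -> Optional[str]:
    """Return the video ID of the first YouTube link in the message, or None."""
    match = _YOUTUBE_RE.search(message)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(find_video_id("Can you process https://youtu.be/dQw4w9WgXcQ for me?"))
```

If the helper returns `None`, the message is handled as ordinary conversation and no sub-agent is spawned.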
## First-Time Setup
Run setup to check dependencies and configure defaults:
```bash
python skills/tubescribe/scripts/setup.py
```
This checks for the `summarize` CLI, pandoc, ffmpeg, and Kokoro TTS.
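`setup.py` itself is not shown here, so the following is only a sketch of how such a dependency check could look with `shutil.which`. The helper name `missing_tools` is hypothetical. Kokoro TTS is left out because, per the configuration, it is located through `kokoro.path` rather than looked up on PATH.

```python
import shutil
from typing import Iterable, List

# Command-line tools named in the setup description.
CLI_TOOLS = ("summarize", "pandoc", "ffmpeg")


def missing_tools(names: Iterable[str] = CLI_TOOLS) -> List[str]:
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # Report only; installing anything is left to the user.
        print("Missing tools:", ", ".join(missing))
    else:
        print("All command-line tools found.")
```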
## Full Workflow (Single Sub-Agent)
Spawn ONE sub-agent that does the entire pipeline:
sessions_spawn(
task=f"""
## TubeScribe: Process {youtube_url}
⚠️ CRITICAL: Do NOT install any software.
No pip, brew, curl, venv, or binary downloads.
If a tool is missing, STOP and report what's needed.
Run the COMPLETE pipeline; do not stop until all steps are done.
### Step 1: Extract
```bash
python3 skills/tubescribe/scripts/tubescribe.py "{youtube_url}"
```
Note the Source and Output paths printed by the script. Use those exact paths in subsequent steps.
### Step 2: Read source JSON
Read the Source path from Step 1 output and note:
- metadata.title (for filename)
- metadata.video_id
- metadata.channel, upload_date, duration_string
### Step 3: Create formatted markdown
Write to the Output path from Step 1:
- `# **<title>**`
- Video info block: Channel, Date, Duration, URL (clickable). Empty line between each field.
- `## **Participants**`: table with bold headers:
  | **Name** | **Role** | **Description** |
  |----------|----------|-----------------|
- `## **Summary**`: 3-5 paragraphs of prose
- `## **Key Quotes**`: the 5 best, with clickable YouTube timestamps. Format each as:
  "Quote text here." - [12:34](https://www.youtube.com/watch?v=ID&t=754s)

  "Another quote." - [25:10](https://www.youtube.com/watch?v=ID&t=1510s)
  Use a regular dash `-`, NOT an em dash. Do NOT use blockquotes (`>`). Plain paragraphs only.
- `## **Viewer Sentiment**` (if comments exist)
- `## **Best Comments**` (if comments exist): top 5, NO lines between them:
  Comment text here.
  *- ▲ 123 @AuthorName*

  Next comment text here.
  *- ▲ 45 @AnotherAuthor*
  Attribution line: dash + italic. Just a blank line between comments, NO `---` separators.
- `## **Full Transcript**`: merge segments, speaker labels, clickable timestamps
### Step 4: Create DOCX
Clean the title for filename (remove special chars), then:
```bash
pandoc <output_path> -o ~/Documents/TubeScribe/<safe_title>.docx
```
### Step 5: Generate audio
Write the summary text to a temp file, then use TubeScribe's built-in audio generation:
```bash
# Write summary to temp file (use python3 to write, avoids shell escaping issues)
python3 -c "
text = '''YOUR SUMMARY TEXT HERE'''
with open('<temp_dir>/tubescribe_<video_id>_summary.txt', 'w') as f:
    f.write(text)
"
# Generate audio (auto-detects engine, voice, format from config)
python3 skills/tubescribe/scripts/tubescribe.py \
    --generate-audio <temp_dir>/tubescribe_<video_id>_summary.txt \
    --audio-output ~/Documents/TubeScribe/<safe_title>_summary
```
This reads ~/.tubescribe/config.json and uses the configured TTS engine (mlx/kokoro/builtin), voice blend, and speed automatically. Output format (mp3/wav) comes from config.
### Step 6: Cleanup
```bash
python3 skills/tubescribe/scripts/tubescribe.py --cleanup <video_id>
```
### Step 7: Open folder
```bash
open ~/Documents/TubeScribe/
```
### Report
Tell what was created: DOCX name, MP3 name + duration, video stats.
""",
    label="tubescribe",
    runTimeoutSeconds=900,
    cleanup="delete"
)
**After spawning, reply immediately:**
> 🎬 TubeScribe is processing - I'll let you know when it's ready!
Then continue the conversation. The sub-agent notification announces completion.
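Step 3 of the task asks for Key Quotes timestamps in the form `[12:34](https://www.youtube.com/watch?v=ID&t=754s)`. The sketch below shows the arithmetic behind that format. The helper names `timestamp_link` and `format_quote` are hypothetical, and the `H:MM:SS` label for videos longer than an hour is an assumption, since the source only shows `MM:SS` examples.

```python
def timestamp_link(video_id: str, seconds: int) -> str:
    """Build a Markdown link whose label is the time and whose URL seeks to it."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        label = f"{hours}:{minutes:02d}:{secs:02d}"  # assumed format past one hour
    else:
        label = f"{minutes}:{secs:02d}"
    return f"[{label}](https://www.youtube.com/watch?v={video_id}&t={seconds}s)"


def format_quote(text: str, video_id: str, seconds: int) -> str:
    """Format one key quote: quoted text, a regular dash, then the timestamp link."""
    return f'"{text}" - {timestamp_link(video_id, seconds)}'


if __name__ == "__main__":
    print(format_quote("Quote text here.", "ID", 754))
```

The `t=` parameter is the offset in whole seconds, so `12:34` becomes `12 * 60 + 34 = 754`.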
## Configuration
Config file: `~/.tubescribe/config.json`
```json
{
  "output": {
    "folder": "~/Documents/TubeScribe",
    "open_folder_after": true,
    "open_document_after": false,
    "open_audio_after": false
  },
  "document": {
    "format": "docx",
    "engine": "pandoc"
  },
  "audio": {
    "enabled": true,
    "format": "mp3",
    "tts_engine": "mlx"
  },
  "mlx_audio": {
    "path": "~/.openclaw/tools/mlx-audio",
    "model": "mlx-community/Kokoro-82M-bf16",
    "voice": "af_heart",
    "lang_code": "a",
    "speed": 1.05
  },
  "kokoro": {
    "path": "~/.openclaw/tools/kokoro",
    "voice_blend": { "af_heart": 0.6, "af_sky": 0.4 },
    "speed": 1.05
  },
  "processing": {
    "subagent_timeout": 600,
    "cleanup_temp_files": true
  }
}
```
### Output Options
| Option | Default | Description |
|---|---|---|
| `output.folder` | `~/Documents/TubeScribe` | Where to save files |
| `output.open_folder_after` | `true` | Open output folder when done |
| `output.open_document_after` | `false` | Auto-open generated document |
| `output.open_audio_after` | `false` | Auto-open generated audio summary |
### Document Options
| Option | Default | Values | Description |
|---|---|---|---|
| `document.format` | `docx` | `docx`, `html`, `md` | Output format |
| `document.engine` | `pandoc` | `pandoc` | Converter for DOCX (falls back to HTML) |
### Audio Options
| Option | Default | Values | Description |
|---|---|---|---|
| `audio.enabled` | `true` | `true`, `false` | Generate audio summary |
| `audio.format` | `mp3` | `mp3`, `wav` | Audio format (mp3 needs ffmpeg) |
| `audio.tts_engine` | `mlx` | `mlx`, `kokoro`, `builtin` | TTS engine (mlx = fastest on Apple Silicon) |
### MLX-Audio Options (preferred on Apple Silicon)
| Option | Default | Description |
|---|---|---|
| `mlx_audio.path` | `~/.openclaw/tools/mlx-audio` | mlx-audio venv location |
| `mlx_audio.model` | `mlx-community/Kokoro-82M-bf16` | MLX model to use |
| `mlx_audio.voice` | `af_heart` | Voice preset (used if no voice_blend) |
| `mlx_audio.voice_blend` | `{af_heart: 0.6, af_sky: 0.4}` | Custom voice mix (weighted blend) |
| `mlx_audio.lang_code` | `a` | Language code (a = US English) |
| `mlx_audio.speed` | `1.05` | Playback speed (1.0 = normal, 1.05 = 5% faster) |
### Kokoro PyTorch Options (fallback)
| Option | Default | Description |
|---|---|---|
| `kokoro.path` | `~/.openclaw/tools/kokoro` | Kokoro repo location |
| `kokoro.voice_blend` | `{af_heart: 0.6, af_sky: 0.4}` | Custom voice mix |
| `kokoro.speed` | `1.05` | Playback speed (1.0 = normal, 1.05 = 5% faster) |
### Processing Options
| Option | Default | Description |
|---|---|---|
| `processing.subagent_timeout` | `600` | Timeout for the sub-agent in seconds (increase for long videos) |
| `processing.cleanup_temp_files` | `true` | Remove /tmp files after completion |
### Comment Options
| Option | Default | Description |
|---|---|---|
| `comments.max_count` | `50` | Number of comments to fetch |
| `comments.timeout` | `90` | Timeout for comment fetching (seconds) |
### Queue Options
| Option | Default | Description |
|---|---|---|
| `queue.stale_minutes` | `30` | Consider a processing job stale after this many minutes |
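The option tables list defaults for keys, such as `comments.*` and `queue.*`, that do not appear in the sample config file. One way to get that behavior is to overlay the user's file on a defaults dictionary. The sketch below assumes this approach; `DEFAULTS` (a subset of the documented values), `deep_merge`, and `load_config` are hypothetical names, and TubeScribe's actual loader may differ.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

# Subset of the documented defaults, repeated here to keep the sketch self-contained.
DEFAULTS: Dict[str, Any] = {
    "output": {"folder": "~/Documents/TubeScribe", "open_folder_after": True},
    "audio": {"enabled": True, "format": "mp3", "tts_engine": "mlx"},
    "comments": {"max_count": 50, "timeout": 90},
    "queue": {"stale_minutes": 30},
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` applied recursively, key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Path) -> Dict[str, Any]:
    """Read the JSON config if it exists and overlay it on DEFAULTS."""
    if not path.exists():
        return copy.deepcopy(DEFAULTS)
    with path.open(encoding="utf-8") as handle:
        return deep_merge(DEFAULTS, json.load(handle))


if __name__ == "__main__":
    config = load_config(Path("~/.tubescribe/config.json").expanduser())
    print(config["audio"]["format"], config["queue"]["stale_minutes"])
```

With a recursive merge, a user file that sets only `audio.format` still inherits `audio.tts_engine` and every other section from the defaults.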
## Output Structure
```
~/Documents/TubeScribe/
├── {Video Title}.html          # Formatted document (or .docx / .md)
└── {Video Title}_summary.mp3   # Audio summary (or .wav)
```
After generation, TubeScribe opens the folder (not individual files) so you can access everything.
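The output names above derive from the video title, which the workflow cleans before use ("remove special chars"). The exact rule is not specified, so the sketch below is one possible interpretation: it drops characters that common filesystems reject and collapses whitespace. The helpers `safe_title` and `output_paths`, the 100-character cap, and the `untitled` fallback are all assumptions.

```python
import re
from pathlib import Path
from typing import Tuple


def safe_title(title: str, max_length: int = 100) -> str:
    """Drop characters that are unsafe in filenames and collapse whitespace."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "", title)    # characters rejected by common filesystems
    cleaned = re.sub(r"\s+", " ", cleaned).strip()  # collapse runs of whitespace
    return cleaned[:max_length].rstrip() or "untitled"


def output_paths(folder: str, title: str,
                 doc_ext: str = "docx", audio_ext: str = "mp3") -> Tuple[Path, Path]:
    """Return the document path and the audio-summary path for a video title."""
    root = Path(folder).expanduser()
    name = safe_title(title)
    return root / f"{name}.{doc_ext}", root / f"{name}_summary.{audio_ext}"


if __name__ == "__main__":
    doc, audio = output_paths("~/Documents/TubeScribe", 'What is AI? Part 1/2: "Intro"')
    print(doc.name)
    print(audio.name)
```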
## Dependencies
Required:
- `summarize` CLI: `brew install steipete/tap/summarize`
- Python 3.8+
Optional (better quality):
- `pandoc` (DOCX output): `brew install pandoc`
- `ffmpeg` (MP3 audio): `brew install ffmpeg`
- `yt-dlp` (YouTube comments): `brew install yt-dlp`
- mlx-audio (fastest TTS on Apple Silicon): `pip install mlx-audio` (uses MLX backend for Kokoro)
- Kokoro TTS (PyTorch fallback): see https://github.com/hexgrad/kokoro
### yt-dlp Search Paths
TubeScribe checks these locations (in order):
| Priority | Path | Source |
|---|---|---|
| 1 | `which yt-dlp` | System PATH |
| 2 | `/opt/homebrew/bin/yt-dlp` | Homebrew (Apple Silicon) |
| 3 | `/usr/local/bin/yt-dlp` | Homebrew (Intel) / Linux |
| 4 | `~/.local/bin/yt-dlp` | pip install --user |
| 5 | `~/.local/pipx/venvs/yt-dlp/bin/yt-dlp` | pipx |
| 6 | `~/.openclaw/tools/yt-dlp/yt-dlp` | TubeScribe auto-install |

If yt-dlp is not found, setup downloads a standalone binary to the tools directory. The tools-directory copy doesn't conflict with system installations.
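The lookup order in the table can be sketched as a PATH check followed by a walk over fixed candidates. This is an illustration rather than TubeScribe's own code: `find_yt_dlp` is a hypothetical name, and the `which` parameter exists only so the PATH lookup can be swapped out in tests.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Optional, Sequence

# Fallback locations from the table, in priority order (entries 2 to 6).
CANDIDATES = (
    "/opt/homebrew/bin/yt-dlp",
    "/usr/local/bin/yt-dlp",
    "~/.local/bin/yt-dlp",
    "~/.local/pipx/venvs/yt-dlp/bin/yt-dlp",
    "~/.openclaw/tools/yt-dlp/yt-dlp",
)


def find_yt_dlp(
    candidates: Sequence[str] = CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first usable yt-dlp path: PATH first, then the fixed candidates."""
    on_path = which("yt-dlp")
    if on_path:
        return on_path
    for candidate in candidates:
        path = Path(candidate).expanduser()
        # Accept only regular files that the current user may execute.
        if path.is_file() and os.access(path, os.X_OK):
            return str(path)
    return None


if __name__ == "__main__":
    print(find_yt_dlp() or "yt-dlp not found")
```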
## Queue Handling
When the user sends multiple YouTube URLs while one is processing:
### Check Before Starting
```bash
python skills/tubescribe/scripts/tubescribe.py --queue-status
```
### If Already Processing
```bash
# Add to queue instead of starting parallel processing
python skills/tubescribe/scripts/tubescribe.py --queue-add "NEW_URL"
# -> Replies: "Added to queue (position 2)"
```
### After Completion
```bash
# Check if more in queue
python skills/tubescribe/scripts/tubescribe.py --queue-next
# -> Automatically pops and processes next URL
```
### Queue Commands
| Command | Description |
|---|---|
| `--queue-status` | Show what's processing + queued items |
| `--queue-add URL` | Add URL to queue |
| `--queue-next` | Process next item from queue |
| `--queue-clear` | Clear entire queue |
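The queue behavior described above (one job at a time, later URLs waiting in order, and a job treated as stale after `queue.stale_minutes`) can be modeled in a few lines. This in-memory sketch is an assumption about the logic, not TubeScribe's implementation: the `JobQueue` class, its method names, the position numbering, and the choice to replace a stale job with the next submission are all hypothetical.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class JobQueue:
    stale_minutes: float = 30
    current: Optional[str] = None
    started_at: float = 0.0
    pending: Deque[str] = field(default_factory=deque)

    def is_busy(self, now: Optional[float] = None) -> bool:
        """True while a job is running and has not passed the stale threshold."""
        if self.current is None:
            return False
        now = time.time() if now is None else now
        return (now - self.started_at) < self.stale_minutes * 60

    def submit(self, url: str, now: Optional[float] = None) -> str:
        """Start the job if idle (or the running job is stale), otherwise queue it."""
        now = time.time() if now is None else now
        if self.is_busy(now):
            self.pending.append(url)
            return f"queued (position {len(self.pending)})"
        self.current, self.started_at = url, now
        return "started"

    def finish(self, now: Optional[float] = None) -> Optional[str]:
        """Mark the current job done and promote the next pending URL, if any."""
        now = time.time() if now is None else now
        self.current = None
        if self.pending:
            self.current, self.started_at = self.pending.popleft(), now
        return self.current


if __name__ == "__main__":
    queue = JobQueue()
    print(queue.submit("url1"))
    print(queue.submit("url2"))
    print(queue.finish())
```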
### Batch Processing (multiple URLs at once)
```bash
python skills/tubescribe/scripts/tubescribe.py url1 url2 url3
```
Processes all URLs sequentially with a summary at the end.
## Error Handling
The script detects and reports these errors with clear messages:
| Error | Message |
|---|---|
| Invalid URL | Not a valid YouTube URL |
| Private video | Video is private - can't access |
| Video removed | Video not found or removed |
| No captions | No captions available for this video |
| Age-restricted | Age-restricted video - can't access without login |
| Region-blocked | Video blocked in your region |
| Live stream | Live streams not supported - wait until it ends |
| Network error | Network error - check your connection |
| Timeout | Request timed out - try again later |
When an error occurs, report it to the user and don't proceed with that video.
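In a batch, this rule means a failing video is reported and skipped while the remaining URLs still run. A minimal sketch of that control flow follows. `VideoError` and `process_batch` are hypothetical names, and the `process` callable stands in for whatever actually handles a single URL.

```python
from typing import Callable, Dict, Iterable, List


class VideoError(Exception):
    """Hypothetical error type carrying one of the user-facing messages."""


def process_batch(urls: Iterable[str], process: Callable[[str], str]) -> Dict[str, List[str]]:
    """Run `process` on each URL in order, collecting results and per-video failures."""
    results: Dict[str, List[str]] = {"done": [], "failed": []}
    for url in urls:
        try:
            results["done"].append(process(url))
        except VideoError as exc:
            # Record the message for the user and move on to the next video.
            results["failed"].append(f"{url}: {exc}")
    return results


if __name__ == "__main__":
    def demo(url: str) -> str:
        if "private" in url:
            raise VideoError("Video is private - can't access")
        return f"processed {url}"

    print(process_batch(["url1", "private-url", "url3"], demo))
```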
## Tips
- For long videos (>30 min), increase the sub-agent timeout to 900s
- Speaker detection works best with clear interview/podcast formats
- Single-speaker videos (tutorials, lectures) skip speaker labels automatically
- Timestamps link directly to YouTube at that moment
- Use batch mode for multiple videos: `tubescribe url1 url2 url3`