tldw - YouTube Video Summarizer
too long; didn't watch
Extract and summarize YouTube video transcripts quickly and efficiently.
Overview
The tldw skill takes a YouTube URL, extracts the video transcript, and provides a comprehensive summary of the content. This allows you to quickly understand video content without watching the entire video.
Purpose
This skill solves the problem of information overload from video content. Instead of spending 10-60 minutes watching a video, you can get the key points, main arguments, and conclusions in a concise summary within seconds.
When to Use
Use this skill when:
- A user provides a YouTube video URL and asks for a summary
- You need to quickly understand video content without watching it
- You want to analyze or reference specific video content
- You need to extract information from educational, news, or documentary videos
How It Works
- Extraction: Uses yt-dlp to download video transcripts (captions/subtitles)
- Cleaning: Applies deduplication to remove artifacts from auto-generated captions
- Processing: Analyzes the cleaned transcript directly in the main agent session
- Summary: Returns a structured summary with main points, key arguments, and conclusions
Key Features
- Caching: Downloaded transcripts are cached locally to avoid re-downloading
- Deduplication: Removes duplicate lines common in auto-generated captions
- Multi-format support: Works with VTT, SRT, and JSON caption formats
- Cookie support: Can access age-restricted content with a cookie file
- Comprehensive summaries: Provides thesis, key examples, comparisons, and conclusions
- Fast processing: Typical videos summarized in seconds
Attribution
This skill is based on the tldw project by stong. Full attribution and licensing details are available in ATTRIBUTION.md.
Usage Instructions
Basic Usage
When a user provides a YouTube URL and requests a summary, use the following workflow:
- Extract the transcript using the extraction script
- Parse the JSON output to get the cleaned transcript
- Summarize the transcript directly (do not use sub-agents for large transcripts)
- Return a structured summary to the user
Command Syntax
cd tldw/ && \
venv/bin/python scripts/extract_transcript.py \
--json --cache-dir cache "YOUTUBE_URL"
Processing the Output
The script returns JSON with the following structure:
{
"transcript": "Full cleaned transcript text...",
"video_id": "video_id_here",
"title": "Video Title",
"description": "Video description...",
"duration": 1234,
"uploader": "Channel Name",
"upload_date": "20260101",
"view_count": 12345,
"webpage_url": "https://www.youtube.com/watch?v=..."
}
Extract the transcript field and process it directly to create a comprehensive summary.
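As a minimal sketch of this parsing step, the following Python reads the JSON structure shown above and pulls out the transcript plus a few metadata fields. The function name `parse_extraction_output` and the returned key names are illustrative assumptions, not part of the script; only the input field names come from the structure above.

```python
import json


def parse_extraction_output(raw: str) -> dict:
    """Parse the script's JSON output and keep the fields useful for a summary.

    Hypothetical helper: the input keys match the documented JSON structure,
    the output keys are chosen here for illustration only.
    """
    data = json.loads(raw)
    transcript = data.get("transcript", "")
    if not transcript.strip():
        # An empty transcript usually means there is nothing to summarize.
        raise ValueError("JSON output contains no transcript text")
    return {
        "transcript": transcript,
        "title": data.get("title", ""),
        "uploader": data.get("uploader", ""),
        "duration_seconds": data.get("duration", 0),
        "url": data.get("webpage_url", ""),
    }


# Sample input mirroring the documented structure.
sample = json.dumps({
    "transcript": "Full cleaned transcript text...",
    "video_id": "video_id_here",
    "title": "Video Title",
    "duration": 1234,
    "uploader": "Channel Name",
})
info = parse_extraction_output(sample)
```

The title, uploader, and duration are handy context for the header of the summary; the transcript itself is what gets summarized.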
Command Options
- --json: Output in JSON format (recommended for parsing)
- --cache-dir <path>: Specify cache directory (default: cache/)
- --cookies <file>: Path to Netscape-format cookie file for age-restricted content
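When the command is assembled programmatically, building it as an argument list avoids shell-quoting problems with URLs. The sketch below uses only the three options listed above; the helper name `build_extract_command` and its parameters are hypothetical, and the relative paths assume the working directory is tldw/.

```python
from typing import List, Optional


def build_extract_command(
    url: str,
    cache_dir: str = "cache",
    cookies: Optional[str] = None,
) -> List[str]:
    """Build the argument list for the extraction script (hypothetical helper)."""
    cmd = [
        "venv/bin/python",
        "scripts/extract_transcript.py",
        "--json",
        "--cache-dir",
        cache_dir,
    ]
    if cookies:
        # Only needed for age-restricted or members-only videos.
        cmd += ["--cookies", cookies]
    cmd.append(url)
    return cmd


cmd = build_extract_command("https://www.youtube.com/watch?v=VIDEO_ID")
# One way to run it, from the directory that contains tldw/:
#   subprocess.run(cmd, cwd="tldw", capture_output=True, text=True)
```

Passing a list to `subprocess.run` without `shell=True` means characters such as `&` in the URL are not interpreted by a shell.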
Example Workflow
# 1. Extract transcript
cd tldw/ && \
venv/bin/python scripts/extract_transcript.py \
--json --cache-dir cache "https://www.youtube.com/watch?v=VIDEO_ID"
# 2. Parse the JSON output and extract the transcript field
# 3. Summarize the transcript directly (include main points, key arguments, conclusions)
# 4. Return formatted summary to user
Accessing Age-Restricted Content
For age-restricted or members-only videos, export cookies from your browser:
- Install a browser extension like "Get cookies.txt LOCALLY"
- Navigate to YouTube while logged in
- Export cookies in Netscape format
- Save to the tldw directory (e.g., youtube_cookies.txt)
- Use with: --cookies youtube_cookies.txt
Error Handling
No Captions Available
Error message: "No subtitles/captions found"
What it means: The video has no auto-generated or manual captions available.
Solution: Inform the user that the video cannot be transcribed because it lacks captions.
Invalid URL
Error message: "ERROR: unable to download video data"
What it means: The URL is malformed, the video doesn't exist, or it's private/deleted.
Solution: Verify the URL is correct and check if the video is publicly accessible.
Age-Restricted Content
Error message: "Sign in to confirm your age" or similar authentication errors
What it means: The video requires age verification or YouTube login.
Solution: Use the --cookies flag with exported browser cookies (see "Accessing Age-Restricted Content" above).
Network/Connection Errors
Error messages: "Unable to download", "Connection timeout", extraction failures
What it means: Network issues, YouTube blocking the request, or outdated yt-dlp that's incompatible with current YouTube.
Solution:
- First, update yt-dlp:
cd tldw/ && \
venv/bin/pip install --upgrade yt-dlp
- Retry the extraction
- If still failing: check internet connection or wait and try later
YouTube frequently changes its API, so keeping yt-dlp up to date is essential.
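The message-to-cause mapping in the error sections above can be sketched as a small classifier. This is an illustration only: the function name and category labels are assumptions, the matching is plain substring search on the messages quoted above, and the script's real error text may differ in detail.

```python
def classify_error(stderr_text: str) -> str:
    """Map error output to one of the documented error categories.

    Hypothetical helper. Checks are ordered so that the more specific
    "unable to download video data" is tested before the generic
    "unable to download".
    """
    text = stderr_text.lower()
    if "no subtitles/captions found" in text:
        return "no_captions"
    if "sign in to confirm your age" in text:
        return "age_restricted"
    if "unable to download video data" in text:
        return "invalid_url"
    if "unable to download" in text or "connection timeout" in text:
        return "network"
    return "unknown"


category = classify_error("ERROR: unable to download video data")
```

A "network" result is the cue to update yt-dlp and retry; "no_captions" and "invalid_url" are better reported to the user without retrying.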
Cache Issues
Symptoms: Permission errors, disk full errors
What it means: The cache directory has permission problems or insufficient disk space.
Solution: Check available disk space with df -h and verify write permissions on the cache/ directory.
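The same two checks can be done from Python before running an extraction. The sketch below is an assumption-laden illustration: the helper name, the problem labels, and the 50 MB default threshold are all chosen here, not taken from the script.

```python
import os
import shutil
import tempfile


def check_cache_dir(path: str, min_free_bytes: int = 50 * 1024 * 1024) -> list:
    """Return a list of problems found with the cache directory (hypothetical helper)."""
    if not os.path.isdir(path):
        return ["missing"]
    problems = []
    if not os.access(path, os.W_OK):
        # The current user cannot create files in the directory.
        problems.append("not_writable")
    if shutil.disk_usage(path).free < min_free_bytes:
        # Free space on the filesystem holding the cache is below the threshold.
        problems.append("low_disk_space")
    return problems


# Demonstration on a throwaway directory instead of the real cache/.
with tempfile.TemporaryDirectory() as tmp:
    healthy = check_cache_dir(tmp, min_free_bytes=0)
    missing = check_cache_dir(os.path.join(tmp, "does-not-exist"))
```

An empty list means the directory exists, is writable, and has at least the requested free space.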
Large Transcript Handling
Note: Transcripts over 50,000 characters may take longer to process.
Best practice: Process large transcripts directly in the main agent session. Do not delegate to sub-agents, as they have been found unreliable with large payloads.
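A quick size check makes it easy to tell when this note applies. The sketch below uses the 50,000-character figure from the note above; the function name and the returned keys are illustrative assumptions.

```python
LARGE_TRANSCRIPT_CHARS = 50_000  # threshold taken from the note above


def describe_transcript_size(transcript: str) -> dict:
    """Report basic size figures for a transcript (hypothetical helper)."""
    chars = len(transcript)
    return {
        "characters": chars,
        "words": len(transcript.split()),
        "is_large": chars > LARGE_TRANSCRIPT_CHARS,
    }


short = describe_transcript_size("a short transcript")
long_ = describe_transcript_size("word " * 20_000)  # 100,000 characters
```

When `is_large` is true, expect slower processing, and keep the work in the main agent session as described above.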
Debugging
To see the full error output (not just the last 100 lines), run the extraction command directly:
cd tldw/ && \
venv/bin/python scripts/extract_transcript.py \
--json --cache-dir cache "YOUTUBE_URL"
To inspect cached transcripts:
ls -lh tldw/cache/
Limitations
Caption Dependency
- The skill only works with videos that have captions/subtitles available
- Cannot transcribe a video from its audio alone (no built-in speech-to-text capability)
- Auto-generated captions may contain errors, typos, or timing artifacts
- Deduplication helps clean up auto-generated caption issues but isn't perfect
Language Support
- Depends on available caption languages provided by YouTube
- The script extracts whatever captions are available (auto-generated or manual)
- Non-English captions work, but summarization quality depends on the language model's capabilities
- English captions typically provide the best results
Video Length
- Very long videos (2+ hours) may produce extremely large transcripts (100k+ characters)
- Processing time grows with transcript length
- No hard limit, but practical considerations for context window and processing time apply
YouTube-Only
- Currently only supports YouTube URLs
- Does not work with other video platforms (Vimeo, Dailymotion, TikTok, etc.)
- While yt-dlp supports many platforms, this script is optimized specifically for YouTube
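Because of this restriction, it can help to reject non-YouTube URLs before calling the script. The following is a minimal sketch using only the standard library; the host list and the function name are assumptions made here, and the script may apply its own, different validation.

```python
from urllib.parse import urlparse

# Assumed set of accepted hostnames; adjust as needed.
YOUTUBE_HOSTS = {
    "youtube.com",
    "www.youtube.com",
    "m.youtube.com",
    "music.youtube.com",
    "youtu.be",
}


def is_youtube_url(url: str) -> bool:
    """Return True if the URL is http(s) and points at a known YouTube host."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return parsed.scheme in ("http", "https") and host in YOUTUBE_HOSTS


ok = is_youtube_url("https://www.youtube.com/watch?v=VIDEO_ID")
rejected = is_youtube_url("https://vimeo.com/123456")
```

This only checks the host, not whether the video exists or has captions; those cases still surface through the errors described under Error Handling.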
Private/Restricted Content
- Cannot access truly private videos (not shared publicly)
- Members-only or channel membership content requires cookies from an authenticated session
- Live streams may not have captions available until after the stream has ended
- Some geographic restrictions cannot be bypassed even with cookies
Deduplication Limitations
- The deduplication logic removes consecutive duplicate lines
- May occasionally remove legitimate repeated phrases or refrains
- Designed primarily for auto-generated caption artifacts, not all repetition scenarios
- Manual captions typically don't need deduplication
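To make the behavior and its main limitation concrete, here is a minimal sketch of consecutive-duplicate removal. It is not the script's actual implementation, only an illustration of the technique described above; the function name is hypothetical, and skipping blank lines is an added assumption.

```python
def dedupe_consecutive(lines):
    """Drop blank lines and any line identical to the previous kept line."""
    cleaned = []
    previous = None
    for line in lines:
        text = line.strip()
        if not text:
            continue  # assumption: blank caption lines carry no content
        if text == previous:
            continue  # consecutive duplicate, typical of auto-generated captions
        cleaned.append(text)
        previous = text
    return cleaned


captions = [
    "welcome back to the channel",
    "welcome back to the channel",
    "today we look at caching",
    "",
    "today we look at caching",
    "welcome back to the channel",
]
result = dedupe_consecutive(captions)
```

Here `result` keeps three lines: the final "welcome back to the channel" survives because it is not adjacent to the earlier copy. A refrain that legitimately repeats on consecutive lines would be collapsed to one line, which is the limitation noted above.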
No Audio Extraction
- This skill extracts text transcripts only, not audio files
- For audio extraction or processing, other tools (like yt-dlp directly with audio flags) would be needed
- Focus is on text-based content analysis, not media files