# ffmpeg-ai

A zero-cost Python CLI that turns any topic into a complete YouTube Short in one command.
Give it a topic → get back a polished 1080×1920 vertical MP4 with AI script, scene images, natural voice-over, perfectly timed burned-in captions, and smooth Ken Burns zooms — all using **only free services** and local tools.
No subscriptions, no watermarks, no web UIs. Just `ffmpeg-ai generate "your topic"` and you're done.
## What It Does
ffmpeg-ai is a complete end-to-end generative video pipeline built for speed and zero cost.
You feed it a single topic and it:
1. Uses a free-tier LLM via OpenRouter to write a tight, scene-broken script (30–60 seconds of narration)
2. Pulls one AI-generated image per scene from Pollinations.ai (no account, no rate limits)
3. Synthesizes a natural-sounding voice-over using Microsoft’s free edge-tts engine
4. Runs **faster-whisper** locally on the audio to generate accurate timed captions
5. Hands everything to ffmpeg for final assembly: variable-speed zooms on each image, audio sync, ASS subtitle burn-in, H.264/AAC encoding
Result: a ready-to-upload YouTube Short (or TikTok/Reels) in under 2 minutes on decent hardware.
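The five stages above chain together in a fixed order. A minimal sketch of the plan (stage names and the `plan` helper are illustrative, not the real `pipeline.py` API):

```python
# Illustrative stage list; does not mirror the actual pipeline.py internals.
PIPELINE_STAGES = ["script", "images", "voiceover", "captions", "compose"]

def plan(topic: str, dry_run: bool = False) -> list[str]:
    """Return the ordered steps that would run for a topic."""
    steps = [f"{stage}:{topic}" for stage in PIPELINE_STAGES]
    if dry_run:
        # --dry-run only reports the plan; no network calls or encoding
        for step in steps:
            print("would run", step)
    return steps
```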
## Detailed Pipeline Breakdown
### 1. Script Generation (`ai/openrouter.py`)
- Prompts a free OpenRouter model (you can swap models in code)
- Returns structured JSON: array of `{ "scene": "...", "narration": "..." }`
- Script is kept short and punchy by design
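A request to OpenRouter's OpenAI-compatible chat endpoint might be built like this. The free model slug and prompt wording below are assumptions, not the module's actual values:

```python
# Sketch of the request the script generator might send.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_script_request(topic: str,
                         model: str = "meta-llama/llama-3.1-8b-instruct:free") -> dict:
    """Build an OpenAI-compatible chat payload asking for scene-broken JSON."""
    prompt = (
        f"Write a 30-60 second YouTube Short script about: {topic}. "
        'Respond ONLY with a JSON array of '
        '{"scene": "...", "narration": "..."} objects.'
    )
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

# Sending it (requires your OpenRouter key from .env):
#   resp = requests.post(OPENROUTER_URL, json=build_script_request(topic),
#                        headers={"Authorization": f"Bearer {key}"})
#   scenes = json.loads(resp.json()["choices"][0]["message"]["content"])
```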
### 2. Image Generation (`ai/images.py`)
- One HTTP call per scene to `https://image.pollinations.ai/prompt/...`
- Images are downloaded to a temp folder and resized to 1080×1920
- No authentication — truly free and instant
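The per-scene URL can be assembled with nothing but the standard library. The `width`/`height` query parameters shown here are an assumption about how the fetcher requests portrait images; resizing also happens locally after download:

```python
from urllib.parse import quote

BASE = "https://image.pollinations.ai/prompt/"

def pollinations_url(prompt: str, width: int = 1080, height: int = 1920) -> str:
    """Build a Pollinations image URL for one scene prompt."""
    # quote() percent-encodes spaces and punctuation in the prompt path
    return f"{BASE}{quote(prompt)}?width={width}&height={height}"
```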
### 3. Voice-over (`ai/tts.py`)
- Uses `edge-tts` with the excellent `en-US-AndrewNeural` voice (easy to change)
- Saves as high-quality WAV
- Supports any edge-tts voice — just edit one line
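A minimal `edge-tts` wrapper might look like this; `join_narration` is a hypothetical helper for flattening the script into one narration string, and edge-tts itself emits MP3 by default, which can be transcoded afterwards:

```python
import asyncio

VOICE = "en-US-AndrewNeural"  # swap in any other edge-tts voice here

def join_narration(scenes: list[dict]) -> str:
    """Flatten per-scene narration into a single voice-over script."""
    return " ".join(s["narration"].strip() for s in scenes)

async def synthesize(text: str, out_path: str, voice: str = VOICE) -> None:
    # Requires `pip install edge-tts`; Communicate streams the synthesized
    # audio from Microsoft's service and save() writes it to disk.
    import edge_tts
    await edge_tts.Communicate(text, voice).save(out_path)

# asyncio.run(synthesize(join_narration(scenes), "voiceover.mp3"))
```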
### 4. Captions (`video/captions.py`)
- **faster-whisper** (large-v3 model by default) transcribes the audio
- Converts timestamps to ASS format with styled subtitles (white text, black outline, bottom third)
- Handles word-level timing for buttery-smooth karaoke-style captions
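ASS timestamps use `H:MM:SS.cc` (centiseconds), so converting Whisper's float seconds takes a small helper. This sketch assumes a style named `Default` declared in a standard ASS header:

```python
def ass_timestamp(seconds: float) -> str:
    """Format float seconds as ASS 'H:MM:SS.cc' (centisecond precision)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"

def dialogue_line(start: float, end: float, text: str) -> str:
    """Emit one ASS Dialogue event (assumes a 'Default' style exists)."""
    return (f"Dialogue: 0,{ass_timestamp(start)},{ass_timestamp(end)},"
            f"Default,,0,0,0,,{text}")

# Word timings come from faster-whisper, roughly:
#   from faster_whisper import WhisperModel
#   segments, _ = WhisperModel("large-v3").transcribe("vo.wav",
#                                                     word_timestamps=True)
```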
### 5. Video Composition (`video/composer.py`)
This is the magic. A series of ffmpeg filters:
- `zoompan` with variable speed per scene (slow pan on important images)
- Scale + pad to exact 1080×1920
- Overlay audio
- Burn ASS subtitles with `subtitles=` filter
- Two-pass H.264 encoding for maximum quality at small file size
All ffmpeg commands are built programmatically and logged for debugging.
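A single-scene, single-pass sketch of the kind of command the composer builds (the real composer handles multiple scenes and two-pass encoding; the zoom constants here are illustrative):

```python
def compose_cmd(image: str, audio: str, subs: str, out: str,
                duration: float, fps: int = 25) -> list[str]:
    """Build one illustrative ffmpeg invocation for a single scene."""
    frames = int(duration * fps)  # zoompan's d= option counts output frames
    vf = (
        # slow push-in capped at 1.2x, rendered at the Shorts resolution
        f"zoompan=z='min(zoom+0.0015,1.2)':d={frames}:s=1080x1920,"
        # burn in the styled ASS captions
        f"subtitles={subs}"
    )
    return [
        "ffmpeg", "-y",
        "-i", image,                  # still image as video source
        "-i", audio,                  # voice-over track
        "-vf", vf,
        "-t", str(duration),
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest", out,
    ]

# subprocess.run(compose_cmd("scene.png", "vo.wav", "caps.ass",
#                            "out.mp4", 4.0), check=True)
```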
## Project Architecture
Clean, modular Python structure under `src/ffmpeg_ai/`:
```
src/ffmpeg_ai/
├── cli.py            # Typer CLI + argument parsing
├── pipeline.py       # Orchestrates the entire flow
├── ai/
│   ├── openrouter.py # LLM client
│   ├── images.py     # Pollinations fetcher
│   └── tts.py        # Voice synthesis
├── video/
│   ├── composer.py   # All ffmpeg logic
│   ├── captions.py   # Whisper + ASS generation
│   └── shorts.py     # Constants (resolution, max length, etc.)
└── ui/
    ├── display.py    # ASCII banner + live status
    └── widgets.py    # Rich progress bars & tables
```
Total code: ~550 lines. Extremely hackable.
## Installation
```
# Requirements:
#   - Python 3.11+
#   - ffmpeg in your $PATH
#   - uv (recommended package manager)

git clone https://github.com/numbpill3d/ffmpeg-ai.git
cd ffmpeg-ai
uv pip install -e ".[dev]"

# Get your free OpenRouter key (https://openrouter.ai/keys)
cp .env.example .env
# edit .env and add your key
```
Done. No Docker, no heavy frameworks.
## Usage Examples
```
# Normal generation
ffmpeg-ai generate "why cats hate water"

# Dry-run (shows every step without API calls or long processing)
ffmpeg-ai generate --dry-run "the future of keyboards"

# You can also run directly:
python -m ffmpeg_ai generate "deep sea creatures ranked"
```