ffmpeg-ai

A zero-cost Python CLI that turns any topic into a complete YouTube Short in one command.
Give it a topic → get back a polished 1080×1920 vertical MP4 with AI script, scene images, natural voice-over, perfectly timed burned-in captions, and smooth Ken Burns zooms — all using **only free services** and local tools.

No subscriptions, no watermarks, no web UIs. Just `ffmpeg-ai generate "your topic"` and you're done.


What It Does

ffmpeg-ai is a complete end-to-end generative video pipeline built for speed and zero cost.


You feed it a single topic and it:

1. Uses a free-tier LLM via OpenRouter to write a tight script broken into scenes (30–60 seconds of narration)
2. Pulls one AI-generated image per scene from Pollinations.ai (no account, no rate limits)
3. Synthesizes a natural-sounding voice-over using Microsoft’s free edge-tts engine
4. Runs **faster-whisper** locally on the audio to generate accurate timed captions
5. Hands everything to ffmpeg for final assembly: variable-speed zooms on each image, audio sync, ASS subtitle burn-in, H.264/AAC encoding

Result: a ready-to-upload YouTube Short (or TikTok/Reels) in under 2 minutes on decent hardware.

Detailed Pipeline Breakdown

1. Script Generation (`ai/openrouter.py`)

- Prompts a free OpenRouter model (you can swap models in code)
- Returns structured JSON: array of `{ "scene": "...", "narration": "..." }`
- Script is kept short and punchy by design
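Validating that JSON contract up front keeps the rest of the pipeline from failing on a malformed reply. A minimal sketch — the `parse_script` helper and its error messages are illustrative, not the actual code in `ai/openrouter.py`:

```python
import json

def parse_script(raw: str) -> list[dict]:
    """Validate the LLM reply: a non-empty JSON array of scene/narration pairs."""
    scenes = json.loads(raw)
    if not isinstance(scenes, list) or not scenes:
        raise ValueError("expected a non-empty JSON array of scenes")
    for s in scenes:
        if not isinstance(s, dict) or not {"scene", "narration"} <= s.keys():
            raise ValueError("each entry needs 'scene' and 'narration' keys")
    return scenes
```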

2. Image Generation (`ai/images.py`)

- One HTTP call per scene to `https://image.pollinations.ai/prompt/...`
- Images are downloaded to a temp folder and resized to 1080×1920
- No authentication — truly free and instant
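Building the request URL amounts to percent-encoding the prompt into the path. A sketch — the `width`/`height` query parameters are an assumption based on Pollinations' public API and may differ from what `ai/images.py` actually sends:

```python
from urllib.parse import quote

def pollinations_url(prompt: str, width: int = 1080, height: int = 1920) -> str:
    """URL for one scene image; the prompt is percent-encoded into the path."""
    return (f"https://image.pollinations.ai/prompt/{quote(prompt)}"
            f"?width={width}&height={height}")
```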

3. Voice-over (`ai/tts.py`)

- Uses `edge-tts` with the excellent `en-US-AndrewNeural` voice (easy to change)
- Saves as high-quality WAV
- Supports any edge-tts voice — just edit one line
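Swapping voices really is a one-argument change. A sketch of assembling the `edge-tts` command line (flag names are from the edge-tts CLI; the helper itself is hypothetical):

```python
def edge_tts_argv(text: str, out_path: str,
                  voice: str = "en-US-AndrewNeural") -> list[str]:
    """argv for the edge-tts CLI; hand the list to subprocess.run()."""
    return ["edge-tts", "--voice", voice, "--text", text,
            "--write-media", out_path]
```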

4. Captions (`video/captions.py`)

- **faster-whisper** (large-v3 model by default) transcribes the audio
- Converts timestamps to ASS format with styled subtitles (white text, black outline, bottom third)
- Handles word-level timing for buttery-smooth karaoke-style captions
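ASS timestamps use centisecond precision (`H:MM:SS.cc`), so Whisper's float seconds need converting. A self-contained sketch of that conversion — the `Default` style name is an assumption, not necessarily what `video/captions.py` emits:

```python
def ass_timestamp(seconds: float) -> str:
    """Seconds -> ASS 'H:MM:SS.cc' (centisecond precision)."""
    cs = int(round(seconds * 100))
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"

def ass_dialogue(start: float, end: float, text: str) -> str:
    """One Dialogue event line for the [Events] section."""
    return (f"Dialogue: 0,{ass_timestamp(start)},{ass_timestamp(end)},"
            f"Default,,0,0,0,,{text}")
```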

5. Video Composition (`video/composer.py`)

This is the magic. A series of ffmpeg filters:
- `zoompan` with variable speed per scene (slow pan on important images)
- Scale + pad to exact 1080×1920
- Overlay audio
- Burn ASS subtitles with `subtitles=` filter
- Two-pass H.264 encoding for maximum quality at small file size

All ffmpeg commands are built programmatically and logged for debugging.
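The `zoompan` expression is the fiddliest part of that chain. A sketch of building one per-scene filter string — the zoom rate and 1.5× cap here are illustrative defaults, not necessarily the values `composer.py` uses:

```python
def zoompan_filter(duration_s: float, fps: int = 30,
                   zoom_step: float = 0.0015, max_zoom: float = 1.5) -> str:
    """Ken Burns zoom: creep in a little each frame, centred, 1080x1920 out."""
    frames = int(duration_s * fps)
    return (f"zoompan=z='min(zoom+{zoom_step},{max_zoom})':d={frames}"
            f":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
            f":s=1080x1920:fps={fps}")
```

Lowering `zoom_step` slows the pan, which is how per-scene variable speed falls out of a single parameter.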

Project Architecture

Clean, modular Python structure under `src/ffmpeg_ai/`:

```
src/ffmpeg_ai/
├── cli.py                    # Typer CLI + argument parsing
├── pipeline.py               # Orchestrates the entire flow
├── ai/
│   ├── openrouter.py         # LLM client
│   ├── images.py             # Pollinations fetcher
│   └── tts.py                # Voice synthesis
├── video/
│   ├── composer.py           # All ffmpeg logic
│   ├── captions.py           # Whisper + ASS generation
│   └── shorts.py             # Constants (resolution, max length, etc.)
└── ui/
    ├── display.py            # ASCII banner + live status
    └── widgets.py            # Rich progress bars & tables
```

Total code: ~550 lines. Extremely hackable.

Installation

```
# Requirements
# - Python 3.11+
# - ffmpeg in your $PATH
# - uv (recommended package manager)

git clone https://github.com/numbpill3d/ffmpeg-ai.git
cd ffmpeg-ai

uv pip install -e ".[dev]"

# Get your free OpenRouter key (https://openrouter.ai/keys)
cp .env.example .env
# edit .env and add your key
```

Done. No Docker, no heavy frameworks.

Usage Examples

```
# Normal generation
ffmpeg-ai generate "why cats hate water"

# Dry-run (shows every step without API calls or long processing)
ffmpeg-ai generate --dry-run "the future of keyboards"

# You can also run directly:
python -m ffmpeg_ai generate "deep sea creatures ranked"
```
