Talking Head Video Generator
Create lip-synced avatar videos from text scripts.
Pipeline
- Write script — the words your avatar will speak
- Generate audio — ElevenLabs TTS with your chosen voice
- Generate video — VEED Fabric 1.0 via Fal API (720p)
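The audio and video steps above can be sketched as two small request builders. This is a minimal illustration, not the contents of generate.py: the helper names are hypothetical, the Fal argument field names (image_url, audio_url, resolution) are assumptions about the model's input schema, and nothing is sent over the network.

```python
import json
import urllib.request

# ElevenLabs text-to-speech endpoint; the voice ID is part of the path.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(script: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Step 2: prepare (but do not send) the ElevenLabs TTS request."""
    body = json.dumps({"text": script}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def build_video_arguments(avatar_url: str, audio_url: str) -> dict:
    """Step 3: arguments for the Fal video model (field names are assumed)."""
    return {"image_url": avatar_url, "audio_url": audio_url, "resolution": "720p"}
```

Passing the request to urllib.request.urlopen would return the audio bytes, which then need to be uploaded somewhere the video model can read them before the third step runs.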
Usage
python3 {baseDir}/scripts/generate.py \
--script "Your script text here" \
--voice <elevenlabs_voice_id> \
--avatar <image_url_or_path> \
--output ~/Desktop/video.mp4
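If you want to call the script from Python rather than a shell, the same flags can be assembled into an argument list. This is a sketch under the assumption that the flags behave exactly as shown above; build_command is a hypothetical helper, not part of the project. Note that a shell expands ~ for you, while subprocess does not, so the output path is expanded explicitly.

```python
from pathlib import Path


def build_command(base_dir: str, script: str, voice: str, avatar: str, output: str) -> list:
    """Assemble the generate.py invocation as an argument list."""
    return [
        "python3",
        str(Path(base_dir) / "scripts" / "generate.py"),
        "--script", script,
        "--voice", voice,
        "--avatar", avatar,
        # subprocess does not expand "~", so do it here.
        "--output", str(Path(output).expanduser()),
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True), which avoids quoting problems when the script text contains quotes or newlines.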
Avatar Requirements
- Clear, front-facing headshot
- Good lighting, neutral expression
- JPG or PNG, at least 512x512
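The size requirement can be checked before spending money on a generation. The sketch below reads the dimensions from the first 24 bytes of a PNG file using only the standard library; it is an illustration with hypothetical helper names, handles PNG only, and does not reflect any validation generate.py may already do.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> tuple:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are big-endian 32-bit integers at the start of IHDR.
    return struct.unpack(">II", header[16:24])


def meets_minimum(width: int, height: int, minimum: int = 512) -> bool:
    """True when both dimensions are at least the minimum (512 by default)."""
    return width >= minimum and height >= minimum
```

To use it on a file, read the header with open(path, "rb").read(24) and pass the result to png_size.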
Voice Options
Find voice IDs at https://elevenlabs.io/app/voice-library or list the voices available to your account with:
curl -s "https://api.elevenlabs.io/v1/voices" \
-H "xi-api-key: $(cat ~/.config/elevenlabs/api_key)" | python3 -m json.tool
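The JSON returned by that request is verbose. A small filter, sketched below, reduces it to name and ID pairs; it assumes the response has a top-level "voices" list whose entries carry "name" and "voice_id" fields, and list_voices is a hypothetical helper.

```python
def list_voices(payload: dict) -> list:
    """Return (name, voice_id) pairs from a parsed /v1/voices response, sorted by name."""
    return sorted((v["name"], v["voice_id"]) for v in payload.get("voices", []))
```

With the curl output piped in, payload would come from json.load(sys.stdin), and the second element of the pair you pick is the value to pass to --voice.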
API Keys
- ElevenLabs: ~/.config/elevenlabs/api_key
- Fal: ~/.config/fal/api_key (env var FAL_KEY)
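A key lookup along these lines might read the environment variable first and fall back to the key file. This is a sketch only: load_api_key is a hypothetical helper, and the precedence of the environment variable over the file is an assumption rather than documented behaviour.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(key_file: str, env_var: Optional[str] = None) -> str:
    """Return the key from env_var if set and non-empty, else from key_file."""
    if env_var:
        value = os.environ.get(env_var, "").strip()
        if value:
            return value
    path = Path(key_file).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"API key file not found: {path}")
    key = path.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file is empty: {path}")
    return key
```

For example, load_api_key("~/.config/fal/api_key", "FAL_KEY") covers the Fal case, and load_api_key("~/.config/elevenlabs/api_key") covers ElevenLabs.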
Costs
- ElevenLabs TTS: ~$0.15-0.30 per minute of audio
- Fal Fabric 1.0: ~$0.10-0.20 per video generation
- Total: ~$0.30-0.50 per short video (~30s-1min)
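The per-video total follows from combining the two figures above: the TTS charge scales with audio length, while the video charge is per generation. A rough estimator, using only the approximate rates quoted here (actual pricing may differ), could look like this; estimate_cost is a hypothetical helper.

```python
def estimate_cost(duration_seconds: float) -> tuple:
    """Return a (low, high) cost estimate in USD for one video."""
    minutes = duration_seconds / 60
    low = 0.15 * minutes + 0.10   # cheapest TTS rate + cheapest generation
    high = 0.30 * minutes + 0.20  # priciest TTS rate + priciest generation
    return round(low, 3), round(high, 3)
```

For a 60-second script this gives roughly $0.25 to $0.50, in line with the total above; for a 30-second script the low end drops to about $0.18, so the quoted total is on the conservative side for short clips.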
Tips
- Keep scripts under 60 seconds for best quality
- Use a consistent avatar image for brand recognition
- Test with a short phrase before generating full videos
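To check the 60-second guideline before generating audio, spoken length can be estimated from word count. The sketch below assumes a speaking rate of 150 words per minute, which is a rough rule of thumb and not a property of any particular ElevenLabs voice; both helpers are hypothetical.

```python
def estimated_seconds(script: str, words_per_minute: int = 150) -> float:
    """Estimate spoken duration from word count (150 wpm is an assumed rate)."""
    return len(script.split()) / words_per_minute * 60


def within_limit(script: str, limit_seconds: float = 60) -> bool:
    """True when the estimated duration does not exceed the limit."""
    return estimated_seconds(script) <= limit_seconds
```

At the assumed rate, the 60-second guideline corresponds to roughly 150 words; adjust words_per_minute after timing a test phrase with your chosen voice.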