IA Automation Project Overview

Try the application here: https://video-automation.littlenuage.com/

This project is a fully automated video production platform powered by AI. From a simple text topic, the system generates a structured script, creates images, synthesizes narration, applies camera animations, adds subtitles and background music, and delivers a ready-to-publish video — all orchestrated through a Streamlit web interface with job queue management and Discord notifications.

Project Context

Developed as a personal project to industrialize video content creation, this platform eliminates the need for manual editing. It supports multiple video categories (celebrity biographies, geography documentaries, news reports, movie analyses, monuments, artworks), multiple languages (French, English, Spanish), and two production modes (short-form and long-form). The application is deployed online and accessible to anyone.

Architecture Overview

The system follows a modular, service-oriented architecture with clear separation of concerns:

User (Streamlit UI)
    │
    ▼
app2.py ──► queue_manager.py ──► run_pipeline.py ──► pipeline_executor.py
                                                          │
                                    ┌─────────────────────┼─────────────────────┐
                                    ▼                     ▼                     ▼
                              services/             scripts/              AI Models
                          (script, image,       (generation &          (SD, TTS, LLM,
                          audio, video)          processing)           CLIP, Whisper...)

What I Developed

1. Streamlit Web Application (app2.py)

The main user interface built with Streamlit, replacing the earlier Flask prototype:

2. Pipeline Execution Engine (pipeline_executor.py)

The core orchestrator implementing a robust 6-step pipeline:

Step  Description                   Script
1     Script Generation             generate_script.py / generate_script_actu.py
2     Image Generation              generate_image.py / get_image_pexels.py
3     Audio Synthesis               generate_audio.py
4     Camera Movement & Animation   generate_movement.py / generate_movement_from_db.py
5     Subtitle Alignment            align_subtitles.py (short mode)
6     Music Generation              generate_music.py (short mode)
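A minimal sketch of how such an executor can chain these steps as subprocesses. The step table and CLI flags below are illustrative only; the real pipeline_executor.py works through the service layer described further down and adds caching and notifications.

import subprocess
import sys

# Illustrative step table: (label, script, runs only in short mode?)
PIPELINE_STEPS = [
    ("Script Generation",           "scripts/script/generate_script.py",  False),
    ("Image Generation",            "scripts/images/generate_image.py",   False),
    ("Audio Synthesis",             "scripts/audio/generate_audio.py",    False),
    ("Camera Movement & Animation", "scripts/video/generate_movement.py", False),
    ("Subtitle Alignment",          "scripts/video/align_subtitles.py",   True),
    ("Music Generation",            "scripts/music/generate_music.py",    True),
]

def execute_pipeline(entity, mode, language):
    """Run the six steps in order and stop at the first failure."""
    for label, script, short_only in PIPELINE_STEPS:
        if short_only and mode != "short":
            continue                      # steps 5-6 only apply to short-form videos
        result = subprocess.run([sys.executable, script,
                                 "--entity", entity,
                                 "--mode", mode,
                                 "--language", language])
        if result.returncode != 0:
            raise RuntimeError(f"Pipeline failed at step: {label}")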

Key features:

3. Job Queue System (queue_manager.py)

A thread-safe job queue for sequential video production:
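The core of such a queue can be sketched with Python's queue and threading modules. This is illustrative only: the real queue_manager.py also persists the queue across restarts, which this sketch omits.

import queue
import threading

class JobQueue:
    """Minimal sequential job queue: one worker thread, FIFO order."""

    def __init__(self, process_job):
        self._jobs = queue.Queue()          # thread-safe FIFO
        self._process_job = process_job     # callback that runs the pipeline
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add_job(self, job):
        self._jobs.put(job)                 # safe to call from the Streamlit thread

    def _run(self):
        while True:
            job = self._jobs.get()          # blocks until a job is available
            try:
                self._process_job(job)      # videos are produced one at a time
            finally:
                self._jobs.task_done()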

4. Service Layer (services/)

Four service classes encapsulate file I/O and subprocess execution:

All services follow a consistent pattern with organized paths:

output/{type}/{video}/{category}/{entity_name}/{mode}/{language_code}/
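For illustration, a shared helper of the following shape would produce that layout. build_output_dir is a hypothetical name, not the actual services API.

from pathlib import Path

def build_output_dir(output_type, video, category, entity_name, mode, language_code):
    """Build and create the shared output directory layout (hypothetical helper)."""
    path = (Path("output") / output_type / video / category /
            entity_name / mode / language_code)
    path.mkdir(parents=True, exist_ok=True)
    return path

# e.g. the audio service writing French short-form narration would target
# something like output/Audio/.../Celebrity/<entity>/short/fr/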

5. UI Components (ui/)

Modular Streamlit components split for maintainability:

6. Infrastructure Modules

Cache Manager (cache_manager.py):

Logger (logger.py):

Notifications (notifications.py):
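notifications.py reports job start, completion, and failure to Discord. A webhook notification of that kind comes down to a single POST, sketched here; the webhook URL would come from the environment or configuration, and the message wording is illustrative.

import requests

def notify_discord(webhook_url, message):
    """Post a plain-text message to a Discord channel through its webhook."""
    response = requests.post(webhook_url, json={"content": message}, timeout=10)
    response.raise_for_status()

# Typical lifecycle messages:
# notify_discord(url, "Job started: Celebrity / short / fr")
# notify_discord(url, "Job completed: final video available in output/Final-Video/")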

7. Scripts Organization (scripts/)

All processing scripts are organized by domain:

scripts/
├── script/           # LLM-based script generation & verification
│   ├── generate_script.py
│   ├── generate_script_actu.py
│   ├── verify_script.py
│   └── verify_script_actu.py
├── images/           # Image generation, validation & thumbnails
│   ├── generate_image.py          # Stable Diffusion (celebrity mode)
│   ├── get_image_pexels.py        # Pexels API (geography/actu/movie)
│   ├── generate_mignature.py      # Thumbnail generation
│   ├── verify_image.py            # CLIP score + face detection
│   ├── validate_sequential_images.py
│   ├── configure_images.py
│   └── add_image_details.py
├── audio/            # TTS synthesis & audio processing
│   ├── generate_audio.py          # XTTS v2 voice cloning
│   ├── cut_audio.py
│   ├── verify_audio.py
│   └── correct_text_aft_verif.py
├── video/            # Video editing, animation & subtitles
│   ├── generate_movement.py       # Celebrity image animation
│   ├── generate_movement_from_db.py  # DB-based animation (geography/actu)
│   ├── align_subtitles.py         # Whisper-based subtitle timing
│   ├── generate_text_on_video.py
│   ├── cut_txt_video.py           # TikTok video cutting
│   ├── generate_video_image.py
│   └── cut.py
├── music/            # Background music generation
│   └── generate_music.py          # Facebook MusicGen
├── batch/            # Batch operations
│   ├── generate_all_audio.py
│   ├── generate_all_image.py
│   └── checklist_final.py
├── publishing/       # Publishing & promotion
│   ├── send_to_youtube.py
│   └── short_end_pub.py
└── experimental/     # Experimental features
    └── (image-to-video, deforum)

File Organization

.
├── app2.py                    # Streamlit web application
├── pipeline_executor.py       # 6-step pipeline engine
├── queue_manager.py           # Job queue management
├── run_pipeline.py            # Subprocess launcher
├── utils.py                   # Configuration & environment utilities
├── cache_manager.py           # Intelligent caching system
├── logger.py                  # Structured logging
├── notifications.py           # Discord webhook notifications
├── config.xml                 # Application configuration (languages, modes, models, paths)
├── services/
│   ├── script_service.py
│   ├── image_service.py
│   ├── audio_service.py
│   └── video_service.py
├── ui/
│   ├── generation_tab.py
│   ├── queue_tab.py
│   ├── components.py
│   └── styles.py
├── scripts/                   # Processing scripts (see above)
├── authentification/          # OAuth2 & credentials
├── input/
│   ├── Audio/                 # Voice reference files (FR/EN/ES)
│   └── Video_List/            # Excel entity lists per category
├── output/
│   ├── Script/                # Generated JSON scripts
│   ├── Image/                 # Generated scene images
│   ├── Audio/                 # Synthesized narration audio
│   ├── Video/                 # Text-overlay videos
│   ├── Video_Anime/           # Animated scene videos
│   ├── Final-Video/           # Final assembled videos + TikTok cuts
│   ├── Music/                 # Generated background music
│   └── Miniature/             # Video thumbnails
├── logs/                      # Rotating log files
├── cache/                     # Cache index & metadata
└── stable-diffusion-webui/    # AUTOMATIC1111 WebUI (submodule)

Automated Pipeline Workflow

Full Pipeline (Queue Mode)

  1. User configures video type, category, entity, mode, language in the Streamlit UI
  2. Job added to the persistent queue via queue_manager.py
  3. Worker thread picks up the job and spawns run_pipeline.py as a subprocess
  4. PipelineExecutor orchestrates the 6 steps:
    • Step 1: LLM generates a structured JSON script (scenes, prompts, titles)
    • Steps 2-3: Images and audio run in parallel — Stable Diffusion or Pexels for images, XTTS v2 for voice cloning
    • Step 4: Camera movement animation applied to each scene (zoom, pan, slide)
    • Step 5: Whisper-based subtitle alignment (short mode)
    • Step 6: MusicGen background music generation (short mode)
  5. Discord notification sent at job start, completion, or failure
  6. User reviews the result in the History tab and can upload it to YouTube directly
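The worker-to-pipeline handoff in step 3 amounts to spawning run_pipeline.py in its own process, so a crash inside a generation step cannot take the Streamlit app down. A sketch with hypothetical CLI flags; the real run_pipeline.py interface may differ.

import subprocess
import sys

def launch_pipeline(job):
    """Run one queued job in an isolated Python process."""
    cmd = [sys.executable, "run_pipeline.py",
           "--category", job["category"],    # Celebrity, Geography, Actu, ...
           "--entity",   job["entity"],
           "--mode",     job["mode"],        # short or long
           "--language", job["language"]]    # fr, en, es
    completed = subprocess.run(cmd)
    return completed.returncode == 0         # drives the success/failure Discord message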

Manual Mode (Generation Tab)

Each step can also be executed individually with live preview:
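As a rough illustration of what one of these manual controls can look like in Streamlit (hypothetical script flags and output paths; the real tab in ui/generation_tab.py is richer):

import glob
import subprocess
import sys

import streamlit as st

st.subheader("Step 2 - Image Generation")
entity = st.text_input("Entity", "Marie Curie")
language = st.selectbox("Language", ["fr", "en", "es"])

if st.button("Generate images"):
    with st.spinner("Generating scene images..."):
        subprocess.run([sys.executable, "scripts/images/generate_image.py",
                        "--entity", entity, "--language", language])
    # Live preview of whatever the step just produced
    for path in sorted(glob.glob("output/Image/**/*.png", recursive=True))[:3]:
        st.image(path)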

Configuration (config.xml)

<!-- Languages with voice reference files -->
<languages>
  <language name="Francais" code="fr" audio="Enregistrement-francais.wav" />
  <language name="English" code="en" audio="Enregistrement-anglais.wav" />
  <language name="Espanol" code="es" audio="Enregistrement-espagnol.wav" />
</languages>

<!-- Production modes -->
<modes>
  <mode name="short" nb_scenes="5" nb_images="3" width="1024" height="1024" />
  <mode name="long" nb_scenes="20" nb_images="3" width="1280" height="720" />
</modes>

<!-- AI Models -->
<models>
  <model name="juggernaut-xl" type="image" />
  <model name="xtts_v2" type="audio" />
  <script_models>gemma, deepseek-r1, ministral</script_models>
</models>
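These settings are plain XML, so they can be read with the standard library. A sketch, assuming the three blocks above sit under a single root element:

import xml.etree.ElementTree as ET

root = ET.parse("config.xml").getroot()

# Language code -> voice reference file used for XTTS voice cloning
voices = {lang.get("code"): lang.get("audio") for lang in root.find("languages")}

# Mode name -> rendering parameters
modes = {m.get("name"): {"nb_scenes": int(m.get("nb_scenes")),
                         "width": int(m.get("width")),
                         "height": int(m.get("height"))}
         for m in root.find("modes")}

print(voices["fr"])     # Enregistrement-francais.wav
print(modes["short"])   # {'nb_scenes': 5, 'width': 1024, 'height': 1024}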

Technical Architecture

Backend Stack:

AI Models (8 models integrated):

External APIs:

Processing Categories:

Category      Image Source       Script Model         Description
Celebrity     Stable Diffusion   LLM                  Biography and career videos
Geography     Pexels API         LLM                  Location and travel documentaries
Actu (News)   Pexels API         LLM (actu variant)   Current events reports
Movie         Pexels API         LLM                  Film analysis and summaries
Monument      Stable Diffusion   LLM                  Historical and architectural features
Oeuvre (Art)  Stable Diffusion   LLM                  Artistic works and analysis

Development Challenges & Solutions

Challenge 1: Pipeline Reliability

Challenge 2: Audio-Visual Synchronization

Challenge 3: Resource Contention

Challenge 4: Quality Consistency

Challenge 5: Multi-language Support

Challenge 6: Observability

Prerequisites

Python packages:

pip install streamlit streamlit-authenticator
pip install moviepy requests TTS openai-whisper
pip install transformers mediapipe ollama
pip install pydub soundfile openpyxl pyyaml
pip install -r stable-diffusion-webui/requirements.txt

Services to run:

# Stable Diffusion WebUI (image generation)
cd stable-diffusion-webui && ./webui.sh --api

# Ollama (LLM for script generation)
ollama serve
ollama pull gemma3:12b

# Streamlit application
streamlit run app2.py

Troubleshooting

Use Cases & Applications

Future Enhancements

Performance Metrics


Detailed AI Models

1. Stable Diffusion (AUTOMATIC1111 WebUI)
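With the WebUI started via ./webui.sh --api (see Prerequisites), scene images can be requested over its local REST endpoint. A minimal txt2img call, with illustrative prompt and sampling settings:

import base64
import requests

def generate_scene_image(prompt, width=1024, height=1024, out_path="scene.png"):
    """Call the AUTOMATIC1111 WebUI txt2img endpoint (requires webui.sh --api running)."""
    payload = {"prompt": prompt, "width": width, "height": height, "steps": 30}
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
    r.raise_for_status()
    image_b64 = r.json()["images"][0]            # base64-encoded PNG
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(image_b64))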


2. Coqui TTS (XTTS v2)
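Voice cloning with XTTS v2 takes the narration text plus one of the reference recordings from input/Audio/. A minimal call with the Coqui TTS API; text and output path are illustrative.

from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Marie Curie est née à Varsovie en 1867.",
    speaker_wav="input/Audio/Enregistrement-francais.wav",  # voice to clone
    language="fr",
    file_path="output/Audio/scene_01.wav",
)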


3. Ollama (Gemma, DeepSeek, Ministral)
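Script generation sends a structured prompt to a local Ollama model (gemma3:12b in the Prerequisites). A minimal sketch with the ollama Python client; the prompt wording is illustrative.

import json
import ollama

prompt = ("Write a 5-scene short-form script about Marie Curie. "
          "Return JSON with a narration text, an image prompt and a title per scene.")

response = ollama.chat(model="gemma3:12b",
                       messages=[{"role": "user", "content": prompt}],
                       format="json")                 # ask for machine-readable output
script = json.loads(response["message"]["content"])   # scenes, prompts, titles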


4. CLIP (OpenAI)
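verify_image.py combines a CLIP similarity score with face detection to reject off-prompt images. The scoring part can be sketched with the Hugging Face transformers CLIP classes; the model choice and threshold handling here are illustrative.

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image_path, prompt):
    """Similarity between an image and its generation prompt."""
    inputs = processor(text=[prompt], images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.logits_per_image.item()   # higher = image matches the prompt better

# An image could be regenerated when the score falls below a chosen threshold.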


5. Whisper (OpenAI)
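align_subtitles.py relies on Whisper timestamps to place subtitles against the narration. A minimal transcription pass; model size and paths are illustrative.

import whisper

model = whisper.load_model("base")
result = model.transcribe("output/Audio/scene_01.wav", word_timestamps=True)

for segment in result["segments"]:
    start, end, text = segment["start"], segment["end"], segment["text"]
    print(f"{start:.2f} --> {end:.2f} {text.strip()}")   # timing data for subtitle overlay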


6. MediaPipe (Face Detection)
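Face detection is used alongside the CLIP score in verify_image.py to check that celebrity images actually contain a face. A minimal check with MediaPipe's face detection solution; the confidence threshold is illustrative.

import cv2
import mediapipe as mp

def has_face(image_path, min_confidence=0.5):
    """Return True if at least one face is detected in the image."""
    image = cv2.imread(image_path)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB
    with mp.solutions.face_detection.FaceDetection(
            min_detection_confidence=min_confidence) as detector:
        results = detector.process(rgb)
    return results.detections is not None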


7. Facebook MusicGen
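Background music for short-form videos comes from MusicGen. A sketch with the transformers implementation; model size, prompt, and clip length are illustrative.

import soundfile as sf
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(text=["calm cinematic background music for a documentary"],
                   padding=True, return_tensors="pt")
audio = model.generate(**inputs, max_new_tokens=512)     # roughly 10 seconds

rate = model.config.audio_encoder.sampling_rate
sf.write("output/Music/background.wav", audio[0, 0].numpy(), rate)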


8. MoviePy + OpenCV
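Camera movements (zoom, pan, slide) are what turn still images into animated scene videos. A Ken Burns style zoom can be sketched with the MoviePy 1.x API; durations and rates are illustrative.

from moviepy.editor import ImageClip

def zoom_in_clip(image_path, duration=5, zoom_per_second=0.04):
    """Animate a still image with a slow zoom-in."""
    clip = ImageClip(image_path, duration=duration)
    return clip.resize(lambda t: 1 + zoom_per_second * t)  # scale factor grows with time

clip = zoom_in_clip("output/Image/scene_01.png")
clip.write_videofile("output/Video_Anime/scene_01.mp4", fps=30)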


Resources & Credits

License

See stable-diffusion-webui/LICENSE.txt for the AUTOMATIC1111 WebUI license; the project's own scripts are not covered by that license.


Try It Yourself!

Access the application: https://video-automation.littlenuage.com/

  1. Select your video type and category
  2. Choose an entity from the list (or add your own)
  3. Pick a language and production mode (short/long)
  4. Add the job to the queue or generate step-by-step
  5. Monitor progress via Discord notifications
  6. Download or upload directly to YouTube

IA Automation Workflow