
How to Run CogVideoX Locally: Free AI Video

March 22, 2026 | AIReelVideo Team | 9 min read

tutorial

Key Takeaways

  • CogVideoX-2B can generate short-form videos locally using a GPU with 12GB+ VRAM (RTX 3080 Ti or better)
  • The complete setup takes about 30-45 minutes including Docker, model downloads, and configuration
  • Local generation is completely free after hardware costs - no API fees, no per-video charges
  • Generation time is approximately 5 minutes per video on capable hardware
  • You can pair CogVideoX with Ollama for local script generation, creating an entirely free pipeline

Why Run AI Video Generation Locally?

Cloud-based AI video services charge per video. At $0.50-$2.00 per generation, costs add up fast when you are producing content at volume. Running CogVideoX from THUDM locally means:

  • Zero per-video cost after your initial hardware investment
  • Complete privacy - your content never leaves your machine
  • No rate limits - generate as many videos as your GPU can handle
  • No dependency on third-party services - no outages, no API changes, no sudden pricing increases

The trade-off is obvious: you need hardware. Specifically, a GPU with enough VRAM to run the model. But if you already have a gaming PC or workstation with a decent NVIDIA GPU, you may already have everything you need.

Hardware Requirements

Minimum Requirements

Component | Minimum                     | Recommended
GPU       | NVIDIA RTX 3080 (10GB VRAM) | NVIDIA RTX 3080 Ti (12GB VRAM)
RAM       | 16GB                        | 32GB
Storage   | 20GB free (model files)     | 50GB+ free
CPU       | Any modern quad-core        | 8+ cores
OS        | Linux (Ubuntu 22.04+)       | Linux (Ubuntu 22.04+)

GPU Compatibility

The CogVideoX paper on arXiv explains the architecture in detail, and the 2B variant needs approximately 12GB of VRAM for comfortable generation. Here is how common GPUs stack up:

GPU              | VRAM   | Works?   | Notes
RTX 4090         | 24GB   | Yes      | Fastest consumer option
RTX 4080         | 16GB   | Yes      | Comfortable headroom
RTX 3090         | 24GB   | Yes      | Great value on used market
RTX 3080 Ti      | 12GB   | Yes      | Minimum comfortable VRAM
RTX 3080         | 10GB   | Marginal | May need optimizations
RTX 3070         | 8GB    | No       | Insufficient VRAM
RTX 4060 Ti 16GB | 16GB   | Yes      | Budget-friendly option
Any AMD GPU      | Varies | No       | CUDA required

Important: CogVideoX requires NVIDIA GPUs with CUDA support. AMD and Intel GPUs are not currently supported.

Checking Your GPU

Open a terminal and run:

nvidia-smi

This shows your GPU model, VRAM capacity, and current usage. If this command does not work, you need to install NVIDIA drivers first.
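If you want a scriptable version of that check, a minimal sketch (the ubuntu-drivers suggestion assumes Ubuntu; other distros have their own driver packages):

```shell
# Report whether the NVIDIA driver is installed and usable.
check_nvidia() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "NVIDIA driver present"
  else
    echo "NVIDIA driver missing - on Ubuntu, try: sudo ubuntu-drivers autoinstall && sudo reboot"
  fi
}
check_nvidia
```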

Step 1: Install Prerequisites (10 Minutes)

Install Docker

CogVideoX runs inside Docker containers, which simplifies the setup significantly.

# Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

# Log out and back in, then verify
docker --version

Install NVIDIA Container Toolkit

This lets Docker containers access your GPU:

# Add NVIDIA repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If the last command shows your GPU info inside Docker, you are ready.

Install Docker Compose

# Docker Compose V2 (usually included with modern Docker)
docker compose version

# If not available, install it:
sudo apt-get install docker-compose-plugin

Step 2: Set Up Ollama for Script Generation (10 Minutes)

Ollama runs large language models locally, providing free script generation for your video pipeline.

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Download the Script Model

# Pull the text generation model
ollama pull llama3.2

# Optional: pull vision model for content analysis
ollama pull llava

Configure for Docker Access

By default, Ollama only listens on localhost. To let Docker containers access it:

# Edit the Ollama service configuration
sudo sed -i '/\[Service\]/a Environment="OLLAMA_HOST=0.0.0.0"' /etc/systemd/system/ollama.service
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Verify it is accessible
curl http://localhost:11434/api/tags

You should see a JSON response listing your downloaded models.
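The same API family handles generation requests. As a sketch of the kind of call a script pipeline makes against Ollama's /api/generate endpoint (the prompt text here is just a placeholder):

```shell
# Build a request for Ollama's /api/generate endpoint.
# "stream": false makes Ollama return a single JSON object instead of a stream.
payload='{"model": "llama3.2", "prompt": "Write a 30-second video script about home coffee brewing.", "stream": false}'
echo "$payload"

# Send it for real (requires Ollama running on the host):
# curl -s http://localhost:11434/api/generate -d "$payload"
```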

Step 3: Clone and Configure AIReelVideo (10 Minutes)

Clone the Repository

git clone https://github.com/your-org/aireelvideo.git
cd aireelvideo

Configure Environment

Create your .env file with local-mode settings:

# Copy the example config
cp .env.example .env

Edit the .env file with these key settings:

# Database
DATABASE_URL=postgresql://autosociale:your_password@localhost:5432/autosociale
DATABASE_PASSWORD=your_password

# Local AI modes
VIDEO_GENERATION_MODE=local
TTS_MODE=local
TRANSCRIPTION_MODE=local

# Ollama (running on host machine)
# Note: on Linux, host.docker.internal only resolves inside containers if the
# compose file maps it (extra_hosts: "host.docker.internal:host-gateway");
# otherwise use your host's LAN IP here.
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_TEXT_MODEL=llama3.2

# Storage
STORAGE_PATH=/storage/generated

Understanding the Configuration

Setting                     | Effect            | Purpose
VIDEO_GENERATION_MODE=local | Enables CogVideoX | Uses local GPU instead of cloud API
TTS_MODE=local              | Enables Edge TTS  | Free text-to-speech (or skip TTS entirely)
TRANSCRIPTION_MODE=local    | Enables Whisper   | Free local transcription for analysis
OLLAMA_TEXT_MODEL=llama3.2  | Script generation | Local LLM for writing video scripts

Step 4: Start the Services (5 Minutes)

Launch with Docker Compose

docker compose up -d

This starts:

Service       | Port | Purpose
API server    | 8000 | FastAPI backend
PostgreSQL    | 5432 | Database
Redis         | 6379 | Task queue
Celery Worker | -    | Video generation worker
Celery Beat   | -    | Scheduled task runner
Flower        | 5555 | Task monitoring dashboard
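The repository ships its own compose file, but for reference, the worker can only see the GPU if the service carries a device reservation along these lines (the service name and surrounding structure here are assumptions; only the device-reservation syntax is standard Compose):

```yaml
# Hypothetical fragment - standard Compose syntax for exposing one NVIDIA GPU.
services:
  celery-worker:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```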

Verify Everything Is Running

# Check all containers are up
docker compose ps

# Check API is responding
curl http://localhost:8000/docs

# Check Celery worker can see the GPU
docker logs aireelvideo-celery-worker 2>&1 | head -20

Start the Frontend

In a separate terminal:

cd frontend
pnpm install
pnpm dev

The web interface will be available at http://localhost:3000.

Step 5: Run Your First Local Video Generation

Create a Market

  1. Open http://localhost:3000 in your browser
  2. Navigate to Markets and create a new market (your niche/topic area)
  3. Set the content type to your preferred style

Add Content Sources

The pipeline needs reference content to generate relevant scripts:

Option A: Discovery - let the system find competitor videos automatically

Option B: Articles - add URLs of articles in your niche

# Add an article via API
curl -X POST http://localhost:8000/api/v1/articles/from-url \
  -H "Content-Type: application/json" \
  -d '{
    "market_id": "YOUR_MARKET_ID",
    "url": "https://example.com/article",
    "title": "Article Title"
  }'

Generate Scripts

# Generate scripts for your market
curl -X POST http://localhost:8000/api/v1/scripts/generate/YOUR_MARKET_ID

Or use the web interface: navigate to Scripts and click "Generate Scripts."

The scripts are generated by Ollama running locally - no cloud APIs involved.

Approve and Generate Video

  1. Review the generated scripts in the web interface
  2. Click "Approve" on a script you like
  3. Video generation starts automatically

Monitor progress:

# Check generation status
curl http://localhost:8000/api/v1/generation/videos/VIDEO_ID/status

# Watch worker logs in real time
docker logs -f aireelvideo-celery-worker

Generation Time Expectations

GPU         | Approximate Time per Video
RTX 4090    | ~2-3 minutes
RTX 4080    | ~3-4 minutes
RTX 3090    | ~4-5 minutes
RTX 3080 Ti | ~5-6 minutes

Step 6: Monitor and Troubleshoot

Flower Dashboard

Open http://localhost:5555 to see:

  • Active tasks (currently generating videos)
  • Completed tasks (finished videos)
  • Failed tasks (errors with details)
  • Worker status (GPU utilization)

Common Issues and Fixes

"CUDA out of memory"

Your GPU does not have enough VRAM.

# Check current VRAM usage
nvidia-smi

# Kill other GPU processes
# Close any other applications using the GPU (games, other AI tools)

# If still failing, try reducing batch size in config

"Ollama connection refused"

The Docker container cannot reach Ollama on your host machine.

# Verify Ollama is running and bound to 0.0.0.0
curl http://localhost:11434/api/tags

# Test from inside Docker
docker exec aireelvideo-api curl http://host.docker.internal:11434/api/tags

"Worker not picking up tasks"

# Check worker logs
docker logs aireelvideo-celery-worker 2>&1 | tail -50

# Restart the worker
docker restart aireelvideo-celery-worker

# Check Redis connection
docker exec aireelvideo-redis redis-cli ping

Slow generation speed

# Check GPU utilization during generation
watch -n 1 nvidia-smi

# GPU utilization should be 90%+ during generation
# If it is low, there may be a CPU bottleneck - check RAM usage
free -h
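For a scriptable utilization number instead of the full watch output, a small sketch using nvidia-smi's query flags (falls back to a message on machines without an NVIDIA GPU):

```shell
# Print GPU utilization as a bare percentage, or a notice if no NVIDIA GPU exists.
gpu_util() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
  else
    echo "no NVIDIA GPU detected"
  fi
}
gpu_util
```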

Optimizing Local Performance

GPU Memory Management

CogVideoX-2B loads into VRAM when generating and can stay resident for subsequent generations. To optimize:

  • Close other GPU applications during generation batches
  • Do not run multiple generations simultaneously on a single GPU
  • Monitor with nvidia-smi to ensure VRAM is not being consumed by other processes

Storage Management

Generated videos accumulate in /storage/generated/. Each video is relatively small (5-15MB for a 15-20 second clip), but over time they add up:

# Check storage usage
du -sh /storage/generated/

# Clean up generations older than 30 days (dry run with -print first; -delete removes)
# find /storage/generated/ -type f -mtime +30 -print
# find /storage/generated/ -type f -mtime +30 -delete

Batch Processing Tips

When generating multiple videos:

  1. Queue all scripts for approval
  2. Approve them in batch - they will enter the generation queue
  3. The Celery worker processes them sequentially
  4. Monitor progress in Flower

With a capable GPU, you can generate 10-12 videos per hour with no marginal cost.
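As a sketch, batch approval over the API might look like the loop below. The script IDs and the approve endpoint path are assumptions for illustration; the web interface is the documented path.

```shell
# Hypothetical batch-approve loop; IDs and endpoint are placeholders.
script_ids="abc123 def456 ghi789"
for id in $script_ids; do
  echo "approving $id"
  # curl -X POST "http://localhost:8000/api/v1/scripts/$id/approve"
done
```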

Cost Comparison: Local vs. Cloud

Cloud API Costs (Typical)

Service                   | Cost per Video | 100 Videos/Month
Sora 2 (via API)          | $1.00-2.00     | $100-200
Runway Gen-4              | $0.50-1.50     | $50-150
Typical AI video platform | $0.30-1.00     | $30-100

Local Costs

Expense                        | Cost         | Frequency
GPU (RTX 3080 Ti, used)        | $400-500     | One-time
Electricity                    | ~$5-10/month | Monthly
Internet (for Ollama download) | Negligible   | One-time

Break-even point: If you generate 50+ videos per month, local generation pays for itself within 2-4 months, depending on which cloud service you would otherwise use.

For creators and businesses producing content at volume, the cost savings are substantial. At 200 videos per month (the kind of volume described in our agency automation case study), you would save $1,000-3,000 per month compared to cloud-only generation.
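The break-even arithmetic is easy to sanity-check yourself. A sketch assuming a $450 used GPU and $1.50 of avoided cloud cost per video at 100 videos per month (all inputs are assumptions you should replace with your own numbers):

```shell
# Break-even sketch: months until the GPU cost is recovered.
gpu_cost=450               # dollars, one-time
saved_cents_per_video=150  # $1.50 of cloud cost avoided per video (in cents)
videos_per_month=100

monthly_savings=$(( saved_cents_per_video * videos_per_month / 100 ))  # dollars/month
months=$(( (gpu_cost + monthly_savings - 1) / monthly_savings ))       # ceiling division
echo "Monthly savings: \$${monthly_savings}; break-even after ${months} months"
# → Monthly savings: $150; break-even after 3 months
```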

The Complete Free Pipeline

Here is the fully local, zero-cost-per-video pipeline:

Step              | Local Tool              | Cloud Alternative
Script generation | Ollama (llama3.2)       | GPT-4, Gemini
Content analysis  | Whisper + Ollama        | Cloud transcription APIs
Video generation  | CogVideoX-2B            | Sora 2, Veo 3, Runway
Voice (if needed) | Edge TTS                | ElevenLabs, PlayHT
Captions          | Built-in ASS generator  | Cloud caption services

Every step runs on your hardware, with one caveat: Edge TTS sends text to Microsoft's free online service, so if you need a strictly offline voiceover, substitute a fully local TTS engine or skip TTS. Everything else - from topic research to finished video - operates offline after the initial setup.

When to Use Local vs. Cloud

Local generation makes sense when:

  • You produce high volume (50+ videos/month)
  • Privacy matters (your content stays on your machine)
  • You have capable hardware already
  • You want predictable costs

Cloud generation makes sense when:

  • You need the highest quality models (Sora 2, Veo 3)
  • You do not have GPU hardware
  • Volume is low (under 20 videos/month)
  • You need faster generation (cloud GPUs can be faster than consumer hardware)

Many users run a hybrid setup: CogVideoX locally for daily content and cloud models for high-stakes, premium videos where quality matters most. For a full cost breakdown across all platforms, see our cheapest AI video tools guide.

FAQ

What GPU do I need to run CogVideoX locally?

Minimum: NVIDIA GPU with 12GB VRAM (RTX 3080 Ti, RTX 4070 Ti, RTX 4080). Recommended: RTX 4090 with 24GB VRAM for the CogVideoX-5B model and faster generation. Plan on 16GB of system RAM (32GB recommended) and at least 20GB of free disk, more if you keep many generated videos. AMD and Intel GPUs are not officially supported.

How long does local CogVideoX generation take per video?

Roughly 5-6 minutes per clip on an RTX 3080 Ti, 3-4 minutes on an RTX 4080, and 2-3 minutes on an RTX 4090. Times include model loading, diffusion steps, and encoding. Batch multiple generations to amortize model load time across several videos.

Is the quality of CogVideoX comparable to Sora 2 or Veo 3?

Lower than Sora 2 and Veo 3 in raw photorealism, but sufficient for most social media use cases. CogVideoX-5B (24GB VRAM) narrows the gap significantly. For mobile-viewed short-form content, the difference is subtle. For cinematic or premium brand content, commercial models still lead.

Can I combine local CogVideoX with cloud models in one workflow?

Yes, and this is the best approach for serious creators. Use local generation for daily volume (cost: $0 per video after hardware amortization) and cloud models (Sora 2, Veo 3) for premium hero content. AIReelVideo supports both modes natively with shared pipeline and publishing.

Is CogVideoX licensed for commercial use?

Yes. CogVideoX is released under Apache 2.0 license, allowing commercial use, modification, and distribution. You can monetize videos generated with CogVideoX on YouTube, TikTok, etc., without attribution requirements. Verify current license on the official GitHub repository before large-scale commercial deployment.


Running CogVideoX locally puts AI video generation in your hands with zero ongoing costs. The setup takes under an hour, and once running, you have an unlimited video generation machine. AIReelVideo supports both local CogVideoX and cloud models, so you can start local and add cloud generation later as needed. Get started with the setup guide above and generate your first free AI video today.

cogvideox
local ai
self-hosted
gpu
tutorial
open source
