AIReelVideo

Local AI Video Generator — Free GPU Generation

Run AI video generation on your own GPU. CogVideoX local, Ollama for scripts, Edge TTS. Privacy-first, zero ongoing cost. 12GB VRAM required.

Why Run AI Video Generation Locally

Cloud AI video services are convenient, but they come with trade-offs that matter to many creators and businesses:

Cost at scale: Cloud video generation charges per video. At $0.30-0.50 per video, generating 100 videos per month costs $30-50. That adds up to $360-600 per year. A GPU that can generate unlimited videos costs $300-700 as a one-time purchase.

Data privacy: Every video prompt, script, and generated output passes through the cloud provider's servers. For businesses handling sensitive content, client data, or proprietary information, this is a real concern. Some industries (healthcare, finance, legal) have regulatory requirements about where data can be processed.

Control: Cloud services change pricing, change terms, rate-limit accounts, and sometimes shut down entirely. When you run locally, you control the infrastructure. No API changes will break your workflow overnight.

Availability: No internet outage, API downtime, or rate limit will stop your local pipeline from running.

AIReelVideo is designed to work fully locally. The entire pipeline, from script writing to video generation to caption rendering, can run on your own hardware without sending a single request to an external service.

Hardware Requirements

GPU (Required)

CogVideoX-2B is the local video generation model. It requires:

  • 12GB VRAM minimum (GPU memory)
  • NVIDIA GPU with CUDA support (AMD is not currently supported)
  • CUDA 11.8 or later

Tested and confirmed GPUs:

GPU         | VRAM | Generation Time | Status
------------|------|-----------------|------------------
RTX 3080 Ti | 12GB | ~5 minutes      | Tested, confirmed
RTX 3090    | 24GB | ~4 minutes      | Compatible
RTX 4070 Ti | 12GB | ~4 minutes      | Compatible
RTX 4080    | 16GB | ~3 minutes      | Compatible
RTX 4090    | 24GB | ~2 minutes      | Compatible
Tesla T4    | 16GB | ~6 minutes      | Cloud-compatible
A10G        | 24GB | ~3 minutes      | Cloud-compatible

GPUs that will not work or are not recommended:

  • RTX 3060 (12GB VRAM but older architecture, may have issues)
  • RTX 3070 (8GB VRAM, not enough)
  • GTX 1080 Ti (11GB, insufficient and older CUDA)
  • Any AMD GPU (no CUDA support)
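The compatibility rules above can be expressed as a small pre-flight check. This is an illustrative sketch, not part of AIReelVideo itself; the function name and return shape are assumptions based on the requirements listed earlier.

```python
def gpu_compatible(vendor: str, vram_gb: float, cuda_version: float) -> tuple[bool, str]:
    """Rough pre-flight check mirroring the hardware requirements above."""
    if vendor.lower() != "nvidia":
        return False, "CogVideoX requires CUDA, so only NVIDIA GPUs are supported"
    if cuda_version < 11.8:
        return False, "CUDA 11.8 or later is required"
    if vram_gb < 12:
        return False, f"{vram_gb}GB VRAM is below the 12GB minimum"
    return True, "meets minimum requirements"

print(gpu_compatible("NVIDIA", 12, 12.2))  # RTX 3080 Ti-class card: passes
print(gpu_compatible("AMD", 24, 0))        # no CUDA: fails
print(gpu_compatible("NVIDIA", 8, 12.2))   # RTX 3070-class card: fails on VRAM
```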

CPU and RAM

  • CPU: Any modern multi-core processor. Not the bottleneck for video generation.
  • RAM: 16GB minimum, 32GB recommended. The Celery workers, PostgreSQL, and Redis all need memory alongside the GPU workloads.

Storage

  • CogVideoX model: ~10GB
  • Ollama Llama 3.2: ~2GB
  • Docker images and database: ~5-10GB
  • Generated videos: ~20-50MB each
  • Total recommended: 50GB+ free space
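The storage figures above can be turned into a quick budget estimate. The numbers below are taken from the list (using midpoints for the ranges); the helper itself is illustrative.

```python
# Storage budget based on the figures above (all sizes in GB).
FIXED = {
    "cogvideox_model": 10,
    "ollama_llama32": 2,
    "docker_and_db": 7.5,   # midpoint of the 5-10GB range above
}
VIDEO_GB = 0.035            # midpoint of the 20-50MB-per-video range

def videos_that_fit(disk_gb: float) -> int:
    """How many generated videos fit after the fixed installs."""
    free = disk_gb - sum(FIXED.values())
    return max(0, int(free / VIDEO_GB))

print(videos_that_fit(50))  # roughly 870 videos on the recommended 50GB
```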

Operating System

  • Linux (Ubuntu 22.04 recommended, tested on Ubuntu with kernel 6.8.0)
  • Windows with WSL2 (works but less tested)
  • macOS is not supported for GPU generation (no CUDA)

The Complete Local Stack

AIReelVideo's local deployment replaces every external dependency with a local alternative:

Function          | Cloud Version              | Local Version
------------------|----------------------------|---------------------------
Script Generation | Gemini 2.5 Flash / Claude  | Ollama + Llama 3.2
Video Generation  | Sora 2 / Runway / Veo 3    | CogVideoX-2B (local GPU)
Voice Synthesis   | N/A (captions only)        | Edge TTS (optional, free)
Transcription     | Whisper API                | Whisper (local)
Database          | PostgreSQL (cloud)         | PostgreSQL (Docker)
Task Queue        | Redis (cloud)              | Redis (Docker)
Workers           | Cloud workers              | Celery (Docker)

The result is a fully self-contained content pipeline where no data leaves your network.

Setting Up Local Generation

Step 1: Install Ollama

Ollama runs language models locally. It handles script generation.

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the Llama 3.2 model for script generation
ollama pull llama3.2

# Optional: Pull LLaVA for vision analysis
ollama pull llava

Configure Ollama to accept connections from Docker containers:

# Edit the systemd service
sudo sed -i '/\[Service\]/a Environment="OLLAMA_HOST=0.0.0.0"' \
  /etc/systemd/system/ollama.service

sudo systemctl daemon-reload
sudo systemctl restart ollama

# Verify it's accessible
curl http://localhost:11434/api/tags
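Once Ollama is reachable, script generation is a single HTTP call to its /api/generate endpoint. The sketch below shows the request shape; the prompt text is illustrative and error handling is deliberately minimal.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama3.2") -> dict:
    # "stream": False makes Ollama return one JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Live call (requires a running Ollama server):
# print(generate("Write a 15-second video script about home coffee brewing."))
```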

Step 2: Install NVIDIA Container Toolkit

Docker needs GPU access for CogVideoX:

# Add NVIDIA Docker repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker

# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Step 3: Configure Environment

Set up the .env file for local operation:

# Video Generation
VIDEO_GENERATION_MODE=local

# Script Generation
OLLAMA_TEXT_MODEL=llama3.2
OLLAMA_HOST=http://host.docker.internal:11434

# Voice (optional)
TTS_MODE=local

# Transcription
TRANSCRIPTION_MODE=local

Step 4: Deploy with Docker Compose

# Clone the repository
git clone https://github.com/your-org/aireelvideo.git
cd aireelvideo

# Start all services
docker compose up -d

# Verify services are running
docker ps

This starts:

  • API server on port 8000
  • PostgreSQL on port 5432
  • Redis on port 6379
  • Celery worker (handles video generation)
  • Celery beat (scheduled tasks)
  • Flower on port 5555 (task monitoring)

Step 5: Run Database Migrations

docker exec aireelvideo-api alembic upgrade head

Step 6: Start the Frontend

cd frontend
pnpm install
pnpm dev

The platform is now accessible at http://localhost:3000.

Local Generation Quality

Let's be transparent about what to expect from CogVideoX-2B compared to cloud models:

Where CogVideoX Does Well

  • Scene composition: Generates coherent scenes with proper spatial relationships
  • Motion: Smooth, natural camera movement and object motion
  • Color and lighting: Produces well-lit, visually appealing footage
  • Consistency: Maintains visual consistency within a single generation

Where Cloud Models Are Better

  • Fine detail: Sora 2 renders finer textures, skin detail, and small objects
  • Physical realism: Cloud models handle physics, reflections, and shadows more accurately
  • Face quality: Sora 2 and Veo 3 produce more realistic human faces
  • Complex scenes: Multiple interacting objects or people are better handled by larger models

The Mobile Screen Factor

Here is the practical reality: short-form video is consumed on phone screens at arm's length. At that viewing distance and screen size, the quality gap between CogVideoX and Sora 2 narrows considerably. Details that are obvious on a 27-inch monitor become invisible on a 6-inch phone screen.

For most social media content niches, CogVideoX output is good enough. The exceptions are niches where visual quality is the primary value proposition (photography, videography, visual art), in which Sora 2's output is noticeably superior.

The Hybrid Approach

The most practical strategy for many creators: use CogVideoX for most content (free, fast, good enough) and switch to Sora 2 for premium content (best quality, paid). AIReelVideo makes this easy since you can configure different models per market or switch models between generations.

Cost Analysis: Local vs Cloud

Local Setup Costs

Component          | Cost         | Notes
-------------------|--------------|------------------------------
RTX 3080 Ti (used) | $350-500     | Primary expense
RTX 4070 Ti (new)  | $600-700     | Alternative
Electricity        | ~$5-15/month | Depends on generation volume
Internet           | Existing     | Only needed for publishing

Break-even point: At $0.40 per Sora 2 video, a $400 GPU pays for itself after 1,000 videos. If you generate 100 videos per month, that is 10 months. If you generate 50 per month, 20 months.
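The break-even arithmetic above generalizes to any hardware price and cloud rate. A minimal sketch:

```python
def breakeven_months(gpu_cost: float, cloud_per_video: float,
                     videos_per_month: int) -> float:
    """Months until a one-time GPU purchase beats per-video cloud pricing."""
    return gpu_cost / (cloud_per_video * videos_per_month)

# Figures from the analysis above: $400 GPU vs $0.40 per Sora 2 video.
print(breakeven_months(400, 0.40, 100))  # 10.0 months
print(breakeven_months(400, 0.40, 50))   # 20.0 months
```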

Cloud-Only Costs

Volume           | Monthly Cost | Annual Cost
-----------------|--------------|------------
50 videos/month  | $20          | $240
100 videos/month | $40          | $480
200 videos/month | $80          | $960
500 videos/month | $200         | $2,400

For high-volume creators (200+ videos/month), local generation pays for itself within a few months.

Cloud GPU Hosting (Middle Ground)

If you want self-hosted privacy without buying hardware:

Provider    | GPU      | Cost             | Notes
------------|----------|------------------|---------------------------
Vast.ai     | RTX 3090 | ~$0.20-0.40/hour | On-demand, variable pricing
RunPod      | RTX 4090 | ~$0.44/hour      | On-demand
Lambda Labs | A10G     | ~$0.60/hour      | More reliable uptime

At $0.30/hour and 5 minutes per video, cloud GPU hosting costs about $0.025 per video, much cheaper than managed API services, but requires more setup and management.
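The per-video figure above follows directly from the hourly rate and generation time:

```python
def per_video_cost(hourly_rate: float, minutes_per_video: float) -> float:
    """Effective cost per video when renting a GPU by the hour."""
    return hourly_rate * minutes_per_video / 60

# $0.30/hour and ~5 minutes per video, as above: roughly $0.025 per video,
# versus ~$0.40 per video on a managed API.
print(round(per_video_cost(0.30, 5), 3))
```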

Privacy and Data Sovereignty

For businesses and professionals, the privacy argument for local generation is not paranoia. It is practical risk management.

What Stays Local

  • All script text: Your content ideas, brand messaging, and proprietary information
  • All generated video: The output never touches external servers
  • Market configuration: Your niche strategy and competitive analysis
  • User data: Account information, publishing credentials, everything

Who Benefits Most

  • Healthcare professionals: Patient-related content must stay private (HIPAA considerations)
  • Financial advisors: Client information cannot be processed by third parties
  • Legal professionals: Confidentiality requirements prohibit external processing
  • Businesses with trade secrets: Competitive intelligence and strategy must stay internal
  • Privacy-conscious creators: Anyone who simply prefers not to share their data

What Still Requires External Services

  • Trend discovery: Scraping TikTok and YouTube requires internet access
  • Publishing: Uploading to social platforms sends the final video externally
  • Cloud model generation: If you opt to use Sora 2 or Runway for specific videos

Monitoring Your Local Installation

Task Monitoring with Flower

Flower provides a web dashboard for monitoring Celery task execution:

http://localhost:5555

You can see:

  • Active and queued video generation tasks
  • Task execution time and success/failure rates
  • Worker health and resource usage
  • Historical task data

GPU Monitoring

# Check GPU usage
nvidia-smi

# Watch GPU in real-time during generation
watch -n 1 nvidia-smi

During CogVideoX generation, expect GPU utilization at 90-100% and VRAM usage at 10-12GB.
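For logging rather than eyeballing, nvidia-smi can emit machine-readable CSV via its standard --query-gpu and --format flags. The parser below is a small sketch for one such sample line.

```python
import csv
import subprocess

# Standard nvidia-smi query flags; emits lines like "98, 11520, 12288".
QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_sample(line: str) -> dict:
    """Parse one CSV sample into utilization and VRAM figures (MB)."""
    util, used, total = next(csv.reader([line]))
    return {"util_pct": int(util), "vram_used_mb": int(used),
            "vram_total_mb": int(total)}

# Live sampling (requires nvidia-smi on PATH):
# for line in subprocess.check_output(QUERY, text=True).strip().splitlines():
#     print(parse_sample(line))

print(parse_sample("98, 11520, 12288"))
# {'util_pct': 98, 'vram_used_mb': 11520, 'vram_total_mb': 12288}
```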

Log Monitoring

# API server logs
docker logs -f aireelvideo-api

# Celery worker logs (video generation)
docker logs -f aireelvideo-celery-worker

# Database logs
docker logs -f aireelvideo-db

Troubleshooting Local Setup

Ollama Connection Fails from Docker

The most common issue. Docker containers cannot reach localhost on the host machine.

# Verify Ollama is listening on 0.0.0.0
curl http://localhost:11434/api/tags

# Test from inside Docker
docker exec aireelvideo-api curl http://host.docker.internal:11434/api/tags

If the second command fails, Ollama is not bound to 0.0.0.0. Re-run the Ollama configuration step above.

Out of VRAM

If generation fails with CUDA out-of-memory errors:

# Check current VRAM usage
nvidia-smi

# Kill any processes using GPU memory
sudo fuser -v /dev/nvidia*

Close browser tabs running WebGL, other GPU applications, or previous generation processes that did not clean up properly.

Video Generation Hangs

If a generation task seems stuck:

# Check worker status
docker logs aireelvideo-celery-worker 2>&1 | tail -50

# Restart the worker
docker restart aireelvideo-celery-worker

The Celery beat scheduler runs a backup check every minute that catches and retries stalled generation tasks.
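The stall-recovery behavior described above can be sketched as pure logic: given the start times of running tasks and a timeout, pick the ones to retry. This is an illustrative stand-in for the beat-scheduled check, not AIReelVideo's actual implementation; the 15-minute timeout is an assumption.

```python
from datetime import datetime, timedelta

STALL_TIMEOUT = timedelta(minutes=15)  # assumed; tune to your GPU's speed

def stalled_tasks(running: dict[str, datetime], now: datetime) -> list[str]:
    """Return IDs of tasks that have been running longer than the timeout."""
    return [task_id for task_id, started in running.items()
            if now - started > STALL_TIMEOUT]

now = datetime(2025, 1, 1, 12, 0)
running = {
    "gen-001": now - timedelta(minutes=40),  # stuck, should be retried
    "gen-002": now - timedelta(minutes=3),   # still within normal range
}
print(stalled_tasks(running, now))  # ['gen-001']
```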

Getting Started with Local Generation

  1. Check your GPU: Run nvidia-smi and verify you have 12GB+ VRAM
  2. Install Ollama: Pull Llama 3.2 for script generation
  3. Configure NVIDIA Docker: Install container toolkit and verify GPU access
  4. Deploy with Docker Compose: Single command to start all services
  5. Run migrations: Set up the database schema
  6. Start the frontend: Access the platform at localhost:3000
  7. Create a market and generate: Your first free, private AI video

The entire setup takes about 30 minutes if your GPU and drivers are already working.

Start Generating Videos for Free

AIReelVideo's local deployment gives you a complete AI video pipeline running on your own hardware. Zero ongoing costs, full data privacy, and unlimited generation capacity. If you have a GPU with 12GB of VRAM, you have everything you need.

Clone the repository and deploy your own AI video platform today.

Key Features

CogVideoX Local Generation

Run CogVideoX-2B on your own GPU. Generate 15-20 second vertical videos without sending data to any external service. Requires 12GB VRAM.

Ollama Script Generation

Write video scripts using Llama 3.2 running locally through Ollama. No API keys, no usage fees, no data leaving your machine.

Complete Local Pipeline

Trend discovery, script generation, video generation, and caption rendering all run locally. The entire content pipeline with zero external dependencies.

Zero Ongoing Cost

After the initial hardware investment, every video is free. No tokens, no subscriptions, no per-video charges. Generate unlimited content.

Full Data Privacy

Your scripts, videos, and brand data never leave your server. Important for businesses in regulated industries or anyone who values data sovereignty.

Docker Compose Deployment

The entire platform deploys with a single docker compose up command. PostgreSQL, Redis, Celery workers, and the API server all containerized.

Frequently Asked Questions

What GPU do I need for local video generation?

CogVideoX-2B requires approximately 12GB of VRAM. An NVIDIA RTX 3080 Ti, RTX 3090, RTX 4070 Ti, or any card with 12GB+ VRAM works. The RTX 3080 Ti has been specifically tested and confirmed to work well. AMD GPUs are not currently supported due to CogVideoX's CUDA requirement.

How long does it take to generate a video locally?

On an RTX 3080 Ti, CogVideoX generates a 15-20 second video in approximately 5 minutes. Faster cards will reduce this time. Script generation with Ollama is nearly instant (a few seconds). The total pipeline from script to captioned video takes about 6-7 minutes per video.

How does the output quality compare to cloud models?

CogVideoX-2B produces good quality suitable for social media, but it is a step below Sora 2 or Runway Gen-4.5 in terms of visual fidelity, motion smoothness, and fine detail. For TikTok and Reels viewed on mobile screens, the quality difference is less noticeable than when viewed on a large monitor.

Can I mix local and cloud generation?

Yes. You can configure different markets to use different models. Use CogVideoX for high-volume content where cost matters, and switch to Sora 2 for premium content where quality matters. The platform handles both without any workflow changes.

Does local generation require an internet connection?

For the core pipeline (scripts + video generation + captions), no internet connection is needed once models are downloaded. Trend discovery requires internet access since it scrapes content from TikTok and YouTube. Publishing obviously requires internet to upload to platforms.

How much disk space does the full setup need?

The CogVideoX model is approximately 10GB. Ollama's Llama 3.2 is about 2GB. The Docker images and PostgreSQL database add another 5-10GB. Generated videos take roughly 20-50MB each. Plan for at least 50GB of free space, more if you generate large volumes of content.

Can I run this on a cloud GPU instead of buying hardware?

Yes. You can deploy AIReelVideo on cloud GPU instances from providers like Lambda Labs, Vast.ai, or RunPod. This gives you the privacy benefits of self-hosting without needing a local GPU. A cloud instance with a T4 or A10G GPU works well.
