How AI Video Upscaling Works
ESRGAN Explained
The technology behind making blurry videos sharp. Real-ESRGAN, GFPGAN, and neural network upscaling explained in plain language.
The Problem with Traditional Upscaling
When you zoom into a low-resolution video, each pixel gets stretched. Traditional algorithms (bilinear, bicubic, Lanczos) try to smooth the result, but the output is always blurry — because the detail was never captured in the first place.
AI upscaling takes a fundamentally different approach: instead of stretching pixels, it predicts what the missing detail should look like based on patterns learned from millions of high-resolution images.
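To see why interpolation blurs, here is a minimal sketch of linear interpolation on a single row of pixels. The function name and the tiny test row are illustrative only. The point is that every output value is a weighted average of existing neighbours, so a hard edge turns into a ramp and no new detail appears.

```python
def upscale_row_linear(row, factor):
    """Upscale a 1-D row of pixel values by linear interpolation."""
    n_out = len(row) * factor
    out = []
    for i in range(n_out):
        # Map the centre of output pixel i back into input coordinates.
        src = (i + 0.5) / factor - 0.5
        src = min(max(src, 0.0), len(row) - 1.0)
        left = int(src)
        right = min(left + 1, len(row) - 1)
        t = src - left
        # Weighted average of the two nearest input pixels.
        out.append((1 - t) * row[left] + t * row[right])
    return out


edge = [0, 0, 255, 255]  # a hard black-to-white edge
print(upscale_row_linear(edge, 2))
# [0.0, 0.0, 0.0, 63.75, 191.25, 255.0, 255.0, 255.0]
```

The two in-between values (63.75 and 191.25) are the blur: the edge is now spread over more pixels instead of staying sharp.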
Real-ESRGAN: The Core Upscaling Model
Real-ESRGAN builds on ESRGAN (Enhanced Super-Resolution Generative Adversarial Network) and was developed by Xintao Wang and colleagues at Tencent's ARC Lab. It is one of the most widely used open-source upscaling models for both images and video.
How it works
- Training — The model was trained on pairs of high-res images and synthetically degraded low-res copies, so it learned to predict the detail that degradation removes.
- Generator Network — Takes your low-resolution frame and outputs a higher-resolution version with reconstructed detail.
- Discriminator Network — During training, a second network judged whether the upscaled result looked "real" — pushing the generator to produce increasingly realistic detail.
- Inference — On your video, only the generator runs — processing each frame through the neural network on a GPU in milliseconds.
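The training step above depends on making a degraded copy of each high-res image. The sketch below is a deliberately simplified stand-in, assuming only box downsampling plus Gaussian noise; the real Real-ESRGAN degradation process is more elaborate (it also includes blur and compression artifacts). The function name and parameters are hypothetical.

```python
import random


def degrade(hr, scale=2, noise_sigma=2.0, seed=0):
    """Build a low-res copy of `hr` (a list of rows of grayscale values).

    Simplified assumption: average each scale x scale block, then add
    Gaussian noise. This is not the actual Real-ESRGAN degradation recipe.
    """
    rng = random.Random(seed)
    h, w = len(hr), len(hr[0])
    lr = []
    for y in range(0, h - h % scale, scale):
        row = []
        for x in range(0, w - w % scale, scale):
            block = [hr[y + dy][x + dx]
                     for dy in range(scale) for dx in range(scale)]
            value = sum(block) / len(block) + rng.gauss(0.0, noise_sigma)
            row.append(min(255.0, max(0.0, value)))  # keep in 0..255
        lr.append(row)
    return lr


hr_image = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [200, 200, 50, 50],
    [200, 200, 50, 50],
]
lr_image = degrade(hr_image)  # (lr_image, hr_image) is one training pair
```

During training the generator sees `lr_image` and is pushed to reproduce `hr_image`.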
BetterVideo uses the x2plus variant, which doubles each frame's width and height. The x4plus variant quadruples them but is 4x slower, so for video, 2x upscaling offers the best speed-to-quality tradeoff.
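A quick bit of arithmetic shows what the scale factors mean in pixels. The helper names below are illustrative and the 960x540 input is just an example size.

```python
def upscaled_size(width, height, scale):
    """Output dimensions after upscaling both sides by `scale`."""
    return width * scale, height * scale


def pixel_multiplier(scale):
    """How many output pixels are produced per input pixel."""
    return scale * scale


print(upscaled_size(960, 540, 2))  # (1920, 1080)
print(upscaled_size(960, 540, 4))  # (3840, 2160)
print(pixel_multiplier(2))         # 4
print(pixel_multiplier(4))         # 16
```

A 4x output has four times as many pixels as a 2x output of the same frame, which gives a rough sense of why the larger scale costs so much more time.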
GFPGAN: Face Restoration
GFPGAN (Generative Facial Prior GAN) is a specialized model for face restoration, also from Tencent's ARC Lab.
Why faces need special treatment
Humans are extremely sensitive to face quality — we notice blurry eyes or muddy skin immediately. General upscaling models improve faces somewhat, but a specialized face model produces dramatically better results.
How GFPGAN works
- Detection — RetinaFace detects face locations and landmarks in each frame
- Alignment — Detected faces are aligned to a standard position for the neural network
- Restoration — A pre-trained face generation model (StyleGAN2) provides "prior knowledge" of what faces should look like, guiding the restoration
- Paste-back — Restored faces are blended back into the original frame
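The paste-back step can be pictured as an alpha blend with a soft border, so the restored face does not show a visible seam. This is a minimal sketch on grayscale grids with a hypothetical `paste_back` helper and a simple linear feather. A real implementation also has to undo the alignment transform, which is left out here.

```python
def paste_back(frame, face, top, left, feather=1):
    """Blend `face` into a copy of `frame` with its corner at (top, left).

    Pixels near the patch border get a lower weight, so the patch fades
    into the original frame. Assumes the patch fits inside the frame.
    """
    out = [list(row) for row in frame]
    h, w = len(face), len(face[0])
    for y in range(h):
        for x in range(w):
            # Distance (in pixels) to the nearest patch border.
            d = min(y, x, h - 1 - y, w - 1 - x)
            alpha = min(1.0, (d + 1) / (feather + 1))
            fy, fx = top + y, left + x
            out[fy][fx] = alpha * face[y][x] + (1.0 - alpha) * frame[fy][fx]
    return out


frame = [[0] * 4 for _ in range(4)]
face = [[100] * 3 for _ in range(3)]
blended = paste_back(frame, face, top=0, left=0)
print(blended[1][1])  # 100.0 (patch centre: fully restored)
print(blended[0][0])  # 50.0  (patch border: half restored, half original)
```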
BetterVideo runs GFPGAN at fidelity weight 0.6 — a balance between maximum AI restoration (0.0) and keeping the original face exactly as-is (1.0).
The Full BetterVideo Pipeline
Every frame of your video passes through this sequence:
Noise Analysis
Five sample frames are analyzed for brightness and noise levels. This determines whether denoising is needed and how much sharpening to apply.
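One way to picture this pre-scan: pick a handful of evenly spaced frames, measure average brightness and a rough noise score, then decide. Everything below is an assumption for illustration, including the function names, the neighbour-difference noise measure, and the thresholds. It is not BetterVideo's actual analysis.

```python
from statistics import mean


def sample_indices(frame_count, samples=5):
    """Evenly spaced frame indices across the video."""
    if frame_count <= 0:
        return []
    samples = min(samples, frame_count)
    step = frame_count / samples
    return [int(step * i + step / 2) for i in range(samples)]


def brightness(frame):
    """Mean pixel value of a grayscale frame (list of rows)."""
    return mean(p for row in frame for p in row)


def noise_estimate(frame):
    """Crude noise score: mean absolute difference between horizontal
    neighbours. Real texture also raises this score."""
    return mean(abs(row[x + 1] - row[x])
                for row in frame for x in range(len(row) - 1))


def plan(frames, dark_below=60, noisy_above=8):
    """Decide whether to denoise. Thresholds are hypothetical."""
    b = mean(brightness(f) for f in frames)
    n = mean(noise_estimate(f) for f in frames)
    return {"brightness": b, "noise": n,
            "denoise": b < dark_below or n > noisy_above}


print(sample_indices(900))  # [90, 270, 450, 630, 810]
```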
Denoising
For low-light or noisy footage, non-local means denoising removes grain before upscaling (so the AI doesn't amplify noise).
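Non-local means replaces each pixel with a weighted average of pixels whose surrounding patch looks similar, rather than simply averaging neighbours. The sketch below shows the idea on a 1-D signal with made-up parameter values. Production code would normally use an optimized library routine (OpenCV's `fastNlMeansDenoisingColored` is one example), and nothing here is claimed to match BetterVideo's settings.

```python
import math


def nlm_denoise_1d(signal, patch_radius=1, search_radius=3, h=10.0):
    """Non-local means on a 1-D signal.

    `h` controls how quickly the weight falls off as patches differ.
    """
    n = len(signal)

    def patch(i):
        # Patch around index i, clamping indices at the borders.
        return [signal[min(max(i + k, 0), n - 1)]
                for k in range(-patch_radius, patch_radius + 1)]

    out = []
    for i in range(n):
        p_i = patch(i)
        weight_sum = 0.0
        value_sum = 0.0
        for j in range(max(0, i - search_radius),
                       min(n, i + search_radius + 1)):
            p_j = patch(j)
            # Mean squared difference between the two patches.
            dist2 = sum((a - b) ** 2 for a, b in zip(p_i, p_j)) / len(p_i)
            w = math.exp(-dist2 / (h * h))
            weight_sum += w
            value_sum += w * signal[j]
        out.append(value_sum / weight_sum)
    return out


noisy_edge = [10, 12, 9, 11, 200, 202, 199, 201]
smoothed = nlm_denoise_1d(noisy_edge)
```

Because patches on opposite sides of the edge look very different, they get almost no weight, so the grain is averaged away on each side while the edge itself stays in place.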
Real-ESRGAN 2x
Every frame is upscaled 2x with fp16 precision on an NVIDIA A10G GPU. No tiling — the full frame is processed at once.
GFPGAN Faces
If faces were detected in the pre-scan, each face is restored. If no faces exist (landscapes, products), this step is skipped.
Adaptive Enhancement
CLAHE (contrast-limited adaptive histogram equalization) is applied to bright footage only, followed by unsharp-mask sharpening whose strength adapts to brightness.
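Unsharp masking sharpens by adding back the difference between a frame and a blurred copy of it. Here is a minimal 1-D sketch. The box blur, the strength range, and the choice that brighter footage gets stronger sharpening are all assumptions for illustration; the text above does not specify them.

```python
def box_blur_1d(row, radius=1):
    """Average each pixel with its neighbours, clamping at the borders."""
    n = len(row)
    out = []
    for i in range(n):
        window = [row[min(max(i + k, 0), n - 1)]
                  for k in range(-radius, radius + 1)]
        out.append(sum(window) / len(window))
    return out


def unsharp_mask_1d(row, amount):
    """out = original + amount * (original - blurred), clamped to 0..255."""
    blurred = box_blur_1d(row)
    return [min(255.0, max(0.0, p + amount * (p - b)))
            for p, b in zip(row, blurred)]


def adaptive_amount(mean_brightness, low=0.25, high=0.75):
    """Hypothetical mapping: scale strength linearly with brightness."""
    t = min(max(mean_brightness / 255.0, 0.0), 1.0)
    return low + (high - low) * t


row = [100, 100, 200, 200]
sharpened = unsharp_mask_1d(row, amount=0.5)
# The dark side of the edge gets darker and the bright side brighter,
# which is what makes the edge look crisper.
```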
H.264 Encoding
Frames are piped directly to ffmpeg for platform-optimized H.264 encoding. No intermediate files on disk.
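Piping frames to ffmpeg typically means starting it as a subprocess that reads raw pixels from standard input. The sketch below shows one way to do that. The `crf` and `preset` values are placeholder assumptions, not BetterVideo's platform-specific settings, and the function names are hypothetical.

```python
import subprocess


def ffmpeg_encode_command(width, height, fps, output_path,
                          crf=18, preset="medium"):
    """Build an ffmpeg command that reads raw RGB frames from stdin
    and writes H.264. crf and preset are illustrative defaults."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",    # input: raw RGB bytes
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",                                # "-" means read stdin
        "-c:v", "libx264", "-preset", preset, "-crf", str(crf),
        "-pix_fmt", "yuv420p",                    # widely playable output
        output_path,
    ]


def encode(frames, width, height, fps, output_path):
    """Stream frames (bytes objects of width*height*3 bytes each)
    straight into ffmpeg, with no intermediate image files."""
    cmd = ffmpeg_encode_command(width, height, fps, output_path)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        for frame in frames:
            proc.stdin.write(frame)
    finally:
        proc.stdin.close()
    return proc.wait()  # ffmpeg's exit code; 0 means success
```

Each processed frame is written to the pipe as soon as it is ready, so only the final video file ever touches the disk.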
GPU Processing: Why Speed Matters
AI upscaling is computationally intensive. Each frame requires billions of floating-point operations through the neural network. BetterVideo runs on NVIDIA A10G GPUs with 24GB VRAM in the cloud, so:
- You don't need a powerful GPU on your own machine
- Processing is fast — 30-second video in under 60 seconds
- fp16 (half-precision) inference roughly doubles throughput without visible quality loss
- Warm containers keep models loaded in GPU memory — no per-job model loading delay
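To put the speed figure in per-frame terms, here is a small calculation. It assumes a 30 fps source, which is not stated above, and treats the whole 60 seconds as the budget for the full pipeline.

```python
def required_throughput(video_seconds, fps, budget_seconds):
    """Return (frame count, frames per second needed, ms available per frame)."""
    frames = round(video_seconds * fps)
    return frames, frames / budget_seconds, 1000.0 * budget_seconds / frames


frames, fps_needed, ms_per_frame = required_throughput(30, 30, 60)
print(frames)      # 900
print(fps_needed)  # 15.0
# ms_per_frame is about 66.7, covering upscaling, face restoration,
# enhancement and encoding for each frame.
```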
Frequently Asked Questions
What is Real-ESRGAN?
Real-ESRGAN is an AI model from Tencent's ARC Lab, built on ESRGAN (Enhanced Super-Resolution Generative Adversarial Network), that upscales images and video frames by generating plausible high-resolution detail from low-resolution input.
What is GFPGAN?
GFPGAN (Generative Facial Prior GAN) specializes in face restoration. It detects faces and restores clarity to eyes, skin, and features using a pre-trained face generation model as prior knowledge.
Is AI upscaling the same as traditional upscaling?
No. Traditional interpolation averages nearby pixels, which creates blur. AI upscaling uses neural networks to predict what high-resolution detail should look like based on millions of training images.
Do I need a powerful GPU for AI video upscaling?
Desktop tools need a powerful NVIDIA GPU (RTX 3060+). BetterVideo runs on cloud A10G GPUs, so you don't need any GPU; processing happens on our servers.