Video Subtitle Remover Pro

Professional AI-powered tool for removing hard-coded subtitles from videos and images

Features | Installation | Usage | Configuration | CLI | Troubleshooting


Overview

Video Subtitle Remover Pro removes hard-coded subtitles and text watermarks from videos and images using temporal background reconstruction and the LaMa inpainting network. Unlike simple blur or crop methods, it fills the removed areas with content that matches the surrounding video.

Based on YaoFANGUK/video-subtitle-remover, enhanced with a professional interface, real LaMa inpainting, multi-engine detection, and multi-language support.

Features

  • Real Video Inpainting -- Temporal Background Exposure (TBE) reconstructs the true background from neighbouring frames where the subtitle is absent. No external model weight downloads required.
  • Real AI Inpainting -- LaMa neural network for still-frame and residual refinement (via simple-lama-inpainting)
  • AUTO Inpaint Routing -- Per-batch routing between TBE and LaMa based on exposure score
  • Multi-Engine Detection -- RapidOCR (ONNX PP-OCR, 4-5x faster, leak-free) > PaddleOCR > Surya (GPL opt-in) > EasyOCR > OpenCV fallback chain (automatic)
  • Lossless Pipeline -- FFV1 lossless intermediate (only the final encode is lossy) for noticeably cleaner outputs than the legacy mp4v intermediate
  • HEVC + AV1 Output -- Pick H.264 / H.265 / AV1 from a dropdown; NVENC/QSV/AMF for HW encoding, libx265 / libsvtav1 software fallback
  • Multi-region Masks -- Draw multiple subtitle rects on a scrubbable video frame; backend honours every rect
  • Inpaint Preview -- "Preview cleanup" button runs detect + inpaint on the selected frame so you can A/B settings before committing
  • Seamless Boundaries -- Gaussian alpha feathering at every inpaint boundary, no visible cut lines
  • ~50 Language Support -- English / Chinese / Japanese / Korean / European, plus Thai, Vietnamese, Polish, Greek, Ukrainian, Filipino, Hebrew, Czech, and more
  • GPU Acceleration -- NVIDIA CUDA, AMD/Intel DirectML, hardware-decode hints (D3D11 / VAAPI / MFX), CPU fallback
  • Subtitle Region Selector -- Scrub to any frame and draw one or more rectangles
  • Batch Processing -- Queue files or drag entire folders; per-item cancellation
  • Multi-track Audio + Loudness Normalisation -- Pass through every audio track on Blu-ray rips; optional per-stream EBU R128 normalisation to LUFS targets (YouTube -14, Apple -16, broadcast -23)
  • Quality Self-Test -- PSNR / SSIM report with an ROI-cropped metric (measures the inpaint region, not the unchanged background) and an optional side-by-side comparison PNG
  • CLI + Presets -- python -m backend.processor --pattern ... --preset "YouTube (default)"; six built-in presets + user presets persisted to %APPDATA%
  • Chyron vs Subtitle Filter -- Keep persistent text (logos, lower-thirds) and remove dialogue, or vice versa
  • Karaoke Grouping -- Per-syllable boxes fuse into a single line mask so highlighted lyrics do not leak through the gaps
  • Live Preview During Processing -- 15 FPS throttled preview piped from the backend worker
  • Pre-batch ETA Estimate -- 30-frame detect probe seeds the ETA so users see "about X left" from the very first frame
  • Crash-Resume Checkpointing -- SHA-256 input fingerprint per file; re-running a glob skips finished work
  • Premium Dark UI -- Cohesive design system with custom sliders, toggles, status chips, taskbar progress, onboarding modal
  • Settings Persistence -- All knobs saved/restored between sessions; versioned schema with backfill migration
  • CI/CD Releases -- Automated Windows builds via GitHub Actions, pip-audit dependency scan included
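
The crash-resume feature listed above keys finished work on a SHA-256 fingerprint of each input. The sketch below illustrates that idea only: the helper names (`fingerprint`, `mark_done`, `pending`), the JSON checkpoint layout, and the choice to hash the full file contents are all assumptions, not the project's actual code.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_done(checkpoint):
    """Load the set of finished fingerprints (hypothetical JSON list)."""
    checkpoint = Path(checkpoint)
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text(encoding="utf-8")))


def mark_done(checkpoint, path):
    """Record a finished input so a later run can skip it."""
    done = load_done(checkpoint)
    done.add(fingerprint(path))
    Path(checkpoint).write_text(json.dumps(sorted(done)), encoding="utf-8")


def pending(paths, checkpoint):
    """Return the inputs whose fingerprint is not yet in the checkpoint."""
    done = load_done(checkpoint)
    return [p for p in paths if fingerprint(p) not in done]
```

Keying on content rather than file name means a renamed but unchanged input would still be skipped, while a re-exported file with the same name would be processed again.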

System Requirements

| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 | Windows 11 |
| CPU | Intel i5 / AMD Ryzen 5 | Intel i7 / AMD Ryzen 7 |
| RAM | 8 GB | 16+ GB |
| GPU | Any (CPU mode) | NVIDIA RTX 2060+ |
| VRAM | - | 6+ GB |
| Python | 3.10 | 3.12 |

Installation

Quick Install

  1. Download or clone this repository
  2. Double-click Run_VSR_Pro.bat. On first run it automatically:
    • Creates a virtual environment
    • Detects your GPU and installs appropriate packages
    • Installs PaddleOCR, EasyOCR, and LaMa inpainting
    • Launches the application

  To troubleshoot, run Run_VSR_Pro_Debug.bat instead; it performs the same bootstrap flow with a visible console.

Manual Install

cd VideoSubtitleRemover

# Create virtual environment
python -m venv venv
.\venv\Scripts\activate

# Install PyTorch (choose one):
# NVIDIA:
pip install torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu118
# CPU:
pip install torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cpu

# Install dependencies
pip install -r requirements.txt

# Run
python VideoSubtitleRemover.py

FFmpeg (Required for audio)

winget install ffmpeg
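
After installing, it can help to confirm that FFmpeg is actually reachable on PATH before processing. The helper below is a small sketch and not part of the project; `ffmpeg_version` is a hypothetical name.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `ffmpeg -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    print(ffmpeg_version() or "FFmpeg not found on PATH")
```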

Validation

python -m unittest discover -s tests -v

Usage

  1. Launch via Run_VSR_Pro.bat
  2. Add files — Click to browse, press Ctrl+O, right-click for folders, or drag & drop
  3. Select algorithm: STTN (default), LAMA, or ProPainter; see Algorithm Comparison for which one suits your footage
  4. Set language if subtitles are non-English
  5. Optionally set region — Click "Set Region" to draw a rectangle on the subtitle area
  6. Start Processing and monitor progress
  7. Select a queue item to preview it, use Review mask to confirm detection, and double-click the preview for a larger source frame

Algorithm Comparison

| Algorithm | Inpainting Engine | Speed | Quality | Best For |
|---|---|---|---|---|
| STTN | Temporal Background Exposure | Fastest | Great | Live-action video with changing subtitles (default) |
| LAMA | Neural (LaMa) | Medium | Best still-frame | Images, animations, static backgrounds |
| ProPainter | TBE + LaMa refinement | Slowest | Best motion | Motion-heavy footage, thick/decorative text |

All three modes now do real inpainting. STTN recovers the literal background from adjacent frames where the subtitle is absent -- this works because hard-coded subtitles are sparse in time, and the pixels behind them are revealed whenever the text changes or disappears. LAMA is a single-frame neural fill. ProPainter is a hybrid: TBE reconstructs the background, then LaMa refines any residual.
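
The temporal idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: it assumes a static camera (no motion compensation), uses plain nested lists of grayscale values where real code would use image arrays, and takes the median of the exposed samples, which is a choice made here for robustness. The function name `tbe_fill` is hypothetical; `min_coverage` mirrors the "TBE Coverage" setting.

```python
from statistics import median


def tbe_fill(frames, masks, target, min_coverage=3):
    """Fill masked pixels of frames[target] from frames where they are exposed.

    frames: list of 2-D lists of grayscale values.
    masks:  list of 2-D lists of bools, True where subtitle text covers the pixel.
    Returns (filled_frame, residual_mask); residual pixels lacked enough
    exposed samples and would be left to a single-frame inpainter.
    """
    height, width = len(frames[target]), len(frames[target][0])
    filled = [row[:] for row in frames[target]]
    residual = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not masks[target][y][x]:
                continue  # pixel is already clean in the target frame
            samples = [
                frames[t][y][x]
                for t in range(len(frames))
                if t != target and not masks[t][y][x]
            ]
            if len(samples) >= min_coverage:
                filled[y][x] = median(samples)  # assumption: median of exposures
            else:
                residual[y][x] = True
    return filled, residual
```

The residual mask corresponds to the hybrid described for ProPainter mode: pixels that were never exposed often enough are the ones a single-frame fill such as LaMa would need to handle.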

Detection Engines

The app automatically selects the best available engine:

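| Priority | Engine | Install | Languages | Notes |
|---|---|---|---|---|
| 1 | RapidOCR (ONNX PP-OCR) | pip install rapidocr | 100+ | 4-5x faster than PaddleOCR, leak-free (default) |
| 2 | PaddleOCR (PP-OCRv5) | pip install "paddleocr>=3.0.0" | 106 | High accuracy reference implementation |
| 3 | Surya | pip install surya-ocr | 90+ | Layout-aware (GPL) |
| 4 | EasyOCR | pip install easyocr | 80+ | Legacy fallback |
| 5 | OpenCV fallback | Built-in | Any | Threshold-based |

Quote the PaddleOCR requirement as shown; unquoted, most shells treat >= as output redirection.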
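
A fallback chain like the one listed above can be expressed as "first engine whose module is importable wins". The sketch below is hypothetical: the import names are assumptions (pip package names and importable module names can differ), and `pick_engine` is not a function from this project.

```python
import importlib.util

# Priority order from the Detection Engines list. The import names are
# assumptions and may not match the modules the project actually imports.
ENGINE_MODULES = [
    ("RapidOCR", "rapidocr"),
    ("PaddleOCR", "paddleocr"),
    ("Surya", "surya"),
    ("EasyOCR", "easyocr"),
]


def is_installed(module_name):
    """True if the module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_engine(allow_gpl=False, installed=is_installed):
    """Return the first available engine name, else the OpenCV fallback."""
    for name, module in ENGINE_MODULES:
        if name == "Surya" and not allow_gpl:
            continue  # Surya is GPL and opt-in per the feature list
        if installed(module):
            return name
    return "OpenCV fallback"
```

Passing the availability check as a parameter keeps the selection logic testable without installing any OCR package.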

CLI Usage

Process files from the command line:

python -m backend.processor -i input.mp4 -o output.mp4 -m lama --lang en --crf 20

| Flag | Description | Default |
|---|---|---|
| -i, --input | Input file path | Required |
| -o, --output | Output file path | Required |
| --pattern | Glob pattern for batch (e.g. inputs/*.mp4) | - |
| --out-dir | Output directory for batch mode | - |
| --config | JSON config overlay | - |
| --preset NAME | Apply a built-in or user preset by name | - |
| --list-presets | List every preset and exit | - |
| -m, --mode | Algorithm (sttn/lama/propainter/auto) | sttn |
| --codec | Output codec (h264/h265/av1) | h264 |
| -g, --gpu | GPU device ID (-1 for CPU) | 0 |
| -l, --lang | Detection language | en |
| --crf | Output quality (15-35, lower=better) | 23 |
| --skip-detection | Use manual region only | Off |
| --fast | LAMA fast mode | Off |
| --no-audio | Strip audio | Off |
| --single-audio | Mux only first audio stream | Off |
| --loudnorm <LUFS> | EBU R128 loudness target (0 disables) | 0 |
| --frame-skip N | Reuse mask for N frames (0=every frame) | 0 |
| --mask-dilate N | Expand masks by N pixels | 8 |
| --no-hw-encode | Force software encoding | Off |
| --decode-accel | HW decode hint (off/auto/d3d11/vaapi/mfx) | off |
| --keep-chyrons | Leave persistent text (logos / lower-thirds) | Off |
| --keep-subtitles | Leave dialogue subtitles | Off |
| --karaoke-grouping | Fuse per-syllable boxes on the same line | Off |
| --quality-report | Compute PSNR/SSIM after each run | Off |
| --quality-sheet | Side-by-side comparison PNG | Off |
| --validate-config | Print resolved config and exit | Off |
| --skip-existing | Skip files whose output already exists | Off |
| --no-prefetch | Disable worker-thread frame prefetcher | Off |
| --json-log PATH | Append a structured JSON-line log | - |
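
When driving the CLI from another script, it can be convenient to assemble the argument list in one place. The sketch below uses only flags documented in this section; the wrapper function `build_command` itself is hypothetical and not shipped with the project.

```python
import sys


def build_command(input_path, output_path, mode="sttn", lang="en", crf=23,
                  codec="h264", loudnorm=0, quality_report=False):
    """Assemble an argv list for backend.processor from documented flags."""
    if not 15 <= crf <= 35:
        raise ValueError("crf must be within the documented 15-35 range")
    command = [
        sys.executable, "-m", "backend.processor",
        "-i", str(input_path), "-o", str(output_path),
        "-m", mode, "--lang", lang, "--crf", str(crf), "--codec", codec,
    ]
    if loudnorm:
        command += ["--loudnorm", str(loudnorm)]  # 0 leaves normalisation off
    if quality_report:
        command.append("--quality-report")
    return command


if __name__ == "__main__":
    argv = build_command("input.mp4", "output.mp4", mode="lama", crf=20, loudnorm=-14)
    print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the repository root, where `backend.processor` is importable.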

Configuration

Settings are stored in %APPDATA%\VideoSubtitleRemoverPro\settings.json and persist across sessions.
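
To inspect the persisted settings outside the GUI, something like the following should work. It assumes the file is a single JSON object; the key names inside it are not documented here, so the sketch only lists whatever it finds. The function names are hypothetical.

```python
import json
import os
from pathlib import Path


def settings_path(appdata=None):
    """Return the documented settings.json location under %APPDATA%."""
    base = appdata if appdata is not None else os.environ.get("APPDATA", "")
    return Path(base) / "VideoSubtitleRemoverPro" / "settings.json"


def load_settings(appdata=None):
    """Load the settings file as a dict, or return {} if it is absent."""
    path = settings_path(appdata)
    if not path.is_file():
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    for key, value in sorted(load_settings().items()):
        print(f"{key} = {value}")
```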

Advanced Settings

| Setting | Description | Default | Range |
|---|---|---|---|
| Neighbor Stride | STTN temporal window | 10 | 5-30 |
| Reference Length | STTN reference frames | 10 | 5-30 |
| Max Load Frames | Batch size | 30 | 10-100 |
| CRF Quality | Output quality (lower=better) | 23 | 15-35 |
| Output Codec | H.264 / H.265 / AV1 | h264 | h264/h265/av1 |
| Frame Skip | Reuse detection mask for N frames | 0 | 0-10 |
| Mask Dilate | Expand detected regions (px) | 8 | 0-20 |
| Mask Feather | Soft alpha-blend at boundary (px) | 4 | 0-15 |
| TBE Coverage | Min frames a pixel must be unmasked to trust its exposure | 3 | 1-10 |
| HW Encoding | Use NVENC/QSV/AMF if available | On | On/Off |
| HW Decode Hint | cv2 HW-accel hint with software fallback | off | off/auto/d3d11/vaapi/mfx |
| Loudness Target | EBU R128 LUFS target (0 = off) | 0 | 0 or -70..-5 |
| Multi-track Audio | Pass through every audio stream | On | On/Off |
| Quality Sheet | Side-by-side PNG next to output | Off | On/Off |
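
The numeric ranges in the Advanced Settings table lend themselves to a simple sanity check before a config overlay is passed to the tool. The sketch below copies the defaults and ranges from that table, but the dictionary keys are hypothetical (the real settings.json keys may differ), and whether the application clamps or rejects out-of-range values is not stated here; clamping is an assumption.

```python
# (default, minimum, maximum) copied from the Advanced Settings table.
# The dictionary keys are hypothetical; the real settings.json keys may differ.
RANGES = {
    "neighbor_stride": (10, 5, 30),
    "reference_length": (10, 5, 30),
    "max_load_frames": (30, 10, 100),
    "crf": (23, 15, 35),
    "frame_skip": (0, 0, 10),
    "mask_dilate": (8, 0, 20),
    "mask_feather": (4, 0, 15),
    "tbe_coverage": (3, 1, 10),
}


def clamp_settings(overrides):
    """Merge overrides onto defaults, clamping each value to its range."""
    resolved = {}
    for key, (default, low, high) in RANGES.items():
        value = overrides.get(key, default)
        resolved[key] = max(low, min(high, value))
    return resolved


def valid_loudness(target):
    """Loudness target is 0 (off) or a LUFS value between -70 and -5."""
    return target == 0 or -70 <= target <= -5
```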

Troubleshooting

CUDA out of memory
  • Reduce Max Load Frames in Advanced Settings
  • Switch to LAMA mode (lower VRAM)
  • Use CPU mode as fallback
No audio in output
  • Install FFmpeg: winget install ffmpeg
  • Ensure "Preserve original audio" is checked
Poor detection accuracy
  • Try changing the detection language to match your subtitles
  • Use "Set Region" to manually define the subtitle area
  • Install PaddleOCR for best detection accuracy
Application won't start
  • Ensure Python 3.10+ is installed
  • Delete venv folder and re-run setup
  • Try Run_VSR_Pro_Debug.bat to keep the console open during startup
  • Check the log file: %APPDATA%\VideoSubtitleRemoverPro\vsr_pro.log

Log Files

  • GUI log panel (collapsible, click "Open Log File" for full log)
  • File log: %APPDATA%\VideoSubtitleRemoverPro\vsr_pro.log (5MB rotating)
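
For reference, a 5 MB rotating file log of the kind described above can be set up with Python's standard `logging.handlers.RotatingFileHandler`. This is a generic sketch rather than the application's own logging setup; the logger name, message format, and `backup_count` value are assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def make_logger(log_path, max_bytes=5 * 1024 * 1024, backup_count=3):
    """Return a logger that rotates its file once it reaches max_bytes."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("vsr_pro_example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```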

Project Structure

VideoSubtitleRemover/
|-- VideoSubtitleRemover.py   # Main GUI application
|-- backend/
|   |-- __init__.py           # Module exports
|   |-- processor.py          # Core processing (detection + inpainting + mux)
|   |-- presets.py            # Shared preset library (GUI + CLI)
|   `-- model_hashes.py       # Vendored SHA-256 weight hashes
|-- docs/
|   `-- architecture.md       # Pipeline map for new contributors
|-- ROADMAP.md                # Shipped log + ordered backlog + research bench
|-- TODO.md                   # Active checklist (single source of truth)
|-- RESEARCH_FEATURE_PLAN.md  # Audit companion (historical analysis)
|-- setup.py                  # First-time environment setup
|-- Run_VSR_Pro.bat           # Windows launcher
|-- Run_VSR_Pro_Debug.bat     # Windows launcher with a visible console
|-- build_exe.bat             # PyInstaller build script
|-- requirements.txt          # Python dependencies
|-- tests/                    # Focused regression coverage for hardened paths
|-- .github/workflows/
|   `-- build.yml             # CI/CD release workflow + pip-audit
|-- assets/                   # Application assets
|-- models/                   # AI model weights (auto-downloaded)
`-- output/                   # Default output location

See docs/architecture.md for a walkthrough of the detect -> tracker -> mask -> TBE -> refine -> mux pipeline and the "add a new feature" checklist.

Credits

License

This project is licensed under the MIT License.


Video Subtitle Remover Pro -- Built by SysAdminDoc

Report Bug | Request Feature
