This project focuses on image colorization of human face images, using a U-Net architecture as the primary model and exploring a PatchGAN discriminator to enhance the realism and overall visual quality of the colorized outputs.
The model is based on a U-Net architecture (see the figure below) inspired by UVCGAN [1] and Attention U-Net [2].
The training process was configured with the following settings:
```python
# Training configuration
epoch = 100
batch_size = 64

# Loss weights
alpha = 100.0
beta = 10.0
theta = 0.2

# Optimizer
optimizer = Adam(lr=0.0002, betas=(0.5, 0.999))
```

All experiments below are compared against a baseline model configured with `ndims=16`, `depth=3`, and `num_blocks=1`.
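The exact role of each loss weight is not spelled out above, so the following is a minimal sketch of how they might combine into the generator objective, assuming `alpha` scales the L1 reconstruction term and `theta` the PatchGAN adversarial term; the term weighted by `beta` is not specified here and is omitted:

```python
# Minimal sketch of a combined generator objective (PyTorch).
# Assumption: alpha weights L1 reconstruction, theta the adversarial term;
# the term weighted by beta is not specified in this write-up.
import torch
import torch.nn as nn

l1_loss = nn.L1Loss()
bce_loss = nn.BCEWithLogitsLoss()

def generator_loss(fake_ab, real_ab, disc_logits_fake, alpha=100.0, theta=0.2):
    recon = l1_loss(fake_ab, real_ab)
    # A PatchGAN discriminator emits a grid of logits, one per image patch;
    # the generator wants every patch to be judged "real" (label 1).
    adv = bce_loss(disc_logits_fake, torch.ones_like(disc_logits_fake))
    return alpha * recon + theta * adv

# Optimizer as configured above (hypothetical `generator` instance):
# opt_G = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
```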
The following results illustrate how changing the depth of the U-Net, i.e., the number of encoder/decoder levels, affects colorization quality (a minimal sketch of such a depth-configurable network appears after the results below):
| Model |
|---|
| U-Net (depth=3) |
| U-Net (depth=3) + GAN |
| U-Net (depth=4) |
| U-Net (depth=4) + GAN |
Table 1: Effect of U-Net depth and GAN on Colorization Performance
Image 1: Colorization results of U-Net variants with different depths and GAN
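For concreteness, here is a minimal sketch of a depth-configurable U-Net in PyTorch. The semantics of `ndims` and `depth` (channel width of the first level, number of down/up-sampling levels, channels doubling per level, one conv block per level as in `num_blocks=1`) are assumptions based on common U-Net conventions, not the project's actual code:

```python
# Sketch of a depth-configurable U-Net for L -> ab colorization (assumed
# conventions: ndims = first-level width, depth = number of pooling levels).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=2, ndims=16, depth=3):
        super().__init__()
        chs = [ndims * 2**i for i in range(depth + 1)]  # e.g. [16, 32, 64, 128]
        self.encoders = nn.ModuleList(
            [conv_block(in_ch if i == 0 else chs[i - 1], chs[i]) for i in range(depth)])
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(chs[depth - 1], chs[depth])
        self.ups = nn.ModuleList(
            [nn.ConvTranspose2d(chs[i + 1], chs[i], 2, stride=2) for i in reversed(range(depth))])
        self.decoders = nn.ModuleList(
            [conv_block(chs[i] * 2, chs[i]) for i in reversed(range(depth))])
        self.head = nn.Conv2d(ndims, out_ch, 1)  # predict the two ab channels

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)       # keep features for the skip connections
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([x, skip], dim=1))
        return torch.tanh(self.head(x))  # ab values scaled to [-1, 1]

# Example: the baseline configuration from the tables above.
# net = UNet(in_ch=1, out_ch=2, ndims=16, depth=3)
# out = net(torch.randn(1, 1, 128, 128))  # -> (1, 2, 128, 128)
```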
The following results demonstrate the impact of modifying the U-Net's feature width (`ndims`, the channel count of the first encoder level); a parameter-count comparison using the sketch above appears after the results below:

| Model |
|---|
| U-Net (ndims=16) |
| U-Net (ndims=16) + GAN |
| U-Net (ndims=32) |
| U-Net (ndims=32) + GAN |
Table 2: Influence of U-Net feature width and GAN integration on Image Colorization quality
Image 2: Results from U-Net models of different feature widths, with and without GAN support
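Using the sketch above, one can see what changing the width costs: doubling `ndims` roughly quadruples the parameter count (conv parameters scale with the product of input and output channels) while leaving the network topology unchanged.

```python
# Compare model sizes for the two widths used in Table 2 (illustrative,
# based on the assumed UNet sketch above).
for ndims in (16, 32):
    model = UNet(ndims=ndims, depth=3)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"ndims={ndims}: {n_params:,} parameters")
```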
Several experiments were conducted to evaluate different design choices in the colorization process. Three key observations emerged:
- Color space comparison: Predicting ab channels from L (in the CIE Lab color space) yields better colorization results than directly predicting RGB values (see the sketch after this list).
- Loss function evaluation: When comparing reconstruction losses, the L1 loss led to noticeably better results.
- Adversarial loss (PatchGAN): Incorporating a PatchGAN discriminator further improves perceptual quality, leading to lower FID scores and more realistic colorizations.
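As a sketch of the Lab pipeline from the first observation, the conversion below uses scikit-image; the normalization ranges are conventional choices, not taken from this write-up:

```python
# Sketch of the L -> ab data pipeline: the network sees only the lightness
# channel and predicts chrominance, which is recombined and converted to RGB.
# Normalization constants are conventional assumptions, not the project's.
import numpy as np
from skimage import color

def split_lab(rgb):
    """rgb: float array in [0, 1], shape (H, W, 3) -> normalized L and ab."""
    lab = color.rgb2lab(rgb)
    L = lab[..., :1] / 50.0 - 1.0   # L in [0, 100]        -> [-1, 1]
    ab = lab[..., 1:] / 110.0       # ab roughly [-110, 110] -> [-1, 1]
    return L, ab

def merge_lab(L, ab):
    """Invert the normalization and convert the prediction back to RGB."""
    lab = np.concatenate([(L + 1.0) * 50.0, ab * 110.0], axis=-1)
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)
```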
[1] Dmitrii Torbunov, Yi Huang, Haiwang Yu, Jin Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, Yihui Ren (2022). UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation. [arXiv]
[2] Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y Hammerla, Bernhard Kainz, Ben Glocker, Daniel Rueckert (2018). Attention U-Net: Learning Where to Look for the Pancreas. [arXiv]


