Hybrid medical image segmentation model combining a pretrained ResNet-50 encoder with the Full-scale Connected Transformer (FCT) decoder for binary skin lesion segmentation on the ISIC 2018 dataset.
DermSeg grafts the FCT decoder onto a pretrained ResNet-50 backbone, replacing the original hand-built FCT encoder while preserving the decoder's core components: ConvAttention, Wide-Focus modules, and deep supervision heads.
Input (224×224×3)
│
▼
ResNet-50 Backbone (ImageNet pretrained)
│ conv1_relu → skip1 (112×112×64) → 1×1 proj → (112×112×16)
│ conv2_block3_out → skip2 ( 56×56×256) → 1×1 proj → ( 56×56×32)
│ conv3_block4_out → skip3 ( 28×28×512) → 1×1 proj → ( 28×28×64)
│ conv4_block6_out → skip4 ( 14×14×1024) → 1×1 proj → ( 14×14×128)
└─ conv5_block3_out → bottleneck (7×7×2048) → 1×1 proj → (7×7×384)
│
att_block (bottleneck)
│
┌────── FCT DECODER ──────────────────────────────────────────┘
│ Block 6: 7→14 + skip4 → att_block
│ Block 7: 14→28 + skip3 → att_block → deep_head_1
│ Block 8: 28→56 + skip2 → att_block → deep_head_2
└─ Block 9: 56→112 + skip1 → att_block → final output (224×224)
Five feature maps are tapped from the ResNet-50 stages (four skip connections plus the bottleneck) and projected via 1×1 convolutions to the channel widths the decoder expects, enabling multi-scale feature fusion at spatial resolutions from 7×7 to 112×112.
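A minimal sketch of how the five taps could be pulled out of `tf.keras.applications.ResNet50` with `get_layer` and projected with 1×1 convolutions. The layer names and projected widths come from the diagram; the helper name `build_encoder`, the linear (activation-free) projections, and the dict-of-outputs layout are assumptions, not the notebook's code. The usage line passes `weights=None` only to skip the ImageNet weight download.

```python
import tensorflow as tf

# (ResNet-50 layer name, projected channels) per tap, as listed in the diagram.
TAPS = {
    "skip1": ("conv1_relu", 16),
    "skip2": ("conv2_block3_out", 32),
    "skip3": ("conv3_block4_out", 64),
    "skip4": ("conv4_block6_out", 128),
    "bottleneck": ("conv5_block3_out", 384),
}


def build_encoder(weights="imagenet", input_shape=(224, 224, 3)):
    """Hypothetical helper: returns (encoder, backbone).

    The encoder maps an image batch to a dict of projected feature maps.
    """
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape
    )
    outputs = {}
    for tap, (layer_name, channels) in TAPS.items():
        features = backbone.get_layer(layer_name).output
        # 1x1 projection to the decoder's channel width (no activation: an assumption).
        outputs[tap] = tf.keras.layers.Conv2D(
            channels, kernel_size=1, name=f"{tap}_proj"
        )(features)
    encoder = tf.keras.Model(backbone.input, outputs, name="resnet50_encoder")
    return encoder, backbone


encoder, backbone = build_encoder(weights=None)  # weights=None: no download
features = encoder(tf.zeros((1, 224, 224, 3)))
for tap, tensor in features.items():
    print(tap, tuple(tensor.shape))
```

Projecting the 1024- and 2048-channel stages down to 128 and 384 channels keeps the decoder's attention blocks at the widths the original FCT decoder was designed for.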
- Pretrained backbone — ResNet-50 with ImageNet weights; low-level feature detectors transfer directly to dermoscopy images
- FCT decoder preserved — ConvAttention (CvT-style), Wide-Focus dilated convolutions, StochasticDepth regularization
- Deep supervision — 3 output heads with weighted loss (0.14 / 0.29 / 0.57) for stable gradient flow
- Two-phase training — Phase 1 freezes backbone (decoder warm-up), Phase 2 selectively unfreezes conv4 & conv5
- Custom tf.data pipeline — ResNet-specific ImageNet preprocessing, joint image-mask augmentation, AUTOTUNE prefetching
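The Wide-Focus module widens the receptive field with parallel dilated convolutions whose outputs are summed and then fused. Below is a minimal Keras sketch of that idea. The dilation rates (1, 2, 3), the GELU activations, the dropout placement and rate, and the helper name `wide_focus` are assumptions for illustration. The notebook's exact configuration may differ.

```python
import tensorflow as tf

layers = tf.keras.layers


def wide_focus(x, filters, dropout_rate=0.1):
    """Sketch: parallel 3x3 convs at dilation 1, 2, 3, summed, then a fusing 3x3 conv."""
    branches = []
    for rate in (1, 2, 3):
        b = layers.Conv2D(filters, 3, padding="same", dilation_rate=rate)(x)
        b = layers.Activation("gelu")(b)
        b = layers.Dropout(dropout_rate)(b)
        branches.append(b)
    merged = layers.Add()(branches)  # element-wise sum of the three branches
    out = layers.Conv2D(filters, 3, padding="same")(merged)
    out = layers.Activation("gelu")(out)
    return layers.Dropout(dropout_rate)(out)


inputs = tf.keras.Input(shape=(28, 28, 64))
model = tf.keras.Model(inputs, wide_focus(inputs, filters=64))
print(model.output_shape)  # (None, 28, 28, 64): "same" padding keeps the spatial size
```

Because every branch uses `padding="same"`, the dilation only changes how far each kernel looks, not the output size. This is what allows the branches to be added directly.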
| Metric | Score |
|---|---|
| Dice | ~0.90 |
| IoU | ~0.84 |
| Precision | ~0.91 |
| Recall | ~0.92 |
Evaluated on the held-out test set (15% of ISIC 2018 Task 1).
Update the table above with exact values from `final_metrics.json` after training.
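The four reported metrics follow the standard confusion-count definitions. Below is a NumPy sketch, assuming predictions are sigmoid probabilities thresholded at 0.5 and that counts are pooled over all pixels. The notebook may average per image instead, which gives slightly different numbers. `binary_metrics` is a hypothetical helper name.

```python
import numpy as np


def binary_metrics(y_true, y_prob, threshold=0.5, eps=1e-7):
    """Dice, IoU, precision and recall from pooled pixel-level TP/FP/FN counts."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = (np.asarray(y_prob) >= threshold).astype(np.float64)
    tp = np.sum(y_true * y_pred)
    fp = np.sum((1.0 - y_true) * y_pred)
    fn = np.sum(y_true * (1.0 - y_pred))
    return {
        "dice": (2 * tp + eps) / (2 * tp + fp + fn + eps),
        "iou": (tp + eps) / (tp + fp + fn + eps),
        "precision": (tp + eps) / (tp + fp + eps),
        "recall": (tp + eps) / (tp + fn + eps),
    }


# One TP, one FP, one FN: dice = 2/4, iou = 1/3, precision = recall = 1/2.
metrics = binary_metrics([1, 1, 0, 0], [0.9, 0.2, 0.8, 0.1])
print(metrics)
```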
ISIC 2018 Challenge — Task 1: Lesion Segmentation
- 2,594 dermoscopy images with binary segmentation masks
- Split: 70% train / 15% val / 15% test (reproducible with seed 42)
- Input resolution: 224×224 (canonical ImageNet size; divisible by 32, ResNet-50's total stride, so the bottleneck is exactly 7×7)
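You can make a 70/15/15 split with seed 42 reproducible by shuffling indices with a seeded generator. The sketch below assumes NumPy's `default_rng` and integer truncation for the split sizes. The notebook may use a different shuffling routine, so exact set membership and counts may differ. `split_indices` is a hypothetical helper.

```python
import numpy as np


def split_indices(n, seed=42, train_frac=0.70, val_frac=0.15):
    """Shuffle 0..n-1 with a fixed seed and cut into train/val/test index arrays."""
    perm = np.random.default_rng(seed).permutation(n)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    # Whatever remains after train and val (including rounding leftovers) goes to test.
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(2594)
print(len(train_idx), len(val_idx), len(test_idx))  # 1815 389 390
```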
Download from Kaggle and set paths in the config section of the notebook.
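The custom tf.data pipeline combines ResNet preprocessing, joint augmentation and prefetching. The sketch below shows one way these fit together. It assumes images and masks are already decoded in memory as float32 arrays (pixels in [0, 255], masks in {0, 1}); the notebook reads them from the configured paths instead. Random flips are the only augmentation shown. The names `augment`, `preprocess`, and `make_dataset` are hypothetical.

```python
import numpy as np
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE


def augment(image, mask):
    """Flip image and mask together so they stay pixel-aligned (both float32)."""
    stacked = tf.concat([image, mask], axis=-1)  # (H, W, 3 + 1)
    stacked = tf.image.random_flip_left_right(stacked)
    stacked = tf.image.random_flip_up_down(stacked)
    return stacked[..., :3], stacked[..., 3:]


def preprocess(image, mask):
    # ResNet-50 "caffe"-style preprocessing: RGB -> BGR, then ImageNet mean
    # subtraction. Applied to the image only; the mask stays binary.
    return tf.keras.applications.resnet50.preprocess_input(image), mask


def make_dataset(images, masks, batch_size=8, training=False, seed=42):
    ds = tf.data.Dataset.from_tensor_slices((images, masks))
    if training:
        ds = ds.shuffle(len(images), seed=seed)
        ds = ds.map(augment, num_parallel_calls=AUTOTUNE)
    ds = ds.map(preprocess, num_parallel_calls=AUTOTUNE)
    return ds.batch(batch_size).prefetch(AUTOTUNE)


images = np.random.uniform(0, 255, size=(4, 224, 224, 3)).astype("float32")
masks = (np.random.uniform(size=(4, 224, 224, 1)) > 0.5).astype("float32")
x, y = next(iter(make_dataset(images, masks, batch_size=2, training=True)))
print(x.shape, y.shape)
```

Concatenating the image and mask before flipping guarantees that both receive the same random transform. Applying `preprocess_input` after augmentation keeps the augmentation code working on raw pixel values.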
Phase 1: decoder warm-up
- Backbone frozen entirely
- Learning rate: `1e-3` (Adam)
- Trains only FCT decoder weights

Phase 2: selective fine-tuning
- conv1–conv3 remain frozen (preserve low-level edge/colour detectors)
- conv4 & conv5 unfrozen
- Learning rate: `1e-4` (Adam) with `ReduceLROnPlateau`
- `EarlyStopping` with patience 12, restores best weights
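The freezing schedule relies on the `conv4_*` / `conv5_*` prefixes that Keras gives ResNet-50's stage layers. The following is a minimal sketch under those naming conventions. The helper names `configure_backbone` and `phase_callbacks`, the `monitor="val_loss"` choice, and the `ReduceLROnPlateau` factor and patience are assumptions; the README specifies only EarlyStopping's patience of 12.

```python
import tensorflow as tf


def configure_backbone(backbone, phase):
    """Phase 1: freeze all of ResNet-50. Phase 2: unfreeze only conv4_* and conv5_* layers."""
    if phase == 1:
        backbone.trainable = False
        return
    backbone.trainable = True  # re-enable everything, then set per-layer flags
    for layer in backbone.layers:
        layer.trainable = layer.name.startswith(("conv4_", "conv5_"))


def phase_callbacks(checkpoint_path):
    # factor/patience for ReduceLROnPlateau are hypothetical values.
    return [
        tf.keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_loss",
            save_best_only=True, save_weights_only=True,
        ),
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=12, restore_best_weights=True
        ),
    ]


backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_shape=(224, 224, 3)
)
configure_backbone(backbone, phase=1)
print(len(backbone.trainable_weights))  # 0: nothing in the backbone trains
configure_backbone(backbone, phase=2)
print(backbone.get_layer("conv3_block4_out").trainable)  # False
print(backbone.get_layer("conv5_block3_2_conv").trainable)  # True
callbacks = phase_callbacks("fct_resnet50_isic/phase2_best.weights.h5")
# After changing trainable flags, recompile the full model before fitting, e.g.
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss=...)
```

This README does not say whether BatchNorm layers inside conv4/conv5 are unfrozen along with the convolutions. This sketch unfreezes them, because their names share the same prefixes.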
Total Loss = 0.14 × L(head1) + 0.29 × L(head2) + 0.57 × L(final), where each head's loss is L = BCE + Dice loss.
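The head weights 0.14 / 0.29 / 0.57 are approximately 1/7, 2/7 and 4/7. Each later, higher-resolution head counts about twice as much as the previous one, and the weights sum to 1. Below is a NumPy sketch of the combined objective. It assumes all three heads have already been resized to the mask resolution and output sigmoid probabilities. The clipping epsilon and Dice smoothing constant are illustrative choices. A training implementation would express the same arithmetic with TensorFlow ops.

```python
import numpy as np

HEAD_WEIGHTS = (0.14, 0.29, 0.57)  # (head1, head2, final)


def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions clipped away from 0 and 1."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))


def dice_loss(y_true, y_pred, smooth=1.0):
    """Soft Dice loss: 1 - (2|A∩B| + s) / (|A| + |B| + s)."""
    intersection = np.sum(y_true * y_pred)
    return float(1 - (2 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth))


def seg_loss(y_true, y_pred):
    return bce(y_true, y_pred) + dice_loss(y_true, y_pred)


def total_loss(y_true, head_preds):
    """Weighted sum over (head1, head2, final) predictions."""
    return sum(w * seg_loss(y_true, p) for w, p in zip(HEAD_WEIGHTS, head_preds))


mask = (np.random.default_rng(0).uniform(size=(8, 8)) > 0.5).astype(np.float64)
noisy = np.clip(mask * 0.8 + 0.1, 0.0, 1.0)
print(total_loss(mask, [noisy, noisy, mask]))
```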
`pip install tensorflow numpy matplotlib tqdm`

| Package | Version |
|---|---|
| TensorFlow | ≥ 2.10 |
| NumPy | ≥ 1.23 |
| Matplotlib | ≥ 3.6 |
| tqdm | any |
- Clone the repo and set up dataset paths in Section 3 (Configuration) of the notebook
- Run all cells sequentially — Phase 1 and Phase 2 training execute automatically
- Best weights are saved to `fct_resnet50_isic/` after each phase
- Final metrics are exported to `final_metrics.json`; predictions are visualized in Section 13
DermSeg/
├── fct-resnet50-isic.ipynb # Main training notebook
├── fct_resnet50_isic/ (will upload soon)
│ ├── phase1_best.weights.h5
│ ├── phase2_best.weights.h5
│ ├── fct_resnet50_final.weights.h5
│ ├── fct_resnet50_saved_model.keras
│ ├── final_metrics.json
│ ├── training_curves.png
│ └── tb_logs_p2/
└── README.md