# Week 7: Training Large Language Models (ZeRO, Data Parallelism)

## 📌 Briefly

- Training on a single GPU (limitations, bottlenecks)
- Mixed-precision training: FP32, BF16, FP16, FP8
- Data parallelism and All-Reduce
- ZeRO optimization stages
- Fully Sharded Data Parallel (FSDP)
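In data parallelism, each worker computes gradients on its own data shard, and an All-Reduce averages them so every worker applies the same update. A minimal pure-Python sketch of that averaging effect (illustrative only; real systems use collectives such as NCCL ring All-Reduce, and `all_reduce_mean` is a hypothetical name):

```python
def all_reduce_mean(grads_per_worker):
    """Average corresponding gradients across workers (the net effect of All-Reduce)."""
    n_workers = len(grads_per_worker)
    n_params = len(grads_per_worker[0])
    averaged = [
        sum(worker[i] for worker in grads_per_worker) / n_workers
        for i in range(n_params)
    ]
    # After All-Reduce, every worker holds the same averaged gradient
    return [list(averaged) for _ in range(n_workers)]

# Two workers, each with local gradients for three parameters
grads = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(all_reduce_mean(grads))  # each worker gets [2.0, 3.0, 4.0]
```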
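The ZeRO stages progressively shard the optimizer states (stage 1), gradients (stage 2), and parameters (stage 3) across data-parallel workers. A rough back-of-the-envelope sketch of per-GPU memory, assuming the common mixed-precision Adam baseline of 16 bytes per parameter (2 for FP16 params, 2 for FP16 grads, 12 for FP32 master params plus Adam momentum and variance); this is an illustration of the sharding arithmetic, not the exact DeepSpeed formula:

```python
def zero_memory_per_gpu(n_params, n_gpus, stage):
    """Approximate per-GPU model-state memory (in GiB) under ZeRO stages 0-3."""
    params = 2 * n_params   # FP16 parameters
    grads = 2 * n_params    # FP16 gradients
    optim = 12 * n_params   # FP32 master params + Adam momentum + variance
    if stage >= 1:          # ZeRO-1: shard optimizer states
        optim /= n_gpus
    if stage >= 2:          # ZeRO-2: also shard gradients
        grads /= n_gpus
    if stage >= 3:          # ZeRO-3: also shard parameters
        params /= n_gpus
    return (params + grads + optim) / 2**30

# Example: a 7B-parameter model on 64 GPUs
for stage in range(4):
    print(f"ZeRO-{stage}: {zero_memory_per_gpu(7_000_000_000, 64, stage):.1f} GiB per GPU")
```

Stage 3 divides all three terms by the number of GPUs, which is the same sharding idea PyTorch FSDP implements.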


## 📚 Additional Materials