Class Imbalance
Problem
The training dataset (2,188 images, 9,423 annotations) has significant class imbalance:
| Class | Count | Ratio vs largest |
|---|---|---|
| Solda | 2,536 | 1.0x |
| IV-2 | 2,224 | 0.88x |
| IV-1B | 1,920 | 0.76x |
| IV-4 | 1,108 | 0.44x |
| IV-1A | 774 | 0.31x |
| IV-3 | 393 | 0.15x |
| IV-6 | 250 | 0.10x |
| IV-5 | 218 | 0.09x |
The top 3 classes hold 71% of all annotations. Solda has ~12x more instances than IV-5.
Why current augmentation does not help
The augmentations in hyperparameters.yaml (mosaic, mixup, copy_paste, flips, rotation, HSV shifts, etc.) apply uniformly to all images regardless of class content. They increase visual variety but preserve the original class distribution — an underrepresented class remains underrepresented.
Solution: Weighted Sampling (Experiment 5)
Implemented via scripts/weighted_sampler.py. Monkey-patches the Ultralytics dataloader to sample images with probability inversely proportional to their class frequency using WeightedRandomSampler.
- Each image's weight = max inverse-frequency of any class it contains
- Images with rare classes (IV-5, IV-6, IV-3) are sampled ~10x more often
- Total epoch length stays the same (2,188 draws), only the mix changes
- Validation is unaffected — mAP reflects true performance
Experiment 5 trains all model sizes — nano, small, medium, large — (both architectures, both approaches) with balanced sampling and compares per-class mAP against unbalanced baselines from Experiment 1.
Other approaches considered but not used (yet)
- Focal Loss (
fl_gamma) — Would confound the weighted sampling experiment. Discussed as future work. - Image duplication — Co-occurrence problem (duplicating for IV-5 also inflates Solda), increases disk/epoch time.
- Targeted augmentation — AugmenTory library doesn't support rebalancing; Albumentations doesn't handle YOLO polygons natively.
- Undersampling — Loses data from an already small dataset.
Notes
- For the benchmark (comparing architectures/formats), the imbalance affects all models equally, so comparisons remain fair
- The imbalance should be documented with the results as a dataset limitation
- Per-class mAP from the benchmark results will quantify the actual impact