Comparative Evaluation of Self-Supervised Pretraining Strategies for Few-Shot Medical Image Analysis
DOI: https://doi.org/10.69987/AIMLR.2026.70202

Keywords: Self-supervised learning, few-shot learning, medical image analysis, transfer learning

Abstract
Self-supervised learning has emerged as a promising solution to the chronic scarcity of labeled medical imaging data. This study presents a comprehensive evaluation of mainstream self-supervised pretraining strategies, including contrastive learning methods (CLIP, DINO) and masked image modeling approaches (MAE), focusing specifically on their effectiveness in few-shot medical image analysis scenarios. We systematically assess the feature representation quality and downstream task performance of these methods across multiple medical imaging modalities, including chest X-rays, CT scans, and MRI sequences. Our experimental framework evaluates these strategies under various data-scarce conditions (5-shot, 10-shot, and 50-shot settings) using standardized benchmark datasets. Linear probing experiments reveal that masked autoencoder-based methods achieve superior feature discriminability, reaching 87.3% accuracy compared with 84.1% for contrastive approaches. However, contrastive methods demonstrate stronger cross-domain transfer capabilities, maintaining 81.2% average performance when adapted to unseen anatomical regions versus 76.8% for reconstruction-based methods. Our quantitative analysis further indicates that hybrid pretraining strategies combining both paradigms yield optimal results in extremely low-data regimes, achieving 89.6% classification accuracy with only 10 labeled samples per class. These findings provide evidence-based guidance for selecting appropriate self-supervised pretraining strategies based on specific clinical deployment scenarios, data availability constraints, and computational resource limitations.
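
To make the evaluation protocol described above concrete, the sketch below shows one way a k-shot linear probe over frozen pretrained features could be set up. The helper name `k_shot_linear_probe`, the precomputed `features`/`labels` arrays, and the use of a scikit-learn logistic-regression classifier are illustrative assumptions for this sketch, not the authors' actual pipeline.

```python
# Illustrative sketch (not the paper's code): a k-shot linear probe over
# frozen self-supervised features. `features` and `labels` are assumed to be
# precomputed embeddings and class labels from a pretrained encoder
# (e.g., MAE, DINO, or CLIP) on a medical imaging dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def k_shot_linear_probe(features, labels, k=10, seed=0):
    """Fit a linear classifier on k labeled samples per class; test on the rest."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels):
        class_idx = np.flatnonzero(labels == c)
        train_idx.extend(rng.choice(class_idx, size=k, replace=False))
    train_idx = np.array(train_idx)
    test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(features[train_idx], labels[train_idx])
    preds = clf.predict(features[test_idx])
    return accuracy_score(labels[test_idx], preds)

# Hypothetical usage: compare pretraining strategies on the same 10-shot split.
# for name, feats in {"mae": mae_feats, "dino": dino_feats}.items():
#     print(name, k_shot_linear_probe(feats, labels, k=10, seed=0))
```

Keeping the encoder frozen and varying only the k labeled samples per class isolates the quality of the pretrained representation, which is the quantity the 5-shot, 10-shot, and 50-shot comparisons in the abstract are meant to measure.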

