Cultural Bias Mitigation in Vision-Language Models for Digital Heritage Documentation: A Comparative Analysis of Debiasing Techniques
DOI: https://doi.org/10.69987/AIMLR.2024.50303
Keywords: Vision-language models, cultural bias mitigation, digital heritage documentation, cross-modal adapters
Abstract
Although vision-language models have demonstrated remarkable capabilities in digital heritage documentation, they exhibit persistent cultural biases that compromise the equitable representation of diverse cultural traditions. This study presents a systematic comparative analysis of debiasing techniques for vision-language models in heritage documentation contexts, grouping approaches into data-level interventions, model-level modifications, and post-processing methods. We introduce Heritage-Bias, a specialized dataset of 18,750 digitized artifacts from 15 cultural traditions with controlled variation in artifact attributes and contextual descriptions. Quantitative evaluation across multiple bias dimensions shows that cross-modal adapter approaches best preserve cultural nuance while reducing bias (a 47.2% bias reduction with a cultural attribute preservation score of 0.87). Combined interventions that integrate counterfactual data generation with cross-modal adapters yield the largest improvement (a 53.8% overall bias reduction). Geo-cultural bias proves more resistant to mitigation than gender or skin-tone bias and requires specialized interventions that incorporate domain expertise. Implementation analysis shows that effectiveness depends on context, with balanced dataset construction and output calibration serving as effective initial interventions for resource-constrained heritage institutions. Our findings establish a methodological framework for evaluating and addressing cultural bias in computational heritage documentation, promoting more equitable representation of global cultural heritage in digital preservation efforts.
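The abstract does not specify how bias scores are computed or how the balanced dataset is built. As a minimal sketch, assuming that (a) balancing means downsampling every cultural tradition to the size of the smallest one, and (b) a reported "bias reduction" is the relative drop of a single non-negative bias score from a baseline model to a debiased one, the two ideas could look like the following. The function names and the `tradition` field are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")


def balance_by_group(records: List[T], group_of: Callable[[T], Hashable],
                     seed: int = 0) -> List[T]:
    """Hypothetical balancing step: downsample each group (e.g. each cultural
    tradition) to the size of the smallest group, using a seeded RNG."""
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for record in records:
        groups[group_of(record)].append(record)
    if not groups:
        return []
    target = min(len(members) for members in groups.values())
    rng = random.Random(seed)
    balanced: List[T] = []
    # Iterate groups in a stable order so results are reproducible per seed.
    for key in sorted(groups, key=str):
        balanced.extend(rng.sample(groups[key], target))
    return balanced


def relative_bias_reduction(baseline: float, debiased: float) -> float:
    """Assumed metric: fraction by which a non-negative bias score drops
    relative to the baseline model (0.472 would be reported as 47.2%)."""
    if baseline <= 0:
        raise ValueError("baseline bias score must be positive")
    return (baseline - debiased) / baseline


if __name__ == "__main__":
    records = ([{"id": i, "tradition": "A"} for i in range(10)]
               + [{"id": i, "tradition": "B"} for i in range(3)])
    subset = balance_by_group(records, lambda r: r["tradition"], seed=1)
    print(len(subset))  # 3 records from each tradition
    print(relative_bias_reduction(0.5, 0.264))
```

Downsampling is the simplest balancing choice and discards data from over-represented traditions; oversampling or reweighting the smaller groups are common alternatives when the collection is small.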