LLM-as-Reranker for Personalized Recommendation: Popularity Bias Mitigation and Faithful Natural-Language Explanations on MovieLens 100K

Xiaohan Chang; Tong Ye; Sophia Luo

doi:10.69987/JACS.2023.30806

Authors

Xiaohan Chang, Computer Science, University of Connecticut, CT, USA
Tong Ye, Computer Science, Northeastern University, CA, USA
Sophia Luo, Computer Science, USC, CA, USA

Keywords:

Personalized recommendation, LLM-as-reranker, MovieLens 100K, popularity bias, long-tail recommendation, explainable recommendation, faithful natural-language explanations, matrix factorization, collaborative filtering

Abstract

This paper reports a reproducible experimental study of a local LLM-as-reranker design for personalized movie recommendation on MovieLens 100K. The study uses the official u1-u5 five-fold splits, treats ratings of four or five stars as relevant targets for top-k evaluation, and compares seven recommenders: popularity ranking, genre content ranking, user k-nearest-neighbor collaborative filtering, item k-nearest-neighbor collaborative filtering, biased matrix factorization, a strong hybrid MF-popularity candidate generator, and the proposed LLMR-FaithfulTail reranker. The proposed reranker converts each user's positive history into a natural-language evidence profile, scores the top-70 hybrid candidates using five components (base utility, genre compatibility, prompt evidence, quality, and a personalized long-tail component), and generates explanations only from those same score components. Across the five folds, LLMR-FaithfulTail achieved NDCG@10 of 0.214 ± 0.044 and Recall@10 of 0.122 ± 0.007. Relative to pure popularity ranking, it improved NDCG@10 while reducing average recommendation popularity from 257.0 ± 5.2 to 221.4 ± 4.8 and increasing catalog coverage from 0.035 ± 0.004 to 0.110 ± 0.015. An explanation audit over 5,000 generated explanations found 1.000 evidence precision, 1.000 score-component coverage, and zero unsupported claims, because every sentence is grounded in item genres, user-history genres, and the stored reranking component table. The results indicate that an auditable LLM-style reranking interface can reduce head-item exposure without replacing conventional collaborative recommenders, although the small and dated MovieLens 100K catalog limits claims about modern large-scale deployment.
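
The four quantities reported above (NDCG@10, Recall@10, average recommendation popularity, and catalog coverage) can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes binary relevance (an item is relevant if the held-out rating is four or five stars), takes popularity to be the number of training ratings an item received, and takes coverage to be the fraction of the catalog that appears in at least one user's top-k list. All function and variable names are hypothetical.

```python
import math
from typing import Dict, Sequence, Set


def ndcg_at_k(ranked: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Binary-relevance NDCG@k: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank 1 gets log2(2) = 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Share of the user's relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def catalog_coverage(
    rankings: Dict[int, Sequence[int]], n_items: int, k: int = 10
) -> float:
    """Fraction of the catalog recommended to at least one user (assumed definition)."""
    recommended = {item for ranked in rankings.values() for item in ranked[:k]}
    return len(recommended) / n_items


def average_popularity(
    rankings: Dict[int, Sequence[int]], train_counts: Dict[int, int], k: int = 10
) -> float:
    """Mean training-set rating count over all recommended slots (assumed definition)."""
    pops = [
        train_counts.get(item, 0)
        for ranked in rankings.values()
        for item in ranked[:k]
    ]
    return sum(pops) / len(pops) if pops else 0.0


# Toy example: two users, an 8-item catalog, top-3 lists.
rankings = {10: [1, 2, 3], 11: [1, 2, 4]}
train_counts = {1: 10, 2: 4, 3: 2, 4: 0}
user_ndcg = ndcg_at_k(rankings[10], {1, 3}, k=3)
user_recall = recall_at_k(rankings[10], {1, 3}, k=3)
coverage = catalog_coverage(rankings, n_items=8, k=3)
avg_pop = average_popularity(rankings, train_counts, k=3)
```

Under these assumed definitions, a reranker that swaps head items for long-tail items lowers `average_popularity` and raises `catalog_coverage`, which is the direction of change the abstract reports relative to popularity ranking.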