An Empirical Comparison of Generation Quality and Diversity Between Discrete Diffusion and Autoregressive Text Generation
DOI: https://doi.org/10.69987/AIMLR.2025.60202
Keywords: discrete diffusion, autoregressive generation, text quality, generation diversity
Abstract
Autoregressive language models have long dominated text generation, yet their left-to-right factorization introduces well-documented limitations in diversity and controllability. Recent discrete diffusion methods, grounded in stochastic differential equation theory adapted to categorical state spaces, have emerged as a promising non-autoregressive alternative. This paper presents a systematic empirical comparison between discrete diffusion approaches and autoregressive baselines of comparable scale, focusing on two quantifiable dimensions: generation quality and output diversity. Drawing on published experimental results for representative methods, including SEDD, MDLM, Discrete Flow Matching, and GPT-2 variants, on standard benchmarks such as OpenWebText, Text8, WikiText-103, and LM1B, this study consolidates scattered findings within a unified analytical framework. The comparison employs multiple complementary metrics: token-level negative log-likelihood, generative perplexity, MAUVE scores, distinct n-gram ratios, and entropy measures. Results indicate that state-of-the-art discrete diffusion methods have narrowed the likelihood gap with autoregressive models to within 10–25% at comparable parameter counts, while exhibiting measurable advantages in lexical diversity and distributional coverage. The quality–diversity trade-off frontier differs structurally between the two paradigms, with discrete diffusion methods reaching favorable operating points without requiring temperature tuning. These findings clarify the current standing of discrete diffusion relative to autoregressive generation and identify the specific evaluation dimensions on which each paradigm holds an advantage.

