[CS.AI] The Truth Behind Diversity in Large Language Model Outputs

Recent advances in large language models (LLMs) have enabled the generation of high-quality prose, yet whether these models can generate diverse outputs remains contested. This work investigates the diversity of LLM-generated stories through the lens of narrative similarity. Using a contrastive framework and a dataset of human-written stories and prompts from r/WritingPrompts, we collect narrative similarity judgments across 10 representative LLMs, drawing on both human evaluations and three automatic annotation methods.

Our findings reveal a consistent trend: LLM-generated narratives are more similar to each other than human-written stories are to each other. We demonstrate that frontier models in particular converge on a "mean" generic narrative that approximates individual human stories but lacks the collective diversity of human authors. Finally, we show that common mitigation strategies, including negative prompting and temperature scaling, fail to meaningfully reduce this homogeneity.
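To make "more similar to each other" concrete, one simple way to express homogeneity is the mean pairwise similarity within a set of stories: the higher the score, the less diverse the set. The sketch below is an illustration under stated assumptions, not the study's method. It assumes each story has already been mapped to a numeric vector and uses cosine similarity as a stand-in for the human and automatic similarity judgments the study actually collects. The vectors are made-up toy values, not data from the paper.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs in a set.

    Higher values mean the set is more homogeneous (less diverse).
    """
    if len(vectors) < 2:
        raise ValueError("need at least two vectors to form a pair")
    pairs = list(itertools.combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical 2-D "story vectors" for illustration only (not study data).
# Human stories point in many different directions...
human_stories = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.2], [0.3, -1.0]]
# ...while the LLM stories cluster around one direction.
llm_stories = [[0.9, 0.4], [1.0, 0.5], [0.8, 0.45]]

human_sim = mean_pairwise_similarity(human_stories)
llm_sim = mean_pairwise_similarity(llm_stories)
print(f"human: {human_sim:.3f}  llm: {llm_sim:.3f}")
```

With these toy vectors the clustered "LLM" set scores higher than the spread-out "human" set, which mirrors the direction of the reported finding. In practice the vectors would come from some text encoder, and the choice of similarity measure matters; the study's own judgments are pairwise comparisons by humans and annotators rather than a single embedding metric.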

Blogger's Review: This study highlights the limits of large language models when it comes to creativity. While they generate fluent text, they fall short on narrative diversity. It is a reminder to critically assess how original and varied their outputs really are when using these models.
