How Prompt Specificity Affects Edge Case Handling in LLM-Generated Code: An Empirical Evaluation

Minhao Li; Fanyi Zhao; Tianxing Tang

doi:10.69987/AIMLR.2024.50411

Authors

Minhao Li, Master of Science in Computer Engineering, University of California, Davis, CA, USA
Fanyi Zhao, Computer Science, Stevens Institute of Technology, NJ, USA
Tianxing Tang, Translation and Localization Management, Middlebury Institute of International Studies, CA, USA

Keywords:

large language models, code generation, prompt specificity, edge case evaluation

Abstract

Large language models have demonstrated strong performance on code generation benchmarks, yet standard evaluations may overestimate their robustness because their test suites are too small to exercise edge cases. This study investigates how prompt specificity influences the ability of LLMs to generate code that correctly handles edge cases. We define four incremental specificity levels, ranging from minimal function signatures to prompts containing explicit edge case hints, and evaluate four LLMs (GPT-4o, Claude 3.5 Sonnet, DeepSeek-Coder-V2, and Qwen2.5-Coder-32B) on the HumanEval+ benchmark, which augments the 164 HumanEval problems with approximately 80 times more test cases targeting boundary conditions. We introduce the Edge Pass Rate (EPR) metric to isolate edge case handling from general functional correctness. Our results show that increasing prompt specificity from minimal to edge-explicit yields a mean EPR improvement of 15.9 percentage points across all models, roughly 1.8 times the corresponding pass@1 gain of 8.9 points. The boundary value and negative number categories benefit most, while type coercion edge cases remain resistant. Weaker models exhibit greater sensitivity to prompt specificity, suggesting that prompt investment yields disproportionate returns when computational resources constrain the choice of LLM.
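
The abstract names the Edge Pass Rate but does not give its formal definition, so the following is only a minimal sketch under a stated assumption: that EPR is the share of edge-case-tagged tests a generated solution passes, computed separately from the overall pass rate. The `CaseResult` type, the `is_edge_case` label, and both helper functions are hypothetical names introduced for illustration and are not taken from the paper. The last lines check the arithmetic behind the reported "roughly 1.8 times" ratio.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CaseResult:
    """Outcome of one test case (hypothetical structure, not from the paper)."""
    is_edge_case: bool  # assumed label: does the test target a boundary condition?
    passed: bool


def overall_pass_rate(results: Iterable[CaseResult]) -> Optional[float]:
    """Fraction of all tests passed, or None if there are no tests."""
    results = list(results)
    if not results:
        return None
    return sum(r.passed for r in results) / len(results)


def edge_pass_rate(results: Iterable[CaseResult]) -> Optional[float]:
    """Assumed EPR: fraction of edge-case tests passed, or None if there are none."""
    edge = [r for r in results if r.is_edge_case]
    if not edge:
        return None
    return sum(r.passed for r in edge) / len(edge)


# Toy data: a solution that passes both ordinary tests but only one of two edge tests.
sample = [
    CaseResult(is_edge_case=False, passed=True),
    CaseResult(is_edge_case=False, passed=True),
    CaseResult(is_edge_case=True, passed=True),
    CaseResult(is_edge_case=True, passed=False),
]
print(overall_pass_rate(sample), edge_pass_rate(sample))

# Ratio of the gains reported in the abstract (percentage points).
epr_gain, pass_at_1_gain = 15.9, 8.9
gain_ratio = epr_gain / pass_at_1_gain
print(round(gain_ratio, 1))
```

On the toy data the overall pass rate is higher than the edge-only rate, which is the kind of gap a separate edge-focused metric is meant to expose.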
