Tactical Language + AI Tutoring from Structured Volleyball Rally Logs: Reproducible Experiments on NCAA Play-by-Play

Jubin Zhang

doi:10.69987/JACS.2024.40105

Authors

Jubin Zhang, Department of Physical Education, North China Institute of Aerospace Engineering, Langfang 065000, China

Keywords:

Volleyball analytics, rally representation, tactical tutoring, explainable NLP, logistic regression, play-by-play data, natural language generation

Abstract

This paper presents a fully reproducible tactical tutoring pipeline for indoor volleyball built on structured, rally-level representations. Motivated by recent volleyball rally language resources such as VREN and by growing interest in natural language feedback systems, we study how to convert play-by-play logs into compact token sequences and then train models that predict (A) the rally outcome (win/loss from the serving team’s perspective) together with the winning/losing reason category, and (B) simplified set-type and attack-type labels. We evaluate on the NCAA men’s Division I 2020 play-by-play logs released with ncaavolleyballr. Using a match-disjoint 70/10/20 split, logistic regression with unigram–bigram features reaches 0.989 Accuracy and 0.988 Macro-F1 on outcome prediction, and 0.9998 Accuracy and 0.9995 Macro-F1 on reason prediction. On Task B, the same approach achieves 0.993 Accuracy / 0.993 Macro-F1 for set-type prediction and 0.9996 Accuracy / 0.997 Macro-F1 for attack-type prediction. To support coaching-oriented interpretation, we extract token and token-pair evidence from the learned coefficients and convert it into concise instructional explanations using a deterministic tutor generator; a rule-consistency audit of these explanations shows 100% agreement on sampled cases. Our artifacts include rally segmentation, tokenization, experimental settings, figures, and tables, making the study directly replicable.
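
As a rough illustration of two steps named in the abstract, the sketch below builds unigram and bigram count features from a rally token sequence and assigns whole matches to 70/10/20 partitions so that no match contributes rallies to more than one partition. It is a minimal sketch under stated assumptions, not the paper's implementation: the token names (`SERVE`, `RECEPTION`, and so on), the match identifiers, the partition names `train`/`val`/`test`, and the helper functions `ngram_features` and `match_disjoint_split` are all hypothetical.

```python
import random
from collections import Counter


def ngram_features(tokens):
    """Count unigrams and adjacent-token bigrams in one rally token sequence."""
    feats = Counter(tokens)
    # Bigram keys join two neighbouring tokens; the "__" separator is arbitrary.
    feats.update(f"{a}__{b}" for a, b in zip(tokens, tokens[1:]))
    return feats


def match_disjoint_split(match_ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole matches to partitions so that no match spans two of them."""
    unique = sorted(set(match_ids))
    random.Random(seed).shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    return {
        "train": set(unique[:n_train]),
        "val": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),  # remainder, about 20%
    }


# Hypothetical data: 30 rallies spread over 10 matches, all with toy tokens.
rallies = [
    (f"m{i % 10}", ["SERVE", "RECEPTION", "SET", "ATTACK", "KILL"])
    for i in range(30)
]

splits = match_disjoint_split([match_id for match_id, _ in rallies])
train_features = [
    ngram_features(tokens)
    for match_id, tokens in rallies
    if match_id in splits["train"]
]
```

Splitting by match rather than by rally keeps rallies from the same match out of both training and evaluation data. The resulting count dictionaries could then be vectorized and passed to any logistic regression implementation.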