From Tell-to-Design to Healthcare Test-Fit Constraint Checking: An LLM-Compatible Requirements-to-Constraints Framework

Justin Sun; Xiaoming Xiao; Huichao Dong

doi:10.69987/JACS.2023.31105

Authors

Justin Sun, Applied Analytics, Columbia University, NY, USA
Xiaoming Xiao, College of Civil Engineering, Hunan University, Changsha 410082, China
Huichao Dong, Department of Architecture, University of Pennsylvania, Philadelphia, PA, USA

Keywords:

healthcare architecture, clinical user group, early-stage test-fit, constraint checking, behavioral health crisis unit, emergency department planning, natural language processing, large language models, room data sheets, adjacency constraints

Abstract

Language-guided floor-plan generation is attractive for early design, but hospital and behavioral health planning should not be treated as direct automatic plan generation. Early healthcare test-fits depend on jurisdictional review, owner standards, infection prevention, behavioral health safety, staff workflow, and clinical risk judgments that exceed a simple room-list prediction task. This paper reframes the problem as requirements-to-constraints translation followed by explicit constraint checking for early-stage test-fit design. We compiled HPEV-401, a source-grounded corpus of 401 emergency and behavioral health planning statements from four public healthcare planning sources: the International Health Facility Guidelines Emergency Unit functional planning unit, a room-data-sheet compendium, the U.S. Department of Veterans Affairs Emergency Department space planning criteria, and the Facility Guidelines Institute behavioral health crisis unit white paper. Each record preserves source and page metadata and is mapped to room-program, adjacency, and risk labels. Three translators were evaluated: a keyword baseline, a TF-IDF one-vs-rest logistic regression model, and a hybrid structured translator that combines lexical evidence, supervised multi-label prediction, and schema-level normalization. A graph-based checker then evaluated 324 candidate test-fit graphs created from the held-out split under four perturbation levels. The hybrid translator achieved held-out micro-F1 of 0.874 for room-program extraction, 0.744 for adjacency extraction, and 0.762 for risk extraction. With gold constraints, the checker detected all injected violations at every nonzero perturbation severity. With hybrid-predicted constraints, the checker reached 0.956 recall at the highest perturbation severity. The results support a cautious design-assistance position: language models can structure clinical intent and route it to externally maintained profiles, while deterministic checkers expose conflicts for professional review rather than asserting code compliance.
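To make the checking step concrete, the following is a minimal sketch of a deterministic adjacency check over a candidate test-fit graph. It is an illustration under stated assumptions, not the checker evaluated in the paper: the `AdjacencyConstraint` type, the `check_adjacency` function, the two relation names, and the room labels are all hypothetical, and the sketch covers only adjacency constraints, not room-program or risk labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdjacencyConstraint:
    """Hypothetical constraint record; the paper's schema may differ."""
    room_a: str
    room_b: str
    relation: str  # assumed vocabulary: "adjacent" or "separated"


def check_adjacency(edges, constraints):
    """Return the constraints that a candidate test-fit graph violates.

    `edges` is an iterable of (room, room) pairs treated as undirected.
    """
    # Store each edge as a frozenset so (a, b) and (b, a) compare equal.
    adjacent = {frozenset(edge) for edge in edges}
    violations = []
    for c in constraints:
        touching = frozenset((c.room_a, c.room_b)) in adjacent
        if c.relation == "adjacent" and not touching:
            violations.append(c)  # required adjacency is missing
        elif c.relation == "separated" and touching:
            violations.append(c)  # prohibited adjacency is present
    return violations


# Illustrative room names only; they are not taken from the HPEV-401 corpus.
constraints = [
    AdjacencyConstraint("triage", "waiting", "adjacent"),
    AdjacencyConstraint("resuscitation", "ambulance_entry", "adjacent"),
    AdjacencyConstraint("seclusion", "waiting", "separated"),
]
candidate_edges = [("waiting", "triage"), ("seclusion", "waiting")]

violations = check_adjacency(candidate_edges, constraints)
for v in violations:
    print(f"flag for review: {v.room_a} / {v.room_b} ({v.relation})")
```

In this sketch the checker only reports conflicts; it does not repair the layout or assert compliance, which matches the review-oriented role the abstract assigns to deterministic checking.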