Spatial RAG for Urban Crash Hotspot Discovery and Safety Countermeasure Recommendation
DOI: https://doi.org/10.69987/JACS.2023.31206
Keywords: urban crash analysis, hotspot discovery, STATS19, spatial clustering, BIRCH, DBSCAN, kernel density estimation, retrieval-augmented generation, safety countermeasures, traffic exposure
Abstract
Urban crash screening systems often identify high-burden places but leave safety analysts to translate those places into treatment concepts. This paper develops a Spatial RAG pipeline for urban crash hotspot discovery and countermeasure recommendation using official 2022 Great Britain road safety and traffic data. The study integrates 106,004 STATS19 collision records, 135,480 casualty records, 193,545 vehicle records, 22,240 AADF count-point rows, 206 local-authority traffic records, and 17,840 MRDB major-road links. The primary spatial benchmark uses the 71,763 geocoded urban injury collisions. National clustering compares BIRCH, MiniBatchKMeans, and DBSCAN on British National Grid coordinates. BIRCH and MiniBatchKMeans both produce 204-center partitions aligned with the number of active highway authorities in the urban sample; MiniBatchKMeans reaches the highest sampled silhouette of 0.460 in 2.94 s, while BIRCH reaches 0.447 in 3.07 s and supplies the hierarchical center structure used for downstream hotspot screening. DBSCAN identifies 679 dense components but leaves 29.9% of crashes as noise and forms a 20,192-crash largest component, making it less suitable as the national partition. Within the ten highest-burden BIRCH centers, Spatial RAG captures 14.87% of held-out severity burden at a 5% cell budget and reaches a mean severity-AUC10 of 0.1405, compared with 11.13% and 0.1093 for KDEGrid. The paired center-wise advantage over KDEGrid is significant with a one-sided Wilcoxon p-value of 0.001953. KDEGrid remains the most stable method at the top-5% budget, with a Jaccard overlap of 0.724. The retrieval layer assigns eight of the top ten hotspots to an urban vulnerable-road-user area-wide package and two to a major-road corridor speed-management package. The results show that severity-aware spatial screening and deterministic retrieval can convert an official crash archive into a transparent first-pass safety planning tool.
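The two screening metrics quoted in the abstract, the share of held-out severity burden captured at a fixed cell budget and the Jaccard overlap between top-budget cell sets, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the dictionary-based grid representation, the tie-breaking rule, and the rounding of the budget to a whole number of cells are all assumptions made for illustration, and the per-cell burden values are taken as given rather than derived from any particular severity weighting.

```python
import math


def top_cells(scores, budget):
    """Return the cell ids with the highest scores within a budget.

    scores: dict mapping a (hypothetical) cell id to its screening score.
    budget: fraction of cells that may be flagged, e.g. 0.05 for 5%.
    Assumption: the budget is rounded up to at least one whole cell,
    and ties are broken by cell id so the ranking is deterministic.
    """
    k = max(1, math.ceil(budget * len(scores)))
    ranked = sorted(scores, key=lambda cell: (-scores[cell], cell))
    return set(ranked[:k])


def capture_share(scores, heldout_burden, budget):
    """Fraction of held-out burden that falls in the flagged cells.

    heldout_burden: dict mapping cell id to severity burden observed in
    data that was not used to compute the scores.
    """
    total = sum(heldout_burden.values())
    if total == 0:
        return 0.0
    selected = top_cells(scores, budget)
    return sum(heldout_burden.get(cell, 0.0) for cell in selected) / total


def jaccard(cells_a, cells_b):
    """Jaccard overlap |A & B| / |A | B| between two sets of flagged cells."""
    union = cells_a | cells_b
    if not union:
        return 1.0  # two empty selections are treated as identical
    return len(cells_a & cells_b) / len(union)
```

Under these assumptions, a stability figure such as the one reported for KDEGrid would correspond to calling `jaccard` on the `top_cells` sets produced from two resamples of the training crashes, while the capture figure would correspond to `capture_share` evaluated on held-out crashes.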







