[1] Yuhan Li and Mingzhuo Yu, “Benchmarking CUDA, CuPy, and Triton Kernel Optimizations for 3D Point Cloud Segmentation: An Empirical Comparison of Latency, Memory Efficiency, and GPU Utilization”, JACS, vol. 6, no. 5, pp. 21–30, May 2026, doi: 10.69987/JACS.2026.60503.