Market Microstructure Risk Forecasting from Limit Order Books: Multi-Horizon Price-Move Classification and Volatility Estimation with DeepLOB-Style CNN–LSTM and Temporal Transformers

Jiaying Jin; Tina Huang

doi:10.69987/JACS.2023.31205

Authors

Jiaying Jin, Applied Analytics, Columbia University, NY, USA
Tina Huang, Computer Engineering, Columbia University, NY, USA

Keywords:

Limit order book, market microstructure, mid-price prediction, DeepLOB, transformer, volatility forecasting, multi-horizon classification, FI-2010

Abstract

High-frequency limit order books (LOBs) encode short-horizon liquidity and order-flow conditions that drive market microstructure risk. This revised paper presents a compact, leakage-aware study of two related tasks on an FI-2010-derived working export: (i) three-class direction forecasting at 1-, 5-, and 10-tick horizons and (ii) short-horizon realized-volatility estimation as a risk proxy. We compare two representation-learning models, a DeepLOB-style CNN–LSTM and a small temporal Transformer, each taking 100-event windows of 40 LOB features as input. To reduce serial dependence between samples, windows are extracted with stride 40, and the evaluation uses a day-respecting split (train days 1–7, validation day 8, test days 9–10). Because this split and working export differ from the canonical FI-2010 benchmark protocols, the reported scores should be read as within-study comparisons rather than direct reproductions of published benchmark tables. Under this protocol, the DeepLOB-style model achieves the strongest average Macro-F1 (0.3536), while the compact Transformer is most competitive at the shortest horizon (Macro-F1@1 = 0.3368). Frozen encoder embeddings also carry useful risk information: relative to a persistence baseline, DeepLOB-Emb+Ridge reduces realized-volatility RMSE from 2.4670 to 1.9992 at h=1 and from 5.1444 to 4.5873 at h=10 (σ scaled by 1e4). All figures and tables in this revision were regenerated from scratch, and the efficiency audit was recomputed from exact reference implementations, yielding 21,945 parameters for DeepLOB, 5,274 for the Transformer, and 3,017 for the no-attention FFN ablation. The results suggest that convolutional multi-scale structure remains strong under compact CPU-friendly settings, while attention helps at the shortest horizon but does not dominate under limited capacity and heavy subsampling.
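
To make the sampling protocol concrete, the following is a minimal sketch of how 100-event windows with stride 40 and a day-respecting split might be indexed. It is not the authors' code: the function names, the `day_ids` input (a per-event list of day labels, assumed sorted in time), and the choice to discard windows that would cross a day boundary are illustrative assumptions. Only the window length, stride, and day assignment are taken from the abstract.

```python
from collections import defaultdict

WINDOW = 100               # events per window (from the abstract)
STRIDE = 40                # step between window starts (from the abstract)
TRAIN_DAYS = range(1, 8)   # days 1-7
VAL_DAYS = (8,)            # day 8
TEST_DAYS = (9, 10)        # days 9-10


def window_bounds(day_ids, window=WINDOW, stride=STRIDE):
    """Return {day: [(start, end), ...]} with end exclusive.

    `day_ids` is a hypothetical per-event list of day labels in time order.
    Windows are generated inside each day separately, so no window spans
    two trading days (an assumption, not a detail stated in the abstract).
    """
    # Locate the contiguous index range [lo, hi) occupied by each day.
    ranges = {}
    for i, day in enumerate(day_ids):
        lo, _ = ranges.get(day, (i, i))
        ranges[day] = (lo, i + 1)

    bounds = defaultdict(list)
    for day, (lo, hi) in ranges.items():
        # The last valid start leaves room for a full window inside the day.
        for start in range(lo, hi - window + 1, stride):
            bounds[day].append((start, start + window))
    return dict(bounds)


def split_by_day(bounds):
    """Assign whole days to train / validation / test index lists."""
    def pick(days):
        return [b for day in days for b in bounds.get(day, [])]
    return pick(TRAIN_DAYS), pick(VAL_DAYS), pick(TEST_DAYS)
```

Because consecutive windows still share 60 of their 100 events at stride 40, this subsampling reduces but does not remove overlap between neighbouring samples. Assigning whole days to each split keeps overlapping windows from landing on both sides of the train/test boundary.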