Research Only — Not for Clinical Use

11-Model ML, Deep Learning & LSTM Comparison

Logistic Regression · Random Forest · Gradient Boosting · MLP · Residual MLP · Attention MLP
LSTM · Stacked LSTM · Bidirectional LSTM · LSTM+Attention · CNN-LSTM
Trained and evaluated on a Chinese acute pancreatitis (AP) cohort (n=722) with 5-fold stratified cross-validation

Performance Summary

All models evaluated with 5-fold stratified cross-validation | Ground truth: severe acute pancreatitis (SAP) label, revised Atlanta classification (2012) | n=722 (585 severe / 137 mild)
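Stratified 5-fold splitting keeps the 585 / 137 class ratio in every fold, so each patient receives exactly one out-of-fold prediction. Below is a minimal sketch of that idea. The helper name stratified_folds and the seed are hypothetical. The study's own splitter (for example sklearn's StratifiedKFold) may shuffle differently, though it keeps the same per-fold class counts.

```python
import random


def stratified_folds(labels, n_splits=5, seed=42):
    """Return a fold index (0..n_splits-1) for every sample.

    Indices are shuffled within each class and then dealt round-robin,
    so each fold receives (almost) the same number of each class.
    """
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            fold_of[i] = k % n_splits
    return fold_of


# Cohort composition from the text: 585 severe (1) and 137 mild (0).
labels = [1] * 585 + [0] * 137
folds = stratified_folds(labels)

for f in range(5):
    test_idx = [i for i, k in enumerate(folds) if k == f]
    train_idx = [i for i, k in enumerate(folds) if k != f]
    severe = sum(labels[i] for i in test_idx)
    # A model fitted on train_idx predicts test_idx; pooling these predictions
    # over the 5 folds gives one out-of-fold score per patient.
    print(f"fold {f}: test severe={severe} mild={len(test_idx) - severe} train={len(train_idx)}")
```

Every fold holds 117 severe patients and 27 or 28 mild ones. That leaves about 578 patients for training in each fold.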

Logistic Regression · AUC 0.817 · F1 0.907 · Sensitivity 93.8% · Threshold 0.575
Random Forest ★ Best ML · AUC 0.877 · F1 0.917 · Sensitivity 96.8% · Threshold 0.535
Gradient Boosting · AUC 0.874 · F1 0.918 · Sensitivity 97.1% · Threshold 0.350
MLP (3-layer) · AUC 0.836 · F1 0.909 · Sensitivity 96.9% · Threshold 0.282 · Epochs≈11 · lr=1e-3
Residual MLP · AUC 0.804 · F1 0.912 · Sensitivity 97.8% · Threshold 0.203 · Epochs≈10 · lr=8e-4
Attention MLP · AUC 0.784 · F1 0.909 · Sensitivity 98.3% · Threshold 0.418 · Epochs≈7 · lr=1e-3
LSTM · AUC 0.699 · F1 0.895 · Sensitivity 100% · Threshold 0.300 · Epochs≈19 · lr=8e-4
Stacked LSTM · AUC 0.705 · F1 0.898 · Sensitivity 98.3% · Threshold 0.489 · Epochs≈17 · lr=5e-4
Bidirectional LSTM · AUC 0.715 · F1 0.898 · Sensitivity 99.5% · Threshold 0.318 · Epochs≈22 · lr=8e-4
LSTM + Attention · AUC 0.684 · F1 0.896 · Sensitivity 100% · Threshold 0.172 · Epochs≈27 · lr=8e-4
CNN-LSTM ★ Best LSTM · AUC 0.777 · F1 0.897 · Sensitivity 100% · Threshold 0.082 · Epochs≈27 · lr=8e-4

Full Comparison Table

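Model | Type | AUC | F1 | Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV
Logistic Regression | ML | 0.817 | 0.907 | 0.575 | 549 | 77 | 36 | 60 | 93.8% | 43.8% | 87.7%
Random Forest ★ | ML | 0.877 | 0.917 | 0.535 | 566 | 84 | 19 | 53 | 96.8% | 38.7% | 87.1%
Gradient Boosting | ML | 0.874 | 0.918 | 0.350 | 568 | 85 | 17 | 52 | 97.1% | 38.0% | 87.0%
MLP (3-layer) | DL | 0.836 | 0.909 | 0.282 | 567 | 103 | 18 | 34 | 96.9% | 24.8% | 84.6%
Residual MLP | DL | 0.804 | 0.912 | 0.203 | 572 | 98 | 13 | 39 | 97.8% | 28.5% | 85.4%
Attention MLP | DL | 0.784 | 0.909 | 0.418 | 575 | 105 | 10 | 32 | 98.3% | 23.4% | 84.6%
LSTM | LSTM | 0.699 | 0.895 | 0.300 | 585 | 137 | 0 | 0 | 100.0% | 0.0% | 81.0%
Stacked LSTM | LSTM | 0.705 | 0.898 | 0.489 | 575 | 121 | 10 | 16 | 98.3% | 11.7% | 82.6%
Bidirectional LSTM | LSTM | 0.715 | 0.898 | 0.318 | 582 | 129 | 3 | 8 | 99.5% | 5.8% | 81.9%
LSTM + Attention | LSTM | 0.684 | 0.896 | 0.172 | 585 | 136 | 0 | 1 | 100.0% | 0.7% | 81.1%
CNN-LSTM ★ | LSTM | 0.777 | 0.897 | 0.082 | 585 | 135 | 0 | 2 | 100.0% | 1.5% | 81.2%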
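The Sensitivity, Specificity, PPV and F1 columns follow directly from the four counts. The short check below reproduces the Logistic Regression row. It also shows that labelling every patient severe already yields F1 = 0.895 and PPV = 81.0%, because 81% of the cohort is severe. That is exactly the LSTM row at its selected threshold. The function name is illustrative.

```python
def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and F1 for the severe (positive) class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn)  # equals the harmonic mean of PPV and sensitivity
    return sensitivity, specificity, ppv, f1


# Logistic Regression row of the table
print([round(v, 3) for v in metrics(549, 77, 36, 60)])   # [0.938, 0.438, 0.877, 0.907]
# Trivial rule: call every patient severe (TP=585, FP=137, FN=0, TN=0)
print([round(v, 3) for v in metrics(585, 137, 0, 0)])    # [1.0, 0.0, 0.81, 0.895]
```

Across the eleven models, F1 spans only 0.895 to 0.918. AUC spans 0.684 to 0.877, and specificity spans 0.0% to 43.8%.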

Confusion Matrices — Threshold Sweep

Each tab shows TP/FP/FN/TN across thresholds 0.10–0.80 | Ground truth: 1 = severe (SAP), 0 = mild | ★ = selected (maximum-F1) threshold

⚠️ Research only. All figures from 5-fold cross-validation on the same cohort — no external validation set.
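Logistic Regression
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 582 | 126 | 3 | 11 | 99.5% | 8.0% | 82.2% | 0.900
0.20 | 574 | 118 | 11 | 19 | 98.1% | 13.9% | 82.9% | 0.899
0.30 | 572 | 107 | 13 | 30 | 97.8% | 21.9% | 84.2% | 0.905
0.40 | 563 | 101 | 22 | 36 | 96.2% | 26.3% | 84.8% | 0.902
0.50 | 556 | 90 | 29 | 47 | 95.0% | 34.3% | 86.1% | 0.903
0.575 ★ | 549 | 77 | 36 | 60 | 93.8% | 43.8% | 87.7% | 0.907
0.60 | 542 | 76 | 43 | 61 | 92.6% | 44.5% | 87.7% | 0.901
0.70 | 516 | 64 | 69 | 73 | 88.2% | 53.3% | 89.0% | 0.886
0.80 | 467 | 44 | 118 | 93 | 79.8% | 67.9% | 91.4% | 0.852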
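Random Forest
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 585 | 137 | 0 | 0 | 100.0% | 0.0% | 81.0% | 0.895
0.20 | 585 | 137 | 0 | 0 | 100.0% | 0.0% | 81.0% | 0.895
0.30 | 584 | 124 | 1 | 13 | 99.8% | 9.5% | 82.5% | 0.903
0.40 | 582 | 110 | 3 | 27 | 99.5% | 19.7% | 84.1% | 0.912
0.50 | 571 | 94 | 14 | 43 | 97.6% | 31.4% | 85.9% | 0.914
0.535 ★ | 566 | 84 | 19 | 53 | 96.8% | 38.7% | 87.1% | 0.917
0.60 | 547 | 67 | 38 | 70 | 93.5% | 51.1% | 89.1% | 0.912
0.70 | 520 | 44 | 65 | 93 | 88.9% | 67.9% | 92.2% | 0.905
0.80 | 455 | 22 | 130 | 115 | 77.8% | 83.9% | 95.4% | 0.857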
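Gradient Boosting
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 583 | 115 | 2 | 22 | 99.7% | 16.1% | 83.5% | 0.909
0.20 | 576 | 106 | 9 | 31 | 98.5% | 22.6% | 84.5% | 0.909
0.30 | 570 | 91 | 15 | 46 | 97.4% | 33.6% | 86.2% | 0.915
0.350 ★ | 568 | 85 | 17 | 52 | 97.1% | 38.0% | 87.0% | 0.918
0.40 | 561 | 81 | 24 | 56 | 95.9% | 40.9% | 87.4% | 0.914
0.50 | 550 | 71 | 35 | 66 | 94.0% | 48.2% | 88.6% | 0.912
0.60 | 537 | 63 | 48 | 74 | 91.8% | 54.0% | 89.5% | 0.906
0.70 | 523 | 51 | 62 | 86 | 89.4% | 62.8% | 91.1% | 0.903
0.80 | 492 | 39 | 93 | 98 | 84.1% | 71.5% | 92.7% | 0.882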
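MLP (3-layer)
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 584 | 125 | 1 | 12 | 99.8% | 8.8% | 82.4% | 0.903
0.20 | 578 | 116 | 7 | 21 | 98.8% | 15.3% | 83.3% | 0.904
0.281 ★ | 576 | 106 | 9 | 31 | 98.5% | 22.6% | 84.5% | 0.909
0.30 | 570 | 105 | 15 | 32 | 97.4% | 23.4% | 84.4% | 0.905
0.40 | 557 | 96 | 28 | 41 | 95.2% | 29.9% | 85.3% | 0.900
0.50 | 537 | 81 | 48 | 56 | 91.8% | 40.9% | 86.9% | 0.893
0.60 | 510 | 61 | 75 | 76 | 87.2% | 55.5% | 89.3% | 0.882
0.70 | 471 | 43 | 114 | 94 | 80.5% | 68.6% | 91.6% | 0.857
0.80 | 423 | 30 | 162 | 107 | 72.3% | 78.1% | 93.4% | 0.815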
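Residual MLP
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 578 | 127 | 7 | 10 | 98.8% | 7.3% | 82.0% | 0.896
0.20 | 571 | 114 | 14 | 23 | 97.6% | 16.8% | 83.4% | 0.899
0.276 ★ | 567 | 106 | 18 | 31 | 96.9% | 22.6% | 84.3% | 0.901
0.30 | 564 | 105 | 21 | 32 | 96.4% | 23.4% | 84.3% | 0.900
0.40 | 549 | 94 | 36 | 43 | 93.8% | 31.4% | 85.4% | 0.894
0.50 | 535 | 83 | 50 | 54 | 91.5% | 39.4% | 86.6% | 0.889
0.60 | 505 | 72 | 80 | 65 | 86.3% | 47.4% | 87.5% | 0.869
0.70 | 475 | 53 | 110 | 84 | 81.2% | 61.3% | 90.0% | 0.854
0.80 | 414 | 31 | 171 | 106 | 70.8% | 77.4% | 93.0% | 0.804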
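Attention MLP
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 584 | 131 | 1 | 6 | 99.8% | 4.4% | 81.7% | 0.898
0.20 | 579 | 125 | 6 | 12 | 99.0% | 8.8% | 82.2% | 0.898
0.30 | 575 | 123 | 10 | 14 | 98.3% | 10.2% | 82.4% | 0.896
0.40 | 574 | 109 | 11 | 28 | 98.1% | 20.4% | 84.0% | 0.905
0.50 | 572 | 101 | 13 | 36 | 97.8% | 26.3% | 85.0% | 0.909
0.602 ★ | 566 | 85 | 19 | 52 | 96.8% | 38.0% | 86.9% | 0.916
0.70 | 540 | 66 | 45 | 71 | 92.3% | 51.8% | 89.1% | 0.907
0.80 | 472 | 49 | 113 | 88 | 80.7% | 64.2% | 90.6% | 0.854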
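LSTM
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 585 | 137 | 0 | 0 | 100.0% | 0.0% | 81.0% | 0.895
0.20 | 585 | 137 | 0 | 0 | 100.0% | 0.0% | 81.0% | 0.895
0.300 ★ | 585 | 137 | 0 | 0 | 100.0% | 0.0% | 81.0% | 0.895
0.40 | 580 | 132 | 5 | 5 | 99.1% | 3.6% | 81.5% | 0.894
0.50 | 569 | 122 | 16 | 15 | 97.3% | 10.9% | 82.3% | 0.892
0.60 | 545 | 115 | 40 | 22 | 93.2% | 16.1% | 82.6% | 0.876
0.70 | 498 | 82 | 87 | 55 | 85.1% | 40.1% | 85.9% | 0.855
0.80 | 390 | 47 | 195 | 90 | 66.7% | 65.7% | 89.2% | 0.763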
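Stacked LSTM
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 585 | 137 | 0 | 0 | 100.0% | 0.0% | 81.0% | 0.895
0.20 | 585 | 137 | 0 | 0 | 100.0% | 0.0% | 81.0% | 0.895
0.30 | 582 | 137 | 3 | 0 | 99.5% | 0.0% | 80.9% | 0.893
0.40 | 578 | 127 | 7 | 10 | 98.8% | 7.3% | 82.0% | 0.896
0.489 ★ | 575 | 121 | 10 | 16 | 98.3% | 11.7% | 82.6% | 0.898
0.50 | 573 | 121 | 12 | 16 | 97.9% | 11.7% | 82.6% | 0.896
0.60 | 564 | 111 | 21 | 26 | 96.4% | 19.0% | 83.6% | 0.895
0.70 | 539 | 102 | 46 | 35 | 92.1% | 25.5% | 84.1% | 0.879
0.80 | 395 | 51 | 190 | 86 | 67.5% | 62.8% | 88.6% | 0.766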
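Bidirectional LSTM
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 582 | 133 | 3 | 4 | 99.5% | 2.9% | 81.4% | 0.895
0.20 | 582 | 131 | 3 | 6 | 99.5% | 4.4% | 81.6% | 0.897
0.30 | 582 | 129 | 3 | 8 | 99.5% | 5.8% | 81.9% | 0.898
0.318 ★ | 582 | 129 | 3 | 8 | 99.5% | 5.8% | 81.9% | 0.898
0.40 | 580 | 127 | 5 | 10 | 99.1% | 7.3% | 82.0% | 0.898
0.50 | 576 | 125 | 9 | 12 | 98.5% | 8.8% | 82.2% | 0.896
0.60 | 556 | 106 | 29 | 31 | 95.0% | 22.6% | 84.0% | 0.892
0.70 | 324 | 43 | 261 | 94 | 55.4% | 68.6% | 88.3% | 0.681
0.80 | 211 | 22 | 374 | 115 | 36.1% | 83.9% | 90.6% | 0.516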
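LSTM + Attention
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.10 | 585 | 137 | 0 | 0 | 100.0% | 0.0% | 81.0% | 0.895
0.172 ★ | 585 | 136 | 0 | 1 | 100.0% | 0.7% | 81.1% | 0.896
0.20 | 584 | 136 | 1 | 1 | 99.8% | 0.7% | 81.1% | 0.895
0.30 | 582 | 136 | 3 | 1 | 99.5% | 0.7% | 81.1% | 0.893
0.40 | 581 | 135 | 4 | 2 | 99.3% | 1.5% | 81.1% | 0.893
0.50 | 576 | 130 | 9 | 7 | 98.5% | 5.1% | 81.6% | 0.892
0.60 | 536 | 111 | 49 | 26 | 91.6% | 19.0% | 82.8% | 0.870
0.70 | 479 | 89 | 106 | 48 | 81.9% | 35.0% | 84.3% | 0.831
0.80 | 409 | 56 | 176 | 81 | 69.9% | 59.1% | 88.0% | 0.779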
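CNN-LSTM
Threshold | TP | FP | FN | TN | Sensitivity | Specificity | PPV | F1
0.082 ★ | 585 | 135 | 0 | 2 | 100.0% | 1.5% | 81.2% | 0.897
0.10 | 582 | 134 | 3 | 3 | 99.5% | 2.2% | 81.3% | 0.895
0.20 | 573 | 125 | 12 | 12 | 97.9% | 8.8% | 82.1% | 0.893
0.30 | 563 | 120 | 22 | 17 | 96.2% | 12.4% | 82.4% | 0.888
0.40 | 554 | 113 | 31 | 24 | 94.7% | 17.5% | 83.1% | 0.885
0.50 | 542 | 100 | 43 | 37 | 92.6% | 27.0% | 84.4% | 0.883
0.60 | 527 | 78 | 58 | 59 | 90.1% | 43.1% | 87.1% | 0.886
0.70 | 498 | 61 | 87 | 76 | 85.1% | 55.5% | 89.1% | 0.871
0.80 | 446 | 50 | 139 | 87 | 76.2% | 63.5% | 89.9% | 0.825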
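Each row of these sweeps can be regenerated from the pooled out-of-fold probabilities. This is a minimal sketch. It assumes that a score at or above the threshold counts as a severe prediction, since the page does not state its tie rule. The toy scores are invented for illustration and are not study data.

```python
def sweep(y_true, y_score, thresholds):
    """Return (threshold, TP, FP, FN, TN) for each threshold; score >= t means 'severe'."""
    rows = []
    for t in thresholds:
        tp = fp = fn = tn = 0
        for y, s in zip(y_true, y_score):
            pred = 1 if s >= t else 0
            if pred == 1 and y == 1:
                tp += 1
            elif pred == 1 and y == 0:
                fp += 1
            elif pred == 0 and y == 1:
                fn += 1
            else:
                tn += 1
        rows.append((t, tp, fp, fn, tn))
    return rows


# Hypothetical out-of-fold scores: four severe (1) and two mild (0) patients.
y_true = [1, 1, 1, 1, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.2, 0.5, 0.1]
grid = [0.1, 0.3, 0.5, 0.8]  # the page itself uses 0.10, 0.20, ..., 0.80
for row in sweep(y_true, y_score, grid):
    print(row)
```

Raising the threshold moves patients from TP to FN and from FP to TN. This is why every tab runs from near-100% sensitivity at 0.10 toward higher specificity at 0.80.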

ROC Curves

5-fold cross-validation out-of-fold predictions | X axis: False Positive Rate · Y axis: True Positive Rate

Logistic Regression — AUC 0.817

Random Forest — AUC 0.877 ★

Gradient Boosting — AUC 0.874

MLP (3-layer) — AUC 0.836 · lr=1e-3 · ~11 epochs

Residual MLP — AUC 0.804 · lr=8e-4 · ~10 epochs

Attention MLP — AUC 0.784 · lr=1e-3 · ~7 epochs

LSTM — AUC 0.699 · lr=8e-4 · ~19 epochs

Stacked LSTM — AUC 0.705 · lr=5e-4 · ~17 epochs

Bidirectional LSTM — AUC 0.715 · lr=8e-4 · ~22 epochs

LSTM + Attention — AUC 0.684 · lr=8e-4 · ~27 epochs

CNN-LSTM ★ — AUC 0.777 · lr=8e-4 · ~27 epochs
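The AUC in each panel title can be read as the probability that a randomly chosen severe patient receives a higher out-of-fold score than a randomly chosen mild patient. Below is a minimal pure-Python sketch of that pairwise (Mann-Whitney) form, together with the (FPR, TPR) points a ROC curve plots. It assumes AUC is computed on pooled out-of-fold predictions, as the ROC caption suggests. On the same inputs it should agree with sklearn.metrics.roc_auc_score.

```python
def roc_auc(y_true, y_score):
    """Pairwise AUC: P(score_severe > score_mild), counting ties as 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(y_true, y_score):
    """(FPR, TPR) pairs obtained by lowering the threshold through every observed score."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical toy scores (not study data).
y_true = [1, 1, 1, 1, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.2, 0.5, 0.1]
print(roc_auc(y_true, y_score))  # 0.75
print(roc_points(y_true, y_score))
```

The trapezoidal area under roc_points equals roc_auc. The pairwise form is simply easier to reason about.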

Feature Importance

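Top-10 features per classical ML model | LR: |coefficient| · RF/GB: feature_importances_ score (impurity-based)
The deep learning models use all 106 features, and no single-feature ranking is reported for them. The Attention MLP's gate weights are inspectable per sample but are not aggregated here.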

Algorithm Descriptions

Logistic Regression

Linear model with L2 regularisation (C=0.5). Features scaled with StandardScaler. Optimal threshold 0.575 selected by maximum F1.
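Selecting a threshold by maximum F1 amounts to scanning candidate cut-offs on the out-of-fold probabilities and keeping the best one. This is a minimal sketch under the same score >= threshold assumption. Scanning every observed score is an assumption made here; the study may have used a different candidate grid. Ties are broken toward the lower threshold. The toy data are hypothetical.

```python
def best_f1_threshold(y_true, y_score):
    """Return (threshold, F1) maximising F1 over all observed scores; ties keep the lower threshold."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < t)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy scores (not study data)
y_true = [1, 1, 1, 1, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.2, 0.5, 0.1]
t, f1 = best_f1_threshold(y_true, y_score)
print(t, round(f1, 3))  # 0.2 0.889
```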

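Key features: Lymphocytes, Calcium, Creatinine, Lactate, LDH. Fully interpretable: because the features are standardised, each coefficient is the change in log-odds of severe AP per one-standard-deviation increase in that feature, with the other features held fixed.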
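With standardised inputs the model is linear in the log-odds, which is what makes each coefficient directly readable. The block below writes this out. Here z_j is the StandardScaler output for feature j, and the objective is sklearn's L2-penalised form at C = 0.5. The coefficient value in the last line is hypothetical.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_j \beta_j z_j, \qquad z_j = \frac{x_j - \mu_j}{\sigma_j}

\min_{\beta_0,\,\beta}\; \tfrac{1}{2}\sum_j \beta_j^2 + C \sum_{i=1}^{n} \log\!\left(1 + e^{-\tilde y_i\,(\beta_0 + \beta^\top z_i)}\right), \qquad \tilde y_i \in \{-1, +1\}, \quad C = 0.5

\frac{\mathrm{odds}(z_j + 1)}{\mathrm{odds}(z_j)} = e^{\beta_j}, \qquad \text{e.g. } \beta_j = -0.5 \;\Rightarrow\; e^{-0.5} \approx 0.61
```

A one-standard-deviation increase in a feature with β_j = -0.5 would multiply the odds of severe AP by about 0.61. A smaller C shrinks every β_j more strongly toward zero.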

sklearn LogisticRegression · max_iter=1000 · random_state=42

Random Forest ★ Best AUC

Ensemble of 200 decision trees (max_depth=6, min_samples_leaf=5). Averages predicted probabilities across trees. Highest overall AUC (0.877).
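Averaging probabilities (soft voting) can give a different answer from counting tree votes. The sketch below uses three hypothetical per-tree probabilities for one patient. The averaging behaviour is what sklearn documents for RandomForestClassifier.predict_proba. The hard-vote function is shown only for contrast.

```python
def soft_vote(tree_probs):
    """Forest probability of 'severe' = mean of the per-tree probabilities."""
    return sum(tree_probs) / len(tree_probs)


def hard_vote(tree_probs, cut=0.5):
    """Share of trees that would individually call the patient severe (contrast only)."""
    return sum(1 for p in tree_probs if p >= cut) / len(tree_probs)


# Hypothetical leaf probabilities from three trees for a single patient
tree_probs = [0.90, 0.45, 0.40]
print(round(soft_vote(tree_probs), 3))  # 0.583 -> severe at the 0.535 threshold
print(round(hard_vote(tree_probs), 3))  # 0.333 -> only one tree of three votes severe
```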

Robust to missing values; no normalisation required. Key features: Calcium, D-dimer, LDH, Lactate, Hematocrit.

sklearn RandomForestClassifier · n_estimators=200 · max_depth=6

Gradient Boosting

150 shallow boosted trees (max_depth=3) with learning_rate=0.05. Highest F1 of all models (0.918). Its F1-optimal threshold is comparatively low (0.350), giving 97.1% sensitivity at 38.0% specificity.
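Under log-loss, each of the 150 depth-3 trees adds a damped correction to a running log-odds score, so the learning rate and tree count work together. The block below is a sketch of the standard update. It assumes sklearn's default prior-based initialisation. The initial value is shown for the whole cohort, and stratified training folds keep nearly the same ratio.

```latex
F_0(x) = \log\frac{\bar y}{1-\bar y} = \log\frac{585}{137} \approx 1.45

F_m(x) = F_{m-1}(x) + \nu\, h_m(x), \qquad \nu = 0.05, \quad m = 1, \dots, 150

\hat p(x) = \sigma\!\left(F_{150}(x)\right) = \frac{1}{1 + e^{-F_{150}(x)}}
```

Here each h_m is a depth-3 regression tree fitted to the pseudo-residuals y_i - σ(F_{m-1}(x_i)). With ν = 0.05, every tree moves the log-odds by only a small step.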

Key features: D-dimer, Calcium, LDH, Lactate, Hematocrit. These are the same top five as RF with the first two swapped, which suggests the ranking is stable across the two tree ensembles.

sklearn GradientBoostingClassifier · n_estimators=150 · learning_rate=0.05

MLP — 3-Layer Feedforward

Architecture: 256 → 128 → 64 → 1 sigmoid. Each hidden layer followed by BatchNormalization and Dropout (0.35/0.30/0.20). L2 regularisation on dense layers.
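A quick parameter count puts the regularisation in context, because the network is large relative to the roughly 578 training patients per fold. This is a minimal sketch that assumes Keras-style counting. Dense layers carry a bias. BatchNormalization holds gamma and beta plus two non-trainable moving statistics per unit. The output is assumed to be a single sigmoid unit.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * n


n_features = 106
hidden = [256, 128, 64]

total, trainable, prev = 0, 0, n_features
for width in hidden:
    total += dense_params(prev, width) + batchnorm_params(width)
    trainable += dense_params(prev, width) + 2 * width
    prev = width
total += dense_params(prev, 1)      # sigmoid output unit
trainable += dense_params(prev, 1)
# Dropout layers add no parameters.
print(total, trainable)  # 70401 69505
```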

Early stopping on val_AUC (patience=8) converged at ~11 epochs per fold (range 8–16). AUC 0.836.
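Patience-based early stopping keeps training until validation AUC has failed to improve for 8 consecutive epochs, then reports the best epoch. Below is a minimal sketch of that logic, which mirrors what Keras' EarlyStopping callback does with mode='max'. The validation curve is invented. The last lines recompute the ≈11 average from the per-fold best epochs listed for this model.

```python
def early_stopping(val_auc, patience=8):
    """Return (best_epoch, stop_epoch), both 1-indexed, when maximising val_auc."""
    best_auc, best_epoch, wait = float("-inf"), 0, 0
    for epoch, auc in enumerate(val_auc, start=1):
        if auc > best_auc:
            best_auc, best_epoch, wait = auc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_auc)


# Hypothetical validation AUC per epoch for one fold
curve = [0.70, 0.75, 0.78, 0.77, 0.76, 0.78, 0.74, 0.73, 0.72, 0.71, 0.70, 0.69]
print(early_stopping(curve))  # (3, 11): best at epoch 3, stopped 8 epochs later

# Best epochs per fold reported for the 3-layer MLP
mlp_best_epochs = [13, 11, 8, 9, 16]
print(sum(mlp_best_epochs) / len(mlp_best_epochs))  # 11.4
```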

Adam lr=1e-3 · Optimal epochs≈11 (folds: 13/11/8/9/16) · Batch 32 · 5-fold CV

Residual MLP

Projection layer (128) followed by two residual blocks: Dense→BN→Dropout→Dense with skip connection. Lower LR (8e-4) chosen to stabilise skip-connection learning.
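Because of the skip connection, a residual block computes x + F(x). It therefore starts close to the identity and only has to learn a correction. Below is a minimal inference-time sketch in plain Python. Several details are assumptions for illustration: the ReLU after the first Dense, the absence of an activation after the addition, the 2-unit width and all weights. The model itself uses 128 units.

```python
import math


def dense(x, W, b):
    """y_j = sum_i x_i * W[i][j] + b_j"""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def relu(v):
    return [max(0.0, a) for a in v]


def batchnorm_inference(v, mean, var, gamma, beta, eps=1e-3):
    """Inference-mode BatchNorm using stored moving statistics."""
    return [g * (a - m) / math.sqrt(s + eps) + bt
            for a, m, s, g, bt in zip(v, mean, var, gamma, beta)]


def residual_block(x, p):
    """Dense -> BN -> Dropout -> Dense, then add the input back (skip connection)."""
    h = relu(dense(x, p["W1"], p["b1"]))  # ReLU here is an assumption
    h = batchnorm_inference(h, p["mean"], p["var"], p["gamma"], p["beta"])
    # Dropout is the identity at inference time, so it is omitted here.
    f = dense(h, p["W2"], p["b2"])
    return [xi + fi for xi, fi in zip(x, f)]


# Hypothetical 2-unit block whose second Dense is all zeros: F(x) = 0, so the block returns x.
params = {
    "W1": [[1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0],
    "mean": [0.0, 0.0], "var": [1.0, 1.0], "gamma": [1.0, 1.0], "beta": [0.0, 0.0],
    "W2": [[0.0, 0.0], [0.0, 0.0]], "b2": [0.0, 0.0],
}
x = [0.5, -1.0]
print(residual_block(x, params))  # [0.5, -1.0]
```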

Early stopping converged at ~10 epochs per fold (range 4–17). The wider fold-to-fold variance may reflect sensitivity of the residual blocks to initialisation. AUC 0.804.

Adam lr=8e-4 · Optimal epochs≈10 (folds: 10/12/7/4/17) · Batch 32 · 5-fold CV

Attention MLP

Feature-wise sigmoid attention gate (Dense 106→106) scales each input before the downstream MLP (256→128→64→1). The gate weights are inspectable per sample.
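The gate multiplies each input feature by a learned weight in (0, 1) computed from the whole feature vector. Those weights are what can be inspected per patient. Below is a minimal plain-Python sketch with 3 features and hypothetical weights. The real gate is a Dense(106→106) layer with 106 × 106 + 106 = 11,342 parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attention_gate(x, W, b):
    """a = sigmoid(x @ W + b); returns (x * a, a) so the gate values can be inspected."""
    a = [sigmoid(sum(x[i] * W[i][j] for i in range(len(x))) + b[j]) for j in range(len(b))]
    gated = [xi * ai for xi, ai in zip(x, a)]
    return gated, a


# Hypothetical gate: zero weights, biases that open, half-open and close three features
x = [2.0, 2.0, 2.0]
W = [[0.0] * 3 for _ in range(3)]
b = [5.0, 0.0, -5.0]
gated, a = attention_gate(x, W, b)
print([round(v, 3) for v in a])      # [0.993, 0.5, 0.007]
print([round(v, 3) for v in gated])  # [1.987, 1.0, 0.013]
print(106 * 106 + 106)               # 11342 parameters in the real gate
```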

Fastest convergence of any model: the best epoch selected by early stopping averaged ~7 per fold (range 2–14). This may indicate that the gate settles on a feature weighting quickly. AUC 0.784.

Adam lr=1e-3 · Optimal epochs≈7 (folds: 9/2/6/5/14) · Batch 32 · 5-fold CV

LSTM

Vanilla single-layer LSTM (64 units) followed by Dropout(0.3) and a Dense(32,relu) head. Features reshaped to (106,1) so each lab value is treated as one time step.
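Two details of this setup can be made concrete. The first is how a 106-column feature row becomes a length-106 sequence of one-dimensional steps. The second is how many weights each recurrent layer carries. In this sketch the parameter formula is the one Keras uses for an LSTM layer (four gates, one bias vector each). PyTorch's nn.LSTM keeps two bias vectors per gate and would report slightly more. The helper names are hypothetical, and the final Dense(1) output is assumed.

```python
def to_sequence(row):
    """(106,) feature row -> (106, 1) sequence: one lab value per time step."""
    return [[value] for value in row]


def lstm_params(units, input_dim):
    """Keras-style count: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)


row = [0.0] * 106                 # placeholder standardised feature row
seq = to_sequence(row)
print(len(seq), len(seq[0]))      # 106 1

# Single LSTM(64) on 1-dim steps, then Dense(32) and an assumed Dense(1) output
lstm = lstm_params(64, 1)
print(lstm, lstm + (64 * 32 + 32) + (32 + 1))   # 16896 19009

# Stacked LSTM(64) -> LSTM(32), and a bidirectional LSTM(64) (two directions)
print(lstm_params(64, 1) + lstm_params(32, 64))  # 29312
print(2 * lstm_params(64, 1))                    # 33792, output width 128
```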

Early stopping on val_AUC converged at ~19 epochs per fold (range 10–33). AUC 0.699.

Adam lr=8e-4 · Optimal epochs≈19 (folds: 19/10/33/25/10) · Batch 32

Stacked LSTM

Two-layer stacked LSTM: LSTM(64, return_sequences=True) → Dropout(0.25) → LSTM(32) → Dropout(0.25) → Dense(32). Lower LR (5e-4) chosen to stabilise deeper gradient flow.

Converged at ~17 epochs per fold (range 4–25). AUC 0.705.

Adam lr=5e-4 · Optimal epochs≈17 (folds: 16/22/25/16/4) · Batch 32

Bidirectional LSTM

BiLSTM(64) processes the 106-length sequence in both forward and backward directions, concatenating the outputs (128-dim), followed by BatchNormalization and Dropout(0.3).

Highest AUC (0.715) among the plain LSTM variants, i.e. those without convolution or attention. Converged at ~22 epochs per fold (range 6–43). The high fold variance likely reflects the small dataset.

Adam lr=8e-4 · Optimal epochs≈22 (folds: 40/6/43/12/7) · Batch 32

LSTM + Attention

Bahdanau-style additive attention: LSTM(64, return_sequences=True) produces (106,64) states; Dense(1,tanh) scores each step; Softmax(axis=1) normalises; weighted sum collapses to context vector.
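The pooling step can be written out in a few lines. Below is a minimal plain-Python sketch of the mechanism as described: a single learned scoring vector w with bias b, tanh, a softmax over the time axis, and a weighted sum. This is a self-attention pooling variant of additive attention with no separate decoder query. The 3-step, 2-unit states and the weights are hypothetical.

```python
import math


def attention_pool(H, w, b):
    """H: T hidden states (each a list of units). Returns (context vector, attention weights)."""
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b) for h in H]  # Dense(1, tanh)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]            # numerically stable softmax over time
    total = sum(exps)
    alpha = [e / total for e in exps]
    units = len(H[0])
    context = [sum(alpha[t] * H[t][d] for t in range(len(H))) for d in range(units)]
    return context, alpha


# Hypothetical LSTM outputs for a 3-step sequence with 2 units
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, alpha = attention_pool(H, w=[0.0, 0.0], b=0.0)
print([round(a, 3) for a in alpha], [round(c, 3) for c in context])  # [0.333, 0.333, 0.333] [0.667, 0.667]

context, alpha = attention_pool(H, w=[2.0, 0.0], b=0.0)
print([round(a, 3) for a in alpha])  # steps with a large first unit get more weight
```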

Converged at ~27 epochs per fold (range 14–53). AUC 0.684.

Adam lr=8e-4 · Optimal epochs≈27 (folds: 16/19/53/14/31) · Batch 32

CNN-LSTM ★ Best Sequence Model

Conv1D(32,k=5,same) extracts local patterns across neighbouring features → MaxPool1D(2) halves the sequence → Conv1D(64,k=3,same) abstracts higher-level motifs → LSTM(64) models dependencies along the feature order → Dropout(0.3) → Dense(32,relu).
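Tracing shapes through the stack shows where the pooling helps: the LSTM runs over 53 steps of 64-channel features instead of 106 single values. This minimal sketch assumes Keras conventions. MaxPool1D uses a stride equal to the pool size with 'valid' padding, and Dense and Conv1D carry a bias. A final Dense(1) sigmoid output is also assumed, and the helper names are hypothetical.

```python
def conv1d_same(shape, filters, kernel):
    """'same' padding keeps the length; params = kernel * in_channels * filters + filters."""
    length, in_ch = shape
    return (length, filters), kernel * in_ch * filters + filters


def maxpool1d(shape, pool=2):
    """Stride = pool size, 'valid' padding: floor((L - pool) / pool) + 1 steps, no params."""
    length, ch = shape
    return ((length - pool) // pool + 1, ch), 0


def lstm_last(shape, units):
    """Returns only the final hidden state; Keras-style LSTM parameter count."""
    _, in_dim = shape
    return (units,), 4 * (units * (in_dim + units) + units)


def dense(shape, units):
    (in_dim,) = shape
    return (units,), in_dim * units + units


# Dropout(0.3) is skipped: it changes neither shape nor parameter count.
shape, total = (106, 1), 0
for name, layer in [
    ("Conv1D(32, k=5)", lambda s: conv1d_same(s, 32, 5)),
    ("MaxPool1D(2)", lambda s: maxpool1d(s, 2)),
    ("Conv1D(64, k=3)", lambda s: conv1d_same(s, 64, 3)),
    ("LSTM(64)", lambda s: lstm_last(s, 64)),
    ("Dense(32)", lambda s: dense(s, 32)),
    ("Dense(1)", lambda s: dense(s, 1)),   # assumed sigmoid output layer
]:
    shape, n = layer(shape)
    total += n
    print(f"{name:16s} -> {shape}  params={n}")
print("total params:", total)  # 41537
```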

Highest AUC among all LSTM variants (0.777). Converged at ~27 epochs per fold (range 12–43).

Adam lr=8e-4 · Optimal epochs≈27 (folds: 16/12/43/37/26) · Batch 32

Limitations: No external test set — all figures from cross-validation on the same Chinese cohort (n=722). Class imbalance: 585 severe vs 137 mild. Dataset sourced from a single institution and has not been validated on Western or multi-site cohorts. Deep learning models are sensitive to random seed and training variance; results may differ slightly on re-training.