PenuX — Model Comparison

Performance Summary

All models evaluated with 5-fold Stratified Cross-Validation | Ground truth: SAP label (Atlanta 2012) | n=722 (585 severe / 137 mild)

🤖 Classical Machine Learning

Logistic Regression

0.817

AUC (5-fold CV)

F1=0.907 Sens=93.8% T=0.575

Random Forest

0.877

AUC (5-fold CV) ★ Best ML

F1=0.917 Sens=96.8% T=0.535

Gradient Boosting

0.874

AUC (5-fold CV)

F1=0.918 Sens=97.1% T=0.350

🧠 Deep Learning (TensorFlow / Keras)

MLP (3-layer)

0.836

AUC (5-fold CV)

F1=0.909 Sens=96.9% T=0.282 Epochs≈11 LR=1e-3

Residual MLP

0.804

AUC (5-fold CV)

F1=0.912 Sens=97.8% T=0.203 Epochs≈10 LR=8e-4

Attention MLP

0.784

AUC (5-fold CV)

F1=0.909 Sens=98.3% T=0.418 Epochs≈7 LR=1e-3

🔁 LSTM / Recurrent Deep Learning

LSTM

0.699

AUC (5-fold CV)

F1=0.895 Sens=100% T=0.300 Epochs≈19 LR=8e-4

Stacked LSTM

0.705

AUC (5-fold CV)

F1=0.898 Sens=98.3% T=0.489 Epochs≈17 LR=5e-4

Bidirectional LSTM

0.715

AUC (5-fold CV)

F1=0.898 Sens=99.5% T=0.318 Epochs≈22 LR=8e-4

LSTM + Attention

0.684

AUC (5-fold CV)

F1=0.896 Sens=100% T=0.172 Epochs≈27 LR=8e-4

CNN-LSTM ★ Best LSTM

0.777

AUC (5-fold CV)

F1=0.897 Sens=100% T=0.082 Epochs≈27 LR=8e-4

Full Comparison Table

Model	Type	AUC	F1	Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV
Logistic Regression	ML	0.817	0.907	0.575	549	77	36	60	93.8%	43.8%	87.7%
Random Forest ★	ML	0.877	0.917	0.535	566	84	19	53	96.8%	38.7%	87.1%
Gradient Boosting	ML	0.874	0.918	0.350	568	85	17	52	97.1%	38.0%	87.0%
MLP (3-layer)	DL	0.836	0.909	0.282	567	103	18	34	96.9%	24.8%	84.6%
Residual MLP	DL	0.804	0.912	0.203	572	98	13	39	97.8%	28.5%	85.4%
Attention MLP	DL	0.784	0.909	0.418	575	105	10	32	98.3%	23.4%	84.6%
LSTM	LSTM	0.699	0.895	0.300	585	137	0	0	100.0%	0.0%	81.0%
Stacked LSTM	LSTM	0.705	0.898	0.489	575	121	10	16	98.3%	11.7%	82.6%
Bidirectional LSTM	LSTM	0.715	0.898	0.318	582	129	3	8	99.5%	5.8%	81.9%
LSTM + Attention	LSTM	0.684	0.896	0.172	585	136	0	1	100.0%	0.7%	81.1%
CNN-LSTM ★	LSTM	0.777	0.897	0.082	585	135	0	2	100.0%	1.5%	81.2%
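
The rate columns in the table above follow directly from the four confusion counts. As a minimal sketch (the function name `confusion_metrics` is hypothetical, not taken from the project code), the Logistic Regression row can be checked like this:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive sensitivity, specificity, PPV and F1 from raw confusion counts."""
    sensitivity = tp / (tp + fn)          # share of severe cases that were flagged
    specificity = tn / (tn + fp)          # share of mild cases that were cleared
    ppv = tp / (tp + fp)                  # share of flagged cases that were severe
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of PPV and sensitivity
    return sensitivity, specificity, ppv, f1


# Logistic Regression row: TP=549, FP=77, FN=36, TN=60
sens, spec, ppv, f1 = confusion_metrics(549, 77, 36, 60)
print(f"Sens={sens:.1%} Spec={spec:.1%} PPV={ppv:.1%} F1={f1:.3f}")
# prints: Sens=93.8% Spec=43.8% PPV=87.7% F1=0.907
```

The sketch does not guard against empty denominators, which is fine for the rows above because every model predicts at least one positive.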

Confusion Matrices — Threshold Sweep

Each table below shows TP/FP/FN/TN for one model across thresholds 0.10–0.80, with ★ marking that model's selected threshold | Ground truth: 1=Severe SAP, 0=Mild SAP

⚠️ Research only. All figures from 5-fold cross-validation on the same cohort — no external validation set.

Logistic Regression

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	582	126	3	11	99.5%	8.0%	82.2%	0.900
0.20	574	118	11	19	98.1%	13.9%	82.9%	0.899
0.30	572	107	13	30	97.8%	21.9%	84.2%	0.905
0.40	563	101	22	36	96.2%	26.3%	84.8%	0.902
0.50	556	90	29	47	95.0%	34.3%	86.1%	0.903
0.575 ★	549	77	36	60	93.8%	43.8%	87.7%	0.907
0.60	542	76	43	61	92.6%	44.5%	87.7%	0.901
0.70	516	64	69	73	88.2%	53.3%	89.0%	0.886
0.80	467	44	118	93	79.8%	67.9%	91.4%	0.852

Random Forest

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	585	137	0	0	100.0%	0.0%	81.0%	0.895
0.20	585	137	0	0	100.0%	0.0%	81.0%	0.895
0.30	584	124	1	13	99.8%	9.5%	82.5%	0.903
0.40	582	110	3	27	99.5%	19.7%	84.1%	0.912
0.50	571	94	14	43	97.6%	31.4%	85.9%	0.914
0.535 ★	566	84	19	53	96.8%	38.7%	87.1%	0.917
0.60	547	67	38	70	93.5%	51.1%	89.1%	0.912
0.70	520	44	65	93	88.9%	67.9%	92.2%	0.905
0.80	455	22	130	115	77.8%	83.9%	95.4%	0.857

Gradient Boosting

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	583	115	2	22	99.7%	16.1%	83.5%	0.909
0.20	576	106	9	31	98.5%	22.6%	84.5%	0.909
0.30	570	91	15	46	97.4%	33.6%	86.2%	0.915
0.350 ★	568	85	17	52	97.1%	38.0%	87.0%	0.918
0.40	561	81	24	56	95.9%	40.9%	87.4%	0.914
0.50	550	71	35	66	94.0%	48.2%	88.6%	0.912
0.60	537	63	48	74	91.8%	54.0%	89.5%	0.906
0.70	523	51	62	86	89.4%	62.8%	91.1%	0.903
0.80	492	39	93	98	84.1%	71.5%	92.7%	0.882

MLP (3-layer)

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	584	125	1	12	99.8%	8.8%	82.4%	0.903
0.20	578	116	7	21	98.8%	15.3%	83.3%	0.904
0.281 ★	576	106	9	31	98.5%	22.6%	84.5%	0.909
0.30	570	105	15	32	97.4%	23.4%	84.4%	0.905
0.40	557	96	28	41	95.2%	29.9%	85.3%	0.900
0.50	537	81	48	56	91.8%	40.9%	86.9%	0.893
0.60	510	61	75	76	87.2%	55.5%	89.3%	0.882
0.70	471	43	114	94	80.5%	68.6%	91.6%	0.857
0.80	423	30	162	107	72.3%	78.1%	93.4%	0.815

Residual MLP

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	578	127	7	10	98.8%	7.3%	82.0%	0.896
0.20	571	114	14	23	97.6%	16.8%	83.4%	0.899
0.276 ★	567	106	18	31	96.9%	22.6%	84.3%	0.901
0.30	564	105	21	32	96.4%	23.4%	84.3%	0.900
0.40	549	94	36	43	93.8%	31.4%	85.4%	0.894
0.50	535	83	50	54	91.5%	39.4%	86.6%	0.889
0.60	505	72	80	65	86.3%	47.4%	87.5%	0.869
0.70	475	53	110	84	81.2%	61.3%	90.0%	0.854
0.80	414	31	171	106	70.8%	77.4%	93.0%	0.804

Attention MLP

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	584	131	1	6	99.8%	4.4%	81.7%	0.898
0.20	579	125	6	12	99.0%	8.8%	82.2%	0.898
0.30	575	123	10	14	98.3%	10.2%	82.4%	0.896
0.40	574	109	11	28	98.1%	20.4%	84.0%	0.905
0.50	572	101	13	36	97.8%	26.3%	85.0%	0.909
0.602 ★	566	85	19	52	96.8%	38.0%	86.9%	0.916
0.70	540	66	45	71	92.3%	51.8%	89.1%	0.907
0.80	472	49	113	88	80.7%	64.2%	90.6%	0.854

LSTM

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	585	137	0	0	100.0%	0.0%	81.0%	0.895
0.20	585	137	0	0	100.0%	0.0%	81.0%	0.895
0.300 ★	585	137	0	0	100.0%	0.0%	81.0%	0.895
0.40	580	132	5	5	99.1%	3.6%	81.5%	0.894
0.50	569	122	16	15	97.3%	10.9%	82.3%	0.892
0.60	545	115	40	22	93.2%	16.1%	82.6%	0.876
0.70	498	82	87	55	85.1%	40.1%	85.9%	0.855
0.80	390	47	195	90	66.7%	65.7%	89.2%	0.763

Stacked LSTM

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	585	137	0	0	100.0%	0.0%	81.0%	0.895
0.20	585	137	0	0	100.0%	0.0%	81.0%	0.895
0.30	582	137	3	0	99.5%	0.0%	80.9%	0.893
0.40	578	127	7	10	98.8%	7.3%	82.0%	0.896
0.489 ★	575	121	10	16	98.3%	11.7%	82.6%	0.898
0.50	573	121	12	16	97.9%	11.7%	82.6%	0.896
0.60	564	111	21	26	96.4%	19.0%	83.6%	0.895
0.70	539	102	46	35	92.1%	25.5%	84.1%	0.879
0.80	395	51	190	86	67.5%	62.8%	88.6%	0.766

Bidirectional LSTM

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	582	133	3	4	99.5%	2.9%	81.4%	0.895
0.20	582	131	3	6	99.5%	4.4%	81.6%	0.897
0.30	582	129	3	8	99.5%	5.8%	81.9%	0.898
0.318 ★	582	129	3	8	99.5%	5.8%	81.9%	0.898
0.40	580	127	5	10	99.1%	7.3%	82.0%	0.898
0.50	576	125	9	12	98.5%	8.8%	82.2%	0.896
0.60	556	106	29	31	95.0%	22.6%	84.0%	0.892
0.70	324	43	261	94	55.4%	68.6%	88.3%	0.681
0.80	211	22	374	115	36.1%	83.9%	90.6%	0.516

LSTM + Attention

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.10	585	137	0	0	100.0%	0.0%	81.0%	0.895
0.172 ★	585	136	0	1	100.0%	0.7%	81.1%	0.896
0.20	584	136	1	1	99.8%	0.7%	81.1%	0.895
0.30	582	136	3	1	99.5%	0.7%	81.1%	0.893
0.40	581	135	4	2	99.3%	1.5%	81.1%	0.893
0.50	576	130	9	7	98.5%	5.1%	81.6%	0.892
0.60	536	111	49	26	91.6%	19.0%	82.8%	0.870
0.70	479	89	106	48	81.9%	35.0%	84.3%	0.831
0.80	409	56	176	81	69.9%	59.1%	88.0%	0.779

CNN-LSTM

Threshold	TP	FP	FN	TN	Sensitivity	Specificity	PPV	F1
0.082 ★	585	135	0	2	100.0%	1.5%	81.2%	0.897
0.10	582	134	3	3	99.5%	2.2%	81.3%	0.895
0.20	573	125	12	12	97.9%	8.8%	82.1%	0.893
0.30	563	120	22	17	96.2%	12.4%	82.4%	0.888
0.40	554	113	31	24	94.7%	17.5%	83.1%	0.885
0.50	542	100	43	37	92.6%	27.0%	84.4%	0.883
0.60	527	78	58	59	90.1%	43.1%	87.1%	0.886
0.70	498	61	87	76	85.1%	55.5%	89.1%	0.871
0.80	446	50	139	87	76.2%	63.5%	89.9%	0.825
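
A sweep like the tables above can be produced from out-of-fold probabilities by thresholding and counting. The sketch below is illustrative only: the function name `sweep`, the `>=` comparison, and the synthetic probabilities are assumptions, not the study's code or data.

```python
import numpy as np


def sweep(y_true, y_prob, thresholds):
    """Return one (threshold, TP, FP, FN, TN, F1) row per threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob)
    rows = []
    for t in thresholds:
        pred = y_prob >= t                      # assumed convention: >= counts as severe
        tp = int(np.sum(pred & y_true))
        fp = int(np.sum(pred & ~y_true))
        fn = int(np.sum(~pred & y_true))
        tn = int(np.sum(~pred & ~y_true))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        rows.append((float(t), tp, fp, fn, tn, f1))
    return rows


# Synthetic stand-in with the cohort's class balance (585 severe / 137 mild).
rng = np.random.default_rng(0)
y = np.r_[np.ones(585, dtype=int), np.zeros(137, dtype=int)]
p = np.clip(rng.normal(0.45 + 0.3 * y, 0.2), 0.0, 1.0)

rows = sweep(y, p, np.arange(0.10, 0.81, 0.10))
best = max(rows, key=lambda r: r[5])            # row with the highest F1
print(f"best threshold on synthetic data: {best[0]:.2f}")
```

Evaluating a finer grid than 0.10 steps would be needed to land on starred values such as 0.575 or 0.082.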

Algorithm Descriptions

Classical Machine Learning

Logistic Regression

Linear model with L2 regularisation (C=0.5). Features scaled with StandardScaler. Optimal threshold 0.575 selected by maximum F1.

Key features: Lymphocytes, Calcium, Creatinine, Lactate, LDH. Fully interpretable: because the features are standardised, each coefficient is the change in log-odds of the severe class per one standard deviation of that feature.

sklearn LogisticRegression · max_iter=1000 · random_state=42
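
A minimal sketch of this setup, assuming scaling is fitted inside each training fold and that folds are shuffled with a fixed seed (neither is stated above). The data is a synthetic stand-in with the cohort's size and the 106-feature width implied by the Attention MLP description, not the study data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 722 samples, 106 features, roughly 81% positives.
X, y = make_classification(n_samples=722, n_features=106, n_informative=10,
                           weights=[0.19, 0.81], random_state=42)

# Putting the scaler in the pipeline keeps validation folds out of its statistics.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=0.5, max_iter=1000, random_state=42),  # L2 is the default penalty
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
oof_prob = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
auc = roc_auc_score(y, oof_prob)   # pooled out-of-fold AUC
print(f"synthetic-data AUC: {auc:.3f}")
```

Whether the reported 0.817 is a pooled out-of-fold AUC, as here, or a mean of per-fold AUCs is not specified, so the two may differ slightly.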

Random Forest ★ Best AUC

Ensemble of 200 decision trees (max_depth=6, min_samples_leaf=5). Averages predicted probabilities across trees. Highest overall AUC (0.877).

Robust to missing values; no normalisation required. Key features: Calcium, D-dimer, LDH, Lactate, Hematocrit.

sklearn RandomForestClassifier · n_estimators=200 · max_depth=6

Gradient Boosting

150 shallow boosted trees (max_depth=3), learning_rate=0.05. Highest F1 score (0.918). The low optimal threshold (0.35) gives 97.1% sensitivity at 38.0% specificity.

Key features: D-dimer, Calcium, LDH, Lactate, Hematocrit. This is the same top five as Random Forest with the first two swapped, which suggests the feature ranking is stable across tree ensembles.

sklearn GradientBoostingClassifier · n_estimators=150 · lr=0.05
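
One way to quantify agreement between the two tree ensembles is to compare their top-five feature sets. The sketch below assumes impurity-based `feature_importances_` (how the key features were actually ranked is not stated), uses `random_state=42` as an assumed seed, and runs on synthetic data; `top_k` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in with the cohort's shape, not the study data.
X, y = make_classification(n_samples=722, n_features=106, n_informative=10,
                           weights=[0.19, 0.81], random_state=42)

rf = RandomForestClassifier(n_estimators=200, max_depth=6, min_samples_leaf=5,
                            random_state=42).fit(X, y)
gb = GradientBoostingClassifier(n_estimators=150, max_depth=3,
                                learning_rate=0.05, random_state=42).fit(X, y)


def top_k(model, k=5):
    """Column indices of the k largest impurity-based importances."""
    order = np.argsort(model.feature_importances_)[::-1]
    return set(order[:k].tolist())


rf_top, gb_top = top_k(rf), top_k(gb)
overlap = len(rf_top & gb_top) / 5      # 1.0 means identical top-five sets
print(f"top-5 overlap: {overlap:.0%}")
```

Repeating the comparison across cross-validation folds would give a stronger stability check than a single fit.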

Deep Learning (TensorFlow 2 / Keras)

MLP — 3-Layer Feedforward

Architecture: 256 → 128 → 64 → 1 sigmoid. Each hidden layer followed by BatchNormalization and Dropout (0.35/0.30/0.20). L2 regularisation on dense layers.

Early stopping on val_AUC (patience=8) converged at ~11 epochs per fold (range 8–16). AUC 0.836.

Adam lr=1e-3 · Optimal epochs≈11 (folds: 13/11/8/9/16) · Batch 32 · 5-fold CV
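
The architecture and stopping rule above map onto Keras roughly as follows. This is a sketch under assumptions: ReLU activations, an L2 strength of 1e-4, Dense→BN→Dropout ordering, and `restore_best_weights=True` are not given in the description, and `build_mlp` is a hypothetical name.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers


def build_mlp(n_features=106, l2_strength=1e-4):
    """256 -> 128 -> 64 -> 1 sigmoid; each hidden layer gets BN and Dropout."""
    stack = [tf.keras.Input(shape=(n_features,))]
    for units, rate in [(256, 0.35), (128, 0.30), (64, 0.20)]:
        stack += [
            layers.Dense(units, activation="relu",
                         kernel_regularizer=regularizers.l2(l2_strength)),
            layers.BatchNormalization(),
            layers.Dropout(rate),
        ]
    stack.append(layers.Dense(1, activation="sigmoid"))
    model = tf.keras.Sequential(stack)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model


# The metric is named "auc", so the validation metric is logged as "val_auc".
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_auc", mode="max", patience=8, restore_best_weights=True)

model = build_mlp()
# Per fold (X_tr, y_tr, X_va, y_va are placeholders for that fold's split):
# model.fit(X_tr, y_tr, validation_data=(X_va, y_va),
#           epochs=100, batch_size=32, callbacks=[early_stop])
```

The 100-epoch cap is an arbitrary upper bound; with patience 8, training would normally stop well before it.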

Residual MLP

Projection layer (128) followed by two residual blocks: Dense→BN→Dropout→Dense with skip connection. Lower LR (8e-4) chosen to stabilise skip-connection learning.

Early stopping converged at ~10 epochs per fold (range 4–17). The wider spread across folds may reflect sensitivity of the residual blocks to initialisation. AUC 0.804.

Adam lr=8e-4 · Optimal epochs≈10 (folds: 10/12/7/4/17) · Batch 32 · 5-fold CV
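
A residual block of the kind described can be sketched with the Keras functional API. The block width (128, matching the projection), the dropout rate of 0.2, the ReLU activations, and the activation applied after the skip addition are all assumptions; `residual_block` is a hypothetical helper.

```python
import tensorflow as tf
from tensorflow.keras import layers


def residual_block(x, units=128, rate=0.2):
    """Dense -> BN -> Dropout -> Dense, then add the block input back."""
    h = layers.Dense(units, activation="relu")(x)
    h = layers.BatchNormalization()(h)
    h = layers.Dropout(rate)(h)
    h = layers.Dense(units)(h)
    # The skip connection needs x and h to have the same width.
    return layers.Activation("relu")(layers.Add()([x, h]))


inputs = tf.keras.Input(shape=(106,))
x = layers.Dense(128, activation="relu")(inputs)   # projection to the block width
x = residual_block(x)
x = residual_block(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

res_mlp = tf.keras.Model(inputs, outputs)
res_mlp.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=8e-4),
                loss="binary_crossentropy",
                metrics=[tf.keras.metrics.AUC(name="auc")])
```

The projection layer exists so that the 106-wide input can be added to 128-wide block outputs without a shape mismatch.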

Attention MLP

Feature-wise sigmoid attention gate (Dense 106→106) scales each input before the downstream MLP (256→128→64→1). The gate weights are inspectable per sample.

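Fastest convergence: early stopping fired at ~7 epochs per fold (range 2–14). This may indicate that the gate learns a feature weighting quickly, although with the lowest AUC of the three MLPs (0.784) early overfitting cannot be ruled out.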

Adam lr=1e-3 · Optimal epochs≈7 (folds: 9/2/6/5/14) · Batch 32 · 5-fold CV
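
The feature gate and its per-sample inspection could look like the sketch below. BatchNormalization, Dropout and regularisation are omitted for brevity, ReLU is assumed for the hidden layers, and the names `feature_gate` and `gate_model` are hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(106,))
gate = layers.Dense(106, activation="sigmoid", name="feature_gate")(inputs)
x = layers.Multiply()([inputs, gate])        # scale each input by its gate value
for units in (256, 128, 64):
    x = layers.Dense(units, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
attn_mlp = tf.keras.Model(inputs, outputs)

# A second model sharing the same layers exposes the gate for any sample.
gate_model = tf.keras.Model(inputs, gate)
sample = np.random.default_rng(0).normal(size=(3, 106)).astype("float32")
gate_values = gate_model.predict(sample, verbose=0)   # one weight per feature per sample
print(gate_values.shape)
```

Because the gate depends on the input, its weights differ from sample to sample, which is what makes them inspectable per patient rather than only globally.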

LSTM / Recurrent Deep Learning

LSTM

Vanilla single-layer LSTM (64 units) followed by Dropout(0.3) and a Dense(32,relu) head. Features reshaped to (106,1) so each lab value is treated as one time step.

Early stopping on val_AUC converged at ~19 epochs per fold (range 10–33). AUC 0.699.

Adam lr=8e-4 · Optimal epochs≈19 (folds: 19/10/33/25/10) · Batch 32
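
The reshape step is what turns a flat row of lab values into a sequence. A minimal sketch, assuming a final one-unit sigmoid output after the Dense(32) head (the output layer is not spelled out above):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

lstm = tf.keras.Sequential([
    tf.keras.Input(shape=(106,)),
    layers.Reshape((106, 1)),                 # 106 "time steps", one value per step
    layers.LSTM(64),                          # returns only the final hidden state
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
lstm.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=8e-4),
             loss="binary_crossentropy",
             metrics=[tf.keras.metrics.AUC(name="auc")])

probs = lstm.predict(np.zeros((2, 106), dtype="float32"), verbose=0)
print(probs.shape)
```

Note that the step order here is simply the column order of the feature table, so the model's behaviour depends on an ordering that carries no clinical time information.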

Stacked LSTM

Two-layer stacked LSTM: LSTM(64, return_sequences) → Dropout(0.25) → LSTM(32) → Dropout(0.25) → Dense(32). Lower LR (5e-4) chosen to stabilise deeper gradient flow.

Converged at ~17 epochs per fold (range 4–25). AUC 0.705.

Adam lr=5e-4 · Optimal epochs≈17 (folds: 16/22/25/16/4) · Batch 32
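
Stacking requires the first LSTM to emit its full state sequence so the second has something to read. A sketch of the layer chain above, assuming ReLU on the Dense(32) layer and a one-unit sigmoid output:

```python
import tensorflow as tf
from tensorflow.keras import layers

stacked = tf.keras.Sequential([
    tf.keras.Input(shape=(106, 1)),
    layers.LSTM(64, return_sequences=True),   # emits all 106 states for the next LSTM
    layers.Dropout(0.25),
    layers.LSTM(32),                          # emits only its final state
    layers.Dropout(0.25),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
stacked.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
                loss="binary_crossentropy",
                metrics=[tf.keras.metrics.AUC(name="auc")])
```

Without `return_sequences=True` on the first layer, the second LSTM would receive a 2-D tensor and Keras would reject the model at build time.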

Bidirectional LSTM

BiLSTM(64) processes the 106-length sequence in both forward and backward directions, concatenating the outputs (128-dim), followed by BatchNormalization and Dropout(0.3).

Highest AUC among the LSTM variants without a convolutional front end (0.715). Converged at ~22 epochs per fold (range 6–43). The high fold-to-fold variance is likely due to the small dataset.

Adam lr=8e-4 · Optimal epochs≈22 (folds: 40/6/43/12/7) · Batch 32
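
The 128-dimensional output comes from concatenating the final forward and backward states. A sketch assuming a one-unit sigmoid output directly after the Dropout layer (any intermediate head is not described above):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(106, 1))
# merge_mode="concat" joins the 64-dim forward and 64-dim backward states.
merged = layers.Bidirectional(layers.LSTM(64), merge_mode="concat")(inputs)
x = layers.BatchNormalization()(merged)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

bilstm = tf.keras.Model(inputs, outputs)
bilstm.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=8e-4),
               loss="binary_crossentropy",
               metrics=[tf.keras.metrics.AUC(name="auc")])
print(merged.shape[-1])   # width of the concatenated state
```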

LSTM + Attention

Bahdanau-style additive attention: LSTM(64, return_sequences=True) produces (106,64) states; Dense(1,tanh) scores each step; Softmax(axis=1) normalises; weighted sum collapses to context vector.

Converged at ~27 epochs per fold (range 14–53). AUC 0.684.

Adam lr=8e-4 · Optimal epochs≈27 (folds: 16/19/53/14/31) · Batch 32
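
The scoring, normalisation and weighted-sum steps can be sketched with standard Keras layers. Using `Dot(axes=1)` for the weighted sum and attaching a one-unit sigmoid output to the context vector are assumptions; `weight_model` is a hypothetical helper for inspecting the attention weights.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(106, 1))
states = layers.LSTM(64, return_sequences=True)(inputs)   # one 64-dim state per step
scores = layers.Dense(1, activation="tanh")(states)       # one score per step
weights = layers.Softmax(axis=1)(scores)                  # normalise across the 106 steps
context = layers.Dot(axes=1)([weights, states])           # weighted sum over steps
context = layers.Flatten()(context)                       # 64-dim context vector
outputs = layers.Dense(1, activation="sigmoid")(context)
attn_lstm = tf.keras.Model(inputs, outputs)

# Expose the attention weights to see which steps the model emphasises.
weight_model = tf.keras.Model(inputs, weights)
batch = np.random.default_rng(0).normal(size=(2, 106, 1)).astype("float32")
w = weight_model.predict(batch, verbose=0)
print(w.shape, w.sum(axis=1).ravel())
```

This follows the simplified scoring described above (a single Dense(1, tanh) per state); the original additive formulation applies a learned projection vector after the tanh.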

CNN-LSTM ★ Best Sequence Model

Conv1D(32,k=5,same) extracts local feature patterns → MaxPool1D(2) halves the sequence (106 → 53 steps) → Conv1D(64,k=3,same) abstracts higher-level motifs → LSTM(64) reads the pooled sequence in order → Dropout(0.3) → Dense(32,relu).

Highest AUC among all LSTM variants (0.777). Converged at ~27 epochs per fold (range 12–43).

Adam lr=8e-4 · Optimal epochs≈27 (folds: 16/12/43/37/26) · Batch 32
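
The layer chain above can be sketched in the functional API, which also makes the intermediate sequence length easy to check. ReLU on the convolutions and a one-unit sigmoid output are assumptions not stated in the description.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(106, 1))
x = layers.Conv1D(32, kernel_size=5, padding="same", activation="relu")(inputs)
pooled = layers.MaxPooling1D(pool_size=2)(x)       # sequence length 106 -> 53
x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(pooled)
x = layers.LSTM(64)(x)                             # reads the 53 pooled steps in order
x = layers.Dropout(0.3)(x)
x = layers.Dense(32, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

cnn_lstm = tf.keras.Model(inputs, outputs)
cnn_lstm.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=8e-4),
                 loss="binary_crossentropy",
                 metrics=[tf.keras.metrics.AUC(name="auc")])
print(pooled.shape)
```

Halving the sequence before the recurrent layer means the LSTM unrolls over 53 steps instead of 106, which shortens the gradient path compared with the plain LSTM variants.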

Limitations: No external test set — all figures from cross-validation on the same Chinese cohort (n=722). Class imbalance: 585 severe vs 137 mild. Dataset sourced from a single institution and has not been validated on Western or multi-site cohorts. Deep learning models are sensitive to random seed and training variance; results may differ slightly on re-training.

11-Model ML, Deep Learning & LSTM Comparison

ROC Curves

Logistic Regression — AUC 0.817

Random Forest — AUC 0.877 ★

Gradient Boosting — AUC 0.874

MLP (3-layer) — AUC 0.836 · lr=1e-3 · ~11 epochs

Residual MLP — AUC 0.804 · lr=8e-4 · ~10 epochs

Attention MLP — AUC 0.784 · lr=1e-3 · ~7 epochs

LSTM — AUC 0.699 · lr=8e-4 · ~19 epochs

Stacked LSTM — AUC 0.705 · lr=5e-4 · ~17 epochs

Bidirectional LSTM — AUC 0.715 · lr=8e-4 · ~22 epochs

LSTM + Attention — AUC 0.684 · lr=8e-4 · ~27 epochs

CNN-LSTM ★ — AUC 0.777 · lr=8e-4 · ~27 epochs
