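Respiratory Research: Development and multi-database validation of interpretable machine learning models for predicting in-hospital mortality in pneumonia patients: a comprehensive analysis across four healthcare systems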

Abstract

Background: Existing machine learning studies of pneumonia mortality prediction are limited by small sample sizes, single-center designs, and a lack of comprehensive external validation across diverse healthcare systems. No previous study has systematically validated machine learning models for pneumonia mortality prediction across multiple large-scale databases.

Methods: This retrospective multicenter study used four large-scale databases to develop and validate machine learning models for predicting in-hospital mortality in pneumonia patients. MIMIC-IV served as the primary training dataset (9,410 patients), with external validation on MIMIC-III (2,487 patients), eICU (13,541 patients), and an in-house multicenter prospective cohort from Fudan University (345 patients). Five algorithms were implemented: Random Forest, XGBoost, Logistic Regression, LASSO, and Support Vector Machine. Features were selected from 21 candidate variables using the Boruta algorithm. Model interpretability was assessed using SHAP analysis.
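The abstract does not say how Boruta was configured, so the following is only a minimal sketch of the shadow-feature idea behind it, not the authors' pipeline. Boruta proper scores features with random-forest importance and applies a statistical test to the hit counts. Here absolute Pearson correlation is swapped in as the importance measure and the test is omitted, so that the example runs on the standard library alone. The function names, the synthetic `age` and `noise` columns, and the outcome model are all hypothetical.

```python
import random
from statistics import mean


def pearson_abs(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_feature_hits(columns, outcome, n_rounds=50, seed=0):
    """Count, per feature, the rounds in which it beats the best shadow feature."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadow features: each real column shuffled, which breaks any
        # relationship with the outcome while keeping the same distribution.
        shadow_scores = []
        for values in columns.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(pearson_abs(shuffled, outcome))
        threshold = max(shadow_scores)
        for name, values in columns.items():
            # Assumption: correlation stands in for random-forest importance.
            if pearson_abs(values, outcome) > threshold:
                hits[name] += 1
    return hits


# Synthetic illustration only: death probability rises with age, noise is unrelated.
rng = random.Random(42)
n_patients = 400
age = [rng.uniform(20, 90) for _ in range(n_patients)]
noise = [rng.gauss(0, 1) for _ in range(n_patients)]
died = [1 if rng.random() < (a - 20) / 100 + 0.05 else 0 for a in age]

hits = shadow_feature_hits({"age": age, "noise": noise}, died)
print(hits)
```

A feature that beats the best shadow in nearly every round is a candidate to keep, while one that rarely does is a candidate to drop. The full algorithm makes that decision with a significance test rather than by inspection.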

Results: The cohort comprised 25,783 pneumonia patients, with in-hospital mortality ranging from 17.1% to 38.3% across databases. Nine consistently important features were identified: age, diastolic blood pressure, heart rate, temperature, respiratory rate, creatinine, blood urea nitrogen, platelet count, and white blood cell count. XGBoost performed best, with a training AUC of 0.747 (95% CI: 0.733-0.761), a held-out MIMIC-IV test AUC of 0.672, and external validation AUCs of 0.670 (MIMIC-III), 0.695 (eICU), and 0.653 (FAHZU). SHAP analysis identified platelet count as the most influential predictor, followed by blood urea nitrogen and age.
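For readers unfamiliar with how an AUC and its 95% CI can be obtained, here is a minimal standard-library sketch. It computes the AUC as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, then builds a percentile bootstrap interval. The abstract does not state which CI method the authors used, so the bootstrap is an assumption, as are the function names and the synthetic data.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, not from the paper)."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int(alpha / 2 * n_boot)]
    high = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high


# Synthetic illustration only: deaths get somewhat higher risk scores on average.
rng = random.Random(1)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
scores = [rng.gauss(0.5 if y else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"AUC {point:.3f} (95% CI: {low:.3f}-{high:.3f})")
```

The pairwise comparison is quadratic in the number of patients, which is fine for a toy example. At the scale of the databases above, a rank-based computation would be the more practical choice.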

Conclusions: This study represents the first comprehensive multi-database validation of machine learning models for pneumonia mortality prediction, demonstrating superior performance compared to traditional scoring systems. The XGBoost model with SHAP interpretability provides a robust tool for clinical decision support, with consistent validation across four databases including our in-house prospective cohort.

Original article (site video password: 66668888). Author: xujunzju. If you repost this, please credit the source: https://zyicu.cn/?p=21105
