Development of a deep learning model for predicting critical events in a pediatric intensive care unit

Article information

Acute Crit Care. 2024;39(1):186-191

Publication date (electronic) : 2024 February 20

doi : https://doi.org/10.4266/acc.2023.01424

In Kyung Lee¹, Bongjin Lee²,³, June Dong Park²

¹Department of Pediatrics, Seoul St. Mary’s Hospital, Seoul, Korea

²Department of Pediatrics, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea

³Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, Korea

Corresponding author: Bongjin Lee, Department of Pediatrics, Seoul National University Hospital, Seoul National University College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Korea. Tel: +82-2-2072-3682. E-mail: pedbjl@snu.ac.kr

Received 2023 November 3; Revised 2023 December 3; Accepted 2024 January 1.

Abstract

Background:

Identifying critically ill patients at risk of cardiac arrest is important because it offers the opportunity for early intervention and increased survival. The aim of this study was to develop a deep learning model to predict critical events, such as cardiopulmonary resuscitation or mortality.

Methods:

This retrospective observational study was conducted at a tertiary university hospital. All patients younger than 18 years who were admitted to the pediatric intensive care unit from January 2010 to May 2023 were included. The main outcome was prediction performance of the deep learning model at forecasting critical events. Long short-term memory was used as a deep learning algorithm. The five-fold cross validation method was employed for model learning and testing.

Results:

Among the vital sign measurements collected during the study period, 11,660 measurements were used to develop the model after preprocessing; 1,060 of these measurements corresponded to critical events. The model achieved an area under the receiver operating characteristic curve (95% confidence interval) of 0.988 (0.975–1.000) and an area under the precision-recall curve of 0.862 (0.700–1.000).

Conclusions:

The performance of the developed model at predicting critical events was excellent. However, follow-up research is needed for external validation.

Keywords: cardiac arrest; cardiopulmonary resuscitation; machine learning; mortality; pediatric; intensive care unit

INTRODUCTION

Effective prediction and early identification of critical events in pediatric patients within the intensive care setting is a vital aspect of improving patient outcomes. Critical events encompass a range of significant and potentially life-threatening occurrences that require immediate intervention, such as cardiopulmonary resuscitation (CPR) [1], while cardiac arrest specifically refers to cessation of cardiac activity. Both critical events and cardiac arrests pose substantial challenges and carry high risks among children in intensive care units (ICUs) [2,3]. Timely recognition and intervention during these critical events are crucial for enhancing patient survival and minimizing long-term morbidity [4,5].

However, accurately predicting critical events in children is challenging due to variations in vital signs according to age. Existing clinical scoring tools, including the Pediatric Risk of Mortality III (PRISM-III) [6] and the Pediatric Index of Mortality 3 (PIM3) [7], offer severity assessments of pediatric patients primarily at the time of admission. While these clinical scores are effective at estimating severity upon admission, it is also crucial to accurately predict severity during hospitalization. Machine learning models have emerged as valuable tools for this purpose and effectively capture vital sign changes throughout the course of ICU stays. Machine learning, a branch of artificial intelligence, encompasses the creation of computer algorithms designed to learn and make predictions based on data. These algorithms analyze data to discern patterns and relationships, enabling them to make informed decisions when presented with new data [8].

Machine learning models have been extensively explored for their ability to predict mortality in adult ICU settings [9,10], but well-established models tailored specifically to pediatric critical care remain scarce. While some studies have developed models for predicting mortality during pediatric ICU (PICU) stays [11], predicting critical events holds greater importance. Predicting critical events allows for timely interventions, which may further improve patient prognosis. There remains a shortage of machine learning models that distinguish between critical events and mortality in pediatric critical care.

Therefore, the primary aim of our study was to develop a comprehensive and validated machine learning model specifically designed to predict critical events in a tertiary PICU setting. Our model aims to provide precise predictions and early identification of such critical events in pediatric patients. As a secondary objective, we also sought to evaluate the predictive performance of the model.

MATERIALS AND METHODS

Study Setting and Data Source

This retrospective observational study was conducted in a 24-bed PICU at a tertiary children’s hospital. Patients younger than 18 years who were admitted to this PICU between January 2010 and May 2023 were eligible for inclusion. The data used in this study were obtained from the clinical data warehouse of the hospital’s electronic health records. The protocol and contents of this study were reviewed by the Institutional Review Board of Seoul National University Hospital, and exemption from written consent was approved for the participants due to the retrospective nature of this study (No. H-1408-101-605).

Data Collection and Preprocessing

The demographic data of patients (such as age and sex); their vital signs of systolic blood pressure (SBP), diastolic blood pressure (DBP), heart rate (HR), respiratory rate (RR), body temperature (BT), and oxygen saturation (SpO₂) measured during PICU admission; and the measurement times were collected. Additionally, CPR or mortality events and their times of occurrence were collected.

Among the obtained vital signs, those that were non-physiologic and therefore suspected of being a keystroke error during the input process were excluded (SBP >300 mm Hg, DBP > SBP, HR >300 beats/min, and RR >120 breaths/min). For SBP, DBP, HR, and RR, whose normal ranges change with age, the data were analyzed using z-scores according to age rather than their individual values [12,13]. The z-score calculation used distribution curves and charts of vital signs by age that were derived from a previous study [14]. The time interval from the previous measurement of vital signs was defined as the measurement interval. A critical event was defined as CPR or mortality (without CPR). Additionally, the last 20 measurements obtained before a critical event occurred were defined as the critical group; in cases where a critical event did not occur, the last 20 measurements before PICU discharge were defined as the non-critical group. R software, version 4.3.1 (R Foundation for Statistical Computing, https://www.r-project.org), was used in this process, and the open packages of the generalized additive models for location scale and shape and sitar were used in the z-score calculation process [15-17].
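The exclusion and selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R pipeline: the record layout and field names (admission_id, minute, sbp, dbp, hr, rr) are hypothetical, the age-based z-score step is not shown, and the measurement interval is assumed to be computed after implausible rows are dropped, which the text does not specify.

```python
from collections import defaultdict

# Hypothetical record layout: one dict per vital sign measurement.
# Field names are illustrative, not taken from the study's data warehouse.

def is_physiologic(m):
    """Apply the exclusion thresholds stated in the text."""
    return (
        m["sbp"] <= 300           # SBP >300 mm Hg is excluded
        and m["dbp"] <= m["sbp"]  # DBP > SBP is excluded
        and m["hr"] <= 300        # HR >300 beats/min is excluded
        and m["rr"] <= 120        # RR >120 breaths/min is excluded
    )

def last_measurements(records, n_last=20):
    """Per admission: drop implausible rows, add the interval since the
    previous kept measurement, and keep only the final n_last rows."""
    by_admission = defaultdict(list)
    for m in records:
        if is_physiologic(m):
            by_admission[m["admission_id"]].append(m)

    selected = {}
    for admission_id, rows in by_admission.items():
        rows.sort(key=lambda m: m["minute"])
        with_interval = []
        previous_minute = None
        for m in rows:
            # Assumption: the first measurement of an admission has no interval.
            interval = None if previous_minute is None else m["minute"] - previous_minute
            with_interval.append({**m, "interval_min": interval})
            previous_minute = m["minute"]
        selected[admission_id] = with_interval[-n_last:]
    return selected

# Toy data: 25 plausible measurements taken every 120 minutes...
records = [
    {"admission_id": "A", "minute": 120 * i, "sbp": 90, "dbp": 55, "hr": 130, "rr": 30}
    for i in range(25)
]
# ...plus one keystroke-style error (SBP of 900 mm Hg) that should be dropped.
records.append({"admission_id": "A", "minute": 3000, "sbp": 900, "dbp": 55, "hr": 130, "rr": 30})

result = last_measurements(records)
print(len(result["A"]), result["A"][-1]["interval_min"])
```

In this toy example, the erroneous row is removed first, so the 20 retained rows are the final 20 plausible measurements of the admission.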

Outcome

The main outcome of this study was how well the developed prediction model predicted critical events. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to evaluate the model’s predictive performance.
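For readers less familiar with these two metrics, the following sketch computes both from labels and predicted scores using only the Python standard library. It is illustrative only: the text does not state which estimator was used for the AUPRC, so average precision is assumed here as one common choice, and the labels and scores are made-up toy values.

```python
def auroc(labels, scores):
    """Probability that a random positive scores higher than a random
    negative; ties count as one half (Mann-Whitney formulation)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision at each positive, ranked by descending score.
    Assumes there are no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos

# Toy example: 1 = critical, 0 = non-critical (values are made up).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]

print(round(auroc(labels, scores), 3))              # 8 of 9 pairs ranked correctly
print(round(average_precision(labels, scores), 3))  # mean of 1/1, 2/2, 3/4
```

The AUPRC is the more informative of the two when positives are rare, because precision falls quickly once false alarms outnumber true events.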

Model Development and Test

Because critical events are infrequent, the non-critical group was expected to be much larger than the critical group. Because excessively imbalanced datasets can cause bias in model development and performance evaluation, random undersampling was performed to give a ratio of the critical group to the non-critical group of 1:10. Model development and testing were performed on the entire dataset using five-fold cross validation. This approach separates the training dataset from the test dataset and minimizes the distortion of results that can arise from any one particular split. The data were divided into five splits; in each round, the model was trained on four splits and tested on the remaining one, and this was repeated five times so that each split served as the test set exactly once.
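The sampling and splitting scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the split is a plain shuffled split (the text does not say whether folds were stratified by outcome or grouped by patient), and the toy group sizes are far smaller than those in the study.

```python
import random

def undersample(critical, non_critical, ratio=10, seed=0):
    """Keep every critical sample and draw ratio-times as many non-critical
    samples at random, without replacement."""
    rng = random.Random(seed)
    return list(critical), rng.sample(list(non_critical), ratio * len(critical))

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle indices once, then yield (train, test) index lists so that
    every sample appears in exactly one test fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# Toy groups (the study reports 1,060 critical and 10,600 non-critical samples).
critical = [("critical", i) for i in range(6)]
non_critical = [("non-critical", i) for i in range(500)]

kept_critical, kept_non_critical = undersample(critical, non_critical)
dataset = kept_critical + kept_non_critical  # 6 + 60 = 66 samples

splits = list(five_fold_indices(len(dataset)))
for train, test in splits:
    print(len(train), len(test))
```

One design consideration worth noting: when several samples come from the same patient, a plain random split can place that patient in both the training and test folds, so a grouped split is often preferred in practice.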

Long short-term memory (LSTM) was used as the deep learning algorithm, and the z-scores for SBP, DBP, HR, and RR as well as BT, SpO₂, measurement interval, age (in months), and sex composed the input layer. The model had 128 hidden layers and was trained for 1,000 epochs at a learning rate of 0.001. Python 3.8 software was used in the LSTM model training process, and open libraries such as PyTorch and scikit-learn were employed [18,19].
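To make the gating arithmetic of an LSTM concrete, the sketch below implements a single standard LSTM time step in plain Python and runs it over a sequence of nine-feature inputs. It is not the authors' PyTorch model: the hidden size of 4 is chosen only for readability (the text reports 128), the weights are random rather than trained, treating 20 consecutive measurements as one input sequence is an assumption, and the final classification layer is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a single sample.

    params maps each gate name ("i", "f", "g", "o") to a tuple (W, U, b):
    input weights, recurrent weights, and bias.
    """
    def gate(name, activation):
        W, U, b = params[name]
        out = []
        for k in range(len(b)):
            z = b[k]
            z += sum(w * xi for w, xi in zip(W[k], x))
            z += sum(u * hi for u, hi in zip(U[k], h_prev))
            out.append(activation(z))
        return out

    i = gate("i", sigmoid)    # input gate: how much new information to write
    f = gate("f", sigmoid)    # forget gate: how much old cell state to keep
    g = gate("g", math.tanh)  # candidate cell state
    o = gate("o", sigmoid)    # output gate: how much cell state to expose
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def random_params(n_in, n_hidden, rng):
    """Untrained, randomly initialized weights for illustration only."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        name: (matrix(n_hidden, n_in), matrix(n_hidden, n_hidden), [0.0] * n_hidden)
        for name in "ifgo"
    }

N_FEATURES = 9  # z-scores of SBP, DBP, HR, RR; BT; SpO2; interval; age; sex
N_HIDDEN = 4    # illustrative only; the text reports 128
rng = random.Random(0)
params = random_params(N_FEATURES, N_HIDDEN, rng)

# Hypothetical sequence: 20 measurements with 9 features each.
sequence = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(20)]

h = [0.0] * N_HIDDEN
c = [0.0] * N_HIDDEN
for x in sequence:
    h, c = lstm_step(x, h, c, params)

print(len(h), len(c))
```

The cell state c carries information across time steps, which is what lets the network respond to trends in vital signs rather than to a single measurement.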

RESULTS

Baseline Characteristics

From January 2010 to May 2023, 9,161 patients were included in this study, and the number of PICU hospitalizations was 13,185. Among the 13,185 hospitalized cases, the total number of vital signs was 849,334. Because of the imbalance in the dataset, which is illustrated by the number of critical events of 6,816 and the number of non-critical events of 842,518, the vital signs for each group were sorted in order of measurement time; only the last 20 measurements were selected for analysis. After random undersampling at a 1:10 ratio, 10,600 non-critical cases and 1,060 critical cases were derived from the data (Figure 1).

Figure 1.

Participant flowchart. PICU: pediatric intensive care unit.

Among patients with critical events, the median age was 10.5 months (3.0–53.0 months), while those without critical events had a median age of 35.0 months (5.0–92.0 months). Demographic and clinical characteristics, including sex; age; z-scores of SBP, DBP, HR, RR, BT, and SpO₂; and the time interval between vital sign measurements and critical events or discharge, are summarized in Table 1. Of the 1,060 critical cases examined, 44.8% of vital sign measurements were performed within 24 hours before CPR, while 55.2% were obtained within 24 hours prior to patient mortality.

Table 1.

Demographic characteristics of patients in each group and median values of vital signs before the critical event or discharge in each group

Main Outcome

The AUROC and AUPRC values were used to evaluate the performance of the critical event prediction model developed in this study. The AUROC (95% confidence interval [CI]) was excellent at 0.988 (0.975–1.000), and the AUPRC (95% CI) was also very good at 0.862 (0.700–1.000) (Figure 2). Figure 3 shows the loss values during the training process of five folds over 1,000 epochs of training.

Figure 2.

The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) of the long short-term memory network model. Five-fold cross validation was performed on the developed prediction model, and each fold is displayed in a different color. CI: confidence interval.

Figure 3.

Line plots of the loss values over the training epochs. The epoch number was 1,000, and the learning rate was 0.001. Five-fold cross validation was performed on the developed prediction model, and each fold is displayed in a different color.

DISCUSSION

In this study, we developed an LSTM model to predict critical events in the PICU. The model demonstrated very good predictive performance. These results indicate the effectiveness of our model in identifying pediatric patients at risk of critical events, including cardiac arrest. Our deep learning model displayed superior predictive performance compared to previous studies in the PICU setting. For instance, a study that used machine learning to predict cardiac arrest in the PICU achieved a sensitivity of 71% and a specificity of 69% at one hour before the event [20]. Our LSTM model outperformed that study, yielding a higher AUROC value and indicating improved predictive accuracy. Another study focused on time series analysis to predict cardiac arrest in the PICU, where the best model achieved an accuracy of 94% and an AUROC of 98% [21]. However, the model in that study was associated with an unacceptably high false alarm rate, while our model showed a high AUPRC value of 0.862.

While some studies have specifically developed machine learning models for predicting pediatric mortality [11,22], the focus of our model was to predict critical events. This emphasis on critical events is crucial because predicting mortality alone may not fully capture the significance of such events, as it excludes patients who survive after receiving CPR. Furthermore, our model addressed the limitations of existing clinical scores, such as the PRISM-III and PIM3 [6,7], by capturing vital sign changes throughout PICU hospitalization periods. By incorporating changes in vital signs, our model offers a more comprehensive approach to predicting critical events in pediatric patients.

Early identification of patients at risk allows prompt intervention, leading to increased chances of successful resuscitation and reduced morbidity [23,24]. By leveraging the predictive capabilities of our machine learning model, clinicians can implement proactive monitoring and interventions, ultimately improving patient outcomes and resource utilization in the PICU setting. By precisely predicting critical events, our model enables timely interventions, enhanced patient monitoring, and optimized resource allocation. However, the implementation of such a model in real-world clinical practice poses challenges, such as ensuring interpretability and trust in the model’s outputs, as well as integrating it into existing clinical workflows. Future research should focus on strategies to enhance the interpretability and transparency of the model and evaluate its practical implementation in diverse healthcare settings.

It is important to acknowledge the limitations of our study. First, the retrospective, single-center design and the specific patient population of the tertiary PICU may limit the generalizability of our findings to other PICUs. Additionally, our model focused on vital sign data and did not incorporate other potential contributing factors to critical events, such as laboratory results or mental status. The use of random undersampling to address data imbalance may introduce biases, including information loss bias, potential inaccuracies in representing the original data distribution, and possible instability in the model. Future studies should address these limitations by conducting prospective, multi-center investigations and exploring the inclusion of additional clinical variables. Furthermore, implementation of our machine learning model in real-time and in different clinical contexts, such as the emergency department and general inpatient ward, warrants further investigation.

In conclusion, our study demonstrated the potential of machine learning models for effectively predicting critical events in the PICU. By incorporating fluctuations in vital signs throughout the PICU hospitalization stay, our model offers improved predictive accuracy compared to existing clinical scores and previous studies. Future research is warranted to validate the performance of our model in diverse clinical settings and to explore its potential impact on clinical decision-making and patient outcomes.

KEY MESSAGES

▪ Our deep learning model excels at predicting critical events and enhancing early interventions and patient outcomes in the pediatric intensive care unit (PICU).

▪ This research bridges a crucial gap in pediatric critical care, offering a precise tool to distinguish between critical events and mortality, which should ultimately improve patient prognosis.

▪ The clinical impact of our model lies in its potential to revolutionize PICU care by empowering clinicians with accurate predictions and timely interventions.

Notes

CONFLICT OF INTEREST

No potential conflict of interest relevant to this article was reported.

FUNDING

This study was conducted with financial support from Seoul National University Hospital (Grant no. 0420202130).

AUTHOR CONTRIBUTIONS

Conceptualization: BL. Methodology: BL. Formal analysis: IKL, BL. Data curation: IKL, BL. Visualization: BL. Project administration: BL. Funding acquisition: BL. Writing–original draft: IKL. Writing–review & editing: BL, JDP. All authors read and agreed to the published version of the manuscript.

ACKNOWLEDGMENTS

None.

References

1. Buckley TA, Short TG, Rowbottom YM, Oh TE. Critical incident reporting in the intensive care unit. Anaesthesia 1997;52:403–9.

2. Berg RA, Nadkarni VM, Clark AE, Moler F, Meert K, Harrison RE, et al. Incidence and outcomes of cardiopulmonary resuscitation in PICUs. Crit Care Med 2016;44:798–808.

3. Girotra S, Nallamothu BK, Spertus JA, Li Y, Krumholz HM, Chan PS, et al. Trends in survival after in-hospital cardiac arrest. N Engl J Med 2012;367:1912–20.

4. Sandroni C, Nolan J, Cavallaro F, Antonelli M. In-hospital cardiac arrest: incidence, prognosis and possible measures to improve survival. Intensive Care Med 2007;33:237–45.

5. Thorén A, Rawshani A, Herlitz J, Engdahl J, Kahan T, Gustafsson L, et al. ECG-monitoring of in-hospital cardiac arrest and factors associated with survival. Resuscitation 2020;150:130–8.

6. Pollack MM, Patel KM, Ruttimann UE. PRISM III: an updated Pediatric Risk of Mortality score. Crit Care Med 1996;24:743–52.

7. Straney L, Clements A, Parslow RC, Pearson G, Shann F, Alexander J, et al. Paediatric index of mortality 3: an updated model for predicting mortality in pediatric intensive care. Pediatr Crit Care Med 2013;14:673–81.

8. Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to machine learning, neural networks, and deep learning. Transl Vis Sci Technol 2020;9:14.

9. Meiring C, Dixit A, Harris S, MacCallum NS, Brealey DA, Watkinson PJ, et al. Optimal intensive care outcome prediction over time using machine learning. PLoS One 2018;13:e0206862.

10. Delahanty RJ, Kaufman D, Jones SS. Development and evaluation of an automated machine learning algorithm for in-hospital mortality risk adjustment among critical care patients. Crit Care Med 2018;46:e481–8.

11. Lee B, Kim K, Hwang H, Kim YS, Chung EH, Yoon JS, et al. Development of a machine learning model for predicting pediatric mortality in the early stages of intensive care unit admission. Sci Rep 2021;11:1263.

12. Fleming S, Thompson M, Stevens R, Heneghan C, Plüddemann A, Maconochie I, et al. Normal ranges of heart rate and respiratory rate in children from birth to 18 years of age: a systematic review of observational studies. Lancet 2011;377:1011–8.

13. Bae W, Kim K, Lee B. Distribution of pediatric vital signs in the emergency department: a nationwide study. Children (Basel) 2020;7:89.

14. Hwang S, Lee B. Machine learning-based prediction of critical illness in children visiting the emergency department. PLoS One 2022;17:e0264184.

15. Rigby RA, Stasinopoulos DM. Smooth centile curves for skew and kurtotic data modelled using the Box-Cox power exponential distribution. Stat Med 2004;23:3053–76.

16. Rigby RA, Stasinopoulos DM. Automatic smoothing parameter selection in GAMLSS with an application to centile estimation. Stat Methods Med Res 2014;23:318–32.

17. Cole TJ, Donaldson MD, Ben-Shlomo Y. SITAR: a useful instrument for growth curve analysis. Int J Epidemiol 2010;39:1558–66.

18. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. Pytorch: an imperative style, high-performance deep learning library. arXiv [Preprint] 2019;[cited 2024 Jan 15]. Available from: https://doi.org/10.48550/arXiv.1912.01703.

19. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30.

20. Matam BR, Duncan H, Lowe D. Machine learning based framework to predict cardiac arrests in a paediatric intensive care unit: prediction of cardiac arrests. J Clin Monit Comput 2019;33:713–24.

21. Kennedy CE, Aoki N, Mariscalco M, Turley JP. Using time series analysis to predict cardiac arrest in a PICU. Pediatr Crit Care Med 2015;16:e332–9.

22. Aczon MD, Ledbetter DR, Laksana E, Ho LV, Wetzel RC. Continuous prediction of mortality in the PICU: a recurrent neural network model in a single-center dataset. Pediatr Crit Care Med 2021;22:519–29.

23. Rust LO, Gorham TJ, Bambach S, Bode RS, Maa T, Hoffman JM, et al. The deterioration risk index: developing and piloting a machine learning algorithm to reduce pediatric inpatient deterioration. Pediatr Crit Care Med 2023;24:322–33.

24. Dewan M, Soberano B, Sosa T, Zackoff M, Hagedorn P, Brady PW, et al. Assessment of a situation awareness quality improvement intervention to reduce cardiac arrests in the PICU. Pediatr Crit Care Med 2022;23:4–12.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 1.

Demographic characteristics of patients in each group and median values of vital signs before the critical event or discharge in each group

Variable	Non-critical event (n=10,600)	Critical event (n=1,060)
Age (mo)	11 (3 to 53)	35 (5 to 92)
Sex
Female	5,040 (47.5)	380 (35.8)
Male	5,560 (52.5)	680 (64.2)
CPR	0	475 (44.8)
Mortality	0	585 (55.2)
Z-score of SBP by age	–0.3 (–0.7 to 0.2)	–1.0 (–1.5 to –0.4)
Z-score of DBP by age	0 (–0.5 to 0.6)	–0.6 (–1.2 to 0.1)
Z-score of HR by age	–0.7 (–1.3 to –0.3)	–0.3 (–1.0 to 0.3)
Body temperature (°C)	36.9 (36.6 to 37.3)	36.3 (35.8 to 36.8)
SpO₂ (%)	100.0 (97.0 to 100.0)	94.0 (87.0 to 98.0)
Measurement interval (min)	120.0 (120.0 to 120.0)	120.0 (60.0 to 120.0)

Values are presented as median (interquartile range) or number (%).

CPR: cardiopulmonary resuscitation; SBP: systolic blood pressure; DBP: diastolic blood pressure; HR: heart rate; SpO₂: pulse oxygen saturation.