
We demonstrated that "continual learning," a method where AI models are trained sequentially without sharing hospital data, improves the accuracy of endotracheal tube placement detection by overcoming environmental differences between medical sites.
Paper
International Retrospective Observational Study of Continual Learning for AI on Endotracheal Tube Placement from Chest Radiographs
NEJM AI
https://doi.org/10.1056/AIoa2500522
Author's Comments
We were invited by Dr. Rajpurkar of Harvard Medical School to collaborate on this study, participating as the only institution from Japan. I initially did not imagine this would become such a large-scale project involving so many joint researchers, but collaborating with researchers around the world has been a wonderful experience, and I look forward to further cooperation. I am particularly grateful to our collaborators Wendy, Cassandra, and Jennifer for their incredibly smooth communication. We have other collaborations in progress with them, so more results will gradually be released. For our next project, we definitely hope to be involved as lead researchers and drive the study forward.
Paper Overview
Medical artificial intelligence (AI) models often face a challenge: even when they perform well during development, their accuracy can drop once deployed at a new hospital, because factors such as patient demographics and X-ray equipment differ from site to site. In this study, we verified the effectiveness of "continual learning," an approach in which only the AI model travels from hospital to hospital, accumulating training at each site without any hospital's data ever leaving its institution. The task was to determine, from chest X-ray images taken in intensive care units, whether an endotracheal tube is positioned appropriately. The work grew into a large-scale international collaborative study involving 23 hospitals across 12 countries and 5 continents.
Paper Details
This study analyzed chest X-ray images from 2,313 adult patients, primarily acquired in 2021. For validation, we compared three approaches: the "original model" from development, a "fine-tuned model" retrained only on the deployment hospital's data, and a "continual learning model" trained by visiting multiple hospitals in sequence. Accuracy was evaluated as the average error between the tube-tip-to-carina distance predicted by the AI and the distance determined by radiologists.
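The three training regimes can be sketched in code. The toy below is purely illustrative and is not the paper's method: the study used deep networks on chest radiographs, whereas here each "hospital" is a small synthetic dataset with a site-specific shift, and the model is a one-parameter linear predictor updated by gradient descent. It only shows the structural difference between the original model, per-site fine-tuning, and a model that is fine-tuned sequentially across sites while the data stays local.

```python
# Hypothetical sketch of the three regimes: original model, per-site
# fine-tuning, and continual (sequential) learning. Toy data and model;
# the actual study used deep models on chest X-ray images.

def finetune(w, data, lr=0.05, steps=200):
    """Gradient descent on mean squared error for a 1-D linear model y = w*x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mae(w, data):
    """Mean absolute prediction error (analogous to the paper's mm metric)."""
    return sum(abs(w * x - y) for x, y in data) / len(data)

# Each "hospital" holds a small local dataset with a site-specific shift.
hospitals = {
    "A": [(x, 1.0 * x) for x in range(1, 6)],
    "B": [(x, 1.2 * x) for x in range(1, 6)],
    "C": [(x, 1.1 * x) for x in range(1, 6)],
}

# Original model: trained only at the development site A.
w_original = finetune(0.0, hospitals["A"])

# Per-site fine-tuning: restart from the original model at each hospital.
w_finetuned_C = finetune(w_original, hospitals["C"])

# Continual learning: only the model travels A -> B -> C; data stays local.
w_continual = w_original
for site in ["B", "C"]:
    w_continual = finetune(w_continual, hospitals[site])

print(mae(w_original, hospitals["C"]))   # error without adaptation
print(mae(w_continual, hospitals["C"]))  # error after sequential training
```

In this toy setting the continually trained model matches the per-site fine-tuned one; the paper's point is that in practice the sequentially accumulated model generalized better than restarting fine-tuning at each site.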
The original model had an average error of 16.39 mm; the fine-tuned model, adjusted individually at each hospital, improved this to 12.49 mm; and the proposed continual learning model achieved the best accuracy, with an average error of 10.58 mm. Notably, the continual learning model surpassed the conventional approaches even though each hospital contributed only about 50 training images. This suggests that, even with limited data, continual learning enables AI models to adapt to diverse clinical environments and achieve more generalizable performance.