Natural language processing for clinical laboratory data repository systems: implementation and evaluation for respiratory viruses

Background — With the growing volume and complexity of laboratory repositories, it has become tedious to parse unstructured data into structured and tabulated formats for secondary uses such as decision support, quality assurance, and outcome analysis. However, advances in natural language processing (NLP) approaches have enabled efficient and automated extraction of clinically meaningful medical concepts from unstructured reports.

Objective — In this study, we aimed to determine the feasibility of using an NLP model for information extraction as an alternative to a handcrafted rule-based tool, which is time-consuming and operationally resource-intensive to maintain. To that end, we sought to develop and evaluate a deep learning–based NLP model to derive knowledge and extract information from text-based laboratory reports sourced from a provincial laboratory repository system.

Methods — The NLP model, a hierarchical multilabel classifier, was trained on a corpus of laboratory reports covering testing for 14 different respiratory viruses and viral subtypes. The corpus includes 87,500 unique laboratory reports annotated by 8 subject matter experts (SMEs). The classification task involved assigning the laboratory reports to labels at 2 levels: 24 fine-grained labels in level 1 and 6 coarse-grained labels in level 2. Here, a “label” refers to the status of a specific virus or strain being tested or detected (eg, influenza A is detected). The model’s performance stability and variation were analyzed across all labels in the classification task. Additionally, the model’s generalizability was evaluated internally and externally on various test sets.

Results — Overall, the NLP model performed well on internal, out-of-time (pre–COVID-19), and external (different laboratories) test sets with microaveraged F1-scores >94% across all classes. Higher precision and recall scores with less variability were observed for the internal and pre–COVID-19 test sets. As expected, the model’s performance varied across categories and virus types due to the imbalanced nature of the corpus and sample sizes per class. There were intrinsically fewer classes of viruses being detected than those tested; therefore, the model’s performance (lowest F1-score of 57%) was noticeably lower in the detected cases.
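
The gap between a high microaveraged F1-score and a low F1-score on a rare label can be illustrated with a minimal sketch. This is not the study's evaluation code: the label names (`flu_a_tested`, `flu_a_detected`) and the four toy reports are hypothetical, chosen only to show how microaveraging pools counts across labels so that frequent "tested" labels dominate the score.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_counts(y_true: List[Set[str]], y_pred: List[Set[str]]) -> Dict[str, List[int]]:
    """Per-label [tp, fp, fn] counts for a multilabel task."""
    counts: Dict[str, List[int]] = {}
    for truth, pred in zip(y_true, y_pred):
        for label in truth | pred:
            c = counts.setdefault(label, [0, 0, 0])
            if label in truth and label in pred:
                c[0] += 1  # true positive
            elif label in pred:
                c[1] += 1  # false positive
            else:
                c[2] += 1  # false negative
    return counts


def micro_f1(counts: Dict[str, List[int]]) -> float:
    """Pool counts over all labels, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical toy data: every report is "tested", only one is "detected".
y_true = [{"flu_a_tested"}, {"flu_a_tested"}, {"flu_a_tested"},
          {"flu_a_tested", "flu_a_detected"}]
# Hypothetical predictions that miss the single rare "detected" label.
y_pred = [{"flu_a_tested"}, {"flu_a_tested"}, {"flu_a_tested"},
          {"flu_a_tested"}]

counts = label_counts(y_true, y_pred)
per_label = {label: f1(*c) for label, c in counts.items()}
overall = micro_f1(counts)
```

In this toy case the pooled counts are 4 true positives and 1 false negative, so the microaveraged F1-score is 8/9 (about 0.89) even though the F1-score for the rare `flu_a_detected` label is 0. Reporting per-label scores alongside the microaverage, as the study does, makes that kind of imbalance visible.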

Conclusions — We demonstrated that deep learning–based NLP models are promising solutions for information extraction from text-based laboratory reports. These approaches enable scalable, timely, and practical access to high-quality and encoded laboratory data if integrated into laboratory information system repositories.

Information

Citation

Dolatabadi E, Chen B, Buchan SA, Marchand-Austin A, Azimaee M, McGeer AJ, Mubareka S, Kwong JC. JMIR AI. 2023;2:e44835. Epub 2023 Jun 6.

Associated Topics

Data science
