Abstract
Background Lymphangioleiomyomatosis (LAM) is a rare multisystem disease with variable clinical manifestations and differing rates of progression that make management decisions and giving prognostic advice difficult. We used machine learning to identify clusters of associated features which could be used to stratify patients and predict outcomes in individuals.
Patients and methods Using unsupervised machine learning, we generated patient clusters from data on 173 women with LAM from the UK and 186 replication subjects from the US National Heart, Lung, and Blood Institute (NHLBI) LAM registry. Cluster assignments were then examined for association with prospectively recorded outcomes.
Results Two- and three-cluster models were developed. A three-cluster model separated a large group of subjects presenting with dyspnoea or pneumothorax from a second cluster with a high prevalence of angiomyolipoma symptoms (p=0.0001) and tuberous sclerosis complex (TSC) (p=0.041). Patients in the third cluster were older, never presented with dyspnoea or pneumothorax (p=0.0001) and had better lung function. Similar clusters were reproduced in the NHLBI cohort. Assigning patients to clusters predicted prospective outcomes: in a two-cluster model the future risk of pneumothorax was 3.3 (95% CI 1.7–5.6)-fold greater in cluster 1 than cluster 2 (p=0.0002). Using the three-cluster model, the need for intervention for angiomyolipoma was lower in clusters 2 and 3 than cluster 1 (p<0.00001). In the NHLBI cohort, the incidence of death or lung transplant was much lower in clusters 2 and 3 (p=0.0045).
Conclusions Machine learning has identified clinically relevant clusters associated with complications and outcome. Assigning individuals to clusters could improve decision making and prognostic information for patients.
Abstract
Using machine learning, simple clinical information from women with LAM can be used to group individuals into clusters. Clusters have differing clinical features, levels of complications and survival, and may improve personalised care for LAM. https://bit.ly/2UVanYV
Introduction
Lymphangioleiomyomatosis (LAM) is a rare multisystem disease that occurs both sporadically and in those with tuberous sclerosis complex (TSC) [1]. The prevalence of LAM is estimated to be less than 1 per 100 000 women [2], and the diagnosis of an orphan disease is frequently difficult for patients due to feelings of isolation and uncertainty over their prognosis and future disease manifestations [3]. This is particularly true for LAM where both the clinical manifestations and rates of disease progression vary. Although all patients have lung cysts, only 70% have pneumothorax [4, 5]. Half of women with sporadic LAM and almost all with TSC-LAM have angiomyolipomas, a proportion of which enlarge and are at risk of haemorrhage [6]. Around 20% of patients have significant lymphatic disease [7]. Prognosis can be difficult to predict as some patients have well-preserved lung function long term, while others require lung transplantation within a decade of diagnosis.
There are few predictive markers of outcome in LAM. Oestrogen is thought to contribute to disease progression [8–10] and pre-menopausal status is associated with more rapid loss of lung function [10, 11]. High levels of the lymphangiogenic growth factor, vascular endothelial growth factor type D (VEGF-D), and the presence of bronchodilator reversibility are associated with more rapid loss of forced expiratory volume in 1 s (FEV1) in some studies [12, 13], and genetic variants in vitamin D binding protein are associated with shorter survival [14]. Smaller studies have reported other features that are associated with outcome, including mode of presentation and initial lung function, although all of these associations lack predictive power in individual subjects [15, 16]. Uncertainty around disease progression and complications can worry patients, and lead to restrictive lifestyle changes and an unselective approach to management, with many patients given unnecessarily pessimistic advice [17, 18].
We hypothesised that groups of clinical features preferentially cluster together, and that identifying these associations would improve prediction of complications and outcomes. We used machine learning to associate biological and physiological variables in two national cohorts with the aim of identifying subphenotypes within the LAM population that could be used to predict disease manifestations and improve clinical advice.
Methods
The clinical cohorts, variables and analysis are described fully in the supplementary material.
Subjects and clinical data
The discovery cohort comprised 173 women recruited at the UK National Centre for LAM (Nottingham, UK) between 2011 and 2018 (figure 1). All subjects had LAM defined by American Thoracic Society/Japanese Respiratory Society criteria [19]. A further 10 women were recruited after the discovery analysis, up to December 2019. All patients attending the Centre were invited to participate and measurements were made as part of clinical care. At their first visit, which formed the baseline assessment, subjects had computed tomography (CT) of the chest, abdomen and pelvis, screening for TSC, lung function tests, bronchodilator reversibility testing and a 6-min walk test according to European Respiratory Society/American Thoracic Society standards [20]. CT was used to screen for angiomyolipoma and lymphatic disease, the latter defined as the presence of lymphatic enlargement, chylous pleural effusion or ascites. Review appointments were scheduled according to clinical need and at least annually; complications were recorded, FEV1 and transfer factor of the lung for carbon monoxide (TLCO) were repeated, and angiomyolipoma size was monitored according to a defined protocol [21]. The East Midlands Research Ethics Committee approved the study (REC13/EM/0264) and participants gave written informed consent.
The replication cohort comprised 186 subjects recruited between 1998 and 2003 to the US National Heart, Lung, and Blood Institute (NHLBI) Registry study on the natural history of LAM (figure 1) [7]. Clinical and serial lung function data were obtained from the National Disease Research Interchange (Philadelphia, PA, USA). All-cause mortality and lung transplantation data for the period until December 2010, prior to the use of rapamycin, were obtained from the US National Death Index and the United Network for Organ Sharing databases, respectively.
Cluster assignment was performed using data from the baseline visit (table 1) and outcomes assessed prospectively from this point. Survival is quoted as overall time since diagnosis. Change in lung function was calculated as the slope of all FEV1 (ΔFEV1) or TLCO (ΔTLCO) values [22].
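As an illustration of the slope calculation, the sketch below assumes an ordinary least-squares fit of each subject's values against time since the first measurement; reference [22] should be consulted for the exact method used. The function name `lung_function_slope` and its input format are hypothetical, and the same function applies unchanged to TLCO.

```python
def lung_function_slope(times_years, values):
    """Ordinary least-squares slope of a lung function measure against time.

    times_years: time of each measurement in years since baseline
    values: FEV1 (mL) or TLCO at each time point
    Returns the change per year (e.g. mL/year for FEV1).
    """
    if len(times_years) != len(values) or len(values) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(values)
    t_mean = sum(times_years) / n
    v_mean = sum(values) / n
    # Sum of squared time deviations and time-value cross-products
    stt = sum((t - t_mean) ** 2 for t in times_years)
    if stt == 0:
        raise ValueError("measurements must span more than one time point")
    stv = sum((t - t_mean) * (v - v_mean) for t, v in zip(times_years, values))
    return stv / stt


# Usage with invented values: FEV1 falling by 50 mL each year
print(lung_function_slope([0, 1, 2], [2000, 1950, 1900]))  # -50.0
```

Using every available measurement, rather than only the first and last, makes the estimate less sensitive to a single noisy visit.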
Machine learning methodology
The workflow is summarised in figure 2 and described in detail in the supplementary material. Briefly, the dataset was pre-processed, cleaned and checked for validity. Imputation of missing data was performed using multiple imputation by chained equations (MICE), random forest (RF) and MICE+RF. Cluster analysis using multiple algorithms was repeated five times to ensure cluster stability and 42 internal cluster validation schemes were applied to determine the optimal number of clusters. We identified the smallest number of variables necessary to classify women with LAM into clusters based on feature selection schemes including recursive feature elimination, correlation-based feature detection, maximum relevance minimum redundancy and bivariate statistical tests. Five classification algorithms, i.e. RF, decision tree (CART, C4.5 and C5.0) and naive Bayes, were used to develop models for classifying subjects into clusters. Five-fold cross-validation repeated for 10 runs was used when identifying markers and developing classification models. The analysis was carried out using R (www.r-project.org). The clustering algorithms are available at www.github.com/nmpn/lam-stratification.
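The resampling scheme described above (five-fold cross-validation repeated for 10 runs) can be sketched as follows. This Python version is an illustration only, not the authors' R code from the GitHub repository; the function name is hypothetical, subjects are reshuffled for each run, and folds are not stratified by cluster, which a real analysis might prefer.

```python
import random


def repeated_kfold_indices(n_samples, k=5, repeats=10, seed=0):
    """Yield (run, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for run in range(repeats):
        rng.shuffle(indices)  # new random partition for each run
        # Deal shuffled subjects into k folds of near-equal size
        folds = [indices[i::k] for i in range(k)]
        for fold, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield run, fold, sorted(train), sorted(test)


# Usage: fold sizes for 173 subjects in the first run
sizes = [len(test) for run, _, _, test in repeated_kfold_indices(173) if run == 0]
print(sizes)  # [35, 35, 35, 34, 34]
```

Each subject is held out exactly once per run, so a classifier's accuracy can be averaged over 50 train/test splits.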
Statistical analysis
Data were tested for normality using the Shapiro–Wilk test. Parametric data were analysed using the unpaired two-tailed t-test or one-way ANOVA and nonparametric data were analysed using Kruskal–Wallis or Mann–Whitney tests. Categorical data were analysed by the Chi-squared test or Fisher's exact test. Kolmogorov–Smirnov tests were used to determine whether two datasets had different distributions. Survival analysis was performed using Kaplan–Meier analysis and the Mantel–Cox test. Data were analysed in Excel (Microsoft, Redmond, WA, USA) and Prism version 7.03 (GraphPad, San Diego, CA, USA).
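For readers less familiar with the Kaplan–Meier method, the sketch below computes the product-limit estimate from follow-up times and event indicators, with censored subjects leaving the risk set after any events at the same time. It is a generic Python illustration with a hypothetical function name, not the software used in the paper; 1 − S(t) gives the estimated cumulative risk of the end-point by time t.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time at which an event occurs.

    times: follow-up time for each subject
    events: 1 if the end-point occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all subjects (events and censorings) sharing this time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # everyone observed at t leaves the risk set
    return curve


# Usage with invented data: five subjects, two censored
print([(t, round(s, 3)) for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])])
# [(1, 0.8), (2, 0.6), (3, 0.3)]
```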
Results
Cluster model development
Complete demographic, presentation and phenotype data were available for all discovery cohort subjects, and treatment, disease activity and oestrogen exposure data for >90%. Serum VEGF-D and bronchodilator response data were available for 74% and 61% of subjects, respectively (table 1). The distributions of variables with missing values imputed using MICE, RF and MICE+RF did not differ from the original distributions; MICE-imputed data were used (supplementary figure S1).
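One way to check that imputation has not distorted a variable, consistent with the Kolmogorov–Smirnov testing listed in the statistical methods, is to compare the observed values with the values after imputation. The sketch below computes only the two-sample statistic D, the largest vertical gap between the two empirical distribution functions; a p-value would normally come from a library routine such as SciPy's `scipy.stats.ks_2samp`. Which values were compared in the paper is not stated, so this pairing and the function name are assumptions.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The supremum of the gap between two step functions is reached at a data point
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Usage with invented values: observed vs imputed
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```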
Two clusters provided optimal separation of factors between groups by majority voting (figure 2 and supplementary table S1), and three clusters also proved clinically useful. Of the five machine learning techniques evaluated using five-fold cross-validation repeated 10 times, naive Bayes delivered the highest accuracy (0.98, 95% CI 0.9502–0.9964), sensitivity (1.0) and specificity (0.96) for cluster assignment, and was used henceforth (supplementary table S2 and figure S2). Three classification models were developed: two with two clusters and one with three clusters. The initial two-cluster model was based on multiple clustering algorithms, with variables based on feature selection techniques. The alternative two-cluster model used multiple clustering algorithms, with variables based on statistical tests. While both models produced similar groupings, the latter separated subjects using fewer terms, was more effective at predicting complications and is reported henceforth. The three-cluster model was based on hierarchical and k-means clustering, with variables selected using statistical tests comparing clusters. Subjects were assigned to the cluster for which the output probability was between 0.5 and 1.0.
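Two steps in this paragraph can be made concrete: choosing the number of clusters by majority vote across internal validation indices, and assigning a subject from the classifier's output probabilities. In the sketch below the function and index names are placeholders, ties between k values are broken towards the smaller k, and a subject whose highest probability is below 0.5 is left unassigned; all three are assumptions rather than details given in the text.

```python
from collections import Counter


def majority_k(votes):
    """Return the number of clusters recommended by most validation indices.

    votes: dict mapping a validation-index name to its preferred k.
    Ties are broken in favour of the smaller k (an assumption).
    """
    counts = Counter(votes.values())
    best = max(counts.values())
    return min(k for k, c in counts.items() if c == best)


def assign_cluster(probs, threshold=0.5):
    """Return the 1-based cluster whose output probability is >= threshold.

    Returns None when no cluster reaches the threshold, which can happen
    with three or more clusters.
    """
    cluster, p = max(enumerate(probs, start=1), key=lambda kv: kv[1])
    return cluster if p >= threshold else None


# Usage with invented votes and classifier probabilities
print(majority_k({"index_a": 2, "index_b": 3, "index_c": 2}))  # 2
print(assign_cluster([0.05, 0.15, 0.80]))  # 3
```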
Two-cluster model
Thirteen input variables divided subjects into clusters comprising 51% and 49% of the discovery cohort (table 2). The most informative factors discriminating clusters were age at first LAM symptom (p=7.6×10⁻⁷), age at assessment (p=4×10⁻¹⁴), presentation with dyspnoea (p=0.00001), pneumothorax (p=0.00001), angiomyolipoma (p=0.00001) or as a chance finding (p=0.00001), ever experiencing pneumothorax (p=0.00001) or angiomyolipoma (p=0.00017) and baseline TLCO (p=0.0097) (supplementary figure S3). Cluster 1 comprised younger women with earlier-onset disease, predominantly presenting with pneumothorax or angiomyolipoma that had often required intervention, whereas lymphatic manifestations were uncommon. Subjects in cluster 2 were on average 10 years older, tended to present with dyspnoea, and had more lymphatic complications and larger defects in gas transfer (lower TLCO and post-walk arterial oxygen saturation). Pneumothorax was infrequent and, although many had angiomyolipomas, these seldom required intervention (table 2, supplementary tables S3 and S4, and supplementary figure S4).
Three-cluster model
In the three-cluster system, cluster 1 comprised 69% of subjects, who were most likely to present with dyspnoea or pneumothorax and had moderately impaired lung function. Cluster 2 comprised 22% of subjects, who very commonly presented with angiomyolipoma-related problems rather than respiratory symptoms, and who had a higher prevalence of TSC and better lung function than cluster 1. Cluster 3 comprised only 9% of subjects, who were older at presentation and had more recent symptom onset; they presented either with respiratory symptoms other than breathlessness or pneumothorax, or without LAM symptoms after investigation of other problems. In this cluster, pneumothorax was very infrequent and lung function almost normal (table 3, figure 3, supplementary figure S3, and supplementary tables S5 and S6).
Cluster validation
To determine whether these clusters could be reproduced in other populations, we used subjects recruited in a different country and time period from the discovery cohort. The NHLBI cohort was slightly younger and had better lung function than the UK cohort, and angiomyolipoma was less common, although other clinical characteristics were similar. In this cohort, age at diagnosis was used in place of age at first symptom. Applying the algorithm without imputation of missing data reproduced both models with a similar level of differentiation, other than for angiomyolipoma (figure 4, and supplementary tables S7 and S8).
The effect of missing data on cluster assignment was examined by running the clustering algorithm with single factors omitted. Cluster assignments from the three-cluster model in the 112 UK subjects for whom all factors were available were compared with the assignments obtained after sequential removal of each factor. Omission of factors resulted in misclassification of a median (range) of 0.7% (0–7.1%) of subjects in cluster 1, 5.4% (0–38%) of subjects in cluster 2 and 8.3% (0–17%) of subjects in cluster 3. The chance of misclassification was greater where the original clustering probability was closer to 0.5 than to 1.0, and with omission of the factors contributing most to cluster separation, such as age at first symptom (figure 4, and supplementary figures S5 and S6).
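The factor-omission analysis reduces to comparing two sets of assignments for the same subjects. The sketch below computes, for each original cluster, the percentage of subjects whose assignment changes when a factor is dropped, then summarises these percentages across factors as median (range), the form in which the results above are reported. It assumes the assignments have already been produced; the function names and the factor names in the usage are hypothetical.

```python
from statistics import median


def misclassification_by_cluster(reference, perturbed):
    """Percentage of subjects in each reference cluster whose label changes."""
    if len(reference) != len(perturbed):
        raise ValueError("assignment lists must be the same length")
    totals, changed = {}, {}
    for ref, new in zip(reference, perturbed):
        totals[ref] = totals.get(ref, 0) + 1
        changed[ref] = changed.get(ref, 0) + (new != ref)
    return {c: 100.0 * changed[c] / totals[c] for c in sorted(totals)}


def summarise_omissions(reference, runs):
    """Median, minimum and maximum misclassification per cluster across factors.

    runs: dict mapping each omitted factor to the assignments obtained without it
    """
    per_factor = [misclassification_by_cluster(reference, p) for p in runs.values()]
    summary = {}
    for c in per_factor[0]:
        rates = [r[c] for r in per_factor]
        summary[c] = (median(rates), min(rates), max(rates))
    return summary


# Usage with invented assignments for eight subjects
reference = [1, 1, 1, 1, 2, 2, 3, 3]
runs = {"factor_a": [1, 1, 1, 2, 2, 2, 3, 1], "factor_b": [1, 1, 1, 1, 2, 2, 3, 3]}
print(summarise_omissions(reference, runs))
# {1: (12.5, 0.0, 25.0), 2: (0.0, 0.0, 0.0), 3: (25.0, 0.0, 50.0)}
```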
Association of clusters with clinical outcomes
To determine if the models could be used to predict outcomes, we examined lung function decline and disease-related complications prospectively from the point of cluster assignment. Survival was assessed from diagnosis. As rapamycin reduces lung function decline, rapamycin-treated and untreated subjects were examined separately. Serial lung function data spanning a mean±sd of 54±36 and 38±17 months were available for 112 UK and 174 NHLBI subjects, respectively, who had not received rapamycin, and for 81 UK subjects treated with rapamycin for 45±30 months. There were no significant differences between clusters in rate of loss of FEV1 or TLCO using either model for untreated or rapamycin-treated subjects (figure 5a and b, and supplementary tables S9 and S10).
UK subjects were screened for angiomyolipoma at baseline and tumours were monitored using a standardised protocol [21]. Risk of angiomyolipoma intervention was examined irrespective of treatment with rapamycin. Using the two-cluster model, risk of intervention was 0.059 per patient-year after assignment to cluster 1 and 0.025 per patient-year after assignment to cluster 2 (p<0.00001). In the three-cluster model, despite a high prevalence of angiomyolipoma in clusters 2 and 3, the need for interventions in clusters 2 and 3 was significantly lower than in cluster 1 (p<0.00001) (supplementary table S11).
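A risk per patient-year is the number of events divided by the total follow-up time contributed by a group. The sketch below computes such a rate and, as one conventional option, an incidence rate ratio with a Wald confidence interval on the log scale. The paper does not state which test produced its p-value, the function names are hypothetical, and the counts in the usage line are invented.

```python
import math
from statistics import NormalDist


def incidence_rate(events, person_years):
    """Events per patient-year of follow-up."""
    return events / person_years


def rate_ratio(e1, py1, e2, py2, level=0.95):
    """Incidence rate ratio (group 1 vs group 2) with a log-scale Wald interval."""
    if min(e1, e2) == 0:
        raise ValueError("Wald interval is undefined with zero events")
    rr = incidence_rate(e1, py1) / incidence_rate(e2, py2)
    se = math.sqrt(1 / e1 + 1 / e2)  # standard error of log(rate ratio)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Usage with invented counts: 10 events in 100 patient-years vs 5 in 100
print(rate_ratio(10, 100, 5, 100)[0])  # 2.0
```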
Future risk of pneumothorax was greatest in cluster 1 using both models in both cohorts (supplementary figure S7). The two-cluster model had the best predictive power, where combining all subjects showed the risk of pneumothorax was 3.3 (95% CI 1.7–5.6)-fold greater in cluster 1 than cluster 2 (p=0.0002) (figure 5c).
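The log-rank (Mantel–Cox) test named in the statistical methods compares observed with expected events in each group at every event time. The sketch below is a minimal two-group version with a hypothetical function name; it also returns the (O1/E1)/(O2/E2) ratio, one common approximation to a hazard ratio. The paper does not say how its 3.3-fold figure or its confidence interval were derived, so this should not be read as the authors' calculation.

```python
import math
from statistics import NormalDist


def logrank(times, events, groups):
    """Two-group log-rank test.

    groups: 1 or 2 for each subject. Returns (chi2, p, hr), where hr is the
    (O1/E1)/(O2/E2) approximation to the hazard ratio of group 1 vs group 2.
    """
    data = sorted(zip(times, events, groups))
    n1 = sum(1 for g in groups if g == 1)
    n2 = len(groups) - n1
    o1 = e1 = var = 0.0
    o_total = 0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = r = r1 = 0
        # Count events and subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            _, ev, g = data[i]
            d += ev
            r += 1
            if g == 1:
                d1 += ev
                r1 += 1
            i += 1
        n = n1 + n2
        if d:
            o1 += d1
            e1 += d * n1 / n  # expected events in group 1 under the null
            if n > 1:
                var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
        o_total += d
        n1 -= r1
        n2 -= r - r1
    chi2 = (o1 - e1) ** 2 / var
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))  # chi-squared, 1 df
    o2, e2 = o_total - o1, o_total - e1
    hr = (o1 / e1) / (o2 / e2)
    return chi2, p, hr


# Usage with invented data: group 1 has its events earlier than group 2
chi2, p, hr = logrank([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 1, 2, 2, 2])
print(round(hr, 2))  # 4.22
```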
Survival and transplant data were available for 166 patients in the NHLBI cohort. Over a mean follow-up of 14 years from cluster assignment and up to 33 years from diagnosis, 38 patients had required lung transplantation and 14 had died. Time to the combined end-point of death or transplant was similar between the two clusters of the two-cluster model (supplementary figure S8). In the three-cluster model the incidence of death or transplant was 41.7% in cluster 1, 0% in cluster 2 and 4.2% in cluster 3 (p=0.0045) (figure 5d and supplementary table S12).
Discussion
By applying machine learning to carefully characterised clinical cohorts we have identified groups of related factors which are together associated with outcomes in women with LAM. While clinicians, and indeed patients, have recognised some associations between disease-related manifestations, our data for the first time allow us to quantify the risk of complications, improve prognostic advice and work toward stratified care. Separation into three clusters identifies a large cluster tending to present with pneumothorax or dyspnoea. Women in the second cluster are on average 5 years younger with a high prevalence of angiomyolipoma symptoms and TSC. Women in cluster 3, while comprising only 9% of subjects, presented 10–15 years later than those in clusters 1 and 2 with nonclassical or no symptoms, did not experience pneumothorax and tended to have almost normal lung function. Cluster 1 represents the classic description of women with LAM, presenting in their mid-30s with dyspnoea or pneumothorax and airflow obstruction. Cluster 2, where angiomyolipoma haemorrhage or TSC is the first clue to the presence of LAM and respiratory disease, is less severe. The third cluster represents an increasingly recognised group with milder disease who present at an older age with nonclassical symptoms, including haemoptysis and cough, or without LAM symptoms. We consider our findings are widely applicable and robust as we were able to independently replicate clusters, and although accuracy was reduced somewhat by missing data, the factors required for clustering are available in routine practice. Factors less commonly measured and requiring imputation in the initial analysis, including exertional hypoxaemia, bronchodilator reversibility and VEGF-D, were not required for clustering.
The importance of our findings lies in the differences in clinical manifestations, complications and outcomes between clusters. Women with LAM present at varying ages with different symptoms, lung function and menopausal status. Current guidelines do not give guidance on risk of complications or survival and patients with markedly differing disease may receive similar clinical advice [18, 19, 23]. Applying the methodology described here could allow clinical advice and decision making to be improved. Those assigned to clusters 2 and 3 presenting in their 50s or later could be reassured that their lifespan is unlikely to be shortened by LAM. The risk of pneumothorax is a common concern [17] and applying the two-cluster model can better quantify this risk, with individuals in cluster 1 having a 10% 1-year and 43% 5-year risk of pneumothorax compared with 0% and 15%, respectively, in cluster 2. Such data could be used to improve both patient advice and inform discussions on the need for preventative surgery. Despite a higher prevalence of angiomyolipoma in clusters 2 and 3, the risk of an intervention during follow-up is lower than in cluster 1 and the need for surveillance may be less in these groups. This reflects the differing natural history of angiomyolipoma across the clusters, with cluster 2 and to a lesser extent cluster 3 more likely to present with angiomyolipoma and need intervention than cluster 1, meaning enlarging and symptomatic tumours have already been treated. The absence of presentation with angiomyolipoma symptoms in cluster 1, despite an angiomyolipoma prevalence approaching 50%, suggests that angiomyolipoma is often overlooked in this group and makes intervention more likely in these newly identified tumours.
The use of unsupervised machine learning shows which variables are important in phenotyping subjects and which may underlie, or report on, disease progression. Input variables were chosen for their potential relevance to LAM based on disease manifestations and previous literature. These features included mode and age of presentation, existing clinical manifestations and their severity, oestrogen exposure, and pattern of lung physiology. The strongest factors separating clusters were age at first symptom and age at time of assessment. We are unable to say whether clusters represent discrete endotypes: clusters may reflect differences in disease activity, with lead-time bias separating subjects presenting earlier due to pneumothorax or angiomyolipoma from those presenting later with dyspnoea. However, as the rate of FEV1 decline, the best-documented marker of disease activity [9, 10, 24], is similar in all clusters, and clusters have separate disease manifestations suggesting differences in organ involvement, it seems likely that the clusters represent discrete endotypes. In either case, assigning women with LAM to these clusters may be clinically useful. The molecular and cellular processes underlying differences between clusters are not clear, and further work examining biomarkers and histological features within the clusters is required [25–27]. This initial study shows that machine learning can be applied to the relatively small datasets provided by rare lung diseases using only basic clinical data. Improvements in imaging and biomarker development mean that these variables could be factored into future models, which may further improve predictive accuracy.
Our findings are based on two of the largest and best-characterised cohorts of women with LAM reported, yet despite using unbiased methodology the study has some limitations. Cluster 3 in both cohorts comprised a relatively small number of subjects and may be subject to some inbuilt survivor bias. Some variables require further assessment; for example, pre-menopausal status has been associated with accelerated loss of lung function. Menopausal status was not a strong differentiator between clusters, and rates of decline of FEV1 and TLCO were similar between clusters despite differing proportions of pre-menopausal women. However, age was a strong determinant of cluster assignment and, as age and menopausal status are related, menopausal status may still contribute to some of these differences and should continue to be a factor in clinical decisions. Due to differences in data recording between the UK and USA we were unable to replicate all variables, particularly for angiomyolipoma. Since the NHLBI cohort closed, rapamycin has become the standard of care for those with progressive disease [23] and has improved outcomes. How rapamycin affects different clusters and how clustering may inform the decision to use rapamycin should be studied prospectively, including using data from the ongoing Multicenter Interventional Lymphangioleiomyomatosis Early Disease Trial (ClinicalTrials.gov: NCT03150914). Our study was not designed to predict the need for therapy; however, it could be argued that those in cluster 1 should already be considered for early treatment with mammalian target of rapamycin inhibitors to prevent further loss of lung function.
In conclusion, we have used machine learning techniques to stratify women with LAM into clusters using simple clinical data. The method has the potential to improve advice on disease trajectory, complications and screening. Further prospective studies are warranted to determine if this can be translated to improve management for women with LAM.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material ERJ-03036-2020.SUPPLEMENT
Acknowledgements
We are grateful to the original NHLBI cohort investigators, the women with LAM who contributed to the study and Anne Tattersfield (Emeritus Professor of Respiratory Medicine, University of Nottingham, Nottingham, UK) for critical reading of the manuscript.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Data sharing: De-identified participant data for the US National Heart, Lung, and Blood Institute cohort is available from the National Disease Research Interchange (https://ndriresource.org) on request according to their terms. Individual-level data from the current UK cohort, even when anonymised, are potentially recognisable within the community and are not being made available. Code to run the clustering protocols is available on GitHub (www.github.com/nmpn/lam-stratification) without restriction.
Author contributions: S. Chernbumroong and J.M. Garibaldi performed the machine learning analysis. J. Johnson extracted clinical data. S. Miller performed laboratory analyses. F.X. McCormack and N. Gupta analysed and provided the NHLBI survival data. S.R. Johnson conceived the study, obtained the funding, saw the UK patients, performed data analysis and wrote the manuscript. All authors contributed to the final manuscript.
Conflict of interest: S. Chernbumroong has nothing to disclose.
Conflict of interest: J. Johnson has nothing to disclose.
Conflict of interest: N. Gupta has nothing to disclose.
Conflict of interest: S. Miller reports grants from British Lung Foundation, outside the submitted work.
Conflict of interest: F.X. McCormack has nothing to disclose.
Conflict of interest: J.M. Garibaldi has nothing to disclose.
Conflict of interest: S.R. Johnson reports grants from the National Institute for Health Research, The LAM Foundation and LAM Action, during the conduct of the study; personal fees for advisory board work from Pfizer, outside the submitted work.
Support statement: The study was funded by the Nottingham Medical Research Council/Engineering and Physical Sciences Research Council Molecular Pathology Node and the National Institute for Health Research Rare Disease Translational Research Consortium. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received August 5, 2020.
- Accepted November 17, 2020.
- Copyright ©ERS 2021. For reproduction rights and permissions contact permissions@ersnet.org