Abstract
Multimorbidity frequently affects the ageing population, and the co-existence of diseases may not occur at random. Understanding the interactions among diseases, and between diseases and clinical variables, could be important for disease screening and management.
In a cohort of 1969 chronic obstructive pulmonary disease (COPD) patients and 316 non-COPD controls, we applied a network-based analysis to explore the associations between multiple comorbidities. Clinical characteristics (age, degree of obstruction, walking, dyspnoea, body mass index) and 79 comorbidities were identified and their interrelationships quantified. Using network visualisation software, we represented each clinical variable and comorbidity as a node with linkages representing statistically significant associations.
The resulting COPD comorbidity network had 428, 357 or 265 linkages depending on the statistical threshold used (p≤0.01, p≤0.001 or p≤0.0001). There were more nodes and links in COPD compared with controls after adjusting for age, sex and number of subjects. In COPD, a subset of nodes had a larger number of linkages representing hubs. Four sub-networks or modules were identified using an inter-linkage affinity algorithm and their display provided meaningful interactions not discernible by univariate analysis.
COPD patients are affected by a larger number of interlinked morbidities, whose clustering pattern may suggest common pathobiological processes or be utilised for screening and/or therapeutic interventions.
COPD patients are affected by interlinked comorbidities forming structured networks http://ow.ly/MT4XT
Introduction
Increased life expectancy has resulted in an ageing population suffering from one or more chronic, non-communicable diseases [1]. Sixty per cent of individuals aged ≥65 years have multiple medical conditions, referred to as multimorbidity [2], and the presence of these conditions is thought to reflect a combination of individual susceptibility to each condition, cumulative noxious exposures [3] and the effect of ageing. The extent to which these factors interact likely explains some of the inter-subject variability in disease burden, but this process is poorly understood. Applying the classic epidemiological approach to the study of non-communicable diseases has its limitations, because the success of this model depends on a short interval from exposure to disease manifestation and a small number of exposures, vectors or interactions. A more integrated multidimensional approach is needed if we want to further understand the complex relationships of chronic multimorbidity.
Chronic obstructive pulmonary disease (COPD) is one of the most prevalent chronic diseases worldwide [4]. It is a complex multicomponent disease better characterised when using, in addition to the degree of airway obstruction, extra-pulmonary clinical variables like body mass index (BMI), exercise capacity, dyspnoea, smoking status, exacerbations and age captured in the validated multidimensional indices like BODE (BMI, airflow obstruction, dyspnoea and exercise capacity) [5], ADO (age, dyspnoea and airflow obstruction) [6] and DOSE (dyspnoea, airflow obstruction, smoking, exacerbation) [7]. COPD patients suffer from a high proportion of comorbid conditions [8–11] and up to two-thirds of individuals suffering from COPD die of non-pulmonary causes [12, 13]. Comorbidities also influence important patient-centred outcomes, such as health status [14, 15], response to therapeutic intervention [16], and frequent hospital readmissions [17, 18]. The majority of studies of COPD comorbidities report the relationship between a single or small number of comorbidities [19–22]. The inclusion of multiple comorbidities provides a broader view of the complex interaction amongst them that may reveal previously unappreciated associations.
In a previous report, our group described the presence of 79 different comorbidities in the BODE COPD cohort. While the prevalence of these conditions ranged from 0.1 to 52%, only 12 comorbidities conferred an increased risk of death when co-occurring with COPD [13]. We propose, however, that the true clinical significance of the other 67 comorbidities may not be adequately assessed by standard regression analysis. In addition, prior reports suggest that comorbidities cluster differently among clinical, demographic and anthropometric characteristics (included in the BODE, ADO and DOSE indices); therefore, exploring the interactions between comorbidities and those variables could provide a unique opportunity to better understand COPD complexity beyond a single disease perspective [22–25].
Network science has emerged as a field focussed on the understanding of complex systems by mapping the interconnectivity of such data as objects, persons, proteins, mobile phones or diseases [26–29]. Network graphics are composed of individual components called nodes and a grid of interconnecting edges representing some type of association between nodes. These interconnected nodes can then be readily visualised, revealing the structural basis of the system as a hierarchical, random or scale-free network. The structure of the network helps to identify highly connected individual nodes and specific communities of nodes called modules, with great potential applicability in medicine [27]. We hypothesised that such a network-based approach would provide a better understanding of the interaction among comorbidities, clinical and demographic variables in patients with COPD, and we compared the results with a non-COPD control group.
Methods
The cohort
The BODE registry is an observational prospective multicentre cohort of COPD patients attending outpatient pulmonary clinics at one of the six BODE study centres (Tampa, FL and Boston, MA (USA) and Pamplona, Tenerife, Las Palmas de Gran Canarias and Zaragoza (Spain)). In brief, COPD was defined by a history of smoking and spirometric measures of lung function, which followed American Thoracic Society/European Respiratory Society standards [30]. We compared our COPD cohort with non-COPD controls recruited from the same centres during the same period. Non-COPD controls were smoking and non-smoking individuals with no evidence of airway obstruction on spirometry who volunteered to participate in our cohort. All participants were in clinically stable condition and receiving standard therapy when appropriate. Participants were excluded if they had an illness that was likely to result in death within a year, were unable to take the lung-function and 6-min walk tests, or had any condition that could unacceptably increase the risk of performing any of the testing. Between November 1997 and June 2013, a total of 1969 COPD patients and 316 non-COPD controls were enrolled, and data collected at the initial visit were used for this analysis. The ethics committee at each of the participating centres approved the study and all patients signed informed consent prior to enrolment.
Comorbidities ascertainment
All comorbidities were systematically recorded by means of patient interview, including those diseases listed in the Charlson Comorbidity Index [31]. Comorbidities not reported by the patient but documented in their medical records were also included. The presence of each comorbidity was then confirmed by the investigators at each site through a detailed review of available medical records and of medication or therapy specific to any disease.
Clinical variables
Demographics, anthropometrics (height and weight), pulmonary function testing, 6-min walk test distance (6MWD) and Modified Medical Research Council (MMRC) dyspnoea score were collected at the initial visit. We chose the following 10 relevant clinical variables included in the multidimensional indices of COPD [5–7] to explore the associations with the 79 comorbidities in the COPD cohort. The selected cut-offs are based on previously published studies: 1) age categories: a) participants 55 years old or younger, b) participants between the ages of 56 and 64 years, and c) those 65 years or older [32]; 2) BMI: a) underweight, BMI ≤21 kg·m−2 or b) obese, BMI ≥30 kg·m−2 [23, 33]; 3) symptoms: MMRC score ≥2 points [34]; 4) obstruction: a) forced expiratory volume in 1 s (FEV1) ≤50% predicted or b) FEV1 60–80% predicted (mild obstruction) [35]; 5) exercise capacity: 6MWD ≤350 m [36]; and 6) smoking status: current smokers [7].
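As an illustration of how these cut-offs translate into binary variables, the following Python sketch maps one participant's measurements to the 10 indicators. The function name, argument names and dictionary keys are hypothetical and are not taken from the study; age is assumed to be recorded in whole years and FEV1 as per cent of predicted.

```python
def clinical_flags(age, bmi, mmrc, fev1_pct, walk_m, current_smoker):
    """Return the 10 binary clinical variables for one participant.

    All names are illustrative. Assumptions: age in whole years,
    BMI in kg/m^2, FEV1 in % predicted, 6-min walk distance in metres.
    """
    return {
        "age_le_55": age <= 55,
        "age_56_64": 56 <= age <= 64,
        "age_ge_65": age >= 65,
        "bmi_le_21": bmi <= 21,             # underweight
        "bmi_ge_30": bmi >= 30,             # obese
        "mmrc_ge_2": mmrc >= 2,             # symptoms
        "fev1_le_50": fev1_pct <= 50,       # obstruction, category a
        "fev1_60_80": 60 <= fev1_pct <= 80, # obstruction, category b
        "walk_le_350": walk_m <= 350,       # exercise capacity
        "current_smoker": bool(current_smoker),
    }
```

Each key then becomes one node in the network alongside the 79 comorbidities, with the value indicating whether the participant carries that characteristic.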
Analysis
We identified a total of 79 distinct comorbidities and compared, using Fisher's exact test, their cumulative number and prevalence between the COPD and non-COPD subjects, correcting for sex and age. We determined the correlation between all comorbidities and the 10 clinical variables by calculating Pearson's correlation for binary variables (Φ), as described in the study by Hidalgo et al. [26]. To correct for family-wise error rate due to multiple comparisons, a correlation was considered significant if the p-value was ≤0.01. Additional analyses were performed with more stringent thresholds at p≤0.001 and p≤0.0001.
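For readers unfamiliar with Φ, a minimal Python sketch of the calculation for two binary variables follows. The function names are illustrative. The p-value shown uses the chi-square approximation (χ² = nΦ², one degree of freedom), which is an assumption made here for illustration; the text above does not state which test produced the reported p-values.

```python
import math

def phi_coefficient(x, y):
    """Pearson's correlation (phi) between two equal-length 0/1 sequences."""
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One variable is constant, so phi is undefined; returning 0.0
        # is a convention chosen for this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom

def chi_square_p_value(phi, n):
    """Two-sided p-value from chi2 = n * phi^2 with 1 degree of freedom.

    Assumption: chi-square approximation, not necessarily the test
    used in the study.
    """
    return math.erfc(math.sqrt(n * phi * phi / 2.0))
```

A pair of nodes would then be linked when the p-value falls at or below the chosen threshold (0.01, 0.001 or 0.0001), with Φ retained as the edge weight.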
The network graph was constructed with the 79 comorbidities and, for the identification of network modules, we also included the 10 clinical variables. Each comorbidity and clinical variable is represented in the graph by a specific node with two attributes: the diameter of the node is proportional to the prevalence of the comorbidity or clinical variable, and the colour represents the organ system to which the comorbidity belongs (cardiovascular, pulmonary, gastrointestinal, etc.). Links or edges between nodes represent statistically significant associations. The thickness of an edge represents the strength of the association (Pearson's Φ). We used the Gephi graph visualisation and manipulation software (version 0.8.2 beta) [37] to create the network graphics. We calculated the degree (number of edges) of each node and plotted the degree distribution in order to classify the network as either random (most nodes have a similar number of edges, following a Poisson distribution) or scale-free (few nodes are highly connected and the distribution follows a one-sided heavy tail) [27]. We determined whether there was a relationship between disease prevalence and the number of links (degree) and identified those nodes representing hubs. Finally, we searched for the existence of modules, represented by highly interlinked topological clusters in the network, using the computational algorithm proposed by Blondel et al. [38] included in the Gephi statistical module. This community detection algorithm is based on a network property called modularity. Modularity is the fraction of the edges that fall within the given groups of nodes minus the expected fraction if edges were distributed at random. The value of the modularity lies in the range −1 to 1. If it is positive, the number of edges within groups exceeds the number expected on the basis of chance.
Furthermore, the algorithm used in our analysis factors in the weight of the links (in our case, the strength of association between comorbidities, or Φ) [39]. All analyses were performed using SAS JMP Pro software, version 11.0 (SAS Institute).
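The modularity definition above can be made concrete with a short Python sketch that evaluates modularity (Q) for a given assignment of nodes to modules. It does not reproduce the Blondel et al. optimisation implemented in Gephi; it only scores a partition that is already known. The function name and the toy graph are illustrative assumptions, and edge weights are assumed to be non-negative (for example, the absolute value of Φ).

```python
from collections import defaultdict

def modularity(edges, community):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: iterable of (node_u, node_v, weight), no self-loops assumed.
    community: dict mapping each node to its module label.
    """
    total = 0.0                    # m: summed weight of all edges
    within = defaultdict(float)    # edge weight inside each module
    strength = defaultdict(float)  # summed weighted degree per module
    for u, v, w in edges:
        total += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            within[community[u]] += w
    if total == 0:
        return 0.0
    # Observed within-module fraction minus the fraction expected
    # if edges were placed at random with the same node strengths.
    return sum(within[c] / total - (strength[c] / (2.0 * total)) ** 2
               for c in strength)

# Toy example: two triangles joined by a single edge, unit weights.
toy_edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
             ("c", "d", 1)]
toy_modules = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
toy_q = modularity(toy_edges, toy_modules)
```

Placing every node in a single module gives Q = 0 by construction, which is why a positive value indicates more within-module linkage than chance would produce.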
Results
The baseline characteristics of the 1969 patients and 316 non-COPD controls are presented in table 1. The COPD cohort is predominantly represented by ever-smoker males, with a mean age of 67±9 years and with moderate to severe obstruction at baseline, while the controls have a significantly higher proportion of current smokers (48%), a higher proportion of females and are younger. To account for the statistical difference in mean age, all comparisons between COPD patients and non-COPD controls were stratified using the following age brackets: 55 years old or younger, between 56 and 64 years, and 65 years or older (table 1). In addition, we randomly selected 311 COPD subjects matching the non-COPD cohort by age, sex and smoking status and performed the same analysis (table E1 in the supplementary material).
The median number of comorbidities per individual differed significantly between the two groups, and the difference remained significant when the comparison was adjusted by sex, age and matched numbers (table 1 and table E1 in the supplementary material).
The comparison of comorbidity prevalence between COPD and non-COPD individuals, stratified by age category and sex, is presented in figures E1 and E2 in the online supplement. We observed an increase in the number of different comorbidities from 65 to 73 and 79 diseases with increasing age bracket, and the prevalence tended to be higher in the COPD cohort compared to controls. The ranking of the top 20 most frequent comorbidities also differs by age group, as shown in table E2.
In the COPD cohort, female patients had fewer comorbidities than males (median 4, IQR 2–6 compared to 5, IQR 3–8, p=0.0001). In addition to the sex-predominant comorbidities (e.g. breast cancer), female patients had a significantly higher proportion of osteoporosis, hypothyroidism and venous insufficiency. The detailed comparison of comorbidity prevalence by sex is presented in table E3 of the online supplement and the corresponding networks are illustrated in figure E3.
Comorbidities network
The comorbidity networks for COPD patients and non-COPD controls are presented in figure 1. The COPD Comorbidities Network comprises 79 nodes, each representing a comorbidity, and a total of 428 links (edges) representing those correlations with a p-value of ≤0.01, while the non-COPD control network has 56 nodes and 149 links. If we compare the control group (n=316) with a randomly selected COPD subset matched by number, age, sex and current smoking status, the resulting network has 69 nodes and 214 edges (figure E4 of the online supplement). Comorbidities are not exclusive to COPD patients; however, the prevalence, diversity of diseases and density of associations (links) are higher in COPD compared to controls.
If we use a p-value cut-off of ≤0.001, the number of links decreases to 357, and to 265 if the significance threshold is lowered to p≤0.0001 (figure E5a). Positive correlation coefficients ranged from 0.06 to 0.42 (median 0.09, IQR 0.07–0.12) and negative correlation coefficients from −0.05 to −0.16 (median −0.08, IQR −0.07 to −0.12) (figure E5b, online supplement).
The median number of links per node was 8 (IQR 3–17) and the distribution of links per node (degree) for the COPD Comorbidity Network is shown in figure E6 (online supplement). The histogram better fits an exponential distribution, suggesting features of scale-free networks, where one-third (n=23) of the comorbidities possess two-thirds of all the edges (figure E6). These 23 comorbidities have 56, 69 or 72% of the links depending on the p-value cut-off used for significant association (p≤0.01, p≤0.001 and p≤0.0001, respectively). With more stringent p-values, less prevalent and less connected nodes are lost, with little change seen in the highly connected nodes, also referred to as hubs (figure 2).
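The degree and hub-share figures above can be illustrated with a small Python sketch. The function names are hypothetical, and counting an edge as belonging to a hub whenever at least one of its endpoints is a hub is an assumption made for this sketch; the text does not define how the share of links was counted.

```python
from collections import Counter

def node_degrees(edges):
    """Number of edges attached to each node of an undirected graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def hub_edge_share(edges, k):
    """Fraction of edges touching at least one of the k highest-degree nodes.

    Assumption: an edge counts towards the hubs if either endpoint is a hub.
    """
    edges = list(edges)
    if not edges:
        return 0.0
    hubs = {node for node, _ in node_degrees(edges).most_common(k)}
    touching = sum(1 for u, v in edges if u in hubs or v in hubs)
    return touching / len(edges)
```

Repeating the calculation on the edge lists obtained at each p-value threshold would show how stable the hub share is as weaker associations are removed.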
Network based clustering: comorbidities modules
The network (diseases and clinical characteristics) contains distinct clusters or modules of highly interlinked nodes, and the visual representation reveals a central theme. Modules were detected in both COPD patients and non-COPD controls and are displayed for comparison in figure 3. We chose to number each module rather than provide a name, to avoid taxonomic bias. For the COPD cohort, Module 1A comprises 17 nodes connected by 81 edges around the theme of older COPD individuals, cardiovascular comorbidities and clinical characteristics known to confer a worse prognosis in patients with COPD [5]. Note that in Module 1B (non-COPD controls), cardiovascular disease also clusters together; however, the prevalence and number of links are smaller. Module 2A is composed of 23 nodes connected by 60 links centred around the theme of younger, currently smoking COPD patients with behavioural risk factors and psychiatric conditions, manifesting diseases resulting from such risks, like hepatitis, liver cirrhosis, pancreatitis and HIV. Similarly, in the non-COPD controls, we observe a higher prevalence of current smokers in the younger individuals (Module 1B) and the association with anxiety and depression. Interestingly, in the control group this module includes another respiratory disease (asthma). Module 3A is composed of 17 nodes and 42 edges and represents COPD individuals with less obstruction and higher BMI, whose comorbidities cluster around the components of the metabolic syndrome. In contrast, those components of the metabolic syndrome are present in the non-COPD controls but in the older age bracket (Module 3B). Module 4A has 11 nodes and 25 links centred around the theme of gastro-oesophageal reflux disease, gastric and duodenal ulcers, osteoporosis and degenerative joint disease, also seen in the non-COPD controls in Module 3B.
Discussion
This study used a novel data visualisation tool to examine the integrated relationships among 79 different comorbidities and 10 important clinical variables affecting patients with COPD, and compared them with those of a cohort of non-COPD controls. It had three main findings. First, comorbidities are not exclusive to patients with COPD; however, the prevalence and number of simultaneous comorbidities are higher in COPD. Second, coexisting diseases are interlinked beyond simple coincidence, a phenomenon difficult to discern if we study comorbidities in isolation. Diseases aggregate into modules with meaningful syndromic associations. Third, the COPD Comorbidity Network has characteristics suggesting a scale-free architecture, revealing the presence of an influential group of highly connected comorbidities (hubs). The scale-free structure of this network and the stability of hubs are preserved independent of the threshold levels of statistical significance used. Based on this principle, we can propose that those influential comorbidities could be targeted for specific intervention or for screening.
It is known that, after adjusting for confounding factors, patients with COPD have a higher prevalence of cardiovascular diseases and lung cancer [8, 10, 11] than patients without COPD. Much less is known about the many other comorbidities that those patients frequently have. In this study, we applied network analysis to explore the associations between the 79 comorbidities detected in the BODE cohort and the 10 clinically relevant variables that characterise COPD multidimensionality, and compared the results with a non-COPD control cohort. Using an unbiased approach, we constructed a graphic expression of the complex relationship between prevalence, connections between diseases and clinical variables, the presence of hubs and the clustering of comorbidities. Interestingly, the highly connected comorbidities (figure 2 and table 2) are known to be associated with important patient-related outcomes [13, 14, 17, 19, 20, 42, 43]. If we assume that the COPD Comorbidity Network is similar in behaviour to common networks seen in daily life, like an airport network composed of important hubs with many incoming and outgoing connecting flights and less influential airports in minor destinations, then disruption of air traffic will be mostly felt when delays affect large hubs [44]. Likewise, we could hypothesise that perturbation of the comorbidities network could be achieved by selecting highly connected and treatable comorbidities and measuring the impact of specific treatments [27, 45, 46]. This postulation is provocative and untested; however, prior evidence indirectly supports this idea. This may be the case for continuous positive airway pressure (CPAP) and its benefit in patients with overlapping COPD and obstructive sleep apnoea (OSA), and for beta-receptor blockers and angiotensin-converting enzyme inhibitor therapy and their potential impact in reducing mortality in patients with COPD and cardiovascular disease [40, 47, 48].
Our integrative method differs from previously published COPD comorbidity studies in its capacity to visualise a wide range of comorbidities, rather than focussing on a single or small number of them [14, 17, 20–22, 40, 49–54], and to relate them to clinical variables. Those studies have suffered from a reductionist approach and may have missed important associations provided by other diseases thought to be “innocent bystanders” that have a more prominent role when analysed as a group rather than individually.
This network system analysis expands our current knowledge by showing modules that follow identifiable clinical observations. One such module (figure 3, Module 3A and 3B) shows a cluster of diseases or clinical characteristics related to the metabolic syndrome; interestingly, in the non-COPD controls this pattern is seen in the older age bracket. Other modules reveal relationships that are less evident but equally logical, such as in Module 2A (figure 3), with the clustering of behavioural risk factors (substance abuse) and diseases associated with those behaviours, like hepatitis, pancreatitis, liver cirrhosis and liver cancer. Moreover, diseases or behavioural risks present in Module 2A, like substance abuse, depression and anxiety, are highly connected (more than 25 links), providing support for the role of behavioural and social ties in complex diseases such as COPD [55–58]. In addition to its practical application, the visualisation of comorbidities in modules or subnetworks provides clues that are hypothesis generating, as they suggest the possibility of shared genetics or pathobiological mechanisms within highly correlated comorbidities [59]. When combining disease modules with clinical characteristics, we observed that not all comorbidities affect all COPD individuals similarly, as is the case with Modules 1A, 2A and 3A (figure 3). Indeed, age or BMI are correlated with particular types of comorbidities, an observation that has implications for clinicians attempting to relate diseases to specific clinical sub-groups [23, 59].
The third important finding of this analysis is that the COPD Comorbidity Network has characteristics of a scale-free architecture, revealing the presence of a group of highly connected comorbidities (hubs) that suggests an influential or intermediary role. The scale-free structure of this network and the stability of hubs are well preserved even when moving from less to more stringent p-values. At lower p-values, less prevalent and less connected nodes are lost, with little change seen within hubs. In fact, the 23 comorbidities shown in table 2 hold 56, 69 or 72% of the links depending on the p-value cut-off used for significant association and, interestingly, many of those comorbidities have been shown to influence important outcomes, as described in previous studies [13, 14, 17, 19, 20, 42, 43].
Our study has several limitations. First, our cohort is predominantly male. Nevertheless, the study included 219 females, which is not a trivial number and was large enough to perform a sex sub-analysis, allowing us to describe, for the first time, the comorbidities that predominantly affect each sex (table E3 in the online supplementary material). Second, a common argument on comorbidity ascertainment relates to the methodology used to report them. Our method is a combination of history taking, direct questioning and medical records documentation with an attempt to confirm the diagnosis. This combined methodology has a high sensitivity to capture a large number of comorbidities (79), which will be difficult to reproduce in those cohorts that only include a selected number of comorbidities. On the other hand, we may underestimate the true prevalence by not objectively repeating or seeking confirmatory tests for each disease, a task that is prohibitively expensive. Third, this cohort is composed of ever-smoker COPD individuals (at least 10 pack-years of cumulative exposure to cigarette smoking), and we cannot extrapolate our results to the important never-smoker COPD group, whose main exposure is indoor pollution from biomass fuel burning. Lastly, our cohort was compared with a younger control group; however, we validated our analysis by performing the same comparison with a randomly selected, matched COPD sub-group and by performing an age-stratified analysis.
In conclusion, using system network analysis, we show that many comorbidities encountered in patients with COPD are interlinked among themselves and with clinical variables in a pattern that provides information about common risk factors or disease mechanisms that are difficult to discern if studied individually. This data analysis and visualisation approach provides a hypothesis for potential co-management and “module” targeting in an attempt to disrupt common patterns of progression towards more advanced or irreversible diseases.
Footnotes
Editorial comment in: Eur Respir J 2015; 46: 591–592 [DOI: 10.1183/09031936.00054815]
This article has supplementary material available from erj.ersjournals.com
Support statement: M. Divo was supported by the European Respiratory Society Maurizio Vignola Award.
Conflict of interest: None declared.
- Received September 19, 2014.
- Accepted March 25, 2015.
- Copyright ©ERS 2015