Abstract
Pulmonary arterial hypertension (PAH) is a devastating complication of systemic sclerosis (SSc). Screening for PAH in SSc has increased detection, allowed early treatment for PAH and improved patient outcomes. Blood-based biomarkers that reliably identify SSc patients at risk of PAH, or with early disease, would significantly improve screening, potentially leading to improved survival, and provide novel mechanistic insights into early disease. The main objective of this study was to identify a proteomic biomarker signature that could discriminate SSc patients with and without PAH using a machine learning approach and to validate the findings in an external cohort.
Serum samples from patients with SSc and PAH (n=77) and SSc without pulmonary hypertension (non-PH) (n=80) were randomly selected from the clinical DETECT study and underwent proteomic screening using the Myriad RBM Discovery platform consisting of 313 proteins. Samples from an independent validation SSc cohort (PAH n=22 and non-PH n=22) were obtained from the University of Sheffield (Sheffield, UK).
Random forest analysis identified a novel panel of eight proteins, comprising collagen IV, endostatin, insulin-like growth factor binding protein (IGFBP)-2, IGFBP-7, matrix metallopeptidase-2, neuropilin-1, N-terminal pro-brain natriuretic peptide and RAGE (receptor for advanced glycation end products), that discriminated PAH from non-PH in SSc patients in the DETECT Discovery Cohort (average area under the receiver operating characteristic curve 0.741, 65.1% sensitivity/69.0% specificity), which was reproduced in the Sheffield Confirmatory Cohort (81.1% accuracy, 77.3% sensitivity/86.5% specificity).
This novel eight-protein biomarker panel has the potential to improve early detection of PAH in SSc patients and may provide novel insights into the pathogenesis of PAH in the context of SSc.
Early screening for pulmonary arterial hypertension in patients with systemic sclerosis improves patient outcome. This study identified a novel eight-protein biomarker panel that has the potential to assist early detection of PAH in this patient group. https://bit.ly/373BNkL
Introduction
Pulmonary arterial hypertension (PAH) is a devastating complication of systemic sclerosis (SSc) affecting 7–12% of patients with this condition [1, 2]. Right heart failure as a result of PAH is one of the leading causes of death in this patient cohort, accounting for 26% of deaths [3], and SSc-PAH represents 15–20% of all forms of PAH in Europe [4] with a 30% 1-year mortality [5, 6]. There is significant interest in developing improved screening tools using a variety of approaches including blood biomarkers, imaging, exercise testing (reviewed in [7]) and real-world healthcare resource utilisation data [8, 9] to improve the detection of PAH and decrease the time from first symptom to diagnosis [10]. Study data from Humbert et al. [11] demonstrate that screening for PAH in an at-risk population of patients with SSc enables early diagnosis and therefore treatment of a milder form of the disease. This resulted in a significant increase in survival, but it is important to acknowledge the lead-time and length-time biases introduced by the screening study approach [11], and earlier treatment remains an important goal for PAH. To this end, the development of the DETECT algorithm, which has been shown to outperform symptom- and transthoracic echocardiography-based diagnosis in a subset of SSc patients with early stages of PAH [12, 13], has been proposed in the diagnosis work-up of the European Society of Cardiology/European Respiratory Society guidelines [14].
The DETECT algorithm includes values for the circulating biomarkers N-terminal pro-brain natriuretic peptide (NT-proBNP) and uric acid, demonstrating the potential of biomarkers to support early detection of PAH in SSc patients. Utilising serum samples and clinical data collected during the DETECT study, we hypothesised that a broader proteomic signature could be developed to classify patients with SSc into those with and without PAH. Using serum samples from 157 randomly selected patients from the DETECT study (DETECT Discovery Cohort) we identified an eight-protein signature for PAH in SSc patients. This panel was subsequently reproduced in an independent cohort of 44 sequentially consented SSc patients (Sheffield Confirmatory Cohort). We also demonstrate that some protein biomarkers can predict individual clinical variables from the DETECT study.
Materials and methods
DETECT Discovery Cohort
Patients were eligible for inclusion in DETECT if they were aged ≥18 years and had 1) a diagnosis of SSc of >3 years duration from the first non-Raynaud's symptom, 2) diffusing capacity of the lung for carbon monoxide (DLCO) <60% predicted, 3) forced vital capacity (FVC) ≥40% predicted and 4) no pulmonary hypertension (PH) confirmed by right heart catheterisation (RHC) prior to enrolment. PAH was subsequently confirmed by RHC during the DETECT study. Approval was obtained from all relevant boards/ethics committees and all patients provided written informed consent [12]. Blood sampling was performed within a few days of RHC. All samples were collected and processed following Standard Operating Procedures and stored at −80°C until assaying. Stored serum samples from 77 out of 87 patients from the DETECT study with World Health Organization Group 1 PH (PAH) [3] were used in the DETECT Discovery Cohort. 10 patients with PAH were excluded because their samples had reached the storage time limit permitted by consent. A random sample of 80 out of 321 non-PH DETECT study patients was selected as controls.
Sheffield Confirmatory Cohort
An independent validation cohort was obtained from PAH treatment-naive patients sequentially recruited to The Sheffield Teaching Hospitals Observational Study of Patients with Pulmonary Hypertension, Cardiovascular and Lung Disease (STH-ObS) undergoing RHC for suspected PH at the Sheffield Pulmonary Vascular Disease Unit (Royal Hallamshire Hospital, Sheffield, UK) (Research Ethics Committee 18/YH/0441). In addition to RHC, patients were systematically investigated with high-resolution computed tomography, cardiac magnetic resonance imaging, pulmonary function testing and the incremental shuttle walk test according to local standard procedure. Serum samples were collected from diagnostic RHC prior to diagnosis of PH from 2008 to 2015 following local Standard Operating Procedures. All patients had a clear diagnosis of SSc and investigations consistent with PAH without any concomitant interstitial lung disease (PAH n=22). A cohort of disease controls was selected from patients who had a firm diagnosis of SSc, but in whom RHC excluded PH (mean pulmonary arterial pressure (mPAP) <25 mmHg) (non-PH n=22). Serum from all patients was aliquoted and stored at −80°C until requested for this study.
Measurement of circulating protein biomarkers
All serum samples were stored at less than −70°C until tested. Samples were thawed at room temperature, vortexed, spun at 3700×g for 5 min to remove precipitates and transferred to a master microtitre plate. 313 analytes were then assessed in serum using a multiplexed immunoassay (DiscoveryMAP version 3.3 assay; Myriad RBM, Austin, TX, USA). Monitoring of internal controls and batch testing between cohorts was performed internally by Myriad RBM.
Clinical variables
The demographic and clinical variables selected from the DETECT study [3] to explore their relationship with the serum biomarkers included 1) demographic and clinical characteristics: sex, age, body mass index, smoking history, SSc subtype (limited, diffuse and mixed), current/past telangiectasias, disease severity (modified Rodnan skin score (mRSS)); 2) echocardiography: qualitative assessment of right ventricle pump function (normal, slightly impaired, moderately impaired and severely impaired), right atrium area, right ventricle area, right ventricle diameter, tricuspid annular plane systolic excursion (TAPSE) and tricuspid regurgitant velocity; 3) electrocardiography: right axis deviation; 4) RHC: pulmonary vascular resistance (PVR), mPAP and pulmonary arterial wedge pressure (PAWP); 5) pulmonary function tests: FVC, FVC % pred, DLCO and DLCO % pred; and 6) NT-proBNP.
These variables were also selected from the Sheffield Confirmatory Cohort, with the exception of current/past telangiectasias, disease severity (mRSS), right ventricle area, right ventricle diameter and TAPSE.
Data processing and analysis
The selection of patients for the DETECT Discovery Cohort was performed using all samples from patients with PAH who had valid serum samples followed by exclusion of patients for pragmatic reasons (quality control) and a random selection from the non-PH group. The Sheffield Confirmatory Cohort comprised patients sequentially recruited with suspected PH during the period from 2008 to 2015 who underwent extensive phenotyping for suspected PAH. A subselection for patients who had SSc either with or without PAH was then performed.
The demographic and clinical variables for the DETECT Discovery Cohort (table 1) and the Sheffield Confirmatory Cohort (table 2) were summarised using descriptive statistics, and the PAH and non-PH groups were compared using the Wilcoxon rank-sum test for continuous variables (and for the “qualitative evaluation of right ventricle pump” after conversion of the ordered factor levels to 1–4 values) and Fisher's exact test for categorical variables.
Baseline characteristics of pulmonary arterial hypertension (PAH) and non-pulmonary hypertension (non-PH) systemic sclerosis (SSc) patients in the DETECT Discovery Cohort
Patient characteristics of pulmonary arterial hypertension (PAH) and non-pulmonary hypertension (non-PH) systemic sclerosis (SSc) patients in the Sheffield Confirmatory Cohort
Analytes with >50% missing values or near zero variance were excluded from analyses. One patient from the Sheffield Confirmatory Cohort was removed because the data contained 25 missing values for analytes. Out of the 313 analytes that were measured, 271 analytes were used for the DETECT Discovery Cohort (n=157) and 258 analytes were used for the Sheffield Confirmatory Cohort (n=44) for subsequent analysis. Data transformations were applied to some of the clinical variables in order to obtain normal distributions. A log(x) transformation was applied to the variable rhcpvr (PVR), and a log(1+x) transformation was applied to the variables mrsstot (mRSS) and mmraa (right atrium area). After initial analyses we found that age was the 12th most important variable in the DETECT Discovery Cohort and 26th in the Sheffield Confirmatory Cohort. Sex was not in the top 100 variables for either cohort. We therefore chose not to correct for age or sex in subsequent analyses.
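The filtering and transformation rules above can be sketched in a few lines. This is only an illustration in Python, not the authors' R code: the function names are hypothetical, and the numeric variance threshold is an assumption, because the text does not state how "near zero variance" was defined.

```python
import math

def keep_analyte(values, max_missing_frac=0.5, min_variance=1e-8):
    """Return True if an analyte passes both filters described in the text.

    `values` holds one measurement per patient; None marks a missing value.
    `min_variance` is an assumed stand-in for "near zero variance".
    """
    observed = [v for v in values if v is not None]
    missing_frac = 1 - len(observed) / len(values)
    if missing_frac > max_missing_frac:
        return False  # more than 50% of values are missing
    if len(observed) < 2:
        return False  # variance cannot be estimated
    mean = sum(observed) / len(observed)
    variance = sum((v - mean) ** 2 for v in observed) / (len(observed) - 1)
    return variance > min_variance

def log_transform(x):
    """log(x), as applied to PVR (variable rhcpvr)."""
    return math.log(x)

def log1p_transform(x):
    """log(1 + x), as applied to mRSS and right atrium area."""
    return math.log1p(x)
```

The log(1+x) form remains defined when a score is zero, which is presumably why it was preferred for mRSS.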
Random forests
Missing values were imputed with the NIPALS algorithm (www.github.com/kwstat/nipals; R package version 0.5). Within the random forest (RF) analysis, the number of variables randomly sampled as candidates at each node was set to its default value. The importance of biomarker variables in each cohort was assessed using the mean decrease in the node impurity criterion (Gini index). Three distinct RF analyses were performed with the aim of identifying a consistent panel of biomarkers to classify PAH from non-PH in patients with SSc: 1) in the DETECT Discovery Cohort using the 271 analytes detected, 2) in the Sheffield Confirmatory Cohort using the 258 analytes detected, and 3) in both cohorts using the 238 protein analytes common to the discovery and validation datasets (33 analytes from the DETECT dataset could not be mapped to the Sheffield dataset and 20 analytes from the Sheffield dataset could not be mapped to the DETECT dataset, because they were either previously filtered or absent in the original dataset). Further RF analyses were performed on all possible combinations of eight or fewer of the biomarkers that were common to the top 20 biomarkers in the three RF analyses. Performance of the RFs was internally validated by averaging the area under the receiver operating characteristic curve (ROC-AUC) over repeated (100 times) 10-fold cross-validations. RF analyses were performed using the package randomForest version 4.6-14 and R version 3.5.0 (R Project, Vienna, Austria) from CRAN (www.CRAN.R-project.org). The performance of the RF was assessed in the Sheffield Confirmatory Cohort using balanced accuracy.
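The internal validation scheme (100 repeats of 10-fold cross-validation, averaging ROC-AUC) can be outlined as below. This is a simplified sketch in Python and not the authors' R implementation. The classifier is left as a hypothetical callback, `fit_predict`, and pooling the out-of-fold scores within each repeat before computing ROC-AUC is an assumption; the text does not say whether ROC-AUC was computed per fold or per repeat.

```python
import random

def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (PAH, non-PH) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv_auc(labels, fit_predict, k=10, repeats=100, seed=0):
    """Average ROC-AUC over `repeats` runs of k-fold cross-validation.

    `fit_predict(train_idx, test_idx)` is a hypothetical callback that fits a
    classifier on the training indices and returns one score per test index.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        scores = [None] * len(labels)
        for fold in k_fold_indices(len(labels), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(labels)) if i not in held_out]
            for i, s in zip(fold, fit_predict(train, fold)):
                scores[i] = s  # each sample is scored once per repeat, out of fold
        aucs.append(roc_auc(labels, scores))
    return sum(aucs) / len(aucs)
```

Any model that returns a continuous score per patient, such as the fraction of trees voting for PAH, could be plugged in as `fit_predict`.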
Partial least squares
17 clinical variables from the DETECT Discovery Cohort were each taken in turn as the dependent variable, and sparse partial least squares (SPLS) regressions were performed to identify the biomarkers best able to predict that variable. The biomarker data were Box–Cox transformed to normalise the residual distributions. Lambda parameters ranging from −2 to 2 were allowed. Performance was estimated by repeated (100 times) 10-fold cross-validation. The number of components was set to 1 and the sparsity level to 0.7. Missing values were imputed within the cross-validation process using a k-nearest-neighbours approach (k=5). SPLS analyses were performed using the package spls version 2.2-3 run in R version 3.5.0 from CRAN.
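For reference, the standard one-parameter Box–Cox transformation of a positive biomarker value x is given below. The text does not describe how λ was chosen for each biomarker, only the permitted range, so the constraint on λ is the only study-specific element here.

```latex
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\ln x, & \lambda = 0,
\end{cases}
\qquad x > 0,\quad \lambda \in [-2, 2].
```

The case λ=1 leaves the data unchanged up to a shift, and λ=0 reduces to a log transformation.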
Results
Patient disposition, demographic and clinical characteristics
The patient disposition flowchart (figure 1) shows the number of patients and analytes for the DETECT Discovery Cohort and the Sheffield Confirmatory Cohort. The demographic and clinical characteristics of the two cohorts are summarised and the two groups compared as shown in tables 1 and 2, respectively. A comparison of the two cohorts is included as supplementary figures S1–S9. This comparison confirmed, as expected, that the DETECT samples (collected from rheumatology clinics) were slightly more heterogeneous with less advanced PAH than the Sheffield samples collected from a specialist PH referral centre.
Patients and analytes for a) the DETECT Discovery Cohort and b) the Sheffield Confirmatory Cohort. RHC: right heart catheterisation; PH: pulmonary hypertension; PAH: pulmonary arterial hypertension; STH-ObS: The Sheffield Teaching Hospitals Observational Study of Patients with Pulmonary Hypertension, Cardiovascular and Lung Disease; CTD: connective tissue disease; ILD: interstitial lung disease. #: enrolled 2008–2011; ¶: enrolled 2008–2015; +: 271 protein analytes passed quality control in the DETECT Discovery Cohort and 258 protein analytes passed quality control in the Sheffield Confirmatory Cohort (238 protein analytes were suitable for investigation in both cohorts).
Protein biomarker selection using RFs
Analyses of serum samples from PAH (n=77) and non-PH patients (n=80) from the DETECT Discovery Cohort identified 271 of the 313 protein analytes on the Myriad RBM Discovery platform that passed quality control (supplementary table S1). RF analysis identified proteins that segregated PAH from non-PH patients with SSc with an average ROC-AUC of 0.71. Figure 2a shows the top 20 variables (proteins) of importance to distinguish PAH, as ranked by the mean decrease in the Gini index. To determine whether this could be replicated in a distinct cohort of treatment-naive patients with samples collected at diagnostic RHC, we ran 44 serum samples (PAH n=22 and non-PH n=22) from Sheffield on the same Myriad RBM Discovery platform. In this cohort, 258 out of 313 protein analytes passed quality control (supplementary table S1). An independent RF analysis identified proteins that predicted PAH with a ROC-AUC of 0.83. Figure 2b shows the top 20 variables (proteins) of importance in the Sheffield Confirmatory Cohort.
Variables (proteins) of importance to classify pulmonary arterial hypertension. Variable importance output of random forests applied to a) the DETECT Discovery Cohort, b) the Sheffield Confirmatory Cohort and c) 238 common proteins between the two cohorts, applied on the DETECT Discovery Cohort. The plots show the most important variables (y-axis) as assessed by the mean decrease of the Gini index (x-axis). Proteins are ordered top to bottom as most to least important. The eight common variables in all analyses appear in red. See supplementary table S1 for details of the proteins on the Myriad RBM Discovery platform.
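The Gini-based importance measure used for this ranking builds on the impurity decrease of individual tree splits. The sketch below, in Python with hypothetical function names, shows that building block for a single split. In a random forest these decreases are accumulated over every split made on a given protein and averaged over trees; the exact weighting used by the randomForest package is not reproduced here.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of the class labels in one tree node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right` child nodes."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children
```

A split on a protein threshold that separates PAH from non-PH patients cleanly gives the largest possible decrease, so proteins that repeatedly produce such splits rise to the top of the ranking.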
Encouragingly, 238 common analytes were consistently measured in both the DETECT and Sheffield cohorts, and an accuracy of 86% was observed when applying the RF trained on the DETECT Discovery Cohort to the Sheffield Confirmatory Cohort. Specifically, collagen IV, endostatin, insulin-like growth factor binding protein (IGFBP)-2, IGFBP-7, matrix metallopeptidase (MMP)-2, neuropilin-1, N-terminal pro-brain natriuretic peptide (NT-proBNP) and RAGE (receptor for advanced glycation end products) were identified as common PAH biomarkers. All results and summary statistics of the biomarkers that were analysed in the DETECT Discovery Cohort and in the Sheffield Confirmatory Cohort are shown in supplementary tables S2 and S3, respectively. Although eight of the top 20 variables of importance were identified in both cohorts after independent RF analysis, a different ranking of the common biomarkers in the two cohorts was noted (figure 2a and b). We therefore performed a new RF analysis on the DETECT Discovery Cohort dataset using the 238 common analytes identified in both cohorts. The 20 top-most important variables of this RF analysis are shown in figure 2c. The eight common PAH biomarkers from the independent analysis of each cohort were again selected in the top 20 variables of importance. The average ROC-AUC for this new analysis in the DETECT Discovery Cohort was 0.72.
The individual protein levels of the conserved eight biomarkers were significantly higher in SSc patients with PAH compared with non-PH patients in the DETECT Discovery Cohort and the Sheffield Confirmatory Cohort (figure 3).
Serum concentrations of the eight best-performing and common proteins in predicting pulmonary arterial hypertension (PAH) in the DETECT Discovery Cohort and the Sheffield Confirmatory Cohort: a) collagen IV, b) endostatin, c) insulin-like growth factor binding protein (IGFBP)-2, d) IGFBP-7, e) matrix metallopeptidase (MMP)-2, f) neuropilin-1, g) N-terminal pro-brain natriuretic peptide (NT-proBNP) and h) RAGE (receptor for advanced glycation end products). PH: pulmonary hypertension. Boxes indicate median and interquartile range; whiskers indicate the full range of the data. Individual patient samples are represented by dots. p-values from the Wilcoxon rank-sum test between the two patient groups.
Performance of the eight-protein panel to classify PAH
To determine the potential of the eight protein biomarkers to classify PAH from a mixed cohort of patients with SSc, we performed further RF analyses for all 255 possible combinations of the eight biomarkers identified to determine the panel with the best performance. Panel performance was estimated by repeated cross-validation and a subset panel of six biomarkers, including RAGE, IGFBP-7, collagen IV, endostatin, MMP-2 and IGFBP-2, classified PAH with the best ROC-AUC (0.751) in the DETECT Discovery Cohort with a sensitivity of 66.8% and a specificity of 71.4% (figure 4a). We next assessed the performance of this six-protein biomarker panel in the Sheffield Confirmatory Cohort, which gave a ROC-AUC of 0.866 (figure 4b) and balanced accuracy of 0.705 with a sensitivity of 54.5% and a specificity of 86.4%.
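The figure of 255 follows from counting every non-empty subset of eight proteins (2^8 − 1). A short Python sketch of the enumeration is shown here; the function name is hypothetical, and in the study each candidate panel would then be scored by repeated cross-validation.

```python
from itertools import combinations

PANEL = ["collagen IV", "endostatin", "IGFBP-2", "IGFBP-7",
         "MMP-2", "neuropilin-1", "NT-proBNP", "RAGE"]

def all_subpanels(markers):
    """Yield every non-empty subset of `markers`, smallest panels first."""
    for size in range(1, len(markers) + 1):
        yield from combinations(markers, size)

subpanels = list(all_subpanels(PANEL))
print(len(subpanels))  # 2**8 - 1 = 255
```

Of these candidates, 28 are six-protein panels, one of which is the best-performing subset reported above.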
a, b) Performance of the panel of six common protein biomarkers in a) the DETECT Discovery Cohort and b) the Sheffield Confirmatory Cohort: receiver operating characteristic (ROC) curves of the pulmonary arterial hypertension (PAH) versus non-pulmonary hypertension (non-PH) classifier. ROC-AUC: area under the ROC curve; RAGE: receptor for advanced glycation end products; IGFBP: insulin-like growth factor binding protein; MMP: matrix metallopeptidase; SSc: systemic sclerosis; NT-proBNP: N-terminal pro-brain natriuretic peptide. The six selected proteins are the subset from the eight common proteins that produced the best ROC-AUC in the DETECT Discovery Cohort (0.751). c, d) Addition of c) NT-proBNP or d) NT-proBNP plus neuropilin-1 to the six selected proteins.
Given the decrease in sensitivity observed with our six-protein biomarker panel in the Sheffield Confirmatory Cohort, we tested whether adding back NT-proBNP or NT-proBNP plus neuropilin-1 (since NT-proBNP is already part of the DETECT algorithm) would improve the reproducibility of the panel. As expected from our previous analysis, adding NT-proBNP (seven-biomarker panel; figure 4c) or NT-proBNP plus neuropilin-1 (eight-biomarker panel; figure 4d) produced a reduced performance in the DETECT Discovery Cohort, generating a ROC-AUC of 0.741 with a sensitivity of 65.2% and a specificity of 68.9% for the seven-biomarker panel (figure 4c) and a ROC-AUC of 0.741 with a sensitivity of 65.1% and a specificity of 69.0% for the eight-biomarker panel (figure 4d). We next tested the performance of both the seven- and eight-protein panels in the Sheffield Confirmatory Cohort. For the seven-protein panel including NT-proBNP we achieved a balanced accuracy of 0.77 with a sensitivity of 68.2% and a specificity of 86.4%. This was slightly improved in the eight-protein panel including both NT-proBNP and neuropilin-1, generating a balanced accuracy of 0.81 with a sensitivity of 77.3% and a specificity of 86.5%. Therefore, while the addition of NT-proBNP or NT-proBNP plus neuropilin-1 slightly decreased the sensitivity and specificity in the derivation cohort, the addition of the two biomarkers improved the accuracy, sensitivity and specificity in the validation cohort.
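Balanced accuracy is the mean of sensitivity and specificity. As a quick arithmetic check, the Python sketch below recomputes it from the rounded sensitivity and specificity values reported for the six- and seven-protein panels in the Sheffield Confirmatory Cohort. Because the inputs are rounded percentages, the results are approximate.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity, both given as fractions in [0, 1]."""
    return (sensitivity + specificity) / 2

# Rounded values reported in the text for the Sheffield Confirmatory Cohort.
six_protein = balanced_accuracy(0.545, 0.864)    # approximately 0.705
seven_protein = balanced_accuracy(0.682, 0.864)  # approximately 0.77
```

Unlike raw accuracy, this measure is unaffected by the ratio of PAH to non-PH patients in the cohort being tested.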
Identifying biomarkers that predict clinical variables related to PAH
The combination of serological biomarkers and clinical variables to create composite scores has strengthened the conventional diagnostic approach in SSc patients [12]. Identifying additional biomarkers, beyond NT-proBNP and uric acid, that could predict clinical variables related to PAH would be highly advantageous and reduce the need for repeated invasive procedures, e.g. RHC. To investigate whether any of the protein biomarkers measured could accurately predict any of the recorded clinical variables we applied a SPLS regression analysis to the DETECT Discovery Cohort. Of the clinical variables tested (table 1), the association with our biomarker composite panel was generally weak. PVR was the best-predicted variable (R2=0.321), using a panel of six proteins: NT-proBNP, RAGE, IGFBP-7, pyruvate carboxylase (cFib), vascular cell adhesion molecule (VCAM)-1 and surfactant protein D (SP-D) (figure 5). The highest correlation for PVR was obtained with NT-proBNP (r=0.46), RAGE (r=0.43) and IGFBP-7 (r=0.41). To verify the relevance of the identified variables we applied a RF analysis as an alternative to the SPLS analysis. RAGE, NT-proBNP, IGFBP-7, SP-D and VCAM-1, which were selected by SPLS to predict PVR, were also among the 10 most important features according to the RF approach. When looking at these biomarkers in relation to all clinical variables, we observed that RAGE also showed a relevant correlation with FVC % pred (r=0.51, p=8.63e-12) (supplementary figure S10).
Sparse partial least squares association of pulmonary vascular resistance (PVR) to six common biomarker proteins: a) N-terminal pro-brain natriuretic peptide (NT-proBNP), b) RAGE (receptor for advanced glycation end products), c) insulin-like growth factor binding protein (IGFBP)-7, d) pyruvate carboxylase (cFib), e) vascular cell adhesion molecule (VCAM)-1 and f) surfactant protein D (SP-D). Correlation plots for each individual biomarker variable with PVR, showing Pearson's correlation coefficient between the logarithm of the two variables and the corresponding p-value.
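The correlation described in this caption, Pearson's coefficient between the logarithms of a biomarker and PVR, can be computed as in the following Python sketch. The function names are hypothetical, and the p-value calculation is omitted.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log_log_pearson(biomarker, pvr):
    """Pearson's r between log(biomarker) and log(PVR); all inputs must be positive."""
    return pearson_r([math.log(b) for b in biomarker],
                     [math.log(p) for p in pvr])
```

The choice of logarithm base does not affect r, since changing base only rescales both variables by a constant.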
Discussion
PAH is a devastating complication of SSc, and there is evidence that early detection and treatment can improve the outcomes. The DETECT algorithm is a frequently used screening model for the detection of PAH in SSc patients [12]. It contains eight variables, including clinical variables from multiple tests and two circulating biomarkers, i.e. NT-proBNP and serum uric acid, both reflecting cardiac dysfunction [15–17]. The sensitivity for the detection of PAH using DETECT is high (96%), but the specificity is relatively low (48%). Much effort is therefore currently given to the discovery of diagnostic biomarkers for the accurate and noninvasive prediction of PAH in SSc patients and other “at-risk” populations.
In this study we used serum samples from the DETECT study and an unbiased high-throughput assay platform to discover novel protein biomarkers with the potential to aid screening and diagnosis. We have identified and validated a panel of eight biomarkers, i.e. RAGE, IGFBP-7, collagen IV, endostatin, MMP-2, IGFBP-2, NT-proBNP and neuropilin-1, with potential for classifying patients with PAH in a mixed population of patients with SSc.
Several of the proteins identified have been previously found to play significant roles in pulmonary vascular remodelling (RAGE and MMP-2), angiogenesis and cellular growth (collagen IV, endostatin, IGFBP-2 and neuropilin-1), and cardiac dysfunction (NT-proBNP and IGFBP-7). Among the top-ranking proteins, RAGE plays an important role in the accumulation of extracellular matrix proteins and particularly in vascular remodelling [18–23]. RAGE expression has been shown to be upregulated in pulmonary arteries that were isolated from Sugen 5416 plus hypoxia (SuHx)-induced PH mice and deletion of RAGE protects these mice from PH. Serum levels of soluble RAGE were also shown to be higher compared with controls in patients with idiopathic PAH and chronic thromboembolic pulmonary hypertension [24, 25] and in SuHx PH mice [22], and in vitro soluble RAGE has been shown to regulate bone morphogenetic proteins and calcium binding protein S100A4-induced proliferation and migration in pulmonary artery smooth muscle cells [20, 26]. The diagnostic value of RAGE was maintained when adjusted for age in our study (data not shown), negating a response to increasing advanced glycation products with age. We also found that RAGE levels were lower in serum of patients with diffuse compared with limited SSc, which excluded a relationship between RAGE expression levels and the extent of skin and organ fibrosis involvement (supplementary figure S11). MMP-2, the other marker of vascular remodelling, is a metalloproteinase involved in the breakdown of the extracellular matrix and collagen IV, and contributes to the degradation of basal membranes [27]. Interestingly, hypoxia can attenuate the physiological postnatal increase of MMP-2 expression, which impacts alveolar development and associated pulmonary arterial remodelling [28]. It was recently shown that MMP-2 expression increased under hypoxia in pulmonary artery endothelium concomitant with a thickening of blood vessels. Inhibition of MMP-2 in a mouse model of PH prevented the development of PH and the proliferation of pulmonary artery endothelial cells under hypoxia [29].
IGFBP-7, the protein ranked second in our panel, has been associated with cellular senescence and cardiac dysfunction [30–33], and has the potential to complement the role of NT-proBNP, an established marker of cardiac stress [34]. Endostatin (collagen XVIII) and collagen IV, ranking at positions 3 and 4 in our panel, are important components of the extracellular vascular basement membrane that separates, for instance, epithelial cells and endothelial cells in the heart [35–37]. Endostatin was previously reported as a potential biomarker in PAH that could predict adverse outcome [38]. Although the role of collagen IV in PAH has not been described specifically, collagen IV synthesis can be promoted by nitric oxide production and an increase in collagen IV was shown to contribute to angiogenesis of lung endothelial cells [38]. Neuropilin-1, another molecule involved in angiogenesis, interacts with vascular endothelial growth factor (VEGF) family members to stimulate angiogenesis in endothelial cells [39, 40]. VEGF receptors are expressed by endothelial cells within the plexiform lesions in patients with PAH [41, 42]. These different markers of angiogenesis therefore have direct implications for the pathology of PAH and may relate to the early development of PAH in SSc patients. The potential role that these proteins could play in various aspects of PAH pathobiology provides encouragement that the proteins selected may have sensitivity to disease-modifying therapies, although a further study on longitudinal samples would be required to demonstrate this.
The loss in performance of the six-protein panel derived from the DETECT Discovery Cohort when tested in the Sheffield Confirmatory Cohort most likely relates to disease heterogeneity in both SSc and PAH. Other contributing factors could be differences in sample processing or the fact that the Sheffield Confirmatory Cohort was obtained from patients referred to a specialist PAH centre, who therefore had a higher probability of having PAH; indeed, the Sheffield Confirmatory Cohort may have also had more advanced PAH (supplementary figures S1–S9) as reflected by the 2-fold higher median NT-proBNP when compared with the DETECT Discovery Cohort. However, it is also encouraging that the protein panel performs well within both the general rheumatology setting where PAH may be less severe and the specialist PH centre with potentially more advanced disease. It is important that any biomarker/panel has sensitivity across the spectrum of disease. NT-proBNP is a widely used biomarker for PAH, reflecting an elevated right ventricle overload. However, raised NT-proBNP is not specific to PAH since other pathological conditions can lead to an increase in right ventricle overload and NT-proBNP levels [7].
As well as identifying a biomarker panel to assist early diagnosis of patients with PAH and SSc, we also examined whether we could identify potential protein biomarker surrogates for clinical variables commonly used to diagnose PAH. Among those tested, PVR was the clinical parameter best predicted by a biomarker panel. This panel, consisting of six proteins (RAGE, NT-proBNP, IGFBP-7, SP-D, VCAM-1 and cFib), contained three proteins (RAGE, NT-proBNP and IGFBP-7) that overlapped with the most reproducible eight-protein diagnostic panel (RAGE, IGFBP-7, collagen IV, endostatin, MMP-2, IGFBP-2, NT-proBNP and neuropilin-1). While cFib was not confirmed using a second analytical method (RF), all other markers were confirmed. SP-D has a well-recognised role in regulating inflammation and VCAM-1 has a well-recognised role in the mediation of leukocyte–endothelial cell adhesion, supporting the concept that the proteins may influence pulmonary vascular remodelling and therefore PVR. PAH is characterised by progressive pulmonary vascular remodelling leading to increased PVR, and eventually to right heart failure and death [43, 44]. Since PVR reflects the progressive pathogenesis of PAH [45], monitoring PVR in response to treatment by any method, but particularly a noninvasive method, is highly desirable. The use of this panel may therefore provide valuable information on ongoing pathogenic processes in SSc-PAH patients.
In this study, our biomarker panel identifies patients with SSc-PAH in a superior manner to NT-proBNP alone (ROC-AUC 0.689) (data not shown), perhaps suggesting that our panel detects early PAH before cardiac stress becomes dominant. It is therefore interesting to examine how this panel might perform under the new 2018 World Symposium on Pulmonary Hypertension recommended classification (mPAP 21–24 mmHg) to identify patients that have mild or early PAH [46]. However, this would have to be tested in larger relevant cohorts of patients. Since our initial cohort selection and analysis it has been recognised that patients with a borderline elevation of mPAP (i.e. mPAP 21–24 mmHg) can develop symptoms comparable to patients with mPAP ≥25 mmHg. Specifically, they have an increased risk of progression to ≥25 mmHg and a higher mortality rate than patients with mPAP <21 mmHg [47, 48]. Given the recent recommendation to change the cut-off for the definition of PH to include patients with mPAP >20 mmHg [46] we reran the analysis with these new criteria (mPAP >20 mmHg, PVR ≥3 WU and PAWP ≤15 mmHg). Reassuringly, only two biomarkers, i.e. IGFBP-2 and neuropilin-1 (the two weakest variables of importance), were dropped from the eight variables identified, suggesting that the remaining six proteins are particularly sensitive to early changes associated with PAH in the context of SSc.
A previous study by Rice et al. [49] identified midkine and follistatin-like 3 as proteins that might serve as SSc-PAH biomarkers, albeit in a small discovery cohort of only 13 patients, all with limited SSc. Our study included patients with diffuse, limited and mixed SSc in the DETECT Discovery Cohort, which is more reflective of the patient population within the rheumatology setting. Unfortunately, neither protein was within the Myriad DiscoveryMAP version 3.3 platform, so we were unable to determine whether these proteins could classify PAH in either of our cohorts. Future prospective studies should look to compare or incorporate these proteins.
A limitation of our current study is the lack of longitudinal data. Testing whether the protein panel changes in response to treatment and disease progression is an important next step. This is pertinent not only to PAH-specific therapies but also to the background immunosuppressive therapies that many of these patients receive. One other significant limitation of our study is the matched PAH and non-PH cohort size. With an estimated 10% of patients with SSc developing PAH, future studies looking to validate these models would ideally have a more representative 1:10 PAH:non-PH proportion. We also acknowledge that our patient cohorts do not fully reflect the SSc patient population. The patients in the Sheffield Confirmatory Cohort were more advanced in their diagnostic process and had a high suspicion of PAH prompting referral to a specialist centre, and the improved performance of the biomarker panel trained on the DETECT Discovery Cohort may reflect this. However, this study provides important proof-of-concept data that applying machine learning tools to proteomic data can identify protein biomarkers to help screen patients at risk of PAH. Although not directly tested here, it is highly possible that this protein panel, developed on PAH in the context of SSc, may have utility in other forms of PAH or identify patients in other non-PAH diagnostic groups who have some pulmonary vascular remodelling.
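The point about cohort proportions can be illustrated with a short calculation. The sketch below uses Bayes' rule to compare the positive predictive value (PPV) expected in the matched discovery design (77 PAH, 80 non-PH) with that expected at a 1:10 proportion, using the discovery-cohort sensitivity and specificity reported in the abstract (65.1% and 69.0%). It assumes, as a simplification, that sensitivity and specificity carry over unchanged to a lower-prevalence population, which has not been tested here; the function names are hypothetical.

```python
def prevalence_from_ratio(cases: float, non_cases: float) -> float:
    """Convert a cases:non-cases ratio into a prevalence (fraction of cases)."""
    return cases / (cases + non_cases)


def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """PPV = TP / (TP + FP), with TP and FP expressed as population fractions."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Discovery-cohort performance as reported in the abstract.
SENSITIVITY = 0.651
SPECIFICITY = 0.690

# Matched design (77 PAH, 80 non-PH) versus a 1:10 screening-like proportion.
matched_prevalence = prevalence_from_ratio(77, 80)
screening_prevalence = prevalence_from_ratio(1, 10)

ppv_matched = positive_predictive_value(SENSITIVITY, SPECIFICITY, matched_prevalence)
ppv_screening = positive_predictive_value(SENSITIVITY, SPECIFICITY, screening_prevalence)

print(f"Matched cohort: prevalence {matched_prevalence:.3f}, PPV {ppv_matched:.3f}")
print(f"1:10 cohort:    prevalence {screening_prevalence:.3f}, PPV {ppv_screening:.3f}")
```

Under these assumptions the PPV falls from roughly two-thirds in the matched design to below one-fifth at a 1:10 proportion, which is why validation in a cohort with a representative case mix matters for a screening application.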
The ultimate aim of this study is to identify a screening protein panel that can be incorporated into a future iteration of the DETECT algorithm to enhance its sensitivity and specificity. Clearly, before this can be achieved the current protein panel will require further validation and “tuning” in a prospective and longitudinal SSc cohort within the rheumatology setting. Although challenges remain, integrating proteomic profiling into an existing screening programme such as DETECT should be achievable.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary figures ERJ-02591-2020.Figures
Supplementary table S1 ERJ-02591-2020.Table_S1
Supplementary tables S2 and S3 ERJ-02591-2020.Table_S2_S3
Shareable PDF
This one-page PDF can be shared freely online.
Shareable PDF ERJ-02591-2020.Shareable
Acknowledgements
The authors would like to thank all investigators and patients involved in the DETECT study and those participating in The Sheffield Teaching Hospitals Observational Study of Patients with Pulmonary Hypertension, Cardiovascular and Lung Disease (STH-ObS).
Footnotes
This article has an editorial commentary: https://doi.org/10.1183/13993003.00205-2021
This article has supplementary material available from erj.ersjournals.com
Author contributions: All authors contributed to the study design, data acquisition, and analysis and interpretation of the data. Y. Bauer and A. Lawrie drafted the manuscript. All authors read and revised the content critically, approved the final manuscript, and are accountable for its content.
Conflict of interest: Y. Bauer is a former employee of Actelion Pharmaceuticals Ltd and Idorsia Pharmaceuticals Ltd, and is now an employee of Galapagos GmbH.
Conflict of interest: S. de Bernard reports grants from Idorsia, during the conduct of the study.
Conflict of interest: P. Hickey has nothing to disclose.
Conflict of interest: K. Ballard is an employee of Myriad RBM.
Conflict of interest: J. Cruz is an employee of Myriad RBM.
Conflict of interest: P. Cornelisse is a former employee of Actelion Pharmaceuticals Ltd.
Conflict of interest: H. Chadha-Boreham is a former employee of Actelion Pharmaceuticals Ltd.
Conflict of interest: O. Distler reports personal fees for consultancy from Amgen, AbbVie, Acceleron Pharma, AnaMar, Actelion, Alexion, Arxx Therapeutics, Baecon Discovery, Blade Therapeutics, Corbuspharma, CSL Behring, ChemomAb, Horizon Pharmaceuticals, Ergonex, Galapagos NV, Glenmark Pharmaceuticals, GSK, Inventiva, Italfarmaco, iQone, iQvia, Kymera, Lilly, Medac, Sanofi, Target Bio Science and UCB, grants and personal fees for consultancy and lectures from Bayer and Boehringer Ingelheim, personal fees for interviewing from Catenion, grants from Competitive Drug Development International Ltd, personal fees for consultancy and lectures from Medscape, MSD, Pfizer and Roche, grants and personal fees for consultancy from Mitsubishi Tanabe Pharma, personal fees for lectures from Novartis, outside the submitted work; and has a patent mir-29 for the treatment of systemic sclerosis issued (US8247389, EP2331143).
Conflict of interest: D. Rosenberg is an employee of and holds shares in Johnson and Johnson.
Conflict of interest: M. Doelberg is an employee of Actelion Pharmaceuticals Ltd.
Conflict of interest: S. Roux is a former employee of Actelion Pharmaceuticals Ltd.
Conflict of interest: O. Nayler is a former employee and former stock owner of Actelion Pharmaceuticals Ltd, and current employee and stock owner of Idorsia Pharmaceuticals Ltd.
Conflict of interest: A. Lawrie reports grants from the British Heart Foundation and Medical Research Council, grants, personal fees and other (conference attendance and travel) from Actelion Pharmaceuticals, grants and personal fees from GlaxoSmithKline, outside the submitted work.
Support statement: The work of Y. Bauer, O. Nayler and S. de Bernard was funded by Actelion Pharmaceuticals Ltd. P. Hickey was funded by a Donald Health Clinical Research Training Fellowship funded in partnership between Actelion Pharmaceuticals, Sheffield Teaching Hospitals Foundation NHS Trust and the University of Sheffield, and A. Lawrie was funded by British Heart Foundation (BHF) Senior Basic Science Research Fellowships (FS/13/48/30453 and FS/18/52/33808). Recruitment and collection of samples to The Sheffield Teaching Hospitals Observational Study of Patients with Pulmonary Hypertension, Cardiovascular and Lung Disease (STH-ObS) was supported by BHF PG/11/116/29288 and the Sheffield NIHR Clinical Research Facility. The views expressed in this manuscript are those of the authors and not necessarily those of Actelion Pharmaceuticals Ltd, Myriad RBM, the BHF, the NHS, the NIHR or the Dept of Health. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received July 6, 2020.
- Accepted November 17, 2020.
- Copyright ©ERS 2021
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0.