Abstract
The British Thoracic Society and American College of Chest Physicians guidelines outline criteria for investigating patients for lung cancer surgery. However, the guidelines are based on relatively old studies. Therefore, the relationship between pulmonary function test results and surgical outcome was studied prospectively in a large cohort of lung cancer patients.
From January 2001 to December 2003, 110 patients underwent surgery for lung cancer. All underwent full lung function testing in order to predict post-operative lung function.
The hospital mortality rate was 3% and major complication rate 22%. There was poor overall outcome in 13%. Mean pre-operative lung function values were: forced expiratory volume in one second (FEV1) 2.0 L (79.4% of the predicted value), and carbon monoxide diffusing capacity of the lung (DL,CO) 73.6% pred. The mean post-operative lung function values were: FEV1 1.4 L (55.6% pred), and DL,CO 51.3% pred. All lung function values were better predictors of poor surgical outcome when expressed as a percentage of the predicted value. Using a threshold of pre-operative FEV1 of 47% pred resulted in the most useful positive and negative predictive probabilities, 0.90 and 0.67, respectively.
Lung function values expressed as a percentage of the predicted value are more useful predictors of post-operative outcome than absolute values. The threshold of predicted forced expiratory volume in one second for surgical intervention could be lower (45–50% pred) than is currently accepted without increased mortality.
Surgical resection remains the treatment of choice for anatomically resectable nonsmall cell lung cancer 1, 2, offering the best prospect of long-term survival. However, many patients have coexisting chronic airflow limitation 3, which is associated with an increased risk from surgery. Loss of lung tissue may grossly impair post-operative ventilatory function in such patients, predisposing to cardiopulmonary complications, including death. The British Thoracic Society (BTS) and American College of Chest Physicians guidelines 4, 5 outline criteria for investigating patients with borderline lung function. However, many of the studies on which the guidelines are based are relatively old, and often based on predominantly male patients who were significantly younger than patients currently being considered for surgery.
The latest BTS guidelines suggest that further investigation is unnecessary if a patient has a forced expiratory volume in one second (FEV1) of >2.0 L for pneumonectomy or >1.5 L for lobectomy. This is because studies have shown a mortality rate of <5% using these criteria. However, these figures are based mainly on studies involving males, and it is likely that lower values would be more appropriate for females. It is possible that a figure based on a value expressed as a percentage of the predicted value would be better still, since this would also take into account the patient's age, sex and height. At present, if spirometry shows the FEV1 to be less than the values given above, then the BTS guidelines recommend a sequence of investigations, consisting of ventilation/perfusion (V′/Q′) scintigraphy in order to estimate the post-operative FEV1 (ppo FEV1) and carbon monoxide diffusing capacity of the lung (DL,CO; ppo DL,CO) as a percentage of the predicted value. This is followed by an exercise test, either a shuttle walking test or a formal cardiopulmonary exercise test.
Many previous studies have shown predictive values for specific cut-off levels of FEV1 and DL,CO 6–11. A study performed by Pierce et al. 12 suggested that the product of ppo FEV1 and ppo DL,CO (both % pred) was the most useful predictor of survival from surgery. This value, termed the predicted post-operative product (PPP), incorporates baseline assessments of FEV1, DL,CO, perfusion scans and the proportion of lung to be resected in a single index. The benefit of the PPP is that it prevents patients being turned down on the basis of a single criterion. It would also be potentially applicable to both males and females, since it depends on the use of percentage predicted rather than absolute values for FEV1 and DL,CO. However, the original study by Pierce et al. 12 was relatively small (54 patients) and only included males. Further evaluation in a larger study including females was necessary in order to determine the wider applicability of the PPP, and to confirm the specific relationships between the PPP and probability of survival from surgery.
Therefore, the relationship between pulmonary function test results and surgical outcome was prospectively studied in a reasonably large cohort of currently typical resectable lung cancer patients at the cardiothoracic centre of Papworth Hospital (Cambridge, UK). In particular, the aim was to establish which of the following measurements best predicted poor surgical outcome: 1) pre-operative absolute FEV1; 2) pre-operative FEV1 (% pred); 3) pre-operative DL,CO (% pred); 4) ppo absolute FEV1; 5) ppo FEV1 (% pred); 6) ppo DL,CO (% pred); and 7) PPP. The aim was also to establish the lowest safe FEV1 and DL,CO for lung cancer operation.
MATERIALS AND METHODS
Study subjects
Between January 2001 and December 2003, 150 patients were assessed for curative surgery for lung cancer at Papworth Hospital. Of these, 32 were not referred for surgery and were excluded from the study. A further eight patients were deemed not resectable at the time of exploration (open and closed) and were not studied further. The remaining 110 patients made up the study sample. Written informed consent was given by all patients and the Huntington Ethics Committee (Fulbourn, UK) approved the study.
All patients underwent full lung function testing. Thirty-two (29%) showed borderline lung function (FEV1 <1.5 L for lobectomy and <2.0 L for pneumonectomy).
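The borderline criterion above can be sketched as a simple screen on absolute FEV1. This is a minimal illustration only: the function name and the example patient are hypothetical, and treating a value exactly at the threshold as not borderline is an assumption, since the text specifies only the strict inequalities.

```python
# FEV1 thresholds (litres) quoted in the text for each procedure
FEV1_THRESHOLD_L = {"lobectomy": 1.5, "pneumonectomy": 2.0}


def is_borderline(fev1_litres: float, procedure: str) -> bool:
    """Return True when absolute FEV1 is below the threshold for the procedure.

    Assumption: a value exactly equal to the threshold is not flagged.
    """
    return fev1_litres < FEV1_THRESHOLD_L[procedure]


# Hypothetical patient (not study data): the same FEV1 is adequate for
# lobectomy but borderline for pneumonectomy.
lobectomy_flag = is_borderline(1.8, "lobectomy")
pneumonectomy_flag = is_borderline(1.8, "pneumonectomy")
print(lobectomy_flag, pneumonectomy_flag)
```

Because the screen uses litres rather than percentage predicted, it takes no account of age, sex or height, which is the limitation the present study set out to examine.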
The patients also underwent perfusion scintigraphy in order to predict post-operative lung function. Formal cardiopulmonary exercise tests were performed on 105 patients in order to determine maximal oxygen uptake (V′O2,max) and, thus, predict operability. In order to determine the lowest safe operable lung function values, a lower threshold value of V′O2,max of 10 mL·kg body weight−1·min−1 was used. However, the final decision as regards operability was made by the surgeon.
Outcome assessment
The duration of hospital stay and outcome of surgery, including complication and mortality rates, were documented. The in-patient hospital stay was divided into periods of time in intensive care and time on the surgical ward. A complicated post-operative course was defined by the occurrence of any of the following: post-operative death, nonfatal myocardial infarction, heart failure, renal failure, respiratory failure, pulmonary embolism, septicaemia or pneumonia. However, since not all of these complications are irreversible or untreatable, not all of them would necessarily have precluded surgery if predicted in advance. Therefore, for the purpose of this analysis, in order to assess the ability of lung function test results to predict poor outcome, a more restricted definition of poor outcome, defined as post-operative death or respiratory failure, was used to identify patients who ideally should have been identified as high risk prior to surgery.
Full lung function test protocol
Spirometric assessment of patients was performed in the pulmonary function laboratory of Papworth Hospital according to Association of Respiratory Technology and Physiology (1999) guidelines 13. Patient height was formally assessed and compared with standard normal values 14.
FEV1 and FVC
The spirometer and recording system were calibrated daily. Measurements recorded for analysis were the best of three reproducible attempts. FEV1 was calculated from a record of forced vital capacity (FVC) performed on a wedge-bellows 12-s Vitalograph spirometer (Vitalograph, Ennis, Ireland).
Carbon monoxide diffusing capacity of the lung
The DL,CO and diffusion coefficient (DL,CO/alveolar volume) were estimated using the carbon monoxide single-breath technique.
After maximal expiration, the patients were asked to inhale quickly and as deeply as possible. The patient had to achieve ≥90% of vital capacity and inspire in <2 s. If the patient had an obstruction, the inhalation time was increased to 4 s. The patient held their breath for 10 s without straining. The test was performed at least twice with a time delay of 4 min between each attempt. At least two technically acceptable test results were obtained and the mean reported.
Lung volume
The patient was seated in a whole-body plethysmograph box (Masterscreen; Viasys Healthcare, Warwick, UK) for performing a slow vital capacity manoeuvre by exhaling until residual volume and then inspiring fully to total lung capacity before returning to normal tidal breathing. The test was repeated until three technically acceptable traces were obtained, and the mean of these was reported.
Cardiopulmonary exercise test (treadmill maximal oxygen uptake test)
The cardiopulmonary exercise test was performed using the Oxycon Pro (Viasys Healthcare) exercise system with the standardised exponential exercise protocol described by Northridge et al. 15, adapted to allow an additional 1-min warm-up period. Measurements made during this test included full metabolic gas exchange and ECG. The test lasted for a maximum of 20 min, of which the patient exercised for 16 min (4 min are taken up with baseline measurements and recovery). The patient was required to exercise for as long as possible until symptom restricted. The ECG was monitored, and oxygen and carbon dioxide in the expired air measured.
Quantitative perfusion scan protocols
All scintigraphy was performed in Papworth Hospital Nuclear Medicine Dept. An Elscint Apex SP4 single-headed camera (Elscint, Haifa, Israel) was used for the study. Patients were imaged in the erect position. The tests were supervised by qualified nuclear medicine technicians and reported by the chest radiologists.
Technetium-99m macroaggregated albumin (75 MBq) was used for the perfusion studies. An intravenous bolus was injected slowly, during which time the patient breathed deeply. Concurrent perfusion analysis using images for all eight views was interpreted subjectively. Relative perfusion analysis was performed on anterior and posterior images. The scan was reported for regions of interest with regard to relative perfusion of the right and left lungs divided into upper, middle and lower zones.
Predicted post-operative FEV1
The following equation was used to estimate the predicted post-lobectomy FEV1 using perfusion scan results 16: ppo FEV1 = pre-operative FEV1 × (1 − functional contribution of perfusion of the region to be resected). In order to estimate the predicted post-pneumonectomy FEV1, the equation was as follows: ppo FEV1 = pre-operative FEV1 × (1 − fractional perfusion of the lung to be resected). The PPP was also calculated as the product of ppo FEV1 and ppo DL,CO (both % pred) 12.
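As a rough illustration of these calculations, the following Python sketch scales a pre-operative value by the fraction of perfusion that remains after resection and forms the PPP as the product of ppo FEV1 and ppo DL,CO (both % pred). The function names and the example inputs are hypothetical and are not study data. Applying the same perfusion-based scaling to DL,CO is an assumption made here for illustration.

```python
def predicted_postoperative(preop_value: float, resected_perfusion_fraction: float) -> float:
    """Scale a pre-operative value by the perfusion fraction left after resection."""
    if not 0.0 <= resected_perfusion_fraction <= 1.0:
        raise ValueError("perfusion fraction must lie between 0 and 1")
    return preop_value * (1.0 - resected_perfusion_fraction)


def predicted_postoperative_product(ppo_fev1_pct: float, ppo_dlco_pct: float) -> float:
    """PPP: product of ppo FEV1 and ppo DL,CO, both expressed as % pred."""
    return ppo_fev1_pct * ppo_dlco_pct


# Hypothetical patient (illustrative numbers only): FEV1 80% pred,
# DL,CO 70% pred, with 25% of perfusion going to the tissue to be resected.
ppo_fev1_pct = predicted_postoperative(80.0, 0.25)
# Assumption: DL,CO is scaled by the same perfusion fraction as FEV1.
ppo_dlco_pct = predicted_postoperative(70.0, 0.25)
ppp = predicted_postoperative_product(ppo_fev1_pct, ppo_dlco_pct)
print(ppo_fev1_pct, ppo_dlco_pct, ppp)
```

The same function serves for lobectomy and pneumonectomy. Only the perfusion fraction differs, being regional in the first case and that of the whole lung in the second.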
Thoracotomy
Standard posterolateral thoracotomy was performed by one of three dedicated cardiothoracic surgeons. The routine surgical and anaesthetic procedure included single-lung ventilation using a double-lumen endobronchial tube during the operation. Standard post-operative physiotherapy was performed, involving breathing exercises and early ambulation.
Statistical analysis
Lung function measurements are summarised as mean±sd for each group. Associations between measures of lung function and outcome groups were assessed using an unpaired t-test. These associations were adjusted for operation type (pneumonectomy/lobectomy) using ANOVA, with outcome as a fixed effect and operation type as a covariate. Receiver-operating characteristic curves were plotted in order to assess the predictive value of lung function test results for poor outcome. Positive and negative predictive values were calculated for selected thresholds and using study estimates of the prevalence of poor outcome.
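The area under a receiver-operating characteristic curve can be read as the probability that a randomly chosen patient with a poor outcome has a lower lung function value than a randomly chosen patient with a satisfactory outcome. The sketch below computes it by direct pairwise comparison. It is an illustration under that interpretation, not the software used in the study, and the example values are hypothetical.

```python
def roc_auc(values_poor, values_good):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney form).

    Lower lung function is taken to indicate higher risk, so a pair is
    concordant when the poor-outcome patient has the lower value.
    """
    concordant = 0.0
    for p in values_poor:
        for g in values_good:
            if p < g:
                concordant += 1.0
            elif p == g:
                concordant += 0.5  # ties count as half a concordant pair
    return concordant / (len(values_poor) * len(values_good))


# Hypothetical FEV1 (% pred) values, not study data
poor = [45.0, 52.0, 60.0, 71.0]
good = [58.0, 66.0, 75.0, 82.0, 90.0]
auc = roc_auc(poor, good)
print(auc)
```

A value of 0.5 corresponds to the diagonal line of no predictive value, and a value of 1.0 to complete separation of the two outcome groups.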
RESULTS
All 110 patients completed the study. Of the 110 patients in the study group, 44 (40%) were female and 66 (60%) male. Their mean±sd age was 69±8 yrs (range 42–85 yrs). Thirty-six patients (33%) underwent pneumonectomy and the remaining patients underwent single lobectomy (59%), bilateral lobectomy (4%) or wedge resection (4%). The majority of patients had squamous cell carcinoma (44%), adenocarcinoma (34%) or nonsmall cell lung cancer (14%). Tumours were staged as follows: 1A (40%), 1B (23%), 2A (4%), 2B (17%), 3A (14%), and 3B (2%).
The baseline pulmonary function test results of the study group are recorded in table 1. The lowest safe operated pre-operative FEV1 and DL,CO were 1 L (37% pred) and 33% pred, respectively. The lowest operated post-operative values were ppo FEV1 0.6 L (22% pred), ppo DL,CO 21% pred and PPP 560. The mean V′O2,max was 18.3 mL·kg body weight−1·min−1 (range 10.2–34.5 mL·kg body weight−1·min−1).
Four (3%) patients died within 30 days of lung cancer surgery (1% lobectomy and 8% pneumonectomy). Twenty-four (22%) patients had a major complication, i.e. overall, 28 (25%) patients were considered to have had a complicated post-operative course. The major complications are recorded in table 2. The most common major complications were pneumonia (19%) and respiratory failure (10%).
Table 3 summarises the differences in lung function for patients with and without a complicated post-operative course. Pre-operative FEV1, pre-operative DL,CO, ppo DL,CO and ppo FEV1, all expressed as a percentage of the predicted value, were significantly associated with a complicated post-operative course. These results did not change substantially when adjusted for operation type (data not shown).
Fourteen (13%) patients had a poor operative outcome defined by 30-day mortality or respiratory failure. Table 4 summarises the differences in lung function for patients with satisfactory and poor outcomes. Patients with a poor outcome had a significantly lower pre-operative FEV1 (% pred), but this was not evident from absolute FEV1 measurements. DL,CO measurements were significantly lower for patients with a poor outcome, and ppo DL,CO (% pred), ppo FEV1 (% pred) and PPP were significantly associated with outcome.
Figure 1 shows the receiver-operating characteristic curves. The diagonal line indicates an area under the curve of 0.5, equivalent to the measurement having no predictive value. The area under the curve is greater for FEV1 expressed as a percentage of the predicted value than as an absolute value pre-operatively (0.73 versus 0.59), or as a predicted post-operative value by V′/Q′ scan (0.75 versus 0.59; fig. 1). As FEV1 measured in litres has a smaller area under the curve, no threshold was assessed for it. If pre-operative FEV1 is used to predict the outcome, and assuming a threshold of 65% for defining high risk for lung cancer surgery, the positive and negative predictive values are 93 and 29%, respectively. That is, the probability of a satisfactory outcome if FEV1 ≥65% is 0.93 and the probability of a poor outcome if FEV1 <65% is 0.29. The corresponding positive and negative predictive probabilities for a ppo FEV1 of <40% were 90 and 33%, respectively. If a new threshold of pre-operative FEV1 of 47% is used to define a high risk for lung cancer surgery, the probability of a satisfactory outcome if FEV1 >47% is 0.90 (94/104) and the probability of a poor outcome if FEV1 ≤47% is 0.67 (4/6).
The area under the curve for pre-operative DL,CO (% pred) was 0.70 and for ppo DL,CO (% pred) was 0.77 (fig. 1). For pre-operative DL,CO (% pred), the probability of a satisfactory outcome if DL,CO ≥45% is 0.90 (93/103) and the probability of a poor outcome if DL,CO <45% is 0.57 (4/7). Neither ppo FEV1 nor ppo DL,CO is better than the pre-operative value in predicting outcome (data not shown).
The PPP exhibited the greatest area under the curve at 0.79 (fig. 1). If PPP is used to predict outcome, and assuming a threshold of 1,600 for defining a high risk for lung cancer surgery, the probability of a satisfactory outcome if PPP ≥1,600 is 0.90 (87/96) and the probability of a poor outcome if PPP <1,600 is 0.35 (5/14). A new threshold of 760 results in the probability of a satisfactory outcome if PPP ≥760 of 0.89 (95/106) and of a poor outcome if PPP <760 of 0.75 (3/4).
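The predictive probabilities quoted for each threshold follow directly from the counts given in parentheses. The sketch below recomputes them and checks that each set of counts is consistent with the 110 patients and 14 poor outcomes reported. The variable and function names are illustrative. The counts are those stated in the text.

```python
N_PATIENTS = 110  # study sample size
N_POOR = 14       # patients with a poor outcome

# threshold: (satisfactory at or above threshold, total at or above threshold,
#             poor below threshold, total below threshold), as reported
reported_counts = {
    "pre-operative FEV1 47% pred": (94, 104, 4, 6),
    "pre-operative DL,CO 45% pred": (93, 103, 4, 7),
    "PPP 1,600": (87, 96, 5, 14),
    "PPP 760": (95, 106, 3, 4),
}


def predictive_probabilities(sat_above, n_above, poor_below, n_below):
    """Return P(satisfactory | above threshold) and P(poor | below threshold)."""
    return sat_above / n_above, poor_below / n_below


results = {}
for name, (sat_above, n_above, poor_below, n_below) in reported_counts.items():
    # Every patient falls on one side of the threshold or the other.
    assert n_above + n_below == N_PATIENTS
    # Poor outcomes above and below the threshold must account for all 14.
    assert (n_above - sat_above) + poor_below == N_POOR
    results[name] = predictive_probabilities(sat_above, n_above, poor_below, n_below)
    p_sat, p_poor = results[name]
    print(f"{name}: P(satisfactory | above) = {p_sat:.3f}, "
          f"P(poor | below) = {p_poor:.3f}")
```

Unrounded, 5/14 is approximately 0.357 and 95/106 approximately 0.896, so the values of 0.35 and 0.89 quoted above appear to have been truncated rather than rounded.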
DISCUSSION
Surgical resection remains the curative treatment of choice in patients with lung cancer. However, it has proved difficult to assess the lower limit of surgical tolerance. Several attempts have been made to determine these physiological limits 9, 17. Previous studies have shown predictive values for specific cut-off levels of baseline test results. Some have reported absolute values and others percentage predicted values. Three studies suggested that, if the pre-operative FEV1 is >1.5 L for lobectomy and >2.0 L for pneumonectomy, the mortality rates should be <5% 6, 18, 19. Alternatively, Putnam et al. 20 suggested that a pre-operative FEV1 of <65% pred is a significant risk factor.
There is substantial evidence that operative risk is related to absolute ppo FEV1 6–8, 12, 21–23. As a result of the findings of these investigators, it was recommended that a ppo FEV1 of >0.8 L is used as the cut-off for surgery. Others have found it more useful to use percentage predicted FEV1, recommending a threshold of >50% pred for FEV1 10 or >40% pred for ppo FEV1 9. For the latter, a value of <40% was associated with 50% mortality. They also found a ppo DL,CO of <40% pred was a good predictor of post-operative death. Another study recommended a lower cut-off threshold of 34% pred for ppo FEV1 20.
In the present study, it was found that the absolute values (pre-operative FEV1, FVC and ppo FEV1) did not predict surgical outcome defined by 30-day mortality and post-operative respiratory failure. In contrast, all the values corrected for age and sex (i.e. pre-operative FEV1, pre-operative DL,CO, ppo FEV1, ppo DL,CO and PPP, all expressed as a percentage of the normal predicted value) correlated significantly with both complicated post-operative course and poor surgical outcome. These findings reflect the fact that previous studies included a more homogeneous surgical population, consisting almost entirely of male patients of relatively young age. Since current surgical lung cancer patients are more likely to be elderly and/or female, it is no longer appropriate to use absolute lung function values. As such, future guidelines should adopt percentage predicted rather than absolute values.
In the present patients, using an FEV1 threshold of 65% pred, there was good positive prediction in that 93% of patients with values above this threshold had a satisfactory outcome from surgery. However, 71% of patients with values below this threshold also had a satisfactory outcome, suggesting that it may be too high. Similar predictive values were found for ppo FEV1 (% pred). Therefore, lower thresholds were established for pre-operative FEV1 (% pred), pre-operative DL,CO (% pred) and PPP that were optimum for the present sample. The thresholds of 47% pred for pre-operative FEV1, 45% pred for pre-operative DL,CO and 760 for PPP were optimum. It should be emphasised that these thresholds were established using retrospective analysis. It was found that the lowest safe pre-operative FEV1 was lower than previously reported. The lowest pre-operative FEV1 in the present sample was 1 L (37% pred) and the lowest pre-operative DL,CO was 33% pred. The lowest post-operative values (ppo FEV1 0.6 L (22% pred), ppo DL,CO 21% pred and PPP 560) were also much lower than the recommended levels. Despite this, the present 30-day mortality rate (3% for all thoracotomy, 1% for lobectomy and 8% for pneumonectomy) was lower than the BTS guidelines for lobectomy (4%), and in line with BTS guidelines for pneumonectomy (8%) and the open and closed rate (5–10%). This reflects recent improvements in staging and surgical techniques and perioperative care, and results should be interpreted in this context. The present authors would not routinely recommend such low cut-off values without patients undergoing extensive fitness assessment, discussion at multidisciplinary meetings and the involvement of surgeons who have the necessary expertise for dealing with difficult cases.
The present study has limitations in that, although it was a relatively large series, some of the negative predictive values were based on low numbers and were measured relatively imprecisely. Use of these thresholds should be tested in further larger series. In addition, the definition of poor outcome was limited to short-term post-surgery indices, and longer-term follow-up would be needed to confirm the longer-term predictive value of these cut-off values of lung function.
Since it was found that, with vigorous investigation, lower thresholds could be safely applied, it may be necessary to reconsider the surgical assessment guidelines for lung cancer patients. This would require close liaison between the referring chest physician and operating surgeons. If the present recommendations are confirmed in longer follow-up studies, such measures would hopefully improve surgical operation rates, cure rates and, hence, lung cancer mortality. For these reasons, using lower lung function thresholds would be beneficial.
In conclusion, the present authors recommend that percentage predicted rather than absolute lung function values be used in assessing patients for lung cancer surgery. The lower threshold of forced expiratory volume in one second for surgical intervention could be lower than is currently suggested, namely 45–50% of the predicted value, without increased mortality.
- Received June 28, 2004.
- Accepted November 26, 2004.
- © ERS Journals Ltd