Original Article
Cross-Validation of Item Selection and Scoring for the SF-12 Health Survey in Nine Countries: Results from the IQOLA Project

https://doi.org/10.1016/S0895-4356(98)00109-7

Abstract

Data from general population surveys (n = 1483 to 9151) in nine European countries (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) were analyzed to cross-validate the selection of questionnaire items for the SF-12 Health Survey and scoring algorithms for 12-item physical and mental component summary measures. In each country, multiple regression methods were used to select 12 SF-36 items that best reproduced the physical and mental health summary scores for the SF-36 Health Survey. Summary scores then were estimated with 12 items in three ways: using standard (U.S.-derived) SF-12 items and scoring algorithms; standard items and country-specific scoring; and country-specific sets of 12 items and scoring. Replication of the 36-item summary measures by the 12-item summary measures was then evaluated through comparison of mean scores and the strength of product-moment correlations.

Product-moment correlations between SF-36 summary measures and SF-12 summary measures (standard and country-specific) were very high, ranging from 0.94 to 0.96 for the physical summary measure and from 0.94 to 0.97 for the mental summary measure. Mean 36-item summary measures and comparable 12-item summary measures were within 0.0 to 1.5 points of each other (median = 0.5 points) in each country and were comparable across age groups.

Because of the high degree of correspondence between summary physical and mental health measures estimated using the SF-12 and SF-36, it appears that the SF-12 will prove to be a practical alternative to the SF-36 in these countries, for purposes of large group comparisons in which the focus is on overall physical and mental health outcomes.
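The item-selection step described in the abstract (multiple regression used to pick the items that best reproduce a summary score) can be illustrated with a minimal sketch. The abstract does not state the exact regression procedure, so greedy forward selection on R^2 is an assumption here, not the authors' method. The data are synthetic, the problem is scaled down to 10 candidate items and 3 selected items to keep it fast, and the names `solve`, `r_squared`, and `forward_select` are hypothetical helpers.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(items, y, cols):
    """R^2 of an ordinary least squares fit (with intercept) of y on the chosen item columns."""
    X = [[1.0] + [row[c] for c in cols] for row in items]
    k = len(cols) + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(X, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def forward_select(items, y, n_keep):
    """Greedily add the item that most increases R^2 until n_keep items are chosen."""
    chosen = []
    remaining = list(range(len(items[0])))
    while len(chosen) < n_keep:
        best = max(remaining, key=lambda c: r_squared(items, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Synthetic stand-in data (NOT survey data): 200 respondents, 10 items scored 1-5.
# The "long-form" summary score is a weighted sum of all items plus noise;
# the weights are invented so that the first three items dominate.
rng = random.Random(1)
weights = [5.0, 4.0, 3.0] + [0.5] * 7
items = [[float(rng.randint(1, 5)) for _ in weights] for _ in range(200)]
summary = [sum(w * v for w, v in zip(weights, row)) + rng.gauss(0, 1) for row in items]

chosen = forward_select(items, summary, n_keep=3)
print("selected item columns:", chosen)
print("R^2 with selected items:", round(r_squared(items, summary, chosen), 3))
```

In this toy setup the three heavily weighted items are the ones selected. A real analysis would use the observed item responses and would likely rely on a statistics package rather than hand-written normal equations.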

Introduction

Although the 36-item SF-36 Health Survey is a short-form measure, for some applications even a questionnaire with 36 questions is too lengthy. Large general population surveys may only have room for one page of questions about health. Questionnaires that include disease-specific measures may not have room for a generic measure of health status such as the SF-36. In addition, although the SF-36 can be completed in a relatively short amount of time (5 to 10 minutes on average), this may be too great a burden for some respondents. Therefore, use of a shorter form than the SF-36 is warranted in a number of instances.

Development of two summary measures from the SF-36 [1, 2, 3, 4, 5] suggested that it might be possible to develop a shorter survey which would reproduce the SF-36 physical and mental health summary measures with fewer items. Because the number of items in a survey is dependent on the number of dimensions for which scores are to be estimated, fewer questions are needed to calculate two summary scores than to calculate eight scale scores. Thus, the SF-12 Health Survey was originally developed in the United States to provide a shorter alternative to the SF-36, for use in large-scale health measurement and monitoring efforts in which a 36-item questionnaire was too lengthy and in which the focus was on overall physical and mental health outcomes [6, 7]. The SF-12 contains a subset of 12 items from the SF-36, including one or two items from each of the eight SF-36 scales (Figure 1). Two items are included from the Physical Functioning and Mental Health scales because these scales have been shown to best predict physical and mental health; two items each are also included from both Role Functioning scales, because these are relatively coarse scales. One item each is included from the remaining four scales. Information from all 12 items is used to construct physical and mental component summary measures (PCS-12 and MCS-12).

In the U.S. general population, the SF-12 items explained more than 90% of the variance in the SF-36 physical (PCS-36) and mental (MCS-36) summary measures [6]. In cross-validation with data from the Medical Outcomes Study, the PCS-36 and PCS-12 correlated 0.95 and the MCS-36 and MCS-12 correlated 0.97. Within the U.S. general population, mean PCS-36 and PCS-12 scores were within 1 point across subgroups differing in age and gender, and similar results were found in comparing the MCS-36 and MCS-12 [7]. Expected relationships between the SF-12 summary measures and clinical criteria were verified. Thus, in the United States the SF-12 reproduced the SF-36 summary measures with the same interpretations.
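The two agreement checks mentioned above, the product-moment correlation between long-form and short-form scores and the difference between their means, can be sketched as follows. This is an illustration on synthetic scores only: the mean of 50, SD of 10, and noise level are assumed values chosen for the example, and `pearson_r` and `mean_difference` are hypothetical helper names.

```python
import math
import random


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_difference(x, y):
    """Mean of x minus mean of y, in score points."""
    return sum(x) / len(x) - sum(y) / len(y)


# Synthetic stand-in data (NOT survey data): a long-form score and a
# short-form score that equals the long-form score plus random noise.
rng = random.Random(0)
long_form = [rng.gauss(50, 10) for _ in range(2000)]
short_form = [s + rng.gauss(0, 3) for s in long_form]

r = pearson_r(long_form, short_form)
diff = mean_difference(long_form, short_form)
print(f"r = {r:.3f}, mean difference = {diff:.2f} points")
```

The same two functions could be applied within subgroups (for example, age bands) to check whether agreement holds across them.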

The two-component model of physical and mental health has been replicated using SF-36 data from large general population samples in nine Western European countries studied to date (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) [8]. These findings supported the derivation and testing of SF-36-based physical and mental health summary measures in these countries [9]. In this study, we cross-validate the selection of questionnaire items for the SF-12 in these nine countries and examine how well the SF-12-based summary measures reproduce the SF-36-based summary measures. We also compare the use of country-specific versus standard (U.S.-derived) scoring algorithms for the SF-12 summary measures.

Data

Data come from 10 general population surveys, which have been described in detail elsewhere [10]. In brief, samples were selected to be nationally representative in nine countries (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, the United Kingdom, and the United States). Data from Sweden were collected through seven mail surveys conducted in various regions of Sweden to provide a broad cross-section of the population [11]. Self-administration of the SF-36 was used in all

Results

The 12 items selected in each European country to empirically reproduce the SF-36 summary measures largely agreed with the standard SF-12 items selected in the United States (Table 2; verbatim item content is provided elsewhere in this issue [14]). In 91 of 108 instances across the nine countries, the country-specific items were the same as in the standard U.S. SF-12. The same two Physical Functioning items, PF02 (moderate activities) and PF04 (several flights of stairs), were the two best

Discussion

In each of the nine European countries, there were substantial correlations between the summary measures scored from the SF-36 and SF-12 Health Surveys. Correlations were also substantial between scores based on three different estimation methods (standard items and scoring weights; standard items and country-specific scoring weights; and country-specific items and scoring weights). Mean scores were also very comparable across estimation methods. In addition, there was a high degree of
