A primary goal of scale development is to create a valid measure of an underlying construct [1]. In the context of health, outcomes are measured to assist decision making in patients’ clinical management. Outcome measures can be used to predict which patients will benefit from a particular intervention and to document whether a patient improves or declines over time or after an intervention is applied [2].
The development of a high-quality instrument involves multiple important steps. The first step is to develop a precise and detailed notion of the target construct [1]. Once the construct of interest is clearly defined, the next step is to create an item pool. The fundamental goal of this process is to systematically sample all content that may be relevant to the target construct. The logic is that psychometric analysis can identify poor-quality items that should be removed from the scale, but it cannot identify items that should have been included but were not [1]. After the initial item pool is complete, the items need to be pilot tested, and any items that prove to have practical limitations are removed. The remaining items are then tested in a larger sample. The clinimetric properties of the data are then analyzed to assess validity and reliability.
Two currently available methods for assessing instrument unidimensionality are Rasch analysis, based on a modern test theory approach, and factor analysis, based on a classical test theory approach. The advantages of Rasch analysis have been well documented in the literature [3].
The Rasch model was developed by Georg Rasch for the investigation of reading ability in 1952 [4]. The Rasch model is a probabilistic model that states that an item response is a result of an interaction between person ability (e.g., level of mobility) and item difficulty (e.g., difficulty of mobility task) [5]. If data fit the model, the scale is defined as being unidimensional.
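The interaction between person ability and item difficulty can be illustrated with the standard dichotomous form of the Rasch model, in which the probability of a positive response depends only on the difference between the two parameters on the logit scale. The following is a minimal sketch; the function name and the example values are illustrative assumptions and are not taken from any of the reviewed studies.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a positive response under the dichotomous Rasch model.

    Both arguments are expressed in logits. The name and signature are
    hypothetical, chosen for illustration only.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# When ability equals difficulty, success and failure are equally likely.
p_matched = rasch_probability(ability=0.0, difficulty=0.0)  # 0.5

# A person located 1 logit above the item is more likely than not to succeed.
p_above = rasch_probability(ability=1.0, difficulty=0.0)

# The same person facing a harder item (2 logits) is less likely to succeed.
p_below = rasch_probability(ability=1.0, difficulty=2.0)
```

Because only the difference between ability and difficulty enters the expression, persons and items can be located on a common scale.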
Alternatively, if data do not fit the model, this can be for a range of reasons. A common cause is that some instrument items measure another construct, and Rasch analysis allows these items to be identified. When scales are multidimensional, summing item scores may lead to misleading conclusions. The Barthel Index, for example, is a multidimensional scale: Rasch analysis has identified that it contains both mobility and continence items [6].
Rasch analysis allows estimation of the intervals between ordinal items. Fitting data to the Rasch model places item and person parameters on the same logit scale, transforming the raw score into a linear, interval-level measure [7] and facilitating interpretation of change scores. Some ordinal scales approximate interval scales, as the relationship between the raw score and the Rasch-converted measure is almost linear, but this relationship often deteriorates toward the extremes of the scale [8]. The clinical implication is that a given change in score on an ordinal scale can be more or less difficult to achieve depending on where on the scale the person started. Therefore, another advantage of Rasch analysis is that it provides clinicians with a more accurate level of measurement.
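The deterioration toward the extremes can be sketched numerically. Under the dichotomous Rasch model, the ability estimate for a given raw score is the logit value at which the expected raw score equals the observed one. The item difficulties below are hypothetical, and the simple bisection solver is an assumption made for illustration rather than the estimation routine of any particular software package.

```python
import math


def expected_raw_score(ability, difficulties):
    """Expected number of items passed at a given ability (in logits)."""
    return sum(1.0 / (1.0 + math.exp(-(ability - d))) for d in difficulties)


def ability_for_raw_score(raw, difficulties, lo=-10.0, hi=10.0, tol=1e-9):
    """Find the ability whose expected raw score equals `raw` by bisection.

    The expected raw score increases strictly with ability, so bisection
    converges. Extreme scores (0 or all items passed) have no finite
    estimate and are not handled here.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical difficulties (logits) for a 10-item scale.
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]

# Logit measure for each non-extreme raw score.
measures = {raw: ability_for_raw_score(raw, difficulties) for raw in range(1, 10)}

# A one-point raw gain covers a different distance in logits
# depending on where on the scale it occurs.
mid_step = measures[6] - measures[5]
end_step = measures[9] - measures[8]
```

With these assumed difficulties, `end_step` is larger than `mid_step`: one raw-score point near the top of the scale represents a greater change in the underlying measure than one point in the middle.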
Rasch analysis also assists in the development and refinement of scales by identifying persons whose ability is located above the hardest item or below the easiest item on the logit scale (i.e., identification of floor and ceiling effects). This is important because floor and ceiling effects are common in instruments that measure mobility [9]. In addition, the Rasch item hierarchy can identify items of similar difficulty (item clustering), and redundant items within a cluster can then be considered for removal. Item removal, however, must be approached with caution, as argued by Bohlig et al. [10].
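The identification of floor and ceiling effects on the logit scale can be sketched as a simple comparison of person measures against the easiest and hardest item locations. All values and names below are hypothetical and serve only to illustrate the idea.

```python
def floor_ceiling_proportions(person_measures, item_difficulties):
    """Proportion of persons located below the easiest or above the hardest item.

    All values are in logits on the common Rasch scale. The function name
    is a hypothetical helper, not part of any Rasch software.
    """
    easiest = min(item_difficulties)
    hardest = max(item_difficulties)
    n = len(person_measures)
    floor = sum(1 for p in person_measures if p < easiest) / n
    ceiling = sum(1 for p in person_measures if p > hardest) / n
    return {"floor": floor, "ceiling": ceiling}


# Hypothetical person measures and item difficulties (logits).
persons = [-3.1, -1.2, 0.0, 0.4, 1.1, 2.6, 3.0, 3.4]
items = [-2.0, -1.0, 0.0, 1.0, 2.5]

effects = floor_ceiling_proportions(persons, items)
# One of eight persons falls below the easiest item and three of eight
# fall above the hardest item in this made-up sample.
```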
Rasch analysis also facilitates the assessment of differential item functioning (DIF). DIF occurs when an item functions differently for persons of the same ability depending on another variable, such as age or gender. Assessment of DIF is important because it supports the generalizability of the instrument by testing that item response patterns are similar across, for example, different genders, age groups, or times of assessment. For instance, if men and women respond systematically differently to a particular item, this compromises the comparison of total scores on the construct of interest across genders. Although the ability to assess DIF is not unique to Rasch analysis, it is nevertheless a very useful feature with important clinical implications.
Rasch analysis also facilitates the investigation of item thresholds. In the RUMM2020 software (RUMM Laboratory, Perth, Western Australia, Australia) [11], for example, an item threshold is the point at which the probabilities of two adjacent response categories are equal for a particular level of person ability [5]. If the thresholds of an item are not in the expected order, the item is said to have disordered thresholds. For example, the response options of “unable,” “assistance,” “supervision,” or “independent” are common for walking items. There may be, for example, no level of person ability at which “supervision” is the most likely response. In that case, the “supervision” category can be identified as redundant and may be removed or combined with another response option. Instrument and item misfit can be caused by the existence of disordered thresholds [5].
Rasch analysis provides a method for establishing the construct validity of an instrument. Measuring the ability of a person on a construct such as mobility is difficult, as it can only be inferred by observing actions that are considered representative of the construct [12]. Although three-dimensional gait analysis is often considered the gold standard for measuring mobility [13], it has obvious limitations, such as the need for access to a gait laboratory and the impracticality of using it in everyday clinical practice. Therefore, Rasch analysis has a role in developing high-quality mobility outcome measures.
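The walking-item example can be sketched with the polytomous (partial credit) form of the Rasch model, in which each threshold marks the ability at which two adjacent categories are equally likely. The threshold values are hypothetical, and the parameterization is a common textbook form assumed here for illustration; it is not claimed to reproduce the RUMM2020 implementation.

```python
import math


def category_probabilities(ability, thresholds):
    """Response category probabilities for one polytomous Rasch item.

    `thresholds[k]` is the ability (logits) at which categories k and k+1
    are equally likely. Returns one probability per category.
    """
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (ability - tau))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def modal_categories(thresholds, grid):
    """Set of category indices that are most probable somewhere on the grid."""
    modal = set()
    for ability in grid:
        probs = category_probabilities(ability, thresholds)
        modal.add(probs.index(max(probs)))
    return modal


# Categories: 0 = unable, 1 = assistance, 2 = supervision, 3 = independent.
grid = [i / 10.0 for i in range(-60, 61)]  # abilities from -6 to 6 logits

ordered = modal_categories([-2.0, 0.0, 2.0], grid)     # hypothetical, ordered
disordered = modal_categories([-2.0, 1.0, 0.0], grid)  # hypothetical, disordered
```

With the ordered thresholds every category is the most likely response over some range of ability. With the disordered set, category 2 ("supervision") is never the most likely response, which is the pattern that would prompt removing or collapsing that category.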
It has been suggested that mobility is gained and lost in a hierarchical fashion [14], and therefore, this construct was considered well suited to Rasch analysis. Mobility is also a fundamentally important construct for the clinical practice of physiotherapists and is an important health indicator.
Although a systematic review of the application of Rasch analysis to rehabilitation outcome measures has previously been conducted [12], there has not been a review of the ways Rasch analysis has been applied in developing or refining health instruments. Therefore, the aims of this review were to identify the frequency with which Rasch analysis had been used in health instrument development or refinement and to identify the characteristics of Rasch application in the development or refinement of mobility scales.