Comparing methods of measurement: why plotting difference against standard method is misleading

doi:10.1016/S0140-6736(95)91748-9

The Lancet

Volume 346, Issue 8982, 21 October 1995, Pages 1085-1087

https://doi.org/10.1016/S0140-6736(95)91748-9

Abstract

Summary

When comparing a new method of measurement with a standard method, one of the things we want to know is whether the difference between the measurements by the two methods is related to the magnitude of the measurement. A plot of the difference against the standard measurement is sometimes suggested, but this will always appear to show a relation between difference and magnitude when there is none. A plot of the difference against the average of the standard and new measurements is unlikely to mislead in this way. We show this theoretically and by a practical example.
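The effect described in the summary can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes that both methods measure the same true value with independent errors of equal variance, so that by construction the difference is unrelated to magnitude. The function names (`pearson`, `simulate`) and all parameter values are hypothetical choices for illustration only.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n=5000, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate two methods with NO real relation between difference and magnitude.

    Assumption (ours): each method equals the true value plus independent
    normal error with the same standard deviation.
    """
    rng = random.Random(seed)
    truth = [rng.gauss(100.0, true_sd) for _ in range(n)]
    standard = [t + rng.gauss(0.0, error_sd) for t in truth]
    new = [t + rng.gauss(0.0, error_sd) for t in truth]

    difference = [b - a for a, b in zip(standard, new)]      # new minus standard
    average = [(a + b) / 2.0 for a, b in zip(standard, new)]

    return pearson(difference, standard), pearson(difference, average)


r_vs_standard, r_vs_average = simulate()
print("correlation of difference with standard:", round(r_vs_standard, 3))
print("correlation of difference with average: ", round(r_vs_average, 3))
```

Under these assumptions the standard measurement shares its own error term with the difference, so the first correlation comes out clearly negative even though no real relation exists, while the correlation with the average stays near zero. If the two methods had unequal error variances, the plot against the average could also show some correlation, which is consistent with the summary's wording that it is "unlikely" rather than unable to mislead.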


