- Research article
- Open Access
Is retrospective assessment of health-related quality of life valid?
BMC Musculoskeletal Disorders volume 21, Article number: 415 (2020)
Health-related quality of life (HRQoL) is a commonly used health outcome. For many acute conditions (e.g. fractures), retrospective measurement of HRQoL is necessary to establish pre-morbid health status. However, the validity of retrospective measurement of HRQoL following an intervening significant health event has not been established. The aim of this study was to test the validity of retrospective measurement (recall) of HRQoL by using a test-retest design to measure reliability and agreement between prospective and retrospective patient-reported HRQoL before and after an intervening health event (elective orthopaedic surgery).
Participants were recruited from the pre-admission clinic of a metropolitan hospital. Participants were assessed for their HRQoL using the EQ-5D-5L at two time-points: prospectively at 2 weeks prior to their date of surgery and then retrospectively (recalling their pre-operative health) following elective hip or knee joint replacement surgery. Prospective measurements were compared with retrospective measurements using intra-class correlation for the five domain scores (nominal data) and Pearson’s correlation for the EQ-Index score and EQ-Visual Analogue Scale (VAS) score (continuous data). Agreement was tested in continuous variables using Lin’s coefficient of concordance (pc) and Bland-Altman plots.
One hundred seventy-four patients consented to participate. Eighty-eight paired prospective and retrospective scores were collected, with a median between-test period of 15 days. At a group level, the prospective measurements were similar to the retrospective measurements; the modes and means of the five domain scores were not different and the mean differences (MD) between the scores for EQ-Index (MD = 0.02, on a scale of 0–1) and EQ-VAS (MD = 0.53, on a scale of 0–100) were negligible. However, the correlation of paired scores varied; the range of domain score correlations was 0.52 to 0.74, the concordance was substantial for the EQ-Index scores (pc = 0.76, 95% CI = 0.66, 0.84) and moderate for the EQ-VAS scores (pc = 0.46, 95% CI = 0.28, 0.61).
Agreement between prospective and retrospective measurements was high at a group level and moderate to substantial at an individual level. Retrospective measurement of HRQoL using the EQ-5D-5L in an orthopaedic clinical context is a valid alternative to using reference data to estimate baseline or pre-morbid health status.
Health-related quality of life (HRQoL) is an assessment and expression of subjective well-being that includes emotional, social and physical aspects. HRQoL can be used for health economic purposes and as an outcome measure in the treatment of health conditions. HRQoL is measured at an individual level; it is not directly observable and is only deducible from patients’ responses to questionnaires.
The EuroQol instruments are non-disease-specific, multi-dimensional tools for measuring HRQoL. The most widely applied is the EQ-5D, of which the five-level version (EQ-5D-5L) is the most recent. The EQ-5D-5L produces both categorical data (five dimensions) and continuous data (utility index score and visual analogue scale, VAS, score). The five descriptive dimensions are combined according to a scoring algorithm to produce a quantitative index score (EQ-Index) that can be compared to normative values. The index scores range from 1 (full health) to 0 (equivalent to death), with scores less than 0 defined as health states worse than death. Population norms for EQ-Index scores have been reported for various populations around the world but none are available for Australia. The EQ-5D VAS can be used as an outcome in itself or to help weight the index scores.
The EQ-5D is commonly used and has been widely compared with other instruments and validated in musculoskeletal health-specific contexts. A recent systematic review demonstrated good reliability and validity and moderate responsiveness. Two studies using cohorts of patients being treated for carpal tunnel syndrome found excellent reliability, strong validity and minimal bias for age and gender. The properties of the EQ-5D have also been tested in musculoskeletal health-specific contexts such as proximal humeral fractures, adolescent idiopathic scoliosis, rheumatoid arthritis [8,9,10], lower back surgery and knee osteoarthritis [12, 13] with similar outcomes reported.
The EQ-5D is only recommended for prospective use. The EuroQol instruments are framed in the present tense and there are no retrospective versions available. In a clinical context, unless the health condition is foreseeable, the clinician or health researcher does not have a baseline measure of the patient’s pre-morbid health state. Normative population-level, non-disease-specific data can be used, or baseline health status can be measured retrospectively. However, retrospective measurement is susceptible to recall bias. In a health economics context, the retrospective assessment of health status, gathered for the purpose of estimating healthy life expectancy, has been shown to closely approximate estimates based on prospective health information. Numerous studies have investigated the test-retest reliability of the EQ-5D [15,16,17,18,19,20] and other studies have supported the use of retrospective measurement of baseline data following a health event or an intervention [21,22,23], and retrospective measurement in preference to using population norms in determining baseline health status [24, 25]. However, the validity of the retrospective use of the EQ-5D has not been established.
The primary aim of this study was to use a significant health event, elective hip or knee joint replacement surgery, to determine the validity of the retrospective measurement of self-reported health status using the EQ-5D-5L, comparing it to prospective measurement.
Recruitment and data collection
This study was designed as a test-retest reliability study with the test being the prospective (contemporaneous) measure of pre-operative health and the retest being the retrospective (recalled) measure of pre-operative health using the EQ-5D, recorded after a major intervening health event. The tests were conducted 2 weeks apart, with elective hip or knee joint replacement surgery occurring in between. The study was conducted at The Sutherland Hospital, Sydney, Australia. Ethics approval was granted by the hospital ethics committee (LNR/17/POWH/384) and permission was granted by EuroQol for the use of the EQ-5D-5L in this study.
Patients were screened through the elective surgery waitlist and recruited in-person from the pre-admission clinic. Patients were eligible for inclusion if they were (a) aged 18 years and older, and (b) presented for an elective hip or knee arthroplasty (e.g. total hip replacement, or knee replacement including bilateral and unicompartmental knee replacement). Patients were excluded if they were (a) unable to consent because they were cognitively impaired or not proficient in English, (b) unable to be contacted by telephone because they were hearing impaired or did not own a phone, or (c) unsuitable to be assessed prospectively because their planned date of operation was within 10 days of presentation or they planned on being overseas during the prospective period. At recruitment, participants were shown the format of the EQ-5D-5L and were given a hardcopy as part of the patient information consent form, so that they were familiar with the question format for the telephone interview due to occur 2 weeks prior to planned surgery.
Participant data were collected relating to identifying information (name, date of birth, sex and medical record number), contact details (primary and secondary phone numbers), eligibility status, participation status (consented, declined or missed), primary language, operation type, important dates (pre-admission clinic, planned operation and actual operation), and the prospective and retrospective EQ-5D-5L (date of administration and scores). The prospective measurement was conducted by telephone 2 weeks pre-operatively by an investigator (AT). Participants completed the unmodified EQ-5D-5L by telephone with respect to their current, pre-operative health status.
The retrospective measure was administered verbally and in person at the hospital post-operatively (AL, AP and MK), prior to the participant’s discharge from hospital. Participants were asked to recall their pre-treatment health status as it was 2 weeks prior to their surgery using a modified (past tense) EQ-5D-5L questionnaire.
A sample size of 77 was determined using Zou’s (2012) sample size calculation incorporating assurance probability. A sample size of 77 ensured a 90% assurance probability given the half width of a 95% two-sided confidence interval for a correlation or concordance of 0.8.
The retrospective EQ-5D-5L measurements were compared with prospective measurements. The EQ-5D-5L produces seven separate outcomes (five domain scores, an index score and a VAS score); each prospective outcome was paired with its retrospective outcome and tested for correlation. The five domain scores were assessed for intra-class correlation (ICC). The domain scores were then converted to an index score using a scoring algorithm estimated from a sample of the United Kingdom (UK) adult general population, using an EQ-5D-5L calculator. The calculator was developed by Sheffield Hallam University on behalf of The Chartered Society of Physiotherapy (United Kingdom) in 2011. The purpose of the tool is to enable illustration of change in quality of life as a result of physiotherapy interventions. However, in this instance, we used the tool to pair prospective and retrospective HRQoL scores and to display the distribution of differences in domain scores. The prospective index scores and VAS scores were correlated with the retrospective scores. Agreement was assessed for the continuous variables (EQ-Index and EQ-VAS scores). Bland-Altman plots were made with 95% limits of agreement (LOA) and Lin’s concordance correlation coefficient (pc) was calculated. To interpret the results for correlation and for agreement, we used benchmarks defined by Cicchetti (1994), whereby 0.21–0.40 represented ‘fair’, 0.41–0.60 was ‘moderate’, 0.61–0.80 was ‘substantial’ and 0.81–1.00 was ‘almost perfect’ reliability.
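The agreement statistics described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and the example scores are hypothetical, the concordance coefficient uses population (1/n) moments as in Lin's original definition, the limits of agreement are assumed to be the conventional mean difference plus or minus 1.96 standard deviations, and differences are taken as retrospective minus prospective (the direction used in the study is not stated).

```python
from statistics import fmean, stdev


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    mx, my = fmean(x), fmean(y)
    # Population (1/n) variances and covariance (assumption: Lin's original form).
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def limits_of_agreement(x, y, z=1.96):
    """Return (mean difference, lower limit, upper limit) for y - x."""
    d = [b - a for a, b in zip(x, y)]
    md = fmean(d)
    sd = stdev(d)  # sample SD of the paired differences
    return md, md - z * sd, md + z * sd


def benchmark(value):
    """Label a coefficient using the cut-offs quoted in the text."""
    if value > 0.80:
        return "almost perfect"
    if value > 0.60:
        return "substantial"
    if value > 0.40:
        return "moderate"
    if value > 0.20:
        return "fair"
    return "below the range labelled in the text"


# Hypothetical EQ-Index scores for five participants (not study data).
prospective = [0.60, 0.72, 0.45, 0.81, 0.55]
retrospective = [0.58, 0.70, 0.52, 0.78, 0.60]

pc = lins_ccc(prospective, retrospective)
md, lower, upper = limits_of_agreement(prospective, retrospective)
print(f"pc = {pc:.2f} ({benchmark(pc)}); MD = {md:.3f}, LOA = {lower:.3f} to {upper:.3f}")
```

Applied to the coefficients reported in this study, `benchmark(0.76)` gives "substantial" and `benchmark(0.46)` gives "moderate", matching the labels used in the results.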
Three hundred fifty-five patients attended the pre-admission clinic for joint replacement surgery at our facility in the period from 27 September 2017 to 28 September 2018 and 291 patients were screened for eligibility, of which 78 were ineligible, 39 declined to participate and 174 consented to participate. Prospective outcomes were gathered on 144 participants and retrospective outcomes were gathered on 104 participants. Both prospective and retrospective outcomes were gathered for 88 participants. The median time between tests was 15 days, the range was 3 to 64 days and the mean follow-up was 19 days.
The main reason for missing prospective measurements on recruited participants at the pre-operative stage was that the participant could not be contacted by their primary or secondary telephone number after five attempts on consecutive weekdays. The main reason for missing retrospective measurements on post-operative participants was that the patient’s date of surgery was changed or cancelled after their attendance at pre-admission clinic. Other reasons included that the patient required an escalation in their medical treatment after surgery (e.g. critical care admission) or they were discharged from the hospital before their follow-up. Participant flow is shown in Fig. 1.
The demographic characteristics of the study sample are described in Table 1. The mean age of participants was similar to non-participants. There were slightly more female participants (51%) than male participants and there were more knee replacements (67%) than hip replacements (33%) in this sample. English was the predominant first language for both participants and non-participants but, given that English-proficiency was an inclusion criterion for participation, the proportion of first languages other than English was higher in the population (26%) than in the study sample (8%).
Comparisons of the nominal data are displayed in Table 2. The medians and the modes for each of the five dimensions remained unchanged between the prospective and the retrospective measurements, indicating negligible difference between prospective and retrospective measurements at a group level.
The distribution of difference from prospective to retrospective measurements in each of the five dimensions is illustrated in Appendix 1. The distribution of change provides an indication of the magnitude and direction of difference in scores in each dimension from prospective to retrospective measurements. The range of agreement in paired prospective and retrospective scores was 48% (usual activities) to 65% (mobility).
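The percentage agreement and the distribution of differences for a single dimension can be computed as in the following sketch. The function names and the paired levels are hypothetical and are not study data; each dimension is assumed to be coded as an integer level from 1 to 5, and differences are taken as retrospective minus prospective.

```python
from collections import Counter


def exact_agreement(prospective, retrospective):
    """Proportion of pairs with identical prospective and retrospective levels."""
    if len(prospective) != len(retrospective) or not prospective:
        raise ValueError("inputs must be non-empty and of equal length")
    same = sum(1 for p, r in zip(prospective, retrospective) if p == r)
    return same / len(prospective)


def difference_distribution(prospective, retrospective):
    """Count of each signed difference (retrospective minus prospective)."""
    return Counter(r - p for p, r in zip(prospective, retrospective))


# Hypothetical levels (1-5) on one dimension for eight participants.
prospective = [2, 3, 3, 4, 2, 1, 3, 5]
retrospective = [2, 3, 4, 4, 1, 1, 3, 4]

print(exact_agreement(prospective, retrospective))
print(sorted(difference_distribution(prospective, retrospective).items()))
```

A distribution centred on zero with similar counts on either side would suggest non-directional differences, whereas a skew to one side would suggest systematic over- or under-estimation on recall.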
The results gathered from each of the five EQ dimensions are presented in 5 × 5 tables (Appendix 2). Intraclass correlation coefficient (ICC) scores were reported for each of the five domains. Correlation was moderate for usual activities (r = 0.52) and anxiety/depression (r = 0.54) and was substantial for mobility (r = 0.74), personal care (r = 0.62) and pain/discomfort (r = 0.65).
The mean differences (MDs) for each of the continuous variables were negligible: 0.53 on a scale of 0 to 100 for the EQ-VAS and 0.02 on a scale of 0 to 1 for the EQ-Index. This indicated strong agreement between prospective and retrospective measures at a group level. However, the Bland-Altman plots and Lin’s concordance (pc) for paired EQ-VAS and EQ-Index scores indicated that agreement at an individual level was lower. The Bland-Altman plots for EQ-VAS (Fig. 2) and EQ-Index (Fig. 3) scores show that the 95% LOA around the MD were −32.34 to 33.38 on a scale of 0–100 for the EQ-VAS and −0.31 to 0.36 on a scale of 0–1 for the EQ-Index. Agreement was moderate for EQ-VAS (pc = 0.46, 95% CI = 0.28, 0.61) and substantial for EQ-Index scores (pc = 0.76, 95% CI = 0.66, 0.84) (Table 3).
The results of this study showed that retrospective measurement of HRQoL produced equivalent results to prospective measurement at a group level but agreement was lower at an individual level. We could find no directly comparable studies but numerous recent studies investigating test-retest reliability of the EQ-5D-5L tool (not using recall of health status) have reported similar differences between the two testing periods (Table 4). A 2018 study surveyed a cross-section of the Indonesian general adult population with a mean between-test period of 17 days. The findings were similar to our findings in that the group-level correlations were high for the five dimensions (ICC = 0.85–0.99) but that agreement for the EQ-VAS and EQ-Index scores was much lower (pc = 0.45 and 0.37 respectively). An English study examined the test-retest reliability of the EQ-5D-3L in the UK general population and reported an ICC of 0.83 in the EQ-Index scores. A Chinese study of carers of cancer patients reported high test-retest reliability for EQ-VAS and EQ-Index scores of 0.87 and 0.99 respectively. However, the follow-up between test and retest in that particular study was only 1 day, meaning the results were likely to be heavily influenced by recall bias compared with the results of our study. Other studies with similar follow-up periods to our study have reported test-retest ICCs for EQ-Index scores of 0.78, 0.70 and 0.75. These results would lead us to suggest that retrospective measurement of baseline health status produces similar test-retest reliability compared to studies measuring current (prospective) health status at a similar interval.
Similar findings have been reported in the context of testing the validity of different modes of administration of patient-reported measures. In a 2017 study, the equivalence of patient-completed and telephone interview modes of administration of the EQ-5D-5L was tested in a cohort of orthopaedic patients. Equivalence was established according to the minimum important difference for the index and VAS scales but correlations between paired domain scores were similar to or lower than those reported in our study. Another study from 2014 tested the correlation of mail and telephone administration of the Oxford hip and knee scores in an orthopaedic cohort, showing that the two different modes of administration produced equivalent results at a group level but that agreement was low at an individual level.
Both prospective and retrospective measurements of difference in self-reported health status are susceptible to bias. Prospective measurements are subject to scale recalibration, a changed conceptualisation of the answer scale secondary to changed internal standards of construct interpretation and judgement from pre-test to post-test. Significant intervening health events can exaggerate this bias by shifting the frame of reference and catalysing a revaluation of prior and present health status. In the case of this study, the retrospective measurement might have been biased by factors such as general anaesthesia, the trauma of surgery or post-operative levels of pain. Schwartz & Sprangers (1999) suggest that this scale recalibration makes the retrospective measurement of change in self-reported health status more appropriate than the prospective measurement of change in self-reported health status because both recalled and current health status are evaluated using a consistent internal standard of construct interpretation and judgement.
Alternatively, retrospective measurement may be subject to recall bias, the incorrect self-assessment of former health status due to inaccurate or incomplete recollection. Recall bias occurs because recollection is a reconstructive and inferential process that is subject to errors, losses, distortions and psychological processes in the present state and over time. Bias results in underestimations or overestimations of former health status which may be non-directional, occurring by chance and cancelling out on average, or directional, consistently occurring and producing a unidirectional error. In this study, recall might have been biased by general anaesthesia, the trauma of surgery or the side effects of analgesia. To reduce the chance of recall bias, retrospective measures were taken as close to the patient’s discharge from hospital as possible. Given that our results were very similar to other test-retest studies that did not involve an intervening health event and that had similar time between test and retest, it is unlikely that our results were affected significantly by scale recalibration or recall bias. That said, though not the aim of the study, the use of a control group that was not exposed to an intervening health event would have provided evidence either way about the presence of such bias.
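To give a sense of what these limits imply for an individual, the reported limits of agreement can be converted back into an approximate standard deviation of the paired differences. The sketch below assumes the limits were computed in the conventional way, as the mean difference plus or minus 1.96 standard deviations; the function names are hypothetical and the inputs are the rounded values reported above.

```python
def sd_from_limits(lower, upper, z=1.96):
    """SD of paired differences implied by limits of agreement (assumes MD +/- z*SD)."""
    return (upper - lower) / (2 * z)


def centre_of_limits(lower, upper):
    """Midpoint of the limits, which should equal the mean difference."""
    return (lower + upper) / 2


# Limits of agreement as reported in the text.
vas_sd = sd_from_limits(-32.34, 33.38)
vas_centre = centre_of_limits(-32.34, 33.38)
index_sd = sd_from_limits(-0.31, 0.36)
index_centre = centre_of_limits(-0.31, 0.36)

print(f"EQ-VAS: implied SD = {vas_sd:.2f}, centre = {vas_centre:.2f}")
print(f"EQ-Index: implied SD = {index_sd:.3f}, centre = {index_centre:.3f}")
```

Under that assumption the implied standard deviations are about 16.8 points for the EQ-VAS and about 0.17 for the EQ-Index, and the midpoints (0.52 and 0.025) agree with the reported mean differences of 0.53 and 0.02 to within rounding. In other words, the group means almost coincide, yet a single participant's recalled EQ-VAS could plausibly differ from the prospective value by around 30 points in either direction.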
Strengths and limitations
The aim of this study was to investigate the validity of the retrospective use of HRQoL. Many validation studies test for reliability using correlation but do not necessarily test for agreement using concordance and this can be problematic in the context of longitudinal studies. Correlation measures the strength of the linear relationship between two measures but does not measure the equality between paired sets of values. This study used Bland-Altman plots and assessment of concordance to investigate agreement between prospective and retrospective measures.
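The distinction between correlation and agreement can be made explicit with the standard decomposition of Lin's concordance coefficient, written pc in the text and ρ_c below. The notation is introduced here for illustration only: x and y denote the prospective and retrospective scores, with means μ, standard deviations σ, covariance σ_xy and Pearson correlation ρ.

```latex
\rho_c \;=\; \frac{2\sigma_{xy}}{\sigma_x^{2} + \sigma_y^{2} + (\mu_x - \mu_y)^{2}}
\;=\; \rho \, C_b ,
\qquad
C_b \;=\; \frac{2}{v + \dfrac{1}{v} + u^{2}},
\quad
v = \frac{\sigma_x}{\sigma_y},
\quad
u = \frac{\mu_x - \mu_y}{\sqrt{\sigma_x \sigma_y}} .
```

Because C_b is at most 1, concordance can never exceed the Pearson correlation. It equals the correlation only when the two sets of scores have the same mean and the same spread, so a systematic shift or a change in scale between prospective and retrospective scores lowers ρ_c even when ρ is high.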
The main limitation of this study was the low follow-up rate of participants. The investigators made efforts to maximise the follow-up rate by excluding patients who were less likely to be contactable by phone and by tracking patient journeys in the electronic medical record. The first EQ-5D-5L questionnaire was administered via telephone and the second EQ-5D-5L questionnaire was administered face-to-face which increased the risk of detection bias. The risk of bias was minimised by investigators using standardised explanations and delivering both surveys verbally. Another limitation was the lack of a control group which meant that the risk of scale recalibration or recall bias could not be assessed.
The validity of measuring HRQoL retrospectively has not previously been assessed. Our results indicate that retrospective measurement of HRQoL using the EQ-5D-5L in an elective orthopaedic clinical context provides results that are almost equivalent to prospective measurement at a group level but not at an individual level. These results are similar to the results of studies investigating test-retest reliability of EQ-5D-5L, suggesting that the retrospective measurement of HRQoL to estimate pre-morbid health status is valid.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
EQ-5D-5L: EuroQol 5-dimension 5-level
EQ-Index: EuroQol-5D health utilities Index
EQ-VAS: EuroQol-Visual Analogue Scale
HRQoL: Health-related quality of life
LOA: Limits of agreement
pc: Lin’s coefficient of concordance
Blome C, Augustin M. Measuring change in quality of life: bias in prospective and retrospective evaluation. Value Health. 2015;18(1):110–5.
McCaffrey N, Kaambwa B, Currow DC, Ratcliffe J. Health-related quality of life measured using the EQ-5D-5L: south Australian population norms. Health Qual Life Outcomes. 2016;14(1):133.
Grobet C, Marks M, Tecklenburg L, et al. Application and measurement properties of EQ-5D to measure quality of life in patients with upper extremity orthopaedic disorders: a systematic literature review. Arch Orthop Trauma Surg. 2018;138(7):953–61.
Marti C, Hensler S, Herren DB, Niedermann K, Marks M. Measurement properties of the EuroQoL EQ-5D-5L to assess quality of life in patients undergoing carpal tunnel release. J Hand Surg Eur Vol. 2016;41:957–62.
Nazari G, MacDermid JC, Bain J, et al. Estimation of health-related-quality of life depends on which utility measure is selected for patients with carpal tunnel syndrome. J Hand Ther. 2017;30:299.
Slobogean GP, Noonan VK, O’Brien PJ. The reliability and validity of the disabilities of arm, shoulder, and hand, EuroQol-5D, health utilities index, and short form-6D outcome instruments in patients with proximal humeral fractures. J Shoulder Elb Surg. 2010;19(3):342–8.
Adobor RD, Rimeslatten S, Keller A, Brox JI. Repeatability, reliability, and concurrent validity of the Scoliosis Research Society-22 questionnaire and EuroQol in patients with adolescent idiopathic scoliosis. Spine. 2010;35:206–9.
Linde L, Sorensen J, Ostergaard M, Horslev-Petersen K, Hetland ML. Health-related quality of life: validity, reliability, and responsiveness of SF-36, 15D, EQ-5D (corrected) RAQoL, and HAQ in patients with rheumatoid arthritis. J Rheumatol. 2008;35:1528–37.
Luo N, Chew LH, Fong KY, et al. A comparison of the EuroQol-5D and the health utilities index mark 3 in patients with rheumatic disease. J Rheumatol. 2003;30:2268–74.
Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: validity, responsiveness and reliability of EuroQol (EQ-5D). Br J Rheumatol. 1997;36:551–9.
Solberg TK, Olsen JA, Ingebrigtsen T, Hofoss D, Nygaard OP. Health-related quality of life assessment by the EuroQol-5D can provide cost-utility data in the field of low-back surgery. Eur Spine J. 2005;14(10):1000–7.
Brazier JE, Harper R, Munro JF, Walters SJ, Snaith ML. Generic and condition-specific outcome measures for people with osteoarthritis of the knee. Rheumatology. 1999;38:870–7.
Fransen M, Edmonds J. Reliability and validity of the EuroQol in patients with osteoarthritis of the knee. Rheumatology (Oxford). 1999;38:807–13.
Molla MT, Lubitz J. Retrospective information on health status and its application for population health measures. [Erratum appears in Demography. 2008;45(3):preceding 489]. Demography. 2008;45(1):115–28.
Purba FD, Hunfeld JAM, Iskandarsyah A, Fitriana TS, Sadarjoen SS, Passchier J, Busschbach JJV. Quality of life of the Indonesian general population: test-retest reliability and population norms of the EQ-5D-5L and WHOQOL-BREF. PLoS One. 2018;13(5):1–20.
Al-Janabi H, Flynn TN, Peters TJ, et al. Test-retest reliability of capability measurement in the UK general population. Health Econ. 2015;24(5):625–30.
Li L, Liu C, Cai X, Yu H, Zeng X, Sui M, Zheng E, Yang Li Y, Xu J, Zhou J, Huang W. Validity and reliability of the EQ-5D-5L in family caregivers of leukemia patients. BMC Cancer. 2019;19:522.
Cheung PWH, Wong CKH, Samartzis D, Luk KDK, Lam CLK, Cheung KMC, Cheung JPY. Psychometric validation of the EuroQoL 5-dimension 5-level (EQ-5D-5L) in Chinese patients with adolescent idiopathic scoliosis. Scoliosis Spinal Disord. 2016;11:19.
Pattanaphesaj J, Thavorncharoensap M. Measurement properties of the EQ-5D-5L compared to EQ-5D-3L in the Thai diabetes patients. Health Qual Life Outcomes. 2015;13:14.
Kim TH, Jo MW, Lee SI, et al. Psychometric properties of the EQ-5D-5L in the general population of South Korea. Qual Life Res. 2013;22:2245–53.
Kreulen GJ, Stommel M, Gutek BA, Burns LR, Braden CJ. Utility of retrospective pretest ratings of patient satisfaction with health status. Res Nurs Health. 2002;25(3):233–41.
Lamb T. The retrospective pretest: an imperfect but useful tool. Evaluation Exchange. 2005;11(2):18–9.
Middel B, de Greef M, de Jongste MJ, Crijns HJ, Stewart R, van den Heuvel WJ. Why don't we ask patients with coronary heart disease directly how much they have changed after treatment? J Cardiopulm Rehabil. 2002;22(1):47–52.
Wilson R, Derrett S, Hansen P, Langley J. Retrospective evaluation versus population norms for the measurement of baseline health status. Health Qual Life Outcomes. 2012;10:68.
Watson WL, Ozanne-Smith J, Richardson J. Retrospective baseline measurement of self-reported health status and health-related quality of life versus population norms in the evaluation of post-injury losses. Inj Prev. 2007;13(1):45–50.
Zou GY. Sample size formulas for estimating intraclass correlation coefficients with precision and assurance. Stat Med. 2012;31(29):3972–81.
Sheffield Hallam University on behalf of The Chartered Society of Physiotherapy (UK). EQ-5D-5L - Calculator and explanation. 2011. Available online at https://www.csp.org.uk/documents/eq-5d-5l-calculator-and-explanation.
Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6(4):284–90.
Chatterji R, Naylor JM, Harris IA, Armstrong E, Davidson E, Ekmejian R, et al. An equivalence study: are patient-completed and telephone interview equivalent modes of administration for the EuroQol survey? Health Qual Life Outcomes. 2017;15(1):18.
Marena AM, et al. Mail Versus Telephone Administration of the Oxford Knee and Hip Scores. J Arthroplasty. 2014;29(3):491–4.
McPhail S, Haines T. Response shift, recall bias and their effect on measuring change in health-related quality of life amongst older hospital patients. Health Qual Life Outcomes. 2010;8:65.
Schwartz CE, Sprangers MA. Methodological approaches for assessing response shift in longitudinal health-related quality-of-life research. Soc Sci Med. 1999;48(11):1531–48.
Berchtold A. Test–retest: agreement or reliability? Methodological Innovations. 2016;9(15):1–7.
We gratefully acknowledge the contributions made by Ms. Ashley Pitcher (AP) and Mr. Matt Khalaf (MK), physiotherapists at The Sutherland Hospital to participant recruitment and data collection of retrospective measures.
We also acknowledge using the EQ-5D-5L calculator, developed by Sheffield Hallam University on behalf of The Chartered Society of Physiotherapy (UK) and available online at https://www.csp.org.uk/documents/eq-5d-5l-calculator-and-explanation. The purpose of the calculator is to enable illustration of change in quality of life as a result of physiotherapy intervention. However, in this study, the tool was used to assess agreement between prospective and retrospective measures of HRQoL.
This research received no external funding.
Ethics approval and consent to participate
Ethical approval for this study was sought and received from the South Eastern Sydney Local Health District (SESLHD) Human Research Ethics Committee in 2017. The reference number is 17/183 (LNR/17/POWH/384).
All study participants provided informed consent. Signed consent forms are stored with the administering institution.
Consent for publication
The authors declare that they have no competing interests.
Cite this article
Lawson, A., Tan, A.C., Naylor, J. et al. Is retrospective assessment of health-related quality of life valid?. BMC Musculoskelet Disord 21, 415 (2020). https://doi.org/10.1186/s12891-020-03434-8
- Quality of life
- Test-retest reliability
- Reproducibility of results
- Surveys and questionnaires