
Agreement among physiotherapists in assessing patient performance of exercises for low-back pain



There is no consensus on how to assess the performance of patients who practice exercises. This assessment is currently left to the physiotherapist’s personal judgement. We studied the agreement among physiotherapists in rating patient performance during exercises recommended for chronic low-back pain (LBP).


A vignette-based method was used. We first identified ten exercises recommended for LBP in the literature. Then, 42 patients with chronic LBP participating in a rehabilitation program were videotaped during their performance of one of the ten exercises. A vignette was an exercise video preceded by clinical information. Ten physiotherapists from primary (4) and tertiary care (6) viewed the 42 vignettes twice, one month apart, and rated patient performance from zero (worst performance) to ten (excellent performance) by considering the position and duration of the contraction or stretching. Intra-class correlation coefficients (ICCs) and 95% confidence intervals (95% CIs) were computed to assess inter- and intra-rater reliability.


The overall inter-rater agreement was fair (ICC 0.48 [95% CI 0.33–0.56]) but was better for stretching exercises (0.55 [0.35–0.64]) than strengthening exercises (0.42 [0.20–0.52]) and for tertiary-care physiotherapists (0.66 [0.54–0.76]) than primary-care physiotherapists (0.28 [0.09–0.37]). The intra-rater agreement was overall good (0.72 [0.57–0.81] to 0.88 [0.79–0.94]). It was better for stretching exercises (from 0.68 [0.46–0.81] to 0.96 [0.91–0.98]) than strengthening exercises (from 0.68 [0.38–0.84] to 0.82 [0.56–0.92]).


The agreement in rating patient performance of exercises for LBP is good among physiotherapists trained in managing LBP but is low among non-trained physiotherapists.



Exercise therapy decreases pain and improves function in musculoskeletal diseases [1,2,3]. Individually designed exercise programs are effective in healthcare settings [1, 4]. The exercise program is usually learned during supervised physical therapy sessions and is then performed at home by the patient alone, so the patient must be able to perform the exercises independently by the end of the supervised sessions.

There is no standardised way to assess patient performance during exercises. In practice, the assessment is left to the physiotherapists’ personal judgement. This judgement may result from an unconscious integration of various data such as their own beliefs and experience, patient characteristics (age, comorbidities), exercise characteristics, and the relationship with the patient [5]. Better assessment of patient performance could help to improve the teaching of exercises and determine how many physiotherapy sessions are required for each patient, allowing for more personalized treatment. Indeed, if the number of supervised sessions is not sufficient, the treatment can be ineffective and patients can stop home exercises because they do not feel able to practice alone. In contrast, if the number of supervised sessions is greater than needed, the extra sessions will be a waste of time for both the physiotherapist and the patient. In addition, we need to better understand why treatment fails for some patients and whether home exercises are correctly performed to better adapt the treatment strategy: if exercises are correctly performed, other treatments may be considered; otherwise, new learning sessions and advice may be necessary. Finally, patients may have doubts about their performance and should be advised promptly to avoid stopping the exercises. This advice could improve adherence to home exercises, which is a common problem in musculoskeletal-disease rehabilitation [6,7,8].

Low-back pain (LBP) is highly prevalent [9], disabling, and costly [10, 11] and is the leading reason for consulting a physiotherapist in France [12, 13]. Numerous studies have shown the effectiveness of exercise therapy in reducing pain and improving function with this condition [1, 14, 15]. Therefore, LBP is an ideal condition to evaluate whether physiotherapists’ judgement can be trusted and is reproducible.

The aim of this study was to assess the agreement among physiotherapists in rating patient performance during exercises recommended for LBP.


This is an intra- and inter-rater reliability study. Case vignettes were used to study physiotherapists’ agreement [16] because they allow different health providers to assess the same exercise performed by the same patient.

Development of vignettes

Identification of exercises to translate into vignettes

We identified the exercises recommended in LBP by a non-exhaustive literature review. One author (CP) searched MEDLINE and PEDRO databases for articles evaluating the effectiveness of exercises in LBP that were published in English from 1982 to 2012.

From the articles obtained, the steering committee of the study (including one physical medicine and rehabilitation physician, one rheumatologist and one physiotherapist expert in LBP) selected ten exercises: six strengthening exercises (two for back muscles, two for abdominal muscles, one for gluteus muscles and one trunk stabilizing exercise) with alternating contraction/rest periods of five seconds (five repetitions) and four stretching exercises (one for hamstrings, one for gluteus muscles, one for back muscles and one for rectus femoris muscles) with a stretch of at least 20 s (Fig. 1).

Fig. 1 Exercises for strengthening and stretching

Creation of vignettes

A vignette included brief clinical information for the patient (age, main co-morbidities that could affect the achievement of the exercise [e.g., knee osteoarthritis can affect the ability to kneel]), a short description of the exercise (e.g., used for strengthening back muscles), and a video of a patient performing the exercise.

After giving their consent, 42 patients with nonspecific chronic LBP participating in a supervised rehabilitation program in a tertiary-care hospital (Cochin hospital, Paris, France) were videotaped while they performed one of the specific exercises they had learned (at least four different patients performed the same exercise). An example of a vignette is shown in Additional file 1. The acquisition of videos was highly standardised to ensure reproducibility (Additional file 2).

Additional file 1: An example of a vignette with a short video of a patient performing the exercise. (MP4 119063 kb)


All physiotherapists working in the rehabilitation department of the tertiary-care Cochin hospital were informed of the study and were asked to participate on a voluntary basis [15]. Six physiotherapists agreed to participate.

Six physiotherapists (staff personal contacts) working in primary care centres were informed of the study by e-mail. Four agreed to participate.

Experience was self-reported. Physiotherapists were asked: “Among your patients, what percentage of them have low back pain?” We considered physiotherapists to be experienced when more than 50% of their patients had low back pain and less experienced when less than 50% of their patients had low back pain.

Study design

Fig. 2 shows the study design. Before scoring the vignettes, the physiotherapists provided the following information: age, gender, time working in the current job, and experience in the management of LBP (proportion of patients with LBP they managed daily).

Fig. 2 Diagram of the study design

All rehabilitation centre physiotherapists scored the 42 vignettes twice. The first scoring determined the inter-rater agreement and the second scoring (one month later) determined the intra-rater agreement. For each vignette, they received the following instruction: “We ask you to assess the patient’s ability to perform an exercise recommended for LBP. Taking into account the patient’s position during the whole exercise, the duration of the contraction or the stretching, could you please note from zero (worse performance) to ten (excellent performance) the patient’s ability to perform the following exercise?” The vignettes were saved on a personal computer and could be viewed individually by participants for one week. Participants were asked not to talk about the topic during the study period. The answers were anonymous. To assess intra-rater agreement, participants were asked one month later to rate the same vignettes in another random order, blinded to their first answers.

The primary care physiotherapists received the same 42 vignettes by e-mail and rated them once. Because this process was time-consuming, a second scoring was not possible. Only inter-rater agreement could be assessed.

Finally, three other rehabilitation centre physiotherapists were asked to rate the performance of 16 patients doing exercises while they were being videotaped. Six months later, they rated the exercise videos of the same patients so we could compare the face-to-face assessment with the scoring of the exercise video.

Statistical analysis

We based the number of vignettes and participating physiotherapists needed to assess agreement on the expected precision of the intra-class correlation coefficient (ICC) and on feasibility considerations (time needed to build a vignette, time needed for scoring, number of participants available). With 42 vignettes, each scored twice, and an expected inter-observer ICC of 0.60, the expected width of the 95% confidence interval (95% CI) would be about 0.4 [17].

Data are described as median (range). As data had a near normal distribution, ICCs were used to assess intra- and inter-rater agreement. ICC estimates were calculated based on a single measurement, consistency, two-way mixed-effects model (ICC (3,1)). The bootstrap procedure (bias-corrected and accelerated bootstrap) was used to estimate 95% CIs. An ICC of 0 indicates chance agreement and 1 perfect agreement. We defined agreement as poor, ICC < 0.4; fair, 0.4 to 0.59; good, 0.6 to 0.74; and excellent, ≥0.75 [18]. Bland and Altman plotting was used to assess the quality of concordance among physiotherapists by the amplitude of the agreement intervals, with the upper and lower limits of agreement defined as the mean difference plus and minus 1.96 times the standard deviation of the differences.
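The reliability computation described above can be sketched as follows. This is a minimal illustration, not the authors' R code: ICC(3,1) is computed from the two-way ANOVA mean squares as (MS_R − MS_E) / (MS_R + (k − 1) MS_E), where MS_R is the between-vignette mean square, MS_E the residual mean square and k the number of raters. The function names are hypothetical, the scores are illustrative rather than study data, and the confidence interval uses a plain percentile bootstrap as a simplification of the bias-corrected and accelerated procedure used in the study.

```python
import numpy as np


def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `ratings` is an (n_vignettes, n_raters) array of scores.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between vignettes
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def agreement_label(icc):
    """Map an ICC to the categories defined in the statistical analysis."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


def bootstrap_ci(ratings, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI, resampling vignettes (rows) with replacement.

    Assumption: percentile intervals are used here for brevity; the study
    reports bias-corrected and accelerated (BCa) intervals.
    """
    x = np.asarray(ratings, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Degenerate resamples (no variance) give NaN; they are ignored below.
    with np.errstate(invalid="ignore", divide="ignore"):
        stats = [icc_3_1(x[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    lo, hi = np.nanpercentile(stats, [2.5, 97.5])
    return float(lo), float(hi)


if __name__ == "__main__":
    # Illustrative scores only (not study data): 6 vignettes rated by 4 raters.
    scores = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    icc = icc_3_1(scores)
    print(round(icc, 2), agreement_label(icc), bootstrap_ci(scores))
```

Because ICC(3,1) measures consistency, a rater who scores every vignette a fixed number of points higher than a colleague does not lower the coefficient; only disagreement about the relative ranking and spacing of vignettes does.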

Analyses were performed with R v3.1.2 (statistical software). Written consent was obtained from all participants. The study was approved by the local ethics committee (Comité d’évaluation éthique de l’INSERM (IRB00003888)).


Ten physiotherapists participated in the study: six from the rehabilitation department of Cochin hospital and four from primary care. The median age was 26 years (range 23–42) and six were women. The physiotherapists from primary care were less experienced in managing LBP than those from tertiary care. Patients with LBP represented 80% of the patients managed in Cochin hospital, whereas in primary care patients with LBP were not always the majority. Only one of the four primary care physiotherapists reported that 60% of his patients had LBP (Table 1).

Table 1 Physiotherapist (PT) characteristics

Inter-rater agreement

Overall, the inter-rater agreement was fair for the ten physiotherapists (ICC 0.48 [95% CI 0.33–0.56]) (Table 2). The agreement was better for stretching exercises (0.55 [0.35–0.64]) than strengthening exercises (0.42 [0.20–0.52]), although the CIs overlapped.

Table 2 Inter-rater agreement for PTs rating patients’ ability to perform exercises

The agreement among physiotherapists from tertiary care was good (ICC 0.66 [0.54–0.76]) (Table 2). It was better for stretching exercises (0.73 [0.56–0.82]) than strengthening exercises (0.58 [0.32–0.71]), although the CIs overlapped. During the second scoring of the vignettes (one month later), the agreement increased to 0.70 [0.58–0.77]; the agreement for strengthening exercises improved (0.65 [0.43–0.77]) but remained stable for stretching exercises (0.73 [0.57–0.82]), again with overlapping CIs.

By contrast, the inter-rater agreement among primary care physiotherapists was poor (ICC 0.28 [0.09–0.37]) (Table 2). The agreement was better for strengthening exercises (0.34 [0.07–0.48]) than stretching exercises (0.21 [−0.01–0.28]) but remained low, with overlapping CIs. One primary-care physiotherapist scored the vignettes differently from the others (higher or lower scores), especially for stretching exercises. Without this outlier, the global agreement was better (0.46 [0.23–0.51]) but still low; the agreement for strengthening exercises was low (0.29 [95% CI 0–0.46]) but was good for stretching exercises (0.70 [0.31–0.66]).

The amplitudes of the agreement intervals on a Bland–Altman plot (Fig. 3) indicated the quality of concordance among physiotherapists. The 5 and 95% CIs correspond to the limits of agreement. Overall, the graphs indicated that the mean of the differences among the raters was very close to zero, with plots having a double funnel shape. This shape indicates greater concordance for the highest scores (the most successful exercises) (Bland and Altman plots for strengthening and stretching exercises in Additional file 3).

Fig. 3 Bland and Altman plot for agreement among all physiotherapists (n = 10). Horizontal dotted line is the mean difference, and upper and lower lines are 95% confidence intervals for limits of agreement.
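As an illustration of the limits of agreement defined in the statistical analysis (mean difference plus and minus 1.96 times the standard deviation of the differences), the following minimal sketch computes them for two sets of scores given to the same vignettes. The function name and the example scores are hypothetical, and the pairing of two raters is an assumption: the text does not specify how the ten raters were combined for Fig. 3.

```python
import numpy as np


def bland_altman_limits(scores_a, scores_b):
    """Return (mean difference, lower limit, upper limit) of agreement.

    `scores_a` and `scores_b` are scores given to the same vignettes,
    for example by two raters or by one rater at two sessions.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    diff = a - b
    mean_diff = diff.mean()
    sd_diff = diff.std(ddof=1)  # sample standard deviation of the differences
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


if __name__ == "__main__":
    # Illustrative scores only (not study data), on the zero-to-ten scale.
    rater_1 = [5, 7, 6, 8]
    rater_2 = [4, 7, 7, 6]
    print(bland_altman_limits(rater_1, rater_2))
```

On a Bland–Altman plot, each difference is drawn against the mean of the two scores, so a narrowing of the scatter toward high means corresponds to the greater concordance for the best-performed exercises described above.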

Intra-rater agreement

The intra-rater agreement ranged from good to excellent across physiotherapists (ICC 0.72 [95% CI 0.57–0.81] to 0.88 [0.79–0.94]) (Table 3). It was better for stretching exercises (from 0.68 [0.46–0.81] to 0.96 [0.91–0.98]) than strengthening exercises (from 0.68 [0.38–0.84] to 0.82 [0.56–0.92]).

Table 3 Intra-rater agreement for PTs rating patients’ ability to perform exercises

Comparison between video and face-to-face assessment

The three physiotherapists who rated the performance of patients live and on video all work in the rehabilitation department of Cochin hospital. They did not participate in the rest of the study. They were older (median 31 years [range 29–45]) and more experienced (median time working in the current job seven years [range 6–20]) than the other physiotherapists.

The intra-rater agreement was excellent for one physiotherapist and good for another (ICC 0.93 [95% CI 0.38–1.00] and 0.71 [0.1–0.9], respectively), whereas the third one had low agreement (0.39 [−0.34–0.72]) (Additional file 4).


This study shows a good agreement among tertiary care physiotherapists, experienced in the management of LBP, but a low agreement among primary care physiotherapists, who were less experienced in the management of LBP. The reliability was greater during the second assessment, suggesting that training physiotherapists can improve their agreement in assessing patient performance of exercises for low-back pain.

These discrepancies may arise from a recruitment bias, as the physiotherapists of the rehabilitation centre have the same background and are used to managing a very specific population of patients (i.e., those who have not improved after primary care physiotherapy), whereas the primary care physiotherapists may have different backgrounds and expectations. In addition, we found better agreement for stretching than strengthening exercises, which suggests that stretching exercises are easier to score than strengthening exercises, which may require more feedback.

Although the performance of patients during therapeutic exercises may be a strong predictor of the effectiveness of exercises, it has never been studied previously. When the patient is learning the personalised exercise program during supervised sessions, the ability to perform the exercises should be assessed regularly. Our study suggests that this assessment could be easily performed by physiotherapists experienced in LBP management or with specific training. The adequate number of supervised sessions could be adapted to each patient, for more personalized care, which may be more effective. Moreover, regularly assessing patient performance when they practice therapeutic exercises at home could help determine when exercises are no longer performed adequately (“unlearning” curve) and when refreshing supervised sessions are required.

Adherence is a major issue for exercise therapy programs, especially home-based programs [19]. The World Health Organization has defined adherence as “the extent to which a person’s behaviour taking medication, following a diet, and/or executing lifestyle changes, corresponds with agreed recommendations from a health care provider” [20]. By extension, exercise adherence is often considered the extent to which a patient acts in accordance with the advised interval, exercise dose, and exercise dosing [21]. This definition does not take into account the performance of the patient when performing the prescribed exercises, which may explain why adherence is almost never reported in studies assessing the effectiveness of home-based exercise programs. For example, in a systematic review of interventions to improve adherence to exercise for chronic musculoskeletal pain, only 4 of 42 studies used the accuracy of exercises performed to rate adherence. However, accurate reporting of adherence seems essential to better address the treatment efficacy of home-based exercise programs in clinical studies [22]. Consequently, future studies evaluating the effectiveness of therapeutic exercises should include an assessment of patient performance. Patient performance could be assessed by physiotherapists on a numeric rating scale from zero to ten.

We also wondered if these results could be transposed to a face-to-face assessment, without a video. The intra-rater agreement was high for two physiotherapists but low for the third one, perhaps because he was a physiotherapist manager and was therefore less involved in patient care and did not directly participate in teaching exercises to patients (Additional file 4). Thus, the judgment of the videos was close to live assessment. This finding has two major advantages: first, our results could be transposed to a face-to-face assessment and second, patient performance could be assessed via “telemedicine”, so that patients could be advised by a physiotherapist from their home.

This work has some limitations. A key limitation of this study is that the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) were not used to inform study design and decisions. The number of physiotherapists was small, and there was a recruitment bias for the rehabilitation centre physiotherapists as they worked together. That is why we also wanted to include primary care physiotherapists. The confidence intervals may be wide because the number of physiotherapists was small. When the confidence intervals overlapped, it was not possible to conclude that there was a definite difference. However, there was no overlap of confidence intervals when comparing the ICCs of the tertiary care physiotherapists with the ICCs of the primary care physiotherapists, suggesting a significant difference in reliability between them. Finally, we focused on the exercises recommended for one particular condition, LBP, because this is a common problem with a significant socioeconomic impact. These results should be confirmed for other disorders, such as knee osteoarthritis or rotator cuff diseases.


The agreement among physiotherapists experienced in managing musculoskeletal disorders is good when using a ten-point scale to rate patient performance during exercises recommended for LBP. Training of less experienced physiotherapists is necessary. A ten-point scale could be used to assess patient performance in clinical studies evaluating the effectiveness of exercise therapy in LBP, but also in real life to determine the adequate number of physiotherapy sessions required and to help better understand the unlearning phenomenon of exercises.

This study provides initial insights into the agreement among physiotherapists in assessing patient performance. Future studies will be needed to evaluate these findings in another population of physiotherapists and in other musculoskeletal disorders.


95% CI: 95% confidence interval

ICC: Intra-class correlation coefficient

LBP: Low-back pain


  1. Hayden JA, van Tulder MW, Malmivaara A, Koes BW. Exercise therapy for treatment of non-specific low back pain. Cochrane Database Syst Rev. 2005;20(3):CD000335.

  2. Thomas KS, Muir KR, Doherty M, Jones AC, O’Reilly SC, Bassey EJ. Home based exercise programme for knee pain and knee osteoarthritis: randomised controlled trial. BMJ. 2002;325(7367):752.

  3. Deyle GD, Henderson NE, Matekel RL, Ryder MG, Garber MB, Allison SC. Effectiveness of Manual Physical Therapy and Exercise in Osteoarthritis of the Knee: A Randomized, Controlled Trial. Ann Intern Med. 2000;132(3):173.

  4. Jonas S. Philips E. Boston: ACSM’s Exercise in Medicine; 2009.

  5. Edwards I, Jones M, Carr J, Braunack-Mayer A, Jensen GM. Clinical reasoning strategies in physical therapy. Phys Ther. 2004;84(4):312–30.

  6. Friedrich M, Gittler G, Halberstadt Y, Cermak T, Heiller I. Combined exercise and motivation program: effect on the compliance and level of disability of patients with chronic low back pain: a randomized controlled trial. Arch Phys Med Rehabil. 1998;79(5):475–87.

  7. Härkäpää K, Järvikoski A, Mellin G, Hurri H, Luoma J. Health locus of control beliefs and psychological distress as predictors for treatment outcome in low-back pain patients: results of a 3-month follow-up of a controlled intervention study. Pain. 1991;46(1):35–41.

  8. Beinart NA, Goodchild CE, Weinman JA, Ayis S, Godfrey EL. Individual and intervention-related factors associated with adherence to home exercise in chronic low back pain: a systematic review. Spine J. 2013;13(12):1940–50.

  9. van Oostrom SH, Monique Verschuren WM, de Vet HC, Picavet HS. Ten year course of low back pain in an adult population-based cohort: the Doetinchem cohort study. Eur J Pain. 2011;15(9):993–8.

  10. Vos T, Flaxman AD, Naghavi M, Lozano R, Michaud C, Ezzati M, et al. Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: a systematic analysis for the global burden of disease study 2010. Lancet. 2012;380(9859):2163–96.

  11. GBD 2013 DALYs and HALE Collaborators, Murray CJL, Barber RM, Foreman KJ, Abbasoglu Ozgoren A, Abd-Allah F, et al. Global, regional, and national disability-adjusted life years (DALYs) for 306 diseases and injuries and healthy life expectancy (HALE) for 188 countries, 1990–2013: quantifying the epidemiological transition. Lancet. 2015;386(10009):2145–91.

  12. Assurance-maladie, des soins de qualité pour tous. Lombalgies. La kinésithérapie et l’imagerie médicale souvent utilisées en excès. Faits marquants: 15 études P37–43. 2000.

  13. HAS. Prise en charge massokinésithérapique dans la lombalgie commune: modalités de prescription. 2005.

  14. Abenhaim L, Rossignol M, Valat JP, Nordin M, Avouac B, Blotman F, et al. The role of activity in the therapeutic management of back pain. Report of the international Paris task force on back pain. Spine. 2000;25(4 Suppl):1S–33S.

  15. van Tulder M, Malmivaara A, Esmail R, Koes B. Exercise therapy for low back pain: a systematic review within the framework of the cochrane collaboration back review group. Spine. 2000;25(21):2784–96.

  16. Bachmann LM, Mühleisen A, Bock A, ter Riet G, Held U, Kessels AGH. Vignette studies of medical choice and judgement to study caregivers’ medical decision behaviour: systematic review. BMC Med Res Methodol. 2008;8:50.

  17. Giraudeau B, Mary JY. Planning a reproducibility study: how many subjects and how many replicates per subject for an expected width of the 95 per cent confidence interval of the intraclass correlation coefficient. Stat Med. 2001;20(21):3205–14.

  18. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–8.

  19. Palazzo C, Klinger E, Dorner V, Kadri A, Thierry O, Boumenir Y, et al. Barriers to home-based exercise program adherence with chronic low back pain: patient expectations regarding new technologies. Ann Phys Rehabil Med. 2016;59(2):107–13.

  20. World Health Organization. Adherence to long-term therapies: evidence for action. World Health Organization; 2003. 230 p.

  21. Deka P, Pozehl B, Williams MA, Yates B. Adherence to recommended exercise guidelines in patients with heart failure. Heart Fail Rev. 2017;22(1):41–53.

  22. Jordan JL, Holden MA, Mason EE, Foster NE. Interventions to improve adherence to exercise for chronic musculoskeletal pain in adults. Cochrane Database Syst Rev. 2010;1:CD005956.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.


AH collected the data, produced the statistics, and wrote the article. CP initiated the project, organized the study, produced the films and helped AH write the article. SP participated in the development and planning of the project. AR and MMLC participated in the organization of the study and helped with the analysis and interpretation of the results. JL and AG participated in the design of the study, the creation of the vignettes and the interpretation of the results. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Aurore Hermet.

Ethics declarations

Ethics approval and consent to participate

Written consent was obtained for all participants. The study was approved by the ethics committee: Comité d’évaluation éthique de l’INSERM (IRB00003888).

Consent for publication

Not Applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 2:

How videos were obtained. Videos were obtained in the same room with the same placement. During the editing, there were no corrections. We blurred faces to preserve patient anonymity. (PDF 114 kb)

Additional file 3:

Bland and Altman plots of agreement for all physiotherapists for strengthening and stretching exercises. (PDF 163 kb)

Additional file 4:

Intra-rater agreement among three rehabilitation centre physiotherapists (face-to-face assessment vs video assessment). (PDF 106 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Hermet, A., Roren, A., Lefevre-Colau, MM. et al. Agreement among physiotherapists in assessing patient performance of exercises for low-back pain. BMC Musculoskelet Disord 19, 265 (2018).



  • Physical therapists
  • Low-back pain
  • Rehabilitation
  • Exercise therapy
  • Agreement