The intra- and inter-rater reliability of five clinical muscle performance tests in patients with and without neck pain

Juul, Tina; Langberg, Henning; Enoch, Flemming; Søgaard, Karen

doi:10.1186/1471-2474-14-339

Published: 03 December 2013

Tina Juul¹,
Henning Langberg²,
Flemming Enoch³ &
…
Karen Søgaard¹

BMC Musculoskeletal Disorders volume 14, Article number: 339 (2013)

Abstract

Background

This study investigates the reliability of muscle performance tests using cost- and time-effective methods similar to those used in clinical practice. When conducting reliability studies, great effort goes into standardising test procedures to facilitate a stable outcome. Therefore, several test trials are often performed. However, when muscle performance tests are applied in the clinical setting, clinicians often only conduct a muscle performance test once as repeated testing may produce fatigue and pain, thus variation in test results. We aimed to investigate whether cervical muscle performance tests, which have shown promising psychometric properties, would remain reliable when examined under conditions similar to those of daily clinical practice.

Methods

The intra-rater (between-day) and inter-rater (within-day) reliability was assessed for five cervical muscle performance tests in patients with (n = 33) and without neck pain (n = 30). The five tests were joint position error, the cranio-cervical flexion test, the neck flexor muscle endurance test performed in supine and in a 45°-upright position and a new neck extensor test.

Results

Intra-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.48-0.82), the cranio-cervical flexion test (ICC ≥ 0.69), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.68) and in a 45°-upright position (ICC ≥ 0.41) with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement (ICC = 0.14-0.41). Likewise, inter-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.51-0.75), the cranio-cervical flexion test (ICC ≥ 0.85), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.70) and in a 45°-upright position (ICC ≥ 0.56). However, only slight to fair agreement was found for the neck extensor test (ICC = 0.19-0.25).

Conclusions

Intra- and inter-rater reliability ranged from moderate to almost perfect agreement with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement. The significant variability observed suggests that tests like the neck extensor test and the neck flexor muscle endurance test performed in a 45°-upright position are too unstable to be used when evaluating neck muscle performance.

Background

Neck pain is a common musculoskeletal complaint among adults. Worldwide estimates show that the 12-month prevalence of neck pain among adults ranges between 30% and 50%, depending on the definition of neck pain and the geographic spread of respondents [1]. At any given time, approximately 12-14% of the adult population reports having neck pain [1] and neck pain is now the second most common musculoskeletal disorder [2, 3]. Likewise, neck pain often causes impairment, work disability and contributes to increased sickness absence [4, 5] – thus millions of dollars are spent annually on treatment, compensation and lost earnings [6], and neck pain is a contributory cause of reduced health-related quality of life [7, 8]. Neck pain has been associated with impaired performance of muscles in the cervical spine [9–13], as well as reduced proprioception and changes in the cervical motion patterns [14–17]. For this reason, treatment often includes exercise therapy aimed at restoring these neuromuscular deficits [18–23].

In order to assess any neuromuscular deficits present, it is of clinical importance to use reliable and valid assessment tools. Several performance tests have been developed with the aim of quantifying different aspects of muscle performance [24–33]. The present study focuses specifically on five muscle performance tests, which are often used in clinical practice.

The Cranio-Cervical Flexion Test (CCFT) is a clinical assessment test of the deep cervical flexor muscle function [28, 30]. It targets activation and endurance of the deep cervical flexors in progressive inner range positions. The individual is placed in supine crook lying with the head in a neutral starting position, followed by an active head nodding action (cranio-cervical flexion) during which the patient tries to sequentially target five progressive stages (measured as an increased downward pressure of 22, 24, 26, 28 and 30 mmHg) [29, 30]. The reliability of the CCFT has previously been assessed and it has shown promising psychometric properties [29, 34–37]. Intraclass Correlation Coefficient (ICC) values have revealed substantial to almost perfect intra-rater reliability for the CCFT, with ICC values ranging from 0.78 to 0.98 (95% Confidence Interval (CI) ratings between 0.47-0.99) [24, 29, 35–37]. In addition, moderate to almost perfect inter-rater reliability has been reported, with ICC values from 0.57 to 0.91 (95% CI ratings between 0.37-0.96) [24, 34, 36].

Grimmer et al. described a muscle performance test targeting neck flexor muscle endurance [26]. The test is performed with the subject in a supine crook lying position and measures the subject’s ability to maintain cranio-cervical flexion (chin tuck) while performing an active head lift [26]. The maximal holding time is recorded in seconds. The recording is stopped when head movement indicating fatigue occurs (i.e., inability to maintain upper cervical flexion, increase in neck flexion or lowering of the head). Reliability studies conducted on this muscle endurance test, as well as on several modified versions, have found substantial to almost perfect intra-rater reliability (ICC values from 0.71 to 0.96) [25–27, 38–41]. Likewise, moderate to almost perfect inter-rater reliability has been reported (ICC values from 0.54 to 1.0) [27, 39, 40, 42–44]. As patients with neck pain are often unable to perform the supine crook lying version, due to neck pain or reduced muscle strength, a modified version of the Neck Flexor Muscle Endurance (NFME) test is frequently used in clinical practice. The modified NFME test is performed in the same manner as the supine version [26, 27], except that the individual sits in a 45°-upright position, which decreases the load on the neck. Nevertheless, little is known about the psychometric properties of the modified version.

Cervical Joint Position Error (JPE), measured as the ability to relocate the head to a starting position following active cervical range of motion, has been examined in patients with neck pain using several different measurement methods [16, 32, 33, 45–48]. The test measures alterations in kinaesthetic awareness expressed as e.g. errors in head and neck repositioning. Studies using movement analysis devices, such as an ultrasound-based measuring device (Zebris) or electromagnetic tracking devices (3-Space Fastrak), have reported substantial to almost perfect intra- and inter-session reliability (ICC values from 0.61 to 0.84) [47, 49–51], while others have failed to do so (ICC values from −0.01 to 0.51) [49, 50, 52, 53]. Based on the results from e.g. Revel et al. [32] and Heikkilä et al. [45] it has been suggested that clinicians can use simple equipment such as a paper target and a head-mounted laser pointer to assess a subject’s ability to relocate the head to a neutral position following active cervical range of motion [54]. However, the reliability of such clinical performance tests is still unknown.

Over the last decade there has been an increased interest in muscle performance of the cervical flexors in patients with neck pain [12, 21, 30, 55]. Muscle performance tests have focused predominantly on the cervical flexor muscles and only a limited number of tests targeting the posterior neck muscles exist [25, 56]. However, recent research indicates that significant changes also occur in the posterior neck muscles [57–60], and there is a clinical need for the development of muscle performance tests targeting the posterior neck muscles. Drawing on the existing literature and clinical practice, we developed a new dynamic muscle performance test, which targets neck extensor muscle endurance.

When conducting reliability studies, great effort goes into standardising test procedures in order to reduce sources of variation and facilitate a stable outcome. One way to reduce test variation is by increasing the number of tests and using the average to calculate, e.g., ICC values. Studies of muscle performance tests used for patients with neck pain have shown that an increased number of test trials (minimum of five trials) increases the test’s reliability (i.e., increased ICC values and decreased Limits Of Agreement (LOA)) [50, 51] by reducing measurement error [61]. However, when muscle performance tests are applied in clinical practice, clinicians often conduct a muscle performance test only once or twice, partly due to time constraints and partly to avoid pain or fatigue in the tested muscles, which may affect test reliability (cf. increased measurement error).

Therefore, we aimed to investigate whether muscle performance tests, which have shown promising psychometric properties, remain reliable when examined under conditions similar to those of daily clinical practice in physiotherapy. Likewise, we aimed to target some of the areas where limited evidence exists. In order to standardise test procedures, we used inexpensive, simple equipment, which easily can be applied in a clinical setting and which previously has been found useful in tests of lumbar motor control [62].

The aim of this study was to determine the clinical reliability of five muscle performance tests in patients with and without neck pain.

Methods

Study design

An intra-rater (between-day) and inter-rater (within-day) design was applied. Each participant attended two assessment sessions. At each session both examiners assessed the participant. Intra-rater reliability was examined by comparing results from the two assessment sessions, which were held on two days with a maximum of three working days between them. Inter-rater reliability between examiners A and B was assessed at both assessment sessions (first and second assessment session). The study followed a three-phase reliability protocol, recommended by the International Academy of Manual/Musculoskeletal Medicine (IAMMM) [63]. The three-phase protocol consisted of a preparation phase, a training phase and an overall agreement phase. During the preparation phase agreements on study conditions and logistics were achieved, while the training phase focused mainly on replicating test procedures and judgment. The aim of the overall agreement phase was to obtain an overall agreement percentage >80% between the two examiners. After completing the three-phase protocol, both physiotherapists (examiners A and B) agreed upon how to determine a given cut-point (in case a clear cut-off point did not already exist) and how to standardise and perform each test.

Examiners

Between September 2011 and April 2012, two recently certified physiotherapists working at a private physiotherapy clinic (examiners A and B) examined 63 participants. A third physiotherapist (administrator) independently handled the administration of patients in terms of booking appointments and handing out questionnaires. The examiners were blinded to one another’s results and to whether the participant was a subject with or without neck pain. The order of examinations was random; that is, neither physiotherapist was consistently the first or the second examiner.

Participants

The Regional Scientific Ethical Committee for Southern Denmark approved the current study (reference number 30513). All participants gave written informed consent, and the rights of the participants were protected.

The participants consisted of two groups: subjects with neck pain and a healthy reference group. Subjects with neck pain were recruited from five private physiotherapy clinics in Copenhagen, Denmark, where the physiotherapists consecutively referred patients who fulfilled the inclusion and exclusion criteria. Healthy participants were recruited via advertisements in local newspapers or among friends or relatives of the three physiotherapists conducting the data collection. Patients with neck pain were eligible for participation if they met the following inclusion criteria: 1) had experienced non-specific neck pain for more than four weeks; 2) were over 18 years of age; 3) had turned to a general practitioner, chiropractor or physiotherapist regarding their neck pain; and 4) spoke and understood Danish. Patients were excluded if they had radiculopathy (e.g., positive Spurling’s Test, Upper Limb Tension Test [64, 65]). Healthy subjects were eligible to participate if they: 1) were over 18 years of age; and 2) spoke and understood Danish. They were excluded if they: 1) had neck pain within the last year causing absence from work or a significant reduction in daily activity level for more than three days; 2) had back, shoulder or elbow pain; or 3) had a rheumatologic disease (e.g., rheumatoid arthritis). In addition, all participants were excluded if they: 1) had been diagnosed with a neurological disorder (e.g., Parkinson’s disease, multiple sclerosis), diabetes or cancer; 2) were pregnant; or 3) had a history of alcohol or drug abuse.

Data collection

Participants were screened for eligibility before participating in the study. If the participants met the inclusion and exclusion criteria, the first assessment session was scheduled. It took place no more than five working days after the screening session. Referred patients received written information materials in hard copy at the clinics. Healthy participants received written information materials via e-mail. Prior to the first assessment session, study procedures were explained in detail to the participants, and participants gave their informed consent. The administrator collected information from participants regarding their gender, age and self-reported height, weight and education level. Neck pain was recorded using a 100 mm Visual Analogue Scale (VAS) anchored with “no pain” at 0 mm and “worst imaginable pain” at 100 mm. Participants completed the Neck Disability Index (NDI) [66], a questionnaire designed to measure Activities of Daily Living (ADL) in patients with neck pain. It consists of ten items, each with six response categories (range 0–5, total score between 0–50) [66].

After completing the questionnaire, participants performed the five clinical muscle performance tests with one examiner, followed by a short break (approx. 10 min.). After the ten-minute rest period, participants performed the same five clinical muscle performance tests with the second examiner. Each test session lasted approximately 30 minutes and the order of the five tests was random. Efforts were made to ensure that all subjects were examined at the same time of day at the first and second assessment session.

Muscle performance tests

Joint position error (head repositioning)

The JPE test was a modified version of Heikkilä and colleagues’ kinaesthetic sensibility test [45]. This test measures the subject’s ability to relocate their head to a starting position following active cervical range of motion in flexion, extension and bilateral rotation.

In the modified JPE test, the subject wore headgear (a cap) with a sagittal and a frontal measuring tape attached to the back (Figure 1). The tape had measurements at 0.25 cm intervals along a 12 cm length, starting with 0.0 cm in the middle and extending to 6 cm in both directions. The subjects were seated upright in a chair with back support and with approximately 90° of hip and knee flexion. The feet were firmly placed on the ground. A spirit level laser (Class 3A Laser product, Wen Zhou Xinke, China) was placed on a flat and stable surface behind the subject. The spirit level laser was positioned with the laser pointing at the centre of the measuring tape (i.e., at 0.0 cm). The starting position was sitting with the head in a neutral position (i.e., 0.0 cm) and with eyes closed. Subjects were asked to memorize this position. They maintained the position for a few seconds before performing a full active cervical rotation, followed by relocation of the head to the starting position. They were instructed to perform the test as accurately as possible and to verbally indicate when they perceived having returned to their starting position. This position was recorded. The examiner registered the distance from the recorded position to 0.0 cm on the measuring tape. Between trials, the examiner manually adjusted the participant’s head to match the original starting position (i.e., 0.0 cm) and gave no feedback on accuracy. No verbal or visual feedback was provided during the test. A familiarisation trial was conducted before the formal trial. The rate at which participants performed the movements was not formally controlled. However, all subjects were instructed to move at a comfortable pace. Participants performed a total of three trials of each movement direction in the following order: right cervical rotation; left cervical rotation; neck flexion; and neck extension.

Cranio-cervical flexion test

The CCFT is a clinical assessment of deep cervical flexor muscle function [28, 30]. The CCFT was performed with participants in supine crook lying on a plinth with the neck in a neutral position. Where necessary, head position was adjusted so the line of the face was horizontal by placing layers of towels under the head [30]. A deflated pressure biofeedback unit (Chattanooga Ltd, Hixson, USA), with a pressure transducer attached, was placed underneath the neck abutting the occiput (Figure 2). It was inflated to a stable baseline pressure of 20 mmHg. Participants were instructed to perform a small, gentle and smooth nodding action (like saying ‘Yes’) to achieve cranio-cervical flexion. Progressive nodding action increased the pressure from the baseline of 20 mmHg to 22, 24, 26, 28 and 30 mmHg. Participants were instructed to maintain an isometric contraction at each pressure level for ten seconds, before returning to a neutral position. A short break was given between trials. Subjects were allowed one practice session to familiarise themselves with the test procedure and verbal feedback was provided to correct any incorrect movement strategies. The examiner observed the subject’s performance. When necessary, the examiner palpated the superficial neck muscles to ensure no use of incorrect movement strategies (e.g., undue use of superficial flexor muscles [e.g., m. sternocleidomastoideus], posterior retraction of the head, breath holding, overshooting of the target pressure). The examiner recorded the highest pressure level the participant successfully achieved.

Muscle endurance tests

The NFME test was based on a modified version of the test described by Harris et al. [27]. It is a clinical neck flexor muscle endurance test. The test was performed with the subject in supine crook lying on a plinth with the head in a neutral position (as during the CCFT). The participant wore headgear (a cap) with a 2 cm wide measuring tape applied to the top of the cap. A spirit level laser (Class 3A Laser product, Wen Zhou Xinke, China) was placed on a flat and stable surface above the subject (Figure 3). Initially, the participant was instructed to place their upper cervical spine in a slightly flexed position and gently lift their head off the plinth, while maintaining the upper cervical flexion. Subjects were allowed one short practice trial. The spirit level laser was positioned with the laser pointing at the centre of the measuring tape. The participant was instructed to hold the starting position for as long as possible. Verbal encouragement was given (e.g., “Hold your head up” or “Tuck your chin in”) if the participant started to change their head posture. The test was terminated when the laser moved beyond the upper or lower edge of the measuring tape due to head movement indicating fatigue (i.e., inability to maintain upper cervical flexion, increase in neck flexion or lowering of the head). The examiner recorded time to termination as the holding time in seconds. The participants performed this trial once.

A modified NFME test was performed with the participant sitting in a 45°-upright position. The plinth served as back support (Figure 4A). The participant wore the same headgear, but with a 1.5 cm wide measuring tape applied on the side of the cap, approximately 2 cm above the right ear (Figure 4B). The spirit level laser was placed on the right side of the subject. The laser pointed at the centre of the measuring tape. Participants were allowed one short practice trial. Starting position was set as described above and the same instructions were given. The test was terminated when the laser moved beyond the upper or lower edge of the measuring tape due to head movement indicating fatigue (i.e., inability to maintain upper cervical flexion, increase in neck flexion or lowering of the head). The examiner recorded time to termination as the holding time in seconds. The participants performed this trial once.

Neck extensor test

The neck extensor test (NET) is a dynamic clinical test, which targets neck extensor muscle endurance. It was performed with the participant lying prone, with arms at the sides and the head over the edge of the plinth (Figure 5), initially supported by the examiner. The participant wore headgear (a cap) with a 2 cm wide measuring tape applied to the top of the cap. A spirit level laser (Class 3A Laser product, Wen Zhou Xinke, China) was placed in front of the plinth. The examiner held the participant’s head in a neutral position, with the laser pointing at the centre of the measuring tape. The test began when the examiner stopped supporting the subject’s head. The participant was instructed to maintain a neutral head posture while performing a small side-to-side head rotation. They were told to perform the rotation at a smooth and slow pace. The rate at which participants performed the movements was not strictly controlled. However, all subjects were instructed to move at a comfortable pace. Participants were allowed one short practice trial. Verbal encouragement was given (e.g., “Hold your head up”) if the participant started to change their head posture. The test was terminated when the laser moved beyond the upper or lower edge of the measuring tape due to head movement indicating fatigue (i.e., inability to maintain upper cervical flexion, increase in neck flexion or lowering of the head). The examiner recorded time to termination as the holding time in seconds. The participants performed this trial once.

Statistical analysis

Intra- and inter-rater reliability was assessed as recommended by the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist [61, 67]. For assessing intra- and inter-rater reliability, ICC agreement values with 95% CI were calculated [61, 67]. ICC agreement is preferred as it takes systematic and random errors into account [61]. Bland-Altman’s LOA [68] was used for evaluating agreement between the raters’ scores. Furthermore, measurement errors were estimated by calculating the Standard Error of Measurement (SEM) using the formula: SEM_consistency = SD_difference/√2, where SD_difference is the standard deviation of the differences between examiners A and B. The Smallest Detectable Change (SDC) was calculated using the formula: SDC = 1.96 × √2 × SEM [61, 67].
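The measurement-error formulas above can be illustrated with a minimal Python sketch. It is not the authors' analysis (which was run in SPSS); the function name `agreement_stats` and the example scores are hypothetical, and the limits of agreement are computed with the usual Bland-Altman convention of mean difference ± 1.96 × SD of the differences, which is assumed here rather than stated in the text.

```python
import math
import statistics


def agreement_stats(scores_a, scores_b):
    """Return mean difference, SD of differences, SEM, SDC and 95% LOA
    for paired scores (e.g. examiner A vs. examiner B).

    Hypothetical helper for illustration only.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample SD of the paired differences
    sem = sd_diff / math.sqrt(2)           # SEM_consistency = SD_difference / sqrt(2)
    sdc = 1.96 * math.sqrt(2) * sem        # SDC = 1.96 * sqrt(2) * SEM
    loa = (mean_diff - 1.96 * sd_diff,     # assumed Bland-Altman convention
           mean_diff + 1.96 * sd_diff)
    return {"mean_diff": mean_diff, "sd_diff": sd_diff,
            "sem": sem, "sdc": sdc, "loa": loa}


# Made-up holding times in seconds, not data from the study.
example = agreement_stats([30.0, 42.0, 25.0, 51.0], [28.0, 45.0, 25.0, 48.0])
```

Because SEM is defined from the SD of the differences, the SDC in this sketch reduces algebraically to 1.96 × SD_difference, which is also the half-width of the limits of agreement.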

The criteria of Landis [69] were used to interpret ICC agreement values: slight (r = 0.00-0.19); fair (r = 0.20-0.39); moderate (r = 0.40-0.59); substantial (r = 0.60-0.79); and almost perfect (r = 0.80-1.0) reliability [69]. Primary data analyses were performed for the whole group due to the small sample size. Data were analysed using SPSS version 19.0 (IBM®, SPSS, statistics). ICC agreement values (model 2.1.A) and 95% CI were calculated using ‘scale analysis’ with a two-way random effect model and ‘absolute agreement’. For JPE, average measurements are reported. For the CCFT, NFME and NET tests, single measurements are reported. For head repositioning, no statistically significant differences were found between the three right and three left cervical rotation trials (post hoc analysis two-sample t-test, p ≥ 0.70). Therefore, data from left and right cervical rotation were pooled in the final analysis. Adequate sample size is required to achieve an admissible 95% CI for ICC values and a sample size of 50 participants is recommended to assess reliability [70]. Additionally, a post hoc analysis was performed using a two-sample independent t-test to explore possible differences in mean scores between patients with neck pain and healthy subjects. This was done although the study was not designed with power to perform a strict specificity analysis. Statistical significance was accepted at P values less than 0.05.
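For readers who want to see what a single-measure, two-way random effects, absolute-agreement ICC involves, the following Python sketch computes it from the ANOVA mean squares and maps the result onto the interpretation bands listed above. It is an illustration under stated assumptions, not the SPSS procedure used in the study: the function names `icc_agreement` and `landis_label` are hypothetical, confidence intervals are omitted, values falling in the gaps between the listed bands (e.g. 0.195) are assigned by simple thresholds, and negative values are left unclassified because the text gives no band for them.

```python
def icc_agreement(ratings):
    """Single-measure ICC, two-way random effects, absolute agreement.

    ratings: list of rows, one row per subject, one column per rater
    (or per session). Hypothetical helper for illustration only.
    """
    n = len(ratings)            # number of subjects
    k = len(ratings[0])         # number of raters/sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-subjects mean square
    msc = ss_cols / (k - 1)                 # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def landis_label(icc):
    """Map an ICC value to the interpretation bands quoted in the text."""
    if icc < 0:
        return "not classified"   # assumption: no band is given below 0
    if icc >= 0.80:
        return "almost perfect"
    if icc >= 0.60:
        return "substantial"
    if icc >= 0.40:
        return "moderate"
    if icc >= 0.20:
        return "fair"
    return "slight"
```

The rater mean square in the denominator is what makes this an agreement rather than a consistency coefficient: a systematic offset between examiners lowers the value even when their rankings of subjects match.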

Results

A total of 63 subjects participated in the study. The descriptive characteristics of the 33 patients with neck pain and the 30 healthy subjects are provided in Table 1 with a summary of age, gender, height, body mass, body mass index, education level, VAS and NDI scores. Thirty healthy subjects (17 females/13 males) completed the first and second assessment sessions. Thirty-three patients with neck pain (25 females/8 males) completed the first assessment session and 31 patients with neck pain (23 females/8 males) completed the second assessment session. The two drop-outs were due to increased neck pain following the first assessment session and lack of time, respectively.

Table 1 Demographic characteristics of the patients with neck pain and healthy subjects*

Intra-rater reliability

Summarized statistics are presented for each of the muscle performance tests (examiners A and B) in Table 2. Overall, intra-rater reliability ranged from slight to almost perfect with ICC values between 0.14 and 0.82.

Table 2 Intra-rater reliability of the five muscle performance tests

Joint position error (head repositioning)

By and large, ICC values indicate moderate to almost perfect reliability for the JPE tests, ranging from 0.50 to 0.80. The highest ICC values were found for neck flexion (0.82, 95% CI [0.71-0.89]) and neck extension (0.80, 95% CI [0.66-0.88]) (examiner A), with 95% of the LOA measurement variation ranging between −0.640 and 0.666 cm (Table 2). However, examiner B presented the lowest ICC values for neck flexion (0.64, 95% CI [0.40-0.79]) and neck extension (0.48, 95% CI [0.13-0.67]). Bland-Altman plots revealed that the greater part of the differences between the two examiners was less than 1 cm for neck flexion and neck rotation. For neck rotation, ICC values implied substantial reliability for both examiners (Table 2). The SDC ranged from 0.52 cm (neck rotation) to 0.72 cm (neck extension) and the SEM ranged between 0.19 cm (neck rotation) and 0.26 cm (neck extension) (Table 2).