
The Valued Life Activities Scale (VLAs): linguistic validation, cultural adaptation and psychometric testing in people with rheumatic and musculoskeletal diseases in the UK



The Valued Life Activities Scale (VLAs) measures difficulty in daily activities and social participation. Several versions exist with differing numbers of items; we linguistically and culturally adapted the full 33-item VLAs and psychometrically tested it in adults with rheumatic and musculoskeletal diseases in the United Kingdom.


Participants with Rheumatoid Arthritis, Ankylosing Spondylitis, Chronic Pain/Fibromyalgia, Chronic Hand/Upper Limb Conditions, Osteoarthritis, Systemic Lupus, Systemic Sclerosis and Primary Sjogren’s Syndrome were recruited from out-patient clinics in National Health Service Hospitals, General Practice and patient organisations in the UK. Phase 1 involved linguistic and cultural adaptation: forward translation to British English; synthesis; expert panel review; and cognitive debriefing interviews. In Phase 2, participants completed postal questionnaires to assess internal construct validity using (i) Confirmatory Factor Analysis (CFA), (ii) Mokken scaling and (iii) the Rasch model.


Responders (n = 1544) had a mean age of 59 years (SD 13.3) and 77.2% were women. A CFA failed to support a total score from the 33 items (Chi Square 3552: df 464: p < 0.0001). Mokken scaling indicated a strong non-parametric association between items. Fit to the Rasch model indicated that the VLAs was characterised by multidimensionality and item misfit, which may have been influenced by clusters of residual item correlations. An item banking approach resolved a 25-item calibrated set whose application could accommodate the ‘does not apply to me’ response option.


The UK version of the VLAs failed to satisfy classical and modern psychometric standards for complete item sets. However, as the scale is not usually applied in complete format, an item bank approach calibrated 25 items with fit to the Rasch model. Suitable Computer Adaptive Testing (CAT) software could implement the item set, giving patients the choice of whether an item applies to them, or not.



Rheumatic and musculoskeletal diseases (RMDs) such as Osteoarthritis (OA), Rheumatoid Arthritis (RA), Chronic Pain (CP) and Fibromyalgia (FM) are common, and their prevalence is rising with the ageing population [1]. Many individuals with RMDs report moderate to high pain and fatigue, which can lead to activity limitation and participation restriction, and these in turn affect Quality of Life (QoL) [2,3,4,5]. Accordingly, the European League Against Rheumatism (EULAR) recommendations for health professionals’ approach to pain management in inflammatory arthritis and OA emphasise that pain is a complex and multifaceted experience. Treatment should be guided by patients’ preferences and priorities, such as the impact on their activities and participation, in order to facilitate improved health outcomes [6]. Patient reported outcome measures (PROMs) can be used to identify such preferences and priorities. However, few include both activities and participation items.

Developed in the United States (USA), the Valued Life Activities scale (VLAs) is one such PROM, measuring both difficulty in daily activities and participation in society [7]. It was developed from the 75-item Activities Enumeration Index [8] which was derived from content analysis of diaries and telephone interviews with patients with RA or OA [7, 9,10,11].

The VLAs is based on Verbrugge and Jette’s disablement model [12]. This defines activity and participation in three domains:

  • Obligatory: required for survival and self-sufficiency, such as eating, hygiene, walking and transport

  • Committed: related to one’s principal social roles, such as paid work, child and family care and household responsibilities and

  • Discretionary: engaged in for relaxation and pleasure, such as socialising, exercise, leisure, hobbies, religious activities, travel, volunteer work, educational activities, gardening.

The VLAs developers have allocated items to these three domains based on the model’s definitions [7] (Additional File 1).

The VLAs has been used in over 10 cohort studies with large numbers of people with RA and systemic lupus erythematosus (SLE), but the way in which it has been administered varies: different studies have used different numbers of items (i.e. 33, 29, 26, 21, or 14 items; see Additional Table 1); some items differ between versions (depending on diagnosis); and several different scoring methods have been used. These methods include: the average difficulty score for all items and for each of the three domains; the average score created by adjusting scores if the person reports changing how they perform the activity (e.g. use an assistive device, have help, take more time or limit time performing), with item scores being increased by one point if the score is < 2 [13]; or calculating (unadjusted) scores only for those items identified as important by the participant [7, 14]. Accordingly, we requested the definitive version and scoring method from the lead scale developer (Dr P. Katz). This was identified as the 33-item version scored on a 4-point scale (0 = no difficulty to 3 = unable to do). People are asked to record for each item: whether it is not applicable to them (i.e. the person does not normally perform the activity for reasons unrelated to their condition); their degree of difficulty performing it; and whether the item is important to them [13]. The overall score is then calculated as the mean of only those items identified as both applicable and important. As a result, different respondents’ scores are based on different numbers of items within the VLAs, as the intention is to score only those activities which are “valued” by participants.
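The definitive scoring rule described above can be illustrated with a minimal Python sketch. The record layout (`difficulty`, `applicable`, `important`) and the function name `vla_score` are assumptions made for illustration only; they are not taken from the scale documentation.

```python
from statistics import mean
from typing import Optional


def vla_score(items: list[dict]) -> Optional[float]:
    """Mean difficulty (0-3) over items that are both applicable and important.

    Returns None when no item qualifies, because a mean is then undefined.
    """
    rated = [
        item["difficulty"]
        for item in items
        if item["applicable"] and item["important"]
    ]
    return mean(rated) if rated else None


# Hypothetical responses from one person (field names are illustrative).
responses = [
    {"difficulty": 2, "applicable": True, "important": True},
    {"difficulty": 3, "applicable": True, "important": False},   # not valued: excluded
    {"difficulty": None, "applicable": False, "important": False},  # does not apply
    {"difficulty": 1, "applicable": True, "important": True},
]

print(vla_score(responses))  # mean of the two valued items, 2 and 1
```

Because the denominator changes from person to person, two respondents with the same score may have been scored on entirely different items.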

Some psychometric testing has been conducted with the 33-item and shorter versions, although with differing scoring methods, demonstrating internal consistency and test-retest reliability. The 14-item Short-VLAs was developed using Rasch analysis, and its unidimensionality, construct validity and concurrent validity have also been demonstrated [15, 16]. However, the variability in how the tool has been administered (differing numbers of items) and scored means there is currently limited evidence for the reliability and validity of the 33-item VLAs.

The VLAs, and the way in which it is used, presents considerable challenges for a robust psychometric analysis. For example, in the full 33-item set, respondents may mark any item as ‘it is not relevant to me’, so in practice a valid response may arise from any combination of the 33 items. As such, there is a vast number of possible combinations (2^33 − 1, roughly 8.6 billion, non-empty subsets of items). The current practice is to average the responses to the chosen items, giving a total score in the range 0–3. There are two major problems with this approach. First, the responses are ordinal and do not support mathematical operations such as averaging, which requires at least interval scaling. Second, even if this is ignored, such averaging would only be interpretable if every item had the same level of difficulty. Neither of these conditions holds for items in ordinal scales [17, 18].

How then can a scale such as the VLAs be shown to be psychometrically sound? To satisfy traditional psychometric standards, the various item sets need to be shown to be reliable, valid, unidimensional and invariant for key groups [32]. The items themselves need to be locally independent (conditional on the trait), although failure of this requirement often reflects a degree of item redundancy. The key issue here is that the item set, from which choices of relevant items are made, is robust from a psychometric perspective.

Nevertheless, even if the various versions of the scale are shown to be robust, there remains the challenge of the scoring associated with, potentially, a very large number of subsets as chosen by the user. It is here that a variation of Computer Adaptive Testing (CAT) can resolve the issue. With a calibrated set of items (e.g. indicating the level of difficulty associated with each of the 33 VLAs items), these can be administered to the respondent, as long as there is a ‘not relevant for me’ option, which will be treated as a missing value by the CAT, so moving on to the next item.
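To make the idea of treating ‘not relevant for me’ as a missing value concrete, the following Python sketch estimates a person location from whichever items were answered. It assumes a partial credit parameterisation, a standard normal prior and an expected a posteriori (EAP) estimate on a fixed grid; the item names and threshold values in `bank` are hypothetical and are not the calibrated VLAs values, and this is not the algorithm of any particular CAT package.

```python
import math


def pcm_probs(theta, thresholds):
    """Partial credit model: probability of each category 0..m for one item."""
    # Cumulative sums of (theta - tau_k); category 0 has an empty sum of 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def eap_estimate(responses, bank):
    """EAP person estimate (logits) using answered items only.

    responses maps item name -> category (0-3), or None for 'not relevant',
    which is skipped, i.e. treated as missing.
    """
    grid = [g / 10 for g in range(-60, 61)]  # -6 to +6 logits
    log_posterior = []
    for theta in grid:
        value = -0.5 * theta * theta  # standard normal prior (an assumption)
        for name, category in responses.items():
            if category is None:
                continue
            value += math.log(pcm_probs(theta, bank[name])[category])
        log_posterior.append(value)
    top = max(log_posterior)
    weights = [math.exp(v - top) for v in log_posterior]
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


# Hypothetical calibrations: three ordered thresholds per item, in logits.
bank = {
    "eating": [0.5, 1.5, 2.5],
    "gardening": [-2.0, -1.0, 0.0],
    "paid_work": [-1.0, 0.0, 1.0],
}

answers = {"eating": 1, "gardening": 3, "paid_work": None}  # paid work not relevant
print(round(eap_estimate(answers, bank), 2))
```

The estimate stays on the same metric whichever subset of items is answered, which is the property that averaging raw scores lacks.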

Consequently, the analytical strategy required is to first assess the traditional psychometric properties of the VLAs versions, and then proceed to determine if a calibrated item set suitable for CAT can be found, given any limitations observed in the traditional analysis.

Before a PROM can be used in another language, or in another country with the same language, it is necessary to adapt the PROM and psychometrically test it in the target group(s). Thus the aims of this study were to develop a British English version of the VLAs (using the full 33-item scale) following recommended linguistic and cultural adaptation guidelines [19, 20], and to test its psychometric properties in adults with RMDs in the United Kingdom (UK). We also investigated the psychometric properties of two shorter versions of the VLAs (26 and 14 items), embedded within the 33-item definitive version. The 26-item version splits the ‘physical activities’ item into two, whereas here it was included as one item, as in the 33-item version, making it in practice a 25-item version. Thus, the adaptation and the subsequent psychometric analysis focused on the 33, 26(25) and 14-item versions.


Study setting

Recruitment of people with RA was conducted through rheumatology outpatient clinics in 17 National Health Service (NHS) Hospitals. Participants with RA from a previous PROM study were also contacted [21]. Recruitment of people with the other seven RMDs was from 19 rheumatology or orthopaedic out-patient hospital departments, four General Practitioner (GP) surgeries, and from 10 RMD patient organisations in the UK.

Eligibility criteria

Inclusion criteria were people: aged ≥18 years; diagnosed with Rheumatoid Arthritis (RA), Ankylosing Spondylitis (AS), Chronic Pain (CP) or Fibromyalgia (FM), Chronic Hand and Upper Limb Conditions (CHUL), Osteoarthritis (OA), Systemic Lupus (SLE), Systemic Sclerosis (SS), or Primary Sjogren’s Syndrome (PSS) by either a rheumatology consultant, or an orthopaedic consultant, GP or extended-scope health professional (in the case of OA and CP/FM specifically); able to read, write and understand English; and able to provide written informed consent.


Phase-1: cross-cultural adaptation

We followed recommendations for linguistic and cross-cultural adaptation [19, 20]. As the 33-item VLAs is written in North American English, backward translation was not required [Additional File 1]. Two native British English speakers forward translated the VLAs; one was a rheumatology occupational therapist and the other was not involved in health care and was unfamiliar with health outcome measures. Following forward translation, the two translators resolved any discrepancies. A North American speaker with an academic background also helped to check that the forward translation accurately reflected the meaning of the items. An Expert Panel, consisting of three occupational therapists, a physiotherapist, a methodologist and a layperson with RA (all with English as their first language), discussed the translation to agree a prototype British English VLAs. This was then reviewed by the panel for semantic (i.e. do words mean the same thing), idiomatic (e.g. presence of colloquialisms or idioms), experiential and conceptual equivalence to the original 33-item North American English version of the VLAs.

Cognitive de-briefing interviews

Cognitive de-briefing interviews were conducted with a purposive sample of participants with RA identified from the participants of a previous study residing within the Midlands and North West of England [21]. The sample included a wide range of demographic characteristics and health status (i.e. range of age, gender, disease duration and work status). The questionnaire booklet was posted for completion at home one week before a cognitive de-briefing interview conducted face-to-face or by telephone by an occupational therapist, depending on the participant’s preference.

These semi-structured interviews determined whether the VLAs items were relevant, understandable and comprehensive, and confirmed that participants’ understanding of the items matched the intended use [19]. Participants were asked to rate the relevance and comprehensibility of the VLAs using five-point Likert scales (1 = not relevant to 5 = very relevant; and 1 = very easy to understand to 5 = very difficult to understand). Interviews were audio-recorded and transcribed for content analysis. A preliminary report of the findings was reviewed by the Expert Panel to agree on recommended changes prior to finalisation. A final version of this report and the British English VLAs were submitted to the lead developer in the USA for review, and the lead developer approved the changes.

Phase-2: psychometric testing


Participants with one of the eight RMDs as their primary diagnosis were recruited by research nurses or therapists using an eligibility checklist to screen patients. Additionally, patient organisations, such as the National Rheumatoid Arthritis Society (NRAS), Arthritis Care, National Ankylosing Spondylitis Society (NASS) and Fibromyalgia Action UK (FMA UK), mailed out study invitation letters, information sheets and a reply form to random samples of their members to help recruit participants. The reply form included the eligibility checklist items. Both rural and urban populations and a wide mix of socio-demographic characteristics were included (Fig. 1).

Fig. 1

British VLAs Overall Recruitment & Study Progress Flow Diagram

Data collection

Data were collected using postal questionnaires. The questionnaire booklet included demographic and health data (e.g. age, gender, marital, educational and employment status, disease duration, medication regimen), the 33-item VLAs and two measures of physical function, the Health Assessment Questionnaire (HAQ) [22] and the SF36 v2.0 [23], as well as a 0–10 Numeric Rating Scale (NRS) reporting disease activity.

Sample size

The sample size calculation for Rasch analysis suggested that a sample of at least 150 for each condition would give 99% confidence of the person estimate being within ±0.5 logits, irrespective of whether or not the scale is well targeted to the patients [24]. We chose to recruit a higher number of people with RA as we aimed to conduct secondary analysis with the RA data, if the VLAs demonstrated appropriate psychometric properties. We stopped recruitment once we had at least 150 sufficiently completed questionnaire booklets for each condition.

Statistical analysis

Confirmatory factor analysis

The VLAs has undergone revision over time, such that there are several versions, with the 33-item version being the definitive one. The other versions are nested within the 33-item scale, but the 26-item version includes two items for physical recreational activities (moderate and vigorous), rather than one item, as in the 33-item VLAs. Accordingly, when testing two shorter versions of the VLAs, we derived a 25-item VLAs (rather than the 26-item version) from the 33-item version, as well as testing the Short VLAs (SVLAs: 14 items).

Confirmation of the 33-item structure from a classical test perspective would follow from a Confirmatory Factor Analysis (CFA) where a priori there is evidence that the item set constitutes one, or a series of domains [25]. Following Kline, fit is determined by a non-significant chi square statistic [26]. Ancillary fit statistics include the RMSEA where a value less than 0.06 would be appropriate, the Comparative Fit Index (CFI), a comparison of final model and baseline model, and the Tucker Lewis Index (TLI), another incremental fit Index which adds penalties for increasing the parameters. Both indices would suggest good fit with values above 0.95. Thus, in the present study, the item set is fit to a CFA model in Mplus [27] and tested for the three domains (Obligatory, Committed and Discretionary) and the total score only for “important and applicable” items.
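As a rough illustration of how these cut-offs are applied, the Python sketch below computes the RMSEA point estimate from a chi-square statistic using the common textbook formula, sqrt(max(χ² − df, 0) / (df · (N − 1))), and checks the three ancillary criteria. This is an assumption about the formula: software such as Mplus may use estimator-specific adjustments, so the function is not presented as the calculation used in this study. The sample size in the usage line (N = 1544) is likewise assumed for illustration.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate from the model chi-square (textbook formula)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def ancillary_fit_ok(rmsea_value: float, cfi: float, tli: float) -> bool:
    """Apply the cut-offs named in the text: RMSEA < 0.06, CFI and TLI > 0.95."""
    return rmsea_value < 0.06 and cfi > 0.95 and tli > 0.95


# Illustrative values: a chi-square of 3552 on 464 df with an assumed N of 1544.
value = rmsea(3552, 464, 1544)
print(round(value, 3), ancillary_fit_ok(value, cfi=0.985, tli=0.984))
```

A model can therefore show high CFI and TLI values while still missing the RMSEA criterion, which is why the indices are read together rather than singly.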

Mokken scaling

The Mokken scale is a non-parametric probabilistic model that utilises Loevinger’s H coefficient to determine the ‘scalability’ of a set of items. H appears to be a measure of the degree to which the score is able to discriminate between persons in the given sample [28]. It has been argued that Mokken scaling is a natural starting point for item analysis, and it is used here in that context, to identify whether any items from the VLAs display a level of discrimination inconsistent with the expectations of the Rasch model, as represented by low values (< 0.3) of H [29]. In the present study Mokken scaling is examined through the msp procedure in Stata 13 [30].
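For readers unfamiliar with the coefficient, the scale-level H can be sketched as the ratio of the summed observed inter-item covariances to the summed maximum covariances that the item marginal distributions would allow. The Python sketch below assumes complete data and is only an illustration of that definition; it is not the msp implementation, and the example data are invented.

```python
def covariance(x, y):
    """Population covariance of two equally long score sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def loevinger_h(data):
    """Scale H for complete data; data is a list of respondents' item scores."""
    columns = list(zip(*data))  # one tuple of scores per item
    observed = 0.0
    maximum = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            observed += covariance(columns[i], columns[j])
            # Largest covariance the two marginals permit: pair the sorted
            # scores of one item with the sorted scores of the other.
            maximum += covariance(sorted(columns[i]), sorted(columns[j]))
    return observed / maximum  # undefined if every item is constant


# Invented scores (0-3) for five respondents on three items.
scores = [
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [2, 1, 1],
    [3, 2, 1],
]
print(round(loevinger_h(scores), 2))
```

Values near 1 indicate that respondents are ordered almost identically by every item, while values below 0.3 are conventionally read as weak scalability.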

Rasch model

Data from the 33 items were fitted to the Rasch model to ascertain whether a quantitative structure was present within the domain(s) being measured [31]. Described in detail elsewhere [32], the process is used to test fit to the model expectations, unidimensionality, (conditional) local item independence and invariance (Differential Item Functioning, DIF) by contextual groups of age, gender, employment and marital status, duration of disease, and, where data are pooled, by condition [33, 34]. Briefly, the RUMM2030 Rasch software [35] has a summary Chi-Square Interaction statistic, whose p-value should be above 0.05 if data fit the model. It also reports residual item and person means and standard deviations, the latter of which need to be below 1.4 to ensure no individual item is beyond a ± 2.5 range. Reliability of the item set was also reported in the form of a ‘Person Separation Index’ which, should the data have a normal distribution, is equivalent to Cronbach’s Alpha (internal consistency) [36]; otherwise the value will deviate from Alpha. A post hoc t-test is undertaken to determine unidimensionality, contrasting two person estimates derived from item sub-sets loading positively and negatively on the first residual principal component [37]. The proportion of contrasts where the t-test p-value is < 0.05 (or the lower confidence limit of that proportion) should not exceed 5% to be indicative of unidimensionality.
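The post hoc unidimensionality test can be sketched in Python as follows. The sketch assumes that each person has two estimates with standard errors (one from each item sub-set), uses a normal approximation for both the individual t-tests and the binomial confidence interval, and uses hypothetical names throughout; RUMM2030 may compute these quantities differently.

```python
import math


def unidimensionality_check(estimates_a, errors_a, estimates_b, errors_b):
    """Proportion of persons whose two sub-set estimates differ significantly.

    Returns (proportion, lower 95% bound, supports_unidimensionality).
    """
    significant = 0
    for a, se_a, b, se_b in zip(estimates_a, errors_a, estimates_b, errors_b):
        t = (a - b) / math.sqrt(se_a ** 2 + se_b ** 2)
        if abs(t) > 1.96:  # two-sided 5% level, normal approximation
            significant += 1
    n = len(estimates_a)
    proportion = significant / n
    # Lower bound of a 95% binomial interval (normal approximation).
    lower = proportion - 1.96 * math.sqrt(proportion * (1 - proportion) / n)
    return proportion, lower, lower <= 0.05


# Invented example: 100 persons, 10 of whom show a large contrast.
positive_set = [3.0] * 10 + [0.0] * 90
negative_set = [0.0] * 100
standard_errors = [0.5] * 100
print(unidimensionality_check(positive_set, standard_errors,
                              negative_set, standard_errors))
```

In this invented example 10% of contrasts are significant, yet the lower confidence bound still falls below 5%, which shows why the bound rather than the raw proportion is often the deciding figure in small samples.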

Following this, informed by the above analysis of 33 items, a calibration of the item set was attempted to form the basis of a CAT. To avoid the potential bias caused by a breach of the local independence assumption, a set of ‘core’ items that fit the model and were free of local dependency was first identified [38]. In doing so, surplus items were set aside into a series of secondary item sets, which were subsequently fitted to the model, anchored to the core metric by items in the core set which were free of dependency. Fit of the core and subsequent item sets to the Rasch model was tested by repeated sampling of the total data set to ensure the Type 1 error rate of the fit was accurate [39]. In this way, a calibrated set of items became available that could be administered in an innovative fashion by appropriate CAT software. The efficacy of the CAT process was evaluated by simulation using the Firestar programme [40].

The analysis uses the RUMM2030 software utilising the partial credit parameterisation of the Rasch model [35, 41].


Phase 1: cross-cultural adaptation

Cognitive debriefing interviews were conducted with 31 participants with RA whose socio-demographic and health characteristics are detailed in Table 1.

Table 1 The British VLAs Socio-demographic Characteristics of Participants [Phase 2 – T1]

In general, all British English VLAs items were deemed important and relevant. In terms of comprehensibility, most participants (n = 21) questioned whether item 13 “going to social events, parties, or celebrations” and item 18 “taking part in leisure activities OUTSIDE your home, such as going to the pub, bingo, going to the cinema, club meetings, restaurants” were measuring the same concept, as they required similar considerations to be able to participate. For example, participants noted participation depended on location and accessibility. Several participants (n = 8) queried whether item 21 [driving or getting around your community by public transport] should be divided into separate items, as they perceived “driving” and “using public transport” as different transport options. However, when it was explained that this item measures participation (i.e. at a societal level) rather than activity limitation (i.e. at a personal level), they did not think it needed to change. Two participants suggested that item 27 (taking care of social communication, such as writing letters, sending emails, making phone calls or texting) could be separated into verbal and written communication. However, as this was raised by only two out of 31 participants, the original item remained unchanged.

Participants also struggled with the question “Do you have to make changes to how you do this activity because of your arthritis?” They were unclear whether to tick ‘no’ or just leave it blank if ‘unable to do’ the activity. This issue was resolved by adding further instructions to the VLAs to aid responders’ decision making. Item 33 “having intimate relations with your spouse/partner” was perceived as too intrusive by some participants (n = 6). However, as the majority of the responders found this item to be relevant and appropriate, the item was retained.

Following the cognitive de-briefing interviews, no new items were added. Instead, some changes were made to the layout and wording of the items, so they are relevant and comprehensible to the British population (Additional File 2). The changes made were submitted to the lead developer who agreed to these, as these were acknowledged as differences in expression between North American and British English.

Phase-2: psychometric testing

In Phase-2, 1929 NHS patients were screened, and a further 3365 invitations were sent through Patient Organisations (Fig. 1). From both of these sources, 1946 were interested and eligible, of whom most (97%) consented, and 1546 (81%) returned the postal questionnaire. The participants’ socio-demographic and health characteristics are detailed in Tables 1 and 2. The response options to all 33 items are shown in Table 3, including the percentage of those reporting that an item “Does not apply to me”. Only 79 (5%) respondents completed all 33 items (including the “does not apply to me” option).

Table 2 The British VLAs Participants’ Health Characteristics
Table 3 Responses to the British VLAs (n = 1545)

Construct validity (confirmatory factor analysis and Mokken scaling)

A Confirmatory Factor Analysis failed to support a unidimensional scale from the 33-item VLAs (Chi Square 3552: df 464: p < 0.0001; RMSEA 0.066 (90% CI: 0.064–0.068); CFI 0.985; TLI 0.984), the 25-item VLAs (Chi Square 2836: df 275: p < 0.0001; RMSEA 0.078 (90% CI: 0.076–0.081); CFI 0.987; TLI 0.986), or the 14-item Short VLAs version (Chi Square 1228: df 77: p < 0.0001; RMSEA 0.099 (90% CI: 0.094–0.104)). Based on the item classification into the three domains (Obligatory, Committed, Discretionary), the three-domain structure of the item set also failed (Chi Square 2693: df 272: p < 0.0001; RMSEA 0.076 (90% CI: 0.074–0.079); CFI 0.987; TLI 0.986). Modification indices throughout these analyses indicated substantial cross loading, particularly between Obligatory and Committed items, and substantial local dependency among pairs of items, thus requiring correlated errors. Given that the ancillary fit statistics were more supportive, the results suggest that the disturbance of structure may be strongly influenced by clusters of locally dependent items. A Loevinger Coefficient from Mokken scaling of 0.87 for all 33 items indicated a strong non-parametric association between items and, despite the lack of evidence of unidimensionality (which is an assumption of Mokken scaling), provided sufficient evidence to move forward to a Rasch analysis of the data.

Rasch: diagnostics

Fit of the data from the VLAs to the Rasch model is shown in Table 4. An initial Likelihood Ratio test to determine whether a Rating Scale or Partial Credit parameterisation was appropriate supported the latter (Chi-Square 1281.3 (df 63); p < 0.0001). For each of the eight conditions, fit is shown for the 33, 25 and 14-item versions. Only four analyses satisfied the stochastic ordering (fit) and unidimensionality assumptions (AS-25; AS-14; SS-25; PS-14). Even here, the local independence assumption was breached by clusters of residual item correlations, although these were of insufficient magnitude to affect the fit and unidimensionality tests. Elsewhere, the VLAs was characterised by multidimensionality and misfit, which again may have been influenced by extensive clusters of residual item correlations. While reliability was high in all cases, this could be expected to be inflated in the presence of local response dependency, as identified through the residual correlation patterns. Differential item functioning (DIF) was occasionally present for age, gender and marital status, but not for education or duration of condition. For example, ‘Doing heavy housework’ was more difficult for females at any level of ability. DIF was also present for condition in 15 of the 33 items. For example, for those with RA, ‘traveling long distances’ was more difficult than for those with other conditions at all levels of life activity. Likewise, ‘Taking care of social communication’ was more difficult for those with chronic hand/upper limb conditions, at any level of life activity. Overall, the easiest activities (difficulty rarely affirmed) were ‘eating’ and ‘taking part in leisure activities in the home’, while the hardest activities (difficulty common) were ‘minor home repairs’ and ‘gardening’.

Table 4 Rasch Analysis of Various Versions of the Scale by Condition

The clusters of locally dependent items did not necessarily conform to the Obligatory, Committed or Discretionary domains. For example, in people with RA, the items ‘doing other work around the house’ and ‘gardening or outdoor property work’, designated as Committed and Discretionary respectively, displayed a residual correlation of 0.506 in the 33-item version, and 0.447 in the 25-item version. Nevertheless, fit to the model constrained to within the domains showed some improvement, although occasional misfit and multidimensionality remained (Table 5). This suggests that much of the disturbance of fit and dimensionality could be attributable to the local dependency issue.

Table 5 Rasch Analysis of the Domain Scores

An item bank approach

Consequently, the item bank approach was applied. A core set of 15 items was shown to fit the Rasch model across most indicators (Table 6, Analyses 1–3). However, the item ‘Travelling long distances’ showed DIF by condition with, for example, RA and OA showing distinct differences in expected response at any level of difficulty, the former having more difficulty than the latter (Fig. 2). From the surplus items set aside in the local dependency analysis, a second item set was created with 10 items, which again showed fit to the model and no DIF by condition (Table 6, Analyses 4–6). Thus, 25 of the 33 items were available for CAT and, with the second set calibration anchored by three items from the core set, all items were calibrated onto the same unidimensional interval scale metric. The mean number of items chosen (i.e. excluding “does not apply to me” responses) from the VLA-CAT25 in the main data set was 17.4, and the maximum was 24 (3.5%) (Fig. 3). Simulation of the efficacy of the CAT identified that the average number of items required was 4 to achieve an alpha of ≥0.7 for group use, and 11 to achieve an alpha of 0.85 for individual use. Consequently, it would appear that, given the patient choice of relevant items, the CAT can in most cases accommodate both individual and group estimates with the required reliability, should the distribution shown in Fig. 3 be replicated elsewhere.

Table 6 Rasch Analysis of Computer Adaptive Testing item sets
Fig. 2

Comparison of RA & OA on response to traveling long distances

Fig. 3

Distribution of items chosen from the VLAs-25 CAT item set

Summary of the results

The 33-item VLAs was linguistically validated and culturally adapted for British people aged ≥18 years with RMDs following recommended guidelines. The British English VLAs retained all of the original 33-items, with some changes to the wording, template and instructions to make it easily understandable by British people. Following this, the VLAs was tested in its 33, 26 and 14 item versions with British people across 8 different RMDs to verify its psychometric validity and reliability (internal consistency). The latter two versions were nested within the 33-item version, with a minor change to the 26-item version which had split an original item into two parts. The results of the statistical analysis show that the VLAs, in its various summated forms (i.e. adding together items in complete sets scored using those items identified as important to the person) was not a valid measure of valued life activities. Only 5% of the sample considered all the items applied to them.

When a calibration was made for use in a CAT, 25 of the 33 items were retained, and formed a valid unidimensional item set, largely invariant by condition. The CAT could provide sufficient reliability to accommodate both individual and group estimates. Using suitable CAT software, these items could be administered taking account of the varying difficulty of the items, the local dependency that exists, and the DIF on the ‘travel’ item, so giving an estimate of VLA on a 0–100 interval scale, irrespective of the number of items chosen.


The Valued Life Activities scale was completed by a large number of people across eight RMDs. The VLAs was perceived as a relevant and understandable measure of activities and participation by British people with RMDs. However, robust psychometric testing of the British VLAs in the context of the current scoring method (i.e. summing items identified as important to the respondent only) of the 33, 25 and 14 item versions showed that, due to local item dependency, multidimensionality and misfit to Rasch model expectations, the VLAs had insufficient validity to enable a recommendation for its use as summated item sets in clinical evaluation or research. The usual strategy of scoring only those items that apply to the individual does not exempt the underlying item set from basic psychometric requirements, as the choices that are made deliver a vast number of possible item subsets from the whole, each of which should satisfy those same requirements.

The ‘Does not apply to me’ response also raises substantial problems with how these items are scored, and how to deal with this response (in addition to any other type of missingness). The problem is similar to that observed for Goal Attainment Scaling where patients are involved with the choice of goals for their rehabilitation [42]. While Rasch analysis can deal with both structural and ordinary missingness, and multiple imputation techniques can provide complete data sets, this is unlikely to be available in routine clinical practice [43]. Also, imputation techniques are not designed to deal with ‘missing not at random’ instances, which is likely to be the case with the ‘does not apply to me’ option. Furthermore, the usual strategy for scales to provide a transformation table from raw score to interval scaled Rasch metric would also not apply, as it is only valid in the presence of complete data, which is not attainable under the present scoring method. This also affected actions to remedy the effects of local dependency, that is by creating ‘super items’ (testlets) by adding together clusters of items, as the ‘Does not apply to me’ option resulted in case-wise deletion at the testlet level. Given these problems, it was not possible to test for DIF cancellation at the scale level due to the restriction upon creating testlets [44]. Furthermore, under the current scoring method, DIF would have to be assessed across all possible combinations of items to examine if any DIF is observed, and would cancel across the chosen items, given the person estimate would be re-estimated for each unique combination of items.

Some of the above problems were accommodated through a CAT design, which identified 25 items (in two sets of 15 and 10) that demonstrated fit to the Rasch model, including unidimensionality and invariance across most contextual groups. The DIF by condition for the ‘travel’ item required a condition-specific item location estimate for the conditions affected. The calibrated item set, given suitable CAT software, could be administered to patients, offering the options of ‘not important for me’ and ‘not applicable to me’.
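As a rough illustration of how a CAT can accommodate a ‘not applicable to me’ response, the sketch below shows one common item-selection rule: choose the unadministered item with the greatest Fisher information at the current person estimate, and simply exclude any item the patient has marked as not applicable. This is an assumption for illustration, not a description of how smartCAT or any other package works; the item bank, thresholds and function names are hypothetical.

```python
import math


def category_probs(theta, thresholds):
    """Partial credit model category probabilities for one item."""
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def item_information(theta, thresholds):
    """Fisher information of an item at theta (the variance of the item score)."""
    probs = category_probs(theta, thresholds)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))


def next_item(theta, bank, excluded):
    """Return the most informative item not yet administered or marked not applicable."""
    candidates = [item for item in bank if item not in excluded]
    if not candidates:
        return None  # bank exhausted
    return max(candidates, key=lambda item: item_information(theta, bank[item]))


# Hypothetical three-item bank with thresholds in logits.
bank = {
    "item_a": [-3.0, -2.0, -1.0],
    "item_b": [-0.8, 0.2, 1.2],
    "item_c": [2.0, 3.0, 4.0],
}
first = next_item(0.0, bank, excluded=set())
# If the patient marks the first item as not applicable, it is skipped unscored.
second = next_item(0.0, bank, excluded={first})
print(first, second)
```

Because the person estimate is computed from whichever items were actually answered, skipping an item changes the precision of the estimate but not its validity, provided the remaining items fit the model.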

Implications for clinical and research practice

The main implication for clinical and research practice is that implementing the above solution requires access to CAT software, or some other system that provides CAT-based estimates, together with appropriate IT infrastructure at the clinic level, or at least online facilities in the patient's home. One application, the smartCAT system, was designed to facilitate such an environment, but requires online interaction with its server, which returns an estimate in real time to the source, including an appropriate clinical setting, as required [45]. It can cope with clusters of locally dependent items, and with different estimates to account for DIF where present. Another CAT solution is the Concerto software, an open-source online adaptive testing platform [46]. The former has a small charge per assessment, while the latter is free, although psychometric and technical support can be provided for a fee if required. So, when suitable software is available, using the VLAs in this manner addresses the EULAR recommendations to assess patients’ preferences and priorities concerning the impact upon their activities and participation.


Due to budget and timeline constraints, we conducted cognitive debriefing interviews only with people with RA, predominantly from the North West and Midlands regions of England. However, we tested the psychometric properties of the VLAs across eight RMDs. Cognitive debriefing with people with other RMDs may have resulted in the removal of items from, or the addition of new items to, the British VLAs.

We also intended to examine test-retest reliability. Four weeks after completing the first questionnaire booklet, participants were mailed a second booklet that included the VLAs. However, as the Rasch analysis identified significant challenges in calculating scores, we did not proceed with this test. There is no reason why a ‘stable’ respondent should choose the same set of items, even within a short time frame. Given the potential number of item sets that could be chosen, the retest can only be done on those who have completed exactly the same set of items at both time points. The question then arises as to whether or not a failure to choose the same set of items constitutes a lack of test-retest reliability. Even where the same set of items is chosen, given the possible number of combinations available, each combination should have sufficient cases for the analysis, as though they were distinct scales. Further work is required to consider how test-retest reliability may be undertaken in such circumstances.
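The restriction described above can be made concrete with a small sketch that identifies respondents who answered exactly the same set of items at both time points, and counts how many cases each item combination attracts. The data layout (a dictionary of respondent id to item scores, with `None` for ‘does not apply to me’) and the function names are assumptions made for illustration, not the study's data format.

```python
from collections import Counter


def answered_items(responses):
    """Return the set of items a respondent scored (None = 'does not apply to me')."""
    return frozenset(item for item, score in responses.items() if score is not None)


def retest_eligible(time1, time2):
    """Find respondents with identical answered item sets at both time points.

    Returns (eligible, cases_per_combination), where `eligible` maps respondent
    id to the shared item set and the counter gives cases per combination.
    """
    eligible = {}
    cases_per_combination = Counter()
    for person in time1.keys() & time2.keys():
        first = answered_items(time1[person])
        second = answered_items(time2[person])
        if first and first == second:
            eligible[person] = first
            cases_per_combination[first] += 1
    return eligible, cases_per_combination


# Hypothetical responses for three respondents on three items.
time1 = {
    "p1": {"i1": 2, "i2": None, "i3": 1},
    "p2": {"i1": 1, "i2": 0, "i3": None},
    "p3": {"i1": 3, "i2": None, "i3": 0},
}
time2 = {
    "p1": {"i1": 1, "i2": None, "i3": 1},  # same item set as at time 1
    "p2": {"i1": 1, "i2": None, "i3": 2},  # different item set
    "p3": {"i1": 2, "i2": None, "i3": 1},  # same item set as at time 1
}
eligible, cases = retest_eligible(time1, time2)
print(sorted(eligible), dict(cases))
```

Each key of the counter behaves like a distinct scale, so the number of cases per key indicates whether a retest analysis for that combination is feasible.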

The VLA-CAT25 item bank itself has only just been developed within this study and will require further psychometric testing and testing in clinical settings to ascertain how well it works in a day-to-day setting. The smartCAT software is at its Beta test stage but has been trialled on a fatigue item bank in a clinical setting in Sweden. It requires careful management of the CAT process, assigning unique patient identifiers, setting up CAT in clinic, or providing links and passwords for use at home. Data protection must be considered as the estimate itself is created on the smartCAT server located outside of the European Union and delivered in real time back to the source or designated setting. Decisions need to be taken as to whether or not the estimate and its associated patient identifier is stored on the foreign server, or not. Similar software programmes are likely to have the same requirements.

The simulation of fit to the Rasch model was not ideal. For example, the distribution of fit across 100 random samples would give a more accurate picture of the fit of the item bank's item sets than just three consecutive samples drawn with replacement. Unfortunately, this option is not available in the software used. Consequently, further work is needed to verify the fit of the 15-item and 10-item sets. It is not possible to test the fit of the 25 items together, as the second set holds items which were locally dependent with the core set, and which would generate multidimensionality and misfit.


The British version of the VLAs, across various scales, failed to satisfy classical and modern psychometric standards as full item sets. A CAT solution was found that would overcome these limitations and avoid using inappropriate mathematical operations to derive a VLAs score. It provides a Rasch derived interval scaled estimate of Valued Life Activities from the set of items chosen by the participant. Increasingly available CAT software is required to implement the item bank. Further validation of the CAT application is required.



AS: Ankylosing Spondylitis
CAT: Computer Adaptive Testing
CP: Chronic Pain
CULP: Chronic Upper Limb Pain
CFI: Comparative Fit Index
CFA: Confirmatory Factor Analysis
DIF: Differential Item Functioning
DMARDs: Disease Modifying Anti-Rheumatic Drugs
EULAR: European League Against Rheumatism
HAQ: Health Assessment Questionnaire
NHS: National Health Service
NRS: Numeric Rating Scale
PROMs: Patient Reported Outcome Measures
PSI: Person Separation Index
PSS: Primary Sjogren’s Syndrome
RA: Rheumatoid Arthritis
SF-36v2: Short-Form 36
SD: Standard Deviation
SLE: Systemic Lupus Erythematosus
SS: Systemic Sclerosis
TLI: Tucker Lewis Index
UK: United Kingdom


  1. Al Maini M, Adelowo F, Al Saleh J, Al Weshahi Y, Burmester GR, Cutolo M, Flood J, March L, McDonald-Blumer H, Pile K, et al. The global challenges and opportunities in the practice of rheumatology: White paper by the World Forum on Rheumatic and Musculoskeletal Diseases. Clin Rheumatol. 2015;34:819–29.
  2. Taylor PC, Alten R, Gomez-Reino JJ, Caporali R, Bertin P, Sullivan E, Wood R, Piercy J, Vasilescu R, Spurden D, et al. Clinical characteristics and patient-reported outcomes in patients with inadequately controlled rheumatoid arthritis despite ongoing treatment. 2018.
  3. Steen Pettersen P, Neogi T, Magnusson K, Hammer HB, Uhlig T, Kvien TK, Haugen IK. Peripheral and central sensitization of pain in individuals with hand osteoarthritis and associations with self-reported pain severity. Arthritis Rheum. 2019;71:1070.
  4. Dailey DL, Keffala VJ, Sluka KA. Do cognitive and physical fatigue tasks enhance pain, cognitive fatigue, and physical fatigue in people with fibromyalgia? Arthritis Care Res. 2015;67(2):288–96.
  5. Bernard P, Hains-Monfette G, Atoui S, Kingsbury C. Differences in daily objective physical activity and sedentary time between women with self-reported fibromyalgia and controls: results from the Canadian health measures survey. Clin Rheumatol. 2018;37(8):2285–90.
  6. Geenen R, Overman CL, Christensen R, Asenlof P, Capela S, Huisinga KL, Husebo MEP, Koke AJA, Paskins Z, Pitsillidou IA, et al. EULAR recommendations for the health professional's approach to pain management in inflammatory arthritis and osteoarthritis. Ann Rheum Dis. 2018;77(6):797–807.
  7. Katz PP, Morris A, Yelin EH. Prevalence and predictors of disability in valued life activities among individuals with rheumatoid arthritis. Ann Rheum Dis. 2006;65(6):763–9.
  8. Yelin E, Lubeck D, Holman H, Epstein W. The impact of rheumatoid arthritis and osteoarthritis: the activities of patients with rheumatoid arthritis and osteoarthritis compared to controls. J Rheumatol. 1987;14(4):710–7.
  9. Katz PP, Yelin EH. Life activities of persons with rheumatoid arthritis with and without depressive symptoms. Arthritis Care Res. 1994;7(2):69–77.
  10. Katz PP. The impact of rheumatoid arthritis on life activities. Arthritis Care Res. 1995;8(4):272–8.
  11. Katz PP, Yelin EH. The development of depressive symptoms among women with rheumatoid arthritis. The role of function. Arthritis Rheum. 1995;38(1):49–56.
  12. Verbrugge LM, Jette AM. The disablement process. Soc Sci Med. 1994;38(1):1–14.
  13. Katz PP, Morris A. Use of accommodations for valued life activities: prevalence and effects on disability scores. Arthritis Rheum. 2007;57(5):730–7.
  14. Katz P, Morris A, Gregorich S, Yazdany J, Eisner M, Yelin E, Blanc P. Valued life activity disability played a significant role in self-rated health among adults with chronic health conditions. J Clin Epidemiol. 2009;62(2):158–66.
  15. Neugebauer A, Katz PP, Pasch LA. Effect of valued activity disability, social comparisons, and satisfaction with ability on depressive symptoms in rheumatoid arthritis. Health Psychol. 2003;22(3):253–62.
  16. Katz PP, Radvanski DC, Allen D, Buyske S, Schiff S, Nadkarni A, Rosenblatt L, Maclean R, Hassett AL. Development and validation of a short form of the valued life activities disability questionnaire for rheumatoid arthritis. Arthritis Care Res. 2011;63(12):1664–71.
  17. Forrest M, Andersen B. Ordinal scale and statistics in medical research. BMJ. 1986;292:537–8.
  18. Ørnbjerg LM, Christensen KB, Tennant A, Hetland ML. Validation and assessment of minimally clinically important difference of the unadjusted health assessment questionnaire in a Danish cohort: uncovering ordinal bias. Scand J Rheumatol. 2020;49(1):1–7.
  19. Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine. 2000;25(24):3186–91.
  20. Acquadro C, Bayles A, Juniper E. Translating patient-reported outcome measures: a multi-step process is essential. J Bras Pneumol. 2014;40(3):211–2.
  21. Hammond A, Tennant A, Tyson SF, Nordenskiöld U, Hawkins R, Prior Y. The reliability and validity of the English version of the evaluation of daily activity questionnaire for people with rheumatoid arthritis. Rheumatology. 2015;54(9):1605–15.
  22. Kirwan JR, Reeback JS. Stanford health assessment questionnaire modified to assess disability in British patients with rheumatoid arthritis. Br J Rheumatol. 1986;25(2):206–9.
  23. Ware JE Jr. SF-36 health survey update. Spine. 2000;25(24):3130–9.
  24. Linacre JM. Sample size and item calibration stability. Rasch Meas Trans. 1994;4:328.
  25. Brown TA. Confirmatory factor analysis for applied research. 2nd ed. Guilford Press; 2006.
  26. Kline RB. Principles and practice of structural equation modeling. 3rd ed. New York: Guilford Press; 2011.
  27. Muthén LK, Muthén BO. Mplus user’s guide. 6th ed. Los Angeles: Muthén & Muthén; 1998–2011.
  28. Christensen KB, Kreiner S. Monte Carlo tests of the Rasch model based on scalability coefficients. Br J Math Stat Psychol. 2010;63(Pt 1):101–11.
  29. Mokken RJ. Nonparametric models for dichotomous responses. In: van der Linden WJ, Hambleton RK, editors. Handbook of modern item response theory. New York: Springer; 1997.
  30. StataCorp. Stata statistical software: release 13. College Station: StataCorp LP; 2013.
  31. Tennant A, Conaghan PG. The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis Rheum. 2007;57:1358–62.
  32. Rasch G. Probabilistic models for some intelligence and attainment tests, (Copenhagen, Danish Institute for Educational Research). [Expanded edition (1980) with foreword and afterword by B.D. Wright]. Chicago: The University of Chicago Press; 1960.
  33. Tennant A, Penta M, Tesio L, Grimby G, Thonnard J-L, Slade A, Lawton G, Simone A, Carter J, Lundgren-Nilsson A, Tripolski M, Ring H, Biering-Sørensen F, Marincek C, Burger H, Phillips S. Assessing and adjusting for cross cultural validity of impairment and activity limitation scales through Differential Item Functioning within the framework of the Rasch model: the Pro-ESOR project. Med Care. 2004;42:37–48.
  34. Christensen KB, Makransky G, Horton M. Critical values for Yen’s Q3: identification of local dependence in the Rasch model using residual correlations. Appl Psychol Meas. 2017;41(3):178–94.
  35. Andrich D, Sheridan BED, Luo G. RUMM2030: Rasch unidimensional models for measurement. Western Australia: RUMM Laboratory; 2009.
  36. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–333.
  37. Smith EV. Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. J Appl Meas. 2002;3:205–31.
  38. Doğanay Erdoğan B, Elhan AH, Kaskatı OT, Öztuna D, Küçükdeveci AA, Kutlay Ş, Tennant A. Integrating patient reported outcome measures and computerized adaptive test estimates on the same common metric: an example from the assessment of activities in rheumatoid arthritis. Int J Rheum Dis. 2017;20(10):1413–25.
  39. Hagell P. Testing rating scale unidimensionality using the principal component analysis (PCA)/t-test protocol with the Rasch model: the primacy of theory over statistics. Open J Stat. 2014;4:456–65.
  40. Cho SW. Firestar: computerized adaptive testing simulation program for polytomous item response theory models. Appl Psychol Meas. 2009;33:644.
  41. Masters G. A Rasch model for partial credit scoring. Psychometrika. 1982;47:149–74.
  42. Tennant A. Goal attainment scaling: Current methodological issues. Disabil Rehabil. 2007;29:20–1.
  43. Fellinghauer CS, Prodinger B, Tennant A. The impact of missing values and single imputation upon Rasch analysis outcomes: a simulation study. J Appl Meas. 2018;19(1):1–25.
  44. Wyse AE. DIF cancellation in the Rasch model. J Appl Meas. 2013;14(2):118–28.
  45. Elhan AH, Oztuna D, Kutlay S, Kucukdeveci AA, Tennant A. An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain. BMC Musculoskelet Disord. 2008;9:166.
  46. Scalise K, Allen DD. Use of Open-Source Software for Adaptive Measurement: Concerto as an R-based Computer Adaptive Development and Delivery Platform. Br J Math Stat Psychol. 2015;68(3):478–96.



The authors would like to thank: all the study participants for their time in completing questionnaires; the expert panel members for their time and advice: John Grogan (translator), Dr. Kris Hollands (Canadian-English speaking researcher, University of Salford); Angela Jacklin (Rheumatology occupational therapist, Mid Cheshire NHS Trust Hospitals); Graeme McLeish (English language expert, Services for Export and Language, University of Salford); Rachel Gill, Rachel Shuttleworth, Robert Peet and Kate Woodward-Nut (University of Salford) for assistance with data collection and data entry; Dr. Lynne Goodacre (North West Research Design Service) and Ruth Hawkins (patient research partner) for their invaluable contribution. We would also like to thank all the Principal Investigators, rheumatology consultants, rheumatology and research nurses, research facilitators, GPs, patient organisations and occupational therapists assisting with participant recruitment and study support at the participating sites: Prof Terry O’Neill, Ann McGovern, Jennifer Green, Angharad Walker, Prof A Jones, Penny Storrs, Jennifer Green, Katherine Kinsella (Salford Royal Hospital); Prof Ian Bruce, Lindsey Barnes, Elizabeth Beswick, Sarah Evans (Manchester Royal Infirmary); Dr. Leena Dass, Dr. Sophia Naz, Lorraine Lock, Dr. Neil Snowden, Denise McSorland, Linda Kent (North Manchester General Hospital); Dr. Chris Deighton, Alison Booth, Jo Morris (Royal Derby Hospital); Melanie Arundell, Victoria Jansen (Pulvertaft Hand Centre, Royal Derby Hospital); Prof David Walsh, Debbie Wilson, Jayne Smith (Kings Mill Hospital, Sherwood Forest Hospitals NHS Foundation Trust); Dr. Chetan Mukhtyar, Loretta Dean, Susan Rowell, Karen Mills, Jane Leeder, Jennifer Perkins, Sarah Wace (Norfolk and Norwich Hospitals); Dr. Bela Szenbenyi, Carol Gray (Diana Princess of Wales, Grimsby); Dr. Mike Green, Anne Gill, Lisa Carr (York Hospital); Dr. Kirsten Mackay, Julie Easterbrook, Liz Burnett, Melanie Stone, Usha Chandra, Patricia Pickford, Christine Dixon (Torbay Hospital); Dr. Mike Green, Alison Miernik, Rachel Bailey-Hague (Harrogate District Hospital); Mr. Niall Graham, Dr. Atheer Al-Ansari, Dr. Catherine Whittall, Jayne Edwards, Julia Nicholas (Robert Jones & Agnes Hunt Hospital, Oswestry); Dr. Wendy Holden, Janet Cushnaghan, Angie Dempster, Hayley Paterson, Barbara King (Basingstoke and North Hampshire Hospital); Mr. David Johnson, Lindsey Barber, Jan Smith (Stepping Hill Hospital); Dr. Karen Douglas, Lucy Kadiki, Chitra Ramful, Daljit Kaur (Russell Hall Hospital, Dudley); Dr. Anca Ghiurlic, Christine Graver (Royal Hampshire Hospital, Winchester); Dr. Frank McKenna, Jane McConiffe (Trafford Hospitals); Dr. Sophia Naz and Lorraine Lock (Fairfield Hospital); Darren Peters, Jayne Endicott, Jamie Currie (Royal Devon and Exeter Hospital); Louise Hollister, Sarah Jervis, Dawn Simmons (Weston Hospital, Weston-super-Mare); Andrew Makaka (Whitehills Health and Community Care Centre, Forfar); Christine Duncan (Ninewells Hospital, Dundee); Sarah Jervis (Countess of Chester Hospital); Sarah Collins, Maree Leavy (Aintree Hospital); Dr. CD Ramesh, Dr. MM Aziz, Dr. M Page, Dr. SK Ali, Lesley Miller, Stephen Preston, April Weaver (North Lancashire PCT); Dr. Wan-Fai Ng, Katie Hackett (Royal Victoria Infirmary, Newcastle upon Tyne); Dr. Elizabeth Price, Susannah Pegler (Great Western Hospital, Swindon); Dr. Simon Bowman (Queen Elizabeth Hospital, Birmingham); Amanda Oswald, Pain Care Clinic Ltd., Hove, UK. UK patient organisations: Chris Maker, Chair: Lupus UK; Sally Dickinson, Information Officer, National Ankylosing Spondylitis Association; Anne Mawdsley, CEO: Raynaud’s and Scleroderma Association; Kim Fligelstone, Chair, Scleroderma Society; Dr. Adam Al-Kashi, Head of Research, Back Care Association; Lyndsey Middlemiss, Chair, Fibroaction; Pam Stewart, Chair: Fibromyalgia Association; Steve Fisher, Chair, RSI Action.

Availability of the data and materials

Data and materials can be accessed through a request from the lead author.


This paper presents independent research funded by Versus Arthritis [Grant No: 20031] and the United Kingdom Occupational Therapy Research Foundation (UKOTRF). NHS service support costs were secured from the Greater Manchester Comprehensive Local Research Network (the Lead CLRN). The views expressed are those of the authors and not necessarily those of the NHS or the funders. The sponsor and the funding source had no role in the design of this study, its execution, analyses, interpretation of the data, or decision to submit results, apart from study oversight.

Author information




Alison Hammond (AH) conceived the study and was the Chief Investigator. AH, Yeliz Prior (YP), Alan Tennant (AT) and Sarah Tyson (ST) initiated the study design. YP conducted the Phase-1 Cognitive Debriefing Interviews and Content Analysis and facilitated the Phase-2 data collection. AT conducted the Rasch analysis. All authors contributed to refinement of the study protocol and approved the final manuscript.

Corresponding author

Correspondence to Y. Prior.

Ethics declarations

Ethics approval and consent to participate

As data were collected in two different studies, ethical approvals were obtained from the NRES Committee North West (Greater Manchester North) [12/NW/0841], the North West 9 (Greater Manchester West) Research Ethics Committee [11/H1014/5] and the University of Salford Research Ethics Panel prior to the start of the study. Approvals were also obtained from the Research and Development departments and therapy service managers at each hospital. Good Clinical Practice (GCP) and informed consent training were completed by all Occupational Therapists and research facilitators taking part in this study. Eligible participants provided written, informed consent prior to the commencement of the study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

VLAs Published Versions and Psychometric Testing.

Additional file 2.

Original VLAs items vs the British VLAs items.

Additional file 3.

British Valued Life Activities Scale [British VLAs].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence can be viewed on the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Prior, Y., Tennant, A., Tyson, S. et al. The Valued Life Activities Scale (VLAs): linguistic validation, cultural adaptation and psychometric testing in people with rheumatic and musculoskeletal diseases in the UK. BMC Musculoskelet Disord 21, 505 (2020).



  • RMDs
  • Participation
  • Activities
  • Leisure
  • Activities of daily living
  • Valued life activities
  • Rasch analysis
  • Validity
  • Reliability
  • RMDs (rheumatic and musculoskeletal diseases)
  • PROMS (patient reported outcome measures)