Multidimensional daily diary of fatigue-fibromyalgia-17 items (MDF-fibro-17): part 2 psychometric evaluation in fibromyalgia patients

Background The Multidimensional Daily Diary of Fatigue-Fibromyalgia-17 instrument (MDF-Fibro-17) has been developed for use in fibromyalgia (FM) clinical studies and includes 5 domains: Global Fatigue Experience, Cognitive Fatigue, Physical Fatigue, Motivation, and Impact on Function. Psychometric properties of the MDF-Fibro-17 needed to demonstrate the appropriateness of using this instrument in clinical studies are presented. Methods Psychometric analyses were conducted to evaluate the factor structure, reliability, validity, and responsiveness of the MDF-Fibro-17 using data from a Phase 2 clinical study of FM patients (N = 381). Confirmatory factor analyses (CFA) were performed to ensure understanding of the multidimensional domain structure, and a secondary factor analysis of the domains examined the appropriateness of calculating a total score in addition to domain scores. Longitudinal psychometric analyses (test-retest reliability and responder analysis) were also conducted on the data from Baseline to Week 6. Results The CFA supported the 17-item, 5-domain structure of this instrument as the best fit of the data: comparative fit index (CFI) and non-normed fit index (NNFI) were 0.997 and 0.992, respectively, standardized root mean square residual (SRMR) was 0.010, and the root mean square error of approximation (RMSEA) was 0.06. In addition, the total score (CFI and NNFI both 0.95) met required standards. For the total and 5 domain scores, reliability and validity data were acceptable: test-retest and internal consistency were above 0.9; correlations were as expected with the Global Fatigue Index (GFI) (0.62-0.75), Fibromyalgia Impact Questionnaire (FIQ) Total (0.59-0.71), and 36-Item Short Form Health Survey (SF-36) vitality (VT) (0.43-0.53); and discrimination was shown across quintiles of the GFI, FIQ Total, and pain Numeric Rating Scale (NRS). In addition, sensitivity to change was demonstrated with an overall mean responder score of -2.59 using anchor-based methods. Conclusion The MDF-Fibro-17 reliably measures 5 domains of FM-related fatigue and psychometric evaluation confirms that this measure meets or exceeds each of the predefined acceptable thresholds for evidence of reliability, validity, and responsiveness to changes in clinical status. This suggests that the MDF-Fibro-17 is an appropriate and responsive measure of FM-related fatigue in clinical studies.


Background
Fibromyalgia (FM) is a disorder characterized by chronic widespread pain and tenderness. It is estimated to affect 0.5-10% of the worldwide population and approximately 2-3% of the United States (US) population (more than 5 million individuals) [1][2][3][4][5]. Patients with FM often experience other symptoms, such as fatigue, impaired sleep, negative mood, cognitive limitations, and physical functioning limitations, leading to a reduced health-related quality of life (HRQoL) [6,7]. Beyond pain, fatigue is commonly identified as one of the most bothersome and disabling symptoms, reported by more than 80% of FM patients [1,5,8]. Patients often describe fatigue as "disruptive or extremely disruptive" to their HRQoL [9].
There is a growing body of evidence from both clinical and regulatory communities supporting FM-related fatigue as a multidimensional concept [1,9,10,11]. Additional research on this phenomenon is needed within the context of clinical studies to fully understand the dimensionality as well as ascertain the ability of a single measure to saturate the construct of fatigue. The Multidimensional Daily Diary of Fatigue-Fibromyalgia-17 items (MDF-Fibro-17) is being developed for this purpose: to allow for the exploration and assessment of different components of FM-related fatigue (cognitive versus physical, etc.) in clinical trials while capturing the overall complexity of this experience [12].
Existing research that had been conducted with FM patients for concept elicitation [1], cognitive debriefing, and pilot testing [9] of an initial pool of 23 items was reviewed and used to inform the development of a multidimensional assessment of FM-related fatigue [12]. Five dimensions were identified to reflect the broad experience of FM-related fatigue: Global Fatigue Experience, Cognitive Fatigue, Physical Fatigue, Motivation, and Impact on Function. Qualitative and quantitative item-level evaluation suggested that 17 of the original pool of 23 items best supported the conceptual model. This resulted in the 17-item MDF-Fibro-17 being proposed [12].
The original qualitative work confirmed the content validity of the instrument [12], which was developed for use in FM clinical studies in accordance with the Food and Drug Administration (FDA) guidance for patient-reported outcome (PRO) development [13]. Further work, however, was needed to conduct psychometric analyses to support the appropriateness of the MDF-Fibro-17 for use in FM clinical studies. The original 23-item pool was therefore administered in a Phase 2 clinical study of TD-9855 (NCT01693692), and the resulting psychometric analyses are presented in this article. The Phase 2 clinical study was a randomized, double-blind, parallel-group, placebo-controlled study conducted to investigate whether an investigational product, TD-9855, was effective in treating patients with fibromyalgia. TD-9855 is a potent reuptake inhibitor with modest selectivity for inhibition of norepinephrine reuptake and good central nervous system penetration properties in humans. It was hypothesized that TD-9855 would offer the potential for robust pain relief while minimizing any putative serotonergic side effects such as nausea, somnolence, fatigue, and sexual dysfunction. In addition, because the majority of fibromyalgia patients suffer comorbid fatigue, a reduction in serotonergic activity could be beneficial [14]. Based on this, the primary endpoint for this study was fibromyalgia pain and the exploratory endpoint was fibromyalgia-related fatigue. The Multidimensional Assessment of Fatigue (MAF) was included in the study along with the 23-item pool used to develop the MDF-Fibro-17. The study included 392 subjects treated with TD-9855 at 2 dose levels or placebo, in a ratio of approximately 2 to 1.
This quantitative analysis was conducted to confirm whether the MDF-Fibro-17 is an acceptable instrument for the measurement of FM-related fatigue in clinical trials in adult patients with FM, and includes parameters associated with the reliability and validity of the individual items and scores of the MDF-Fibro-17 as well as the responsiveness and hence, interpretability of the measure.

Methods
The original pool of 23 items developed from the qualitative work was incorporated into a Phase 2 study of TD-9855, an investigational norepinephrine and serotonin reuptake inhibitor, in patients with FM [15]. Patients were required to be diagnosed with FM according to the 1990 American College of Rheumatology criteria [3], be aged 18-65 years, and have a self-reported pain level of at least 4 on an 11-point Numeric Rating Scale (NRS). Each subject signed an Institutional Review Board or Independent Ethics Committee approved informed consent form prior to participating in this study. Ethical approval for the original qualitative research was provided by Copernicus, a US centralized Independent Review Board. Ethical approval for the Pfizer cross-sectional validation study was provided by the Schulman Associates Institutional Review Board, Inc. and the University of Cincinnati Institutional Review Board. Ethical approval was obtained for the Theravance validation study at a site level, with each site obtaining approval individually. The 23 items were programmed onto a personal digital assistant (PDA) handheld electronic device, to be completed by the patients at the end of each day during the placebo run-in period (Days -7 to -1), the treatment period (Days 1 to 43), and the post-treatment washout period (Days 44 to 57). Training for investigators and patients in the use of the PDA and completion of the diary in accordance with study procedures was provided in addition to a quick reference guide.
Patients were instructed to complete all items at approximately the same time every evening, and a restricted time-window for completion was programmed between the hours of 17:00 and 24:00. Retrospective completion of missed days was not allowed. The diary questions were presented sequentially and the option to skip items was not provided.
Each item was presented as a 0-10 NRS anchored by "not at all" at 0 and "extremely" at 10; higher scores indicated greater fatigue severity for 22 of the 23 items. A weekly score was calculated as the mean of the available data if greater than 4 entries were completed within the 7-day period. Weeks with fewer than 4 entries were considered missing, with no imputation. All items were evaluated at the item level to confirm the fit of the data to the hypothesized 5-domain, 17-item conceptual model identified previously in qualitative work [12]. The 5 domain scores (Global Fatigue Experience, Cognitive Fatigue, Physical Fatigue, Motivation, and Impact on Function) were calculated as the average of the item scores in each domain. A total score was calculated as the average of the domain scores (also ranging from 0 to 10).
A number of additional instruments were included in the study and used to inform the psychometric evaluation of the MDF-Fibro-17 (see Table 1 for further details). The following standard set of psychometric analyses was performed [16].
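The scoring rules above can be illustrated with a minimal Python sketch. It is not the study's scoring program. The function names are hypothetical, and two points are assumptions: the minimum number of daily entries is exposed as a parameter (set to 4 here) because the text does not state how a week with exactly 4 entries is handled, and a domain or total score is returned as missing when any component is missing.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence

# Assumption: the text leaves the exactly-4-entries case open, so the cut-off is a parameter.
MIN_ENTRIES = 4


def weekly_item_score(daily: Sequence[Optional[float]],
                      min_entries: int = MIN_ENTRIES) -> Optional[float]:
    """Mean of the available daily 0-10 ratings in a 7-day window.

    Returns None (missing, no imputation) when too few entries were completed.
    """
    available = [x for x in daily if x is not None]
    if len(available) < min_entries:
        return None
    return mean(available)


def domain_score(item_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Average of the weekly item scores in one domain (assumed missing if any item is)."""
    if any(s is None for s in item_scores):
        return None
    return mean(item_scores)


def total_score(domain_scores: Mapping[str, Optional[float]]) -> Optional[float]:
    """Average of the domain scores, so the total also ranges from 0 to 10."""
    values = list(domain_scores.values())
    if any(v is None for v in values):
        return None
    return mean(values)
```

Because each level is an average rather than a sum, every score stays on the original 0-10 metric regardless of how many items a domain contains.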

Item-level evaluation
Item-level evaluation was conducted to examine data completeness and the pattern of missing item-level data. The distribution of responses per item was examined to identify any floor or ceiling effects.

Confirmatory factor analysis (CFA)
Initial CFA of 17-item, five-factor latent model
The factor structure of the MDF-Fibro-17 items was evaluated using the 17-item, five-factor latent model (Fig. 1). Analyses used interim baseline data from the Phase 2 study (N = 192) to assess the degree to which the hypothesized conceptual measurement model fit the data.
Second CFA of 5 domains to create a total score
Following the initial CFA conducted to explore the multidimensional domain structure of the measure, a secondary factor analysis of the domains was conducted to explore the appropriateness of calculating a total score (Fig. 1). This second CFA was conducted using the full data set from the Phase 2 study (N = 381). The averaged domain raw scores were used as the manifest variables in a single-factor CFA.
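As a rough illustration of the floor and ceiling check, the sketch below computes the proportion of responses at each end of the 0-10 scale for a single item. The function name and the example ratings are hypothetical, and the article does not state the cut-off used to declare a floor or ceiling effect.

```python
from typing import Sequence, Tuple


def floor_ceiling_rates(responses: Sequence[int],
                        lo: int = 0, hi: int = 10) -> Tuple[float, float]:
    """Return the proportions of responses at the scale minimum and maximum."""
    n = len(responses)
    at_floor = sum(1 for r in responses if r == lo)
    at_ceiling = sum(1 for r in responses if r == hi)
    return at_floor / n, at_ceiling / n


# Illustrative ratings for one item (not study data).
floor_rate, ceiling_rate = floor_ceiling_rates([0, 5, 10, 10])
```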
For the initial and secondary CFA, the goodness of fit of the models was evaluated by several fit indices using the following pre-defined thresholds: a comparative fit index (CFI) of 0.95 or higher; a root mean square error of approximation (RMSEA) value of 0.06 or lower; a non-normed fit index (NNFI) of 0.90 or higher; and a standardized root mean square residual (SRMR) of 0.08 or lower [17][18][19][20][21][22][23]. Confirmatory factor analysis was conducted using Mplus Version 6.1.19.
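The pre-defined thresholds can be expressed as a small lookup, sketched here in Python. The helper name `check_fit` and the index values in the usage line are illustrative assumptions, not output from the study's Mplus models.

```python
from typing import Dict

# Pre-defined thresholds from the text: (direction, cut-off).
THRESHOLDS = {
    "CFI": (">=", 0.95),
    "NNFI": (">=", 0.90),
    "RMSEA": ("<=", 0.06),
    "SRMR": ("<=", 0.08),
}


def check_fit(indices: Dict[str, float]) -> Dict[str, bool]:
    """Flag, for each supplied fit index, whether it meets its threshold."""
    results = {}
    for name, value in indices.items():
        direction, cutoff = THRESHOLDS[name]
        results[name] = value >= cutoff if direction == ">=" else value <= cutoff
    return results


# Illustrative values only.
report = check_fit({"CFI": 0.97, "NNFI": 0.93, "RMSEA": 0.05, "SRMR": 0.03})
```

CFI and NNFI are "higher is better" indices, whereas RMSEA and SRMR are "lower is better", which is why the direction is stored alongside each cut-off.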

Item-domain relationships
The relationships between individual items and the proposed MDF-Fibro-17 domains were evaluated. Item-total correlations, within the hypothesized domains, were expected to be 0.4 or greater [24][25][26].
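A corrected item-total correlation relates each item to the sum of the remaining items in its domain, so that the item does not inflate its own correlation. The sketch below assumes Pearson correlations and hypothetical function names; it illustrates the technique and is not the analysis code used in the study.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(items: List[Sequence[float]]) -> List[float]:
    """Correlate each item with the sum of the other items in the same domain.

    `items` holds one sequence per item, each with one score per patient.
    """
    correlations = []
    for i, column in enumerate(items):
        others = [c for j, c in enumerate(items) if j != i]
        rest_total = [sum(values) for values in zip(*others)]
        correlations.append(pearson(column, rest_total))
    return correlations
```

Each returned value would then be compared against the 0.4 criterion stated above.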

Reliability
Both the consistency with which the items measured fatigue at individual time points and the repeatability of scores while patients were considered stable were evaluated. Reliability of the MDF-Fibro-17 domain and total scores was assessed using test-retest reliability (intra-class correlation coefficient [ICC] ≥ 0.7; Spearman Brown) and internal consistency (Cronbach's alpha > 0.8) [18,24]. The former was used specifically to determine the repeatability of the observed score in the absence of an observed change, and the latter to assess the level of internal consistency across a group of items within a domain.
Moderate or greater correlations (>0.4) were expected to confirm convergent validity, and weaker correlations (<0.4) were expected to confirm divergent validity. However, given the complex relationships between symptoms in FM, correlations with measures assessing concepts other than fatigue were not expected to be zero. These analyses were conducted on absolute scores at Baseline and repeated at End of Study using change scores calculated for each measure.
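For the internal consistency criterion, Cronbach's alpha can be computed from the item variances and the variance of the summed score. The following is a minimal sketch using the standard formula with sample variances; the function name is hypothetical and the study's own computation was carried out in SAS.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(items: List[Sequence[float]]) -> float:
    """Cronbach's alpha for one domain.

    `items` holds one sequence per item, each with one score per patient.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(sum score))
    """
    k = len(items)
    sum_scores = [sum(values) for values in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(sum_scores))
```

The result would be compared against the alpha > 0.8 criterion given above.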

Known-groups validity
Known-groups validity was examined to provide further evidence of construct validity. Scores on measures indicative of overall severity of condition (the pain intensity NRS and FIQ total score), and the GFI, a measure of fatigue, were divided into quintiles. Mean MDF-Fibro-17 total and domain scores were computed for each quintile. A generalized linear model provided an overall F-test for the group discrimination with effect size estimates considered as 0.2 (small), 0.5 (moderate), and 0.8 (large) [27].
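The known-groups approach can be sketched as follows: patients are ranked on an anchor measure, split into 5 equal-sized groups, and the MDF-Fibro-17 scores are compared across groups. This sketch uses a one-way ANOVA F statistic as a stand-in for the overall F-test from the generalized linear model; the rank-based split (ties broken by input order) and the function names are assumptions.

```python
from typing import List, Sequence


def quintile_groups(anchor: Sequence[float], score: Sequence[float]) -> List[List[float]]:
    """Split `score` into 5 groups according to the rank of `anchor`."""
    order = sorted(range(len(anchor)), key=lambda i: anchor[i])
    n = len(order)
    groups: List[List[float]] = [[] for _ in range(5)]
    for rank, i in enumerate(order):
        groups[rank * 5 // n].append(score[i])
    return groups


def one_way_f(groups: List[List[float]]) -> float:
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```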

Sensitivity to change and responder analysis
Effect sizes are defined as the mean change found in a variable divided by the standard deviation (SD) of that variable. Effect sizes are used to translate "the before and after changes" into a standard unit of measurement that will provide a clearer understanding of the relative sensitivity and performance of each clinical variable. The ability of the MDF-Fibro-17 to detect changes observed in the clinical study was evaluated using distribution- and anchor-based methods. Distribution-based methods include estimations based on observed variance in the sample such as the evaluation of ½ SD or 1 standard error of measurement. Anchor-based methods allow for the conceptual linking (e.g., discriminability) between additional known clinical or patient variables.
For the distribution-based analyses, 2 definitions for the ½ SD approach were used: ½ of the baseline SD and ½ of the change score SD; and 2 for the standard error of measurement (SEM) approach: SEM based on the ICC (test-retest coefficient) and the baseline SD, and SEM based on the ICC and the change score SD [28][29][30].
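The effect size and the distribution-based quantities can be written out as a short sketch. It assumes the commonly used form SEM = SD × √(1 − ICC) and, for the effect size, division by the baseline SD; the text only says "the SD of that variable", so that choice and the function names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the baseline SD (assumed denominator)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def half_sd(values: Sequence[float]) -> float:
    """Half of the sample SD; apply to baseline scores or to change scores."""
    return 0.5 * stdev(values)


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM from an SD (baseline or change score) and the test-retest ICC."""
    return sd * sqrt(1 - icc)
```

Passing baseline scores or change scores to `half_sd`, and the corresponding SD to `standard_error_of_measurement`, gives the four distribution-based definitions described above.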
For the anchor-based analyses, a collapsed PGI-C scale category of "very much improved" and "much improved" versus remaining PGI-C responses denoting minimal improvement, no change, or decline ("minimally worse" to "very much worse") was used for discrimination on the MDF-Fibro-17 (see Table 1 for further details). Additional anchors of a change of 8.0 points on the GFI, and 11.0 points on the FIQ total score were also used based upon the meaningful change established for these measures [31][32][33][34][35][36][37][38][39][40].
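One way to read the anchor-based approach is as the mean MDF-Fibro-17 change score among patients in the collapsed "improved" PGI-C category. The sketch below takes that reading as an assumption; the function name and the handling of an empty group are hypothetical, and the category labels follow the text.

```python
from statistics import mean
from typing import Optional, Sequence

# Collapsed "improved" category described in the text.
IMPROVED = {"very much improved", "much improved"}


def anchor_based_threshold(changes: Sequence[float],
                           pgic: Sequence[str]) -> Optional[float]:
    """Mean change score among patients rated much or very much improved.

    Returns None when no patient falls in the improved category.
    """
    improved = [c for c, rating in zip(changes, pgic) if rating in IMPROVED]
    return mean(improved) if improved else None
```

Negative values indicate a reduction in fatigue, which matches the sign of the responder scores reported in the results.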
All analyses, unless otherwise specified, were conducted using SAS software Version 9.1.3 (SAS Institute Inc., Cary, NC, US). Values reported in text are means ± SD.

Results
Sample characteristics
The final sample of 392 patients in the intention-to-treat (ITT) population (369 females, 23 males) had an average age of 45.7 ± 10.6 years. The majority of patients were Caucasian (82.7%) followed by Black/African American (13.0%). At Baseline, patients had an average FIQ total score of 54.9 ± 14.92, which indicated moderate FM severity [32]. The average pain intensity NRS score was 6.1 ± 1.31 and average GFI score was 33.4 ± 8.09. Demographic and baseline clinical characteristics of the ITT analysis group are detailed in Table 2.
A total of 381 (97%) patients from the ITT population had data available on the MDF-Fibro-17 at Baseline. This analysis set was used in the psychometric evaluation of the measure.

Item-level evaluation
The items were administered via electronic PDA, which did not allow items to be skipped; therefore, there were no missing data at the item level. No floor or ceiling effects at the item level were observed (0.3-1.3% and 0.3-0.5%, respectively). All items showed a negative skew, with the majority of values to the right of the mean. Nine items had a z-score greater than 2.0, indicating a substantial departure from normality.

Confirmatory factor analysis (CFA)
Initial CFA of 17-item, five-factor latent model
An initial CFA conducted using preliminary baseline data from the TD-9855 Phase 2 study (N = 192) concluded that the MDF-Fibro-17 fit the data well, with all parameters meeting the pre-specified criteria. The initial CFA evaluated the 17-item, 5-factor model hypothesized for the MDF-Fibro-17 and suggested that the model fit the data from both studies. These results are presented in Table 3. For reference, Table 3 also includes initial results from the existing validation study that was reviewed to inform the development of the tool, discussed elsewhere [12].
Second CFA of 5 domains to create a total score (current study)
Using data collected in the full TD-9855 Phase 2 data set (N = 381), the averaged domain raw scores were used as the manifest variables in a single-factor CFA to explore the appropriateness of a total score. The CFA models were evaluated in a stepwise fashion to allow for accumulation of evidence surrounding the dimensionality of the MDF-Fibro-17. The single-factor CFA model was evaluated on the 5 domain scores and the total of the 5 domain scores, and suggested that the model fit the data. The CFI and NNFI were both 0.952, above their respective 0.95 and 0.90 required thresholds. The SRMR was 0.020, below the prespecified 0.08 threshold, which is, in part, due to the small number of parameters in this model. Due to the presence of correlated residuals between Fatigue Experience and the Physical [...]
The second-order CFA confirmed that it is acceptable to calculate a total score, which consists of all domain scores. The CFI was 0.997 and NNFI was 0.992, both well above their respective 0.95 and 0.90 required thresholds. The SRMR (0.010) was well under the required threshold, and the RMSEA (0.061) also met required standards. The path coefficients for the 17-item, 5-domain MDF-Fibro-17, accounting for the correlated residual between Global Fatigue Experience and Physical Fatigue (0.42), were between 0.88 and 0.990. The correlation coefficients for individual items ranged from 0.92 to 0.99. The CFA results are shown in Table 4.

Item-domain relationships
Corrected item-total correlations within hypothesized domains ranged from 0.92 to 0.96 for Global Fatigue Experience, 0.96 to 0.98 for Cognitive Fatigue, 0.85 to 0.91 for Physical Fatigue, 0.94 to 0.96 for Motivation, and 0.93 to 0.97 for Impact on Function, all of which met pre-defined criteria and were considered substantial. For all items except two, the observed correlation was higher with the item's own domain than with any other domain. The item "How tired did your body feel today?", part of the Physical Fatigue domain, correlated more strongly with Global Fatigue Experience (0.92), Motivation (0.88), and Impact on Function (0.88) than with its own domain (0.85). The item "How much did tiredness make it difficult to do things today?", part of the Impact on Function domain, had a slightly higher correlation with the Motivation domain than with its own domain (0.95 versus 0.93). All correlations are presented in Table 5.

Reliability
Test-retest reliability was assessed by evaluating the reproducibility of MDF-Fibro-17 scores over the time period between Baseline and Day 8 and from Week 5 to Week 6. All ICCs (Spearman Brown) exceeded the required 0.70 level: for Baseline versus Day 8, ICCs ranged from 0.71 to 0.82 (median of 0.74), and for Week 5 versus Week 6 all exceeded 0.90.
Internal consistency was confirmed as acceptable with strong Cronbach's alpha for the total score and all domain scores (α = 0.94-0.99). Reliability data are shown per MDF-Fibro-17 domain in Table 6.

Known-groups validity
All known-group difference analyses of MDF-Fibro-17 scores were highly significant (p < 0.001) when performed using quintiles. Large effect sizes (>0.8) [27,41], determined by the F value, provided an indication of the differential sensitivity of the MDF-Fibro-17 scores to the cross-sectional known-groups, showing the greatest ability to discriminate between the 5 quintiles on the NRS, GFI, and FIQ Total. Scores by quintiles are summarized in Table 7.

Sensitivity to change and responder analysis
Significant (p < 0.001) changes were observed in all MDF-Fibro-17 scores from Baseline to End of Study. A medium effect size (>0.5) was observed for the Cognitive Fatigue domain (-0.69). Effect sizes for the total score and all other domains were large (-0.85 to -0.95). Similar effect sizes to those observed on the MDF-Fibro-17 were also observed in the pain intensity NRS, FIQ total score, GFI, and SF-36 VT.
The responder definitions for the MDF-Fibro-17 domains were assessed using distribution- and anchor-based approaches. Similar results were found with both distribution-based approaches, used to understand the lower limits of acceptable responder definitions. Anchor-based responder definitions using the PGIC (Patients' Global Impression of Change; very much/much improved category), GFI (>11-point improvement), and FIQ total score (>8-point improvement) were similar to those determined by selected distribution-based methods (-2.55 to -2.94). However, the responder definitions determined using the PGIC much improved category, GFI, and FIQ had a broader range (-2.06 to -3.41). The mean responder score, based on the anchor-based analyses, for the MDF-Fibro-17 Total Score and the 5 domains ranged from -2.48 to -2.85. Overall, the recommended responder cutoff for the total score as well as the other domains is -2.5 (summarized in Table 8).

Discussion
The MDF-Fibro-17 is a multidimensional measure of FM-related fatigue, made up of 5 domains (Global Fatigue Experience, Cognitive Fatigue, Physical Fatigue, Motivation, and Impact on Function). The analyses confirmed the domain structure suggested by the conceptual model developed from in-depth qualitative work with FM patients, and indicated sound psychometric properties of the measure. All 17 items in the MDF-Fibro-17 performed well as individual items and as part of the 5-domain structure of the instrument. The multidimensional structure allows the MDF-Fibro-17 to capture the broad experience of FM-related fatigue, a characteristic that has been identified as important within the clinical and regulatory community [1,7,8,10]. In addition, the factor analyses confirmed that it is also appropriate to calculate a single total score informed by the in-depth measurement of FM-related fatigue. The relationships between individual items within and across domains demonstrate the complexity of fatigue in FM. There was a strong correlation observed between motivation and physical functioning items in particular, suggesting potential item redundancy. However, both the qualitative data and conceptual model [9] highlighted that these are related but distinct aspects of FM-related fatigue from the patient perspective and therefore relevant and important to include within the measure.
Tests of internal consistency and test-retest reliability were strong, indicating that this is a highly reliable measure. Known-groups analysis revealed that the MDF-Fibro-17 total and domain scores were able to differentiate between all groups tested. Highly significant changes were observed over the study period on all scores of the MDF-Fibro-17, with medium to large effect sizes that reflected the changes observed on other outcomes in the study, indicating that the instrument is sensitive to changes observed in a clinical study.
Responder analyses conducted using different definitions for both anchor-based and distribution-based techniques produced similar estimates, and the results suggested a reasonable responder cut-off to be around -2.5.
One limitation to this study is that although the MDF-Fibro-17 has the potential to assess the different components of FM-related fatigue based on data described above, this study was conducted in a particular clinical trial population in response to drug therapy intervention. Therefore, responsiveness and sensitivity to other therapies would need to be further explored in future studies.

Conclusion
The psychometric evaluation and strong evidence of content validity indicate that the MDF-Fibro-17 is a relevant, psychometrically robust, multidimensional instrument, with sensitivity to detect change and clear responder definitions. Taken as a whole, the MDF-Fibro-17 has the potential to become a reliable clinical outcome assessment tool to evaluate fatigue in adult patients with FM within a clinical trial setting [12].

Acknowledgements
We thank Jamie Carroll who provided medical writing services on behalf of Clinical Outcomes Solutions Ltd. We would also like to thank Pfizer for providing access to their pilot study data.

Funding
Funding was provided by Theravance Biopharma US, Inc. to conduct this research project. Specifically, Theravance led the study design, data collection, analysis, and interpretation, and was actively involved in the development of the manuscript. Theravance Biopharma US, Inc. was not involved in the design or conduct of the previous Pfizer studies or analyses.

Availability of data and materials
The datasets generated during the current study and the Phase 2 clinical trial data analyzed are the property of Theravance Biopharma, Inc. and are not publicly available. The data from the original research that support the findings of this study are the property of Pfizer Inc. Pfizer Inc. are copyright owners of the measure. Applications for a copy of the measure and background data can be made through www.pfizerpatientreportedoutcomes.com.

Authors' contributions
YL originally conceived of the research, and led its design, coordination, analysis, and interpretation. SM participated in the design and the planning and conduct of the psychometric analysis. JS and SD participated in the design and conduct of the study, and in the interpretation of data. JC participated in the conduct and design of the psychometric analysis, and WW was involved in the planning for the psychometric analysis and the interpretation of data. SH, CB, and TS were involved in the interpretation of the study. All authors have been involved in drafting the manuscript or revising it critically for important intellectual content; have given final approval of the version to be published; and agree to be accountable for all aspects of the work in ensuring that questions related to accuracy or integrity of any part of the work are appropriately investigated and resolved. All authors read and approved the final manuscript.

Competing interests
SM and YL are consultants with Theravance Biopharma US, Inc. JS and SD were employees of Theravance Biopharma US, Inc. at the time of conducting this research. SH and CB are employees of Clinical Outcomes Solutions Ltd, who are paid consultants on this project. JC was an employee of Covance Inc., which was paid to conduct the psychometric analysis. This project was funded by Theravance Biopharma US, Inc. CB was formerly an employee of Pfizer Ltd, copyright owner of the DFS-Fibro (5 items and 17 items).

Consent for publication
Not applicable.

Ethics approval and consent to participate
Ethics approval was obtained, and written informed consent was provided by all individuals prior to participation in these studies, as part of the original published research [6,14]. Ethical approval for the original qualitative research was provided by Copernicus, a US centralized Independent Review Board. Ethical approval for the Pfizer cross-sectional validation study was provided by the Schulman Associates Institutional Review Board, Inc. and the University of Cincinnati Institutional Review Board. Ethical approval was obtained for the Theravance validation study at a site level, with each site obtaining approval individually.