Sarcopenia, frailty and cachexia patients detected in a multisystem electronic health record database

Background Sarcopenia, cachexia and frailty have overlapping features and clinical consequences, but often go unrecognized. The objective was to detect patients described by clinicians as having sarcopenia, cachexia or frailty within electronic health records (EHR) and compare clinical variables between cases and matched controls.
Methods We conducted a case-control study using retrospective data from the Indiana Network for Patient Care multi-health system database from 2016 to 2017. The computable phenotype combined ICD codes for sarcopenia, cachexia and frailty with clinical note text terms for sarcopenia, cachexia and frailty detected using natural language processing. Cases with these codes or text terms were matched on birth year, sex and race to controls without these codes or text terms. Two physicians reviewed EHR for all cases and a subset of controls. Comorbidity codes, laboratory values, and other coded clinical variables were compared between groups using Wilcoxon matched-pair signed-rank test for continuous variables and conditional logistic regression for binary variables.
Results Cohorts of 9594 cases and 9594 matched controls were generated. Cases were 59% female and 69% white, with a median (1st, 3rd quartiles) age of 74.9 (62.2, 84.8) years. Most cases were detected by text terms without ICD codes (n = 8285, 86.4%). All cases detected by ICD codes (total n = 1309) also had supportive text terms. Overall 1496 (15.6%) had concurrent terms or codes for two or more of the three conditions (sarcopenia, cachexia or frailty). Of text term occurrences, 97% were used positively for sarcopenia, 90% for cachexia, and 95% for frailty. The remaining occurrences were negative uses of the terms or applied to someone other than the patient. Cases had lower body mass index, albumin and prealbumin, and significantly higher odds ratios for diabetes, hypertension, cardiovascular and peripheral vascular diseases, chronic kidney disease, liver disease, malignancy, osteoporosis and fractures (all p < 0.05). Cases were more likely to be prescribed appetite stimulants and caloric supplements.
Conclusions Patients detected with a computable phenotype for sarcopenia, cachexia and frailty differed from controls in several important clinical variables. Potential uses include detection among clinical cohorts for targeting recruitment for research and interventions.


Background
Skeletal muscle weakness and poor physical performance develop with aging, complicating many chronic clinical conditions and influencing outcomes and decisions regarding the modality and aggressiveness of treatments. Terms used to describe this overall skeletal muscle decline include sarcopenia, cachexia and frailty. The terms used in clinical practice are influenced by the medical subspecialty and the location of medical care [1], though these terms may be used interchangeably despite unique mechanisms and operational definitions [2]. Sarcopenia is a condition of generalized low muscle mass and strength, resulting in poor physical performance complicating aging and many chronic diseases [3,4]. Cachexia involves catabolism and is tied to nutritional status with resulting extreme weight loss [2]. Most patients with cachexia will have loss of muscle mass and strength consistent with sarcopenia, while patients with sarcopenia may not have cachexia, such as those with sarcopenic obesity [5]. Frailty is the result of aggregate deficits impairing overall functional reserve [6,7], leading to falls, functional dependence, hospitalizations and other adverse outcomes. Although also tied to musculoskeletal function, frailty is a heterogeneous syndrome involving multiple factors including balance, neuropathy, cognitive function, joint dysfunction, cardiovascular function, comorbidities, psychosocial and other factors [8]. Thus, these concepts of frailty, sarcopenia and cachexia are interrelated. Those with cachexia develop sarcopenia; sarcopenia decreases mobility, resulting in frailty; and the frail state exacerbates muscular declines [5,9]. In addition, sarcopenia, cachexia and frailty each contribute to functional dependence, dysmobility, disability, hospitalizations, high healthcare costs and death [10-17].
In 2000, disability due to sarcopenia cost the US healthcare system an estimated 18.5 billion dollars [18]. With increasing life expectancy, the public health costs of disability are expected to increase. Studies suggest supervised exercise, dietary supplements and pharmacologic interventions may benefit individuals with sarcopenia [19,20], frailty [21] and cachexia [22]. It is therefore critically important from individual and public health perspectives to identify patients with sarcopenia, frailty and cachexia early for intervention. However, these conditions are often not recognized due to lack of knowledge among providers, lack of equipment for objective measures (e.g. grip strength), and time pressures in clinical encounters [23].
Large electronic health record (EHR) datasets combining clinical text notes with coded data provide an opportunity to identify patients having specific conditions from clinical encounters. Natural language processing of text using computers can enhance capture of information by accessing unstructured data from the robust clinical note repository making up the majority of the data within EHR. A computable phenotype is a clinical condition, characteristic, or set of features that can be determined using a computer algorithm to assess its presence or absence solely from data in EHRs and ancillary data sources. Computable phenotypes may include structured data (diagnosis codes, recorded measurements, laboratory values, and medications), unstructured data (text fields or notes), or combinations of such variables.
We hypothesized that patients diagnosed or described by providers in the clinical record as having sarcopenia, cachexia or frailty could be detected using an EHR based computable phenotype combining coded and text data. As evidence of detecting a clinically important phenotype, we hypothesized patients identified based on our computable phenotype would differ regarding clinical features from randomly selected matched controls.

Study design
This was a retrospective case-control study performed using the Indiana Network for Patient Care (INPC), a large statewide clinical data exchange warehouse comprising over 100 separate healthcare entities, including major hospitals, health networks, and insurance providers (Fig. 1). The INPC contains data on over 18 million patients in the form of 7 billion clinical data elements, 1.1 billion encounter records, over 290 million mineable text reports, and data on drug prescription and dispensing. Approximately two thirds of Indiana's population contribute data to INPC during clinical encounters. This study was conducted in accordance with the Declaration of Helsinki, and prior to the study the protocol was approved by the Indiana University Institutional Review Board. Patients were not contacted during this retrospective study in a large EHR database, and the Indiana University Institutional Review Board approved a waiver of consent.
We generated a computable phenotype based on the combination of ICD codes for sarcopenia, cachexia and frailty, and text variants of the words sarcopenia, cachexia and frailty. Both ICD-9 and ICD-10 codes for frailty (797, R54) and cachexia (799.4, R64) were assessed. Sarcopenia has only an ICD-10 code (M62.84), introduced in 2016. Notes were searched for text terms using locally generated natural language processing software, nDepth. Text searches included detection of variants such as misspellings and grammatical variants. The software also assessed term negation (such as "not sarcopenic").
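The combination of codes and text terms described above can be illustrated with a minimal sketch. The ICD codes are those listed in the text; the regular expressions, the negation cues and the function name `detect_conditions` are hypothetical stand-ins, since the actual rules used by nDepth (including its handling of misspellings) are not described here.

```python
import re

# ICD-9 and ICD-10 codes named in the text.
ICD_CODES = {
    "sarcopenia": {"M62.84"},
    "cachexia": {"799.4", "R64"},
    "frailty": {"797", "R54"},
}

# Hypothetical term patterns covering simple grammatical variants only.
TERM_PATTERNS = {
    "sarcopenia": re.compile(r"\bsarcopeni(?:a|c)\b", re.IGNORECASE),
    "cachexia": re.compile(r"\bcache(?:xia|ctic)\b", re.IGNORECASE),
    "frailty": re.compile(r"\bfrail(?:ty)?\b", re.IGNORECASE),
}

# Hypothetical negation cues, looked for shortly before the matched term.
NEGATION = re.compile(r"\b(?:not|no|without|denies)\b[^.;]{0,20}$", re.IGNORECASE)


def detect_conditions(icd_codes, note_text):
    """Return conditions flagged by an ICD code or by a non-negated text term."""
    found = set()
    for condition, codes in ICD_CODES.items():
        if codes & set(icd_codes):
            found.add(condition)
    for condition, pattern in TERM_PATTERNS.items():
        for match in pattern.finditer(note_text):
            # Inspect up to 30 characters preceding the term for a negation cue.
            preceding = note_text[max(0, match.start() - 30):match.start()]
            if not NEGATION.search(preceding):
                found.add(condition)
                break
    return found
```

A sketch this simple would still flag text that refers to someone other than the patient, which is one reason manual review of detected occurrences remains necessary.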

Eligible patients
We included adult patients (18 years of age and older) having encounters and clinical notes within the Indiana University Health System and Eskenazi Health Systems during 2016-2017. The computable phenotyping algorithm was applied to these patients' records, including records from additional INPC participating institutions, during the study period. Patients having one or more positive occurrences of either the text terms or codes during the study period were considered cases having the computable phenotype. Controls were chosen from the portion of the assessed population that had no occurrences of either the ICD codes or text terms for sarcopenia, cachexia or frailty, and were matched 1:1 to cases on year of birth (to control for age), sex (male, female) and race (black, white, other), as recorded in INPC. The index date was the earliest occurrence of the computable phenotype within the study period for cases, or the earliest encounter within the study period for controls.
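As an illustration of 1:1 exact matching on year of birth, sex and race, the following sketch draws one unused control at random from the stratum that shares a case's matching variables. The record layout (keys `id`, `birth_year`, `sex`, `race`), the function name and the fixed random seed are assumptions made for the example and are not taken from the study.

```python
import random
from collections import defaultdict


def match_controls(cases, candidates, seed=0):
    """Pair each case with one unused candidate sharing birth year, sex and race.

    Records are dicts with hypothetical keys "id", "birth_year", "sex", "race".
    Returns a dict mapping case id to control id; a case with no remaining
    candidate in its stratum is left unmatched.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for person in candidates:
        key = (person["birth_year"], person["sex"], person["race"])
        pool[key].append(person["id"])
    for ids in pool.values():
        rng.shuffle(ids)  # random order within each stratum

    pairs = {}
    for case in cases:
        key = (case["birth_year"], case["sex"], case["race"])
        if pool[key]:
            pairs[case["id"]] = pool[key].pop()  # each control is used once
    return pairs
```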

Manual validation of computable phenotype
Two clinician investigators reviewed the EHR text around the detected occurrences to confirm whether the computable phenotype algorithm was detecting that the author of the clinical note was attributing the condition to the patient as "present" or "absent". The two clinicians manually validated all cases detected by the computable phenotype and assigned a value of positive occurrence indicating the condition is present in the patient, negative occurrence indicating the patient does not have the condition (e.g. "not frail" or the description refers to someone else: "she is caring for her sick, frail mother") or uncertain. Occurrences were rated as positive if both reviewers rated as positive, or if one rated as positive and the other as uncertain. Occurrences were rated as negative if both reviewers rated as negative or if one rated as negative and the other as uncertain. Occurrences were rated as uncertain if both reviewers rated as uncertain or if one reviewer rated as positive and the other as negative. For feasibility, only a smaller subset of 50 randomly selected patients from among the matched controls were manually reviewed. Because controls were based on absence of the computable phenotype terms or codes, manual review of the controls' entire clinical text notes during the study period was necessary rather than just the text notes near the occurrence of terms or codes as in cases.
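The rule for combining the two reviewers' ratings can be written compactly. The sketch below encodes the rule exactly as stated above; the function name and the string labels are chosen for illustration.

```python
ALLOWED = {"positive", "negative", "uncertain"}


def adjudicate(rating_a, rating_b):
    """Combine two reviewer ratings into a final rating for one occurrence."""
    ratings = {rating_a, rating_b}
    if not ratings <= ALLOWED:
        raise ValueError(f"unexpected rating in {ratings}")
    # Positive: both positive, or one positive and one uncertain.
    if ratings in ({"positive"}, {"positive", "uncertain"}):
        return "positive"
    # Negative: both negative, or one negative and one uncertain.
    if ratings in ({"negative"}, {"negative", "uncertain"}):
        return "negative"
    # Uncertain: both uncertain, or a direct positive/negative disagreement.
    return "uncertain"
```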

Variables
We extracted additional structured data from INPC on cases and controls, including demographics (used for matching and cohort description), height, weight, body mass index (BMI), diagnosis codes for comorbidities, laboratory values (albumin, prealbumin, and hemoglobin A1C using the closest value to the index date) and hospitalizations. Charlson comorbidity index was calculated to quantify a patient's overall disease burden [24]. We also assessed records for dispensing of glucocorticoids, dronabinol, megestrol, testosterone and caloric formula supplements (protein shakes, etc.) through Surescripts. Formalized assessments of muscle strength, muscle mass, gait, function, etc. were not accessible and thus could not be detected or analyzed. In EHR data it is not generally possible to distinguish whether absence of a datapoint indicates missing versus absent data.

Statistical analysis
Continuous variables were summarized by median (1st quartile, 3rd quartile), and categorical variables were summarized by frequency (percentage). For the comparisons between the cases and the controls (total cases and subgroups of cases from the diagnostic categories of sarcopenia, cachexia or frailty), Wilcoxon matched-pair signed-rank test was used for continuous variables, and conditional logistic regression was used for binary variables.
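To make the paired comparisons concrete, the sketch below computes two quantities by hand using only the standard library. For 1:1 matched pairs and a single binary variable, the conditional logistic regression estimate of the odds ratio reduces to the ratio of discordant pairs, and the signed-rank statistics are the sums of ranks of the positive and negative within-pair differences. The function names are illustrative, p-values are not computed, and this is not the software used in the study.

```python
import math


def matched_pair_odds_ratio(pairs):
    """Odds ratio from (case_has, control_has) booleans for 1:1 matched pairs."""
    pairs = list(pairs)
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if control_only == 0:
        return math.inf if case_only else math.nan
    return case_only / control_only


def signed_rank_statistics(case_values, control_values):
    """Return (W+, W-): rank sums of positive and negative paired differences."""
    # Zero differences are dropped, as in the usual form of the test.
    diffs = [a - b for a, b in zip(case_values, control_values) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ties share the average rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

In practice a statistics package would be used for both tests; the hand computation is shown only to clarify what each test compares within matched pairs.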

Results
The computable phenotype detected 10,288 presumptive cases from 2016 to 2017. After manual review, 9594 (93.3%) were considered confirmed positive cases of a clinician identifying the patient as sarcopenic, cachectic or frail. The remaining 694 (6.7%) involved text term use indicating negation or referring to its presence in a separate person (e.g. relative). Most cases were detected by text terms without ICD codes n = 8285 (86.4%). All cases detected by ICD codes (total n = 1309 (13.6%); sarcopenia n = 10, cachexia n = 1011, frailty n = 329, more than one code n = 41) also had supportive text terms. All text term occurrences were manually reviewed as described in the methods, for whether the occurrence indicated a statement regarding presence or absence for the condition. When present, sarcopenia terms indicated presence of the condition 97% of the time (310/318), cachexia terms 90% of the time (3921/4364), and frailty terms 95% of the time (6821/7144). The rest of the occurrences described absence of the conditions. A subset of 50 out of 9594 matched controls were manually reviewed. None had evidence for missed detection of the terms or codes for sarcopenia, cachexia or frailty, and none had other terms sufficient to determine the presence of these phenotypes. Table 1A is a cross-tabulation indicating the number of cases with each of the individual terms/codes for sarcopenia, cachexia and frailty among the cases. Patients having either the appropriate text term or the ICD code were considered as having the medical condition (i.e. sarcopenia, cachexia and frailty). Patients with an individual text term (or code) for one of the three conditions also frequently had text terms (or codes) for the other conditions. 
Overall 1496 (15.6%) cases had terms or codes for two or more of the three conditions (sarcopenia, cachexia or frailty) concurrently in their record (n = 133 had all three conditions; sarcopenia plus cachexia n = 33; sarcopenia plus frailty n = 57; cachexia plus frailty n = 1273).
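The counts reported above are internally consistent, which can be checked with a few lines of arithmetic. The sketch uses only numbers stated in the text; the variable names and the helper `pct` are illustrative.

```python
def pct(part, whole, digits=1):
    """Percentage of part in whole, rounded to the given number of digits."""
    return round(100 * part / whole, digits)


presumptive = 10288        # presumptive cases detected by the phenotype
confirmed = 9594           # confirmed after manual review
text_only = 8285           # detected by text terms without ICD codes
with_icd = 1309            # detected by ICD codes (all also had text terms)
overlap_counts = {
    "all three": 133,
    "sarcopenia + cachexia": 33,
    "sarcopenia + frailty": 57,
    "cachexia + frailty": 1273,
}
overlap_total = sum(overlap_counts.values())

print(pct(confirmed, presumptive))      # share of presumptive cases confirmed
print(pct(text_only, confirmed))        # share of cases found by text alone
print(pct(with_icd, confirmed))         # share of cases with an ICD code
print(overlap_total, pct(overlap_total, confirmed))  # two or more conditions
```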
The median (1st, 3rd quartiles) age of cases was 74.9 (62.2, 84.8) years, with 59% being female. Most were white (69%), 10% were black and 21% were listed as other race. Cases with sarcopenia, cachexia or frailty differed from controls in several clinical aspects (Table 1). Cases with ICD codes also had lower BMI than cases without ICD codes. Diabetes, hypertension, and chronic kidney disease stages 3-5 were negatively associated with having ICD codes among cases. However, malignancies, AIDS, osteoporosis, and higher Charlson comorbidity index were positively associated with having ICD codes among cases, possibly reflecting greater recognition and priority of coding in these scenarios. Having ICD codes among cases was also positively associated with treatments directed at sarcopenia such as use of dronabinol or megestrol.
To determine if the clinical differences from controls were similar in the groups diagnosed with sarcopenia, cachexia or frailty, we conducted a sub-analysis of each group separately with their matched controls, excluding patients with overlapping codes or text terms for more than one of these three conditions. In general, the differences in clinical variables between cases and their matched controls were similar in magnitude and direction when analyzing patients with sarcopenia, cachexia or frailty separately (Supplemental Tables 2a, b and c) as those seen analyzing them together (Table 2). BMI was lowest in the cachexia group. The patients diagnosed with frailty were generally older (median 80.2 years) than those diagnosed with sarcopenia or cachexia (median 64.5 and 63.1 years, respectively), and thus the frailty group had higher proportions of patients with some chronic conditions including diabetes, hypertension, cardiovascular disease, kidney disease, osteoporosis and fractures. However, the proportion of patients with Charlson comorbidity index values over 2 was similar between those with sarcopenia, cachexia or frailty, and in each category was higher than in matched controls.

Discussion
We detected patients described as having sarcopenia, cachexia or frailty in the EHR using a computable phenotype incorporating both ICD codes and text terms. Most patients did not have an ICD code to accompany the use of text descriptors. Of note, the sarcopenia code was rarely used, accounting for only 10 patients, while text terms described sarcopenia in 310 patients in the same time period. This may represent underutilization of the code because it was only introduced in 2016 [25]. Of the three terms, cachexia was the most likely to be accompanied by an ICD code. However, all terms appeared much more often as text terms than as ICD codes. Clinicians also frequently appeared to use these terms interchangeably, with overlap in their use in 15.6% of cases, occasionally within the same note. This suggests that clinicians may perceive the clinical similarity, and is consistent with the research literature surrounding these constructs [5,9]. Even when clinicians identified sarcopenia, cachexia or frailty by terms in their notes, the diagnostic codes were applied only 13.6% of the time. The reasons for non-coding may include the clinician's perception that these medical conditions are of lesser importance, or a tendency to code only for the primary diseases for which they are seeing the patient. Thus, relying on ICD codes alone for detection in the EHR is insufficient. This finding has clinical relevance because failure to attribute sufficient importance to sarcopenia, cachexia and frailty in the EHR might correspond to failing to target treatment to these conditions. The cases were more likely to have ICD codes if they were male, black, or had malignancy, AIDS, or osteoporosis, or had higher Charlson comorbidity index, suggesting greater recognition and coding in these conditions or with greater burden of disease.
In a busy clinical practice, sarcopenia, cachexia and frailty may not be diagnosed or coded during physician encounters due to a focus on other urgent issues, and addressing secondary issues such as skeletal muscle loss may be deferred, delaying detection and treatment.
Cases detected had evidence of systemic disease including more frequent diabetes, hypertension, cardiovascular, peripheral vascular disease, kidney disease, liver disease and malignancy than controls. Cases also had 8.99-fold increased odds of having a Charlson comorbidity index of > 2. Given their larger disease burden, it is not surprising that cases had more hospitalizations during the study period. Similarly, individuals with these conditions of poor muscle health also had higher odds of fractures. In addition, cases had a poorer nutritional status as suggested by lower BMI, albumin and prealbumin.
Polypharmacy is well documented in frail adults and thought to have a bidirectional effect, with temporal associative studies showing that high medication burden may be a cause of frailty as well as a result [26]. We did not address overall medication burden in this analysis, but instead focused on use of caloric supplements and appetite stimulants, though overall the numbers of patients prescribed these were small. It is possible that caloric supplements (protein shakes, etc.) were not fully captured as these do not require a prescription. The use of megestrol and dronabinol in cases compared to controls is consistent with clinical efforts to manage cachexia and weight loss [27,28]. In addition, cases having ICD codes were more likely to receive directed pharmacological treatments. Although testosterone has been used for sarcopenia treatment [29], fewer cases were receiving testosterone than controls. This suggests that providers were not prescribing testosterone for this purpose in these sicker patients.
Strengths of our analysis include the large sample size, manual validation of cases and the result that our computable phenotype reliably detected patients that clinicians were diagnosing with sarcopenia, cachexia or frailty. The large sample size, with a range of age, gender and race included, allows generalizability of results to detect sarcopenia, cachexia or frailty diagnoses within the EHR across a wide range of ages and conditions. Our limitations include that our methods cannot detect sarcopenia, cachexia or frailty if the clinician has not made the diagnosis or documented the codes or the appropriate text in the notes. Additionally, objective assessments for sarcopenia, cachexia or frailty were not performed; therefore the occurrence of codes or terms in the EHR does not guarantee that the conditions are present, but only implies that the provider detected or interpreted evidence of these conditions. This results in a potential detection bias, and it is likely that our computable phenotype is detecting primarily the sicker patients with these conditions or those with more severe sarcopenia, cachexia or frailty. Milder versions of the clinical phenotypes would thus be missed. Since we are unable to capture scenarios where clinicians failed to detect or mention evidence for sarcopenia, cachexia or frailty, there is a potentially large number of missed cases, which could introduce a misclassification bias into our analysis if some controls have clinical features of sarcopenia, cachexia or frailty without documentation. Such misclassification would be likely to decrease the differences between cases and controls for various comparisons. Despite this, our groups had significant differences in multiple clinical parameters, suggesting that we are truly detecting different groups of patients.
We also found the differences in clinical variables between cases and their matched controls were similar in magnitude and direction when analyzing patients with sarcopenia, cachexia or frailty separately. Overall, our findings, including the considerable overlap in application of these diagnoses, suggest a lack of a standardized approach among general clinicians either to reliably detect these conditions or to differentiate between them. Given that the computable phenotype is dependent on what the clinician labels the patient's condition, without access to objective measurements we are not able to tell which of the conditions should be most accurately applied to the patient (or if more than one is appropriate). To overcome these biases and limitations, future studies would require further validation of the computable phenotype using objective physical measurements in recruited subjects.

Conclusions
We validated a computable phenotype to detect diagnosed sarcopenia, cachexia and frailty among patients within the EHR. This computable phenotype used the text terms and grammatical variants of the words sarcopenia, cachexia and frailty along with their associated ICD codes [sarcopenia (M62.84), cachexia (799.4, R64), frailty (797, R54)], which reliably indicated that the clinical provider was labeling the patient as having these conditions. Cases detected in the EHR differed from controls in the frequency of several important comorbidities and number of hospitalizations, indicating that a clinically meaningful computable phenotype is being detected. Further work is needed to increase electronic capture in the EHR itself of physical measures and components of these physical phenotypes to enable greater detection, differentiation and intervention on a population health level. Such a computable phenotype has wide-ranging potential clinical uses in detecting patients at risk for disability, as well as in identifying patients for recruitment into clinical trials.