Determination of individual factors associated with hallux valgus using SVM-RFE
BMC Musculoskeletal Disorders volume 24, Article number: 534 (2023)
Abstract
Introduction
This cross-sectional study aimed to determine the factors related to hallux valgus (HV) and their importance using support vector machine-recursive feature elimination (SVM-RFE).
Methods
A total of 864 participants aged ≥ 18 years were enrolled. The Manchester scale was used to determine the presence of HV (summed scores for both feet ≥ 4). The questionnaire included items such as age, sex, height, weight, and foot measurements. These internal factors were analyzed to determine if they are related to HV using SVM-RFE.
Results
The results of tenfold cross-validation using SVM-RFE revealed that age, sex, and body weight were selected 10, 10, and 9 times, respectively, across the 10 folds, and these factors were shown to be related to HV. HV was found to be more common in women than in men (women, 24.9%; men, 7.6%), but the sex difference was not significant in older people.
Conclusion
Age and sex were found to be important factors associated with HV identified via feature selection using SVM-RFE.
Introduction
Hallux valgus (HV) is a joint deformity that occurs in the first metatarsophalangeal joint of the great toe, in which the first metatarsal turns inward and the great toe turns outward. HV is the most common joint deformity, with an estimated prevalence of 21–65% [1, 2]. Its prevalence increases with age, and it was reported to occur in 23.0% of individuals aged 15–65 years and in 35.7% of those aged ≥ 65 years [2]. Furthermore, it causes foot pain and is correlated with impaired gait and balance, thereby increasing the risk for falls in older people [3]. HV is caused by the displacement of the first ray in the dorsal–medial direction due to the hypermobility of the first tarsometatarsal joint and foot pronation under body-weight loading [4, 5].
In general, HV is quantitatively diagnosed based on dorsoplantar radiographs of the foot during loading. Its presence is confirmed if the angle between the long axis of the first metatarsal and the long axis of the great toe is ≥ 15° [6]. However, owing to the difficulty in performing a radiographic examination in epidemiologic studies, self-reports [7], standardized photographs [8], and line drawings [9] are used. A systematic review of the literature demonstrated a wide variation in HV prevalence estimates due to several factors, such as HV diagnosis method, sex, age, study quality, and sampling method [2]. A previous investigation of self-recognized HV found imperfect agreement with radiographically assessed HV, and self-recognized cases were more likely to be perceived as less severe than cases diagnosed by a physician [10]. Thus, a system that can correctly determine HV severity is desirable. The retest and inter-rater reliability of grading with the Manchester scale (MS) were found to be excellent [11], and the MS could be widely applied in the future. In participants with an MS score of 2, the grade on which the criterion of this study was based, the mean hallux abductus angle measured on radiographs was approximately 15°, which is the generally accepted minimum value for HV diagnosis [12]. HV can be influenced by external factors, e.g., wearing of high heels [13, 14] and injuries of the medial collateral ligament of the first metatarsophalangeal joint [15], and internal factors, e.g., genetic predisposition, sex, age, and flat feet [14]. Thus, HV is caused by a combination of factors, which makes it important to rank the relevant factors. However, to the best of our knowledge, studies investigating the factors related to HV using the MS are scarce.
Support vector machine (SVM) is a machine learning-based classification method that can be applied to pattern recognition and regression analysis [16]. In the 1990s, the application of SVM was extended to nonlinear discrimination by combining it with kernel methods. SVM, which constructs nonlinear discriminant functions via the kernel trick, is regarded as one of the best-performing learning methods for pattern recognition. In general, many features are used for pattern recognition and discrimination, and the relevant features are identified using feature selection methods, e.g., recursive feature elimination (RFE) [17].
SVM-RFE is one of the most widely used feature selection methods owing to its flexibility and simplicity. RFE can be wrapped around any model that provides feature weights or importance scores, and it searches for a subset of features that achieves good performance. In this study, we used SVM-RFE to analyze HV-related factors. Two-class classification algorithms based on SVM have been applied in previous studies, and the use of such evaluation and diagnosis systems in the medical field has reportedly increased in recent years [18]. The RFE algorithm for nonlinear kernels allows ranking of variables but not comparison of the performance of all variables in a specific iteration.
Therefore, this study aimed to demonstrate the importance of HV-related factors via feature selection using SVM-RFE. Foot alignment was added to the basic items of age, sex, weight, and body mass index (BMI), and HV-related factors were analyzed via SVM-RFE.
Methods
Research design
This cross-sectional study included 928 participants aged ≥ 18 years. The participants were examined as a part of a foot health survey at sports and local health events and at a corporate health project held in Osaka Prefecture in 2018–2020. The inclusion criteria were no pain or slight foot pain during loading and ability to walk. Among the participants, 48 for whom the date the photograph was taken was unclear and 16 for whom both feet could not be photographed were excluded. As a result, 864 participants (353 men) were included in the final analysis.
The collected data were analyzed to extract HV-related factors via feature selection using SVM-RFE and to evaluate the accuracy of the resulting prediction model. The association between HV and the selected features was then statistically investigated.
The study protocol was approved by the Research Ethics Review Committee of Osaka Kawasaki Rehabilitation University (approval no. OKRU29-A019). Furthermore, the study was conducted in accordance with the principles of the Declaration of Helsinki. The purpose and methods of the study were fully explained in advance to the participants, and measurements were performed after obtaining written informed consent. In addition, this study has been reported according to the STROBE guideline [19].
Evaluation using the MS and foot measurements
The summed scores of the MS on the horizontal image of the forefoot body surface were used to evaluate the presence or absence of HV [8]. A digital camera (RX-0, SONY, Tokyo, Japan) was used to capture foot images for the MS. A horizontal image of the forefoot was taken with the second toe positioned as close as possible to midway between adduction and abduction. All participants were evaluated for HV using the MS by one examiner, a physiotherapist with 21 years of general physiotherapy clinical experience.
The criteria for the presence of HV were based on the grade classification of HV in the MS: 0, no deformity; 1, mild deformity; 2, moderate deformity; and 3, severe deformity (Fig. 1) [11]. If the summed score of the right and left MS was ≥ 4, the participant was considered as having HV. In other words, HV was considered present with 2 points on both sides, or with 3 points on one side and a total of ≥ 4 points.
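The criterion above can be sketched as a small function. This is only an illustrative sketch: the function name and argument names are hypothetical, and the 0–3 grade range is taken from the grade classification described in the text.

```python
def has_hallux_valgus(ms_right: int, ms_left: int, threshold: int = 4) -> bool:
    """Return True when the summed Manchester scale grade meets the criterion.

    Grades are assumed to be integers from 0 (no deformity) to 3 (severe).
    """
    for grade in (ms_right, ms_left):
        if grade not in (0, 1, 2, 3):
            raise ValueError(f"Manchester scale grade must be 0-3, got {grade!r}")
    # HV is considered present when the right and left grades sum to >= 4.
    return ms_right + ms_left >= threshold


# Hypothetical grade pairs (right foot, left foot) for illustration only.
examples = {pair: has_hallux_valgus(*pair) for pair in [(2, 2), (3, 1), (3, 0), (2, 1)]}
```

Under this reading, a grade of 3 on one foot with no deformity on the other gives a total of 3 and does not meet the criterion.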
Furthermore, dorsal height (DH), foot length (FL), and arch height ratio (AHR) were evaluated [20]. In this study, the DH/truncated FL (TFL) was defined as the AHR. DH, FL, and TFL were measured using a foot arch height-measuring instrument (Takei Corp, Niigata, Japan). The mean of both feet measurements was used for SVM-RFE analysis.
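As a worked illustration of the definition above (AHR = DH / TFL, averaged over both feet), the following sketch uses made-up measurements. The function names, the millimetre unit, and the numeric values are assumptions for illustration and are not data from the study.

```python
def arch_height_ratio(dorsal_height: float, truncated_foot_length: float) -> float:
    """AHR defined as dorsal height divided by truncated foot length (same unit)."""
    if truncated_foot_length <= 0:
        raise ValueError("truncated foot length must be positive")
    return dorsal_height / truncated_foot_length


def mean_of_feet(right: float, left: float) -> float:
    """Average a measurement over the right and left foot."""
    return (right + left) / 2.0


# Hypothetical measurements in millimetres (not taken from the paper).
ahr_right = arch_height_ratio(dorsal_height=60.0, truncated_foot_length=180.0)
ahr_left = arch_height_ratio(dorsal_height=57.0, truncated_foot_length=180.0)
mean_ahr = mean_of_feet(ahr_right, ahr_left)
```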
Discriminant evaluation using SVM-RFE
SVM-RFE was implemented using the RFE class in the feature selection module of scikit-learn. The numerical SVM-RFE inputs were age, height, weight, BMI, average left–right FL, average DH, and average left–right AHR. Because the mean and scale of these items differ, the data were first standardized to a mean of 0 and a variance of 1. Then, we added the binary data of sex, history of foot injury, foot pain, and exercise habits and performed feature selection.
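The preprocessing and selection steps described above could look roughly like the following sketch. It uses scikit-learn's `RFE` with a linear `SVC`, as named in the text, but the data are synthetic, the column names are hypothetical labels for the listed inputs, and the number of features to keep (`n_keep=3`) is an assumption because the source does not state it.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical column labels for the inputs listed in the text.
NUMERIC = ["age", "height", "weight", "bmi", "mean_fl", "mean_dh", "mean_ahr"]
BINARY = ["sex", "foot_injury_history", "foot_pain", "exercise_habit"]


def make_synthetic_data(n=200, seed=0):
    """Generate synthetic data only; this is not the study dataset."""
    rng = np.random.default_rng(seed)
    numeric = rng.normal(
        loc=[50, 160, 60, 23, 235, 60, 0.33],
        scale=[15, 9, 10, 3, 15, 5, 0.03],
        size=(n, len(NUMERIC)),
    )
    binary = rng.integers(0, 2, size=(n, len(BINARY))).astype(float)
    # The synthetic label depends on age and sex only, plus noise.
    score = 0.08 * (numeric[:, 0] - 50) + 1.5 * binary[:, 0] + rng.normal(0, 0.5, n)
    return numeric, binary, (score > 1.0).astype(int)


def select_features(numeric, binary, y, n_keep=3):
    """Standardize numeric columns, append binary columns, then run SVM-RFE."""
    scaled = StandardScaler().fit_transform(numeric)  # mean 0, variance 1
    X = np.hstack([scaled, binary])
    selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=n_keep, step=1)
    selector.fit(X, y)
    names = NUMERIC + BINARY
    return [name for name, keep in zip(names, selector.support_) if keep]


selected = select_features(*make_synthetic_data())
```

Fitting the scaler on all rows, as done here, is fine for a one-off selection; inside cross-validation the scaler would normally be fitted on the training folds only.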
In this study, the linear function was used as the kernel function for feature selection. The value of the cost parameter (C), which determines tolerance for misclassification, needs to be determined and the accuracy of the prediction model needs to be evaluated. In the evaluation experiment, the fit rate, reproducibility, and accuracy of the model when C = 1.0 were obtained via a tenfold cross-validation.
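A minimal sketch of such an evaluation with scikit-learn is shown below, assuming a linear kernel, C = 1.0, and stratified tenfold cross-validation. The data are synthetic, the number of retained features is an assumption, and the mapping of the paper's metric names onto scikit-learn's `precision`, `recall`, `f1`, and `accuracy` scorers is not confirmed by the source; all four are computed so that they can be compared.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

METRICS = ["precision", "recall", "f1", "accuracy"]


def evaluate_svm_rfe(X, y, n_numeric, n_keep=3, c=1.0, seed=0):
    """Tenfold cross-validation of a linear SVM wrapped in RFE."""
    # Standardize only the first n_numeric columns; pass the rest through.
    scale_numeric = ColumnTransformer(
        [("num", StandardScaler(), slice(0, n_numeric))], remainder="passthrough"
    )
    model = make_pipeline(
        scale_numeric,
        RFE(SVC(kernel="linear", C=c), n_features_to_select=n_keep),
    )
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_validate(model, X, y, cv=folds, scoring=METRICS)
    return {name: float(np.mean(scores[f"test_{name}"])) for name in METRICS}


# Synthetic, imbalanced two-class data with 11 columns (not the study data).
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)
results = evaluate_svm_rfe(X, y, n_numeric=7)
```

Placing the scaler and the selector inside the pipeline means both are refitted on each training fold, which avoids leaking information from the held-out fold.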
Statistical examination
In this study, the HV-related factors were extracted via feature selection using SVM-RFE, and their accuracy was evaluated. The SVM-RFE classification was implemented at Tokushima University. The extracted factors were statistically processed, and normality of the explanatory variables was confirmed using the Shapiro–Wilk test. Comparison of the basic attributes between male and female participants was performed using the unpaired t-test or Mann–Whitney U test. The χ² test or Fisher’s exact test was used for analyzing the sex-related difference in participants with and without HV. Pearson’s or Spearman’s correlation coefficients were used to examine the correlation between MS score and each explanatory variable in female participants with HV. Statistical analysis was conducted using IBM SPSS Statistics version 28.0 (IBM Corp., Armonk, NY, USA) with a significance level of 5%.
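The study ran these tests in SPSS. Purely as an illustration of the same decision logic, the sketch below uses SciPy instead. The rule for switching to Fisher's exact test (any expected cell count below 5) is a common convention assumed here, not something the source states, and all numbers are made up.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, alpha=0.05):
    """Unpaired t-test if both samples pass Shapiro-Wilk, else Mann-Whitney U."""
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        return "t-test", float(stats.ttest_ind(a, b).pvalue)
    return "mann-whitney", float(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)


def compare_proportions(table):
    """Chi-square test for a 2x2 table, or Fisher's exact test for sparse tables."""
    table = np.asarray(table)
    _, p_value, _, expected = stats.chi2_contingency(table)
    if (expected < 5).any():  # assumed convention for choosing Fisher's test
        _, p_value = stats.fisher_exact(table)
        return "fisher", float(p_value)
    return "chi2", float(p_value)


# Hypothetical data for illustration only.
rng = np.random.default_rng(0)
group_test, group_p = compare_groups(rng.normal(170, 6, 80), rng.normal(157, 6, 80))
large_test, large_p = compare_proportions([[30, 70], [10, 90]])
small_test, small_p = compare_proportions([[3, 1], [1, 3]])
```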
Results
Comparison of sex differences in basic attributes and foot measurements of the participants
Data were collected from 928 participants aged 18–96 years, including university students, workers, citizen athletes, and community-dwelling older people. Among them, 48 participants with unclear imaging data and 16 whose feet could not be photographed were excluded. As a result, data from 864 participants (353 men and 511 women) were used in the final analysis. The questionnaire was self-administered and contained the items age, sex, height, and weight.
Comparison of the basic attributes and foot measurements between the male and female participants is demonstrated in Table 1. The results indicated that the male participants had significantly higher height, weight, BMI, FL, DH, AHR, and total MS score than the female participants. Female participants were significantly older than male participants.
Predictive model accuracy evaluation of feature selection using SVM-RFE
The results of the tenfold cross-validation using SVM-RFE are presented in Table 2. The features selected by training the prediction model were age, sex, weight, and mean DH, which were selected 10, 10, 9, and 1 times, respectively, across the 10 folds. Age, sex, and weight were mainly related to HV.
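One way to obtain selection counts of this kind is to rerun the feature selection on the training portion of each fold and tally how often each feature survives. The following sketch does this on synthetic data; the feature names, the number of retained features, and the fold shuffling are assumptions, and the source does not describe its exact procedure.

```python
from collections import Counter

import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def count_selected_features(X, y, names, n_keep=3, n_splits=10, seed=0):
    """Count in how many training folds each feature is retained by SVM-RFE."""
    counts = Counter({name: 0 for name in names})
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, _ in folds.split(X, y):
        X_train = StandardScaler().fit_transform(X[train_idx])
        selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=n_keep)
        selector.fit(X_train, y[train_idx])
        counts.update(name for name, keep in zip(names, selector.support_) if keep)
    return dict(counts)


# Synthetic data: the label depends mainly on the first two columns.
rng = np.random.default_rng(0)
feature_names = ["age", "sex", "weight", "mean_dh", "height", "bmi"]
X = rng.normal(size=(300, len(feature_names)))
y = (1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 0.5, 300) > 0).astype(int)
selection_counts = count_selected_features(X, y, feature_names)
```

With 10 folds and 3 features kept per fold, the counts sum to 30, and a count of 10 means the feature was retained in every fold.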
The accuracy evaluation results of the prediction model are presented in Table 3. The fit rate, reproducibility, and accuracy of the model were determined via a tenfold cross-validation, with the results being 30%, 73%, and 43%, respectively.
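One possible reading of the three reported percentages, which the source does not confirm, is that the first two correspond to precision and recall. Under that assumption, their harmonic mean (the F-measure) can be computed as a quick arithmetic check:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: 30% is read as precision and 73% as recall.
f1_from_reported = f_measure(0.30, 0.73)
f1_rounded = round(f1_from_reported, 2)
```

The result is 2 × 0.30 × 0.73 / 1.03 ≈ 0.425, which rounds to 0.43 and so matches the third reported value. This is only a consistency check on the reported numbers, not a statement about how the authors computed them.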
Sex difference in the HV rate
Table 4 presents the HV rates by sex as an associated factor, with 24.9% of the female participants having HV compared with 7.6% of the male participants.
Comparison of the percentage of male and female participants with HV based on their age group
The proportion of the participants by age group is presented in Table 5. The age groups were classified based on the life stages (adolescence, adulthood, and old age) presented in Health Japan 21, a guideline issued by the Ministry of Health, Labour and Welfare.
The proportion of female participants with HV increased with increasing age, with 89 (34%) participants aged ≥ 65 years, followed by 20 (23.3%) aged 45–64 years, and 10 (13.3%) aged 30–44 years. Similarly, the proportion of male participants with HV increased with increasing age, with 19 (23.7%) participants aged ≥ 65 years, followed by 7 (7.4%) aged 45–64 years. Among participants aged ≥ 65 years, the proportion with HV was higher in female participants than in male participants.
In the comparison of sex differences by age, significant sex differences in the proportion of participants with HV were observed in the age groups of 18–29, 30–44, and 45–64 years but not in the age group of ≥ 65 years.
Correlation between HV explanatory variables and total MS score
The correlations between HV explanatory variables and total MS score for female participants, who had a high prevalence of HV, are presented in Table 6. Significant correlations were observed for age (r = 0.343), height (r = −0.324), and FL (r = −0.216) in female participants with HV in all age groups. Significant correlations were also found for age (r = 0.478) and height (r = −0.483) in female participants with HV in the age group of 45–64 years. No item showed a significant correlation in female participants aged ≥ 65 years.
Discussion
To date, few studies have used machine learning to analyze joint deformities, and to the best of our knowledge, SVM-RFE has only been used for Kashin–Beck disease, which involves alteration of hands [21]. In the present study, a feature selection algorithm was used to analyze HV-related factors using SVM-RFE.
The fit rate, reproducibility, and accuracy of the HV prediction model in this study were 30%, 73%, and 43%, respectively. This indicates that the explanatory variables of age, sex, weight, and DH were significantly related in 30% of the participants with HV but not in 73% of those without HV.
The final output of the SVM-RFE algorithm is a list of variables ranked according to their relevance. SVM-RFE is essentially a backward elimination method. However, the top-ranked variables are not necessarily the most relevant individually; rather, they are the most relevant conditional on the specific ranked subset in the model [18]. Thus, the importance of HV-related factors depends on the number of variables. The SVM-RFE algorithm allows the classification and ranking of variables but not the comparison of the performance of all variables. In other words, it is important to interpret the results in terms of their relationship to the response variable and other variables and the magnitude of the relationship. Therefore, in this study, statistical analyses were performed for HV-related factors.
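The backward elimination idea can be sketched by hand for a linear SVM: fit the model, drop the feature with the smallest squared weight, and repeat until no features remain, so that the last feature removed is ranked first. The sketch below is a simplified illustration on synthetic data with hypothetical feature names; it is not the study's implementation, which used scikit-learn's RFE class.

```python
import numpy as np
from sklearn.svm import SVC


def svm_rfe_ranking(X, y, names, c=1.0):
    """Rank features by recursive elimination using squared linear-SVM weights."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        model = SVC(kernel="linear", C=c).fit(X[:, remaining], y)
        squared_weights = model.coef_.ravel() ** 2
        worst = int(np.argmin(squared_weights))  # least useful remaining feature
        eliminated.append(remaining.pop(worst))
    # The feature eliminated last is the top-ranked one.
    return [names[i] for i in reversed(eliminated)]


# Synthetic data: only the first column determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
ranking = svm_rfe_ranking(X, y, ["signal", "noise_a", "noise_b", "noise_c"])
```

Because each weight is re-estimated after every removal, a feature's position reflects its usefulness given the features still in the model at that step, which is why the ranking is conditional on the subset rather than a measure of individual relevance.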
Regarding the factors of HV identified by feature selection, age, sex, weight, and DH were selected in that order. Statistical analysis of the selected factors revealed that women were predominant in terms of the sex ratio for HV. In this study, 24.9% of the female participants had HV compared with 7.6% of the male participants. In a questionnaire survey of 4,249 cases in the UK, HV was detected in 28.4% of cases, with an odds ratio of 2.64, and was more common in women than in men [9]. In a study conducted on Japanese participants, the incidence rate of definite radiographic HV was 29.8%. Female sex was significantly associated with increased risk for HV (odds ratio, 1.71) [22]. A similar trend was observed in this study.
The mean age was higher in female participants with HV than in those without HV, and the total MS score was positively correlated with age (r = 0.478) in the age group of 45–64 years, suggesting that middle age is an important factor related to HV. Regarding age group, the proportion of participants with HV was the highest in the older age group (aged ≥ 65 years) for both sexes and was lower in the younger age groups. In adolescents, HV was more common in female participants than in male participants, but no sex difference was observed in the age group of ≥ 65 years. As for the relationship between age and sex and HV onset, HV has been reported to begin developing in girls during primary and secondary school [2]. These findings are similar to the findings of the present study; however, in the present study, the proportion of participants with HV did not differ between older men and women. In an analytical study of 11,714 Japanese individuals aged 60–79 years, the incidence of HV increased with age in both men and women, with a significant increase observed between the age groups of 40–50 years and 50–60 years [23]. To prevent worsening of HV in women, it is important to inform them about appropriate shoe selection and preventive exercises for HV, especially those aged ≥ 45 years.
The next most frequently extracted item in feature selection was weight, which did not show a statistically significant relationship with HV. In addition to weight, age, sex, obesity, inappropriate shoes, and physical activity were reported as risk factors for foot problems [13]. Studies have demonstrated that obesity is associated with reduced foot arches [24, 25]. Thus, obesity was reported to be associated with foot problems; however, in the present study, no correlation was observed between body weight and HV.
Foot height, which was extracted less frequently in feature selection, was not found to have the same trend as body weight in the statistical analysis. Foot height is related to arch deformities, such as flat feet and high arches, and previous studies have demonstrated that HV and flat feet are related [14]. However, some studies reported that HV was not related to flat feet [26], and the correlation between foot arch reduction and HV development is still under debate. Furthermore, height was negatively correlated with the total MS score in women with HV aged 45–64 years, although it was not extracted in feature selection. This association between height reduction and HV may be due to the fact that the risk of spinal degeneration increases in HV (odds ratio, 1.75) [27], and not only the hallux but also the alignment of other body parts tends to change in older age.
This study had some limitations. First, we used a cross-sectional design, and the causal relationship with HV onset cannot be fully explained. To conduct a more accurate analysis of HV-related factors, more specific models focusing on the severity of the deformity using a large dataset should be developed. Second, there were no data on the shoes used, such as high heels, which are considered to be related to HV, or on heredity. If these data were available, the accuracy and precision of the fit would further improve. Because HV is often treated with surgery when foot pain is severe and affects the patient’s quality of life, the factors associated with HV and foot pain should be examined in the future.
Conclusion
The results of this study indicated that age, sex, and weight were the most frequently extracted features; however, only age and sex showed significant differences in the subsequent statistical analysis. Although SVM-RFE is an effective method for feature selection, further analysis using a large dataset is needed to clarify the relationship between weight and HV.
Availability of data and materials
The dataset used and analyzed during the current study is available from the corresponding author on reasonable request.
Abbreviations
- SVM-RFE: Support Vector Machine-Recursive Feature Elimination
- HV: Hallux Valgus
- BMI: Body Mass Index
- DH: Dorsal Height
- FL: Foot Length
- AHR: Arch Height Ratio
- TFL: Truncated Foot Length
- MS: Manchester Scale
References
Menz HB, Roddy E, Thomas E, Croft PR. Impact of hallux valgus severity on general and foot-specific health-related quality of life. Arthritis Care Res. 2011;63(3):396–404.
Nix S, Smith M, Vicenzino B. Prevalence of hallux valgus in the general population: a systematic review and meta-analysis. J Foot Ankle Res. 2010;3:21.
Menz HB, Auhl M, Spink MJ. Foot problems as a risk factor for falls in community-dwelling older people: a systematic review and meta-analysis. Maturitas. 2018;118:7–14.
Biz C, Favero L, Stecco C, Aldegheri R. Hypermobility of the first ray in ballet dancer. Muscles ligaments and tendons journal. 2012;2(4):282–8.
Singh D, Biz C, Corradin M, Favero L. Comparison of dorsal and dorsomedial displacement in evaluation of first ray hypermobility in feet with and without hallux valgus. Foot and ankle surgery: official journal of the European Society of Foot and Ankle Surgeons. 2016;22(2):120–4.
Hecht PJ, Lin TJ. Hallux valgus. Med Clin N Am. 2014;98(2):227–32.
Roddy E, Zhang W, Doherty M. Validation of a self-report instrument for assessment of hallux valgus. Osteoarthr Cartil. 2007;15(9):1008–12.
Garrow AP, Papageorgiou A, Silman AJ, Thomas E, Jayson MI, Macfarlane GJ. The grading of hallux valgus. The Manchester Scale. J Am Podiatr Med Assoc. 2001;91(2):74–8.
Roddy E, Zhang W, Doherty M. Prevalence and associations of hallux valgus in a primary care population. Arthritis Rheum. 2008;59(6):857–62.
Matsumoto T, Higuchi J, Maenohara Y, Chang SH, Iidaka T, Horii C, Oka H, Muraki S, Hashizume H, Yamada H, et al. The discrepancy between radiographically-assessed and self-recognized hallux valgus in a large population-based cohort. BMC Musculoskelet Disord. 2022;23(1):31.
Menz HB, Fotoohabadi MR, Wee E, Spink MJ. Validity of self-assessment of hallux valgus using the manchester scale. BMC Musculoskelet Disord. 2010;11:215.
Menz HB, Munteanu SE. Radiographic validation of the Manchester scale for the classification of hallux valgus deformity. Rheumatology (Oxford). 2005;44(8):1061–6.
Menz HB, Roddy E, Marshall M, Thomas MJ, Rathod T, Peat GM, Croft PR. Epidemiology of shoe wearing patterns over time in older women: associations with foot pain and hallux valgus. The journals of gerontology Series A Biological sciences and medical sciences. 2016;71(12):1682–7.
Nguyen US, Hillstrom HJ, Li W, Dufour AB, Kiel DP, Procter-Gray E, Gagnon MM, Hannan MT. Factors associated with hallux valgus in a population-based study of older women and men: the MOBILIZE Boston Study. Osteoarthr Cartil. 2010;18(1):41–6.
Fabeck LG, Zekhnini C, Farrokh D, Descamps PY, Delincé PE. Traumatic hallux valgus following rupture of the medial collateral ligament of the first metatarsophalangeal joint: a case report. J foot ankle surgery: official publication Am Coll Foot Ankle Surg. 2002;41(2):125–8.
Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97.
Miao J, Niu L. A survey on feature selection. Procedia Comput Sci. 2016;91:919–26.
Sanz H, Valim C, Vegas E, Oller JM, Reverter F. SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinformatics. 2018;19(1):432.
van der Noord R, Paap D, van Wilgen CP. Convergent validity and clinically relevant categories for the Dutch Central Sensitization Inventory in patients with chronic pain. J Appl Biobehavioral Res. 2018;23(2):e12119.
Nakao H, Imaoka M, Hida M, Imai R, Tazaki F, Morifuji T, Hashimoto M, Nakamura M. Correlation of medial longitudinal arch morphology with body characteristics and locomotive function in community-dwelling older women: a cross-sectional study. J Orthop Surg. 2021;29(2):23094990211015504.
Zhang Y, Wei X, Cao C, Yu F, Li W, Zhao G, Wei H, Zhang F, Meng P, Sun S, et al. Identifying discriminative features for diagnosis of Kashin-Beck disease among adolescents. BMC Musculoskelet Disord. 2021;22(1):801.
Nishimura A, Fukuda A, Nakazora S, Uchida A, Sudo A, Kato K, Yamada T. Prevalence of hallux valgus and risk factors among japanese community dwellers. J Orthop science: official J Japanese Orthop Association. 2014;19(2):257–62.
Sakou S, Sugiura H, Enishi K, Inaba R. Gender difference in first phalangeal angle after age 40. Japanese Soc Med Study Footwear. 2011;25(2):150–4. (in Japanese).
Aurichio TR, Rebelatto JR, de Castro AP. The relationship between the body mass index (BMI) and foot posture in elderly people. Arch Gerontol Geriatr. 2011;52(2):e89–92.
Faria A, Gabriel R, Abrantes J, Bras R, Moreira H. The relationship of body mass index, age and triceps-surae musculotendinous stiffness with the foot arch structure of postmenopausal women. Clin Biomech (Bristol Avon). 2010;25(6):588–93.
Coughlin MJ, Jones CP. Hallux valgus: demographics, etiology, and radiographic assessment. Foot Ankle Int. 2007;28(7):759–77.
Hsu TL, Lee YH, Wang YH, Chang R, Wei JC. Association of hallux valgus with degenerative spinal diseases: a population-based cohort study. Int J Environ Res Public Health. 2023;20(2).
Acknowledgements
We would like to thank the individuals who participated in data collection and data entry and the many students at Osaka Kawasaki Rehabilitation University for their assistance with examinations and measurements.
Funding
This work was supported by the Research Foundation of Josai International University presidents. The funding agencies had no role in the design of study and analysis, interpretation of data, and in writing of the manuscript.
Author information
Contributions
Conceptualization: Hidetoshi Nakao. Data curation: Hidetoshi Nakao, Imaoka Masakazu, Mitsumasa Hida. Formal analysis: Hidetoshi Nakao. Data analysis and interpretation: Hidetoshi Nakao, Kenji Kita, Kazuyuki Matsumoto. Methodology: Hidetoshi Nakao, Mitsumasa Hida, Kenji Kita. Investigation: Hidetoshi Nakao, Masakazu Imaoka, Ryota Imai, Mitsumasa Hida, Misa Nakamura. Project administration: Masakazu Imaoka, Misa Nakamura. Writing-original draft: Hidetoshi Nakao. Writing-review & editing: All authors.
Ethics declarations
Ethics approval and consent to participate
This study was conducted after obtaining approval from the Research Ethics Review Committee of Osaka Kawasaki Rehabilitation University (Approval no. OKRU29-A019) and was performed in accordance with the Declaration of Helsinki. All participants provided written, informed consent for data collection and storage.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Nakao, H., Imaoka, M., Hida, M. et al. Determination of individual factors associated with hallux valgus using SVM-RFE. BMC Musculoskelet Disord 24, 534 (2023). https://doi.org/10.1186/s12891-023-06303-2