LUMINOUS database: lumbar multifidus muscle segmentation from ultrasound images

Background: Among the paraspinal muscles, the structure and function of the lumbar multifidus (LM) has become of great interest to researchers and clinicians involved in lower back pain and muscle rehabilitation. Ultrasound (US) imaging of the LM muscle is a useful clinical tool which can be used in the assessment of muscle morphology and function. US is widely used due to its portability, cost-effectiveness, and ease-of-use. In order to assess muscle function, quantitative information of the LM must be extracted from the US image by means of manual segmentation. However, manual segmentation requires a high level of training and experience and is characterized by a level of difficulty and subjectivity associated with image interpretation. Thus, the development of automated segmentation methods is warranted and would strongly benefit clinicians and researchers. The aim of this study is to provide a database which will contribute to the development of automated segmentation algorithms of the LM.

Construction and content: This database provides the US ground truth of the left and right LM muscles at the L5 level (in prone and standing positions) of 109 young athletic adults involved in Concordia University’s varsity teams. The LUMINOUS database contains the US images with their corresponding manually segmented binary masks, serving as the ground truth. The purpose of the database is to enable development and validation of deep learning algorithms used for automatic segmentation tasks related to the assessment of the LM cross-sectional area (CSA) and echo intensity (EI). The LUMINOUS database is publicly available at http://data.sonography.ai.

Conclusion: The development of automated segmentation algorithms based on this database will promote the standardization of LM measurements and facilitate comparison among studies. Moreover, it can accelerate the implementation of quantitative muscle assessment in clinical and research settings.


Background
The paraspinal muscles (e.g. multifidus and erector spinae muscles) are a group of three muscles that originate from the occipital bone and continue down the spine to the sacrum [1]. Among the lumbar muscles, biomechanical studies have provided evidence for the importance of the lumbar multifidus muscle (LM) and its role in the dynamic stabilization and segmental control of the lumbar spine [2]. Over two thirds of the stiffness of the spine is attributed to the behaviour of the multifidi, establishing the LM's importance in the neutral zone [3]. The neutral zone is described as the range of intervertebral motion where spinal movement can occur with minimal internal resistance from the spine [4,5]. In contrast to the other lumbar muscles, the LM is a large multifascicular muscle with a high cross-sectional area (CSA) [2,4,6]. As such, its structure allows for large forces to be generated over smaller ranges of operation [4]. This further supports the LM's role as a unit dedicated to providing lumbar spine stability [4,7]. Therefore, the LM's morphology (e.g. size, composition, asymmetry) and function (e.g. contractile ability) have become of great interest to researchers and clinicians involved in lower back pain (LBP) and muscle rehabilitation [2].
LBP is one of the most prevalent medical complaints, and it is estimated that between 60% and 80% of the population will experience at least one episode in their lifetime [8][9][10]. More importantly, the recurrence rate is extremely high, and this common musculoskeletal condition is very disabling and severely affects quality of life. Furthermore, it is projected to have an even higher personal and socio-economic burden as the world's population ages [11,12]. A large body of evidence has confirmed that LM muscle structural changes (e.g. atrophy and increased fatty infiltration) and functional deficits (e.g. decreased or increased contraction) occur in patients with LBP [13][14][15][16]. Along with LM and spinal dysfunction, such changes are also associated with lower physical function [17][18][19][20], poorer surgical outcomes [21,22], and the recurrence of LBP symptoms [23,24].
To date, magnetic resonance imaging (MRI), computed tomography (CT) scan, and ultrasound (US) have been used to quantify paraspinal muscle morphology. While MRI provides excellent soft tissue contrast and resolution and is the gold-standard imaging modality, it remains costly and its accessibility is limited. US is a portable, cost-effective, and non-ionizing imaging modality, providing a non-invasive method to obtain real-time in-vivo images for the assessment of LM morphology and function [25]. More specifically, US has been used to quantify the LM CSA, and CSA side-to-side asymmetry, as well as LM thickness in resting and contracted states to assess muscle activation (e.g. contraction) [26][27][28]. Additionally, measurements of the echo intensity (EI) can also be obtained using computer-aided gray scale analysis. EI has been investigated in studies related to muscle morphology, changes related to neuromuscular disorders, and studies investigating the relationship between muscle EI and size [29,30]. Moreover, EI is used as an indicator of fatty infiltration and connective tissue which can be subsequently used to assess muscle quality [30][31][32].
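The derived measures mentioned above (CSA side-to-side asymmetry and thickness change between resting and contracted states) are simple ratios. The sketch below illustrates one commonly used convention; the exact formulas, the function names, and the example values are assumptions made for illustration and are not specified in this paper.

```python
def csa_asymmetry_percent(right_csa: float, left_csa: float) -> float:
    """Side-to-side CSA difference as a percentage of the larger side.

    Assumed convention: (larger - smaller) / larger * 100.
    """
    larger = max(right_csa, left_csa)
    smaller = min(right_csa, left_csa)
    if smaller <= 0:
        raise ValueError("CSA values must be positive")
    return (larger - smaller) / larger * 100.0


def thickness_change_percent(rest: float, contracted: float) -> float:
    """Thickness change from rest to contraction as a percentage of rest.

    Assumed convention: (contracted - rest) / rest * 100.
    """
    if rest <= 0:
        raise ValueError("resting thickness must be positive")
    return (contracted - rest) / rest * 100.0


if __name__ == "__main__":
    # Hypothetical measurements (cm^2 and cm), not taken from the database.
    print(csa_asymmetry_percent(right_csa=10.0, left_csa=8.0))
    print(thickness_change_percent(rest=2.0, contracted=2.5))
```

Both functions work in whatever units the inputs share, since the units cancel in the ratio.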
Biomechanical modelling of the spine requires accurate measurements of the LM CSA for use in analytical processes that determine levels of LM wasting or injury [33]. In US, CSA measurements can be obtained by imaging the transverse section of the LM [2]. The muscle's border is then delineated from the surrounding tissue through manual segmentation. US examination requires training and experience, and the analysis and interpretation of the images are prone to subjectivity. Additionally, US assessments in the clinical setting are subject to issues concerning procedural and measurement reliability [25]. Procedural and measurement reliability are defined as the ability of an examiner to consistently and repeatedly perform the imaging procedure and the measurements of the region of interest in the muscle, respectively [25]. However, because the shape of the LM varies from one patient to another, and from one spinal level to another, examiners performing manual segmentations often encounter technical challenges affecting the quality and reliability of these measures. One of the major difficulties of LM segmentation in US images is determining the boundaries between the LM and the surrounding tissues [34]. Thus, the manual segmentation of US images is highly rater-dependent, error-prone, and labour-intensive, which can limit its clinical applicability [35]. Therefore, the development of automated segmentation methods is warranted and would strongly benefit clinicians and researchers by decreasing the workload while simultaneously producing accurate and reliable segmentations that are comparable to expert manual segmentations [36].
The advent of deep learning has introduced many tools which are currently used to carry out various diagnostic tasks in medical US analysis. Moreover, as deep learning is being widely used in medical US analysis, its application continues to benefit from the ongoing research efforts made to further its state-of-the-art performance [36,37]. Although recent studies have emphasized US segmentation tasks using deep learning approaches, there is a limited amount of literature pertaining to the segmentation of skeletal muscle [37,38]. Thus, it would be beneficial to support US segmentation tasks of skeletal muscles such as the LM. However, the development of automated segmentation methods requires manually annotated clinical datasets, which are currently scarce. Therefore, the purpose of this work is to provide a publicly available US database with the ground truth of the left and right LM at the L5 level, in both prone and standing positions, intended for the development of automated segmentation algorithms. To the best of our knowledge, this is the first publicly available US database of the LM muscle.

Subjects' description
The database contains 109 US datasets of young athletic adult volunteers who are involved in select varsity teams at Concordia University (64 males, 45 females; see Table 1).

US image acquisition
The 109 athletes underwent a US procedure to obtain LM images at the L5 level in both the prone and standing positions. The LOGIQ e ultrasound machine (GE Healthcare, Milwaukee, WI) was used with a curvilinear probe, with its imaging parameters maintained at the following values for all image acquisitions: frequency: 5 MHz, gain: 60, depth: 8.0 cm [39]. Only the LM muscle was assessed, as it is the most commonly examined muscle amongst the paraspinal muscle group using US and is the most sensitive to spinal pathology. All data collection was performed by one of the investigators (M.F.), who applied a consistent and repeated technique throughout all image acquisitions: pressure was maintained on the adjacent hand and forearm handling the probe so as to prevent tissue deformation on the region of interest through transducer pressure. For image acquisition in the prone position, subjects lay prone on a therapy table with a pillow underneath their abdomen to decrease lumbar lordosis [8]. To assess LM CSA, transverse US images were obtained bilaterally. For subjects with larger muscles, the right and left sides were imaged unilaterally. Similarly, LM CSA measurements were obtained in the standing position, where subjects stood in their habitual standing posture [39]. The images were stored as separate datasets for each subject in *.tif format.

US image segmentation
The ground truth segmentations of LM CSA and LM EI measurements in prone and standing positions were performed on the acquired data using Fiji, a distribution of the ImageJ image processing software [40].
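To illustrate how CSA and EI relate to a binary mask, the following minimal sketch counts the pixels inside the mask (scaled by a pixel area) and averages the gray levels of the B-mode image over the same region. It is not the Fiji procedure used by the authors. It assumes that the B-mode image and mask have already been loaded as single-channel 2D arrays of equal shape, and the pixel area `pixel_area_cm2` is a hypothetical calibration value that is not given in this paper.

```python
import numpy as np


def csa_and_echo_intensity(bmode, mask, pixel_area_cm2):
    """Return (CSA in cm^2, mean gray level inside the mask).

    bmode: 2D array of gray levels.
    mask:  2D array of the same shape, nonzero inside the muscle.
    pixel_area_cm2: area of one pixel in cm^2 (assumed calibration value).
    """
    bmode = np.asarray(bmode, dtype=float)
    inside = np.asarray(mask) > 0
    if bmode.shape != inside.shape:
        raise ValueError("image and mask must have the same shape")
    n_pixels = int(inside.sum())
    if n_pixels == 0:
        raise ValueError("mask is empty")
    csa = n_pixels * pixel_area_cm2             # area = pixel count x pixel area
    echo_intensity = float(bmode[inside].mean())  # mean gray level in the region
    return csa, echo_intensity


if __name__ == "__main__":
    # Toy 2x2 image and mask, for illustration only.
    toy_image = [[10, 20], [30, 40]]
    toy_mask = [[0, 255], [255, 0]]
    print(csa_and_echo_intensity(toy_image, toy_mask, pixel_area_cm2=0.01))
```

For the bilateral masks described later, the function would be called once per side, using the same B-mode image with each mask.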
The ground truth segmentations for all measurements were manually obtained by one of the investigators (C.B.), who, in preparation for this study, received training from another investigator (M.F.) with over 10 years of experience in spine imaging analysis. The inter-rater reliability between both investigators was examined on a set of 18 images, and the intraclass correlation coefficient (ICC(2,1)) ranged from 0.93 to 0.99. Images of subjects where the characteristic structures and landmarks of the LM could not be clearly distinguished were excluded from the database. All ground truth segmentations for each subject are available as binary masks and stored as separate *.tif files.
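For readers unfamiliar with the reliability statistic reported above, the sketch below computes ICC(2,1) in the Shrout and Fleiss formulation (two-way random effects, absolute agreement, single rater) from a table with one row per image and one column per rater. The function name and the example measurements are illustrative assumptions; the authors' own statistical software and data are not described in this section.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per image, one value per rater.
    """
    n = len(ratings)                      # number of images
    k = len(ratings[0]) if n else 0       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 images and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-image mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical CSA values (cm^2) from two raters on five images.
    example = [[8.1, 8.3], [9.4, 9.2], [7.0, 7.1], [10.2, 10.5], [8.8, 8.7]]
    print(round(icc_2_1(example), 3))
```

Because ICC(2,1) measures absolute agreement, a systematic offset between raters lowers the value even when their measurements are perfectly correlated.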

Database availability
The database is available at http://data.sonography.ai. The B-mode images and binary segmentation masks for each subject are deposited as *.tif files.

Data organisation & file naming conventions
The database separates the B-mode images of each subject into a folder named "B-mode" and the masks into a folder named "Masks". The datasets of subjects and corresponding binary segmentation masks are labelled with the same subject ID (1 to 109). The best available images (i.e. frames) for each subject were chosen for the segmentations. Since images were acquired bilaterally in some cases and unilaterally in subjects with larger muscles, different file naming conventions were used for the B-mode images as well as their corresponding masks. Table 1 can be used to verify whether a frame corresponds to the right or left side, as well as whether the frame was acquired in the prone or standing position.

Unilateral file naming conventions
For the subjects where the images were acquired unilaterally, the B-mode images and masks have a one-to-one correspondence. The file names for the B-mode images and masks have the following generic format: X_Y_Bmode.tif and X_Y_Mask.tif, where X is the subject ID, and Y is the frame number. As an example, 50_3_Bmode.tif would have a corresponding mask 50_3_Mask.tif. This can be seen in Fig. 1a and b.
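The one-to-one correspondence described above can be expressed as a simple string substitution. The following sketch derives the mask file name from a unilateral B-mode file name; the function name is hypothetical and the code only assumes the X_Y_Bmode.tif pattern stated in the text.

```python
def unilateral_mask_name(bmode_name: str) -> str:
    """Map 'X_Y_Bmode.tif' to the corresponding 'X_Y_Mask.tif'."""
    suffix = "_Bmode.tif"
    if not bmode_name.endswith(suffix):
        raise ValueError(f"not a B-mode file name: {bmode_name!r}")
    # Keep the 'X_Y' prefix and swap the suffix.
    return bmode_name[: -len(suffix)] + "_Mask.tif"


if __name__ == "__main__":
    print(unilateral_mask_name("50_3_Bmode.tif"))  # 50_3_Mask.tif
```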

Bilateral file naming conventions
For the subjects where images were acquired bilaterally, the file names for the B-mode images and masks have the following generic format: X_Y_Bmode.tif and X_Y_MaskZ.tif, where X is the subject ID, Y is the frame number, and Z is a value of 1 or 2 used as an identifier to distinguish between the right and left side, respectively. As an example, 46_1_Bmode.tif would have corresponding masks 46_1_Mask1.tif and 46_1_Mask2.tif. This can be seen in Fig. 2a and b.
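Both naming conventions can be handled by a single parser. The sketch below uses a regular expression to extract the subject ID, frame number, and side from a mask file name and to reconstruct the matching B-mode file name. It assumes that X and Y are always written as plain integers; the names `MaskInfo` and `parse_mask_name` are hypothetical. For unilateral masks the file name carries no side identifier, so `side` is returned as `None` and the side has to be looked up in Table 1.

```python
import re
from typing import NamedTuple, Optional

# X_Y_Mask.tif (unilateral) or X_Y_MaskZ.tif with Z in {1, 2} (bilateral).
_MASK_RE = re.compile(r"^(\d+)_(\d+)_Mask([12])?\.tif$")
_SIDES = {"1": "right", "2": "left"}


class MaskInfo(NamedTuple):
    subject_id: int
    frame: int
    side: Optional[str]  # "right", "left", or None for unilateral masks
    bmode_name: str      # file name of the corresponding B-mode image


def parse_mask_name(name: str) -> MaskInfo:
    """Parse a mask file name that follows the database conventions."""
    match = _MASK_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised mask file name: {name!r}")
    subject, frame, z = match.groups()
    return MaskInfo(
        subject_id=int(subject),
        frame=int(frame),
        side=_SIDES.get(z),  # z is None when no identifier is present
        bmode_name=f"{subject}_{frame}_Bmode.tif",
    )


if __name__ == "__main__":
    for example in ("46_1_Mask1.tif", "46_1_Mask2.tif", "50_3_Mask.tif"):
        print(parse_mask_name(example))
```

Grouping the parsed results by `bmode_name` yields, for each bilateral image, the pair of right and left masks that share it.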

Discussion
Due to its portability, cost-effectiveness, and efficiency, clinicians and researchers widely use US in their screening and diagnostic procedures over other imaging modalities such as MRI, CT and X-ray. However, US presents its own set of disadvantages relating to the task of manual segmentation. Due to speckle noise in US images, manual segmentation is highly rater-dependent and thus susceptible to errors which affect LM analysis and results. As such, the development of powerful segmentation algorithms can help mitigate these issues. Deep learning techniques can be employed to extract features from the data, which can then be used to perform automatic US image segmentation [41]. Although potential applications of deep learning algorithms have been demonstrated for MRI and microscopy modalities, very few studies have focused on algorithms applied to US [37]. Furthermore, the performance of deep learning algorithms is highly dependent on a high volume of quality data. Public repositories of clinical data pertaining to LM muscle images are scarce, which greatly limits the development and testing of segmentation algorithms. As such, our aim with this study was to provide the first publicly available US database of the LM.
This database is comprised of 109 subjects with the ground truth of the left and right LM at the L5 level in both prone and standing positions. The ground truth data can enable the development of deep learning algorithms used for automatic segmentation tasks related to the LM. Given the volume of the annotated data, the developed algorithm can have a better generalization capability through proper parameter tuning and data augmentation [41]. Moreover, deep learning algorithms can exploit the morphological features that trained experts use to perform their segmentations [41,42]. Furthermore, the algorithms can produce comparable results to those of the examiner [36,43]. As such, examiner subjectivity during assessment of muscle morphology can be reduced. In addition, it would greatly benefit clinicians and researchers whilst enabling them to perform assessments in a practical and time-efficient manner.
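When an automated segmentation is compared against the ground truth masks, an overlap metric is typically reported. The sketch below computes the Dice similarity coefficient between two binary masks. The choice of Dice is an assumption made here for illustration, since this paper does not prescribe an evaluation metric, and the function name is hypothetical.

```python
import numpy as np


def dice_coefficient(predicted, reference):
    """Dice similarity coefficient between two binary masks (1.0 = identical)."""
    pred = np.asarray(predicted) > 0
    ref = np.asarray(reference) > 0
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    total = int(pred.sum()) + int(ref.sum())
    if total == 0:
        # Both masks empty: treated as perfect agreement by convention.
        return 1.0
    overlap = int(np.logical_and(pred, ref).sum())
    return 2.0 * overlap / total


if __name__ == "__main__":
    # Toy masks: the prediction covers two pixels, one of which is correct.
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_coefficient(predicted, reference))
```

Since CSA is the quantity of clinical interest, comparing the areas derived from the predicted and reference masks is a natural complement to an overlap score.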
This database is dedicated to advancing the development of automatic segmentation algorithms related to the assessment of LM muscle morphology. However, this database only includes young athletic adults aged between 18 and 26 years old. Within the dataset, there are natural variances in age, BMI, and other underlying conditions which may differ from one participant to the next. As such, those developing algorithms with our dataset should be mindful of these limitations and anticipate difficulties in accurate segmentation when the algorithms are applied to samples from other populations, a problem commonly referred to as domain shift. Thus, future efforts need to be made to extend this database to include sedentary and older adults, who are more representative of the general population suffering from LBP. When viewing US images of younger muscle (e.g. higher fluid content), contrasting echogenicities of hypoechoic (toward black) muscle and hyperechoic (toward white) fascia allow for easier tissue differentiation and identification of key landmarks [44]. With ageing, there is a natural increase in fibrous tissue, and thus the distinction between muscle and fascia is more difficult [25,45]. As such, this database should be treated as a platform in an ongoing process towards the automation and standardization of LM muscle measurements from US images.

Conclusion
Herein, we presented the LUMINOUS database, which contains US images of the LM at the L5 level together with their corresponding manually segmented binary masks. The database comprises 109 datasets, which will enable the development of automated segmentation algorithms of the LM. This database will provide a means to support the standardization of US measurements, facilitate comparison between studies, and accelerate the implementation of quantitative muscle assessment in clinical and research settings.