Systematic data-querying of large pediatric biorepository identifies novel Ehlers-Danlos Syndrome variant

Background Ehlers Danlos Syndrome is a rare form of inherited connective tissue disorder, which primarily affects skin, joints, muscle, and blood cells. The current study aimed at finding the mutation that causing EDS type VII C also known as “Dermatosparaxis” in this family. Methods Through systematic data querying of the electronic medical records (EMRs) of over 80,000 individuals, we recently identified an EDS family that indicate an autosomal dominant inheritance. The family was consented for genomic analysis of their de-identified data. After a negative screen for known mutations, we performed whole genome sequencing on the male proband, his affected father, and unaffected mother. We filtered the list of non-synonymous variants that are common between the affected individuals. Results The analysis of non-synonymous variants lead to identifying a novel mutation in the ADAMTSL2 (p. Gly421Ser) gene in the affected individuals. Sanger sequencing confirmed the mutation. Conclusion Our work is significant not only because it sheds new light on the pathophysiology of EDS for the affected family and the field at large, but also because it demonstrates the utility of unbiased large-scale clinical recruitment in deciphering the genetic etiology of rare mendelian diseases. With unbiased large-scale clinical recruitment we strive to sequence as many rare mendelian diseases as possible, and this work in EDS serves as a successful proof of concept to that effect.


Recruitment
The proband (and sister) were recruited at The Center for Applied Genomics at The Children's Hospital of Philadelphia under a large-scale clinical recruitment project, A Study of the Genetic Causes of Complex Pediatric Disorders, aimed at recruiting over 100,000 Children with both common and rare medical disorders. This study was approved by the institutional review board of The Children's Hospital of Philadelphia. The proband and his family members consented to release of deidentified medical and genomic information for publication. For the proband and sister, this was in the form of parental consent. All study participants blood was drawn when they presented for a clinic visit. They consented to prospective genomic analysis and analysis of electronic medical records (EMRs). Recruitment was not targeted, and the patients came to our attention through systematic data querying of de-identified EMR records from over 80,000 individuals recruited over the last six years. Cross-referencing our biorepository with rare-disease codes (as defined by Online Mendelian Inheritance in Man (OMIM) code 225410 and International Statistical Classification of Diseases and Related Health Problems 9 (ICD9) code 756.83), we identified the EDS proband who had longitudinal EMRs that included a detailed family history and relatively extensive medical history of both parents.

EMR querying
Our broad consent protocol includes extraction of DNA, blood/sera, access to EMRs, and the permission to perform a wide range of genomic studies in consenting individuals. The EMR product, which captures over one million clinical office visits annually, is EpicCare®, (Epic Systems, WI), and includes the reason for visit, diagnosis codes (ICD-9-CM format), growth measurements, medications prescribed, and referrals made. The EMR is directly interfaced with three laboratory information systems, including one internal and two external (Quest Diagnostics, LabCorp) lab systems. All laboratory test results are available in a structured format, mapped to Current Procedural Terminology (CPT) codes, and Logical Observation Identifier Names and Codes (LOINC) as needed. All of this information is captured and moved into our repository in a structured format for consented research participants. We also receive radiology reports, surgical notes, hospital discharge summaries, and Emergency Department summaries and dictated specialist reports.
Patient data is encrypted and integrated into a custom phenotype browser, which includes all EMR, as well as additional survey data collected at recruitment (family history/diagnoses etc.). This allows a multidirectional flow of de-identified phenotype information to be directly coupled with an automated analysis pipeline, utilizing high-efficiency software tools. Only encrypted/deidentified data reaches the phenotype browser, where it can be coupled with genotyping, sequencing and other data that are available on the subjects. Of the patient DNA samples that have consented to genetic studies, we have >99 % availability of samples available for regenotyping under CLIA/CAP certification or NGS, with the~1 % loss mainly due to small amounts of blood being obtained from the youngest children or ineffective DNA extraction from saliva, but saliva samples constitute less than 5 % of our DNA sample sets, 95 % of which are blood-derived.
Diagnostic codes (e.g. 756.83) and keyword (e.g. "Ehlers Danlos, EDS") keyword searches identified a list of 119 patients, all of which included access to lab and diagnostic reports as described above. Patient data was then re-identified through a reverse-encryption process, which allowed our clinical team (who have no access to genomic data) to conduct a complete review of the proband's entire medical chart (including medical letters) and family medical history. Of these 119 patients, only in the case of the identified proband did we have access to the samples of a rather complete pedigree. For this family we had the proband being diagnosed with EDS and father and sister having EDS like phenotype the mother and maternal half-brother were normal.

Proband's medical history
The proband is currently nine years for age. He was delivered at full term through vaginal delivery. The birth was traumatic and forceps and suctioning were used. His birth weight was 4.1 kg. He was diagnosed with EDS at the age of 9 months, following admission for joint laxity and ankle crepitus. He had slightly delayed development, with ambulation at approximately 14 months. At two years of age, he was referred to a neurologist with amblyopia, who noted moderate gross motor delay, increased shoulder girdle weakness and central hypotonia. At 2.5 years of age, delays in his fine motor skills and speech articulation were recorded. He had difficulty chewing and had weak mouth muscles. Difficulties in fine motor skills were recorded again at five years of age, and abnormal gait, poor coordination, and developmental delay were noted at multiple time points. At 9 years-of-age, he was at the 99 th percentile for both height (148.4 cm) and weight (48.3 kg) based on the Center for Disease Control and Prevention (CDC) weight-for-age data. He is normocephalic, and has a fibrous anterior fontanel. He has a slightly triangular face with slight mid-face flattening. His adipose tissue in his lower jaw and neck is decreased, and he has blue sclerae.
He has a normal palate. He has a I/VI heart murmur with no clicks and regular rate and rhythm.
The proband presented with a history of recurrent joint pain and was found to have increased mobility in his metacarpal joints, proximal and distal interphalangeal joints, and significant thumb hypermobility. He has increased flexibility of his wrists, while his elbows can be slightly hyperextended. He also has a hyperextensible ear. He has decreased movement in his ankles, which are also slightly contracted and unable to be dorsiflexed beyond 90 0 . Hypermobility improved from ages 3 to 9 years (current age), during which time he received extensive occupational and physical therapy. He does not have any scoliosis or kyphosis. He has had two patella dislocations, and has been admitted for fractures (distal radius and ulna buckle) and sprain (ankle). Bone density, however, is normal (z-score = 0.3 SD) as per dual-energy X-ray absorptiometry (DXA) scanning at 9 years.
Other problems include urinary incontinence, ureterectasis (hydroureter), and profound hydronephrosis. At age six years, he had two posterior urethral valve surgeries, requiring a suprapubic tube. He has a history of antibody deficiency and has been receiving intravenous immunoglobulin (IVIG) therapy every 3 weeks. He has exotropia (his left eye drifts out) and amblyopia (commonly known as "lazy eye"). He is developmentally delayed, and a diagnosis of ADHD was recorded at 7 years of age. EEG at 2 years of age was normal. "Components of autism" were recorded once, at age 7.5 years. "Sensory integration issues" were recorded once at 6 years of age. He has decreased adipose tissue, decreased muscle mass and hypotonia. He has a history of asthma with exacerbation, and reports daily fatigue. No cardiovascular abnormalities have been recorded. His medical history was otherwise significant for dysphagia, recurrent vomiting and chronic constipation.
Hitherto, no genetic anomalies have been identified. At 2.9 years, he was screened for collagen/procollagen abnormalities. None were detectedcells synthesized and secreted types I and III procollagens normally, and the efficiency of conversion of procollagen to collagen, was similar to control cells. These findings make it unlikely that he has a form of osteogenesisimperfecta (OI) [4][5][6][7][8], as cells from more than 85 % of individuals with clinically recognizable OI have alterations in either the amount or the structure of type I procollagen synthesized and typically result in a decrease type I procollagen production by~50 %. OI types II (OMIM 166210), III (OMIM 259420), and IV (OMIM 166220) are also unlikely, as they are typically caused by mutations affecting the structure of the pro-alpha 1(I) and pro-alpha 2(I) chains of type I procollagen, and result in substitution for glycine residues in the triple-helical domains of these chains.
These findings exclude most forms of EDS type IV, which typically results from mutations that affect the synthesis, structure or secretion of type III procollagen. They also exclude some forms of EDS type VII that are caused by genomics deletions that affect the conversion of procollagen to collagen. Genetic screening tests for mutations in known EDS genes, including TNXB and COL3A1 were also negative.

Family history
A complete pedigree ( Fig. 1) is not available, but the proband's family history does record several details pertinent to this study. The proband's sister is seven years of age, and was last seen two years ago. There is evidence of EDS with incomplete expression, with relevant (endo) phenotypes including hypermobility, slow-healing scars, and urinary incontinence. Ultrasound of kidney and bladder reveal no structural abnormalities. Likewise, cardiovascular functions appear normal, and osseous structures are unremarkable.
We do not have direct access to the medical records of the father, but a detailed family history of the proband reveals evidence of an EDS-type phenotype. The father has had multiple joint dislocations, wounds that did not heal well, and multiple minimum trauma fractures. He had shoulder surgery following multiple dislocations.
The proband has a paternal uncle with suspected EDS, including skin elasticity, and poor healing. He also has a history of multiple shoulder dislocations and fractures with trauma. A paternal aunt is also mentioned in the family history, which evidences hypermobility (reportedly able to make a "praying sign behind her back"). She also has a history of resting tremors, and poor skin healing. The proband's paternal grandmother has several EDS-like symptoms, including blue sclera, and joint hypermobility. Similar to the paternal aunt, she can make a "praying sign behind her back." She also has a history of recurrent fractures (including shoulder fractures requiring surgery). The proband's mother does not have any history of EDS-like phenotypes, although she did report severe temporomandibular joint disorder. A maternal half-brother has scoliosis but no other bone or joint symptoms.
The mother (3380), father (1966) and affected male proband (2113) had their whole genome sequenced. For validation affected sister (0281) and half-brother (4736) of the proband were Sanger sequenced with the other three family members.

DNA sample collection
Blood samples from the proband, sister, maternal halfbrother and parents were collected by the Center for Applied Genomics at Children's hospital of Philadelphia, with consent for prospective genomic analyses, access to EMRs, and re-contact for further studies.

Whole genome sequencing and variant calling
Genomic DNA was isolated from the blood samples by standard methods. Whole genome sequencing was performed using Illumina®HiSeq2000 on three of the members of family (proband and both parents) following the manufacturer's instructions. All raw reads were aligned to the reference human genome build 37(UCSC hg19) using the Burrows-Wheeler alignment [9] (BWA, 0.6.2). Optical and PCR duplicates were marked and removed with Picard (v.1.73). Local realignment of reads in the indel sites and quality recalibration were performed with the Genome Analysis Tool Kit v1.6 [10]. SNPs and small INDELs were called with GATK UnifiedGenotyper [11]. ANNOVAR [12] and SnpEff [13] were used to annotate the variants. Fold coverage statistics are listed in (Table 1) and illustrated in (Fig. 2).

Variants validation through sanger sequencing
Sanger sequencing of the variants (Fig. 3) was performed in order to confirm the heterozygous mutation identified by the variant detection analysis (see below). ABI 3730 sequencer with ABI BIgDye Terminator Cycle Sequencing kit was used. The reference genome sequence was obtained from UCSC Genome browser. The PCR primers were designed for the ADAMTSL2 mutation using the PRIMER3 (http:// bioinfo.ut.ee/primer3/) open source software. PCR was performed using the following primers: 1) ADAMTS-L2-forward primer sequence: CAGGAGACCAACGA-GGTGTG; 2) ADAMTSL2-reverse primer sequence: TGGCAGCTCTTAGGAACCTC. The resulting *. ABI files from the sequencer were loaded into ABI Sequence Scanner version 1.0 for further analysis. The sequence was manually reviewed to ensure reliability of genotype calls.

Results
We recently identified an EDS patient, with Type VIIC form of EDS. Our screen for mutations in ADAMTS2, known to be associated with this sub-type and were negative. There were no mutations in any other known EDS-associated genes. Screenings for collagen/procollagen abnormalities were also negative. The patient's family history includes a father and sister with EDS-like symptoms, as well as a number of other family members with EDS-type (endo) phenotypes. This pedigree information suggests an inheritance pattern consistent with autosomal dominant inheritance, possibly with variable expressivity.
Variant-calling produced 52 candidate loci, which are listed below. While ADAMTSL2 is the most likely source of the EDS phenotype, one or more of these loci may be contributing to the disease pathogenesis.
The whole genome analysis of proband (2113) yielded 3,546,583 variants. We identified about 248,772 SNV's that are shared by the father, proband, and not present in the mother as we assumed it to be dominant model of   inheritance. These were analyzed for non-synonymous mutations, which alter the amino acid sequence of a protein. We used ANNOVAR [12] for annotation of variants and used standard variant reduction method to reduce the number of candidate variants. As such we excluded variants that were in intronic regions, synonymous changes, and those identified by the dbSNP131 and the 1000 Genome Project [14]. This filtering resulted in a subset of novel non-synonymous candidate variants ( Table 2). None of the known EDS genes (Table 3) showed mutations in the proband. However, manual curation and literature survey of the 52 variants resulted in the discovery of a mutation (c.1261G > A, p. Gly421Ser) in the ADAMTSL2 gene ( Table 4). As none of the other 52 variants resided in genes that were putatively candidates for causing EDS, we considered the (c.1261G > A, p. Gly421Ser) variant in the ADAMTSL2 gene to be the most likely causative source of EDS in the family as it segregated in keeping with a disease-causing variant in this family. This heterozygous mutation (c.1261G > A, pGly421Ser) was detected in the proband (2113), affected father (1966) and, affected sister (0281). The mother (8380) and unaffected maternal halfbrother (4376) were homozygous for the wild type allele. The mutation was confirmed by Sanger sequencing (Fig. 3). This process is illustrated in (Fig. 4). Mutation Characteristics SIFT Polyphen Scores and AA Change. We identified a nonsynomous mutation resulting in an amino acid change-G1261A-at position 13641800 on chromosome 9.

Discussion
This study illustrates the success of the research strategy where by systematic data querying of phenotype data from a large biorepository can be cross-referenced with disease codes to identify the most likely cause of a rare disease. Because we are a research/treatment hub for children with rare disease such as EDS, our biorepository is disproportionately enriched for phenotypes such as these. As such, we anticipate this approach will resolve other rare and unidentified diseases in the future.
The A Disintegrin and Metalloproteinase with thrombospondin motifs (ADAMTS) family is associated with a range of biological processes, including formation/regulation of connective tissue structure, coagulation, arthritis, angiogenesis, and cell migration [15]. The ADAMTS superfamily includes 19 distinct members that are secreted enzymes and at least seven ADAMTS-like proteins without enzymatic activity [15][16][17][18]. The ADAMTS have two domainsthe catalytic domain at the N terminus and the ancillary domain at their C terminus. These are required for interaction and are distinct from A Disintegrin and Metalloproteinases by 1+ type I repeats. They are synthesized as inactive zymogens, and post-translational modification is activated by their N-terminal propeptide, which is cleaved by furin [15]. ADAMTS-like proteins are different from ADAMTS ancillary domains in that they are lacking catalytic activity, due to the lack of protease domain.
Mutations in the ADAMTS family have been associated with EDS. Specifically ADAMTS2 mutations are associated with Type VIIC, which is characterized by dermatosparaxis. This manifests in extreme skin fragility, spontaneous ruptures of internal organs, as well as premature rupture of membranes during pregnancy [19]. Colige et al. [20] were the first to identify mutations responsible for EDS VIIC, which were found in the gene encoding the procollagen protease ADAMTS2, located on chromosome 5qter (5q35.3) (originally identified under the alias procollagen I -Proteinase). They identified six EDS VIIC individuals, five of whom were homozygous for a C → T transition that resulted in a premature termination codon, Q225X, of which four were further homozygous at three downstream polymorphic sites (His 286 (CAT), Asp 676 (GAT), and Asp 844 (GAT)). A sixth patient was homozygous for premature termination codon, Q225X. These are all consistent with recessive inheritance.
Three additional mutations have been identified in the same gene in two patients: one had a homozygous deletion of exons 205, a second was compound heterozygous for a missing exon 3 (maternal) as well as a deletion that caused in-frame skipping of exons 14-16 [21]. This second patient constituted the first example of compound heterozygosity as the cause for EDS VIIC. Three more patients from the same study had variations found in normals, which was paralleled by silent functionality. All five EDS VIIC patients presented with facial dysmorphisms, hypermobility, skin and muscle fragility, and (at least) four of were born prematurely (~32-34 weeks). Recently, a newborn with EDS VIIC was described as presenting with congenital skull fractures and skin lacerations [19].
Adapted from Le Goff & Cormier-Daire (2008), Relevant Diseases Associated with THE ADAMTS (L) family. Mutations in ADAMTS2 are associated EDS VIIC. A mutation in ADAMTSL2 has previously been associated with geleophysic dysplasia, whose phenotype includes joint limitation A mutation in ADAMTSL2 has previously been associated with geleophysic dysplasia (GD) [OMIM: 231050], where the phenotype includes joint limitation. ADA-MTSL2 encodes a secreted glycoprotein that binds the cell surface and extracellular matrix, and interacts with latent transforming growth factor beta binding protein 1. GD is autosomally recessive, and can also include severe short stature, facial dysmorphism, a 'happy' face, progressive cardiac valvular thickening, respiratory insufficiency, and lysosomal-like storage vacuoles in different tissues. Le Goff et al. [22] used in situ hybridization to human fetal tissue to demonstrate ADAMTSL2 mRNA expression in cardiomyocytes, epidermis, dermal blood vessels, tracheal wall, skeletal muscle, pulmonary arteries, and bronchioles of the lung. Strong expression was also observed in chondrocyte columns in the hypertrophic and reserve zones of the proximal femoral growth plate. The group used a yeast 2hybrid screen of a human muscle cDNA library, and found that ADAMTSL2 bound to latent TGFβbinding protein-1, which is important to regulating TGFβ availability and storing TGFβ in the extracellular matrix. While unsubstantiated by current medical records, there is a possible relationship here with the proband's antibody deficiency and history of recurrent infection.
Allali S et al. [23] recently identified 33 patients, of whom 19 had a mutation in ADAMTSL2. The cohort collectively shared 13 mutations. The authors compared phenotypes in those that did/did not carry an ADAMTSL2 mutation, finding that the main discriminating features were facial dysmorphism and tiptoe walking. These phenotypes were absent from the proband and his family. To our knowledge, ADAMTSL2 is not associated with any other rare disorder.
We note that Table 1 also includes a number of loci that may also be associated with EDS and other phenotypes reported here. Of particular interest is a heterozygous non-synonymous single nucleotide variation in A disintegrin and metalloproteinase with a thrombospondin type 1 motif, member 13 (ADAMTS13), which is also known as von Willebrand factor-cleaving protease  Mutations in ADAMTS2 are associated EDS VIIC. A mutation in ADAMTSL2 has previously been associated with geleophysic dysplasia, whose phenotype includes joint limitation (VWFCP). Although this is an exonic splicing site, it may be of functional significance. ADAMTS13 is a zinccontaining metalloprotease enzyme that cleaves von Willebrand factor (vWf )-a large protein involved in platelet adhesion and in aggregating vascular lesions. While this demonstrates a success for data-querying of de-identified EMR data, we stress the importance of clinical phenotyping, which can only be accomplished on a local level. Nevertheless, we demonstrate the utility of unbiased large-scale recruitment and biobanking in deciphering the genetic etiology of rare Mendelian diseases. Families with most other Mendelian disorders for which genetic variants have been found were biasedly recruited by clinical experts. The majority of rare Mendelian diseases may go unsequenced under such a recruitment model. With unbiased large-scale clinical recruitment we strive to sequence as many rare Mendelian diseases as possible, and this work in uncovering a new mutation underlying EDS serves as a successful proof of concept to that effect.

Conclusion
Although EDS type VII has been described as an autosomal-recessive disorder, the identification of heterozygous ADAMTSL2 mutations demonstrates a dominant form of EDS, strictly fulfilling the diagnostic criteria for EDS VII. Future studies delineating the specific function of the ADAMTSL2 protein will reveal how this novel mutation and others in N-glycan-rich module contribute to the disease phenotype.

Consent
The proband and his family members consented to release of de-identified medical and genomic information for publication.