In the present study, LCSH suggested a UPD frequency of up to 2.6%. The UPD frequency found by SNP microarrays varies among studies. Analyzing the results of 227 individuals with neurodevelopmental disorders in a highly consanguineous population of the United Arab Emirates, Alabdullatif et al. [14], using the ISCA 4X180K platform (60,000 SNPs), suggested UPD for 1%. Sasaki et al. [29], analyzing the data of 170 parent-child trios (510 samples) from 5 HapMap populations (159 Americans from Utah, USA, with ancestry from northern and western Europe; 33 individuals of African ancestry from the southwestern USA; 81 Maasai from Kinyawa, Kenya; 174 Yoruba from Ibadan, Nigeria; and 63 individuals of Mexican ancestry from Los Angeles, California) with the Affymetrix 6.0 Array, found regions suggesting homozygous UPD in < 1% of them. However, Bruno et al. [30] found potential UPDs in 4% of the microarray samples (250 K, Affymetrix) from 117 Australian patients with neurodevelopmental disorders, considering LCSH > 5 Mbp.
It should be noted that CMA detects only UPD regions in isodisomy and cannot detect UPDs with complete heterodisomy. Single long stretches of homozygosity may also reflect homologous repair through a break-induced DNA replication mechanism [31].
Of the eleven potential UPDs (Table 1 and Fig. 2) found in our study, only three were on chromosomes related to imprinting syndromes: cases # 169 and # 346 on chromosome 7 and case # 312 on chromosome 14. Two of them also carried a pathogenic CNV, as discussed below:
Case # 169 presented an LCSH of 19 Mbp in the chromosomal region 7q21.13q31.1 (90,678,991-109,653,423). The patient was a girl, 9 years old at the time of the exam, with learning difficulties, non-specific dysmorphisms, ophthalmopathies, and short stature. However, she also had a 15 Mbp deletion involving the entire short arm of chromosome 18 [18p11.32p11.21 (136,226-15,181,666) × 1], which results in monoallelic loss of more than 184 genes, including 55 genes listed in the OMIM database, and is undoubtedly a pathogenic CNV [32]. There was no report of intrauterine growth retardation for the patient, the most typical characteristic of UPD(7)mat, which causes Silver-Russell syndrome. Therefore, if the LCSH found represents a UPD(7), it is probably a UPD(7)pat, which is not known to cause pathology except when it generates autozygosity for a recessive mutation [33].
Case # 346 presented an LCSH of 10.6 Mbp in 7p14.3p14.1 (29,374,797-40,699,189). The participant was a 15-year-old male adolescent at the time of examination, with developmental delay (DD), severe intellectual disability (ID), epilepsy, short stature, absence of speech, gastroesophageal reflux and cerebellar atrophy. In the analysis of CNVs no variation of clinical relevance was detected, only the LCSH on chromosome 7 suggesting a UPD. UPD(7)mat, which causes Silver-Russell syndrome, is characterized by intrauterine and postnatal growth retardation, short stature and triangular face, as well as mild to moderate intellectual disability and speech and language difficulties in a portion of the patients [33]. The speech difficulties of the participant could be a consequence of decreased expression of the FOXP2 gene, which is common in patients with UPD(7)mat and causes speech apraxia [34]; however, severe ID, epilepsy and cerebellar atrophy are not phenotypes associated with UPD(7)mat. Considering the hypothesis that a gene mutation in autozygosity could explain the patient’s phenotype, the genes present in the LCSH were analyzed. The LCSH region encompasses 151 genes, of which 47 are OMIM genes, and 8 of these are involved in diseases with an AR inheritance pattern: Ehlers-Danlos syndrome (FKBP14), hemolytic anemia (NT5C3A), diaphanospondylodysostosis (BMPER), primary ciliary dyskinesia (CILD6), Pyle disease (SFRP4), glutaric aciduria III (C7ORF10), Bardet-Biedl syndrome (BBS9) and trichothiodystrophy 4 (MPLKIP). Of these, only trichothiodystrophy 4 was considered a possible cause of the patient’s reported phenotype. Trichothiodystrophy 4, or non-photosensitive neurocutaneous syndrome (OMIM #234050), is due to homozygous mutation of the MPLKIP gene, which encodes a protein involved in cell growth and replication.
It is a rare AR disorder with a wide variety of clinical features including cutaneous abnormalities, DD, mild to severe ID (depending on the mutation), microcephaly, short stature, ocular abnormalities and infections, which accompany the most characteristic phenotypes of trichothiodystrophy: brittle hair and low levels of sulfur [35, 36]. This result was forwarded to the patient’s physician, who should examine whether the patient has brittle hair, the hallmark of this disorder. That would give a reliable diagnostic answer without necessarily requiring further molecular investigation.
Case # 312, suggestive of UPD(14), with an LCSH of 28.1 Mbp in 14q13.2q23.2, refers to a male adolescent, 11 years of age at the time of testing, with abnormal brain structure, speech delay, learning disability and non-specific facial dysmorphia. However, this patient also presented a microdeletion of 2882 Kbp on chromosome 22q11.21 (18,916,842-21,798,907), a pathogenic CNV that causes DiGeorge syndrome [37], which was considered to be the cause of his clinical picture. UPD(14)mat causes Temple syndrome, characterized by short stature, hypotonia, motor delay, precocious puberty, small hands and feet, and usually a normal intellect, occasionally with learning difficulties [38]. UPD(14)pat causes Kagami-Ogata syndrome, a serious condition characterized by marked skeletal abnormalities [39], which were not present in the adolescent. Given the possibility that the patient has a true UPD, it cannot be excluded without further investigation that, in addition to DiGeorge syndrome, his condition is accentuated by a UPD(14)mat.
For chromosomes 1, 2, 10, 16, 17 and 22, where the remaining LCSH suggestive of UPD were detected, there is no known association with imprinting syndromes related to neurodevelopmental disorders (ND) in humans. Of these cases, four also presented a CNV that was considered causal, one of which indicates a complex mechanism for the origin of the LCSH. For the remaining 4 cases, there is a possibility that the phenotype is caused by the unmasking of an autozygous AR mutation inherited from only one of the parents. The cases are described below.
Case # 25 presented an LCSH of 15.4 Mbp in 1q25.3q31.3 (182,537,598-197,949,082), suggesting a possible UPD(1). The patient was a young male, 16 years old on the date of the examination, who presented obesity, DD, ID, speech and/or language delay and non-specific dysmorphisms. The homozygous region in question encompasses 119 genes, of which 50 are OMIM genes, with 10 of them involved in AR disorders. However, the CMA detected a pathogenic duplication in Xq27.3q28 (arr [hg19] 146,418,810-151,604,987) × 2, including the gene FMR1 [40], that was considered the cause of the patient’s condition.
Case # 129 presented an LCSH of 15.1 Mbp at 1p31.3p31.1 (61,620,929-76,755,163), suggesting UPD(1). The participant was a 4-year-old boy at the time of the test, with DD, speech delay and autism. There are 155 genes in this homozygous region, among them 54 OMIM genes, with 10 related to AR disorders. One of these disorders, congenital disorder of glycosylation type Ic (OMIM # 603147), caused by homozygous mutation in the ALG6 gene, leads to psychomotor retardation with delayed walking and speech, hypotonia, seizures, mild to severe ID and sometimes enteropathy [41, 42]. This result was passed on to the patient’s physician, who should examine whether the condition could explain the patient’s symptoms. Even if it is considered a possible cause, whole exome sequencing (WES) is advised, with a focus on the LCSH and, more specifically, on the ALG6 gene.
Case # 147 refers to a 4-year-old boy at the time of the test, who presented DD and autism. In this case, three blocks of LCSH were found on chromosome 2: 2p24.1p14 (45.9 Mbp; 22,170,065-68,067,589), 2p12p11.2 (9.9 Mbp; 79,211,952-89,129,064) and 2q11.1q14.3 (33 Mbp; 95,341,387-128,342,675), totaling ~ 89 Mbp and strongly suggesting a UPD(2). The LCSH regions comprise 1049 genes, of which 371 are OMIM genes and 62 are related to AR disorders. The reported phenotypes of this child are not specific enough to allow the suggestion of causal genes. The possibility that the homozygous segments harbor an AR mutation was passed on to the physician. To investigate this possibility, the most appropriate procedure would be WES, with the analysis directed to the LCSH regions.
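The total of ~ 89 Mbp follows directly from the reported hg19 coordinates of the three blocks. A minimal Python sketch of that arithmetic is given here; the function and variable names are illustrative only and are not part of the study's analysis pipeline.

```python
# Sizes of the three chromosome 2 LCSH blocks reported for case # 147.
# Coordinates (hg19) are copied from the text; all names are illustrative.
BLOCKS = {
    "2p24.1p14": (22_170_065, 68_067_589),
    "2p12p11.2": (79_211_952, 89_129_064),
    "2q11.1q14.3": (95_341_387, 128_342_675),
}


def size_mbp(start, end):
    """Length of a segment in Mbp from its start and end positions."""
    return (end - start) / 1_000_000


sizes = {band: size_mbp(start, end) for band, (start, end) in BLOCKS.items()}
total_mbp = sum(sizes.values())

for band, size in sizes.items():
    print(f"{band}: {size:.1f} Mbp")
print(f"total: {total_mbp:.1f} Mbp")
```

The same calculation can be applied to any of the coordinate pairs reported for the other cases.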
Case # 76 refers to a boy, 12 years old at the time of the examination, with DD, mild ID, autism and non-specific facial dysmorphia. The CMA showed a 12 Mbp LCSH at 10q25.2q26.13 (112,544,654-124,513,498), a region encompassing 130 genes, of which 56 are OMIM genes, including 6 genes related to AR disorders. However, the etiology of the patient’s phenotype was attributed to a pathogenic duplication found in 7q11.23 (arr[hg19] 72,556,215-74,245,599) × 3, related to Williams-Beuren region duplication syndrome (OMIM # 609757) [43].
Case # 204 refers to a girl, one year old at the time of the test, who was referred for examination because of restricted intrauterine growth, oligohydramnios, low birth weight, short stature, hypotonia, camptodactyly, DD, speech delay, facial dysmorphia (trigonocephaly, epicanthus, downslanting palpebral fissures), and atrial septal defect. The CMA showed a 12.5 Mbp LCSH at 16p13.3p13.13 (89,560-12,548,052). A UPD on chromosome 16 is quite plausible as a consequence of trisomy rescue, since trisomy 16 is the most common autosomal trisomy reported in human miscarriages, with a 1–2% incidence in clinically recognized pregnancies, and is not viable as a pure trisomy [44]. Mosaic trisomy 16 is possible and has been described in association with UPD(16)mat, to the extent that this UPD has been considered its bioindicator. Many of the registered cases of UPD(16)mat follow findings of trisomy 16 in prenatal examinations of chorionic villi. The clinical characteristics associated with UPD(16) are heterogeneous, and affected individuals may present intrauterine growth retardation, malformations (often severe) and dysmorphisms [45]. Scheuvens et al. (2017) concluded that UPD(16) probably has no phenotype by itself and that the deleterious phenotypes found are caused by an often undetectable mosaicism of trisomy 16 (including a possible effect of a trisomic placenta).
In case # 47, a 12.3 Mbp LCSH at 17q22q24.2 (53,332,043-65,633,600) was detected in the CMA of a girl, 8 years old at the time of the test, with short stature and anomalies of the upper and lower limbs. The LCSH suggesting UPD(17) contains 238 genes, including 93 OMIM genes, of which 15 are involved in AR disorders. However, the microarray in this case also showed a deletion on the X chromosome (arr[hg19] Xp22.33[679,520-950,907] × 1) involving the pseudoautosomal SHOX gene, whose haploinsufficiency causes Leri-Weill dyschondrosteosis [46], which was considered the cause of the patient’s phenotype.
Case # 209 refers to a boy, 5 years old at the time of the examination, presenting DD and speech delay. A 13.5 Mbp LCSH at 22q12.1q13.1 (26,504,838-40,021,614), suggestive of UPD(22), was detected in the patient’s CMA. The LCSH encompasses 289 genes, among them 154 OMIM genes, with 17 genes related to AR disorders; however, no obvious candidate gene was identified. WES with a focus on the LCSH region is advised.
Case # 443 refers to a 2-year-old boy referred for examination with low weight, short stature, DD, Mongolian spots, poor ear formation, speech delay, autism and aggressive behavior. The genomic findings of this case are notable for a homozygous region of approximately 13.2 Mbp on chromosome 22q13.1q13.33 (37,977,281-51,157,531), accompanied by a micro-triplication of 2.8 Mbp resulting in four copies of the affected segment, 22q12.3q13.1 (35,888,588-38,692,765) × 4, which includes part of the homozygous region. Additionally, the CMA revealed a mosaic gain in the homozygous region 22q13.1q13.33 (37,933,985-51,197,766) × 2–3. Two similar cases with an interstitial triplication (resulting in four copies) followed by uniparental isodisomy (isoUPD) for the remainder of the chromosomal arm were described earlier [47]. The authors explained the origin of this type of alteration through a microhomology-mediated break-induced replication DNA repair mechanism that induces copy-number gains and segmental isoUPD in tandem. However, in those cases the breakpoint was the same for the copy-number gain and the isoUPD, whereas in our case the tetrasomic region is partially isozygous (for about 650 Kb). In addition, virtually the whole LCSH segment is in 2–3 copy mosaicism, which challenges the understanding of the mechanism that originated this alteration, although mosaicism is occasionally found with genetic alterations and with UPD [48]. It seems likely that the much-increased gene dosage is responsible for the patient’s phenotype; however, the contribution of an AR mutation in autozygosity cannot be ruled out.
The LCSH detected by CMA with SNPs are mostly used to inform the requesting clinician of the possibility and probable degree of parental kinship when excessive homozygosity is found, and to alert to the higher possibility of a recessive disorder and the risk of recurrence in future pregnancies. They are still not widely used in clinical analysis to investigate potential UPD and the resulting imprinting disorders, or to search for genes related to AR disorders. The main reason is that the interpretation of a single or a few LCSH is very speculative, and most findings will require further molecular investigation, for instance, methylation analysis when UPD is suggested on an imprinted chromosome. In some cases, in silico analysis of the disease-causing recessive genes in the LCSH region will indicate a clear candidate gene whose mutation can explain the clinical features of the patient. For most cases, however, when the phenotype is not very pronounced and/or the LCSH region contains a large number of disease-causing genes, WES would be the logical option, since the LCSH directs the focus of analysis, with a higher probability of identifying the causal gene [6, 48, 49]. However, in Brazil and most other countries, WES is a high-cost exam that is rarely covered by medical insurance.
The LCSH pattern of ~ 18% of the individuals indicated distant (sixth- or seventh-degree) parental kinship, which may be due to regional characteristics of immigration and marriages within the same ethnic group of immigrants in the south of Brazil. The State of Santa Catarina received substantial German and Italian immigration, and until recently there was well-known resistance among Germans and Italians to marrying outside their ethnic groups and religious beliefs, especially in rural regions. When the kinship indicated by the LCSH is distant and more related to endogamic characteristics of the population, clinical relevance is less probable.
For 8% of the individuals, the LCSH suggested a parental relationship from the first to the fifth degree; these individuals are more likely to suffer a clinical impact, since the closer the kinship, the greater the proportion of shared alleles and therefore the risk of inheriting two copies of an AR mutation [6]. When the parental relationship is very close, such as second or even first degree, the patient is most likely affected by an AR mutation. However, the extraordinarily high percentage of homozygosity in their genomes (12 to 25%) does not greatly restrict the target region in which to search for causal genes, and WES will be the most likely tool to identify the mutation.
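The 12 to 25% range quoted above is in line with the textbook expectation for the offspring of second- and first-degree relatives. A minimal Python sketch follows, assuming the standard approximation that the expected autozygous fraction (inbreeding coefficient F) is half the parents' coefficient of relationship, that is, F = (1/2)^(d+1) for parents related in degree d. The function name is hypothetical, and observed values vary around these expectations.

```python
def expected_autozygosity(degree):
    """Expected autozygous fraction F in the child of parents related
    in the given degree (1 = parent-child or full siblings,
    3 = first cousins). Assumes F = (1/2) ** (degree + 1)."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    return 0.5 ** (degree + 1)


# Expected fractions for the first- to seventh-degree range discussed here.
for degree in range(1, 8):
    print(f"degree {degree}: F = {expected_autozygosity(degree):.2%}")
```

Under this approximation the expectation halves with each additional degree, which is why distant kinship leaves only a small homozygous fraction of the genome.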
In the clinical setting, the greatest difficulty in analyzing LCSH is not knowing whether they are relevant enough to be analyzed or reported. When they are so numerous as to indicate parental consanguinity, this excess of homozygosity is usually reported. When the analysis of LCSH goes further than that, those that suggest UPD are also reported. The greatest doubt is what to do with LCSH that are not large enough to raise a UPD hypothesis but are still over 3–5 Mb. Should they be ignored or checked for recessive genes? Doing that for every sample is very time-consuming and very speculative. Knowing which LCSH are common, and can potentially be considered a characteristic of the population, allows the analysis to focus on the most relevant LCSH. That seems to be the approach of groups that analyze larger samples: the LCSH that are found recurrently in patients and unaffected parents are considered common variation and are disregarded from a certain point on [13, 25,26,27,28]. Following the same rationale and using similar criteria (LCSH ≥ 3 Mbp found at a frequency of 5% or higher), we identified 11 LCSH that were considered common variation in our population and now report them to contribute to the growing evidence.
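The criterion just described (LCSH of at least 3 Mbp observed in at least 5% of individuals) can be expressed as a simple filter. The following Python sketch is illustrative and is not the pipeline used in the study: the function name, the matching of regions by identical coordinates, and the carrier counts in the toy cohort are all hypothetical. Real data would additionally need a rule for grouping overlapping segments with different breakpoints.

```python
from collections import Counter

MIN_SIZE_BP = 3_000_000   # size threshold from the text (>= 3 Mbp)
MIN_FREQUENCY = 0.05      # frequency threshold from the text (>= 5%)


def common_lcsh(calls, n_individuals):
    """Return {region: frequency} for regions meeting both thresholds.

    calls: iterable of (individual_id, chrom, start, end) tuples.
    Regions are matched by identical coordinates (a simplification).
    """
    carriers = Counter()
    seen = set()
    for individual, chrom, start, end in calls:
        region = (chrom, start, end)
        # Skip segments below the size cut-off and duplicate calls
        # of the same region in the same individual.
        if end - start < MIN_SIZE_BP or (individual, region) in seen:
            continue
        seen.add((individual, region))
        carriers[region] += 1
    return {
        region: count / n_individuals
        for region, count in carriers.items()
        if count / n_individuals >= MIN_FREQUENCY
    }


# Hypothetical toy cohort of 40 individuals (not study data).
toy_calls = [(i, "7", 71_997_278, 76_128_151) for i in range(2)]
toy_calls += [(i, "16", 31_957_367, 35_220_544) for i in range(5)]
toy_calls += [(i, "1", 1_000_000, 2_500_000) for i in range(10)]  # < 3 Mbp
toy_result = common_lcsh(toy_calls, n_individuals=40)
print(toy_result)
```

In the toy cohort the chromosome 1 segment is excluded by size despite being frequent, while the two larger regions pass both thresholds.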
Of the LCSH identified as frequent in our dataset (Table 3), all except 7q11.22q11.23 (71,997,278-76,128,151) have been found to be common LCSH by other groups in clinical investigations of patients with developmental disorders [25,26,27,28]. Those LCSH are assumed to represent regions of low recombination (ancestral haplotype blocks) and are interpreted as potentially nonpathogenic.
The common LCSH regions found in our population at 16p11.2p11.1 (31,957,367-35,220,544), 11p11.2p11.12 (47,178,984-51,550,787) and 3p21.31p21.2 (48,712,421-52,852,488) were also reported by Wang et al. (2015) [26] as recurrent LCSH with no clinical relevance in a cohort of patients with ND including unaffected parents; by Kearney H. M. [28] at a frequency > 5% in CMA (CytoScan® HD, Affymetrix) reads from affected individuals; and by Sanchez P. [27], who considered LCSH common at > 3% frequency in the CMA (CytoScan® HD, Affymetrix) samples from a cohort of 278 affected Hispanics. Pajusalu et al. (2015) also reported the same regions on chromosomes 3 and 11 as recurrent LCSH, with frequencies of 9.3% and 6%, respectively, using a minimal cut-off size of 5 Mb in the investigation of 2110 consecutive Estonian patients (including prenatal samples and parents).
The regions 1q21.2q21.3 (147,215,796-150,287,884), 2q11.1q11.2 (95,341,387-98,776,856), 1p33p32.3 (49,205,205-53,121,054) and 6p22.2p22.1 (26,340,871-30,006,805) were not reported by Wang [26], but are common findings in the studies of Kearney H. M. [28] and Sanchez P. [27]. The region on chromosome 6 was also a common finding (4.9%) in the study by Pajusalu et al. (2015). Moreover, our common LCSH at 20q11.21q11.23 (31,940,638-36,081,725) was also reported by Sanchez P. [27] and Pajusalu et al. (2015). Finally, regions 15q15.2q21.1 (42,423,100-45,726,314) and 10q22.1q23.31 (73,824,169-77,212,167) were previously reported only in the samples of Sanchez P. [27].
We found no previous report of the LCSH at 7q11.22q11.23 (71,997,278-76,128,151), encountered in the population of Santa Catarina at a frequency of 5%. This homozygous region is not related to any gene with a known imprinting pattern in humans [50] and covers 102 known genes, of which 46 are listed in OMIM, including 3 genes related to AR disorders: a less common form of chronic granulomatous disease (# 233700), Antley-Bixler syndrome with genital anomalies and disordered steroidogenesis (# 201750), and disordered steroidogenesis due to cytochrome P450 oxidoreductase (# 613571).
The LCSH considered frequent and common in the present study partially corroborate the study of Yang et al. [7], who emphasized that frequent regions of LCSH occur in the vicinity of centromeric gaps, exemplifying their findings on chromosomes 8, 9, 11, 16, 19 and 20 in the Asian population, and on chromosomes 8, 9, 11, 16 and 19 in the Caucasian population. Our study found common LCSH in the vicinity of the centromeric gaps on the above-mentioned chromosomes 11 and 16, but also on chromosomes 1, 2, 7 and 20, as well as interstitial LCSH on 1p, 3p, 6p, 10q and 15q (Fig. 3).
As mentioned before, although a pathogenic implication of the LCSH reported as common seems unlikely, these results should be considered with caution, in particular because we have no knowledge of the haplotypes that are segregating. To affirm unambiguously that the recurrent LCSH found in affected cohorts have no clinical implication, they should appear at similar frequencies in the non-affected population. This will only be known in the future, as data from affected and unaffected populations accumulate. Haplotype investigation of the common LCSH found in our cohort is a future goal and will provide additional relevant information.