Source: TEXAS A&M UNIVERSITY submitted to NRP
EXPLORING THE GENOMIC COMPONENT OF EQUINE SEX DEVELOPMENT AND REPRODUCTION
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1030115
Grant No.
2023-67015-39784
Cumulative Award Amt.
$645,000.00
Proposal No.
2022-08309
Multistate No.
(N/A)
Project Start Date
May 1, 2023
Project End Date
Apr 30, 2026
Grant Year
2023
Program Code
[A1211]- Animal Health and Production and Animal Products: Animal Reproduction
Recipient Organization
TEXAS A&M UNIVERSITY
750 AGRONOMY RD STE 2701
COLLEGE STATION,TX 77843-0001
Performing Department
(N/A)
Non Technical Summary
Normal development of gonads and sex characteristics is a complex, genetically and hormonally regulated process that forms the foundation for animal production and reproduction. Disorders of sex development are clinically heterogeneous conditions that impair reproduction by affecting sex determination, development of gonads, and/or the formation of internal and external male and female sex characteristics. Many cases and forms of disorders of sex development have been described in horses, of which most appear spontaneously, while a few are familial. Despite causing infertility, several forms of disorders of sex development in horses occur recurrently or in families, suggesting the involvement of inherently unstable regions in the horse genome. Disorders of sex development present an important problem both to the animals and their owners due to the negative impact on fertility, behavior, performance, health and well-being, and possible spreading in families. They can also be of considerable financial concern to the owners and breeders, particularly if they occur in elite pedigrees.
At the same time, current knowledge about the molecular causes of equine disorders of sex development is sparse, and no candidate or causative genes or risk factors are known for most of them. The situation is similar in other domestic species. Here we initiate systematic research on the genomic component of three recurrently observed equine disorders of sex development: (i) the first group comprises normal-looking and chromosomally normal (64,XX) mares who are sterile due to underdeveloped (infantile) ovaries and uterus. The clinical description of these mares is identical or very similar to that of mares with X-monosomy (63,X), i.e., a single X chromosome instead of the normal two; (ii) the second group comprises female-like horses who are genetically male with a normal 64,XY male karyotype and intact Y chromosome with a normal 'maleness' gene, SRY, and (iii) the third group comprises horses with a normal 64,XX female karyotype, but clinical characterization as intersex, hermaphrodite or of ambiguous sex. We will carry out a comprehensive study of the genomes of these horses by using three advanced and complementary whole genome analysis platforms combined with detailed hormonal profiling. The immediate goals are to identify a set of candidate DNA sequence variants for horse disorders of sex development, relate those with the corresponding hormonal profiles and clinical phenotypes, and design genetic tests for molecular diagnostics. The improved knowledge about the genomic regulation of sex development in the horse will likely have translational impact on other domestic animals.
Animal Health Component
25%
Research Effort Categories
Basic
75%
Applied
25%
Developmental
0%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
301                    3810                              1080                      100%
Knowledge Area
301 - Reproductive Performance of Animals;

Subject Of Investigation
3810 - Horses, ponies, and mules;

Field Of Science
1080 - Genetics;
Goals / Objectives
We will initiate systematic research on the genomics of cytogenetically normal equine disorders of sex development (DSDs) with a focus on three clinical phenotypes:
i) 64,XX females with X-monosomy-like gonadal dysplasia;
ii) 64,XX intersex horses, and
iii) SRY-positive 64,XY individuals with female-like or ambiguous sex phenotype.
We will refine clinical phenotypes by hormonal profiling and will use three different whole genome (WG) analysis platforms, viz., short-read WG sequencing, long-read WG sequencing, and optical genome mapping, for candidate variant discovery in individual cases.
Our long-term goal is to advance knowledge regarding molecular factors regulating equine biology, development, and reproduction with the important application to improve the methods for genomic predictions and clinical diagnostics. We hypothesize that these DSDs are caused by mutations affecting one or several sex determination and differentiation pathway genes.
The immediate goals of the proposed research are to identify a set of candidate sequence variants for horse DSDs, relate those with the corresponding hormonal profiles and clinical phenotypes, and identify variants for further genetic analysis to determine causation. These goals will be achieved through three objectives:
Objective #1: Discover candidate sequence variants for DSDs by whole genome multi-platform analysis and comparison to a large catalog of normal variation. Whole genomes of horses with DSDs will be studied by short- and long-read WGS and optical genome mapping. The WG data will be bioinformatically analyzed against our comprehensive equine variant database for the discovery of rare candidate variants for DSDs.
Objective #2: Generate endocrine profiles of equine DSDs for refined phenotyping and to complement sequence variant discovery. Endocrine profiles will be generated for horses with DSDs to characterize gonadal status and hormonal function. The data will refine clinical phenotypes of the cases but will also provide complementary information for genome analysis.
Objective #3: Candidate variant genotyping in large DSD cohorts. TaqMan and PCR assays will be designed for the sequence variants of interest from Objective #1 and genotyped in a large cohort of DSD horses for the discovery of additional carriers.
Project Methods
Samples and phenotypes. The project will use a collection of 169 clinically phenotyped and karyotyped horses with three DSD phenotypes:
64,XX females with gonadal dysplasia, n=77;
SRY-positive 64,XY female-like horses, n=33;
64,XX intersex horses, n=59.
Of these, 30 horses, 10 from each phenotype group, will be analyzed by Illumina short-read whole genome sequencing (WGS) and optical genome mapping (OGM). For each group, 10 cases with the most similar and detailed clinical phenotype description and availability of additional samples (serum, frozen blood, primary fibroblasts, hair, gonads) will be selected. Long-read PacBio WGS will be conducted on 6 cases, the two most characteristic cases per phenotype group. Animals selected for OGM and PacBio must have cryopreserved blood samples or primary fibroblasts for high molecular weight DNA extraction.
Whole genome sequence (WGS) data generation and analysis
We will generate WGS data on three complementary platforms to identify variants, genomic regions, and pathways that are consistent across the platforms and cases, in an effort to improve the currently limited knowledge about the molecular causes of DSDs in horses.
Short-read Illumina WGS and analysis: 30 horses, 10 per phenotype group. Genomic DNA (gDNA) samples will be checked for quality and quantity and used to prepare individually barcoded libraries with 450 base-pair (bp) fragment size. The libraries will be sequenced using 2x150 bp reads across 5 lanes (600-750 Gb per lane) on the NovaSeq 6000 S4 platform. The sequences will be aligned to our trio-binned assembly ECAnp4, concatenated with the horse Y chromosome reference eMSYv3.1, using our custom alignment and variant calling pipeline that leverages SpeedSeq 0.1.2. Single nucleotide and small indel variants will be called with GATK HaplotypeCaller as implemented in the GPU-enabled rapid-calling software Parabricks. 
Larger structural variants will be called using a series of callers, including Lumpy, Delly2, and Manta, and re-genotyped using Paragraph to build a union callset. The data will be analyzed for regions of homozygosity and examined for divergent linkage disequilibrium (LD) patterns. All of this will be compared to similarly obtained data from the genomes of ~750 non-DSD horses. Variants of interest are expected to be found in DSD horses but not in the general non-DSD population. Unique variants discovered in cases will be analyzed for functional consequences and a potential link to the DSD phenotype using a suite of prediction tools such as Provean, Ensembl's Variant Effect Predictor, and snpEff, together with published mutations for DSDs in humans and other species.
Long-read PacBio WGS and analysis: 6 horses, 2 per phenotype group. High molecular weight (HMW) gDNA will be isolated from blood or primary cell lines with the Nanobind CBB kit (Circulomics) and evaluated for fragment size >25 kb by pulsed-field gel electrophoresis. gDNA isolation, QC, PacBio library preparation, and CLR sequencing on the Sequel II platform (2 SMRT cells per horse) will be done at the PacBio facility of the University of Maryland. CLR data will be aligned to the horse reference with Minimap2 and variants called with Sniffles. Unique/rare and functionally relevant variants will be identified by comparison with the horse reference genome and the available equine variant database.
Optical genome mapping (OGM) and analysis: 30 horses, 10 per phenotype group. HMW gDNA will be isolated from frozen blood or fibroblast cell lines using the Bionano SP Blood & Cell Culture DNA Isolation Kit. Genomic HMW DNA will be labeled with the methyltransferase DLE-1 at the recognition motif CTTAAG using the Bionano DLS DNA Labeling Kit. This generates ~15 labels per 100 kb. Labeled DNA molecules will be applied to Bionano G1.2 flow cells, linearized in nanochannels, and scanned with a fluorescence microscope. 
The captured images will be converted to electronic representations of the DNA molecules. The data will be filtered to a minimum molecule length of 150 kb and a minimum of 9 labels per molecule. To identify structural variants, the filtered molecules will be de novo assembled, and the consensus maps of the molecules will be aligned to the horse references EquCab3 and ECAnp4 (which will be in silico cleaved with the same DLE-1 enzyme). Additionally, OGM maps of individual horses will be aligned directly to each other. Structural variants will be called by the Bionano software module "SV pipeline", as well as by manual inspection of the generated data using the browser-based interface of Bionano Access version 1.7.1.1 and Solve version 3.7.
Generation of endocrine profiles of 85 DSD horses:
i) n=45 mares with X-monosomy-like gonadal dysplasia,
ii) n=30 64,XX intersex, and
iii) n=10 SRY-positive 64,XY female-like or intersex horses.
The cohort includes the 30 horses subjected to the WG analysis described above.
Gonadal status and hormonal function will be characterized by analyzing 4 hormones: anti-Müllerian hormone (AMH), inhibin-B, testosterone, and progesterone. Collaborators at UC Davis have established normal reference ranges for these hormones in stallions (n=145) and mares (n=1100). Hormone analyses for AMH and inhibin-B will be conducted using commercial ELISA platforms. Testosterone will be measured by radioimmunoassay after ether extraction of serum or plasma using polyclonal antiserum and a tritiated tracer. Progesterone will be measured after ether extraction by an enzyme-linked immunoassay using polyclonal antisera. The results will be compared with known normal ranges of these hormones in stallions and mares. 
The obtained endocrine profiles will refine clinical phenotypes of the three groups of DSD horses but are also expected to provide additional information for the search of candidate genomic regions, thus complementing the WGS analysis.
Candidate variant genotyping in all 169 DSD horses
Screening tests will be designed using the WGS data and depending on the type of variant identified. TaqMan assays will be designed for ~20 SNVs and indels and conventional PCR tests for ~10 structural variants.
TaqMan assays for SNVs and indels. Custom TaqMan™ SNV genotyping assays will be designed for SNVs and indels according to the manufacturer's specification (Applied Biosystems). We will use a CFX-96 Real-Time PCR machine (Bio-Rad) and corresponding software for PCR amplifications, genotyping and allelic discrimination. Genotyping will be done in 8 µL reactions containing 0.208 µL of TaqMan™ assay, 30 ng template DNA, and 4.2 µL of ABI TaqMan Universal Master Mix, no UNG (Applied Biosystems).
Conventional PCR: Where applicable, conventional qualitative PCR assays will be designed with Primer3 to flank medium-size structural variants (deletions, inversions), so that PCR products of different sizes can discriminate between homozygotes and heterozygotes.
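The reaction volumes given for the TaqMan genotyping can be turned into a small bench calculation. The following is a minimal sketch, not the project's protocol: the 8 µL reaction, 0.208 µL assay, and 4.2 µL master mix come from the text above, while the 10% pipetting overage and the function names are assumptions made for illustration. The volume left over after assay and master mix is what remains for the 30 ng of template DNA plus water.

```python
# Sketch: scale the 8 µL TaqMan genotyping reaction to a batch of reactions.
# Assay and master-mix volumes are taken from the text; the overage fraction
# and all function names are illustrative assumptions.

REACTION_UL = 8.0      # total reaction volume (from the text)
ASSAY_UL = 0.208       # TaqMan assay per reaction (from the text)
MASTER_MIX_UL = 4.2    # Universal Master Mix per reaction (from the text)


def remaining_volume_ul():
    """Volume left in one reaction for template DNA plus water."""
    return round(REACTION_UL - ASSAY_UL - MASTER_MIX_UL, 3)


def batch_volumes(n_reactions, overage=0.10):
    """Total assay and master-mix volumes (µL) for n reactions.

    `overage` is an assumed extra fraction to cover pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        "assay_ul": round(ASSAY_UL * factor, 2),
        "master_mix_ul": round(MASTER_MIX_UL * factor, 2),
    }


if __name__ == "__main__":
    print(remaining_volume_ul())   # volume available for DNA + water
    print(batch_volumes(96))       # one full 96-well plate
```

For a 96-well plate with the assumed 10% overage, this gives the total assay and master-mix volumes to prepare before dispensing; the overage can be set to 0 to get the exact per-plate totals.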

Progress 05/01/24 to 04/30/25

Outputs
Target Audience: International equine and animal research community; undergraduate and graduate students; the Equine Industry; horse breed associations; veterinarians/theriogenologists; horse breeders and horse owners.
Changes/Problems: Using the single, containerized pipeline (WAGS) to standardize variant calling when aligning short-read data to a reference genome, and to call variants across more than 1100 individual horse genomes to identify novel candidate genes and variants, is critical for the progress of this project. However, running this pipeline by our collaborators at the University of Minnesota has taken longer than initially planned. Therefore, we anticipate requesting a no-cost extension of this project for 12 months.
What opportunities for training and professional development has the project provided? During this reporting period, the PI and co-PIs of the project have trained one postdoctoral fellow, one graduate student and five undergraduate students working on parts of the project. The findings have been presented at several international and national meetings, symposia and conferences.
How have the results been disseminated to communities of interest? Communities of interest for this research include but are not limited to veterinarians, horse breeders and owners. Through our Molecular Cytogenetics and Animal Genetics services, which provide clinical karyotyping and parentage and disease testing, respectively, we stay in touch via emails and phone calls with the communities of interest. The fact that research on the genomics of equine disorders of sex development is ongoing already encourages collaboration, through which new cases have been identified and additional samples procured. Ongoing research and the preliminary findings have also been disseminated through invited online video presentations at the British Equine Veterinary Association (BEVA) Advanced Reproduction Discussion Forum.
What do you plan to do during the next reporting period to accomplish the goals? During the next reporting period we will accomplish the following:
Short-read whole genome data analysis continues with a hypothesis-free genome-wide variant discovery to identify additional potential variants not captured by the candidate gene investigation. We will use an expanded equine genome variant database to identify genome-wide candidate variants in cases. We will use a single, containerized pipeline (WAGS) to standardize variant calling when aligning short-read data to a reference genome. This pipeline drastically reduces resource usage and run time and is amenable to very large datasets. Variant calling will be done across more than 1100 individual horse genomes, including the 91 cases and 200 in-house controls already enrolled in this project. This will potentially identify new genes involved in the observed phenotypes.
Long-read WGS data are being analyzed for the discovery of impactful structural variants through the ongoing horse pangenome project, which includes 84 haplotypes, of which 36 (18 long-read genomes) have been generated in the course of this project. Briefly, all horse genomes will be aligned to each other to create a pangenome graph using minigraph-cactus. Identified unique sequence variation and differences in genome organization between the haplotypes will then be analyzed for potential phenotypic impact, especially in cases where variation overlaps with coding sequence and regulatory regions. A previous graph version identified over 32,000 structural variants that meet these criteria. Short-read whole genome data will then be aligned to the pangenome graph for combined analysis using the variation graph (vg) toolkit to increase the likelihood of identifying novel variation and to improve understanding of genome organization.
Continue evaluation of the identified candidate variants/mutations for effects on protein function and structure using predictive modeling.
Present the findings at Plant and Animal Genome, PAG (annual, San Diego, CA); biennial International Havemeyer Foundation Horse Genome workshop; annual meetings of Texas Genetics Society and Texas Forum for Reproductive Sciences.

Impacts
What was accomplished under these goals?
WGS data generation. By now, short-read WGS data has been generated for 91 cases, of which 18 also have long-read WGS data. With this, sequence data generation is complete and exceeds initial plans.
WGS data analysis and candidate variant discovery. We are investigating chromosomally normal DSD cases of three clinical phenotypes: (i) females with X-monosomy-like gonadal dysgenesis (n=31); (ii) 64,XY SRY-positive female-like (n=19), and (iii) 64,XX intersex (n=41). All 91 case genomes have been aligned to the recently available T2T Thoroughbred horse reference genome assembly and genotyped with a cohort of 200 in-house horse genomes as controls. The genomic variants were evaluated for potential impact on 194 candidate genes involved in sex determination, sex differentiation, and ovarian dysgenesis. Variants were retained if they had a high predicted impact on a gene known to cause similar phenotypes in other mammals, were rare (< 2%) in the overall population, and were homozygous in the case. The findings are:
48 missense, nonsense, frameshift, or indel mutations were identified in 34 genes across 31/91 (34%) DSD cases.
Four genes (AR, FREM2, FRAS1, STAG3) had mutations in two or more cases.
10 cases had multiple mutations, either across different genes or within the same gene but at different positions.
Notably, the NR0B1 gene had the same mutation in two cases from different phenotype groups: intersex and X-monosomy-like.
Mutations in the Androgen Receptor (AR) gene were exclusively found in 64,XY SRY-positive female-like horses. Altogether, we identified 7 different AR mutations in 7/19 (37%) cases of this phenotype group. All mutations affected AR function, resulting in androgen insensitivity syndrome.
13 mutations were identified in genes associated with human ovarian dysgenesis in 7/31 (23%) mares with X-monosomy-like phenotype.
Identified variants are currently being evaluated for effects on protein function and structure using predictive modeling.
Endocrine profiles for anti-Müllerian hormone (AMH), inhibin B, and testosterone have been obtained for 64 horses: 22 intersex, eight 64,XY SRY-positive female-like, 30 X-monosomy-like, and four with cytogenetically confirmed X-monosomy (as controls). With this, the goals of Objective #2 are accomplished. The endocrine data are combined with variant discovery to better understand the effect of candidate genes and mutations.
Current findings suggest that:
Many genes are implicated in equine DSDs.
Most cases have mutations in two or more genes, suggesting an additive effect.
Different mutations in the same gene (AR) can affect protein function and phenotype in a similar way (androgen insensitivity).
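The candidate-gene filter described above (high predicted impact, gene in the candidate list, rare at < 2% in the overall population, homozygous in the case) can be sketched as a simple predicate. This is an illustrative sketch only, not the project's pipeline: the `Variant` record, its field names, and the toy records are hypothetical, and real input would come from an annotated VCF. The `percent` helper only reproduces the rounded proportions reported in the text.

```python
from dataclasses import dataclass

MAX_POPULATION_AF = 0.02  # "rare (< 2%)" threshold from the text


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record for one case."""
    case_id: str
    gene: str
    high_impact: bool      # predicted high impact (from an annotation tool)
    population_af: float   # allele frequency in the overall population
    homozygous: bool       # True if the case is homozygous for the variant


def passes_filter(variant, candidate_genes):
    """Apply the four criteria named in the text to a single variant."""
    return (
        variant.gene in candidate_genes
        and variant.high_impact
        and variant.population_af < MAX_POPULATION_AF
        and variant.homozygous
    )


def cases_with_candidates(variants, candidate_genes):
    """Map each case ID to the set of candidate genes with a passing variant."""
    hits = {}
    for v in variants:
        if passes_filter(v, candidate_genes):
            hits.setdefault(v.case_id, set()).add(v.gene)
    return hits


def percent(part, whole):
    """Proportion as a whole-number percentage."""
    return round(100 * part / whole)


if __name__ == "__main__":
    genes = {"AR", "NR0B1"}  # two of the 194 candidate genes named in the text
    toy = [  # made-up records, for illustration only
        Variant("case_A", "AR", True, 0.0, True),       # passes all criteria
        Variant("case_B", "AR", True, 0.05, True),      # too common
        Variant("case_C", "NR0B1", True, 0.01, False),  # heterozygous
        Variant("case_D", "GENE_X", True, 0.0, True),   # not a candidate gene
    ]
    print(cases_with_candidates(toy, genes))
    print(percent(31, 91), percent(7, 19), percent(7, 31))
```

Keeping the criteria in one predicate makes it easy to relax a single condition, for example the homozygosity requirement, and see how the set of cases with candidate variants changes.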

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2025 Citation: 4. Hailey Anderson, Sam Stroupe, Alan J. Conley, Casey Caruso, Rebecca Cotterman, Rytis Juras, Brian W. Davis, Terje Raudsepp. 2025. Equine Disorders of Sex Development Involve Mutations in Several Sex Development Key Genes. Plant and Animal Genome 32, January 10-15, 2025, San Diego, USA. Platform and Poster presentations.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2025 Citation: 5. Hailey Anderson, Sam Stroupe, Rytis Juras, Brian W. Davis, Terje Raudsepp. 2025. Mutations in Key Genes Involved in Equine Disorders of Sex Development: A Focus on the Androgen Receptor Gene. VMBS Trainee Research Symposium, February 6, 2025, College Station, USA. Poster and Flash Talk presentations.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2025 Citation: 6. Hailey Anderson, Sam Stroupe, Rytis Juras, Brian W. Davis, Terje Raudsepp. 2025. Mutations in the Androgen Receptor Gene and Other Sex Development Key Genes Are Associated with Equine Disorders of Sex Development. Texas Genetics Society, March 20-22, 2025, College Station, USA. Platform presentation.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2025 Citation: 7. Hailey Anderson, Sam Stroupe, Rytis Juras, Brian W. Davis, Terje Raudsepp. 2025. Mutations in the Androgen Receptor Gene and Other Key Genes Associated with Equine Disorders of Sex Development. Equine Science Society Symposium 2025, June 3-6, Fort Collins, Colorado, USA. Platform presentation.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2025 Citation: 8. Hailey Anderson, Sam Stroupe, Rytis Juras, Brian W. Davis, Terje Raudsepp. 2025. Candidate Gene Investigation for Equine Disorders of Sex Development. 40th International Society for Animal Genetics (ISAG) Conference, July 20-25, 2025, Daejeon, South Korea.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2025 Citation: 9. Sam Stroupe, Jonah N. Cullen, Sian A Durward-Akhurst, Matteo Paini, Massimo Delledonne, Jessica Petersen, Terje Raudsepp, Ted Kalbfleisch, Molly McCue, Brian W. Davis. 2025. Moving Towards Personalized Pangenomic Veterinary Medicine in Equids. 40th International Society for Animal Genetics (ISAG) Conference, July 20-25, 2025, Daejeon, South Korea.


Progress 05/01/23 to 04/30/24

Outputs
Target Audience: International equine and animal research community; undergraduate and graduate students; the Equine Industry; horse breed associations; veterinarians/theriogenologists; horse breeders and horse owners.
Changes/Problems: No problems to report. The project is progressing as planned.
What opportunities for training and professional development has the project provided? During this reporting period, the PI and co-PIs of the project have trained two undergraduate students and one graduate student working on parts of the project. The findings will be presented at the 14th International Havemeyer Foundation Horse Genome Workshop, May 12-15, 2024, Caen, France.
How have the results been disseminated to communities of interest? Communities of interest for this research include but are not limited to veterinarians, horse breeders and owners. Through our Molecular Cytogenetics and Animal Genetics services, which provide clinical karyotyping and parentage and disease testing, respectively, we stay in touch via emails and phone calls with the communities of interest. Even though we have not yet identified candidate genomic regions or variants, the fact that research on the genomics of equine disorders of sex development is ongoing already encourages collaboration, through which new cases will be identified and additional samples procured.
What do you plan to do during the next reporting period to accomplish the goals? During the next reporting period we will accomplish the following:
Short-read whole genome data analysis continues. We will use an expanded equine genome variant database to identify candidate variants in cases. The second step of sequence analysis uses a single, containerized pipeline (WAGS) developed by researchers in Minnesota (authored by Dr. Jonah Cullen) to standardize variant calling when aligning short-read data to a reference genome. This pipeline drastically reduces resource usage and run time and is amenable to very large datasets. In the summer of 2023 we began rolling out the pipeline at Texas A&M to adapt it for use in horses. Dr. Cullen has implemented more than 25 fixes to WAGS to facilitate its use on the Texas A&M High Performance Computing Cluster, a prerequisite for variant calling across the more than 1100 horses that will be included in this variant database. Currently, variant calling is proceeding with datasets from approximately 1,100 horses (including our ~200 horse genomes). The results should provide unprecedented resolution regarding the DNA sequence variation across more than 100 horse breeds.
Long-read whole genome sequence data analysis continues and will be combined with the analysis using the above-mentioned WAGS pipeline. Upon the completion of variant calling through the WAGS pipeline across 1100 horses of 100 breeds, we will explore the frequency of each tagging variant in the context of all sampled breeds.
Present the findings at Plant and Animal Genome, PAG (annual, San Diego, CA), and at annual meetings of the Texas Genetics Society and the Texas Forum for Reproductive Sciences, and invite collaborators from UC Davis (Prof. Alan Conley) to present the findings of endocrine profiling at a seminar of the Interdisciplinary Faculty of Reproductive Biology.

Impacts
What was accomplished under these goals?
RESEARCH SAMPLE COLLECTION: Through our Karyotyping service and communication with veterinarians, breeders, and owners, we procured samples from 11 additional cases. These included four 64,XX intersex horses, one 64,XY SRY-positive female-like horse, and six 64,XX female horses with X-monosomy-like gonadal dysplasia.
Objective #1: Genomic DNA has been isolated from 88 case horses: 31 64,XX females with X-monosomy-like phenotype, 41 64,XX intersex horses, and 16 64,XY SRY-positive female-like horses. Because short-read sequencing at Texas A&M core facilities became cheaper, Illumina short-read sequence data was generated for all 88 horses (initially, we proposed to sequence 30 horses, 10 per phenotype group). All raw sequence data passed QC and was initially aligned to the current horse reference genome EquCab3. In the meantime, through collaborations in the equine genomics community (Dr. Ted Kalbfleisch), we gained access to the most recent and most comprehensive, near-gapless telomere-to-telomere (T2T) horse reference genome (unpublished and not yet publicly available) and are currently aligning the 88 case genomes to the T2T genome together with 108 in-house generated non-case horse genomes of 21 horse breeds. We have a variant database (VCF file) for these 196 samples ready to be used for candidate variant discovery in cases during the next stage of the project. Also, we will expand the current equine variant database with over 600 whole genome short-read equine sequences available from NCBI. High Molecular Weight (HMW) genomic DNA has been isolated for 18 cases: six for each of the three phenotype groups. Long-read High-Fidelity PacBio sequences have been obtained for 16 cases and passed QC. We are waiting for the last two case long-read genomes. Processing of the long-read data is in progress; the data will be used for the discovery of CNVs and complex structural variants.
Objective #2: Endocrine profiles have been obtained from blood serum of 64 horses: 22 intersex horses, eight 64,XY SRY-positive female-like horses, thirty 64,XX females with X-monosomy-like phenotype, and four horses with cytogenetically confirmed X-monosomy (as controls). We measured levels of the Anti-Müllerian hormone (AMH), Inhibin B, and Testosterone. The most informative data was obtained for AMH and testosterone and we have identified several individuals in the three phenotype groups with highly abnormal endocrine profiles in relation to their genetic sex and gonadal phenotype. With this, the goals of Objective #2 are accomplished. The endocrine data will be combined with genomics analysis to pinpoint candidate genes and genomic regions.

Publications

  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2024 Citation: Brian W. Davis, Rytis Juras, Melanie Barbee, Austin Byrom, Terje Raudsepp. Exploring the genomic component of equine sex development and reproduction. 14th International Havemeyer Foundation Horse Genome Workshop, May 12-15, 2024, Caen, France (platform presentation).