Source: TEXAS A&M UNIVERSITY submitted to NRP
REFINING THE SEQUENCE AND ANNOTATION OF COMPLEX REGIONS IN THE HORSE SEX CHROMOSOMES TO ENHANCE KNOWLEDGE OF FUNCTIONALLY IMPORTANT GENES
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1018825
Grant No.
2019-67015-29322
Cumulative Award Amt.
$416,000.00
Proposal No.
2018-06521
Multistate No.
(N/A)
Project Start Date
May 1, 2019
Project End Date
Apr 30, 2023
Grant Year
2019
Program Code
[A1201]- Animal Health and Production and Animal Products: Animal Breeding, Genetics, and Genomics
Recipient Organization
TEXAS A&M UNIVERSITY
750 AGRONOMY RD STE 2701
COLLEGE STATION,TX 77843-0001
Performing Department
Vet Integrative Biosciences
Non Technical Summary
Advancing knowledge of equine health, development and reproduction through genomic prediction and clinical diagnoses is highly dependent upon an accurate reference genome. Our goal is to improve the quality and completeness of the reference sequence of equine sex chromosomes (~8% of the genome), with a focus on their unique structural and functional features that cannot be resolved by current whole genome sequencing approaches. These specifically include structurally complex repetitive gene arrays on both sex chromosomes. Our rationale for undertaking this research is that these complex genomic regions comprise substantial real estate on the sex chromosomes, and harbor biologically important genes and regulatory elements that influence traits of economic interest, including reproduction, neurobiology, growth, and immunity. A high-quality genome assembly is also a prerequisite for more complete and accurate functional annotation of the equine genome (FAANG initiative). We will produce highly accurate sex chromosome sequences by utilizing a suite of cost-effective and complementary approaches, including cDNA selection and long-read sequencing of clone tiling paths. The proposed activities build and expand upon our productive prior and recently completed research on mammalian sex chromosomes. The project directly addresses Animal Health and Production and Animal Products priority area #1201 Tools and Resources by generating tools and experimental protocols which can be applied to advance basic biology and improve animal health and production.
Animal Health Component
0%
Research Effort Categories
Basic
100%
Applied
0%
Developmental
0%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
304 | 3810 | 1080 | 100%
Knowledge Area
304 - Animal Genome;

Subject Of Investigation
3810 - Horses, ponies, and mules;

Field Of Science
1080 - Genetics;
Goals / Objectives
Here we propose to comprehensively improve the sequence assembly and annotation of horse sex chromosomes in structurally complex and functionally essential regions. Our long-term goal is to advance knowledge of the factors regulating equine biology, development and reproduction, with the important application of improving methods for genomic prediction and clinical diagnosis. This requires access to a high-quality, well-annotated sequence assembly of the horse genome that includes complex regions missing from the current version. Our immediate goal is to improve the quality and completeness of the reference sequence of equine sex chromosomes, with a particular focus on the structurally complex, testis-expressed amplicons in both sex chromosomes, where assembly and annotation cannot be resolved by conventional genomics approaches. The rationale for undertaking this research is that complex genomic regions, such as amplicons, carry biologically important functions and regulate traits of economic interest (e.g., reproduction, disease resistance). At the same time, a complete and high-quality genome assembly is the primary resource needed for development of improved methods for precision breeding and clinical diagnostics, and is a prerequisite for Functional Annotation of Animal Genomes (FAANG) - an ongoing international initiative in all domestic species.
We will achieve our goals through the following objectives:
1. Discover, sequence and annotate ampliconic regions in the horse X chromosome.
2. Refine the assembly of ampliconic regions in the horse Y chromosome.
3. Annotate sex chromosomal amplicons for genes and copy numbers.
The proposed activities build and expand upon our prior and recently completed research on horse and other mammalian sex chromosomes.
Project Methods
OBJECTIVE #1: Discover, sequence and annotate ampliconic regions in the horse X chromosome.
X chromosome flow sorting: We will obtain 20,000 horse X chromosomes by flow sorting fibroblast metaphase chromosome suspensions on a dual laser cell sorter (FAC-Star Plus, Becton Dickinson).
Direct cDNA selection: We will isolate total RNA from adult horse testis with the RNeasy Mini kit (Qiagen). We will use NEXTflex™ Poly(A) Beads and the NEXTflex™ Rapid Directional qRNA-Seq™ Kit (BIOO) to select mRNA, convert it into double-stranded cDNA, and ligate it with adaptors for PCR amplification and preparation of Illumina sequencing libraries. We will amplify flow-sorted X chromosome DNA with the REPLI-g (Qiagen) single cell whole genome amplification system. One microgram of amplified X chromosome DNA will be labeled with biotin using the bio-nick translation kit (Roche). Adaptor-ligated testis cDNA will be pre-hybridized with horse Cot-1 DNA to block repetitive elements and then hybridized for 40 hours with biotin-labeled X chromosome DNA. We will capture the hybrids of X chromosome DNA and testis cDNA with paramagnetic streptavidin-coated beads (Dynabeads MyOne; Life Technologies), elute the cDNA, and amplify it by PCR using NEXTflex adaptor primers. The final cDNA will be sequenced on the Illumina MiSeq platform (2 x 300 bp paired-end reads). The cDNA sequences will be assembled with the Trinity package, aligned to the EquCab2/EquCab3 reference genomes with minimap2, and analyzed by BLAST.
Mapping and construction of BAC tiling paths: We will design PCR primers for ampliconic genes/transcripts using Primer3 and isolate the corresponding BAC clones by PCR screening of superpools and plate pools of the CHORI-241 library. If any putative ampliconic sequences align with the X reference sequence, the corresponding BAC IDs will be retrieved from the NCBI Genome horse genome clone track. We will use the High Pure Plasmid Isolation kit (Roche Applied Science) for BAC DNA isolation.
We will construct BAC tiling paths over individual ampliconic regions by BAC end sequence (BES) analysis and chromosome walking. The BACs representing tiling paths of individual amplicons will be FISH mapped to metaphase spreads to determine their location in the X chromosome and confirm their ampliconic nature.
Sequencing and assembly of ampliconic sequences: We will sequence tiling path ampliconic BACs on the Illumina MiSeq and PacBio SMRT Sequel™ platforms to combine high-accuracy short reads with lower-fidelity long-read data. The latter are necessary to join shorter sequences into longer contigs and scaffolds. For each BAC, we will prepare paired-end barcoded libraries with 800-1000 bp inserts using the TruSeq DNA PCR-Free Sample Preparation Kit (Illumina). The libraries of ~50 BACs will be pooled and sequenced on the Illumina MiSeq platform. We will also construct a 10-20 kb PacBio sequencing library by pooling DNA from the ~50 BACs in equimolar quantities, and sequence it on the PacBio Sequel™ system. The initial sequence analysis from the two platforms will use proprietary software packages. Next, sequences from the two platforms will be assembled de novo individually, and we will select the parameters yielding the longest contigs/scaffolds. For Illumina assemblies we will use the A5-miseq software package. The sequences obtained from the PacBio Sequel™ platform will be processed and assembled using Canu, a fork of the Celera Assembler. A hybrid assembly from short and long reads will be generated with MaSuRCA.
Finally, we will attempt to incorporate ampliconic regions into the reference genome by aligning the assembled sequences with BWA against the X chromosome sequence in EquCab3.
OBJECTIVE #2: Refine the assembly of ampliconic regions in the horse Y chromosome.
BAC culturing, DNA isolation, QC on the Agilent TapeStation, size selection (>15 kb) on a Sage Science BluePippin, preparation of PacBio sequencing libraries by pooling DNA of 38 BACs, and PacBio sequencing will follow the same protocols as described in detail under Objective #1. In addition, we will sequence the 38 MSY BACs on the ONT platform. For each BAC, we will use 1 microgram of high molecular weight (>15 kb), high-quality DNA to prepare indexed sequencing libraries using components from the Genomic DNA Sequencing Kit (Oxford Nanopore Technologies) and the manufacturer's detailed protocol. The libraries will be loaded together with Fuel Mix (Oxford Nanopore Technologies) onto an ONT R9.4 Flow Cell and sequenced for 48 hours, with the flow cell reloaded with the same library after 24 hours. Based on our experience, we anticipate ~10-15 Gb of sequence data per run and plan to sequence the 38 MSY BACs in 3 ONT runs, providing over 2000X coverage per BAC. Sequence analysis of PacBio and ONT data will be done as described in Objective #1 (Canu). We will reconstruct the eMSY ampliconic regions de novo, as well as align the reads to our eMSY reference.
OBJECTIVE #3: Annotate sex chromosomal amplicons for genes and copy numbers.
Testis IsoSeq: We will generate long-read IsoSeq data for three male embryonic and three adult gonads. Adult testis will be taken from our RNAlater-preserved samples stored at -80°C. Total RNA will be extracted with the RNeasy Mini extraction kit (Qiagen), and DNA removed with RQ1 DNase (Promega M6101). One microgram of total RNA per sample will be reverse transcribed using the Clontech SMARTer cDNA synthesis kit and barcoded oligo dT (PacBio 16-mer barcodes) to generate barcoded full-length cDNA.
Products will be purified with AMPure PB beads, followed by an integrity check on the BioAnalyzer (Agilent). Equimolar ratios of the six cDNA libraries will be pooled. To avoid loading bias, which favors sequencing of shorter transcripts, size-fractionated libraries (<1 kb, 1-2 kb, 2-3 kb, 3-5 kb, and >5 kb) will be separated using the BluePippin (Sage Science). Following PCR cycle optimization, large-scale PCR will produce adequate cDNA for subsequent Iso-Seq SMRTBell library preparation. Four barcoded SMRTBell libraries (1-2 kb, 3-5 kb, 4-6 kb, and >5 kb) will be size-selected using the BluePippin (Sage Science) to remove small inserts. We will use a total of 1 microgram of input DNA per sample across 5 SMRT cells to generate 125,000-208,333 full-length transcripts per sample, for a total of 750,000-1,250,000 across all samples, per PacBio IsoSeq specifications. We will use the IsoCon algorithm for de novo transcriptome assembly. Annotation via transcript mapping to the whole genome with updated eMSY and X ampliconic regions will be done using minimap2. Additional annotation with Maker will utilize short-read data from SRA, including lincRNA from mature and immature testis, microRNA from testis and brain, and stranded RNAseq from brain and testis from 10 breeds.
Copy number quantitation by Droplet Digital PCR (ddPCR): We will determine absolute copy numbers of MSY ampliconic genes by ddPCR using the QX200™ (Bio-Rad) platform. We will study 3 diverse breeds - Thoroughbred, Standardbred, and Icelandic - with 30 individuals each. We will design primers for 10 MSY ampliconic genes to generate PCR products of 75-200 bp. Fluorescently labeled hydrolysis probes will be designed with the IDT PrimerQuest tool. The ddPCR will include droplet generation, amplification by PCR on the C1000 Touch (Bio-Rad) platform, and data analysis with the QuantaSoft software that is part of the ddPCR system.
The PCR reactions will contain cleaved gDNA, primers specific for the query and control genes, VIC- or FAM-conjugated hydrolysis probes for the query and control genes, and QX200™ ddPCR Supermix for Probes (Bio-Rad). The results will be presented as number of copies per µL of the final 1x ddPCR reaction. Significance of copy number differences between individuals and breeds will be analyzed by the parametric Student's t-test with a P-value cutoff at 5%. The P-values will be corrected and false discovery rates will be determined by Bonferroni and Benjamini-Hochberg multiple testing methods in R (Version 3.0.1; R Statistical Project).
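The multiple-testing step named at the end of the methods can be illustrated with a minimal Python sketch. It does not reproduce the project's R code: the function names and the raw P-values below are hypothetical, chosen only to show how the Bonferroni and Benjamini-Hochberg adjustments behave at the 5% cutoff.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (step-up FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values, one per test (illustrative only, not project data).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
alpha = 0.05  # the 5% cutoff stated in the methods

bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
for p, b, q in zip(raw, bonf, bh):
    print(f"raw={p:.3f} bonferroni={b:.3f} bh={q:.5f} "
          f"sig_bonf={b < alpha} sig_bh={q < alpha}")
```

With these made-up inputs, the first two tests remain significant under both corrections, while the third and fourth fall just above the cutoff after Benjamini-Hochberg adjustment and well above it after Bonferroni. In R, the same adjustments are available through `p.adjust` with `method = "bonferroni"` or `method = "BH"`.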

Progress 05/01/19 to 04/30/23

Outputs
Target Audience: Graduate and undergraduate students through classroom and laboratory: the project involved 2 graduate level students and 3 undergraduate students; the PI and co-PIs are teaching undergraduate and graduate classes in biomedical genetics, cytogenetics, and genomics. International research community through various forums: (i) presentations at national and international conferences - total 12 conference abstracts/presentations; (ii) peer-reviewed publications - total 8 publications and 1 book chapter. Horse owners, breeders and veterinarians through clinical cytogenetics and animal genetics services, layman presentations/publications and direct communication. Research on horse sex chromosomes addresses genetic causes of stallion and mare subfertility/infertility and various disorders of sex development (DSDs) in horses. The PI has given 8 invited talks and webinars, including one talk requested by a Theriogenology Board study group.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The project involved 2 PhD level graduate students, Caitlin Castaneda and Matthew Jevit. Both PhD students presented their results on multiple occasions at local, national, and international meetings and published in peer-reviewed international journals. The graduate students successfully defended their Dissertations and graduated in 2022. The project also involved 3 undergraduate students who shadowed the PhD students and learned wet lab skills and bioinformatics analysis of data. One of the undergraduate students, Oriana Garcia Ramos, supervised by the graduate student Matthew Jevit, completed her training with an Undergraduate Thesis.
How have the results been disseminated to communities of interest? The results have been disseminated to communities of interest through conference abstracts and presentations (12), publications in peer-reviewed international journals (8), book chapters (1), invited talks and webinars (8), PhD Theses (2), and Undergraduate Thesis (1).
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals?
Objective 1: Improved assembly and annotation of the horse X chromosome.
Improved horse X chromosome assembly. We substantially improved the assembly and annotation of the horse X chromosome. The final horse X chromosome, ECAnp4-X, was 143,200,399 bp, exceeding the current reference EquCab3-X by almost 15 megabase-pairs (Mb). The main improvements were resolving complex sequences at the pseudoautosomal boundary (PAB) and incorporating multiple copies of the DXZ4 macrosatellite and the Equine Testis Specific Transcript Y7 (ETSTY7) ampliconic array in the long arm (Xq).
Demarcation of the horse pseudoautosomal region (PAR). Correcting an assembly error in the current reference EquCab3-X allowed us to precisely map the horse PAB and determine the size of the horse PAR as 1.8 Mb. We show that the horse PAB spans a protein coding gene, XKR3Y, so that the first two exons of the gene are in the Y and the third exon is in the PAR. The gene has 2 promoters, allowing independent transcription of exon 3 in the PAR. XKR3Y is the first example of a mammalian PAB-spanning gene that is intact in the Y and truncated in the X, and is thus differently expressed between the sexes. The gene is a strong candidate for roles in spermatogenesis and hybrid male sterility.
Improved assembly of DXZ4 macrosatellite. DXZ4 plays a role in X chromosome inactivation and hybrid sterility. The sequence is collapsed and not annotated in the current reference EquCab3-X. In the improved X assembly, we identified 9 full copies and 2 partial copies of DXZ4 spanning ~71 kb. The largest and most abundant DXZ4 monomer was ~8 kb. The CTCF binding site, a characteristic functional feature of the DXZ4 repeat, is present three times throughout the DXZ4 sequence. The findings establish a foundation for the study of DXZ4 functions and of its structural and copy number variation.
Improved assembly of Equine Testis Specific Transcript Y7 (ETSTY7).
A total of 238 full and partial ETSTY7 copies were incorporated in the new horse X assembly ECAnp4-X. A novel finding was the assignment of a smaller number of more diverged ETSTY7 copies to three horse autosomes - chr2, chr26, and chr31. The sequence originates from an intestinal parasite, Parascaris, by horizontal transfer and has been massively amplified in both horse sex chromosomes, suggesting a role in sex-linked meiotic drive. Significant by-products were improved assemblies of all horse autosomes and a substantially improved assembly of the donkey autosomal genome. In summary, we successfully accomplished and significantly exceeded the goals of Objective 1.
Objective 2: Refined assembly of the ampliconic region of the male specific region of the horse Y chromosome (MSY).
We re-sequenced 49 MSY ampliconic BAC clones and 3 flanking single-copy BACs (total 52 clones) using long-read Oxford Nanopore Technologies (ONT) sequencing. Read lengths ranged from 10 kb to over 100 kb. The new assembly of the MSY ampliconic region was 1.53 Mb, thus considerably smaller than the ~4 Mb assembly of this region in the published Y chromosome reference eMSYv3. Gene copy number analysis suggested that the old assembly in eMSYv3 is over-assembled, and the new 1.53 Mb assembly is under-assembled. We concluded that more effort is needed to produce a highly accurate assembly of horse MSY amplicons. This requires the use of combined strategies (PacBio HiFi, ONT, Bionano) that have been successfully employed to produce T2T (telomere-to-telomere) assemblies for human chromosomes. This, however, remained beyond the scope and funding of this project.
Filling gaps in the horse MSY assembly. Alignment of publicly available whole genome sequence (WGS) data of male horses with eMSYv3 identified 88 novel MSY contigs not present in eMSYv3 and resulted in identification of 15 new Y BAC clones. BAC end sequence data indicated that all 15 novel Y BACs localize in or around Gap1 and none to Gaps 2 and 3.
Long-read ONT sequencing of 10 of the 15 novel MSY BACs allowed us to close the largest gap, Gap1 (1 Mb), in the horse MSY single-copy region. Clone overlaps were validated by interphase and fiber FISH. Sequencing and FISH analysis of this newly added MSY region showed that the region contains transposed sequences from autosomes chr16 and chr18, illustrating the complexity of the horse MSY sequence. In summary, the goals of Objective 2 were partially accomplished owing to the complexity of MSY ampliconic and single-copy sequences.
Objective 3: Copy number analysis of MSY multicopy/ampliconic genes by droplet digital PCR (ddPCR).
We designed and optimized ddPCR assays for 7 MSY multicopy genes: four amplified gametologs - TSPY, RBMY, HSFY, and UBA1Y - and three novel Y-born testis-specific transcripts - ETSTY1, ETSTY2, and ETSTY5. A ddPCR assay was also successfully designed for the single-copy gene SRY. Copy numbers (CNs) of the 7 multicopy genes and SRY were determined in a multi-breed cohort of 209 normal male horses and compared with the CNs in eMSYv3. Five genes had nearly 50% fewer copies by ddPCR than in eMSYv3, and we conclude that the eMSYv3 multicopy region is partially over-assembled. MSY gene CN analysis in the cohort of 209 normal male horses of 22 breeds showed statistically significant CN differences between breeds for ETSTY1, ETSTY2, RBMY, and TSPY and the single-copy SRY. The most CN-variable gene was TSPY. SRY was a single-copy gene in most breeds and individuals but had 2 or 3 copies in 21 individuals from 4 indigenous breeds. Comparison of MSY CN data with MSY single nucleotide variation-based haplotypes showed no correlation between the two forms of variation.
We investigated MSY gene CNs in 3 groups of abnormal male horses: (i) 24 American Quarter Horses with cryptorchidism (CO); (ii) 29 individuals from 7 breeds with various forms of XY disorders of sex development (DSD), including 12 XY SRY-negative females, and (iii) 14 male horses of 6 breeds with variable subfertility/infertility phenotypes. We observed significant CN variation of TSPY and ETSTY2 between cryptorchid and normal males, but no significant CN variation in subfertile/infertile males or horses with XY DSD, except XY SRY-negative females that had lost SRY and one copy of RBMY. Comparison of MSY gene CNs with MSY haplotypes in equine patrilines showed that while haplotypes are conserved between male generations, gene CNs are not. We also observed gene CN variation in genetically identical males produced by somatic cell nuclear transfer. In summary, we have successfully completed copy number analysis of those MSY multicopy genes for which ddPCR assay design was bioinformatically feasible. The study of the remaining genes depends on the improvement of the MSY ampliconic assembly. Improved annotation of sex chromosome genes by testis Iso-Seq. We generated high quality RNA Iso-Seq data for horse testis from four developmental stages: 9 months gestation, 10 months gestation, 2 years adult, and 3 years adult. Testis Iso-Seq data was successfully used to annotate, characterize isoforms, and determine direction of transcription of the XKR3Y gene that spans the PAB in the Y chromosome.
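The ddPCR copy numbers reported under Objective 3 rest on Poisson statistics of droplet counts. The sketch below is a generic illustration in Python, not the QuantaSoft implementation used in the project: the 0.85 nL droplet volume, the two-copy autosomal reference gene, the function names, and the droplet counts are all assumptions made for the example.

```python
import math

# Assumed nominal droplet volume of 0.85 nL, expressed in microliters.
# This is an illustrative value; check the instrument documentation.
DROPLET_VOLUME_UL = 0.00085


def copies_per_ul(positive, total, droplet_volume_ul=DROPLET_VOLUME_UL):
    """Poisson-corrected target concentration in copies per uL of reaction."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    # Mean copies per droplet, estimated from the fraction of negative droplets.
    mean_copies = -math.log(1.0 - positive / total)
    return mean_copies / droplet_volume_ul


def copy_number(target_conc, reference_conc, reference_copies=2):
    """Target copies per genome, scaled by the reference gene's known copy number."""
    return reference_copies * target_conc / reference_conc


# Hypothetical droplet counts for one sample (not project data).
target = copies_per_ul(positive=9000, total=15000)     # query MSY gene
reference = copies_per_ul(positive=3000, total=15000)  # assumed 2-copy autosomal control
print(f"target: {target:.1f} copies/uL, reference: {reference:.1f} copies/uL")
print(f"estimated target copy number: {copy_number(target, reference):.2f}")
```

Because the droplet volume cancels in the ratio, the copy number estimate depends only on the droplet counts and the assumed copy number of the reference gene; the volume matters only for the absolute copies-per-µL value.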

Publications

  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Castaneda C, Radović L, Felkel S, Juras R, Davis BW, Cothran EG, Wallner B, Raudsepp T. Copy number variation of horse Y chromosome genes in normal equine populations and in horses with abnormal sex development and subfertility: relationship of copy number variations with Y haplogroups. G3 (Bethesda). 2022 Dec 1;12(12): jkac278. doi: 10.1093/g3journal/jkac278. PMID:36227030.
  • Type: Journal Articles Status: Submitted Year Published: 2023 Citation: Matthew J. Jevit, Caitlin Castaneda, Nandina Paria, Pranab J. Das, Don Miller, Doug Antczak, Theodore S. Kalbfleisch, Brian W. Davis, and Terje Raudsepp. 2023. Trio-binning of a hinny refines the comparative organization of the horse and donkey X chromosomes and reveals novel species-specific features. Submitted to Scientific Reports, June 2023.
  • Type: Theses/Dissertations Status: Published Year Published: 2022 Citation: Castaneda, C. 2022. Dissecting Genomic Factors of Stallion Fertility. Ph.D. Thesis. Biomedical Sciences, Texas A&M University, College Station, pp. 230. Defence April 18, 2022.
  • Type: Theses/Dissertations Status: Published Year Published: 2022 Citation: Jevit, M.J. 2022. Dissecting the Most Complex Regions of the Mammalian Genome; The Sex Chromosomes. Ph.D. Thesis. Genetics, Texas A&M University, College Station, pp. 210. Defence August 22, 2022.
  • Type: Other Status: Published Year Published: 2022 Citation: Terje Raudsepp. Invited webinar for International Symposium on Equine Reproduction (ISER) Global Education initiative HOW TO. Cytogenetic Testing in Stallion. July 10, 2022; 36 minutes.
  • Type: Other Status: Published Year Published: 2022 Citation: Terje Raudsepp. Invited webinar for International Symposium on Equine Reproduction (ISER) Global Education initiative HOW TO. Cytogenetic Testing in Mare. July 10, 2022; 35 minutes.
  • Type: Other Status: Published Year Published: 2022 Citation: Terje Raudsepp. Invited webinar for International Symposium on Equine Reproduction (ISER) Global Education initiative HOW TO. Genomics of Equine Reproduction. July 10, 2022; 25 minutes.
  • Type: Theses/Dissertations Status: Published Year Published: 2021 Citation: Oriana Garcia Ramos. 2021. Isolation of High Molecular Weight DNA from Stallion Sperm. GENE491W Undergraduate Thesis. Texas A&M University, College Station, pp 18.


Progress 05/01/21 to 04/30/22

Outputs
Target Audience: Graduate and undergraduate students through classroom and laboratory: the project involves 2 graduate level students and 3 undergraduate students; the PI and co-PIs are teaching undergraduate and graduate classes in biomedical genetics, cytogenetics and genomics. International research community through various forums: (i) presentations at national and international conferences - total 12 conference abstracts/presentations, 3 during this reporting period; (ii) peer-reviewed publications - total 5 publications and 1 book chapter, 2 during this reporting period. Horse owners, breeders and veterinarians through clinical cytogenetics and animal genetics services, layman presentations/publications and direct communication. Research on horse sex chromosomes addresses genetic causes of stallion and mare subfertility/infertility and various disorders of sex development (DSDs) in horses. 3 invited talks during this reporting period, including one talk requested by a Theriogenology Board study group.
Changes/Problems: This 3-year project (original end date 04/30/2022) has overall progressed as planned with regard to the scientific content of deliverables, and we have accomplished most, though not all, objectives. Like many other researchers, we experienced delays caused by the COVID-19 pandemic in almost all technical aspects of the project in 2020-2021. These included delays in delivery by companies (e.g., copy number assays) and service core labs (next generation sequencing, droplet digital PCR services), delays in sample collection, and delays in the acquisition of ordered lab supplies (e.g., pipet tips, cell culture flasks and media). The delays were most pronounced during the university closure in 2020, though backorders for certain categories of supplies continue to the present, mainly because of increased demand for cell culture supplies.
In addition, during the closure, the two graduate students assigned to this project, Caitlin Castaneda and Matt Jevit, had to stay away from the lab, and this slowed the progress of wet lab experiments. Delays in lab work continued because, occasionally, the students or PIs tested positive for COVID-19 and had to quarantine. Delayed experiments also delayed the acquisition of results and data analysis. Therefore, we requested a 12-month no-cost extension for the project, which was approved; the new end date of the project is 04/30/2023.
What opportunities for training and professional development has the project provided? The project involves 2 PhD level graduate students and 3 undergraduate students. Both PhD students have presented the results on local and international platforms and in peer-reviewed publications. Products generated during this project form an essential part of the PhD Dissertations of both students. The two graduate students will graduate in 2022 or, at the latest, early 2023.
How have the results been disseminated to communities of interest? The results have been disseminated to communities of interest through conference abstracts and presentations (12), publications in peer-reviewed international journals (5), book chapters (1), invited talks (5), and PhD Theses (2).
What do you plan to do during the next reporting period to accomplish the goals? Due to COVID-19-related delays in multiple technical aspects of the project, we requested and were granted a 12-month no-cost extension until 04/30/2023. The extra time will be used for data analysis (testis Iso-Seq data for annotations and isoform discovery) and to finalize the two manuscripts in preparation.

Impacts
What was accomplished under these goals?
Objective 1: Improvement of the horse X chromosome assembly and annotation is completed.
Approach. The approach that gave the best results for the set goals was trio-binning, for which we generated long-read (PacBio Sequel II) whole genome sequence of a female hinny - an F1 interspecific hybrid of a male horse and a female donkey. Sequences were assembled with the trio-binning function of Canu. Horse sequences were scaffolded using equine Hi-C contact map and Bionano optical map data (generated by us for 2 Thoroughbreds during this project). Final polishing of the chromosome-level assembly was done using short-read data (Illumina NovaSeq, 2x150 bp reads; 30x genome coverage) from the parental (horse and donkey) genomes. The final polished assembly of the horse was designated as ECAnp4.
Improvement of horse X chromosome assembly. Trio-binning substantially improved the assembly and annotation of the horse X chromosome. The final polished assembly of the horse X chromosome, ECAnp4-X, comprised 5 scaffolds and was 143,200,399 bp in size, which is an over 15 megabase-pair (Mb) improvement over the current horse reference EquCab3-X. The main improvements of the horse X chromosome involved the pseudoautosomal region (PAR) and the ampliconic sequences DXZ4 and ETSTY7.
Improvement of PAR. We detected that the PAR was mis-assembled in the current EquCab3-X. Thanks to the correction in the new assembly, we determined the horse PAR size as 1.8 Mb. We show that the horse pseudoautosomal boundary (PAB) spans a protein coding gene, XKR3Y, so that the first two exons of the gene are in the Y and the third exon is in the PAR (XKR3XY). The gene has 2 promoters, allowing exon 3 transcription from the X chromosome. XKR3Y/XKR3XY is the first example of a gene that spans the PAB in a mammalian Y chromosome and is truncated in the X chromosome.
Improvement of the assembly of X-linked ampliconic sequences.
Macrosatellite DXZ4 is of interest because of its likely involvement in X chromosome inactivation and hybrid sterility. In the improved X assembly, we identified 9 full copies and 2 partial copies of DXZ4 in Xq between the genes PLS3 and AGTR2. The DXZ4 sequence is collapsed in the current reference EquCab3-X. Equine Testis Specific Transcript Y7, ETSTY7. In the trio-binning assembly, we identified three X chromosome scaffolds containing a total of 238 full and partial ETSTY7 copies. In addition, we identified 6 ETSTY7 copies in chr2, 26 copies in chr26 and 14 copies in chr31. These all represent novel findings and additions to the horse X chromosome and autosomes.
Significant side-products: an improved horse genome assembly and a substantially improved donkey genome assembly, including the X chromosome. In summary, we successfully accomplished and significantly exceeded the goals of Objective 1.
Objective 2
A. Refinement of the assembly of the ampliconic region of the horse Y chromosome is, for now, completed and summarized as follows: We used long-read Flongle Oxford Nanopore Technologies (ONT) sequencing and high molecular weight DNA (from 10 kb to over 100 kb) to re-sequence 49 MSY ampliconic BAC clones and 3 flanking single-copy BACs (total 52 BACs). The new assembly of the MSY ampliconic region is 1.53 Mb, thus considerably smaller than the ~4 Mb assembly of the ampliconic region in eMSYv3. Gene copy number analysis suggests that the old assembly in eMSYv3 is over-assembled and the new 1.53 Mb assembly is under-assembled. We conclude that more effort is needed to produce a highly accurate assembly of horse MSY amplicons. This requires the use of combined strategies (PacBio HiFi, ONT, Bionano) that have been successfully employed to produce T2T (telomere-to-telomere) assemblies for human chromosomes. This, however, remained beyond the scope and funding of this project.
B. Filling gaps in the horse MSY assembly is, for now, completed. Identification of new Y BAC clones.
Alignment of publicly available whole genome sequence (WGS) data of male horses with eMSYv3 identified 88 novel MSY contigs not present in eMSYv3 and resulted in identification of 15 new Y BAC clones. Y Gaps and new BACs. BAC end sequence data indicated that all 15 novel Y BACs localize in or around Gap1 and none to Gaps 2 and 3. Closing Gap1. We sequenced 10 of the 15 novel BACs on ONT Flongle and successfully closed the largest gap, Gap1 (1 Mb), in the horse MSY single-copy region. Clone overlaps were validated by interphase and fiber FISH. Sequencing and FISH analysis of this newly added MSY region showed that the region contains transposed sequences from autosomes chr16 and chr18, illustrating the complexity of the horse MSY sequence. In summary, the goals of Objective 2 were accomplished only partially owing to the complexity of MSY ampliconic and single-copy sequences.
Objective 3
A. Copy number analysis of MSY multicopy/ampliconic genes is completed and summarized as follows: Droplet digital PCR (ddPCR) assays for MSY multicopy genes. We designed and optimized ddPCR assays for 7 MSY multicopy genes: four amplified gametologs - TSPY, RBMY, HSFY, and UBA1Y - and three novel Y-born testis-specific transcripts - ETSTY1, ETSTY2, and ETSTY5. A ddPCR assay was also successfully designed for the single-copy gene SRY. Validation of the eMSYv3 assembly by CN analysis. Copy numbers (CNs) of the 7 multicopy genes and SRY were determined in a multi-breed cohort of 209 normal male horses and compared with the CNs in eMSYv3. Five genes had nearly 50% fewer copies by ddPCR than in eMSYv3, and we conclude that the eMSYv3 multicopy region is partially over-assembled. MSY CN analysis across horse breeds. We analyzed MSY CNs in a cohort of 209 normal male horses of 22 breeds. Statistically significant CN differences between breeds were found for ETSTY1, ETSTY2, RBMY, and TSPY and the single-copy SRY. The most CN-variable gene was TSPY.
SRY was a single-copy gene in most breeds and individuals but had 2 or 3 copies in 21 individuals from 4 indigenous breeds. SRY and RBMY CNs were interdependent. MSY CN variation vs. single nucleotide variation. Comparison of MSY CN data with MSY single nucleotide variation-based haplotypes showed no correlation between the two forms of variation. MSY CN analysis in abnormal males. We investigated MSY gene CNs in 3 groups of abnormal male horses: (i) 24 American Quarter Horses with cryptorchidism (CO); (ii) 29 individuals from 7 breeds with various forms of XY disorders of sex development (DSD), including 12 XY SRY-negative females, and (iii) 14 male horses of 6 breeds with variable subfertility/infertility phenotypes. We observed significant CN variation of TSPY and ETSTY2 between cryptorchid and normal males, but no significant CN variation in subfertile/infertile males or horses with XY DSD, except XY SRY-negative females that had lost SRY and one copy of RBMY. MSY CN analysis in related males. We compared MSY gene CNs and MSY haplotypes in equine patrilines and observed that while haplotypes are conserved, gene CNs vary between male generations. We also observed gene CN variation in genetically identical males produced by somatic cell nuclear transfer. In summary, we have successfully completed copy number analysis of those MSY multicopy genes for which ddPCR assay design was bioinformatically feasible. The study of the remaining genes depends on the improvement of the MSY ampliconic assembly. B. Improving annotation of sex chromosome ampliconic genes by testis Iso-Seq (ongoing) Testis RNA Iso-Seq. We have generated and quality checked RNA Iso-Seq data on the PacBio Sequel II platform for 4 testis samples. Gene annotation. Testis Iso-Seq data were successfully used to characterize isoforms and direction of transcription of the XKR3Y/XKR3XY gene that spans the horse pseudoautosomal boundary. To do. 
During the no-cost extension of this project, testis Iso-Seq data will be used to annotate newly assembled X and Y ampliconic sequences and horse MSY single-copy genes.
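The copy number (CN) estimates discussed under Objective 3 come from ddPCR, where the fraction of positive droplets is converted to a template concentration by Poisson correction and the target is then scaled against a reference of known copy number. The following is a minimal sketch of that general calculation, not the project's actual analysis pipeline. The function names, the droplet counts in the usage note, and the choice of a reference locus with 2 copies per genome are assumptions made for illustration; the report does not state which reference assay was used.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean number of template copies per droplet.

    With p = positive / total, the mean occupancy is -ln(1 - p).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copy_number(target_pos: int, target_total: int,
                ref_pos: int, ref_total: int,
                ref_copies: float = 2.0) -> float:
    """Estimate target copies per genome relative to a reference locus.

    ref_copies is the assumed copy number of the reference locus per
    genome (2 for a single-copy autosomal gene in a diploid animal;
    this is an illustrative assumption, not taken from the report).
    """
    target = copies_per_droplet(target_pos, target_total)
    ref = copies_per_droplet(ref_pos, ref_total)
    if ref == 0.0:
        raise ValueError("reference assay has no positive droplets")
    return ref_copies * target / ref
```

For example, `copy_number(10000, 20000, 15000, 20000)` compares a target with 50% positive droplets against a reference with 75% positive droplets; because of the Poisson correction the estimate is 1 copy per genome rather than the naive 2 x 0.50 / 0.75.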

Publications

  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Caitlin Castaneda, Agustin J. Ruiz, Ahmed Tibary, Terje Raudsepp. 2021. Molecular cytogenetic and Y copy number analysis of a reciprocal ECAY-ECA13 translocation in a stallion with complete meiotic arrest. Genes (Basel, IF 4.096), 12, 1892, PMID: 34946841, PMCID: PMC8701272, DOI: 10.3390/genes12121892 (Epub November 26, 2021).
  • Type: Conference Papers and Presentations Status: Published Year Published: 2019 Citation: Caitlin Castaneda, Andrew Hillhouse, Sabine Felkel, Barbara Wallner, Terje Raudsepp. 2019. Equine Y chromosome research post sequencing. Plant & Animal Genome XXVII, January 12-17, San Diego, CA, USA. Workshop presentation and poster.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: C Castaneda, A Hillhouse, B. W. Davis, R Juras, A. Ruiz, A Tibary, T Raudsepp. 2020. Insights to the Y chromosome components of stallion fertility. 26th Annual Meeting of Texas Forum for Reproductive Sciences (TFRS), April 16-17, 2020, College Station.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Caitlin Castaneda, Brian W. Davis, Andrew Hillhouse and Terje Raudsepp. 2022. Ongoing Improvements to the Horse Y Chromosome. Workshop presentation and Poster PO0497. XXIX PAG, January 8-12, 2022, San Diego, CA, Online.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Matthew Jevit, Brian W. Davis, Donald Miller, Douglas Antczak, Ted Kalbfleisch and Terje Raudsepp. 2022. Trio-Binning of Horse-Donkey F1 Hybrid Improves Horse and Donkey Reference Genomes. Workshop presentation and Poster PE0498. XXIX PAG, January 8-12, 2022, San Diego, CA, Online.
  • Type: Journal Articles Status: Submitted Year Published: 2022 Citation: Caitlin Castaneda, Lara Radovic, Sabine Felkel, Rytis Juras, Barbara Wallner, E. Gus Cothran, Terje Raudsepp. Copy number variation (CNV) of horse Y chromosome genes in normal equine populations and in horses with abnormal sex development and subfertility; Relationship of CNVs with Y haplogroups. Submitted to G3.
  • Type: Other Status: Other Year Published: 2019 Citation: Terje Raudsepp. Texas A&M, College of Veterinary Medicine, VIBS Bioscience and Genomics Seminar Series. Invited talk The X Chromosome: Old Tricks, New Insights. Invited talk 2019 September 18.
  • Type: Other Status: Other Year Published: 2020 Citation: Terje Raudsepp. Agricultural University of Peru La Molina. Invited talk Genomics of equine disorders of sex development and reproduction. Invited talk 2020 January 23
  • Type: Other Status: Other Year Published: 2021 Citation: Terje Raudsepp. Invited video presentation From Karyotyping to Whole Genome Sequencing; CRU (Centre for Reproductive Biology in Uppsala) websymposium Genetics of Reproduction (https://www.slu.se/en/ew-calendar/2021/9/cru-websymposium-genetics-of-reproduction/); Invited talk 2021 September 7, Uppsala, Sweden.
  • Type: Other Status: Other Year Published: 2022 Citation: Terje Raudsepp. Invited presentation Cytogenetics and Reproductive Anomalies to Theriogenology Board study group. Webinar 2022 January 20.
  • Type: Other Status: Other Year Published: 2022 Citation: Terje Raudsepp. Invited presentation Refining the sequence and annotation of complex regions in the horse sex chromosomes to enhance knowledge of functionally important genes at USDA-AFRI Annual Project Director Meeting (Zoom virtual meeting) 2022, March 9.


Progress 05/01/20 to 04/30/21

Outputs
Target Audience: Graduate and undergraduate students through classroom and laboratory: the project involves 2 graduate level students and one undergraduate student; the PI and co-PIs are teaching undergraduate and graduate classes in biomedical genetics, cytogenetics and genomics. International research community through various forums: during this reporting period presentations at national and international conferences have been limited due to the COVID-19 situation. Nevertheless, one graduate student abstract has been submitted and a presentation is planned for The Equine Science Society in June 2021. International research community through peer-reviewed publications: 2 publications and one book chapter published during this reporting period (in addition to 2 publications during the previous reporting period). Horse owners, breeders and veterinarians through clinical cytogenetics services, layman presentations/publications and direct communication. Research on horse sex chromosomes addresses genetic causes of stallion and mare subfertility/infertility and various disorders of sex development (DSDs) in horses. Changes/Problems: Nothing Reported. What opportunities for training and professional development has the project provided? The project involves 2 PhD level graduate students and 1 undergraduate student. Both PhD students have presented the results on local and international platforms and in peer-reviewed publications. Unfortunately, due to the COVID-19 situation, student presentations on all platforms have been reduced during this reporting period. How have the results been disseminated to communities of interest? The results have been disseminated to communities of interest through conference abstracts and presentations (1), publications in peer-reviewed international journals (2), and book chapters (1). 
What do you plan to do during the next reporting period to accomplish the goals? Objective 1: Further analysis of the Bionano optical mapping data for scaffolded X assembly. Continue systematic bioinformatic comparison of the improved X assembly with EquCab3 X assembly to reveal assembly errors and missing regions. Objective 2: Complete sequence assembly of GAP1 and continue search for BAC clones located in GAPs 2 and 3. Assemble the re-sequenced ampliconic region of MSY. Objective 3: Complete MSY copy number analysis in the cohort of 84 reproductively and/or developmentally abnormal male horses. Annotate the improved X and MSY assemblies using testis Iso-Seq data.

Impacts
What was accomplished under these goals? Objective 1 A. X chromosome transcript capture from testis and sequencing is completed. B. X chromosome transcriptome assembly and analysis (ongoing) A total of 189,723 transcripts were assembled and annotated, including identification of functional domains and protein family information. We identified 646 transcripts that were not in the reference genome and analyzed those by BLAST. X-captured raw reads and raw reads from two additional RNA-seq libraries were aligned to EquCab3 with HISAT2. RNA-seq library reads were quantified per chromosome using featureCounts. The number of reads for each chromosome was determined, as well as the average number of reads for all autosomes for each library. The ratio of X-mapped reads to the average number of autosomal mapped reads was 1.11 for the X-captured library, compared with 0.46 and 0.41, respectively, for the 2 other libraries. This indicates the successful capture of X-specific reads. C. Generation of long-read sequence data for the X chromosome (ongoing) To improve the horse X chromosome assembly, we utilized trio-binning, which uses long read sequences from F1 interspecific hybrids and short reads from parent species. Trio-binning resulted in a 2,527,288,541 bp assembly in 1,757 contigs. The minimum contig length was 1,115 bp and the maximum contig length was 93,345,358 bp. The N50 was 41,516,585 bp and the L50 was 23 contigs. Contigs were aligned to EquCab3 with HISAT2. A total of 67 contigs were determined to have homology with the X chromosome. Of these, 26 contigs were confirmed to be from the X chromosome and span the complete X chromosome sequence from EquCab3. Chromosomal rearrangements were identified by analysis of dot-plots. Presently, 118 structural rearrangements have been identified in 25 of the 26 contigs analyzed. Most indicate errors in the EquCab3 X chromosome assembly. 
Five contigs show widespread duplications and may be indicative of collapsed ampliconic sequences in EquCab3. A spurious duplication and inversion at the pseudoautosomal boundary in EquCab3 has been resolved and corrected. Objectives 1 and 2 We generated optical maps from high molecular weight DNA of two male horses - the DNA donor of the current MSY assembly, the Thoroughbred named Bravo, and a Quarter Horse (Valentine). An initial hybrid assembly using the optical map of one of the male horses to scaffold the hinny trioCANU assembly has been completed. The scaffolded assembly is 2.68 Gb in 336 scaffolds; N50 is 49. Objective 2 A. Filling gaps in horse MSY assembly We generated Y chromosome sequence data from whole genome sequences of male horses and identified 148 informative contigs that are absent from the published Y assembly. We designed 278 primer pairs spanning the 148 informative contigs. Primer pairs were screened against the CHORI-241 equine BAC library pools to identify the corresponding BAC clones. To date, 102 contigs have been placed: 34 contigs in BACs within the published Y assembly and 68 contigs in 16 new Y BACs; 46 contigs are still under screening. The Y origin of the 16 new BACs was confirmed by FISH; STS content analysis by PCR locates the new BACs in GAP1, which spans 1 Mb. We sequenced a tiling path of 11 BACs in GAP1 on the Oxford Nanopore MinION Flongle platform. Data analysis and BAC assembly are currently ongoing. We found that a portion of the GAP1 BACs map by FISH to chr18 (chr18:13587221-13894241), indicating a novel autosomal transposed region in Y. B. Re-sequencing MSY ampliconic region Using the long-read MinION Flongle protocol, we have sequenced 47 multicopy/ampliconic BACs, 2 BACs from the heterochromatic region, and 3 BACs immediately distal to the multicopy region. For this, 4 BACs were pooled and barcoded for each Flongle cell, generating over 100x coverage per BAC in each Flongle run. Individual BAC raw sequences were cleaned from E. 
coli and plasmid vector sequences using the bbduk function of the BBTools software. Once filtered, BACs were assembled de novo individually using Canu. For the 47 ampliconic BACs and 3 BACs located just distal to this region in contig I, the individual de novo assembled BACs were aligned with multiple alignment software (MAFFT) and a consensus sequence was generated using the corresponding software lamassemble. Once a consensus sequence was generated, individual de novo assembled BAC sequences were compared to the consensus sequence using the megablast function in BLASTn. The BAC assembly is ongoing. The analysis has been iterated 4 times in an attempt to generate larger, contiguous sequences that contain entire BACs. To verify and improve the quality of the consensus sequences, the individual de novo assembled BACs have been repeat masked using RepeatMasker and will go through the same pipeline described above. To further increase the quality of consensus sequences, another multiple aligner, MAUVE, will be used. Objective 3 A. Improving annotation of sex chromosome ampliconic genes by testis Iso-Seq (ongoing) We have generated and quality checked RNA Iso-Seq data on the PacBio Sequel II platform for 4 testis samples. Testis Iso-Seq data will be used to re-annotate the improved X (Objective 1) and MSY (Objective 2) sequences. B. Copy number analysis (completed for 8 genes and general horse populations) DNA samples have been obtained from 301 male horses, of which 217 are reproductively normal and 84 present a broad spectrum of disorders of sex development (DSDs) and reproduction. We have completed copy number (CN) assays for 8 MSY multicopy genes. Optimization required that female samples do not amplify in the ddPCR experiments and confirmed that the CNs observed in males are consistent between individuals and experiments. We have completed CN analysis of these 8 genes in a large multi-breed cohort of 217 horses of 22 breeds, and some wild equids. 
Of the genes analyzed, TSPY shows the largest range of CN variation between breed groups; CN variation for RBMY and SRY is seen in several indigenous horse breeds and wild equids. Interestingly, wild equids show lower CNs than domestic horses for ETSTY1, ETSTY2, and RBMY but higher CNs than domestic horses for HSFX/HSFY. MSY CN analysis in the general population is complete and we are finalizing the data for natural Y CN variation in horses and its relation to breeds and Y haplogroups. MSY CN analysis has provided novel information and revealed CN variation patterns of individual genes, as well as of different breeds. A manuscript is in preparation. Importantly, CN analysis has pinpointed the 3 most CN-variable regions in the horse Y chromosome: i) the region around the SRY and RBMY genes, associated with disorders of sex development (DSDs); ii) the region involving testis specific transcripts ETSTY1, ETSTY2 and ETSTY5, with significant variation of ETSTY2 in cryptorchid horses and subfertile stallions, and iii) the TSPY gene region, TSPY being the most CN-variable gene; TSPY CN is significantly decreased in some cryptorchid horses and in one subfertile stallion. In addition, preliminary analysis of HSFY/HSFX gametologs indicates that there is more CN variation of the X-linked HSFX than the Y-linked HSFY, but more research is needed. In order to associate CN variation with stallion fertility, we have conducted partial CN analysis in a cohort of 84 male horses with reproductive problems and abnormal sex development. For a few genes, we observe significant CN differences in some cryptorchid stallions and one infertile stallion, compared to fertile controls, though these differences are not consistent across individual cases. This part of the research is ongoing. Despite the time, effort and funds devoted, we have not been able to design CN evaluation assays for one Y multicopy gene (CUL4BY) and 4 transcripts (ETY1, ETY4, ETSTY7 and YIR2). 
For now, the work on additional design and optimization is on pause until we have a better sequence assembly for the ampliconic region of the Y (Objective 2).
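The contiguity figures reported for the trio-binning assembly under Objective 1C (N50 and L50) follow a standard definition: sort contigs from longest to shortest and walk down the list until at least half of the total assembly length is covered. The sketch below illustrates that definition only; the function name and the toy contig lengths in the usage note are hypothetical, and this is not the software used to produce the numbers in this report.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the contig at which the running total,
    taken from the longest contig downwards, first reaches half of
    the assembly size. L50 is the number of contigs needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered or ordered[-1] <= 0:
        raise ValueError("need at least one contig and only positive lengths")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable for positive lengths")
```

With the toy lengths `[2, 3, 4, 5, 6, 7, 8, 9, 10]` (total 54), the three longest contigs sum to 27, so `n50_l50` returns an N50 of 8 and an L50 of 3.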

Publications

  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2021 Citation: Matthew J. Jevit, Brian W. Davis, Andrew Hillhouse, Caitlin Castañeda, Kevin Bredemeyer, William J. Murphy, Rytis Juras, Donald Miller, Terje Raudsepp. 2021. Genomic improvement of the horse X chromosome and characterization of the pseudoautosomal boundary. Abstract for the 27th The Equine Science Society (ESS) Symposium, June 1-4, 2021, virtual.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Mendoza MN, Schalnus SA, Thomson B, Bellone RR, Juras R, Raudsepp T. 2020. Novel complex unbalanced dicentric X-autosome rearrangement in a Thoroughbred mare with a mild effect on the phenotype. Cytogenet Genome Res (Published online: November 5, 2020); DOI: 10.1159/000511236; PMID: 33152736.
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Bugno-Poniewierska, M. and Raudsepp, T. 2021. Horse Clinical Cytogenetics: Recurrent Themes and Novel Findings. Animals 2021, 11, 831. https://doi.org/10.3390/ani11030831; (published online March 16, 2021).
  • Type: Book Chapters Status: Published Year Published: 2020 Citation: Raudsepp, T. (2020). In: Equine Genetic Diseases (Carrie Finno, Stephen Coleman, Eds.). Chapter 15: Genetics of Reproductive Diseases. Elsevier; Vet Clin North Am Equine Pract. 2020 Aug;36(2):395-409. doi: 10.1016/j.cveq.2020.03.013. Epub 2020 Jun 10.


Progress 05/01/19 to 04/30/20

Outputs
Target Audience: Graduate and undergraduate students through classroom and laboratory: the project involves 2 graduate level students; the PI and co-PIs are teaching undergraduate and graduate classes on biomedical genetics and cytogenetics. International research community through various forums: during this reporting period students and PIs have had presentations resulting from this project at Texas Forum For Reproductive Sciences, Texas Genetics Society, International Society for Animal Genetics, Plant and Animal Genome XXVIII and Havemeyer Equine Genomics Workshop. Total 7 presentations. International research community through peer-reviewed publications: 2 publications and 1 accepted book chapter during this reporting period. Horse owners, breeders and veterinarians through clinical cytogenetics services, layman presentations/publications and direct communication. Research on horse sex chromosomes addresses genetic causes of stallion and mare subfertility/infertility and various disorders of sex development (DSDs) in horses. Changes/Problems: There are no specific problems to mention, except for the current COVID-19 situation, which affects research globally and may cause delays for this project as well. What opportunities for training and professional development has the project provided? The project involves 2 PhD level graduate students and 1 undergraduate student. Both PhD students have presented the results on local and international platforms and in peer-reviewed publications. The involvement of undergraduate students is expected to increase next year. How have the results been disseminated to communities of interest? The results have been disseminated to communities of interest through conference abstracts and presentations (7), invited talks (2), publications in peer-reviewed international journals (2), and book chapters (1). Conference presentations (2019-2020): Caitlin Castaneda, Andrew Hillhouse, Sheila R. Teague, Charles C. Love, Dickson D. 
Varner, Terje Raudsepp. 2019. Genomic studies of stallion fertility: comparing fertility records with known and putative stallion fertility genes. 46th Annual meeting of Texas Genetics Society, College Station April 4-6, 2019. Abstract & Poster presentation. Alyssa Dubrow, Josefina Kjollerstrom, Caitlin Castaneda, Matt Jevit, Rytis Juras, Terje Raudsepp. 2019. New insights into X-monosomy in the horse. 46th Annual meeting of Texas Genetics Society, College Station April 4-6, 2019. Abstract & Platform presentation. Caitlin Castaneda, Andrew Hillhouse, Sheila R. Teague, Charles C. Love, Dickson D. Varner, Terje Raudsepp. 2019. Comparing stallion fertility records with FKBP6 genotype and copy numbers of Y ampliconic genes. Texas Forum For Reproductive Sciences, 25th Annual Meeting, College Station, April 11-12, 2019. Abstract & Platform presentation. Caitlin Castaneda, Andrew Hillhouse, Sheila R. Teague, Charles C. Love, Dickson D. Varner and Terje Raudsepp. 2019. Genomic studies of stallion fertility: comparing fertility records with FKBP6 genotype and copy numbers of Y multi-copy genes. 37th International Society for Animal Genetics Conference, July 7-12, 2019 Lleida, Spain. Abstract & Platform presentation. Terje Raudsepp, Caitlin Castaneda, Andrew Hillhouse, Alyssa Dubrow, Matt Jevit, Rebecca Bellone, Rytis Juras, Brian W. Davis. 2019. The horse X chromosome: old tricks, new insights. 37th International Society for Animal Genetics Conference, July 7-12, 2019 Lleida, Spain. Abstract & Platform presentation. Caitlin Castaneda, Andrew Hillhouse, Sabine Felkel, Barbara Wallner, Terje Raudsepp. (2020) Equine Y chromosome Variability. Plant & Animal Genome XXVIII, January 11-15, San Diego, CA, USA. Abstract & Equine Workshop platform presentation. Matthew J. Jevit, Brian W. Davis, Andrew Hillhouse, Caitlin Castañeda, Kevin Bredemeyer, William J. Murphy, Rytis Juras, Malcolm Ferguson-Smith, Donald Miller, Terje Raudsepp. 2020. 
Refining the sequence assembly of complex regions in the horse sex chromosomes. 13th Dorothy Russell Havemeyer International Horse Genome Workshop, July 2020, Ithaca, NY, USA. Invited talks (2019-2020): Terje Raudsepp. 2019 September 18. VIBS Bioscience and Genomics Seminar Series. Invited talk "The X Chromosome: Old Tricks, New Insights". Terje Raudsepp. 2020 January 23. Agricultural University of Peru La Molina. Invited talk "Genomics of equine disorders of sex development and reproduction". Journal Articles Published (2019-2020) with NIFA support acknowledged: Ruiz A, Castaneda C, Raudsepp T, Tibary A. 2019. Azoospermia and Y chromosome-autosome translocation in a Friesian stallion. Journal of Equine Veterinary Science 2019 Nov;82:102781. doi: 10.1016/j.jevs.2019.07.002. Epub 2019 Jul 11. Raudsepp, T., Finno, C., Bellone, R., Petersen, J. 2019. Ten years of the horse reference genome: insights into equine biology, domestication and population dynamics in the post-genome era. Animal Genetics, invited Review, 2019 Dec;50(6):569-597. doi: 10.1111/age.12857. Epub 2019 Sep 30. Book Chapters: Raudsepp, T. (2020). In: Equine Genetic Diseases (Carrie Finno, Stephen Coleman, Eds.). Chapter 15: Genetics of Reproductive Diseases. Elsevier (accepted). What do you plan to do during the next reporting period to accomplish the goals? Objective 1: Complete the analysis of X chromosome transcripts that are missing from the EquCab3 and MSY assemblies, and identify those of interest as potentially ampliconic. Identify BAC clones from the CHORI-241 library corresponding to the transcripts, validate their location in the X chromosome by FISH and prepare DNA for sequencing. Assemble de novo the horse X chromosome from trio-binning sequence data and align with the reference to find differences. Proceed with the analysis of Bionano optical mapping data for the X. 
Objective 2: Proceed with the analysis of Bionano optical mapping data for the Y. Continue filling gaps in the current MSY assembly by analyzing large Y-specific contigs present in individual male horses but not in the assembly: isolation and sequencing of specific BAC clones. Objective 3: Complete design and optimization of copy number detection assays for all MSY ampliconic genes. Continue copy number analysis of MSY ampliconic genes in individual horses of diverse breeds. Start Iso-Seq data analysis.

Impacts
What was accomplished under these goals? IMPACT: the research project will improve the quality and completeness of the reference sequence of equine sex chromosomes, the X and the Y. Our particular focus is on structurally complex, testis-expressed amplicons in both sex chromosomes, where assembly and annotation cannot be resolved by conventional genomics approaches. Such regions, however, carry biologically important functions and regulate traits of economic interest such as reproduction, sex development, and disease resistance. Improved knowledge of such complex but functionally important regions is an important prerequisite for the development of improved methods for clinical diagnostics and precision breeding. Objective 1. Discover, sequence and annotate ampliconic regions in the horse X chromosome. A. X chromosome transcript capture from testis and sequencing (completed) Fibroblast cell lines were established from a female horse and used to flow-sort 20,000 horse X chromosomes. Flow-sorted X chromosome DNA was amplified with Repli-g Whole Genome Amplification kit (Qiagen) and 33 micrograms of WGA amplified X chromosome DNA was obtained. DNA size was ~ 19 kilo base-pairs (kb) as evaluated with Agilent Tape Station. Amplified flow-sorted X chromosome material was labeled with biotin by nick-translation and tested for X chromosome origin by fluorescence in situ hybridization (FISH) to horse metaphase chromosomes. Clear and consistent painting-like hybridization signal was obtained all over the X chromosome. The biotin-labeled X chromosome-specific DNA was used as a probe to capture X-specific transcripts from horse testis cDNA library. Briefly: total RNA was extracted from horse testes; mRNA was isolated from total RNA with poly-A magnetic beads (BIOO); cDNA and cDNA libraries were produced with Rapid Directional qRNA Seq library kit (BIOO). The biotin-labeled X chromosome DNA was hybridized with testis cDNA library. 
X-specific testis cDNA libraries were separated with streptavidin-linked magnetic beads and sequenced as 2x150 bp reads on an Illumina HiSeq at the Texas Institute of Genomic Science and Society core lab. Sequence quality control (QC) was done with FastQC, with an average of Q30 chosen as the cutoff before proceeding with the assembly. B. X chromosome transcriptome assembly and analysis (ongoing) Transcripts were assembled de novo with Trinity using default parameters. A total of 189,723 (77,230 unique) transcripts were assembled with a size range from 201 bp to 9,212 bp and an average of 417 bp. In order to identify transcripts that are present in the de novo assembly but missing from the horse genome EquCab3, we concatenated EquCab3 with the MSY assembly and built a BLAST database from the combined sequence. BLAST analysis of the de novo assembly against EquCab3/MSY identified 646 transcripts (size range from 201 bp to 2,160 bp, average 275 bp) that were not in the reference genome. Detailed analysis of these transcripts is ongoing. C. Generation of long-read sequence data for the X chromosome (ongoing) To further improve the horse X chromosome assembly, we utilized a relatively novel approach - trio binning, which uses long read sequences from F1 interspecific hybrids and short reads from parent species. We extracted high molecular weight blood DNA from a female hinny (male horse x female donkey F1 hybrid) using the Qiagen MagAttract HMW DNA kit. The DNA was tested by pulsed-field electrophoresis, which showed that a large portion of it was larger than 100 kb. Hinny DNA was sequenced on 2 PacBio Sequel cells resulting in high quality long-read data. The long-read PacBio data from the hinny and short-read Illumina HiSeq2000 data of a horse and a donkey are currently being assembled with the trio-binning function of the Canu software package. Objectives 1 and 2: Refine the assembly of ampliconic regions in the horse Y and X chromosomes. 
Optical mapping (ongoing) To simultaneously improve the assemblies of both horse sex chromosomes, we are utilizing a novel cutting-edge optical mapping technology - the Saphyr® system by Bionano Genomics. Genomic assemblies tend to break contigs at repetitive elements, but optical mapping scaffolds can span entire fragments of DNA and have the potential to span these large repetitive elements. This allows a hybrid assembly to have greater power to correctly assemble and span repetitive elements. We believe that these approaches combined will resolve complex and repetitive portions of the horse sex chromosomes. We generated optical maps from high molecular weight DNA of two male horses - the DNA donor of the current MSY assembly, the Thoroughbred named Bravo, and a Quarter Horse (Valentine). The X: using Bionano Access software, we are in the process of creating hybrid scaffolds between optical maps and the haplotype-refined trio-assembly for the horse X chromosome. The Y: using Bionano Access software, we will create hybrid scaffolds between optical maps and the MSY reference. Objective 2: Refine the assembly of ampliconic regions in the horse Y chromosome. We generated Y chromosome sequence data from whole genome sequences of individual male horses and identified 14 large and multiple smaller contigs that were not present in the MSY reference assembly. We designed PCR primers for the 14 largest contigs missing from the MSY assembly to identify corresponding BACs and start filling gaps. Objective 3: Annotate sex chromosomal amplicons for genes and copy numbers. A. Improving annotation of sex chromosome ampliconic genes by testis Iso-Seq Testis tissue has been obtained from fetal testes of 6 gestational stages: 4m, 6m, 7m, 9m, 10m and 11m. The tissue is preserved frozen in RNAlater. RNA was isolated from 4 fetal testes and 4 adult testes, of which two fetal testes (4m and 10m) and two adult testes passed QC with RIN~8.0 or higher. 
These were selected for Iso-Seq library preparation and sequencing on the PacBio Sequel II platform. RNA Iso-Seq was completed in March 2020; the QC reports were good and the data have been downloaded to a Texas A&M server. Assembly and analysis of the data will start shortly. B. Copy number analysis (ongoing) DNA samples have been obtained from 278 male horses, of which 203 are reproductively normal and 75 present a broad spectrum of reproductive and sex development disorders. Design and optimization are completed for gene copy number quantification assays (droplet digital PCR assays) for 7 horse Y chromosome ampliconic genes (ETSTY1,2,3,5, RBMY, TSPY, UBA1Y) and the male sex determination gene SRY. Design is finished but optimization is still ongoing for 4 ampliconic genes. Assays for 7 ampliconic genes and SRY have been tested in 79 normal stallions and in 9 subfertile/infertile stallions. So far, three genes (TSPY, RBMY, ETSTY2) show copy number changes between individuals while others show little variation. In contrast to the Thoroughbred-based MSY assembly, we find only 1 copy of HSFY in all males analyzed, but see variable CNs for its gametolog HSFX. We have obtained consistent and repeatable data showing that the single-copy SRY gene, which is flanked by at least two copies of the RBMY gene, represents a structurally unstable area of the male specific region of Y (MSY) in the horse. We see SRY and RBMY copy number variation in some indigenous breeds, such as Mongolian, Estonian Native and Yakutian horses.
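The screen under Objective 1B for de novo assembled transcripts that are absent from the combined EquCab3/MSY reference can be illustrated with a small post-processing step on BLAST output. This is only a sketch under stated assumptions: it assumes BLAST was run with tabular output (`-outfmt 6`, query ID in the first column) and treats any reported hit as evidence that a transcript is present in the reference. The report does not state the actual thresholds or output format used, and the function names and example identifiers are hypothetical.

```python
from typing import Iterable, List, Set


def fasta_ids(fasta_lines: Iterable[str]) -> List[str]:
    """Collect sequence IDs (first word after '>') in file order."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def blast_hit_ids(blast_lines: Iterable[str]) -> Set[str]:
    """Collect query IDs that have at least one hit in tabular BLAST output."""
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hits.add(line.split("\t")[0])
    return hits


def transcripts_without_hits(fasta_lines: Iterable[str],
                             blast_lines: Iterable[str]) -> List[str]:
    """Return transcript IDs that never appear as a BLAST query with a hit."""
    hits = blast_hit_ids(blast_lines)
    return [tid for tid in fasta_ids(fasta_lines) if tid not in hits]
```

In practice both arguments could be open file handles, for example the Trinity FASTA file and the BLAST result table. A stricter screen would additionally filter hits by identity, alignment length or e-value before counting a transcript as present.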

Publications

  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Raudsepp, T., Finno, C., Bellone, R., Petersen, J. 2019. Ten years of the horse reference genome: insights into equine biology, domestication and population dynamics in the post-genome era. Animal Genetics, invited Review, 2019 Dec;50(6):569-597. doi: 10.1111/age.12857. Epub 2019 Sep 30.
  • Type: Book Chapters Status: Accepted Year Published: 2020 Citation: Raudsepp, T. (2020). In: Equine Genetic Diseases (Carrie Finno, Stephen Coleman, Eds.). Chapter 15: Genetics of Reproductive Diseases. Elsevier (accepted).
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2019 Citation: Caitlin Castaneda, Andrew Hillhouse, Sheila R. Teague, Charles C. Love, Dickson D. Varner, Terje Raudsepp. 2019. Genomic studies of stallion fertility: comparing fertility records with known and putative stallion fertility genes. 46th Annual meeting of Texas Genetics Society, College Station April 4-6, 2019. Abstract & Poster presentation.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2019 Citation: Alyssa Dubrow, Josefina Kjollerstrom, Caitlin Castaneda, Matt Jevit, Rytis Juras, Terje Raudsepp. 2019. New insights into X-monosomy in the horse. 46th Annual meeting of Texas Genetics Society, College Station April 4-6, 2019. Abstract & Platform presentation.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2019 Citation: Caitlin Castaneda, Andrew Hillhouse, Sheila R. Teague, Charles C. Love, Dickson D. Varner, Terje Raudsepp. 2019. Comparing stallion fertility records with FKBP6 genotype and copy numbers of Y ampliconic genes. Texas Forum For Reproductive Sciences, 25th Annual Meeting, College Station, April 11-12, 2019. Abstract & Platform presentation.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2019 Citation: Caitlin Castaneda, Andrew Hillhouse, Sheila R. Teague, Charles C. Love, Dickson D. Varner and Terje Raudsepp. 2019. Genomic studies of stallion fertility: comparing fertility records with FKBP6 genotype and copy numbers of Y multi-copy genes. 37th International Society for Animal Genetics Conference, July 7-12, 2019 Lleida, Spain. Abstract & Platform presentation.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2019 Citation: Terje Raudsepp, Caitlin Castaneda, Andrew Hillhouse, Alyssa Dubrow, Matt Jevit, Rebecca Bellone, Rytis Juras, Brian W. Davis. 2019. The horse X chromosome: old tricks, new insights. 37th International Society for Animal Genetics Conference, July 7-12, 2019 Lleida, Spain. Abstract & Platform presentation.
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Ruiz A, Castaneda C, Raudsepp T, Tibary A. 2019. Azoospermia and Y chromosome-autosome translocation in a Friesian stallion. Journal of Equine Veterinary Science 2019 Nov;82:102781. doi: 10.1016/j.jevs.2019.07.002. Epub 2019 Jul 11.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2020 Citation: Caitlin Castaneda, Andrew Hillhouse, Sabine Felkel, Barbara Wallner, Terje Raudsepp. (2020) Equine Y chromosome Variability. Plant & Animal Genome XXVIII, January 11-15, San Diego, CA, USA. Abstract & Equine Workshop platform presentation.
  • Type: Conference Papers and Presentations Status: Under Review Year Published: 2020 Citation: Matthew J. Jevit, Brian W. Davis, Andrew Hillhouse, Caitlin Castañeda, Kevin Bredemeyer, William J. Murphy, Rytis Juras, Malcolm Ferguson-Smith, Donald Miller, Terje Raudsepp. 2020. Refining the sequence assembly of complex regions in the horse sex chromosomes. 13th Dorothy Russell Havemeyer International Horse Genome Workshop, July 2020, Ithaca, NY, USA.