Source: AGRICULTURAL RESEARCH SERVICE submitted to NRP
PAN-GENOME AND STRUCTURAL VARIATION ANALYSES IN DOZENS OF DE NOVO ASSEMBLED DIPLOID CATTLE GENOMES
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1018802
Grant No.
2019-67015-29321
Cumulative Award Amt.
$500,000.00
Proposal No.
2018-06524
Multistate No.
(N/A)
Project Start Date
Apr 1, 2019
Project End Date
Mar 31, 2024
Grant Year
2019
Program Code
[A1201]- Animal Health and Production and Animal Products: Animal Breeding, Genetics, and Genomics
Recipient Organization
AGRICULTURAL RESEARCH SERVICE
RM 331, BLDG 003, BARC-W
BELTSVILLE, MD 20705-2351
Performing Department
U.S. Department of Agriculture
Non Technical Summary
While there have been significant advances in resolving the pattern and nature of single nucleotide polymorphisms (SNPs), similar advances for larger, more complex forms of genetic variation have not been fully achieved. Many studies have shown that genomic structural variations (SVs), including copy number variations (CNVs), are common and are associated with human health and animal production. However, SV detection from either microarrays or short-read sequencing often suffers from low sensitivity (30-70%) and false discovery rates of up to 85%. Another problem is that most SV discovery methods do not indicate which haplotype (a.k.a. phase block) background a given SV resides on. Therefore, sequence-based resolution of SVs (with both boundaries/breakpoints mapped and haplotypes phased) and the development of high-throughput genotyping platforms are needed to assess their association with economic traits. Despite tremendous progress in genome sequencing, de novo assembly of a phased (haplotype-resolved) genome was expensive until recently. Cattle and other livestock, like humans, are seeing rapidly increasing numbers of de novo assembled genomes. Nevertheless, how to efficiently analyze, store, and utilize this rich information remains a big challenge. The pan-genome is defined as the nonredundant collection of all DNA sequences present in an entire population, including a "core" genome shared by all individuals and a "variable" genome (a.k.a. individual-specific sequences) present in only a subset of them.
In line with the USDA NIFA initiatives, this proposal outlines a plan to use 10x Genomics Chromium Linked-Read sequencing and mapping technologies, together with their associated pipelines, to de novo assemble dozens of representative, phased diploid cattle genomes; to evaluate the new diploid cattle genomes using computational and experimental approaches from a pan-genome perspective; and to map and phase structural variation, including copy number variation, and further test its associations with cattle production and health traits.
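The core/variable split in the pan-genome definition above can be illustrated with a minimal sketch. It assumes each assembly has already been reduced to a set of sequence-segment identifiers; the function name `split_pan_genome`, the animal names, and the segment IDs are all hypothetical and are not taken from the project.

```python
from collections import Counter


def split_pan_genome(presence):
    """Split segments into a core set (in every individual) and a variable set.

    `presence` maps each individual to the set of segment IDs found in its
    assembly (a hypothetical, simplified representation).
    """
    n_individuals = len(presence)
    # Count in how many individuals each segment occurs.
    counts = Counter(seg for segs in presence.values() for seg in segs)
    core = {seg for seg, n in counts.items() if n == n_individuals}
    variable = set(counts) - core
    return core, variable


# Hypothetical toy data: three animals, four segments.
toy = {
    "cow_A": {"seg1", "seg2", "seg3"},
    "cow_B": {"seg1", "seg2"},
    "cow_C": {"seg1", "seg2", "seg4"},
}
core, variable = split_pan_genome(toy)
print(sorted(core))      # ['seg1', 'seg2']
print(sorted(variable))  # ['seg3', 'seg4']
```

The pan-genome itself is the union of the two sets. Deciding when two sequences from different animals count as the same segment is the hard part in practice and is not addressed by this sketch.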
Animal Health Component
20%
Research Effort Categories
Basic
70%
Applied
20%
Developmental
10%
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
304                   3499                             1080                     40%
304                   3399                             1080                     40%
304                   3999                             1080                     20%
Goals / Objectives
This proposal outlines a plan to utilize the 10x Genomics Chromium Linked-Read sequencing and mapping technologies, as well as their associated pipelines to de novo assemble dozens of representative and phased cattle diploid genomes; it will attempt to evaluate new cattle diploid genomes using computational and experimental approaches from a pan-genome perspective; it also aims to map and phase structural variation, including copy number variation, and further test their associations with cattle production and health traits. The specific research objectives of this project are to:
1. Map and phase structural variations in diploid cattle genomes;
2. Perform de novo diploid cattle genome assembly, phasing and comparison;
3. Test associations of structural variations with cattle production and health traits.
Project Methods
From a pan-genome perspective, we will construct and compare 60 de novo diploid genome assemblies and enhance structural variation mapping, phasing, and association analyses in multiple cattle breeds (Holstein, Jersey, Angus, Hereford, and Brahman/Nelore). Focusing on the applications of 10x Genomics Linked-Read technology, we will map and phase structural variations in diploid cattle genomes; perform de novo diploid cattle genome assembly, phasing and comparison; and test associations of structural variations with cattle production and health traits.

Progress 04/01/19 to 03/31/24

Outputs
Target Audience: Farmers, breeders, scientists, livestock industries, and policy planners who need to improve animal health and production based on genome-enabled animal selection.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
1. Reuben Anderson, technician, contributed to HMW DNA isolation and Nanopore sequencing.
2. Yahui Gao, postdoc, contributed to SV and CNV calling algorithms.
3. Liu Yang, postdoc, contributed to SV calling and pangenome algorithms.
4. Nayan Bhowmik, postdoc, contributed to tissue collection.
How have the results been disseminated to communities of interest?
1. Presented a pig and cattle pangenome and SV talk at the PAG Cattle/Swine workshop (January 2024).
2. Organized the FarmGTEx workshop at the PAG meeting (January 2024).
3. Organized the USDA ARS SCINet Translational Omics Working Group webinar (September 2023 to present).
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? For Objectives 1 and 2, we selected Holstein and Jersey cows from two USDA herds (Beltsville Agricultural Research Center and Dairy Forage Research Center) based on their genetic diversity. Utilizing optimized PacBio HiFi sequencing platforms and advanced bioinformatics tools, we sequenced 20 Holstein and 10 Jersey cattle to 20× coverage, resulting in the assembly of 31 genomes, each with an average size of 3.25 Gb and a contig N50 of 69.36 Mb. Using the cattle ARS-UCD1.2 reference assembly, we integrated five read-based and one assembly-based SV caller, creating Holstein and Jersey SV catalogs containing 74,068 and 54,689 events, spanning 202 Mb (7.43% of the genome) and 135 Mb (4.97% of the genome), respectively. Our analysis revealed that SVs are enriched in less conserved, non-coding, and non-regulatory regions. When comparing Holsteins with high and low feed efficiency (FE), we found that high FE-specific SVs were linked to energy metabolism and olfactory receptors, while low FE-specific SVs were associated with material transport. We constructed Holstein and Jersey pangenome graphs with 148,598 and 105,875 nodes, and 208,891 and 147,990 edges, representing 47,028 and 37,137 deletions, insertions, and complex biallelic and multi-allelic events, along with 63.75 Mb and 42.34 Mb of novel sequence, respectively. Notably, we observed SV count saturation with 20 Holsteins, while adding Jersey samples significantly increased the SV count, highlighting breed-specific SV events. Our long-read data and SV catalogs are valuable resources, revealing that the cattle genome is more complex than previously thought. These advancements demonstrate our commitment to exploring and understanding genetic diversity in cattle and other livestock, providing valuable data for future research and breeding programs. Two manuscripts are in preparation. 
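The saturation observation above (SV counts leveling off across 20 Holsteins, then rising again when Jersey samples are added) can be pictured as a cumulative count of distinct SVs as samples are added one at a time. The sketch below is only an illustration under simplifying assumptions: each sample is a set of hypothetical SV identifiers, whereas a real analysis would match SVs by position, type, and size, and would typically average over many sample orderings.

```python
def cumulative_sv_counts(samples):
    """Return the running number of distinct SVs as samples are added in order."""
    seen = set()
    curve = []
    for sv_ids in samples:
        seen.update(sv_ids)
        curve.append(len(seen))
    return curve


# Hypothetical toy call sets (SV identifiers are made up for illustration).
holstein = [
    {"sv1", "sv2", "sv3"},
    {"sv2", "sv3", "sv4"},
    {"sv1", "sv4"},
    {"sv2", "sv3"},
]
jersey = [
    {"sv3", "sv5", "sv6"},
    {"sv5", "sv7"},
]

curve = cumulative_sv_counts(holstein + jersey)
print(curve)  # [3, 4, 4, 4, 6, 7]
```

In this toy curve the count plateaus at 4 within the first breed and climbs again once samples from the second breed contribute breed-specific events.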
For Objective 3, we conducted and published multiple studies on CNV discovery and its association with traits. For example, Yang L. et al. (2024) mapped and characterized structural variations in 1,060 pig genomes, highlighting their impact on traits, while Yang J. et al. (2024) examined structural variant landscapes in sheep and goats, uncovering convergent evolutionary signatures. Liu et al. (2023) investigated DGAT1 copy number variations in goats, finding significant correlations with milk production traits. Zhao et al. (2023) identified the impact of young SINEs on gene regulation and genetic diversity in pigs. Other related studies have significantly advanced our understanding of genetic and genomic factors in livestock. Teng et al. (2024), published in Nature Genetics, mapped genetic regulatory effects across pig tissues and provided insights into the regulatory networks controlling gene expression. Xiang et al. (2023) revealed that gene expression and RNA splicing account for the substantial heritability of complex traits in cattle, emphasizing the importance of transcriptomic data. Gao et al. (2024) used transcriptomic profiling to uncover molecular adaptations in the gastrointestinal tracts of dairy cattle during lactation, offering insights into efficient milk production. Further, Sun et al. (2024) employed meta-omics to explore lipid metabolism regulation between pig breeds, identifying key genetic and environmental factors. Collectively, these studies enhance our understanding of genetic diversity, trait association, molecular QTL mapping, and evolutionary processes in livestock, offering valuable resources for genomic research and breeding programs.

Publications

  • Type: Journal Articles Status: Published Year Published: 2024 Citation: Gao, Y., Liu, G.E., Ma, L., Fang, L., Li, C.J. and Baldwin, R.L.t. 2024. Transcriptomic profiling of gastrointestinal tracts in dairy cattle during lactation reveals molecular adaptations for milk synthesis. J Adv Res. 10.1016/j.jare.2024.06.020.
  • Type: Journal Articles Status: Published Year Published: 2024 Citation: Sun, J., Xie, F., Wang, J., Luo, J., Chen, T., Jiang, Q., Xi, Q., Liu, G.E. and Zhang, Y. 2024. Integrated meta-omics reveals the regulatory landscape involved in lipid metabolism between pig breeds. Microbiome, 12(1): 33. 10.1186/s40168-023-01743-3.
  • Type: Journal Articles Status: Published Year Published: 2024 Citation: Sun, X., Guo, J., Li, R., Zhang, H., Zhang, Y., Liu, G.E., Emu, Q. and Zhang, H. 2024. Whole-Genome Resequencing Reveals Genetic Diversity and Wool Trait-Related Genes in Liangshan Semi-Fine-Wool Sheep. Animals (Basel), 14(3). 10.3390/ani14030444.
  • Type: Journal Articles Status: Published Year Published: 2024 Citation: Teng, J., Gao, Y., Yin, H., Bai, Z., Liu, S., Zeng, H., Bai, L., Cai, Z., Zhao, B., Li, X., Xu, Z., Lin, Q., Pan, Z., Yang, W., Yu, X., Guan, D., Hou, Y., Keel, B.N., Rohrer, G.A., Lindholm-Perry, A.K., Oliver, W.T., Ballester, M., Crespo-Piazuelo, D., Quintanilla, R., Canela-Xandri, O., Rawlik, K., Xia, C., Yao, Y., Zhao, Q., Yao, W., Yang, L., Li, H., Zhang, H., Liao, W., Chen, T., Karlskov-Mortensen, P., Fredholm, M., Amills, M., Clop, A., Giuffra, E., Wu, J., Cai, X., Diao, S., Pan, X., Wei, C., Li, J., Cheng, H., Wang, S., Su, G., Sahana, G., Lund, M.S., Dekkers, J.C.M., Kramer, L., Tuggle, C.K., Corbett, R., Groenen, M.A.M., Madsen, O., Gòdia, M., Rocha, D., Charles, M., Li, C.-j., Pausch, H., Hu, X., Frantz, L., Luo, Y., Lin, L., Zhou, Z., Zhang, Z., Chen, Z., Cui, L., Xiang, R., Shen, X., Li, P., Huang, R., Tang, G., Li, M., Zhao, Y., Yi, G., Tang, Z., Jiang, J., Zhao, F., Yuan, X., Liu, X., Chen, Y., Xu, X., Zhao, S., Zhao, P., Haley, C., Zhou, H., Wang, Q., Pan, Y., Ding, X., Ma, L., Li, J., Navarro, P., Zhang, Q., Li, B., Tenesa, A., Li, K., Liu, G.E., Zhang, Z., Fang, L. and The PigGTEx Consortium. 2024. A compendium of genetic regulatory effects across pig tissues. Nat Genet, 56(1): 112-123. 10.1038/s41588-023-01585-7.
  • Type: Journal Articles Status: Published Year Published: 2024 Citation: Yang, L., Yin, H., Bai, L., Yao, W., Tao, T., Zhao, Q., Gao, Y., Teng, J., Xu, Z., Lin, Q., Diao, S., Pan, Z., Guan, D., Li, B., Zhou, H., Zhou, Z., Zhao, F., Wang, Q., Pan, Y., Zhang, Z., Li, K., Fang, L. and Liu, G.E. 2024. Mapping and functional characterization of structural variation in 1060 pig genomes. Genome Biol, 25(1): 116. 10.1186/s13059-024-03253-3.
  • Type: Journal Articles Status: Published Year Published: 2024 Citation: Yang, J., Wang, D.F., Huang, J.H., Zhu, Q.H., Luo, L.Y., Lu, R., Xie, X.L., Salehian-Dehkordi, H., Esmailizadeh, A., Liu, G.E. and Li, M.H. 2024. Structural variant landscapes reveal convergent signatures of evolution in sheep and goats. Genome Biol, 25(1): 148. 10.1186/s13059-024-03288-6.
  • Type: Journal Articles Status: Published Year Published: 2024 Citation: Boschiero, C., Neupane, M., Yang, L., Schroeder, S.G., Tuo, W., Ma, L., Baldwin, R.L., Van Tassell, C.P. and Liu, G.E. 2024. A pilot detection and associate study of gene presence-absence variation in Holstein cattle. Animals, 14(13): 1921. 10.3390/ani14131921.
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Liu, M., Cheng, J., Chen, Y., Yang, L., Raza, S.H.A., Huang, Y., Lei, C., Liu, G.E., Lan, X. and Chen, H. 2023. Distribution of DGAT1 copy number variation in Chinese goats and its associations with milk production traits. Anim Biotechnol, 34(4): 980-985. 10.1080/10495398.2021.2007118.
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Xiang, R., Fang, L., Liu, S., Macleod, I.M., Liu, Z., Breen, E.J., Gao, Y., Liu, G.E., Tenesa, A., Mason, B.A., Chamberlain, A.J., Wray, N.R. and Goddard, M.E. 2023. Gene expression and RNA splicing explain large proportions of the heritability for complex traits in cattle. Cell Genom, 3(10): 100385. 10.1016/j.xgen.2023.100385.
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Zhao, P., Gu, L., Gao, Y., Pan, Z., Liu, L., Li, X., Zhou, H., Yu, D., Han, X., Qian, L., Liu, G.E., Fang, L. and Wang, Z. 2023. Young SINEs in pig genomes impact gene regulation, genetic diversity, and complex traits. Commun Biol, 6(1): 894. 10.1038/s42003-023-05234-x.
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Zhao, P., Peng, C., Fang, L., Wang, Z. and Liu, G.E. 2023. Taming transposable elements in livestock and poultry: a review of their roles and applications. Genetics Selection Evolution, 55(1): 50. 10.1186/s12711-023-00821-2.


Progress 04/01/22 to 03/31/23

Outputs
Target Audience: Farmers, breeders, scientists, livestock industries, and policy planners who need to improve animal health and production based on genome-enabled animal selection.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
1. Reuben Anderson, technician, contributed to HMW DNA isolation and Nanopore sequencing.
2. Yahui Gao, postdoc, contributed to SV and CNV calling algorithms.
3. Clarissa Boschiero, postdoc, contributed to SV and CNV calling algorithms.
How have the results been disseminated to communities of interest?
1. Presented a cattle pangenome and SV talk at the PAG Cattle/Swine workshop (January 2023).
2. Organized the FarmGTEx workshop at the PAG meeting (January 2023).
What do you plan to do during the next reporting period to accomplish the goals? Using the PacBio HiFi platform, we will sequence the remaining 10 diploid (20 haploid) genomes based on the obtained results. All bioinformatic analyses will be performed with these data.

Impacts
What was accomplished under these goals? During the past year, for Objectives 1 and 2, a total of 30 Holstein/Jersey cows from two USDA herds (BARC and U.S. Dairy Forage Research Center) were chosen based on their diversity. Using optimized PacBio HiFi sequencing platforms and bioinformatics tools, 20 Holstein diploid (40 haploid) genomes were sequenced and analyzed. For Objective 3, by assembling a pangenome for global cattle using 900 short-read samples, we recovered 83 Mb of novel sequences, revealed novel structural variations, and provided new insights into their diversity, function, and evolutionary history (Zhou et al.). In addition, we conducted and published one study of CNV discovery and association with coat color in goats (Guo et al.).

Publications

  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Guo et al. A 13.42-kb tandem duplication at the ASIP locus is strongly associated with the depigmentation phenotype of non-classic Swiss markings in goats. BMC Genomics. 2022; 23(1):437. doi: 10.1186/s12864-022-08672-9.
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Zhou et al. Assembly of a pangenome for global cattle reveals missing sequences and novel structural variations, providing new insights into their diversity and evolutionary history. Genome Res. 2022; 32(8):1585-1601. doi: 10.1101/gr.276550.122.
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Yao et al. Comparative transcriptome in large-scale human and cattle populations. Genome Biol. 2022; 23(1):176. doi: 10.1186/s13059-022-02745-4.
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Liu et al. A multi-tissue atlas of regulatory variants in cattle. Nat Genet. 2022; 54(9):1438-1447. doi: 10.1038/s41588-022-01153-5.


Progress 04/01/21 to 03/31/22

Outputs
Target Audience: Farmers, breeders, scientists, livestock industries, and policy planners who need to improve animal health and production based on genome-enabled animal selection.
Changes/Problems: We revised the research plan to sequence 30 Holstein cow genomes using PacBio HiFi to 20x coverage. All the objectives will stay roughly the same.
What opportunities for training and professional development has the project provided?
1. Reuben Anderson, technician, contributed to HMW DNA isolation and Nanopore sequencing.
2. Yahui Gao, postdoc, contributed to SV and CNV calling algorithms.
3. Clarissa Boschiero, postdoc, contributed to SV and CNV calling algorithms.
How have the results been disseminated to communities of interest?
1. Presented a cattle GTEx talk at the Breeding and Genetics Section of the ADSA 2021 Virtual Annual Meeting (June 2021).
2. Presented a cattle SV and pangenome talk at the USDA NIFA PD virtual meeting (March 2022).
What do you plan to do during the next reporting period to accomplish the goals? Teaming up with other researchers from the Human Genome Structural Variation Consortium, the Human Pangenome Reference Consortium, and the Bovine Pangenome Consortium, we will finalize an integrated sequencing (platforms, coverage, and their combinations) and computational approach to achieve the best results for this project. Working with the Bovine Pangenome Consortium, we will implement the best sample selection, sequencing, and analysis strategies to minimize sample overlap and maximize the benefits for taxpayers.

Impacts
What was accomplished under these goals? Due to the COVID-19 pandemic, the project was extended for 12 months at no additional cost. For Objectives 1 and 2, we finished the evaluation of the long-read sequencing platforms and bioinformatics tools and found that long reads outperformed short reads in SV detection. We revised the research plan to sequence 30 Holstein cow genomes using PacBio HiFi to 20x coverage. All the objectives will stay roughly the same. We chose to do so because CDCB hosts the largest genotype and phenotype databases for Holsteins, and we can easily obtain blood and other tissues from cows. Using them, we can survey SVs and construct draft assemblies and a Holstein graph pangenome. Based on principal component analysis (PCA) results using SNP information, we chose 30 diverse Holstein cows from the BARC, UMD, and ISU herds. Blood collection, high molecular weight DNA isolation, library construction, and PacBio HiFi sequencing protocols have been finalized. Additionally, using short-read sequencing data, we improved the cattle genome reference assembly by recovering 74 Mb of novel sequences. We further conducted several CNV discovery and association studies in cattle and goats, providing new insights into their diversity and evolutionary history.
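The report does not state how diverse animals were picked from the PCA results. One simple possibility, offered here purely as an assumption and not as the project's method, is greedy max-min (farthest-point) selection on the leading principal-component coordinates. The function `pick_diverse`, the animal names, and the coordinates below are hypothetical.

```python
import math


def pick_diverse(coords, k):
    """Greedy max-min selection of up to k animals from PC coordinates.

    `coords` maps an animal ID to a tuple of principal-component scores.
    This is an illustrative heuristic, not the selection rule used in the project.
    """
    ids = sorted(coords)
    dims = len(coords[ids[0]])
    # Seed with the animal farthest from the centroid (a deterministic choice).
    centroid = [sum(coords[i][d] for i in ids) / len(ids) for d in range(dims)]
    chosen = [max(ids, key=lambda i: math.dist(coords[i], centroid))]
    while len(chosen) < min(k, len(ids)):
        remaining = [i for i in ids if i not in chosen]
        # Add the animal whose nearest already-chosen neighbor is farthest away.
        chosen.append(
            max(remaining,
                key=lambda i: min(math.dist(coords[i], coords[c]) for c in chosen))
        )
    return chosen


# Hypothetical PC1/PC2 scores for five animals.
pcs = {
    "cow1": (0.0, 0.0),
    "cow2": (0.1, 0.0),
    "cow3": (5.0, 0.0),
    "cow4": (0.0, 4.0),
    "cow5": (0.1, 0.1),
}
picked = pick_diverse(pcs, 3)
print(picked)
```

With these toy scores the two outlying animals (cow3 and cow4) are chosen first, followed by one animal from the tight cluster near the origin, which is the behavior one would want when spreading a limited sequencing budget across the genetic diversity of a herd.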

Publications

  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Yang, L. et al. Insights from initial variant detection by sequencing single sperm in cattle. Dairy, 2(4), 649-657. doi:10.3390/dairy2040050 (2021).
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Guo, J. et al. Genome-wide association study reveals 14 new SNPs and confirms two structural variants highly associated with the horned/polled phenotype in goats. BMC Genomics, 22(1), 769. doi:10.1186/s12864-021-08089-w (2021).
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Yang, L. et al. Towards the detection of copy number variation from single sperm sequencing in cattle. BMC Genomics, 23(1), 215. doi:10.1186/s12864-022-08441-8 (2022).
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Gao, Y. et al. Initial analysis of structural variation detections in cattle using long-read sequencing methods. Genes (Basel), 13(5). doi:10.3390/genes13050828 (2022).


Progress 04/01/20 to 03/31/21

Outputs
Target Audience: Farmers, scientists, livestock industries, and policy planners who need to improve animal health and production based on genome-enabled animal selection.
Changes/Problems: Due to the COVID-19 pandemic, project progress has been significantly delayed. We will have to extend the project at no cost for an additional one or two years. Because of the discontinuation of 10x linked-read reagents, we will proceed with a new integrated sequencing (PacBio, ONT, and Hi-C platforms, coverage, and their combinations) and computational approach to achieve the best results. As an Associate Member of the Human Pangenome Reference Consortium (HPRC), our team plans to implement the HPRC's current best practices for constructing a diploid phased human genome, as described in https://www.nature.com/articles/s41587-020-0711-0 and https://www.nature.com/articles/s41592-020-01056-5. These methods (i.e., hifiasm and DipAsm) produced haplotype-resolved human assemblies with a minimum contig length needed to cover 50% of the known genome (NG50) of up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in both contiguity and phasing completeness. However, their long-read sequencing (PacBio or ONT) and Hi-C reagent costs are 7-fold higher than the original 10x linked-read (Illumina short-read sequencing) cost estimate. Therefore, for Objectives 1 and 2, we will likely need to decrease the sample number from 60 to 9 or 10, according to the funding budget. Working with the Bovine Pangenome Consortium, we will employ the best sample selection, sequencing, and assembly strategies to minimize sample overlap and maximize the benefit for taxpayers.
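As a small illustration of the NG50 statistic mentioned above (the minimum contig length needed to cover 50% of the known genome), the following sketch computes NG50 from a list of contig lengths, with N50 as the special case where the reference size is the assembly's own total length. The function names and the toy contig lengths are hypothetical.

```python
def ng50(contig_lengths, genome_size):
    """Length of the contig at which the longest contigs first cover half of genome_size.

    Returns None when the assembly is too small to reach 50% of genome_size.
    """
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered * 2 >= genome_size:
            return length
    return None


def n50(contig_lengths):
    """N50 is NG50 measured against the assembly's own total length."""
    return ng50(contig_lengths, sum(contig_lengths))


# Hypothetical contig lengths in Mb.
contigs = [40, 30, 20, 10]
print(n50(contigs))        # 30
print(ng50(contigs, 200))  # 10
print(ng50(contigs, 250))  # None
```

Because NG50 is measured against a fixed genome size rather than the assembly length, it allows assemblies of different total sizes to be compared on the same footing.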
In the next two or three years, we are confident that we can achieve the goals of this project, which are to (1) generate haplotype-resolved reference assemblies for selected cattle breeds from a pan-genome perspective; (2) map and phase structural variation, including copy number variation; and (3) further test their associations with cattle economic traits.
What opportunities for training and professional development has the project provided?
1. Reuben Anderson, technician, contributed to HMW DNA isolation and Nanopore sequencing.
2. Shuli Liu, postdoc, contributed to SV and CNV calling algorithms.
3. Yahui Gao, postdoc, contributed to SV and CNV calling algorithms.
4. Clarissa Boschiero, postdoc, contributed to SV and CNV calling algorithms.
How have the results been disseminated to communities of interest?
1. Presented a cattle gene atlas talk at the Breeding and Genetics Section of the ADSA 2020 Virtual Annual Meeting (June 2020).
2. Presented a cattle DNA methylation talk at the Cell Biology Symposia of the 2020 ASAS-CSAS-WSASAS Virtual Meeting and Trade Show (July 2020).
3. Co-founded the FarmGTEx consortium and co-organized its first workshop (May 2021).
What do you plan to do during the next reporting period to accomplish the goals? Due to several patent lawsuits, 10x Genomics stopped providing the GEM microfluidic chip reagents for linked-read sequencing. We are also closely watching the wind-down of the COVID-19 pandemic, which has severely impacted the collection of additional samples outside of the USDA. At the same time, teaming up with other researchers from the Human Genome Structural Variation Consortium, the Human Pangenome Reference Consortium, and the Bovine Pangenome Consortium, we are customizing and optimizing an integrated sequencing (platforms, coverage, and their combinations) and computational approach to achieve the best results for this project.
As described previously in https://www.nature.com/articles/s41467-018-08148-z, we are actively probing and testing alternatives to 10x Genomics linked reads, such as Pacific Biosciences (PacBio) Sequel II (CLR, CCS, or HiFi reads), Oxford Nanopore Technologies (ONT) PromethION, and Hi-C technologies.

Impacts
What was accomplished under these goals? Due to the COVID-19 pandemic, project progress was significantly delayed. Even so, for Objectives 1 and 2, we collected tissues from 4 Holstein cattle and 2 Hereford cattle. We tested various high molecular weight DNA isolation protocols on different tissues, including lung, liver, and sperm. We conducted 10x Genomics linked-read sequencing (55x) and performed Pacific Biosciences (PacBio) Sequel II (CLR, continuous long read, 40x; and CCS, circular consensus sequencing, 6x) and Oxford Nanopore Technologies (ONT) sequencing (11x) on selected lung DNA samples. Using these data, we tested various mapping programs, including LongRanger, NGMLR, and pbmm2. We then evaluated structural variation (SV) detection pipelines, such as LongRanger, LinkedSV, Sniffles, and PBSV, and compared their results using SURVIVOR to identify overlapping calls. When focusing on large insertions and deletions (i.e., copy number variations, CNVs), our results indicated a total of 26,962 CNV events spanning 188.29 Mb, corresponding to 6.93% of the cattle genome. From 10x Genomics linked reads alone, we detected 10,439 CNVs (53.89 Mb, 1.98% of the genome). From all long reads, we found 24,730 CNVs (143.05 Mb, 5.27% of the genome). When comparing these two datasets, we found that only 8,207 CNVs overlapped (8.67 Mb, representing 4.60% of the CNV region, or 0.32% of the genome), and long reads consistently performed better than short reads in terms of CNV detection. A recent human study obtained a similar result (https://pubmed.ncbi.nlm.nih.gov/32887686/): hundreds of known copy number variants were detectable only through long-read sequencing, and long-read-based SV inference generally outperformed that of short reads across various SV sizes and types.
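The CNV totals reported above are consistent with a simple inclusion-exclusion relationship between the linked-read and long-read call sets. The short check below uses only the figures quoted in the text; the variable names are illustrative, and the genome size is back-calculated from the reported percentage rather than taken from the report.

```python
# Figures quoted in the report (event counts and megabases covered).
linked_read = {"count": 10_439, "mb": 53.89}
long_read = {"count": 24_730, "mb": 143.05}
overlap = {"count": 8_207, "mb": 8.67}
reported_total_mb = 188.29

# Union = linked-read set + long-read set - shared events.
union_count = linked_read["count"] + long_read["count"] - overlap["count"]
union_mb = linked_read["mb"] + long_read["mb"] - overlap["mb"]
print(union_count)         # 26962
print(round(union_mb, 2))  # 188.27

# Genome size implied by "188.29 Mb is 6.93% of the genome" (an inference, in Mb).
genome_mb = reported_total_mb / 0.0693
print(round(genome_mb))    # 2717

# Overlap as a share of the total CNV region and of the implied genome size.
overlap_pct_of_cnv = 100 * overlap["mb"] / reported_total_mb
overlap_pct_of_genome = 100 * overlap["mb"] / genome_mb
print(round(overlap_pct_of_cnv, 2))     # 4.6
print(round(overlap_pct_of_genome, 2))  # 0.32
```

The event counts match the reported total of 26,962 exactly. The megabase sum differs from the reported 188.29 Mb by about 0.02 Mb, which is presumably a rounding effect in the quoted figures.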
For Objective 3, we successfully conducted five CNV discovery and association studies with economic traits in taurine, indicine cattle, and goats using short-read sequencing data.

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Kang, X., Li, M., Liu, M., Liu, S., Pan, M.G., Wiggans, G.R., Rosen, B.D. and Liu, G.E. 2020. Copy number variation analysis reveals variants associated with milk production traits in dairy goats. Genomics, 112(6): 4934-4937.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Hu, Y., Xia, H., Li, M., Xu, C., Ye, X., Su, R., Zhang, M., Nash, O., Sonstegard, T.S., Yang, L., Liu, G.E. and Zhou, Y. 2020. Comparative analyses of copy number variations between Bos taurus and Bos indicus. BMC Genomics, 21(1): 682.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Yang, L., Niu, Q., Zhang, T., Zhao, G., Zhu, B., Chen, Y., Zhang, L., Gao, X., Gao, H., Liu, G.E., Li, J. and Xu, L. 2020. Genomic sequencing analysis reveals copy number variations and their associations with economically important traits in beef cattle. Genomics, 113(1): 812-820.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Guo, J., Zhong, J., Liu, G.E., Yang, L., Li, L., Chen, G., Song, T. and Zhang, H. 2020. Identification and population genetic analyses of copy number variations in six domestic goat breeds and Bezoar ibexes using next-generation sequencing. BMC Genomics, 21(1): 840.
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Berton, M., Lemos, M., Chud, T., Stafuzza, N., Kluska, S., Amorim, S., Lopes, L., Pereira, A., Bickhart, D., Liu, G., Albuquerque, L., Baldi, F. 2021. Genome-wide association study between copy number variation regions and carcass and meat quality traits in Nellore cattle. Animal Production Science, 61(8):731-744.
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Gao, Y., Fang, L., Baldwin, R.L., Connor, E.E., Cole, J.B., Ma, L., Li, C., Liu, G.E. (2020) Single-cell transcriptomic analyses of cattle ruminal epithelial cells before and after weaning. Genomics, 113(4):2045-2055.


Progress 04/01/19 to 03/31/20

Outputs
Target Audience: Farmers, scientists, livestock industries, and policy planners who need to improve animal health and production based on genome-enabled animal selection.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
1. Reuben Anderson, technician, contributed to HMW DNA isolation and Nanopore sequencing.
2. Shuli Liu, Ph.D. student, contributed to SV and CNV calling algorithms.
3. Yahui Gao, postdoc, contributed to SV and CNV calling algorithms.
How have the results been disseminated to communities of interest? Attended the initial Bovine Pan-Genome Consortium meeting at the Plant and Animal Genome Conference, San Diego, CA, January 15, 2020, and helped to set up its website.
What do you plan to do during the next reporting period to accomplish the goals? There are a few pending patent lawsuits between Bio-Rad Laboratories and 10x Genomics, which may affect the availability of linked-read reagents, i.e., the GEM microfluidic chips from 10x Genomics. We are closely watching the impacts of these lawsuits and the ongoing COVID-19 pandemic. At the same time, teaming up with other researchers from the Bovine Pan-Genome Consortium, we are actively probing and testing alternatives, such as single tube long fragment read (stLFR), Pacific Biosciences Sequel, Oxford Nanopore PromethION, and Hi-C technologies. The long-term goals are to generate haplotype-resolved reference assemblies for selected cattle breeds from a pan-genome perspective; to map and phase structural variation, including copy number variation; and to further test their associations with cattle economic traits.

Impacts
What was accomplished under these goals? For Objective 1, we started by generating sequence data from third-generation sequencing and mapping platforms (PacBio, Oxford Nanopore, and 10x Genomics linked reads) and by participating in international efforts to assemble breed-specific genomes for dairy and beef breeds, as well as other related species. Based on new long-read and linked-read sequencing data from selected samples, we performed an initial comparative study of multi-platform discovery of haplotype-resolved structural variation. Using short-read sequencing and microarray data, we conducted CNV discovery and association studies in Holsteins and goats.

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Liu, M., Woodward Greene, M.J., Kang, X., Pan, M.G., Rosen, B.D., Van Tassell, C.P., Chen, H., Liu, G. 2020. Genome-wide CNV analysis revealed variants associated with growth traits in African indigenous goats. Genomics. 112(2):1477-1480. https://doi.org/10.1016/j.ygeno.2019.08.018.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Rosen, B.D., Bickhart, D.M., Schnabel, R.D., Koren, S., Elsik, C.G., Tseng, E., Rowan, T.N., Low, W.Y., Zimin, A., Couldrey, C., Hall, R., Li, W., Rhie, A., Ghurye, J., Mckay, S.D., Thibaud-Nissen, F., Hoffman, J., Murdoch, B.M., Snelling, W.M., Mcdaneld, T.G., Hammond, J.A., Schwartz, J.C., Nandolo, W., Hagen, D.E., Dreischer, C., Schultheiss, S.J., Schroeder, S.G., Cole, J.B., Van Tassell, C.P., Liu, G., Smith, T.P., Medrano, J.F. 2020. De novo assembly of the cattle reference genome with single-molecule sequencing. Gigascience. 9(3):1-9. https://doi.org/10.1093/gigascience/giaa021.