Source: UNIVERSITY OF CALIFORNIA, DAVIS submitted to
LOBLOLLY PINE GENOME PROJECT
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0224477
Grant No.
2011-67009-30030
Cumulative Award Amt.
$14,625,000.00
Proposal No.
2015-01718
Multistate No.
(N/A)
Project Start Date
Feb 1, 2011
Project End Date
Jan 31, 2018
Grant Year
2015
Program Code
[A6141]- Sustainable Bioenergy: National Loblolly Pine Genome Sequencing
Project Director
Neale, D. B.
Recipient Organization
UNIVERSITY OF CALIFORNIA, DAVIS
410 MRAK HALL
DAVIS, CA 95616-8671
Performing Department
Plant Sciences
Non Technical Summary
We will use a novel strategy based on state-of-the-art cloning, next-generation sequencing, and assembly technologies to sequence the genome of loblolly pine (Pinus taeda), the largest genome yet sequenced. Our approach will be "adaptive" because it is not possible to predict the capacity of the DNA sequencing technologies that will be available in the later years of the project. Because of our novel sequencing approach, we have the capacity to sequence not only one loblolly pine genome but also two additional conifer genomes: sugar pine (P. lambertiana) and Douglas-fir (Pseudotsuga menziesii). We will develop new sequencing approaches that can rapidly and inexpensively sequence genomes exceeding 20 Gb in size, and we will fully integrate the resulting genome sequences with the existing genomic information resources for these species so that they can be used by researchers, breeders, and resource managers. The primary vehicles for accomplishing this goal are the Dendrome/TreeGenes databases, which have served the forestry community for 20 years. In addition, we will form an alliance with the horticultural genomics community through the Genome Database for Rosaceae (GDR) project. Together, we will develop community-based gene annotation and genome database platforms that leverage the strengths of both the forestry and the horticultural communities.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    0611                              1080                      80%
201                    0612                              1080                      20%
Goals / Objectives
Specific Aim 1 -- High-quality reference genome sequences of loblolly pine (LP), sugar pine, and Douglas-fir
Effective deployment of new technologies in a hierarchical Whole Genome Shotgun (WGS) approach will yield reference sequences based on well-defined milestones. An initial and early deliverable will be 21X WGS sequence and preliminary assemblies (gene-boosted and whole genome) of the LP genome based on >= 100 bp paired-end Illumina sequences of a mix of 500-bp, 5-kbp, and 40-kbp (fosmid-diTag) libraries. In less than two years a 10x18 hierarchical WGS (180X total read depth) based on 18X (read depth of 500-bp, 5-kbp, and 40-kbp libraries) of many small pools of fosmids will be the fundamental data for two types of assemblies: a consensus based on all the data and a second consensus based on hierarchical analysis of subassemblies of the haploid fosmid pools. Polishing will follow, including longer end reads from a 10X BAC library, deep fosmid-end sequencing, and existing or emerging long-read technologies that are deemed effective for improving assembly quality. A high-resolution (1.0 cM) genetic scaffold based on a new genotyping resource will incorporate all genotypable contigs and validate the contiguity of larger ones. In later years comparable reference sequences for sugar pine and Douglas-fir will be created. Comparative genomic analysis of these three conifer genomes will provide a solid and rich annotation and further improve assembly quality and contiguity.
Specific Aim 2 -- Transcriptome sequencing for gene discovery, reference building, and aids to genome assembly
We will build transcriptome references using a sequencing approach that maximizes evidence-based gene discovery in parallel with the reference genome assembly and annotation, and we will provide full transcript assemblies for functional genomics studies. RNAs from a large number of loblolly pine organs, stages of development, and tissues exposed to biotic and abiotic stresses will be sequenced using RNA-Seq libraries and Illumina sequencing. Data will be used first to add depth and detail to the transcriptome and to catalog transcribed polymorphisms. Transcriptome analysis will profile gene expression differences of biological importance, including changes during the development of reproductive tissues, embryos and seedlings, and wood, and in response to biotic and abiotic stresses.
Specific Aim 3 -- Dendrome and TreeGenes databases: Annotation, data integration, and distribution
The transcriptome and genome sequences will be delivered via TreeGenes as sequence becomes available. Collaboration with GDR will provide the primary annotation and integrate a custom web-based tool known as GenSAS from GDR with GBrowse from Dendrome to facilitate community-level annotation. We will apply and expand existing pipelines to deliver a comprehensive SNP resource and distribute it through the existing DiversiTree interface. We will work with existing projects such as Gene Ontology and Plant Ontology to implement specific conifer-based ontologies that consistently describe gene products and phenotypes. All pipelines and tools developed in this project will be made freely available.
Project Methods
Sequencing. Our strategy divides the genome into many random haploid partitions, each with a significantly smaller genome size than the complete genome directly obtained from diploid pine trees. We will also make deep and representative fosmid libraries (40 kbp inserts) from which random samples of thousands of clones (pools) can be easily manipulated and managed.
Assembly. Two assemblies will be produced. The first is the traditional consensus of the input genotypes, in our case a mosaic of the maternal and paternal genomes. For the second, we will reconstruct, as far as possible, the paired haploid sequences of the two parental genomes using fosmid overlaps and the genetic map. The first consensus assembly will have over 180X read coverage. Using a second strategy, we will assemble each fosmid pool separately, which should produce relatively large contigs due to the deep coverage. We can then take the resulting assemblies and assemble them together using Celera Assembler, Minimus, or another package.
Mapping. Genetic mapping will be used to assign and order scaffolds to the 12 loblolly pine linkage groups (chromosomes). We estimate that ~75,000 scaffolds will need to be mapped. Two segregating SNPs per scaffold will be identified during the assembly of scaffolds. An Illumina Infinium SNP genotyping chip(s) of 150,000 SNPs will be designed and applied to the reference mapping population of 500 progeny. Mapping activity will be spread out over all 5 years. In the spring of 2011 we will generate a new high-density mapping population of 5,000 to 10,000 individuals that could be used for mapping by the broader conifer genomics community.
Annotation. Genomic contigs will be computationally annotated with MAKER and PASA. The annotation pipeline will be applied in two distinct stages. Stage 1 predictions will encompass gene finders that take advantage of alignments of expressed sequences such as proteins, ESTs, and assembled mRNAs within a single genome. Stage 2 will incorporate nucleotide-level multi-genome alignments to enhance stage 1 predictions. Manual curation of both genomic and transcriptomic sequence will be facilitated through the use of a custom web-based application, GenSAS. We will provide a list of genes for which there is conflicting evidence based on inconsistency in transcript mapping onto predicted genes, similar to the model used in PlantGDB (www.plantgdb.org). Contributing to the overall effort to curate plant genes, we will coordinate with the NSF-funded WikiPlantGene to exchange curated genes between our respective databases and participate in the PlantGene workshops.
Database/Software. The TreeGenes database, as part of the Dendrome project, is currently the sole resource for comprehensive data submission, retrieval, and comparative genomics analysis for forest trees. Online analytical tools and services, based on high-throughput computational methods and manual curation, enable the research community to interrogate this information, thus catalyzing discovery.
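The planning figures quoted in the methods above follow from simple arithmetic. The sketch below only restates that arithmetic; the constant and function names are hypothetical, the meaning of the first factor in the "10x18" design is not spelled out in the text, and the genotype-call count is a derived illustration rather than a number from the proposal.

```python
# Planning figures quoted in the project methods; nothing here is measured data.
HIERARCHY_FACTOR = 10      # first factor of the "10x18" hierarchical WGS design
DEPTH_PER_POOL = 18        # 18X read depth from the 500-bp, 5-kbp, and 40-kbp libraries
SCAFFOLDS_TO_MAP = 75_000  # estimated number of scaffolds to place on linkage groups
SNPS_PER_SCAFFOLD = 2      # segregating SNPs identified per scaffold
PROGENY = 500              # size of the reference mapping population


def total_read_depth(factor: int, depth: int) -> int:
    """Total read depth of the hierarchical design (10 x 18 = 180X)."""
    return factor * depth


def snps_required(scaffolds: int, snps_per_scaffold: int) -> int:
    """Number of SNP assays needed to tag every scaffold."""
    return scaffolds * snps_per_scaffold


def genotype_calls(snps: int, progeny: int) -> int:
    """Genotype calls produced if every SNP is scored in every progeny (illustrative)."""
    return snps * progeny


if __name__ == "__main__":
    depth = total_read_depth(HIERARCHY_FACTOR, DEPTH_PER_POOL)
    snps = snps_required(SCAFFOLDS_TO_MAP, SNPS_PER_SCAFFOLD)
    print(f"Total read depth: {depth}X")
    print(f"SNPs on the genotyping chip: {snps:,}")
    print(f"Genotype calls across the mapping population: {genotype_calls(snps, PROGENY):,}")
```

The 150,000-SNP chip size in the methods matches two SNPs for each of the ~75,000 scaffolds.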

Progress 02/01/11 to 01/31/18

Outputs
Target Audience: Genomics researchers via publications in refereed journals and presentations at meetings and conferences.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? To date, the Project has offered training opportunities for high school students, undergraduates, graduate students, technical staff, postdoctoral scholars, and professionals or advanced researchers. In some instances, the Project provided salary and benefits for the trainees; in others, the trainees were funded by other projects. In the first category (Project supported), there have been 33 undergraduate students, two graduate students, seven postdoctoral scholars, and 21 technical staff. In the second category, there have been one high school student, five undergraduate students, four graduate students, five postdoctoral scholars, and one professional/researcher.
How have the results been disseminated to communities of interest? Over the course of the project, the Project Director, co-Project Directors, and other project researchers disseminated information and results about the project by means of talks (~80) and posters (~15) at a wide range of venues that reached other genomics researchers, collaborators, and end-users. Some of these venues were meetings organized by the project: the Pine Genome Reference Sequence Workshops organized at the International Plant and Animal Genome Conferences in 2012 (XX), 2013 (XXI), 2014 (XXII), and 2015 (XXIII), and the Recreating Forestry through Science Workshop at the 2015 National Convention of the Society of American Foresters, Baton Rouge, Nov. 3-7, 2015. In addition, the project contributed to and completed what became a 17-part series of Conifer Genomics Learning Modules, hosted at the eXtension website: http://www.extension.org/pages/60370/conifer-translational-genomics-network-online-modules.
Utility of project genomic resources and tools
Led by N. C. Wheeler, the Extension and Education coordinator, a survey was conducted on how the approaches and products of the project have found utility in the scientific community. The final report was filed in 2016. Examples of usage included:
PINEMAP (http://pinemap.org/). The USDA-NIFA funded PINEMAP project is using the loblolly pine genome sequence in a routine manner to design genotyping-by-sequencing assays to discover genetic variation in loblolly pine for application in marker-based breeding. Groups at Texas A&M University, University of Florida, North Carolina State University, and Virginia Tech all rely on the genome sequence to conduct their research.
AdapTree. The Genome Canada- and Genome British Columbia-funded AdapTree project used the loblolly pine reference genome sequence to design a genotyping-by-sequencing assay for lodgepole pine toward understanding the adaptive genetic potential of this important species in the face of changing climate. AdapTree researchers reported that the loblolly pine sequence was extremely important for their work in lodgepole pine, which suggests that other researchers around the world are using the sequence for similar applications in many other pines and conifers.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? The accomplishments of the five originally funded years and the two no-cost extension years are presented below, organized by research aim.
Specific Aim 1 -- High-quality reference genome sequences of loblolly pine, sugar pine, and Douglas-fir
Specific Aim 2 -- Transcriptome sequencing for gene discovery, reference building, and aids to genome assembly
Loblolly pine (Pinus taeda). An improved assembly (V2.0) of loblolly pine was completed and released, based primarily on an additional 11X very-long-read whole genome shotgun (WGS) sequence coverage from the Pacific Biosciences (PacBio) sequencing platform. Thus, the assembly is a hybrid PacBio/Illumina assembly and takes advantage of better super-reads from additional paired-end loblolly pine sequencing to error-correct the PacBio reads. A manuscript describing V2.0 was published (Zimin et al. 2017. GigaScience 6(1):1-4). The V2.0 assembly has been deposited at NCBI under BioProject PRJNA174450, and the PacBio reads are under the same project with accession number SRP034079. The V2.0 assembly has facilitated computation of single-nucleotide polymorphisms (SNPs) in loblolly pine. Whole-genome resequencing data (with coverage of ~11X genome equivalents) were generated from ten different individual trees. These data (about 800 million reads per tree) were aligned to the loblolly pine V2.0 assembly. The initial result is a very large set of SNPs from which a subset was generated for application in genotyping by end users. PacBio Iso-Seq transcriptome resources generated by collaborator Ross Whetten were assembled and annotated by the PineRefSeq team. These resources were used to further scaffold the V2.0 assembly to generate V2.01. This version has been annotated with both new and previously generated transcriptomic resources. Assemblies V1.0, V1.01, and V2.01 are available at https://treegenesdb.org/FTP/Genomes/Pita/.
Sugar pine (P. lambertiana). The initial WGS assembly (V1.0, a MaSuRCA-SOAPdenovo2 assembly) of the sugar pine genome has been completed and released along with the de novo sugar pine transcriptome assemblies. Genome assembly V1.0 and the transcriptome assembly have been published (Stevens et al. 2016. Genetics 204(3):1613-1626 and Gonzalez-Ibeas et al. 2016. G3 6:3787-3802, respectively). The assembly and annotation are available from GenBank as accession GCA_001447015.2 and BioProject 174450. Genomic DNA and RNA reads are also available under BioProject 174450. A rescaffolding of sugar pine was performed using barcoded sub-haploid pools of high molecular weight DNA following the 10X Genomics protocol. The new assembly, V1.5, has a scaffold N50 exceeding 1 Mbp. The additional contiguity has confirmed the DiTag links used to discover the Cr1 gene candidate presented in Stevens et al. 2016 (Genetics). The rescaffolding of the genome has increased the number of bases genetically linked to Cr1 from ~2 Mbp to ~6 Mbp, and also the number of gene candidates. This rescaffolding effort was reported in Crepeau et al. 2017. G3 7:1563-1568. The sequence data used for this study were deposited in the NCBI trace archive under BioProject 174450 and accession number SRX2629912. Assemblies V1.0 and V1.5 are available at https://treegenesdb.org/FTP/Genomes/Pila/
Douglas-fir (Pseudotsuga menziesii). A MaSuRCA-SOAPdenovo2 assembly (V1.0) of the Douglas-fir genome from WGS sequencing on the Illumina HiSeq platform at 60X coverage has been completed and released (Neale et al. 2017. G3 7(9):3157-3167). The V1.0 genome assembly has been deposited at NCBI as accession LPNX000000000 in BioProject PRJNA174450 and raw sequence data have been deposited in the NCBI SRA database under accession SAMN03333061. Genome assemblies V0.5 and V1.0 are available at https://treegenesdb.org/FTP/Genomes/Psme/. Transcriptome resources from needle tissue were generated by collaborator Richard Cronn and assembled by the PineRefSeq team for scaffolding and annotation (Cronn et al. 2017. BMC Genomics 18:558). The raw sequence and mapped count data are available from the NCBI GenBank Short Read Archive under accession SRP018395 and the Gene Expression Omnibus under accession GSE44058.
Specific Aim 3 -- Annotation, data integration, and distribution (tools, resources)
The project website (https://pinerefseq.faculty.ucdavis.edu/) provides a home page with general information about the project (Aims and Principles) and links to the genomic resources, with additional pages for presenting project participants (Members), learning resources (Outreach), a roster of project presentations and posters (Presentations), a roster of project publications (Publications), and the original deliverable schedule (Timetable). The TreeGenes database (https://treegenesdb.org/Drupal), developed initially under the auspices of the project, provides access to genome resources developed by the project for each of the three species, loblolly pine (Pita), sugar pine (Pila), and Douglas-fir (Psme), at the following URLs:
https://tgwebdev.cam.uchc.edu/FTP/Genomes/Pila/
https://tgwebdev.cam.uchc.edu/FTP/Genomes/Pita/
https://tgwebdev.cam.uchc.edu/FTP/Genomes/Psme/
With one exception, these resources include for each species the genome assemblies, transcriptomes, repeat annotations, and mitochondrial genomes. The exception is the mitochondrial genome for Douglas-fir, which can be found at ftp://ccb.jhu.edu/pub/data/Douglas-fir/mito/mito.fa. Another resource is GenSAS, an online annotation pipeline (currently running at v5.1, https://www.gensas.org/) for whole genome structural annotation that has been developed and evaluated with genome sequences and support from this project.
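The sugar pine V1.5 assembly above is summarized by its scaffold N50. For readers unfamiliar with the statistic, here is a minimal sketch of the conventional N50 calculation: the length L such that scaffolds of length L or longer contain at least half of the assembled bases. The function name and the toy lengths are hypothetical; they are not taken from any project assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length of
    the scaffold at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: running total always reaches the total")


if __name__ == "__main__":
    # Toy example with made-up lengths (not project data): total = 32 bases,
    # and the two longest scaffolds (8 + 8) already cover half of it.
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_lengths))  # prints 8
```

In practice the lengths would be read from an assembly FASTA file; that step is omitted here to keep the sketch self-contained.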

Publications

  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Neale, D. B. 2013. Genomics of forest trees: Genome sequencing, marker-assisted breeding, and landscape genomics. CEBAS-CSIC Murcia, Spain. 6 February 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Neale, D. B. 2013. Genomics of forest trees: Genome sequencing, marker-assisted breeding, and landscape genomics. Barcelona CRAG, Spain. 8 February 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Neale, D. B. 2013. Genome to germplasm meeting. INRA Versailles, France. 28 February - 2 March 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Mockaitis, K. 2013. Defining genes in the largest genome: Transcript discovery in loblolly pine. Indiana University-Purdue University (IUPUI), Indianapolis IN. 26 April 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Wheeler, N. C. 2013. The pine reference genome sequence and applied tree breeding. Western Gulf Tree Improvement Cooperative Meeting. Idabel OK. 15-16 May 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Mockaitis, K. 2013. Activity and adaptation in two tree genomes: Approaches to gene annotation and intraspecies genic diversity in Theobroma cacao and Pinus taeda. LANGEBIO, CINVESTAV, Irapuato, Mexico. 11 June 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Wheeler, N. C. 2013. The pine reference genome sequence and applied tree breeding. Southern Forest Tree Improvement Conference. Clemson SC. 10-13 June 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Zimin, A. et al. 2013. Assembly of complex genomes. Livestock functional genomics summer school, Araçatuba, Brazil. 13-21 September 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Mockaitis, K. 2013. Deep transcriptomes of loblolly pine: Building references for functional genomics in the largest genome sequenced to date. Arkansas Center for Plant Powered Production, Jonesboro AR. 16 October 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Wheeler, N. C. 2013. Conifer reference genome sequences and applied tree breeding. Pacific Northwest Tree Improvement Research Cooperative Meeting, World Forestry Center, Portland OR. 24 October 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Marçais, G. et al. 2013. The MaSuRCA genome assembler. Poster, Biology of Genomes Conference, Cold Spring Harbor Laboratory, New York. May 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Neale, D. B. 2012. Landscape genomic approaches to understanding plant adaptation to the environment in changing climate. Eidg. Forschungsanstalt für Wald, Schnee und Landschaft (WSL), Birmensdorf/Zurich, Switzerland. 8-9 February 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Puiu, D. et al. 2013. Sequencing and assembly of the 22-gigabase genome of loblolly pine. Poster, Biology of Genomes Conference, Cold Spring Harbor Laboratory, New York. May 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Neale, D. B. 2012. Genomics-based breeding in forest trees: Are we there yet and why we need a reference genome to finish the job. USDA NIFA and DOE BER Project Director Meeting. Town and Country Resort, San Diego, CA, 13 January 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Neale, D. B. 2012. Landscape genomics of North American conifers. Adaptive Landscape Genetics current insights and future directions. University of Neuchatel, Switzerland. 7-8 February 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Neale, D. B. 2012. Possibilities for genome sequencing in alpine forest tree species. AForGen Workshop, National Park Adamello-Brenta. Italy. 25-26 June 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Neale, D. B. 2012. Landscape approaches to studying plant adaptation in natural plant populations. New Phytologist Workshop: Ecological and Evolutionary Genomics of Plant Adaptation, Old Townhouse, University of Aberdeen, United Kingdom. 28-29 June 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Neale, D. B. 2012. Forest tree genomics: Genome sequencing (de novo and resequencing), marker-based breeding, and landscape genomics. Fronteras en Ecologia y Evolucion Seminarios. Instituto de Ecologia UNAM, Mexico City, Mexico. 24 September 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2011 Citation: Neale, D. B. 2011. Genomics-based breeding in forest trees: are we there yet? IUFRO Tree Biotechnology Conference. Porto Seguro, Brazil. 25 June - 2 July 2011
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Stevens, K. A. 2012. PineRefSeq: Experience and challenges in constructing high quality reference genomes for conifers. Noveltree Workshop on genome analysis tools applied to forest tree breeding, Vantaa, Finland. 18 October 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2011 Citation: Neale, D. B. 2011. Genotype to phenotype forest tree genomics: Genome sequencing (de novo and resequencing), marker-based breeding and landscape genomics. ICRAF Annual Science Week Meeting, World Agroforestry Centre, Nairobi, Kenya. 12-17 September 2011
  • Type: Conference Papers and Presentations Status: Other Year Published: 2011 Citation: Neale, D. B. 2011. Conifer genome sequencing. Evolutionary origins and development of woody plants. National Evolutionary Synthesis Center. NESCent Catalysis Meeting. Durham, NC. 14-16 October 2011
  • Type: Conference Papers and Presentations Status: Other Year Published: 2011 Citation: Neale, D. B. 2011. Tree biology - CyberInfrastructure, DNA to the globe. IPlant Collaborative. Tree Biology Meeting. Keating 433, U of Arizona, Tucson, AZ. 6 December 2011
  • Type: Conference Papers and Presentations Status: Other Year Published: 2011 Citation: Neale, D. B. 2011. Project presentation - PineRefSeq Project. AdapTree SAB meeting. Forest Science Centre. U of British Columbia, Vancouver, BC Canada. 9 December 2011
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Neale, D., deJong, P. J., Langley, C. H., Salzberg, S. L., Yorke, J. A., Mockaitis, K., Loopstra, C., Main, D., Wegrzyn, J. 2012. Introduction to the PineRefSeq Project. International Plant and Animal Genome XX Conference, Pine Genome Reference Sequence Workshop, San Diego CA. 14 January 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Langley, C. 2012. A two-pronged approach to the loblolly genome sequence. International Plant and Animal Genome XX Conference, W511, Pine Genome Reference Sequence Workshop, San Diego CA. 14 January 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Zimin, A., Marçais, G., Roberts, M., Salzberg, S. L., Yorke, J. A. 2012. Assembly of huge plant genomes from WGS data. International Plant and Animal Genome XX Conference, W512, Pine Genome Reference Sequence Workshop, San Diego CA. 14 January 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Mockaitis, K., Loopstra, C., Main, D., Wegrzyn, J. 2012. Gene expression in loblolly pine early development. International Plant and Animal Genome XX Conference, W513, Pine Genome Reference Sequence Workshop, San Diego CA. 14 January 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Wegrzyn, J. 2012. Integrating genome and transcriptome resources into the TreeGenes database. International Plant and Animal Genome XX Conference, W514, Pine Genome Reference Sequence Workshop, San Diego CA. 14 January 2012
  • Type: Conference Papers and Presentations Status: Other Year Published: 2011 Citation: Neale, D. B. 2011. Forest tree genomics: Genome sequences, marker-based breeding and landscape genomics. Gregor Mendel Institute of Molecular Plant Biology Seminar. Austrian Academy of Science. Vienna, Austria. 27 September 2011
  • Type: Conference Papers and Presentations Status: Other Year Published: 2011 Citation: Neale, D. B. 2011. Pine sequencing. EFI 2011 Annual Conference Week. Swedish University of Agriculture Sciences, Uppsala, Sweden. 28-30 September 2011
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Langley, C.H., Wegrzyn, J. L., Stevens, K., Zimin, A., Puiu, D., Crepeau, M., Cardeno, C., Koriabine, M., Holtz-Morris, A., Liechty, J., Martinez-Garcia, P. J., Vasquez-Gross, H., Lin, B. Y., Zieve, J. J., Dougherty, M., Fuentes-Soriano, S., Wu, L., Gilbert, D., Marçais, G., Roberts, M., Holt, C., Yandell, M., Davis, J. M., Smith, K., Dean, J. F. D., Lorenz, W. W., Whetten, R. W., Sederoff, R., Wheeler, N., McGuire, P., Main, D., Mockaitis, K., Loopstra, C., deJong, P. J., Yorke, J. A., Salzberg, S. L., Neale, D. B. 2014. The loblolly pine genome, v1.0. International Plant and Animal Genome XXII Conference, W558, Pine Genome Reference Sequence Workshop, San Diego, CA. 11 January 2014
  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Neale, D.B., McGuire, P.E., Wheeler, N.C., Stevens, K.A., Crepeau, M.W., Cardeno, C., Zimin, A.V. Puiu, D., Pertea, G.M., Sezen, U. U., Casola, C., Koralewski, T.E., Paul, R., Gonzalez-Ibeas, D., Zaman, S., Cronn, R., Yandell, M., C. Holt, Langley, C.H., Yorke, J.A., Salzberg, S.L., Wegrzyn, J.L. 2017. The Douglas-fir genome sequence reveals specialization of the photosynthetic apparatus in Pinaceae. G3: Genes|Genomes|Genetics, 7(9):3157-3167.
  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Neale, D.B., Martinez-Garcia, P.J., De La Torre, A.R., Montanari, S., Wei, X. (2017) Novel insights into tree biology and genome evolution as revealed through genomics. Annual Review of Plant Biology, 68(6):1-1.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Wegrzyn, J.L., Liechty, J., Stevens, K., Loopstra, C., Vasquez-Gross, H., Lin, B., Dougherty, M., Zieve, J., Martinez-Garcia, P. J., Holt, C., Yandell, M., Zimin, A., Yorke, J. A., Crepeau, M., Puiu, D., Salzberg, S. L., deJong, P. J., Mockaitis, K., Main, D., Langley, C. H., Neale, D. 2014. Annotation of the loblolly pine megagenome. International Plant and Animal Genome XXII Conference, W559, Pine Genome Reference Sequence Workshop, San Diego, CA. 11 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Zimin, A., Marçais, G., Puiu, D., Stevens, K., Roberts, M., Salzberg, S., Yorke, J. 2015. Moving toward loblolly pine 2.0, improvements of assembly techniques for mega-genomes. International Plant and Animal Genome XXIII Conference, W570, Pine Genome Reference Sequence Workshop, San Diego, CA. 10 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Grau, E., Vasquez-Gross, H., Liechty, J., Zieve, J., Gessler, D., Neale, D., Wegrzyn, J. 2015. TreeGenes: a comprehensive resource for forest tree genomics. International Plant and Animal Genome XXIII Conference, W311, Forest Tree Workshop, San Diego, CA. 11 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Grau, E., Vasquez-Gross, H., Liechty, J., Zieve, J., Gessler, D., Neale, D., Wegrzyn, J. 2015. TreeGenes and CartograTree: tools for forest tree genomics. International Plant and Animal Genome XXIII Conference, C19 Computer Demonstration, San Diego, CA. 13 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Gonzalez-Ibeas, D., Martinez-Garcia, P. J., Famula, R., Loopstra, C. A., Puryear, J., Neale, D. B., Wegrzyn, J. L. 2015. Survey of the sugar pine (Pinus lambertiana) transcriptome by deep sequencing. International Plant and Animal Genome XXIII Conference, Poster P0987, San Diego, CA. 10-14 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Paul, R., Stevens, K. A., Martinez-Garcia, P. J., Zimin, A., Holtz-Morris, A., Yorke, J. A., Koriabine, M., Crepeau, M., Puiu, D., Salzberg, S. L., deJong, P. J., Langley, C. H., Kurugunti, S., Neale, D. B., Wegrzyn, J. L. 2015. Repeat sequence characterization in sugar pine (Pinus lambertiana) and loblolly pine (Pinus taeda). International Plant and Animal Genome XXIII Conference, Poster P0988, San Diego, CA. 10-14 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Stevens, K., Crepeau, M., Puiu, D., Zimin, A., Wegrzyn, J., Koriabine, M., Cardeno, C., Holtz-Morris, A., deJong, P. J., Salzberg, S. L., Yorke, J. A., Langley, C., Neale, D. B. 2015. A reference genome sequence for sugar pine (Pinus lambertiana). International Plant and Animal Genome XXIII Conference, W568, Pine Genome Reference Sequence Workshop, San Diego, CA. 10 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Sezen, U. U., Baker, E. A. G., Falk, T., Maloney, P. E., Vogler, D. R., Delfino-Mix, A., Jensen, C. E., Mitton, J. B., Wright, J., Knaus, B., Rai, H. S., Cronn, R., Gonzalez-Ibeas, D., Vasquez-Gross, H., Famula, R., Liu, J.-J., Kueppers, L. M., Salzberg, S. L., Langley, C. H., Stevens, K., Puiu, D., Zimin, A., Yorke, J. A., Crepeau, M., Loopstra, C., Puryear, J., Neale, D., Wegrzyn, J. 2016. Signatures of selection among de novo assembled transcriptomes of four white pine species. International Plant and Animal Genome XXIV Conference, W740, Population and Conservation Genomics Workshop, San Diego, CA. 9 January 2016
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Martinez-Garcia, P. J., Gonzalez-Ibeas, D., Famula, R. A., Delfino-Mix, A., Stevens, K. A., Puryear, J. D., Loopstra, C. A., Langley, C. H., Neale, D. B., Wegrzyn, J. L. 2016. A comprehensive study of the sugar pine (Pinus lambertiana) transcriptome implemented through diverse next-generation sequencing approaches. International Plant and Animal Genome XXIV Conference, W321, Forest Tree Workshop, San Diego, CA. 10 January 2016
  • Type: Conference Papers and Presentations Status: Other Year Published: 2017 Citation: Sezen, U. U., Neale, D. B., McGuire, P. E., Wheeler, N. C., Stevens, K. A., Crepeau, M. W., Cardeno, C., Zimin, A. V., Puiu, D., Pertea, G. M., Sablok, G., Casola, C., Koralewski, T. E., Paul, R., Gonzalez-Ibeas, D., Cronn, R., Yandell, M., Holt, C., Langley, C. H., Yorke, J. A., Salzberg, S. L., Wegrzyn, J. L. 2017. A reference draft genome for Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco). International Plant and Animal Genome XXV Conference, W352, Forest Tree Workshop, San Diego, CA. 15 January 2017
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Martinez-Garcia, P. J., Stevens, K., Crepeau, M., Liechty, J., Cardeno, C., Wegrzyn, J. L., Langley, C. H., Neale, D. 2014. Genetic mapping of scaffolds by whole genome shotgun sequencing. International Plant and Animal Genome XXII Conference, W563, Pine Genome Reference Sequence Workshop, San Diego, CA. 11 January 2014
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Neale, D. B., Langley, C. H., Salzberg, S. L., Wegrzyn, J. L. 2013. Open access to tree genomes: The path to a better forest. Genome Biology, 14:120.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Main, D., Ficklin, S. P., Lee, T., Humann, J. L., Cheng, C.-H., Wegrzyn, J., Neale, D. 2015. GenSAS. International Plant and Animal Genome XXIII Conference, W571, Pine Genome Reference Sequence Workshop, San Diego, CA. 10 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Wegrzyn, J., Stevens, K., Paul, R., Gonzalez-Ibeas, D., Martinez-Garcia, P. J., Liechty, J., Vasquez-Gross, H., Kuruganti, S., Grau, E., Loopstra, C., Zimin, A., Yorke, J. A., Crepeau, M., Puiu, D., Holt, C., Yandell, M., Salzberg, S. L., deJong, P. J., Mockaitis, K., Main, D., Langley, C. H., Neale, D. 2015. Sugar pine annotation. International Plant and Animal Genome XXIII Conference, W569, Pine Genome Reference Sequence Workshop, San Diego, CA. 10 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Zimin, A., Salzberg, S. L., Puiu, D., Stevens, K., Roberts, M., Yorke, J. A., Marçais, G., Langley, C. 2014. Assembly improvements to move beyond loblolly pine assembly v1.0. International Plant and Animal Genome XXII Conference, W564, Pine Genome Reference Sequence Workshop, San Diego, CA. 11 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Ficklin, S. P., Lee T., Humann, J. L., Cheng, C.-H., Wegrzyn, J., Neale, D., Main, D. 2015. GenSAS: A web-based platform for automated and manual curation of genomic sequence. International Plant and Animal Genome XXIII Conference, C03 Computer Demonstration, San Diego, CA. 10 January 2015
  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Cronn, R., Dolan, P. C., Jogdeo, S., Wegrzyn, J. L., Neale, D. B., St. Clair, J. B., Denver, D. R. 2017. Transcription through the eye of a needle: Daily and annual cyclic gene expression variation in Douglas-fir needles. BMC Genomics, 18:558.
  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Crepeau, M. W., Langley, C. H., Stevens, K. A. 2017. From pine cones to read clouds: Rescaffolding the megagenome of sugar pine (Pinus lambertiana). G3: Genes|Genomes|Genetics, 7:1563-1568.
  • Type: Journal Articles Status: Published Year Published: 2015 Citation: Marçais Guillaume, James A Yorke, Aleksey V Zimin. 2015. QuorUM: An error corrector for Illumina reads. PLoS ONE, 10(6):e0130821.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Mockaitis, K., Wu, L.-S., Loopstra, C., Fuentes-Soriano, S., Gilbert, D. 2014. Gene expression in conifers revealed on a comprehensive scale: Sequencing, assembly, and classification of the loblolly pine transcriptome. International Plant and Animal Genome XXII Conference, W561, Pine Genome Reference Sequence Workshop, San Diego, CA. 11 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Stevens, K., Crepeau, M., Martinez-Garcia, P. J., Zimin, A., Puiu, D., Koriabine, M., Cardeno, C., Holtz-Morris, A., deJong, P. J., Salzberg, S. L., Yorke, J. A., Langley, C., Neale, D. 2014. Recent progress in the shotgun sequencing of conifer mega-genomes. International Plant and Animal Genome XXII Conference, W562, Pine Genome Reference Sequence Workshop, San Diego, CA. 11 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Ficklin, S. P., Lee T., Humann, J. L., Cheng, C.-H., Wegrzyn, J., Neale, D., Main, D. 2015. GenSAS: A web-based platform for automated and manual curation of genomic sequence. International Plant and Animal Genome XXIII Conference, Poster P1153, San Diego, CA. 10-14 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Grau, E., Vasquez-Gross, H., Liechty, J., Zieve, J., Gessler, D., Neale, D., Wegrzyn, J. 2015. TreeGenes and CartograTree: tools for forest tree genomics. International Plant and Animal Genome XXIII Conference, Poster P1232, San Diego, CA. 10-14 January 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Zieve, J., Vasquez-Gross, H., Gessler, D., Neale, D., Wegrzyn, J. 2014. CartograTree: Enabling forest tree genomics through association studies. International Plant and Animal Genome XXII Conference, C11 Computer Demonstration, San Diego, CA. 13 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Chhatre, V. E., Westbrook, J., Echt, C. S., Gomide Neves, L., Martinez-Garcia, P. J., Munoz, P. R., Kirst, M., Peter, G. F., Neale, D., Davis, J. M., Nelson, C. D. 2014. A high-density, consensus linkage map for loblolly pine. International Plant and Animal Genome XXII Conference, Poster P0490, San Diego, CA. 10-14 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Wegrzyn, J., Zieve, J., Vasquez-Gross, H., Lin, D., Zheng, J., Neale, D. 2014. Bioinformatic solutions in forest genomics: Accessing the TreeGenes database. International Plant and Animal Genome XXII Conference, C13 Computer Demonstration, San Diego, CA. 13 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Wegrzyn, J., Zieve, J., Vasquez-Gross, H., Lin, D., Zheng, J., Neale, D. 2014. Bioinformatic solutions in forest genomics: Accessing the TreeGenes database. International Plant and Animal Genome XXII Conference, Poster P1008, San Diego, CA. 10-14 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Grau, E., Demurjian Jr., S. A., Vasquez-Gross, H., Gessler, D., Staton, M., Jung, S., Feltus, A., Main, D., Ficklin, S. P., Neale, D., Wegrzyn, J. 2016. TreeGenes: Enabling visualization and analysis in forest tree genomics. International Plant and Animal Genome XXIV Conference, W970, Tripal Database Network and Initiatives Workshop, San Diego, CA. 10 January 2016
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Zieve, J., Vasquez-Gross, H., Gessler, D., Neale, D., Wegrzyn, J. 2014. CartograTree: Enabling forest tree genomics through association studies. International Plant and Animal Genome XXII Conference, Poster P1049, San Diego, CA. 10-14 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Wegrzyn, J., Liechty, J., Stevens, K., Loopstra, C., Vasquez-Gross, H., Lin, B., Zieve, J., Martinez-Garcia, P. J., Holt, C., Yandell, M., Zimin, A., Yorke, J. A., Crepeau, M., Puiu, D., Salzberg, S. L., deJong, P. J., Mockaitis, K., Main, D., Langley, C. H., Neale, D. 2014. Introns, ancient transposable elements, and gene families revealed in the annotation of the loblolly pine (Pinus taeda L.) megagenome. International Plant and Animal Genome XXII Conference, W512, Next Generation Genome Annotation and Analysis Workshop, San Diego, CA. 11 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Paul, R., Stevens, K. A., Gonzalez-Ibeas, D., Pratt, K., Zimin, A., Yorke, J. A., Holtz-Morris, A., Koriabine, M., Crepeau, M., Puiu, D., Salzberg, S. L., deJong, P. J., Langley, C. H., Neale, D. B., Wegrzyn, J. L. 2016. Elucidation of transposable elements in conifers and their effect on conifer evolution. International Plant and Animal Genome XXIV Conference, W959, Transposable Elements Workshop, San Diego, CA. 10 January 2016
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Humann, J. L., Ficklin, S. P., Lee, T., Cheng, C.-H., Jung, S., Wegrzyn, J., Neale, D., Main, D. 2016. GenSAS v4.0: An easy-to-use, web-based platform for structural and functional genome annotation and curation. International Plant and Animal Genome XXIV Conference, Poster 0364, San Diego, CA. 8-13 January 2016
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Demurjian Jr., S. A., Grau, E., Vasquez-Gross, H., Gessler, D., Neale, D., Wegrzyn, J. 2016. TreeGenes and CartograTree: Community resources for forest tree genomics. International Plant and Animal Genome XXIV Conference, Poster 0383, San Diego, CA. 8-13 January 2016
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Humann, J. L., Ficklin, S. P., Lee, T., Cheng, C.-H., Jung, S., Wegrzyn, J., Neale, D., Main, D. 2016. GenSAS v4.0: A web-based platform for structural and functional genome annotation and curation. International Plant and Animal Genome XXIV Conference, C25 Computer Demonstration, San Diego, CA. 13 January 2016
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Demurjian Jr., S. A., Grau, E., Vasquez-Gross, H., Gessler, D., Neale, D., Wegrzyn, J. 2016. TreeGenes and CartograTree: Community resources for forest tree genomics. International Plant and Animal Genome XXIV Conference, C07 Computer Demonstration, San Diego, CA. 9 January 2016
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Wegrzyn, J. L., et al. 2016. An overview of TreeGenes and CartograTree. US Forest Service: Gene conservation of tree species, Chicago IL. May 2016.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Wegrzyn, J. L., et al. 2016. Complex conifer gene families and their evolution. ProCoGen: Conifer Genomics Conference, Orleans, France. December 2016
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Wegrzyn, J. L. 2015. The PineRefSeq Project. IUFRO Tree Biotechnology Conference in Florence, Italy. 8-12 June 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Neale, D. B. 2015. Comparative genomics of conifers. 35th New Phytologist Symposium: The genome of forest trees, Arnold Arboretum of Harvard University, Boston MA. 16-17 June 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Wegrzyn, J. L. 2015. Computational tools and resources for comparative tree genomics. 35th New Phytologist Symposium: The genome of forest trees, Arnold Arboretum of Harvard University, Boston MA. 16-17 June 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Wheeler, N. C. 2015. Conifer reference genome sequencing. Western Forest Genetics Association and NW Seed Orchard Managers Association 2015 Joint Annual Meeting, Seattle WA. 23-24 June 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Wegrzyn, J. L. 2015. Computational tools and resources for comparative tree genomics. 3rd Conifer Genome Sequencing Summit, Gysinge, Sweden. 28-30 September 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Neale, D. B. 2015. The PineRefSeq Project. National Convention of the Society of American Foresters, Baton Rouge LA. 3-7 November 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Wheeler, N. C. 2015. Conifer reference genome sequencing. National Convention of the Society of American Foresters, Baton Rouge LA. 3-7 November 2015
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Fuentes-Soriano, S., Loopstra, C., Mockaitis, K. 2014. Phylogenetic diversity of the MADS-box gene family in loblolly pine. International Plant and Animal Genome XXII Conference, Poster P0492, San Diego, CA. 10-14 January 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Mockaitis, K. 2014. References for gene expression in loblolly pine. Conifer Genome Summit, Forêt Montmorency, Québec, Canada. 17 June 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Neale, D. B. 2014. Genome sequencing in conifers: Implications for breeding and gene resource management. IUFRO Forest Tree Breeding Conference, Prague, Czech Republic. August 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Neale, D. B. 2014. Genome sequencing in conifers: Implications for breeding and gene resource management. Five Needle Pines IUFRO Meeting, Ft. Collins, CO. June 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Wegrzyn, J. L. 2014. White pine transcriptomes. Five Needle Pines IUFRO Meeting, Ft. Collins, CO. June 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Neale, D. B. 2014. Genome sequencing in conifers: Implications for breeding and gene resource management. Third Workshop of the AForGen Network, Falfleralp, Switzerland. June 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Wegrzyn, J. L. 2014. The loblolly pine genome. Seminar, University of Connecticut, Storrs CT. October 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Mockaitis, K. 2014. Genomic sources of chemical defenses in trees from pine to avocado: New references for understanding large enzyme family conservation and diversification. Academic and Technical Workshop on Xyleborus glabratus and Euwallacea sp., INECOL, Xalapa, Mexico. 6 November 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Wegrzyn, J. L. 2014. The PineRefSeq Project. Seminar, Virginia Commonwealth University, Richmond VA. November 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Neale, D. B. 2014. Genome sequencing in conifers: Implications for breeding and gene resource management. Pinus radiata genome meeting, SCION, New Zealand. November 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Wegrzyn, J. L. 2014. The PineRefSeq Project. Pinus radiata genome meeting, SCION, New Zealand. November 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Mockaitis, K., Wu, L.-S., Fuentes-Soriano, S., Loopstra, C., Gilbert, D. 2014. Gene expression in Pinus taeda, the largest reference genome to date: Sequencing, assembly, and classification of the loblolly pine transcriptome. Poster, Advances in Genome Biology and Technology, Marco Island, FL. 12-16 February 2014
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Neale, D. B. 2013. Loblolly Pine Genome Project. USDA, National Institute of Food and Agriculture. AFRI Plant Genome, Genetics and Breeding Project Director Meeting. Town & Country Hotel, San Diego CA. 11 January 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Neale, D. B. 2013. Pine Genome Sequencing Project. ProCoGen Training Workshop, Genome sequencing and gene discovery. Umeå, Sweden. 30 January - 1 February 2013
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Neale, D. B. 2013. Genomics of forest trees: Genome sequencing, marker-assisted breeding, and landscape genomics. INIA Madrid, Spain. 5 February 2013


Progress 02/01/16 to 01/31/17

Outputs
Target Audience: Genomics researchers (via publications and a Workshop at the Annual International Plant and Animal Genome Conference) and tree breeders and researchers (via presentations at research meetings and invited talks).

Changes/Problems: Nothing Reported

What opportunities for training and professional development has the project provided? To date, the Project has offered training opportunities for high school students, undergraduates, graduate students, technical staff, postdoctoral scholars, and professionals or advanced researchers. In some instances, the Project provided salary and benefits for the trainees and in others, the trainees were funded by other projects. In the first category (Project supported), there have been 33 undergraduate students, two graduate students, seven postdoctoral scholars, and 21 technical staff. In the second category, there have been one high school student, five undergraduate students, four graduate students, five postdoctoral scholars, and one professional/researcher.

How have the results been disseminated to communities of interest? Throughout this reporting year, the Project Director and co-Project Directors made eight oral and poster presentations at national and international conferences about various aspects of the Project. Two research publications were published in this reporting year, with a third submitted and under review, and a fourth almost ready to be submitted.

What do you plan to do during the next reporting period to accomplish the goals? The project applied for and was granted a seventh year, a second no-cost extension. The primary work targets are to bring to completion the reference genome sequences for loblolly pine, sugar pine, and Douglas-fir:
A. Respond to reviewers and ultimately review galley proofs for the loblolly pine V2.0 genome paper, submitted in 2016.
B. Improve scaffold contiguity for the loblolly pine assembly by completing the utilization of PacBio Iso-Seq transcriptome resources to generate V2.01. Subsequently, the RNA-Seq resources developed by the project will be assembled directly onto the V2.01 assembly with the new StringTie assembly software (https://ccb.jhu.edu/software/stringtie/), improving the gene annotation significantly and generating assembly V2.5. Haplotyping-by-Sequencing bioinformatics and analysis will then be used to genetically map all scaffolds. This will be released as V3.0.
C. Complete bioinformatics and analysis for the sugar pine V1.5 genome based on 10X Genomics sequence. Submit for publication.
D. Respond to reviewers and ultimately review galley proofs for the Douglas-fir V1.0 genome paper, submitted at the end of 2016.
E. Complete Douglas-fir genome V1.5 based on scaffolding with an updated transcriptome from Oregon State University.

Impacts
What was accomplished under these goals? The accomplishments of year 6 are presented below, organized by research aim.

Specific Aim 1 -- High-quality reference genome sequences of loblolly pine, sugar pine, and Douglas-fir
Specific Aim 2 -- Transcriptome sequencing for gene discovery, reference building, and aids to genome assembly

Loblolly pine (Pinus taeda)
An improved assembly (V2.0) of loblolly pine was completed and released, based primarily on an additional 11X very-long-read whole genome shotgun (WGS) sequence coverage from the Pacific Biosciences (PacBio) sequencing platform. Thus, the assembly is a hybrid PacBio/Illumina assembly and takes advantage of better super-reads from additional paired-end loblolly pine sequencing to error correct the PacBio reads. A manuscript was submitted (Zimin et al. 2016. GigaScience). The V2.0 assembly has facilitated computation of single-nucleotide polymorphisms (SNPs) in loblolly pine. Whole-genome resequencing data (at a coverage of ~11X genome equivalents) were generated from ten different individual trees (as described below under the `All three species' heading). These data (about 800 million reads per tree) were aligned to the loblolly pine V2.0 assembly. The initial result is a very large set of SNPs, which will be filtered to generate a subset for use in a SNP chip for application in genotyping by end users. With loblolly pine V2.0 complete, haplotyping-by-sequencing (HBS) is again underway with additional sequencing, with the objective of increasing the number of genetically mapped scaffolds over the pilot HBS. Alignment of reads from over 700 megagametophytes to the loblolly pine V2.0 assembly is nearing completion. We have concluded a comparison of two SNP callers for use on haploid conifer data. SNPs will be converted to markers later this year. To further increase the size of the map, we are evaluating the replacement of JoinMap, the proprietary Windows program used in the pilot, with MSTmap, a more efficient command-line tool capable of operating on larger datasets. PacBio Iso-Seq transcriptome resources generated by collaborator Ross Whetten were assembled and annotated by the PineRefSeq team. These resources were used to further scaffold the V2.0 assembly to generate V2.01. This version is currently being annotated with both new and previously generated transcriptomic resources.

Sugar pine (P. lambertiana)
The initial WGS assembly (V1.0, a MaSuRCA-SOAPdenovo2 assembly) of the sugar pine genome has been completed and released along with the de novo sugar pine transcriptome assemblies. One publication on this assembly has been submitted (Stevens et al. 2016. Genetics) and one on the transcriptome has been accepted (Gonzalez-Ibeas et al. 2016. G3). A rescaffolding of sugar pine was performed using barcoded sub-haploid pools of high molecular weight DNA using the 10X Genomics protocol. The new assembly, V1.5, has a scaffold N50 exceeding 1 Mbp. The additional contiguity has confirmed the DiTag links used to discover the Cr1 gene candidate presented in the Stevens et al. 2016 manuscript submitted to Genetics. The rescaffolding of the genome has increased the number of bases genetically linked to Cr1 from ~2 Mbp to ~6 Mbp, and also the number of gene candidates. A second manuscript is currently in preparation.

Douglas-fir (Pseudotsuga menziesii)
A MaSuRCA-SOAPdenovo2 assembly (V1.0) of the Douglas-fir genome from WGS sequencing on the Illumina HiSeq platform at 60X coverage has been completed and released. Transcriptome resources from needle tissue were generated by collaborator Richard Cronn and assembled by the PineRefSeq team for scaffolding and annotation. A publication focused on the assembly as well as comparative genomics across all conifer genomes will be submitted before the end of 2016 (Neale et al. 2016. G3).

All three species
With the objective of diversity characterization, full whole-genome resequencing on the HiSeq platform of ten trees of each species has been finished. Raw reads were used to call SNPs in the ten individuals distributed across the range of each species. SNP discovery is completed for loblolly pine and Douglas-fir, and filtering is underway for loblolly pine as described above. SNP discovery in sugar pine is in the pipeline. In the meantime, we have dissected megagametophytes (seeds) and needles for all individuals in the three species. A total of 14,400 megagametophytes were dissected and pooled in groups of ten individuals sharing a mother tree, which resulted in 1,440 families. In addition, we have extracted DNA from needle tissue from other samples, which resulted in a total of 2,286 individual samples for sugar pine. This material is currently being put on plates for genotyping. In loblolly pine, DNA from a total of 567 individuals that represent both the association and the QTL populations has been extracted. In Douglas-fir, DNA from 600 individuals was extracted. DNA samples were sent for genotyping at the end of 2016.

Specific Aim 3 -- Dendrome and TreeGenes databases: Annotation, data integration, and distribution -- applying and expanding existing pipelines to deliver a comprehensive SNP resource.
The project website (http://www.pinegenome.org/pinerefseq/) links to the following resources currently available at TreeGenes:
  • Sugar pine assembly V1.0, gene models, repeat annotation, and tissue-specific and reference transcriptomes (FTP download, BLAST, and genome browser access)
  • Douglas-fir assembly V1.0, gene models, repeat annotation, and needle reference transcriptome (FTP download, BLAST, and genome browser access)
  • Loblolly pine assembly V2.0, tissue-specific and reference transcriptomes (FTP download, BLAST, and genome browser access)
GenSAS is an online pipeline (currently running at v4.0, https://gensas.bioinfo.wsu.edu/) for whole genome structural annotation that has been developed and evaluated with genome sequence from the project.
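
The scaffold N50 cited above for the sugar pine V1.5 assembly is a standard contiguity statistic: the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly length. The sketch below illustrates that definition only; it is not the project's pipeline, and the function name and the toy scaffold lengths are hypothetical values chosen for illustration.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    Illustrative helper, not part of any project software: N50 is the
    length of the scaffold at which the running total, taken from the
    longest scaffold downward, first reaches half the assembly size.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in bp), not project data.
toy_lengths = [1_500_000, 1_200_000, 800_000, 300_000, 200_000]
print(scaffold_n50(toy_lengths))
```

Because N50 depends on the total assembly length, it is most informative when comparing versions of the same genome, such as V1.0 against V1.5.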

Publications

  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Gonzalez-Ibeas D., P.J. Martinez-Garcia, R.A. Famula, A. Delfino-Mix, K.A. Stevens, C.A. Loopstra, C.H. Langley, D.B. Neale, J.L. Wegrzyn (2016) Assessing the gene content of the megagenome: Sugar pine (Pinus lambertiana). G3: Genes|Genomes|Genetics 6:3787-3802
  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Stevens K.A., J. Wegrzyn, A. Zimin, D. Puiu, M. Crepeau, C. Cardeno, R. Paul, D. Gonzalez-Ibeas, M. Koriabine, A. Holtz-Morris, P. Martinez-Garcia, U. Sezen, G. Marcais, K. Jermstad, P. McGuire, C.A. Loopstra, J. M. Davis, A. Eckert, P. deJong, J.A. Yorke, S.L. Salzberg, D.B. Neale, C.H. Langley (2016) Sequence of the sugar pine megagenome. Genetics 204(3):1613-1626.
  • Type: Journal Articles Status: Submitted Year Published: 2017 Citation: Zimin AV, Stevens KA, Crepeau MW, Puiu D, Wegrzyn JL, Yorke JA, Langley CH, Neale DB, Salzberg SL. 2017. An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. GigaScience (submitted and under revision).


Progress 02/01/15 to 01/31/16

Outputs
Target Audience: Genomics researchers (via publications and a Workshop at the Annual International Plant and Animal Genome Conference) and tree breeders and researchers (via presentations at research meetings and invited talks).

Changes/Problems: Nothing Reported

What opportunities for training and professional development has the project provided? To date, the Project has offered training opportunities for high school students, undergraduates, graduate students, technical staff, postdoctoral scholars, and professionals or advanced researchers. In some instances, the Project provided salary and benefits for the trainees and in others, the trainees were funded by other projects. In the first category (Project supported), there have been 27 undergraduate students, one graduate student, seven postdoctoral scholars, and 21 technical staff. In the second category, there have been one high school student, five undergraduate students, four graduate students, five postdoctoral scholars, and one professional/researcher.

How have the results been disseminated to communities of interest? The Project organized a public workshop `Recreating Forestry through Science' at the 2015 National Convention of the Society of American Foresters, Nov. 3-7, 2015 in Baton Rouge, LA, at which progress of the Project was presented. In addition, throughout this reporting year, the Project Director and co-Project Directors made five presentations about various aspects of the Project. Two research publications have been published to date, with two almost ready to be submitted.

What do you plan to do during the next reporting period to accomplish the goals? The project applied for and was granted a sixth year, a no-cost extension. The primary work targets are completion of analyses and preparation and submission of papers reporting the results:
1. Complete publication of the sugar pine v1.0 genome and transcriptome papers. Work to respond to reviewers and review galley proofs will be needed in the early months of 2016.
2. Complete the final annotation and publication of the Douglas-fir V1.0 genome. Work for drafting and submitting a manuscript for publication will be needed in 2016.
3. Complete the assembly and annotation of the loblolly pine v2.0 genome and prepare them for publication. Work for drafting and submitting a manuscript for publication will be needed in 2016.
4. Complete the resequencing and polymorphism discovery efforts in loblolly pine, sugar pine, and Douglas-fir.
5. Complete the re-analysis of loblolly pine, sugar pine, and Douglas-fir GWAS studies to demonstrate the utility of reference genome sequences.
6. Compile and disseminate the results of a survey undertaken in 2015 on how the approaches and products of this project have been used in the scientific community.

Impacts
What was accomplished under these goals? The accomplishments of year 5 are presented below, organized by these research aims: High-quality reference genome and transcriptome sequencing; Annotation, data integration, and distribution; and Outreach and training.

High-quality reference genome and transcriptome sequencing

Loblolly pine (Pinus taeda)
Steps toward an improved assembly (to be v2.0) of loblolly pine were undertaken, based primarily on an additional 11X very-long-read whole genome shotgun (WGS) sequence coverage from the Pacific Biosciences (PacBio) sequencing platform. The assembly will be a hybrid PacBio/Illumina assembly and will take advantage of better super-reads from additional paired-end loblolly pine sequencing to error correct the PacBio reads. All sequencing has been completed. The assembly is in progress, but is very computationally intensive. Results are not expected until 2016. Haplotyping-by-sequencing (HBS) has been conducted for loblolly pine, which will produce a high-density genetic map for the species and be used to improve the loblolly pine v1.0 and/or v2.0 assemblies.

Sugar pine (P. lambertiana)
The initial WGS assembly (v1.0, a MaSuRCA-SOAPdenovo2 assembly) of the sugar pine genome has been completed and released along with de novo sugar pine transcriptome assemblies. Two publications, one on this assembly and one on the transcriptome, are in progress with a submission target in the first quarter of 2016. HBS is underway for sugar pine, which will produce a high-density genetic map for the species and be used to improve the sugar pine v1.0 assembly.

Douglas-fir (Pseudotsuga menziesii)
A MaSuRCA-SOAPdenovo2 assembly (v1.0) of the Douglas-fir genome from WGS sequencing on the Illumina HiSeq platform at 60X coverage has been produced.

All three species
Polymorphism discovery within each target species has been initiated by full resequencing of 10 trees of each species on the HiSeq platform. SNP discovery analysis is proceeding, with emphasis on the Douglas-fir genome. Comparative genome analysis among these three genomes has begun.

Annotation, data integration, and distribution
The project website (http://www.pinegenome.org/pinerefseq/) has made publicly available the following resources:
  • Sugar pine assembly v1.0, gene models, annotation, and transcriptome
  • Douglas-fir draft assembly v0.5
  • Loblolly pine assembly v1.01, gene models, annotation, and transcriptome
GenSAS is an online public pipeline for whole genome structural annotation that has been developed and evaluated with genome sequence from the project. Version 4.0 was completed, tested, and made live during this reporting period.

Outreach and training
The principal means of outreach for the project are invited and conference presentations of research results by project participants at several venues and publications presenting research results. The educational PowerPoint presentation on the PineRefSeq project, located at the Project website and at eXtension websites, continues to be viewed. This presentation, `Module 17. Reference Genome Sequencing', is also available on YouTube at https://www.youtube.com/watch?v=sH7Yiyymfc4 and has been viewed there about once a week over the past year. In the past year, a survey and evaluation was undertaken of how the approaches and products of this project have been used in the scientific community and whether these genomic resources have fed directly into any of the nation's applied tree improvement and research cooperatives. Results will be made available in a report in 2016.

Publications

  • Type: Journal Articles Status: Published Year Published: 2015 Citation: Westbrook Jared W., Vikram E. Chhatre, Le-Shin Wu, Srikar Chamala, Leandro Gomide Neves, Pedro J. Martínez-García, David B. Neale, Matias Kirst, Keithanne Mockaitis, C. Dana Nelson, Gary F. Peter, John M. Davis, Craig S. Echt. 2015. A consensus genetic map for Pinus taeda L. and P. elliottii and extent of linkage disequilibrium in two genotype-phenotype discovery populations of P. taeda. G3: Genes|Genomes|Genetics 5(6):1-42.
  • Type: Journal Articles Status: Published Year Published: 2015 Citation: Wheeler, Nicholas C., Kim C. Steiner, Scott E. Schlarbaum, David B. Neale. 2015. The evolution of forest genetic and tree improvement research in the United States. Journal of Forestry 113(5):500-510.


Progress 02/01/14 to 01/31/15

Outputs
Target Audience: Genomics researchers (via publications and the Annual Workshop at Plant and Animal Genome) and tree breeders and researchers (via presentations at research meetings and invited talks).

Changes/Problems: Nothing Reported

What opportunities for training and professional development has the project provided? To date, the Project has offered training opportunities for high school students, undergraduates, graduate students, technical staff, postdoctoral scholars, and professionals or advanced researchers. In some instances, the Project provided salary and benefits for the trainees; in others, the trainees were funded by other projects. In the first category (Project-supported), there have been 24 undergraduate students, one graduate student, seven postdoctoral scholars, and 20 technical staff. In the second category, there have been one high school student, five undergraduate students, four graduate students, five postdoctoral scholars, and one professional/researcher. Training is also achieved through courses conducted by or contributed to by Project personnel. At Washington State University, Hort 503: Bioinformatics for Research is offered by DS Main, Project co-PD. In this course, 14 graduate and undergraduate students were trained in genome annotation and curation using GenSAS 2.0, the development of which has been supported by this Project.

How have the results been disseminated to communities of interest? The Project will hold a public workshop at the XXIII International Plant and Animal Genome Conference, Jan. 10, 2015, to present Project results. In addition, throughout this reporting year, the Project Director and co-Project Directors made a total of 18 presentations about various aspects of the Project. Seven research publications have been published to date, with two currently in review.

What do you plan to do during the next reporting period to accomplish the goals? There are six targets toward which the teams will be directed in year 5:

1. An improved assembly (v2.0) of loblolly pine (Pinus taeda) based on additional linking and DiTag libraries and information from the genetic map produced from haplotyping-by-sequencing (HBS), and validation of the assembly by an independent assembly from the PacBio platform.
2. Completion of the initial assembly (v1.0, the MaSuRCA-SOAPdenovo assembly) of sugar pine (P. lambertiana); completion of de novo transcriptome assemblies.
3. Assembly (v2.0) of sugar pine (the MaSuRCA assembly) improved with the availability of additional DiTag libraries and the completed HBS, and enhanced by another round of annotation. The HBS will take advantage of 1200 megagametophytes from the sequenced mother tree derived from full sib matings.
4. Initial assembly (v1.0) of Douglas-fir (Pseudotsuga menziesii) from nearly complete sequencing on the Illumina HiSeq platform at 60x coverage, also informed by publicly available transcriptome data.
5. Comparative genome analysis, which will begin with development of computational tools followed by analysis using them.
6. Polymorphism discovery within each target species by full resequencing of 20 trees of each species on the HiSeq platform.

In our approach to each target there may be need for ongoing R&D, and we will also be adaptive to technology changes. We plan for public release of the data produced, development of completed manuscripts for submission to refereed journals, and public outreach extending information about the utility and availability of the accomplishments.

Impacts
What was accomplished under these goals? The accomplishments of year 4 are presented for each project team: Cloning/Library Group, Sequencing Group, Assembly Group, Transcriptome and Annotation Groups, and Education/Extension Group.

This year the Cloning/Library Group primarily focused on generating additional fosmid DiTag libraries for the whole genome shotgun assembly of the 31Gb genome of sugar pine (version 1.0). A two-pronged strategy was used. A set of nick translation DiTag libraries, similar to what was described in Neale et al. 2014 for loblolly pine, was created. In addition, a developmental "TH DiTag" protocol was used to create two new libraries with the prospect of higher complexity. These three libraries were delivered to the Sequencing Group and sequenced to practical limits given time and complexity constraints. The Cloning/Library Group has also constructed and delivered a set of nick translation libraries for loblolly pine (to be used in v2.0). Finally, a "TH" library for loblolly pine is in progress.

The Sequencing Group contributed to a collaborative manuscript describing and analyzing the successful whole genome shotgun strategy used to create the 22Gb loblolly pine v1.01 assembly (Zimin et al. 2014). That WGS strategy is now well underway for sugar pine and Douglas-fir. We have constructed and sequenced all short insert and mate pair libraries for the 31Gb genome of sugar pine. In Douglas-fir, we have constructed and sequenced all short insert libraries to 60x coverage, and are well on our way to finishing long insert library construction and sequencing. We have also sequenced a fosmid pool for each of loblolly pine and sugar pine using the PacBio platform. These have been assembled for QC. The next release of the loblolly reference sequence will be augmented by additional long insert (up to 13kbp) and DiTag (35-40kbp) libraries. 
Light sequencing of over 600 megagametophytes from the sequenced tree (20-1010) will form the foundation for a genetic map locating scaffolds along the chromosomes.

The conifer genome assemblies provided by the Assembly Group have yielded a resource for the entire scientific community, particularly scientists who work on pines. The techniques developed for assembly of large genomes are broadly applicable to any very large genome, and should have an impact on projects across a wide range of species. We have done a preliminary evaluation of the PacBio platform by using it to sequence four pools of loblolly pine fosmids (48 non-overlapping fosmids/pool) and considering their possible impact on assembly. The focus of this analysis was to check the correctness of the initial (published) assembly of loblolly pine. The contigs assembled from the PacBio sequences were aligned with the published WGS assembly. Because the WGS assembly was based on a haploid genome, and the fosmids were generated from diploid DNA, we expected only 50% of the fosmids to derive from the same haplotype. Our results support an interpretation that the assembly disagreements in half of the fosmids are a consequence of haplotype differences. As a result of our evaluation, we conclude that the rate of WGS assembly errors is at most 30 per 2.95 Mbp (half the total assembled length), or 1 error per 98.5 Kbp. With respect to haplotypes, we observed five near-full-length fosmids with 8 to 11 assembly differences, including multi-kilobase insertions, indicating considerable divergence between the haplotypes. This finding highlights the value of our strategy of using haploid DNA as the basis of our primary assembly.

The Annotation & Transcriptome Groups reported the following:

  • Loblolly pine transcriptome: Expanded and improved RNA assemblies of loblolly pine are providing the foundation for building gene annotations that are more complete and more accurate. These will serve both more detailed genomic studies in loblolly and comparative genomics studies across conifers. Our studies of large and diversified gene families continued to provide biologically contextualized validation of transcript assemblies and select curated gene annotations. Use of results from our analyses will aid ecological investigations and breeding programs that address diversity in genes within the species. Among these in current studies are genes responsible for generating secondary metabolic products of pine, most notably those relevant to tree interactions with insects, wood quality, and oils of commercial interest. Use of transcript references in collaborative work with pine geneticists is already leading to increased success in association and QTL mapping for genetic selection of important traits of interest in US loblolly populations. Due to its size and complexity, aspects of our pine transcriptome reference project work have served as excellent examples in training in both genomics and computing for all educational levels at the university and as an outreach tool. In addition to postdoctoral training that was funded by PineRefSeq, unfunded students and other (especially other USDA-funded) collaborators have benefitted from demonstrations and discussion of pine research at IU.
  • Sugar pine transcriptome
      o Successful sequencing and assembly of six libraries in three size selections using the new PacBio IsoSeq approach
      o Sequencing and assembly of six additional libraries with MiSeq
      o Full structural and functional annotations on all transcripts assembled to date
      o Set of annotated and trimmed transcripts available for scaffolding the sugar pine genome
  • Repeat library construction
      o Re-development of de novo repeat identification pipeline using RepeatModeler
      o Preliminary analysis of fosmid libraries with homology approaches through Repbase/RepeatMasker
      o Preliminary analysis of de novo interspersed content in sugar pine and Douglas-fir fosmids
      o Analysis of tandem repeat content in the sugar pine genome, sugar pine fosmids, and Douglas-fir fosmids with the previously developed TRF pipeline
      o Re-analysis of the loblolly pine BAC/fosmids and sampled genome-based library using the RepeatModeler approach to improve comparative capacity
  • Loblolly pine gene annotations
      o Updated version of the gene models released for loblolly pine genome 1.01
  • GenSAS
      o In year 4, we implemented a major revision to the user interface of GenSAS as well as the underlying data storage, in response to feedback from users and to improve performance. One of the major improvements was integration of WebApollo, a commonly used manual annotation tool that uses the JBrowse genome browser for visualization. We have established a positive collaborative relationship with WebApollo developers, and the integration of our two tools should help with the mutual longevity and adoption of both. As a result of these changes, GenSAS is now more user-friendly and better supports larger genome assemblies.
      o GenSAS is designed such that output files conform to community standards, and predicted features are named to avoid ambiguity. Therefore, final output files can be immediately ported to a community genome database with little effort. Curators need only focus on annotations, as GenSAS prepares output in formats ready for publishing within the community database or NCBI. GenSAS output can easily be imported into Tripal (http://tripal.info), an open-source tool for construction of online genome databases. Thus GenSAS combined with Tripal will allow any group to annotate genomic sequence and more easily share those annotations publicly.

The educational PowerPoint presentations on the PineRefSeq project, located at the Project website and at eXtension websites, continue to be viewed. The eXtension page hosting the learning module, completed earlier in the project, has been viewed an average of 12.6 times per week since it was launched 85 weeks ago.
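As a rough check on the assembly error-rate bound reported above for the PacBio fosmid comparison (at most 30 errors per 2.95 Mbp), the arithmetic can be sketched as follows. The function names are illustrative only and are not part of any Project software; the two input figures are taken from the text.

```python
# Illustrative arithmetic only; function names are hypothetical.

def bp_per_error(max_errors, compared_bp):
    """Average number of base pairs per error, given an upper bound on errors."""
    return compared_bp / max_errors


def errors_per_mbp(max_errors, compared_bp):
    """Upper bound on errors per megabase pair."""
    return max_errors / (compared_bp / 1e6)


MAX_ERRORS = 30        # upper bound on assembly errors (from the report)
COMPARED_BP = 2.95e6   # half the total assembled fosmid length (from the report)

print(f"about 1 error per {bp_per_error(MAX_ERRORS, COMPARED_BP) / 1e3:.1f} kbp")
print(f"about {errors_per_mbp(MAX_ERRORS, COMPARED_BP):.1f} errors per Mbp")
```

With the rounded 2.95 Mbp figure this gives roughly 1 error per 98.3 kbp; the reported 98.5 Kbp presumably reflects an unrounded compared length.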

Publications

  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Zimin Aleksey, Kristian A Stevens, Marc Crepeau, Ann Holtz-Morris, Maxim Koriabine, Guillaume Marçais, Daniela Puiu, Michael Roberts, Jill L Wegrzyn, Pieter J de Jong, David B Neale, Steven L Salzberg, James A Yorke, and Charles H Langley. Sequencing and assembling the 22-Gb loblolly pine genome. Genetics March 2014 196:875-890; doi: 10.1534/genetics.113.159715 http://www.genetics.org/content/196/3/875
  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Wegrzyn Jill L, John D Liechty, Kristian A Stevens, Le-Shin Wu, Carol A Loopstra, Hans Vasquez-Gross, William M Dougherty, Brian Y Lin, Jacob J Zieve, Pedro J Martínez-García, Carson Holt, Mark Yandell, Aleksey Zimin, James A Yorke, Marc Crepeau, Daniela Puiu, Steven L Salzberg, Pieter de Jong, Keithanne Mockaitis, Doreen Main, Charles H. Langley, David B Neale. Expansion of introns, ancient transposable elements and gene families revealed in the annotation of the loblolly pine (Pinus taeda L.) megagenome. Genetics March 2014 196:891-909; doi: 10.1534/genetics.113.159996 http://www.genetics.org/content/196/3/891
  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Neale DB, JL Wegrzyn, KA Stevens, A Zimin, D Puiu, M Crepeau, C Cardeno, M Koriabine, A Holtz-Morris, JD Liechty, PJ Martínez-García, HA Vasquez-Gross, BY Lin, JJ Zieve, WM Dougherty, S Fuentes-Soriano, L Wu, D Gilbert, G Marçais, M Roberts, C Holt, M Yandell, JM Davis, K Smith, JFD Dean, WW Lorenz, RW Whetten, R Sederoff, N Wheeler, PE McGuire, D Main, CA Loopstra, K Mockaitis, P de Jong, JA Yorke, SL Salzberg, and CH Langley. The megagenome of loblolly pine (Pinus taeda L.). Genome Biology 2014, 15:R59 http://genomebiology.com/2014/15/3/R59
  • Type: Journal Articles Status: Under Review Year Published: 2014 Citation: Westbrook Jared W., Vikram E. Chhatre, Le-Shin Wu, Srikar Chamala, Leandro Gomide Neves, Pedro J. Martínez-García, David B. Neale, Matias Kirst, C. Dana Nelson, Keithanne Mockaitis, Gary F. Peter, John M. Davis, Craig S. Echt. An annotated consensus genetic map for Pinus taeda L. and extent of linkage disequilibrium in three genotype-phenotype discovery populations. http://biorxiv.org/content/early/2014/12/12/012625
  • Type: Journal Articles Status: Submitted Year Published: 2014 Citation: Wheeler, Nicholas C., Kim C. Steiner, Scott E. Schlarbaum, David B. Neale. The evolution of forest genetic and tree improvement research in the United States. Submitted for review to Journal of Forestry.


Progress 02/01/13 to 01/31/14

Outputs
Target Audience: Genomics researchers (via publications and the Annual Workshop at Plant and Animal Genome) and tree breeders and researchers (via presentations at regional tree breeding cooperatives). Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? To date, the Project has offered training opportunities for high school students, undergraduates, graduate students, technical staff, and postdoctoral scholars. In some cases, the Project provided salary and benefits for the trainees and in other cases, the trainees were funded by other projects. In the first category, there have been 17 undergraduate students, one graduate student, four postdoctoral scholars, and 19 technical staff. In the second category, there have been one high school student, two undergraduate students, three graduate students, and five postdoctoral scholars. How have the results been disseminated to communities of interest? The Project held a public workshop at the XXII International Plant and Animal Genome Conference, Jan. 11, 2014 to present Project results. In addition, throughout this reporting year, the Project Director and co-Project Directors made a total of 21 presentations about various aspects of the Project. Five research publications have been published to date, with two more in press. What do you plan to do during the next reporting period to accomplish the goals? Fosmid-derived “36-kb” jumping clones (diTags) at about 60- to 100-fold genome coverage in pairs will be created for sugar pine and for Douglas-fir. We will catalog and annotate the loblolly pine fosmid pools and explore selective isolation of specific fosmids from the pools to facilitate gap-closure and conflict resolution for improving the loblolly pine assembly. 
Sequencing efforts in year 4 will support three major future deliverables: 1) loblolly pine version 2.0 with both an improved reference and scaffolds assigned to chromosomes; 2) whole genome shotgun sequencing and assembly of sugar pine; 3) whole genome shotgun sequencing and assembly of Douglas-fir. For the sugar pine whole-genome assembly, multiple next-generation assemblers, primarily MaSuRCA and SOAPdenovo, and other methods as needed, will be used to produce an initial assembly. The full whole-genome data set will be available in early 2014 and assembly work will commence immediately after. If the WGS data for Douglas-fir are completed in year 4, we will begin assembly of that species as well. We also plan to produce assemblies of the several hundred loblolly pine fosmid pools, each containing over 4,000 fosmids. The transcriptome work will include sequencing, assembly, and analysis of loblolly pine RNAs in year 4 to survey gene expression in new types of samples and to refine current determinations of tissue/condition specificities of gene expression, including alternately spliced transcripts. For sugar pine, additional RNAs from differing conditions will be produced, and a comprehensive transcriptome will be generated. The loblolly pine annotation will be completed and the annotation of the sugar pine genome assembly will be initiated. All of the Project's genome, transcriptome, and annotation data will be distributed through the TreeGenes database. TreeGenes provides genome and annotation information via three formats: GenSAS, GBrowse, and WebApollo. To date, GenSAS provides tools for genome analysis such as gene prediction, repeat finding, and identification of other non-coding features, such as SSRs and tRNAs. In year 4, infrastructure for functional annotation will be added, along with a functional curation tool and a redesigned, more intuitive interface. 
Outreach presentations will continue along with Project publications as milestones are reached.
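
To give a sense of scale for the diTag plan above (fosmid-derived "36-kb" jumping clones at about 60- to 100-fold genome coverage in pairs), the sketch below estimates how many clone pairs that would imply. It assumes, as an interpretation not stated in the report, that the fold coverage refers to physical (clone-span) coverage, and it uses the 31 Gb sugar pine genome size quoted elsewhere in this report; the function name is hypothetical.

```python
# Illustrative estimate only. Assumes "fold coverage in pairs" means physical
# (span) coverage: the summed insert lengths divided by the genome size.

def pairs_needed(genome_size_bp, insert_size_bp, physical_coverage):
    """Number of clone pairs whose inserts span the genome `physical_coverage` times."""
    return physical_coverage * genome_size_bp / insert_size_bp


GENOME_BP = 31e9   # sugar pine genome size quoted elsewhere in the report
INSERT_BP = 36e3   # nominal fosmid insert size ("36-kb")

for fold in (60, 100):
    millions = pairs_needed(GENOME_BP, INSERT_BP, fold) / 1e6
    print(f"{fold}-fold physical coverage: about {millions:.1f} million pairs")
```

Under these assumptions the plan corresponds to tens of millions of diTag pairs per species; a different reading of "coverage" would change the estimate.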

Impacts
What was accomplished under these goals? We have assembled an efficient pipeline to process fosmid clones into high quality NextGen sequence libraries compatible with genome assembly. We have developed a successful whole genome shotgun (WGS) strategy, which was used to create the loblolly pine v1.01 assembly; a manuscript detailing this is in press in Genetics. This year we have transitioned to the HiSeq 2500 as our primary sequencing platform. Our successful WGS strategy is now well underway for sugar pine and Douglas-fir using the new platform. The loblolly pine genome sequence assembly version 1.01 from the WGS data is currently the most contiguous and complete large conifer assembly to date. The MaSuRCA assembler was significantly improved and used to assemble the genome. The assembler has been published and released to the public under the open source GPL license. The primary complete transcripts of the first loblolly pine transcriptome reference we had generated and classified were used in the current genome annotation to identify over 50,000 expressed loci in version 1.01 of the genome assembly. A publication documenting this annotation is in press in Genetics. The transcriptome references we have provided this year are of the highest quality, very deep and informative alone as functional references, and have been demonstrably essential to deciphering gene activity in the genome reference of loblolly pine. The TreeGenes database was improved with the addition of the complete conifer repeat library generated by the team. In addition, TreeGenes resources for viewing genome annotations were developed in GBrowse and WebApollo. The latest version of the online annotation tool, GenSAS (Genome Sequence Annotation Server) v2.0, is now being beta tested by several researchers to help annotate/curate genomes of interest. 
Tutorial modules, started in grant year 2, were completed and reviewed by Project PDs and staff and subsequently produced at Oregon State University and made available via the eXtension website. Outreach presentations about how the reference genome sequence can be used were made to regional tree improvement cooperatives.

Publications

  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Wegrzyn Jill L, Brian Y Lin, Jacob J Zieve, William M Dougherty, Pedro J Martínez-García, Maxim Koriabine, Ann Holtz-Morris, Pieter de Jong, Marc Crepeau, Charles H Langley, Daniela Puiu, Steven L Salzberg, David B Neale, Kristian A Stevens. 2013. Insights into the loblolly pine genome: Characterization of BAC and fosmid sequences. PLoS ONE 8(9): e72439. doi:10.1371/journal.pone.0072439
  • Type: Journal Articles Status: Awaiting Publication Year Published: 2014 Citation: Zimin Aleksey, Kristian A Stevens, Marc Crepeau, Ann Holtz-Morris, Maxim Koriabine, Guillaume Marçais, Daniela Puiu, Michael Roberts, Jill L Wegrzyn, Pieter J de Jong, David B Neale, Steven L Salzberg, James A Yorke, and Charles H Langley. Sequencing and assembling the 22-Gb loblolly pine genome. Genetics, submitted 2013, reviewed, accepted for publication.
  • Type: Journal Articles Status: Awaiting Publication Year Published: 2014 Citation: Wegrzyn Jill L, John D Liechty, Kristian A Stevens, Le-Shin Wu, Carol A Loopstra, Hans Vasquez-Gross, William M Dougherty, Brian Y Lin, Jacob J Zieve, Pedro J Martínez-García, Carson Holt, Mark Yandell, Aleksey Zimin, James A Yorke, Marc Crepeau, Daniela Puiu, Steven L Salzberg, Pieter de Jong, Keithanne Mockaitis, Doreen Main, Charles H. Langley, David B Neale. Expansion of introns, ancient transposable elements and gene families revealed in the annotation of the loblolly pine (Pinus taeda L.) megagenome. Genetics, submitted 2013, reviewed, accepted for publication.
  • Type: Journal Articles Status: Submitted Year Published: 2014 Citation: Neale DB, JL Wegrzyn, KA Stevens, A Zimin, D Puiu, M Crepeau, C Cardeno, M Koriabine, A Holtz-Morris, JD Liechty, PJ Martínez-García, HA Vasquez-Gross, BY Lin, JJ Zieve, WM Dougherty, S Fuentes-Soriano, L Wu, D Gilbert, G Marçais, M Roberts, C Holt, M Yandell, JM Davis, K Smith, JFD Dean, WW Lorenz, RW Whetten, R Sederoff, N Wheeler, PE McGuire, D Main, CA Loopstra, K Mockaitis, P de Jong, JA Yorke, SL Salzberg, and CH Langley. The megagenome of loblolly pine (Pinus taeda L.). Genome Biology, submitted 2014.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Marçais G, A Zimin, and J Yorke. 2013. QuorUM: an error corrector for Illumina reads. arXiv:1307.3515 arXiv.org
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Martínez-García Pedro J, Kristian A Stevens, Jill L Wegrzyn, John Liechty, Marc Crepeau, Charles H Langley, and David B Neale. 2013. Combination of multipoint maximum likelihood (MML) and regression mapping algorithms to construct a high-density genetic linkage map for loblolly pine (Pinus taeda L.). Tree Genetics & Genomes DOI 10.1007/s11295-013-0646-4


Progress 02/01/12 to 01/31/13

Outputs
OUTPUTS: Purification of DNA from fosmid pools has been scaled up to a production level and will be used to create short-insert libraries and jumping libraries for subsequent sequencing on the HiSeq platform and quality assessment on the MiSeq platform. These libraries are under evaluation as an option for improving whole genome shotgun (WGS) assemblies of Loblolly pine and Sugar pine. The v0.8 Loblolly pine WGS assembly, with an N50 contig size of 7 kilobases (kb) and an N50 scaffold size of 15 kb, was released. The assembly contains 22 Gb of sequence, which is in line with the original genome size estimates. The assembly was produced at the University of Maryland (UMD) using MaSuRCA assembler version 1.9.2, developed jointly by the UMD and Johns Hopkins University (JHU) teams. The underlying data set consisted of 60x coverage by Illumina paired-end reads and 12x coverage by Illumina-sequenced jumping library and DiTag long mate pairs. Initial sequencing of Sugar pine: The initial WGS consisted of 8 libraries run on a HiSeq 2000 at the UC Davis Genome Center obtaining 281Gbp of 125bp x 125bp paired-end reads, supplemented by 4 libraries run on a GAIIX at UC Davis obtaining 118Gbp of 160bp x 155bp paired-end reads and 2 libraries run on a MiSeq obtaining 48Gbp of 250bp x 250bp paired-end reads. The insert sizes were chosen for each read length to limit gap size for efficient conversion to super-reads (requiring gap imputation). A total of 447Gbp of short-insert sequence was obtained. To date, 27 transcriptomes of Loblolly pine have been sequenced and assembled. The Loblolly pine v0.8 assembly has been analyzed for repeat elements and the characterization data have been publicly released. An annotation pipeline has been adopted and training sets for gene predictions have been prepared. Genome Sequence Annotation Server (GenSAS) v. 2013.01 has been publicly released, in collaboration with this project, and it will be a major tool of the project's annotation efforts. A 42-slide training module, 'Genome Sequencing Module', was created by the project and is under peer review. It is destined for the USDA eXtension site. In addition, a short module on Landscape Genomics, which presents an important application of the genome sequence information, has been prepared and is under review.

PARTICIPANTS: PD David Neale (UC Davis) directed the project overall and organized the public workshop supported by Patrick McGuire, Project Coordinator (UC Davis) and Nicholas Wheeler (education and training consultant). Co-PD Pieter de Jong (CHORI) is leader for the Cloning and Library Construction group supported by Ann Holtz-Morris, Maxim Koriabine, and Darota Kostecka (also at CHORI). Co-PD Chuck Langley (UC Davis) leads the Sequencing group supported by Kristian Stevens, Marc Crepeau, and Charis Cardeno (also at UC Davis). Co-PDs Steven Salzberg (Johns Hopkins U) and James Yorke (U of Maryland College Park) lead the Assembly group supported by Daniela Puiu (JHU) and Aleksey Zimin (UMCP). Co-PDs Keithanne Mockaitis (U of Indiana), Carol Loopstra (Texas A&M U), Jill Wegrzyn (UC Davis), and Dorrie Main (Washington State U) lead the Transcriptome group supported by Sara Fuentes, Michael Alley, Le-Shin Wan, and Zach Smith (all at U of Indiana), Jeff Puryear (Texas A&M U), Pedro J. Martinez-Garcia, John Liechty, Ben Figueroa, and John Yu (all at UC Davis), and Sook Jung, Taein Lee, and Stephen Ficklin (all at Washington State U). Co-PDs Jill Wegrzyn (UC Davis), Dorrie Main (Washington State U), Carol Loopstra (Texas A&M U), and Keithanne Mockaitis (U of Indiana) lead the Annotation/database group supported by Pedro J. Martinez-Garcia, John Liechty, Ben Figueroa, and John Yu (all at UC Davis), Sook Jung, Taein Lee, and Stephen Ficklin (all at Washington State U), Jeff Puryear (Texas A&M U), and Sara Fuentes, Michael Alley, Le-Shin Wan, and Zach Smith (all at U of Indiana).

TARGET AUDIENCES: As part of its second annual project meeting, a public workshop (Pine Genome Reference Sequence Workshop) was held at the most widely attended international conference dedicated to plant and animal genomics (Plant and Animal Genome XXI, Jan. 12-16, 2013, San Diego CA). The workshop, held Jan. 12, 2013, consisted of presentations from five of the project's task groups outlining progress and plans. In addition, a presentation about applications of project sequence data to applied tree breeding was made by Ross Whetten, North Carolina State Univ. The workshop generated much interest and was well attended (~120 persons). The audience consisted primarily of other tree (conifer and hardwood) geneticists, persons interested in expanding comparative genomics to conifers, persons interested in tools, protocols, and bioinformatics for large genomes, and students.

PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
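
The N50 contig and scaffold sizes quoted above for the v0.8 assembly follow the usual definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal sketch of that calculation is given below; the contig lengths are made-up toy values, not Project data, and the function is illustrative rather than the assembler's own code.

```python
# Minimal N50 sketch using the standard definition; toy input, not Project data.

def n50(lengths):
    """Return the largest length L such that sequences of length >= L
    together contain at least half of the total bases."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


toy_contigs_kb = [2, 3, 4, 5, 7, 9]   # hypothetical contig lengths in kb
print(f"N50 = {n50(toy_contigs_kb)} kb")
```

Because N50 depends on the total assembled length, values are only comparable between assemblies of similar total size.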

Impacts
Genome size estimates were made from the WGS data of Loblolly pine (20.68 Gb) and Sugar pine (33.98 Gb). It was determined that the greater genome size of Sugar pine was not a consequence of a greater fraction of repeat sequences in that genome, suggesting that the repeat content of Sugar pine will be no greater assembly challenge than we are facing with the Loblolly pine genome. Transcriptome library quality was evaluated and found not to be impacted by contaminants.
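
The sequencing yields and genome size estimates reported for this period can be combined into a simple depth-of-coverage check (total bases sequenced divided by genome size). The sketch below uses only figures quoted in this report; the variable and function names are illustrative, and the result is a back-of-the-envelope figure rather than a value stated by the Project.

```python
# Back-of-the-envelope coverage arithmetic from figures quoted in the report.

SUGAR_PINE_YIELD_GBP = {"HiSeq 2000": 281, "GAIIX": 118, "MiSeq": 48}
GENOME_SIZE_GB = {"loblolly pine": 20.68, "sugar pine": 33.98}


def total_yield_gbp(yields):
    """Sum per-platform yields (Gbp)."""
    return sum(yields.values())


def fold_coverage(total_gbp, genome_gb):
    """Average depth of coverage: bases sequenced / genome size."""
    return total_gbp / genome_gb


total = total_yield_gbp(SUGAR_PINE_YIELD_GBP)
depth = fold_coverage(total, GENOME_SIZE_GB["sugar pine"])
print(f"sugar pine short-insert yield: {total} Gbp, about {depth:.1f}x")

# Inverse question: bases implied by 60x paired-end coverage of loblolly pine.
print(f"60x of loblolly pine: about {60 * GENOME_SIZE_GB['loblolly pine']:.0f} Gbp")
```

The per-platform yields sum to the 447 Gbp total given in the Outputs, which at the 33.98 Gb estimate corresponds to roughly 13x for this initial sugar pine data set.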

Publications

  • Wegrzyn J, Figueroa B, Yu J, Vasquez-Gross H, Liechty J, Lin B, Zieve J, Ip J, and Neale D. 2013. Bioinformatic solutions in forest genomics: Accessing the TreeGenes database. (P0948), International Plant and Animal Genome XXI Conference, Jan. 12-16, 2013, San Diego CA.
  • Main D, Lee T, Zheng P, Jung S, Ficklin SP, Humann J, Wegrzyn J, and Neale D. 2013. GenSAS: Genome Sequence Annotation Server, a tool for online annotation and curation. (P0985), International Plant and Animal Genome XXI Conference, Jan. 12-16, 2013, San Diego CA.
  • Wegrzyn J, Lin B, Zieve J, Stevens K, Neale D, Martinez-Garcia PJ, and Dougherty M. 2013. Insights into the Loblolly pine genome: Characterization of fosmid sequences. (W539), International Plant and Animal Genome XXI Conference, Jan. 12-16, 2013, San Diego CA.
  • Martinez-Garcia PJ, Stevens K, Wegrzyn J, Liechty J, Crepeau M, Langley C, and Neale D. 2013. A comprehensive high-density genetic linkage map for Loblolly pine (Pinus taeda L.). (W451A), International Plant and Animal Genome XXI Conference, Jan. 12-16, 2013, San Diego CA.
  • Crepeau M, Puiu D, Holtz-Morris A, Koriabine M, Cardeno C, Zimin A, de Jong P, Langley C, Salzberg SL, and Stevens K. 2013. Sequencing strategies in conifer. (W537), International Plant and Animal Genome XXI Conference, Jan. 12-16, 2013, San Diego CA.
  • Mockaitis K, Fuentes-Soriano S, Loopstra C, and Wegrzyn J. 2013. Building references for functional genomics of conifer development: deep sequencing and comparative gene expression in reproductive tissues and seeds of Loblolly pine. (W540), International Plant and Animal Genome XXI Conference, Jan. 12-16, 2013, San Diego CA.
  • Zimin A. 2013. Assembly of the 24Gb Loblolly pine genome from WGS short read data. (W538), International Plant and Animal Genome XXI Conference, Jan. 12-16, 2013, San Diego CA.


Progress 02/01/11 to 01/31/12

Outputs
OUTPUTS: The first year of the project has focused primarily on cloning, sequencing, and assembly method development to facilitate efficient and economical sequencing and assembly of such large genomes. Specific highlights of each group include the following: The Cloning and Library Construction group created a de novo fosmid vector specific to the project and, using it and others, created and amplified two high-complexity libraries and developed protocols for using them, while investigating logistics, economic factors, reproducibility, and compatibility with the sequencing protocols that will be used. The Sequencing group developed a prototype sequencing pipeline and used it on loblolly pine fosmid pools produced by the Cloning and Library Construction group. Data transfer protocols have been devised, tested, and put in place for the anticipated high-throughput, high-volume sequencing data that will be generated. The Assembly group has evaluated and optimized software for handling the assembly of the whole genome shotgun sequence data, and assembly has been tested on the sequenced fosmid pools from more than one library. The Transcriptome group has produced assemblies from both 454 and RNASeq sequencing of six tissue/species libraries to date: shoots, callus, and stem from loblolly pine, needles from sugar pine, and needles and shoots from Douglas-fir. The Annotation/database group has reconfigured GBrowse to improve speed and connection to resources within the project's TreeGenes database; integrated the transcriptome assemblies into the TreeGenes database with GMOD GBrowse viewer implementation; established individual conifer species page views with genome/transcriptome data in TreeGenes; established an analysis pipeline integration with the iPlant project's Atmosphere infrastructure; and completed a pipeline for tracking RNA extraction and downstream analysis of RNASeq runs and connection of this interface to the current Forest Tree Genetic Stock Center. 
In addition, the Annotation/database group co-organized a meeting for the Plant Ontology organization in February 2012 to initiate the integration of existing phenotype terms into TO (Trait Ontology) and of woody structures into PO (Plant Ontology), extending the utility of this project's work to genomics in other species.
PARTICIPANTS: PD David Neale (UC Davis) directed the project overall and organized the public workshop, supported by Patrick McGuire, Project Coordinator (UC Davis), and Nicholas Wheeler (education and training consultant). Co-PD Pieter de Jong (CHORI) leads the Cloning and Library Construction group, supported by Ann Holtz-Morris, Maxim Koriabine, and Boudewijn ten Hallers (also at CHORI). Co-PD Chuck Langley (UC Davis) leads the Sequencing group, supported by Kristian Stevens, Marc Crepeau, and Charis Cardeno (also at UC Davis). Co-PDs Steven Salzberg (Johns Hopkins U) and James Yorke (U of Maryland College Park) lead the Assembly group, supported by Daniela Puiu (JHU) and Aleksey Zimin (UMCP). Co-PDs Keithanne Mockaitis (Indiana U), Carol Loopstra (Texas A&M U), Jill Wegrzyn (UC Davis), and Dorrie Main (Washington State U) lead the Transcriptome group, supported by James Ford, Ram Podicheti, and Zach Smith (all at Indiana U), Jeff Puryear (Texas A&M U), John Liechty, Ben Figueroa, and John Yu (all at UC Davis), and Sook Jung, Taein Lee, and Stephen Ficklin (all at Washington State U). Co-PDs Jill Wegrzyn (UC Davis), Dorrie Main (Washington State U), Carol Loopstra (Texas A&M U), and Keithanne Mockaitis (Indiana U) lead the Annotation/database group, supported by John Liechty, Ben Figueroa, and John Yu (all at UC Davis), Sook Jung, Taein Lee, and Stephen Ficklin (all at Washington State U), Jeff Puryear (Texas A&M U), and James Ford, Ram Podicheti, and Zach Smith (all at Indiana U).
TARGET AUDIENCES: As part of its first annual project meeting, the project held a public workshop (Pine Genome Reference Sequence Workshop) at the most widely attended international conference dedicated to plant and animal genomics (Plant and Animal Genome XX, Jan. 14-18, 2012, San Diego CA). The workshop, held Jan. 14, 2012, consisted of presentations from each of the project's task groups outlining challenges, progress, and expectations. Two other conifer sequencing projects (the Swedish Norway Spruce Genome Project and the Canadian White Spruce Genome Sequencing Project) also gave presentations. The workshop generated much interest and was well attended: the room's seating capacity of 50 was reached, the aisles were filled, and many people were turned away. The audience consisted primarily of other tree (conifer and hardwood) geneticists; persons interested in expanding comparative genomics to conifers; persons interested in tools, protocols, and bioinformatics for large genomes; and students.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
Even at this early stage, impacts of the project are apparent. Novel vectors (new and improved fosmids) have been developed that improve the efficiency of cloning procedures and increase target DNA fragment size and complexity; beyond their use in this project, they will have utility for other large-genome sequencing projects. Assembly pipelines and protocols are emerging that will handle the vast whole-genome sequence data from the project's genomes, which are much larger than any sequenced to date. Through the project's collaboration with the Plant Ontology consortium, the working vocabulary relevant to comparative genomics has been broadened to accommodate gymnosperm-specific traits, conditions, and stages of growth.

Publications

  • Wegrzyn J, Figueroa B, Yu J, Liechty J, Vasquez-Gross H, Lin B, Zieve J, and Neale D. 2012. Bioinformatic solutions in forest genomics: Overview of the resources from the TreeGenes Database. Poster P0963. Plant & Animal Genome XX Conference. 14-18 January 2012. San Diego CA USA.
  • Neale D, de Jong P, Langley C, Loopstra C, Main D, Mockaitis K, Salzberg SL, Yorke JA, Wegrzyn J, and Wheeler N. 2012. PineRefSeq: Genome sequences for loblolly pine, Douglas-fir, and sugar pine. Poster P0060. Plant & Animal Genome XX Conference. 14-18 January 2012. San Diego CA USA.