Progress 08/01/13 to 07/31/14
Outputs Target Audience: In this reporting period (August 1, 2013 to July 31, 2014), my efforts reached a very wide scientific community. I presented the progress of re-sequencing 1,000 mutant lines and building the public resource at two scientific conferences, “Beyond the Genome” and “Plant and Animal Genome”. In addition, I presented this project at six job interviews in research institutions and in invited talks. Our results published last summer in Genome Biology (Krasileva et al. 2013, PMID: 23800085) have already been cited 12 times and accessed 11,483 times according to the journal website. I also published a test of mutation discovery in wheat, in collaboration with Luca Comai’s group, in Plant Cell (Henry et al. 2014, PMID: 24728647). This publication is available Open Access and has been cited 5 times in the three months since it was accepted and made available online. I have been using my professional Twitter account (@kseniakrasileva) to report the progress of my work under the hashtag #1000wheatlines. This allowed me to reach an even wider audience, including non-scientists, and to interact directly with scientists from developing countries. Finally, I have been involved in teaching bioinformatics through the Software Carpentry program and helped teach their workshop on Data Visualization at Davis.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
The project provided me with unique opportunities to advance my computational skills, including large-scale analyses of next-generation sequencing data. With the support of this AFRI-NIFA grant, I attended two conferences: “Beyond the Genome” in San Francisco and “Plant and Animal Genome (PAG) XXII” in San Diego. Both conferences gave me opportunities to communicate with leaders in the field and to advance my knowledge of genomics. I presented my research at both conferences and received invaluable feedback from the participants. Attending PAG XXII was especially motivating, as I was invited to present my work in the ‘Triticeae genomics workshop’. I also applied for assistant professorship/group leader positions during the past year. Without doubt, being a NIFA fellow helped me to receive invitations to 5 interviews and 2 job offers. I should note that because I will be starting my own group and leaving UC Davis at the end of the year, I will not be able to attend the NIFA conference in Washington DC scheduled for the first quarter of 2015. Instead, I would like to substitute this presentation requirement with my presentation at PAG XXII, which is a major conference in the field of genomics.
How have the results been disseminated to communities of interest?
Results of my work have been disseminated to a wide scientific community. Our data are publicly available through the National Center for Biotechnology Information (NCBI). We have built a new lab website with dedicated pages for searching the mutant database and displaying the mutation information: http://dubcovskylab.ucdavis.edu/wheat_blast (the website will be available without a password at the end of the year). This resource reaches both scientists and breeders interested in wheat.
Our programs are available on GitHub, the main portal for code exchange among programmers and bioinformaticians. Finally, I actively use professional social networks, Twitter (@kseniakrasileva) and blogging, to reach an even wider community. I tweet about my research and the current progress of my project under the hashtag #1000wheatlines.
What do you plan to do during the next reporting period to accomplish the goals?
I have requested a no-cost extension of my NIFA fellowship for 3 months, until the end of October 2014. During these three months I plan to finish the analysis of the 1,000 lines that I sequenced and finish writing the draft of the manuscript describing these results.
Impacts What was accomplished under these goals?
Non-technical summary
Wheat was domesticated at the dawn of agriculture and has since been adapted to grow in different environments throughout the world. However, we need to continue wheat breeding to address the challenges of today: improving wheat quality and grain yield in response to a growing population, combating water shortages and greater variability in temperatures caused by global climate change, and developing disease-resistant varieties in response to the emergence and migration of pathogens. Our collection of targeted induced lesions in genomes is a major resource of gene variants for tetraploid wheat. During the past year, I adopted rapid, high-throughput protocols for targeted sequencing of wheat lines and applied these techniques to re-sequence 1,000 wheat lines from our collection. Using computational tools, I identified allelic variation in each wheat gene in over 250 lines. The remaining lines are already sequenced and are currently being analyzed. On average, I am discovering about 2,000 alleles per line, and a total of around 2,000,000 new wheat gene alleles is expected from analyzing the full set of 1,000 lines. In collaboration with a database specialist in the lab, we have built a public database of all mutations that will be accessible by the end of the year to anyone without restrictions. The corresponding seeds of the wheat lines will be available from the Dubcovsky lab upon request. Our characterization of induced allelic variants in 1,000 wheat lines gives researchers tools to analyze the function of different wheat genes. The discovery of 2 million allelic changes represents a large increase in genetic diversity that will help breeders address current agricultural challenges.
Progress on all objectives
During the second year of the project, I successfully completed Aim 2 and made significant progress on Aims 3-4 as outlined below:
Aim 1.
Define the gene space of tetraploid wheat: This aim was successfully completed during the first year of the project, as reported last year.
Aim 2. Capture and sequence the exome of the tetraploid wheat: I validated our computational approaches to mutation identification in wheat on the test dataset, and the results were published in collaboration with Luca Comai’s group in Plant Cell earlier this year. I finalized the exome capture design in collaboration with NimbleGen (Roche). We improved our design by re-balancing the probes to normalize the coverage across exons. I developed high-throughput methods to adapt modern robotics to the project and successfully completed 1) library preparation, 2) exome captures and 3) sequencing of 1,000 mutant wheat lines, reaching the goal proposed for the project.
Aim 3. Assemble a comprehensive mutant library of tetraploid wheat: I established streamlined pipelines for sequencing data analysis and completed the analyses of 284 lines. In addition to using the gold standard wheat genome reference, I assembled the reads that did not map to the reference and included the resulting contigs in the analyses. Including these contigs improved the mapping rate from 96% to 98% of reads. A survey of the additional contigs showed that they contain a large number of disease resistance genes, which have been shown to vary between plant cultivars. This effort therefore ensures that we detect mutations in genes that are not represented in the standard wheat reference. In each line, I am discovering on average 2,245 (± 930) mutations. In total, there are 637,777 mutations in our current database. The analyses of the remaining lines will be completed within the next month. The projected number of mutations in the database is expected to reach >2,000,000.
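The projection quoted above can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the figures reported in this section; it assumes, for illustration, that the lines still to be analyzed carry the same average number of mutations as the 284 lines already analyzed. The variable names are hypothetical and are not taken from the project's pipelines.

```python
# Figures quoted in the report
lines_analyzed = 284        # mutant lines analyzed so far
mutations_found = 637_777   # mutations currently in the database
total_lines = 1_000         # mutant lines sequenced in total

# Mean number of mutations per analyzed line
mean_per_line = mutations_found / lines_analyzed

# Linear extrapolation to the full set (assumption: the remaining
# lines resemble the lines analyzed so far)
projected_total = mean_per_line * total_lines

print(f"mean mutations per line: {mean_per_line:.0f}")
print(f"projected database size: {projected_total:,.0f}")
```

Under that assumption the extrapolation lands a little above 2.2 million mutations, which agrees with the stated expectation of more than 2,000,000.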
In collaboration with the database specialist in the lab, Hans Vasquez-Gross, we have developed a website that displays the mutant database and can be searched with simple BLAST searches (http://dubcovskylab.ucdavis.edu/wheat_blast). The website is currently under password protection, which will be lifted as soon as we submit a publication for review. We already have over 100 beta users who routinely report any display issues, so the website will be fully tested before the official release. We plan to publicly announce the database during PAG XXIII (January 2015) in San Diego and advertise it broadly across plant communities.
Aim 4: Analyze functional mutants involved in disease resistance pathways: I have previously computationally predicted components of plant immunity pathways. In each line that we are sequencing, I am predicting the effect of each mutation on gene function using the Variant Effect Predictor (VEP) software from Ensembl. As a result, I have accumulated a list of disease resistance genes and other signaling components with stop codon/splice site changes or non-synonymous mutations. I have also phenotyped our 1,000 lines for increased or reduced resistance to wheat stripe rust, using field experiments and confirming the results in a controlled growth chamber environment.
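The selection step described under Aim 4 (keeping mutations with stop codon, splice site or non-synonymous effects in genes of interest) can be illustrated as follows. This is a minimal sketch in Python, not the analysis code used in the project. It assumes that the effect predictions have already been parsed into records carrying Sequence Ontology consequence terms; the record layout, the function name and the line and gene identifiers are hypothetical.

```python
# Sequence Ontology consequence terms used to classify mutation effects
TRUNCATING = {"stop_gained", "splice_donor_variant", "splice_acceptor_variant"}
MISSENSE = {"missense_variant"}


def candidate_mutations(records, genes_of_interest):
    """Return (line, gene, category) for mutations likely to alter function.

    records: iterable of dicts with keys "line", "gene" and "consequences"
             (hypothetical layout for already-parsed predictions).
    genes_of_interest: set of gene identifiers, e.g. predicted immunity genes.
    """
    hits = []
    for rec in records:
        if rec["gene"] not in genes_of_interest:
            continue
        terms = set(rec["consequences"])
        if terms & TRUNCATING:
            # Premature stop codons and splice site changes are reported first
            hits.append((rec["line"], rec["gene"], "truncating"))
        elif terms & MISSENSE:
            hits.append((rec["line"], rec["gene"], "missense"))
    return hits


# Toy input with made-up identifiers
records = [
    {"line": "line_0001", "gene": "gene_A", "consequences": ["stop_gained"]},
    {"line": "line_0001", "gene": "gene_B", "consequences": ["synonymous_variant"]},
    {"line": "line_0002", "gene": "gene_A", "consequences": ["missense_variant"]},
    {"line": "line_0003", "gene": "gene_C", "consequences": ["stop_gained"]},
]
hits = candidate_mutations(records, genes_of_interest={"gene_A", "gene_B"})
for hit in hits:
    print(hit)
```

Ranking truncating changes ahead of missense changes is a design choice made for this illustration; the report itself lists both categories without ranking them.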
Publications
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2013
Citation:
Krasileva KV, Buffalo V, Ayling S, Soria M, Uauy C and Dubcovsky J Using exome-capture technology to develop functional genomics tools for wheat Presentation. Beyond the Genome Conference, San Francisco CA 2013
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2013
Citation:
Krasileva KV Development of Functional genomic Tools for Wheat Presentation. Bits and Bites forum, UC Davis CA 2013
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2014
Citation:
Krasileva KV, Vasquez-Gross H, Ayling S, Paraiso F, Howell T, Uauy C and Dubcovsky J Exome capture and TILLING in tetraploid and hexaploid wheats PAG XXII, San Diego CA 2014
- Type:
Journal Articles
Status:
Published
Year Published:
2014
Citation:
Henry IM, Nagalakshmi U, Lieberman MC, Ngo KJ, Krasileva KV, Vasquez-Gross H, Akhunova A, Akhunov E, Dubcovsky J, Tai TH and Comai L Efficient genome-wide detection and cataloging of EMS-induced mutations using exome capture and next-generation sequencing Plant Cell, 26:1382-1397 (2014)
Progress 08/01/12 to 07/31/13
Outputs Target Audience: In the past year, my efforts reached a very wide scientific community. First, I sequenced and assembled a transcriptome of tetraploid wheat, which is available through the National Center for Biotechnology Information (NCBI) Transcriptome Shotgun Assembly (TSA) database. Additional annotations of this transcriptome, including the predicted wheat proteome, are available on a dedicated project website (http://maswheat.ucdavis.edu/Transcriptome/index.htm). In the past three months, this website received many visits and the datasets were downloaded many times, indicating that they reach scientists (and possibly breeders) interested in wheat research. Our transcriptome sequences also became part of the GrainGenes database (http://wheat.pw.usda.gov/GG2/WheatTranscriptome/). Finally, these sequences have been used to annotate gene models in the hexaploid wheat genome now available in the Ensembl database (http://plants.ensembl.org/Triticum_aestivum/Info/Index). The transcriptome, its annotation, wheat gene models and bioinformatics workflows were published in the open access journal Genome Biology (Krasileva et al. 2013, PMID: 23800085). This publication is highly accessed and has been viewed 6,174 times since it was accepted in June 2013, according to the journal’s website. According to Altmetric, it has been mentioned to a very broad audience in 32 tweets from 24 accounts, including 11 scientists, 11 non-scientists and 2 science communicators, with an upper bound of 22,172 combined followers. Although my publication came out only three months ago, it has already been cited in two peer-reviewed journal articles and referenced in several databases, including GrainGenes and Ensembl. Our dedicated project website with all our data has been highly accessed, with an audience mostly from the United States but also from other countries (the top five are China, United Kingdom, Canada, Israel and India).
Overall, this indicates that our efforts reached a very wide scientific community and a subset of the general audience interested in science communication and wheat research.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Supported by my AFRI-NIFA funding, I presented my research at two conferences. This allowed me to network with my colleagues and present my research to a very broad audience. In addition, I received training from the UC Davis Genome core facility on machines designed for high-throughput liquid handling. This is invaluable experience for me, as I plan to apply these techniques in my own lab. Importantly, my fellowship released enough funds for the Dubcovsky lab to hire a professional programmer, whom I mentored. Working with someone coming from a completely different discipline was an amazing experience. I learned advanced programming skills and gained experience mentoring someone with little experience in biology. As I plan to lead an interdisciplinary team in my own lab, this was an invaluable experience. Finally, due to my accomplishments supported by this grant, I have been invited to speak at the Triticeae workshop at the Plant and Animal Genome Conference (PAG) next year.
How have the results been disseminated to communities of interest?
Results of my work have been disseminated to a wide scientific community. As described in the first part of this report, our data are publicly available through the National Center for Biotechnology Information (NCBI) Transcriptome Shotgun Assembly (TSA) database (Project IDs PRJNA191053, PRJNA191054), a dedicated project website (http://maswheat.ucdavis.edu/Transcriptome/index.htm), the GrainGenes database (http://wheat.pw.usda.gov/GG2/WheatTranscriptome/), and as part of the Ensembl database (http://plants.ensembl.org/Triticum_aestivum/Info/Index). These resources reach both scientists and breeders interested in wheat. Our programs are available on GitHub, the main portal for code exchange among programmers and bioinformaticians.
Finally, I actively use professional social networks, Twitter (@kseniakrasileva) and blogging, to reach an even wider community. I tweet about my research and the current progress of my project; I also express my views on promoting diversity and equity in the sciences under the hashtag #womensci.
What do you plan to do during the next reporting period to accomplish the goals?
During the next reporting period I plan to finish my proposed efforts on Aims 2-4 as follows:
- Analyze the capture data from the test design and approve large-scale production (next month).
- Finish high-throughput library preparation (next few months).
- Complete adapting robotics for wheat exome captures (next few months).
- Perform high-throughput exome captures (within a year). As I am multiplexing 8 samples per capture, I need to do only 125 captures.
- Perform bioinformatic analyses to identify mutations in sequenced lines (throughout the year).
- Annotate mutations in candidate disease resistance pathways (throughout the year).
Impacts What was accomplished under these goals?
During the past year, I successfully completed Aim 1 and made significant progress on Aims 2-4 as outlined below.
Aim 1. Define the gene space of tetraploid wheat: I successfully reached the first milestone of defining wheat genes based on transcriptome assembly. Completing this aim was challenging due to the lack of bioinformatic solutions for assembling sequences from polyploid organisms. When I became aware that current assembly programs do not work well in tetraploid wheat, I led collaborative efforts to develop specific bioinformatics workflows. Specifically, we adapted phasing methods to resolve homeologous sequences that were collapsed during the assembly. Results of this study were published in the open access peer-reviewed journal Genome Biology (Krasileva et al. 2013), and the datasets were made publicly available. I also surveyed published wheat transcripts (Schreiber et al. 2012, Brenchley et al. 2012, Mochida et al. 2009, Cantu et al. 2011) and identified sequences that complemented our data, annotating protein-coding sequences. Together, these data formed a comprehensive exome-capture design targeted specifically at wheat protein-coding regions.
Aim 2. Capture and sequence the exome of the tetraploid wheat: I made significant progress on adapting exome capture and mutation identification protocols. First, the exome capture design described above has been submitted to NimbleGen for custom EZ Capture Library preparation. There are two stages in this capture production: a) evaluation of small-scale reagents and re-balancing of oligos if needed, and b) large-scale production. I tested the small-scale reagents and am currently waiting for the sequencing results to see if we need to make any adjustments to the design.
In parallel, together with Eduard Akhunov’s group at Kansas State University, we used a previously available wheat exome capture (not specific to protein-coding sequences) to capture and sequence 7 of our mutant lines and a wild type control. Although this particular design covers only 30% of our targets (and that’s why we are making our own capture with NimbleGen), the data are still very useful for the development of bioinformatic pipelines. By my analyses, we can successfully identify EMS-induced mutations using the Mutation and Polymorphism Survey (MAPS) pipeline developed by the Luca Comai group at UC Davis. This pipeline analyzes multiple samples at the same time and reports polymorphisms detected in only one sample and not in any other. This is exactly what we expect to see in our individual mutant lines. I tested a range of parameters for mutation identification with MAPS. Using the wild type data as a control for false positives, I determined that a minimum heterozygous allele coverage of MinHetCov=8 gives the optimal signal-to-noise ratio (5% false positives). Analyzing the mutations detected, I observed an overrepresentation of G>A and C>T changes, as expected from EMS mutagenesis. Overall, this shows that using the MAPS pipeline is a good strategy for this project.
Aim 3. Assemble a comprehensive mutant library of tetraploid wheat: By our estimates, we need to sequence around 1,500 mutant wheat lines to get a knockout allele in 95% of the genes. The Dubcovsky lab already has high quality DNA stocks for over 2,000 mutant lines. Our ability to move the project to this scale depends on automation of the library preparation and capture. Originally, I proposed to do 200 samples by hand and move to automation thereafter. However, current robotics have reached high quality and outperform manual library preparation methods, so I targeted my efforts at early adoption of automated workflows for DNA shearing, DNA library preparation and exome capture.
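The filtering logic described under Aim 2 (keep a polymorphism only if it is seen in exactly one sample, require a minimum coverage of the mutant allele, then check for the EMS signature) can be sketched as follows. This is an illustrative Python sketch and not the MAPS implementation: the input layout, the function names and the toy data are hypothetical, and only the threshold of 8 reads and the G>A and C>T signature come from the text above.

```python
from collections import defaultdict

# Base changes expected from EMS mutagenesis
EMS_CHANGES = {("G", "A"), ("C", "T")}


def unique_mutations(calls, min_het_cov=8):
    """Keep variants observed in exactly one sample with enough support.

    calls: {sample: {(contig, position, ref, alt): reads supporting alt}}
           (hypothetical layout, not the MAPS input format).
    """
    carriers = defaultdict(list)
    for sample, variants in calls.items():
        for variant in variants:
            carriers[variant].append(sample)

    kept = defaultdict(list)
    for variant, samples in carriers.items():
        if len(samples) != 1:
            continue  # shared between samples: likely a varietal polymorphism
        sample = samples[0]
        if calls[sample][variant] >= min_het_cov:
            kept[sample].append(variant)
    return dict(kept)


def ems_fraction(variants):
    """Fraction of variants that are G>A or C>T changes."""
    if not variants:
        return 0.0
    hits = sum((ref, alt) in EMS_CHANGES for _, _, ref, alt in variants)
    return hits / len(variants)


# Toy data: two mutant lines and a wild type control
calls = {
    "mutant_1": {("contig1", 100, "G", "A"): 12,
                 ("contig1", 250, "C", "T"): 5,
                 ("contig2", 40, "A", "G"): 30},
    "mutant_2": {("contig2", 40, "A", "G"): 28,
                 ("contig3", 7, "C", "T"): 9},
    "wild_type": {("contig2", 40, "A", "G"): 31},
}
unique = unique_mutations(calls)
print(unique)
```

In this sketch, any variant reported in the wild type after filtering would count as a false positive, which mirrors how the control is used in the text to choose the coverage threshold.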
The robots that I used are hosted at the UC Davis DNA Technology Core Facility. For DNA shearing, I adapted protocols for the Covaris E220, which can process 96 samples in a run. The default specifications described by the manufacturer needed many modifications, and the protocol previously described by Fisher and colleagues (Fisher et al. 2011) worked best. For high-throughput library preparation, I adapted the Sciclone G3 liquid handler and the Maestro KAPA DNA library preparation protocol, which needed only a few modifications. I am now able to process 96 libraries in a single day. I prepared the first 2 sets of 96 libraries and will be able to finish the rest of the library preparation within the next several months. The Sciclone G3 liquid handler is also capable of performing exome captures. I have been trained on capture-related protocols by a Caliper representative and will test them with our samples within the next couple of months. Once the capture protocols are fully adapted, the Sciclone G3 is capable of performing 384 captures in 1 week. The Dubcovsky lab has sufficient support from the Gordon and Betty Moore Foundation for sequencing costs; therefore, I see no obstacles to completing this part of the project within the next year.
Aim 4: Analyze functional mutants involved in disease resistance pathways: I used bioinformatic analyses and comparative genomics to predict the components of wheat disease resistance pathways. I applied three bioinformatic approaches:
1) Identification of protein families previously implicated in disease resistance, based on the key combinations of functional domains, using Hidden Markov Model-based searches.
2) Identification of wheat orthologs of Arabidopsis genes with known functions in plant immunity using phylogenetic approaches.
3) Identification of critical residues in disease resistance gene candidates based on amino acid conservation and structural modeling.
Using these approaches, I annotated nearly 600 NBS-LRR disease resistance genes as well as other components of plant immunity. Once we start sequencing our mutant population on a large scale, I will annotate mutations in the predicted wheat disease resistance genes and signaling components.
In summary, I achieved significant progress on this USDA-funded project. I released a database of annotated wheat transcripts and led the development of new methods for the assembly of wheat genes, published in Genome Biology. I made significant progress on adopting modern robotics for high-throughput library construction and sequencing. I tested a bioinformatic pipeline for mutation identification and defined the parameters that work best for this project. Finally, I used comparative genomics to annotate components of wheat innate immunity.
References:
Krasileva KV, Buffalo V, Bailey P, Pearce S, Ayling S, Tabbita F, Soria M, Wang S, Consortium I, Akhunov E, Uauy C, Dubcovsky J: Separating homeologs by phasing in the tetraploid wheat transcriptome. Genome Biology 2013, 14:R66.
Schreiber AW, Hayden MJ, Forrest KL, Kong SL, Langridge P, Baumann U: Transcriptome-scale homoeolog-specific transcript assemblies of bread wheat. BMC Genomics 2012, 13:492.
Brenchley R, Spannagl M, Pfeifer M, Barker GL, D'Amore R, Allen AM, McKenzie N, Kramer M, Kerhornou A, Bolser D et al: Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature 2012, 491(7426):705-710.
Mochida K, Yoshida T, Sakurai T, Ogihara Y, Shinozaki K: TriFLDB: a database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics. Plant Physiol 2009, 150:1135-1146.
Cantu D, Pearce SP, Distelfeld A, Christiansen MW, Uauy C, Akhunov E, Fahima T, Dubcovsky J: Effect of the down-regulation of the high Grain Protein Content (GPC) genes on the wheat transcriptome during monocarpic senescence. BMC Genomics 2011, 12:492.
Fisher S, Barry A, Abreu J, Minie B, Nolan J, Delorey TM, Young G, Fennell TJ, Allen A, Ambrogio L et al: A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biology 2011, 12:R1.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2013
Citation:
Krasileva KV, Buffalo V, Bailey P, Pearce S, Ayling S, Tabbita F, Soria M, Wang S, Consortium I, Akhunov E, Uauy C, Dubcovsky J: Separating homeologs by phasing in the tetraploid wheat transcriptome. Genome Biology, 14:R66. (2013)
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2012
Citation:
Krasileva KV, Buffalo V and Dubcovsky J De novo transcriptome assembly of polyploid organisms: insights from working with diploid and tetraploid wheat Howard Hughes Medical Institute Conference, Janelia Farm, VA USA (Poster). 2012
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2012
Citation:
Krasileva KV, Schwessinger B, Buffalo V and Dubcovsky J Potential pathogen targets and innate immunity genes of durum wheat New Phytologist Conference: 'Immunomodulation by plant-associated organisms' Fallen Leaf Lake, CA USA (Poster). 2012