Progress 05/15/18 to 05/14/22
Outputs Target Audience:
Researchers ranging from undergraduate students to senior faculty in educational and research institutions, and in private industry and government, in the fields of genomics, DNA sequence data analysis, breeding and population studies.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
This project funded 40% of the salary for postdoctoral fellow Christopher Pockrandt at JHU.
How have the results been disseminated to communities of interest?
Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Both Specific Objectives have been accomplished in full and tested on A. thaliana and human data sets. In addition, we have developed additional software, described below.
Reference Guided Assembly
Figure 1. MaSuRCA-Syntig strategy. We turn Illumina reads into super-reads and then use 25-mers in the super-reads to compute approximate alignments of the super-reads to reference contigs. We then build syntigs, which are paths of exactly overlapping super-reads whose overlaps are confirmed by the reference alignment positions. Finally, we assemble the super-reads and syntigs de novo.
The number of reference or high-quality genome assemblies for different species is growing rapidly thanks to the proliferation of third-generation long-read sequencing technologies. These reference genomes can be used as templates to assist individual de novo genome assemblies of closely related (or the same) species from low-cost short-read Illumina data. We show that using one or more reference genomes yields a short-read de novo assembly that is superior in contiguity and completeness. The MaSuRCA-Syntig software is a new addition to the MaSuRCA genome assembly package that enables synteny-assisted de novo genome assembly from Illumina paired-end read data, guided by one or more reference sequences of closely related species. The principal difference of the new technique is that multiple references can be used at the same time, and we show that assembly contiguity improves as more reference sequences are added. We achieved an N50 contig size of 986 Kbp for the de novo synteny-assisted assembly of A. thaliana, 2.8 times larger than the N50 for an assembly of the same data that did not use reference information. Using the human reference genome GRCh38 resulted in an N50 contig size of 482 Kbp for the de novo Illumina-only assembly of the NA12878 data set, 5 times larger than the corresponding N50 for the assembly made without the reference. The MaSuRCA-Syntig strategy is shown in Figure 1.
We split the reference assembly (or assemblies) into contigs at gaps. We then compute super-reads from the Illumina reads in the standard MaSuRCA way (Zimin et al., 2013). Next we create approximate alignments of the super-reads to each reference contig using the 25-mers that the reference contigs have in common with the super-reads. The 25-mer seeds work for closely related species, where DNA sequences are >98% similar; smaller seeds may be needed for more divergent species. For the alignments, we first build a database of all 25-mers in the super-reads. We use this database to compute, for each super-read, its approximate start and end positions on each reference contig using the LCS algorithm described in Zimin et al. (2017). For each reference contig R, we walk down the contig looking at each 25-mer. We use the 25-mer database to determine (in constant time for each 25-mer) which 25-mers are found in super-reads. Once we have the super-reads that match R, for each such super-read S we look for ordered subsequences of the 25-mers that R and S have in common. We then assign a score to each super-read S, where the score is the number of 25-mers in the longest common subsequence (LCS) of 25-mers in the two sequences. We label an alignment as plausible if the score of S exceeds a specified minimum. For each plausible alignment, we compute an approximate position of S along R based on the positions of the LCS 25-mers in R and S. Using all super-read positions on a reference contig R, we create possible paths of plausible super-reads along R. Each path consists of a sequence of super-reads in which two adjacent super-reads must have an exact overlap of at least 40 bases and must have positions on R that make it possible for them to overlap. We call each such path a synteny read, or syntig. We then assemble the super-reads and syntigs using the Flye assembler (Kolmogorov et al., 2019) in "subassemblies" mode.
Table 1. Reference genome sequences used for the Arabidopsis thaliana experiments
ID             Genbank accession   Total Sequence (Mbp)   N50 Contig (bp)   N50 Scaffold (bp)
TAIR1.0 (Col)  GCA_000001735.1     118.96                 10,898,021        23,459,830
Ler1 (Ler)     GCA_001651475.1     117.11                 862,972           22,588,203
Ler2 (Ler)     GCA_000835945.1     127.42                 11,163,166        11,163,166
We show the performance of our preliminary algorithm on assemblies of an Arabidopsis thaliana Ler (Landsberg erecta) ecotype data set consisting of 100x coverage by 2x300 Illumina MiSeq paired-end reads. The references that we use, with their Genbank accessions, are listed in Table 1. We use the official reference genome for A. thaliana Col (Columbia ecotype), TAIR1.0, and two references for the same ecotype as the data, A. thaliana Ler. The Ler2 reference is the most contiguous one because it was produced using third-generation PacBio sequencing data (Berlin et al., 2015). We set up four reference-assisted assembly experiments, shown in Table 2:
Experiment 1. Use the TAIR1.0 reference - different ecotype
Experiment 2. Use the Ler1 reference - less contiguous, same ecotype
Experiment 3. Use two references, TAIR1.0 and Ler1
Experiment 4. Use the most contiguous and the closest reference, Ler2
Table 2 shows that using a more contiguous reference improves the assembly: the assembly produced using the Ler2 reference has an N50 contig size of 986 Kbp, whereas the assembly produced using the less contiguous Ler1 reference has an N50 contig size of 723 Kbp. Using a more closely related reference also works better, since the Ler1 reference, albeit less contiguous, yielded a better result than the more contiguous TAIR1.0 reference. Using two references yields longer contigs than using either of those references alone, as shown in Experiment 3, even though we are combining the much less contiguous Ler1 reference with the more contiguous TAIR1.0 reference.
The most contiguous and closest reference, Ler2, yields the best reference-assisted assembly result, with contigs 2.8 times longer than the ones produced from Illumina data alone.
Table 2. Synteny-assisted (reference-assisted) assemblies of Arabidopsis thaliana Ler
Reference used   Total Sequence (bp)   N50 Contig (bp)   N50 Scaffold (bp)
none             127,353,458           351,096           433,094
TAIR1.0          121,573,373           501,958           503,045
Ler1             126,141,674           723,190           726,839
TAIR1.0+Ler1     123,405,969           800,823           801,815
Ler2             131,677,486           986,399           993,221
The reference-assisted assembly code is included in MaSuRCA assembler version 3.3.3 and up, and it is available on GitHub at https://github.com/alekseyzimin/masurca.
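The contiguity comparisons above rest on the N50 statistic. As a reminder of how it is computed, the following minimal sketch uses the standard definition (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size). The function name and the toy contig lengths are illustrative only and are not taken from MaSuRCA; the last lines recompute the reported fold improvement from the two N50 contig values in Table 2.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (standard definition)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


# Hypothetical toy assembly: total 300, so N50 is reached at the second contig.
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)

# Fold improvement of the Ler2-guided assembly over the unguided one (Table 2).
fold_improvement = 986_399 / 351_096
print(toy_n50, round(fold_improvement, 1))
```

The ratio rounds to 2.8, which matches the improvement quoted in the text.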
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Zimin AV, Salzberg SL. The SAMBA tool uses long reads to improve the contiguity of genome assemblies. PLoS computational biology. 2022 Feb 4;18(2):e1009860.
- Type:
Journal Articles
Status:
Awaiting Publication
Year Published:
2022
Citation:
Guo A, Salzberg S, Zimin AV. JASPER: a fast genome polishing tool that improves accuracy and creates population-specific reference genomes. bioRxiv. 2022 Jan 1.
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Zimin AV, Shumate A, Shinder I, Heinz J, Puiu D, Pertea M, Salzberg SL. A reference-quality, fully annotated genome from a Puerto Rican individual. Genetics. 2022 Feb;220(2):iyab227.
|
Progress 05/15/20 to 05/14/21
Outputs Target Audience:
Researchers ranging from undergraduate students to senior faculty in educational and research institutions, and in private industry and government, in the fields of genomics, DNA sequence data analysis, breeding and population studies.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
This project funded 40% of the salary for postdoctoral fellow Christopher Pockrandt at JHU.
How have the results been disseminated to communities of interest?
Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Synteny-guided assembly approach.
The number of reference or high-quality genome assemblies for different species is growing rapidly thanks to the proliferation of third-generation long-read sequencing technologies. These reference genomes can be used as templates to assist individual de novo genome assemblies of closely related (or the same) species from low-cost short-read Illumina data. We show that using one or more reference genomes yields a short-read de novo assembly that is superior in contiguity and completeness. The MaSuRCA-Syntig software is a new addition to the MaSuRCA genome assembly package that enables synteny-assisted de novo genome assembly from Illumina paired-end read data, guided by one or more reference sequences of closely related species. The principal difference of the new technique is that multiple references can be used at the same time, and we show that assembly contiguity improves as more reference sequences are added. We achieved an N50 contig size of 986 Kbp for the de novo synteny-assisted assembly of A. thaliana, 2.8 times larger than the N50 for an assembly of the same data that did not use reference information. Using the human reference genome GRCh38 resulted in an N50 contig size of 482 Kbp for the de novo Illumina-only assembly of the NA12878 data set, 5 times larger than the corresponding N50 for the assembly made without the reference.
The MaSuRCA-Syntig strategy is shown in Figure 1. We split the reference assembly (or assemblies) into contigs at gaps. We then compute super-reads from the Illumina reads in the standard MaSuRCA way (Zimin et al., 2013). Next we create approximate alignments of the super-reads to each reference contig using the 25-mers that the reference contigs have in common with the super-reads. The 25-mer seeds work for closely related species, where DNA sequences are >98% similar; smaller seeds may be needed for more divergent species. For the alignments, we first build a database of all 25-mers in the super-reads.
We use this database to compute, for each super-read, its approximate start and end positions on each reference contig using the LCS algorithm described in Zimin et al. (2017). For each reference contig R, we walk down the contig looking at each 25-mer. We use the 25-mer database to determine (in constant time for each 25-mer) which 25-mers are found in super-reads. Once we have the super-reads that match R, for each such super-read S we look for ordered subsequences of the 25-mers that R and S have in common. We then assign a score to each super-read S, where the score is the number of 25-mers in the longest common subsequence (LCS) of 25-mers in the two sequences. We label an alignment as plausible if the score of S exceeds a specified minimum. For each plausible alignment, we compute an approximate position of S along R based on the positions of the LCS 25-mers in R and S. Using all super-read positions on a reference contig R, we create possible paths of plausible super-reads along R. Each path consists of a sequence of super-reads in which two adjacent super-reads must have an exact overlap of at least 40 bases and must have positions on R that make it possible for them to overlap. We call each such path a synteny read, or syntig. We then assemble the super-reads and syntigs using the Flye assembler (Kolmogorov et al., 2019) in "subassemblies" mode.
Figure 1. MaSuRCA-Syntig strategy. We turn Illumina reads into super-reads and then use 25-mers in the super-reads to compute approximate alignments of the super-reads to reference contigs. We then build syntigs, which are paths of exactly overlapping super-reads whose overlaps are confirmed by the reference alignment positions. Finally, we assemble the super-reads and syntigs de novo.
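The scoring step described above can be illustrated with a small sketch. It is not the MaSuRCA implementation (the report refers to Zimin et al., 2017 for the actual LCS algorithm); it is a simplified stand-in that assumes each k-mer occurs at most once in the super-read, in which case the longest common subsequence of shared k-mers equals the longest strictly increasing subsequence of super-read positions taken in reference order. It also indexes one super-read at a time rather than building a single database over all super-reads. The function names, the `min_score` default, and the toy sequences are hypothetical.

```python
from bisect import bisect_left


def kmer_positions(seq, k):
    """Map each k-mer in seq to the position of its first occurrence."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], i)
    return index


def lcs_kmer_score(reference, super_read, k=25):
    """Count shared k-mers that occur in the same order in both sequences."""
    read_index = kmer_positions(super_read, k)
    # Walk down the reference; each k-mer lookup is a constant-time dict hit.
    read_positions = []
    for i in range(len(reference) - k + 1):
        pos = read_index.get(reference[i:i + k])
        if pos is not None:
            read_positions.append(pos)
    # Longest strictly increasing subsequence of super-read positions.
    tails = []
    for pos in read_positions:
        j = bisect_left(tails, pos)
        if j == len(tails):
            tails.append(pos)
        else:
            tails[j] = pos
    return len(tails)


def is_plausible(reference, super_read, k=25, min_score=10):
    """Label an alignment plausible if its score reaches a chosen minimum."""
    return lcs_kmer_score(reference, super_read, k) >= min_score


# Toy example with 4-mers so the sequences stay short.
toy_reference = "AACCGGTTACGTAGCTAG"
toy_read = toy_reference[3:13]  # an exact 10-base substring of the reference
toy_score = lcs_kmer_score(toy_reference, toy_read, k=4)
print(toy_score)
```

In the toy case the read is an exact substring, so all of its 4-mers are shared and in order; a read with no shared k-mers scores zero and is never labeled plausible.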
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2021
Citation:
Masonbrink RE, Alt D, Bayles DO, Boggiatto P, Edwards W, Tatum F, Williams J, Wilson-Welder J, Zimin A, Severin A, Olsen S. A pseudomolecule assembly of the Rocky Mountain elk genome. PloS one. 2021 Apr 28;16(4):e0249899.
|
Progress 05/15/19 to 05/14/20
Outputs Target Audience:A community of scientists doing work in genome assembly, analysis and annotation. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
This project supported a postdoctoral research associate, Christopher Pockrandt.
How have the results been disseminated to communities of interest?
We presented the results of all activities related to this project at multiple meetings and seminars, including PAG 2020.
What do you plan to do during the next reporting period to accomplish the goals?
We will proceed with implementation of the Specific Objectives as planned.
Impacts What was accomplished under these goals?
In this project year we modified the syntigs algorithm for better performance. Instead of super-reads, we now use pre-assembled contigs (assembled from the super-reads and paired-end linking mate pairs) to create the syntigs, which makes the syntigs longer and improves the resulting performance of the assembler. We also added the option of filling gaps in the resulting reference-guided assembly with the reference sequence, in lowercase. We are currently working on the manuscript describing the technique. The reference-guided assembly code is available in the current release of MaSuRCA, v3.4.1, and the usage of the option is described in the documentation. We downloaded several bovine genomic data sets produced at the University of Missouri, Columbia from NCBI SRA, and we are now conducting experiments on low-coverage reference-assisted cattle genome assembly using the latest v5 cow reference genome.
We implemented and published POLCA, a fast and accurate tool for polishing genome assemblies. We use this tool in our reference-assisted genome assembly pipeline. POLCA is implemented as a bash script that takes as input a file of Illumina reads and the target assembly to be polished. The outputs are the polished assembly and a VCF (variant call format) file containing the variants used for polishing. The basic outline of the script is to align the Illumina reads to the genome and then call short variants from the alignments. A variant call is treated as a putative error in the consensus if the count of alternative allele observations is greater than 1 and at least twice the count of the reference allele. Each error is fixed by replacing the error variant with the highest-scoring alternative allele suggested by the Illumina reads. The variants can be substitutions or insertions/deletions of one or more bases. POLCA uses bwa mem (Li and Durbin, 2009) to align reads to the assembly, but another short-read aligner can easily be substituted.
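The error-calling rule stated above (alternative allele count greater than 1 and at least twice the reference allele count) can be written as a short predicate. This is only an illustration of the rule as described in this report, not code from the POLCA script; the record layout and the key names `ref_count` and `alt_count` are hypothetical.

```python
def is_putative_error(ref_count, alt_count):
    """Apply the stated rule: alt count > 1 and alt count >= 2 * ref count."""
    return alt_count > 1 and alt_count >= 2 * ref_count


def select_corrections(calls):
    """Keep only the variant calls that the rule treats as consensus errors."""
    return [c for c in calls if is_putative_error(c["ref_count"], c["alt_count"])]


# Hypothetical variant calls; positions and alleles are made up for illustration.
example_calls = [
    {"pos": 101, "ref": "A", "alt": "G", "ref_count": 0, "alt_count": 30},
    {"pos": 250, "ref": "C", "alt": "T", "ref_count": 14, "alt_count": 16},
    {"pos": 377, "ref": "T", "alt": "TA", "ref_count": 0, "alt_count": 1},
]
corrections = select_corrections(example_calls)
print([c["pos"] for c in corrections])
```

Under this rule the second call is left alone because the reads support both alleles about equally, which is what one would expect at a heterozygous site rather than at a consensus error, and the third is rejected because a single observation is not enough.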
For variant calling, it uses FreeBayes (Garrison and Marth, 2012) due to its stability and portability; however, by default FreeBayes can only use a single thread (processor). In POLCA we use shell-level multiprocessing to run multiple instances of FreeBayes in parallel, which significantly speeds up the variant calling. We also tuned its alignment and variant calling parameters to improve sensitivity, specificity, and speed for detecting consensus errors. The FreeBayes binary is included with the POLCA distribution as part of the MaSuRCA package. (Note that POLCA installs with MaSuRCA but can be run independently to polish assemblies produced with third-party assemblers.) POLCA first builds an index of the target assembly and then aligns the Illumina reads to the target with bwa. It then uses samtools to sort the alignment (bam) file. For variant calling we run FreeBayes in 5 Mb batches, merging the variant call (vcf) files after all batches finish. We then process the assembly using the computed variant calls in parallel, where the number of batches is equal to the user-specified number of CPUs. We extract all target sequence names, sort them in lexicographic order, and split the sorted list into batches. This helps balance the amount of target sequence in each batch, thus balancing the load on the CPUs. Parallel execution is achieved using the "xargs -P" command, which ensures compatibility between different Unix-based systems.
We compared the performance of POLCA to other published polishing techniques on a real data set, using a previously published assembly of the NA12878 human genome, Genbank accession GCA_001013985.1. That assembly was produced from PacBio SMRT data (Pendleton et al., 2015), and as such it was likely to contain more consensus-level sequence errors than an assembly based on Illumina data.
Alignment of this assembly to the GRCh38.p12 human reference genome with nucmer, followed by dnadiff to compute differences, yields an average alignment identity of 99.66%. For polishing this assembly, we used Illumina data for the same subject, NA12878, from the Genome In A Bottle project (Zook et al., 2014), dataset 140115_D00360_0009_AH8962ADXX, which contains 553,657,530 149-bp reads. Because the "true" sequence of the NA12878 genome is not known, we evaluated, for each of the four polishing programs, whether the polished genome yielded a better alignment to the GRCh38.p12 sequence. The NA12878 assembly polished with POLCA had the closest alignment by a small margin, with 99.752% identity to GRCh38, while the assemblies polished with NextPolish, Pilon and Racon had 99.750%, 99.746% and 99.749% identity, respectively. Thus all four polishing programs gave very similar results in terms of accuracy. However, POLCA and NextPolish ran considerably faster, completing the task in 4 hours and less than 1 hour respectively, while Racon took 15h 39m and Pilon took far longer, 150h 16m. We note that Pilon is designed to do more than correct single-base substitutions and short indel errors, which explains its longer run times: it also attempts to identify and correct mis-assembled or collapsed repeats, a much more computationally demanding problem.
POLCA provides an effective way to correct single-base substitution and short insertion/deletion errors in draft genome assemblies. On simulated data, it proves to be more accurate than Pilon and Racon and equivalent to the newer NextPolish method. POLCA is faster than Racon and Pilon, but slower than NextPolish. On simulated data, the most accurate polishing was achieved by using a combination of both POLCA and NextPolish. On real human and bacterial genome data, POLCA and NextPolish performed similarly, and better than Pilon and Racon, although POLCA appeared to be marginally better for human genome polishing.
The manuscript describing the performance of the POLCA tool has been published in PLoS Computational Biology (Zimin and Salzberg, 2020).
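The batching of target sequences for parallel polishing, described above, can be sketched as follows. The report states only that the sequence names are sorted lexicographically and split into as many batches as there are CPUs; how the sorted list is cut is not specified, so the contiguous, near-equal-sized chunks used here are an assumption, and the function name and sequence names are hypothetical. POLCA itself is a bash script that drives its batches with "xargs -P", which this sketch does not reproduce.

```python
def split_into_batches(sequence_names, n_batches):
    """Sort names lexicographically and cut the list into near-equal chunks."""
    ordered = sorted(sequence_names)
    size, extra = divmod(len(ordered), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches take one additional name each.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            batches.append(ordered[start:end])
        start = end
    return batches


# Hypothetical target sequence names and a user-specified CPU count of 2.
names = ["chr2", "chr10", "chr1", "scaffold_7", "chrX"]
batches = split_into_batches(names, 2)
print(batches)
```

Each batch would then be handed to one worker process; if there are fewer names than CPUs, the sketch simply returns fewer batches rather than empty ones.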
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Alonge M, Shumate A, Puiu D, Zimin A, Salzberg SL. Chromosome-Scale Assembly of the Bread Wheat Genome Reveals Thousands of Additional Gene Copies. Genetics. 2020 Aug 12.
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Shumate A, Zimin AV, Sherman RM, Puiu D, Wagner JM, Olson ND, Pertea M, Salit ML, Zook JM, Salzberg SL. Assembly and annotation of an Ashkenazi human reference genome. Genome biology. 2020 Dec;21(1):1-8.
- Type:
Journal Articles
Status:
Awaiting Publication
Year Published:
2020
Citation:
Scott AD, Zimin AV, Puiu D, Workman R, Britton M, Zaman S, Caballero M, Read AC, Bogdanove AJ, Burns E, Wegrzyn J. A Reference Genome Sequence for Giant Sequoia. G3: Genes, Genomes, Genetics. 2020 Sep 18.
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Giordano R, Donthu RK, Zimin A, Chavez IC, Gabaldon T, van Munster M, Hon L, Hall R, Badger J, Nguyen M, Flores A. Soybean aphid biotype 1 genome: Insights into the invasive biology and adaptive evolution of a major agricultural pest. Insect Biochemistry and Molecular Biology. 2020 Feb 25:103334.
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Zimin AV, Salzberg SL. The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies. PLoS computational biology. 2020 Jun 26;16(6):e1007981.
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Rosen BD, Bickhart DM, Schnabel RD, Koren S, Elsik CG, Tseng E, Rowan TN, Low WY, Zimin A, Couldrey C, Hall R. De novo assembly of the cattle reference genome with single-molecule sequencing. GigaScience. 2020 Mar;9(3):giaa021.
|
Progress 05/15/18 to 05/14/19
Outputs Target Audience:
Nothing Reported
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest?
We presented the results at the Biology of Genomes meeting in 2019.
What do you plan to do during the next reporting period to accomplish the goals?
We will proceed with implementation of Specific Objectives 1 and 2 as planned.
Impacts What was accomplished under these goals?
In the first year of the project we spent the most effort on Specific Objective 1: to develop a technique that will allow the production of high-quality de novo genome assemblies of individual animals from low-cost Illumina paired-end data with the aid of synteny information from one or more reference genomes of closely related species.
The number of reference or high-quality genome assemblies for different species is growing rapidly thanks to the proliferation of third-generation long-read sequencing technologies. These reference genomes can be used as templates to assist individual de novo genome assemblies of closely related (or the same) species from low-cost short-read Illumina data. We show that using one or more reference genomes yields a short-read de novo assembly that is superior in contiguity and completeness. The MaSuRCA-Syntig software is a new addition to the MaSuRCA genome assembly package that enables synteny-assisted de novo genome assembly from Illumina paired-end read data, guided by one or more reference sequences of closely related species. The principal difference of the new technique is that multiple references can be used at the same time, and we show that assembly contiguity improves as more reference sequences are added. We achieved an N50 contig size of 986 Kbp for the de novo synteny-assisted assembly of A. thaliana, 2.8 times larger than the N50 for an assembly of the same data that did not use reference information. Using the human reference genome GRCh38 resulted in an N50 contig size of 482 Kbp for the de novo Illumina-only assembly of the NA12878 data set, 5 times larger than the corresponding N50 for the assembly made without the reference.
The MaSuRCA-Syntig strategy is shown in Figure 1. We split the reference assembly (or assemblies) into contigs at gaps. We then compute super-reads from the Illumina reads in the standard MaSuRCA way (Zimin et al., 2013).
Next we create approximate alignments of the super-reads to each reference contig using the 25-mers that the reference contigs have in common with the super-reads. The 25-mer seeds work for closely related species, where DNA sequences are >98% similar; smaller seeds may be needed for more divergent species. For the alignments, we first build a database of all 25-mers in the super-reads. We use this database to compute, for each super-read, its approximate start and end positions on each reference contig using the LCS algorithm described in Zimin et al. (2017). For each reference contig R, we walk down the contig looking at each 25-mer. We use the 25-mer database to determine (in constant time for each 25-mer) which 25-mers are found in super-reads. Once we have the super-reads that match R, for each such super-read S we look for ordered subsequences of the 25-mers that R and S have in common. We then assign a score to each super-read S, where the score is the number of 25-mers in the longest common subsequence (LCS) of 25-mers in the two sequences. We label an alignment as plausible if the score of S exceeds a specified minimum. For each plausible alignment, we compute an approximate position of S along R based on the positions of the LCS 25-mers in R and S. Using all super-read positions on a reference contig R, we create possible paths of plausible super-reads along R. Each path consists of a sequence of super-reads in which two adjacent super-reads must have an exact overlap of at least 40 bases and must have positions on R that make it possible for them to overlap. We call each such path a synteny read, or syntig. We then assemble the super-reads and syntigs using the Flye assembler (Kolmogorov et al., 2019) in "subassemblies" mode.
Table 1. Reference genome sequences used for the A. thaliana experiments
ID             Genbank accession   Total Sequence (Mbp)   N50 Contig (bp)   N50 Scaffold (bp)
TAIR1.0 (Col)  GCA_000001735.1     118.96                 10,898,021        23,459,830
Ler1 (Ler)     GCA_001651475.1     117.11                 862,972           22,588,203
Ler2 (Ler)     GCA_000835945.1     127.42                 11,163,166        11,163,166
We show the performance of our preliminary algorithm on assemblies of an Arabidopsis thaliana Ler (Landsberg erecta) ecotype data set consisting of 100x coverage by 2x300 Illumina MiSeq paired-end reads. The references that we use, with their Genbank accessions, are listed in Table 1. We use the official reference genome for A. thaliana Col (Columbia ecotype), TAIR1.0, and two references for the same ecotype as the data, A. thaliana Ler. The Ler2 reference is the most contiguous one because it was produced using third-generation PacBio sequencing data (Berlin et al., 2015). We set up four reference-assisted assembly experiments, shown in Table 2:
Experiment 1. Use the TAIR1.0 reference - different ecotype
Experiment 2. Use the Ler1 reference - less contiguous, same ecotype
Experiment 3. Use two references, TAIR1.0 and Ler1
Experiment 4. Use the most contiguous and the closest reference, Ler2
Table 2 shows that using a more contiguous reference improves the assembly: the assembly produced using the Ler2 reference has an N50 contig size of 986 Kbp, whereas the assembly produced using the less contiguous Ler1 reference has an N50 contig size of 723 Kbp. Using a more closely related reference also works better, since the Ler1 reference, albeit less contiguous, yielded a better result than the more contiguous TAIR1.0 reference. Using two references yields longer contigs than using either of those references alone, as shown in Experiment 3, even though we are combining the much less contiguous Ler1 reference with the more contiguous TAIR1.0 reference.
The most contiguous and closest reference, Ler2, yields the best reference-assisted assembly result, with contigs 2.8 times longer than the ones produced from Illumina data alone.
Table 2. Synteny-assisted (reference-assisted) assemblies of A. thaliana Ler
Reference used   Total Sequence (bp)   N50 Contig (bp)   N50 Scaffold (bp)
none             127,353,458           351,096           433,094
TAIR1.0          121,573,373           501,958           503,045
Ler1             126,141,674           723,190           726,839
TAIR1.0+Ler1     123,405,969           800,823           801,815
Ler2             131,677,486           986,399           993,221
The reference-assisted assembly code is included as an alpha version in MaSuRCA assembler version 3.3.3 and up, and it is available on GitHub at https://github.com/alekseyzimin/masurca.
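The path-building step described in this section (adjacent super-reads must overlap exactly by at least 40 bases, and their positions on the reference must allow that overlap) can be illustrated with a greedy sketch. It is a simplification under stated assumptions, not the MaSuRCA-Syntig code: super-reads are chained in order of their approximate reference start, only one path is followed at a time, and the `slack` tolerance on the difference between the observed overlap and the overlap implied by the reference positions is a made-up parameter. All names and the toy data are hypothetical.

```python
import random


def exact_overlap(left, right, min_overlap):
    """Longest suffix of left equal to a prefix of right, or 0 if too short."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def build_syntigs(placed_reads, min_overlap=40, slack=50):
    """Chain (approx_ref_start, sequence) pairs into merged path sequences."""
    syntigs = []
    current = None      # merged sequence of the path being extended
    current_end = None  # approximate reference coordinate where it ends
    previous = None     # last super-read added to the path
    for start, seq in sorted(placed_reads):
        if current is not None:
            size = exact_overlap(previous, seq, min_overlap)
            implied = current_end - start  # overlap implied by the positions
            if size and abs(size - implied) <= slack:
                current += seq[size:]
                current_end = start + len(seq)
                previous = seq
                continue
            syntigs.append(current)  # the path cannot be extended; close it
        current, current_end, previous = seq, start + len(seq), seq
    if current is not None:
        syntigs.append(current)
    return syntigs


# Toy data: three super-reads cut from a random 300-base sequence, each
# overlapping the next by 50 bases, with exact reference start positions.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(300))
placed = [(150, genome[150:300]), (0, genome[0:120]), (70, genome[70:200])]
syntigs = build_syntigs(placed)
print(len(syntigs), len(syntigs[0]))
```

With the middle super-read present, the three pieces chain into a single path; without it, the two outer pieces share no exact overlap and remain separate paths.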
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2019
Citation:
Martinez-Viaud KA, Lawley CT, Vergara MM, Ben-Zvi G, Biniashvili T, ... New de novo assembly of the Atlantic bottlenose dolphin (Tursiops truncatus) improves genome completeness and provides haplotype phasing. GigaScience. 2019;8(3):giy168.
- Type:
Journal Articles
Status:
Published
Year Published:
2019
Citation:
Breitwieser FP, Pertea M, Zimin AV, Salzberg SL. Human contamination in bacterial genomes has created thousands of spurious proteins. Genome research. 2019 Jun 1;29(6):954-60.
- Type:
Journal Articles
Status:
Under Review
Year Published:
2019
Citation:
Zimin AV, Salzberg SL. The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies. bioRxiv. 2019 Jan 1.
- Type:
Journal Articles
Status:
Awaiting Publication
Year Published:
2019
Citation:
Read AC, Moscou MJ, Zimin AV, Pertea G, Meyer RS, Purugganan MD, Leach JE, Triplett LR, Salzberg SL, Bogdanove AJ. Genome assembly and characterization of a complex zfBED-NLR gene-containing disease resistance locus in Carolina Gold Select rice with Nanopore sequencing. PLOS Genetics. 2020 Jan 27;16(1):e1008571.
|
|