Recipient Organization
MICROBITYPE LLC
5110 CAMPUS DR STE 170
PLYMOUTH MEETING, PA 19462
Performing Department
(N/A)
Non Technical Summary
This SBIR (Small Business Innovation Research) Phase I project is responsive to National Challenge Area: Food Safety, and to USDA Strategic Goal 4.3 - Protect Public Health by Ensuring Food is Safe (www.ocfo.usda.gov/usdasp/usdasp.htm). The detection and investigation of foodborne outbreaks associated with pathogens such as E. coli O157 is highly dependent on strain typing. The typing methods employed must provide sufficient strain resolution to justify the commitment of considerable resources to epidemiological fieldwork. Ideally, the strain typing method is also rapid, since time is a critical factor in tracking pathogens to their food and environmental sources. Finally, if an affordable, user-friendly, outsourced, and confidential strain typing service were available to food processors, it would have considerable potential for tracking down pathogens within the food chain before an outbreak occurs.
Animal Health Component
100%
Research Effort Categories
Basic
(N/A)
Applied
100%
Developmental
(N/A)
Goals / Objectives
Strain typing plays a central role in the detection and investigation of foodborne outbreaks, particularly those mediated by Salmonella, Listeria, and Shiga toxigenic E. coli (STEC). To a much lesser extent, strain typing is used by food processors to monitor pathogens within their facilities (e.g., introduction of new strains or persistence of established strains), although increasing this practice would undoubtedly reduce outbreaks. The gold standard for strain typing has been pulsed-field gel electrophoresis (PFGE), but its multiple disadvantages have encouraged development of alternative methods, in particular whole genome sequence-based single nucleotide polymorphism analysis (WGS-SNP). Although it provides high resolution, WGS-SNP requires major investments in equipment, reagents, and personnel, largely limiting its use to government-supported labs. Neither PFGE nor WGS-SNP is a practical alternative for food processors, and government labs would also benefit from a complementary sequence-based typing approach that is more rapid, less costly, and more user-friendly. MicrobiType was founded in 2014 to address this need. Its technology platform, polymorphic locus sequence typing (PLST), is based on standard PCR and dideoxy sequencing, and is hence simple, affordable, and robust. Its novelty lies in its targeting of specific genomic loci (patent pending) that have been bioinformatically determined to be the most phylogenetically informative, as a consequence of highly polymorphic tandem repeats. This Phase I proposal will build upon recently published and commercially implemented PLST services for Salmonella and Listeria by developing and evaluating similar services for STEC, along with the relatively neglected but increasingly important foodborne pathogens Campylobacter and Vibrio parahaemolyticus.
Project Methods
The approach used to develop and evaluate PLST typing for STEC, C. jejuni/C. coli, and V. parahaemolyticus will be analogous to the approach previously used with Listeria and Salmonella. This general algorithm is outlined below, beginning with the bioinformatic identification of PLST loci and primer design (steps 1-9), followed by laboratory testing (steps 10-16). Finally, specific details relevant to each of the three species/groups are provided.
1. Tandem repeats are identified within the complete genome sequences from 3 representative strains using Tandem Repeats Database (https://tandem.bu.edu). From the 200 or so loci, approximately 20 with highest repeat number and lengths between 6 and 60 bases are downloaded along with 500 bases of flanking sequence.
2. Each repeat locus is used as query in BLASTN searches (https://blast.ncbi.nlm.nih.gov) of the GenBank Nucleotide and Genomes databases, and the locus is evaluated with respect to: (a) presence in all or nearly all strains of that species/group, (b) number of distinct alleles as approximated by the number of different "Max score" values, and (c) overlap if any with VNTRs from published MLVA methods and consideration of their diversity index and allele number.
3. For the 5 or so most promising loci based on the above, the full sequences (500 base flanks plus repeat regions) are downloaded from all strains in the Nucleotide and Genomes databases (separate BLASTN searches are used to determine if strains with the same Max score have identical or distinct sequences).
4. Downloaded sequences (reoriented as needed to reverse complement) are aligned with Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo), in PHYLIP format.
5. PHYLIP alignments are analyzed using dnapars (DNA parsimony; default parameters) in the PHYLIP package (version 3.69; http://evolution.genetics.washington.edu/phylip.html), which, importantly, weighs both insertions/deletions and SNPs. Dendrograms are generated using drawgram.
6. Dendrograms are evaluated with respect to (a) allele number, (b) available epidemiological data (e.g., isolates from same outbreak or food/environmental source), and (c) other typing data. The latter typically involves comparisons to other PLST loci, to available serotype data, to available PFGE data, to available WGS-SNP-based phylogenies, and to MLST sequence types (determined by downloading the 7 loci from each strain's genome sequence and submitting to the relevant database; e.g., http://pubmlst.org/campylobacter).
7. To estimate relative strain resolution, the diversity index (Simpson's dominance) is calculated (www.alyoung.com/labs/biodiversity_calculator.html).
8. For the 2 most promising PLST loci selected based on the above, the BLASTN search is repeated against the GenBank WGS database, followed again by sequence downloads, clustal alignment, and dnapars/dendrogram analysis. For most foodborne pathogens, the WGS database includes hundreds or thousands of strains, and is thus an invaluable resource for this project. On the other hand, there are two common issues with this database: (a) longer repeat regions are often incomplete since these genomes are more fragmented and Illumina-type sequencing technologies struggle with repeat regions; and (b) WGS genomes often lack annotation; i.e., source, year and location of strain isolation. Thus, WGS-derived sequences must be carefully selected to avoid bias. Unfortunately, GenBank's short read archive (SRA) database is not useful for analyzing tandem repeat regions.
9. For primer design, clustal alignments are used to identify conserved sequences within the upstream and downstream flanks, ideally separated by 800 to 900 bases (the limit for clearly readable dideoxynucleotide sequencing). Conserved sequences typically fall within protein-coding regions, so the reading frame is taken into consideration (i.e., avoiding the wobble position of codons at the primer 3' terminus). Candidate primers (3 upstream and 3 downstream, for comparison of amplification efficiency, for PCR versus sequencing, and for nested PCR if required) are further screened by BLASTN searches of the WGS database. Final adjustments are made to yield Tm ≈ 60 °C before ordering (Integrated DNA Technologies).
10. DNA templates for PLST are prepared by simple and safe heat lysis in the client's lab (for this project, the "clients" are USDA-ARS consultants/collaborators). Specifically, isolated colonies are suspended in 200 µl Tris/EDTA (10/1 mM); turbidity should approximate McFarland standard 1. Tubes (screw-capped) are incubated in a 100 °C heat block (or boiling water bath) for 15 min, with a cover to ensure bacterial killing throughout the tube. Tubes are transported to MicrobiType overnight at ambient temperature and without biohazard packaging (which substantially reduces shipping costs).
11. At MicrobiType, tubes are centrifuged to pellet cell debris. PCR is conducted with Taq polymerase as recommended by the manufacturer (New England BioLabs), with minor modifications. Template (0.5 to 2 µl of lysate) and cycle number (28 to 32) are adjusted according to the turbidity of the bacterial lysate. Template-free controls are included to rule out contamination. (In initial studies with representative templates, all 6 combinations of the 3 upstream and 3 downstream primers are tested; subsequently, the optimum primer pair is used.) (Time ≈ 2.5 h)
12. Aliquots from the PCR tubes are analyzed by conventional agarose gel electrophoresis and SYBR Safe staining (Invitrogen) to assess yield and quality of product. (Time ≈ 2.5 h)
13. For sequencing, PCR products (1 to 3 µl) are treated with ExoSAP-IT as recommended by the manufacturer (Affymetrix), sequencing primer is added to 2 µM, and samples are sent by courier to Genewiz (South Plainfield, NJ). Sequences are typically available online the following morning. (Time ≈ 18 h)
14. DNA sequences are edited as needed based on visual inspection of the chromatograms, and trimmed to common termini. Sequences are analyzed by BLASTN searches of the GenBank Nucleotide, Genomes, and WGS databases to identify any identical matches. Database annotations and dendrogram results (step 8 above) for the matched strain are recorded. (Time ≈ 0.5 h per sequence)
15. Sequences lacking an identical GenBank database match are added to the downloaded sequences (step 8) and analyzed by clustal and dnapars to generate a dendrogram as described above to show relationship to database strains. (Time ≈ 1 h)
16. All non-confidential sequences are deposited in GenBank with relevant annotation.
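The locus selection in step 1 can be pictured as a simple filter-and-rank. The sketch below is illustrative only: the `Repeat` record, its field names, and the example values are hypothetical and do not reflect the Tandem Repeats Database export format. It also assumes that the 6 to 60 base limit applies to the repeat unit (period) length, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    """Hypothetical record for one tandem repeat locus."""
    locus_id: str
    period: int    # repeat unit length in bases (assumed meaning of "length")
    copies: float  # number of repeat copies at the locus


def select_candidate_loci(repeats, min_period=6, max_period=60, top_n=20):
    """Keep loci whose period is within range, ranked by copy number."""
    eligible = [r for r in repeats if min_period <= r.period <= max_period]
    ranked = sorted(eligible, key=lambda r: r.copies, reverse=True)
    return ranked[:top_n]


# Made-up example data, not real loci.
example = [
    Repeat("locus_a", 3, 40.0),   # period too short
    Repeat("locus_b", 9, 12.5),
    Repeat("locus_c", 60, 4.0),
    Repeat("locus_d", 75, 30.0),  # period too long
    Repeat("locus_e", 6, 18.0),
]
selected = select_candidate_loci(example, top_n=2)
print([r.locus_id for r in selected])
```

In practice the shortlist would still be reviewed by hand, since steps 2 and 3 judge loci on conservation and allele number rather than on copy number alone.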
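The diversity index in step 7 is obtained from an online calculator. For readers who prefer to compute it directly, the following is a minimal sketch that assumes the unbiased (Hunter-Gaston) form of Simpson's index of diversity, which is widely used for typing methods; whether the cited calculator reports exactly this variant is an assumption. The function name and the allele labels are hypothetical.

```python
from collections import Counter


def simpson_diversity(allele_calls):
    """Return 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)).

    allele_calls holds one allele label per strain. The subtracted term
    is the dominance: the probability that two strains drawn without
    replacement carry the same allele.
    """
    n_total = len(allele_calls)
    if n_total < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(allele_calls)
    same_allele_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_allele_pairs / (n_total * (n_total - 1))


# Four strains, three distinct alleles (made-up labels).
calls = ["A1", "A1", "A2", "A3"]
print(round(simpson_diversity(calls), 3))  # 0.833
```

A value near 1 means that two randomly chosen strains almost always differ at the locus, which is the property sought when ranking candidate PLST loci.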
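Step 9 adjusts primers toward Tm ≈ 60 °C but does not say how Tm is estimated. The sketch below uses a common rough GC-content approximation, Tm = 64.9 + 41 × (G + C - 16.4) / N, purely as an illustration; it is not necessarily the method used by the authors or by the primer supplier. The primer sequence and the tolerance value are hypothetical.

```python
def estimate_tm(primer):
    """Rough melting temperature (deg C) from GC content.

    Uses Tm = 64.9 + 41 * (G + C - 16.4) / N, an approximation usually
    applied to oligonucleotides longer than 13 bases.
    """
    seq = primer.upper()
    if len(seq) < 14:
        raise ValueError("approximation is intended for primers > 13 bases")
    if set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    gc_count = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc_count - 16.4) / len(seq)


def near_target(tm, target=60.0, tolerance=2.0):
    """True if tm is within a (hypothetical) tolerance of the target."""
    return abs(tm - target) <= tolerance


primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% GC
tm = estimate_tm(primer)
print(round(tm, 2), near_target(tm))
```

A primer flagged as outside the tolerance would be lengthened or shifted within the conserved flank and then re-checked.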