Analysis of genomic variation in Mycoplasma bovis isolates from mastitis and respiratory disease; how typical is the Type strain PG45?

Sponsoring Institution

Cooperating Schools of Veterinary Medicine

Project Status

COMPLETE

Funding Source

STATE

Reporting Frequency

Annual

Accession No.

1002898

Grant No.

(N/A)

Cumulative Award Amt.

(N/A)

Proposal No.

(N/A)

Multistate No.

(N/A)

Project Start Date

Apr 4, 2014

Project End Date

Dec 31, 2014

Grant Year

(N/A)

Program Code

[(N/A)]- (N/A)

Recipient Organization
UNIVERSITY OF MISSOURI
(N/A)
COLUMBIA,MO 65211

Performing Department
Veterinary Pathobiology

Non Technical Summary
Mycoplasmas are important bovine pathogens that significantly impact the US cattle industry. Annual economic losses due to Mycoplasma mastitis and respiratory disease have been estimated to be greater than $100 million. This proposal seeks funding to study genomic variation among mastitis and respiratory isolates, filling a current knowledge gap, which should facilitate the development of better strategies for disease control and prevention.

Animal Health Component

10%

Research Effort Categories

Basic

80%

Applied

10%

Developmental

10%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
311	3410	1100	100%

Knowledge Area
311 - Animal Diseases;

Subject Of Investigation
3410 - Dairy cattle, live animal;

Field Of Science
1100 - Bacteriology;

Keywords

mycoplasma bovis

mastitis

bovine respiratory disease

genomics

virulence factors

Goals / Objectives
Acquire, bank and begin genotypic analysis of approximately 100 Mycoplasma bovis isolates.
Determine five genome sequences (four Mycoplasma bovis isolates and one Mycoplasma californicum strain).
Use PCR to determine distribution of strain variable genomic loci in M. bovis isolates.
Use next generation sequencing to generate additional draft genome sequences of M. bovis.

Project Methods
1) Use PCR to determine distribution of strain variable genomic loci in M. bovis isolates
Initially, 50 mastitis-associated M. bovis isolates and 50 respiratory tract isolates will be studied, together with strain PG45 as a control. Low passage broth cultures of each strain will be grown and total genomic DNA prepared from harvested cells using commercially available kits (Qiagen). Template DNA will then be used in PCR with Phusion High Fidelity DNA polymerase (Fermentas) to test whether specific gene targets are present or absent. For each strain, a positive control reaction will be performed using the M. bovis uvrC gene (55). The targets selected for this study are listed in Table 1 and include 6 genes that are found only in the PG45 genome and 8 genes that are >99% identical between strains HB0801 and Hubei-1 but absent from PG45.
For single genes whose presence varies between PG45 and the Chinese isolates, an amplicon of the appropriate size will be considered a positive result. Where the gene is not amplified, a further round of PCR will be performed under lower stringency conditions in case there is a sequence mismatch between the primers and the target sequence in a given isolate. In parallel, PCR between the conserved flanking genes will be performed. This will either confirm the presence of the target gene, disclose an "empty site" that lacks the target, or reveal an unexpected gene or IS unit. In the latter case, amplicon sequencing will be used to identify gene "x".
For larger regions of genomic difference, PCR will be used to query the presence of four target genes among the 100 isolates. PCR will also be performed between the target genes and the conserved flanking genes. Strains that have neither the gene arrangement of PG45 nor that of HB0801 will be analyzed further by inverse PCR. This approach allows "chromosome walking" to be performed, starting at a known sequence and permitting sequencing of adjacent "unknown" sequence. This strategy has been used by the PI to sequence three different mobile genetic elements in Mycoplasmas that were greater than 20 kb and that had no significant similarity to known sequences. Once a new configuration has been established (if present), specific PCR primers will be designed for the new genes and used to assess their distribution in other M. bovis isolates. Initially, highest priority will be given to the three closely linked regions that represent ~25 kb of strain variable DNA. It is not known whether strains will have the PG45 pattern or that exhibited by both HB0801 and Hubei-1, or whether this will segregate with mastitis or respiratory tract isolates. Depending on the distribution, this study will be extended to include additional isolates, as necessary.
Certain surface proteins were found to be unexpectedly divergent between PG45 and the two Chinese isolates. To determine whether the US isolates have one or the other gene sequence, PCR and sequencing will be performed for each of these genes. In this instance, each gene is expected to be present in all isolates, since these genes are also present in M. agalactiae and thus appear to be conserved (in presence) within these closely related taxa. Sequencing of each amplicon will be performed in 96 well plate format at the DNA Core facility, and the resulting sequences will be compared using Blast and ClustalW to delineate sequence diversity within the group. For any gene, it is possible that (i) the gene is absent; (ii) the gene closely matches the PG45 or Chinese isolate sequence; or (iii) the gene contains a distinct genomic signature that might be prevalent within a group of US isolates. Whichever scenario is found, these data will provide useful new information regarding the strain distribution of these highly divergent protein sequences.
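The decision logic for a single strain-variable locus in Aim 1 (uvrC control, target-gene PCR, and PCR across the conserved flanking genes) can be sketched as below. This is only an illustrative sketch: the function name, the result categories, the amplicon sizes and the 100 bp size tolerance are all assumptions made for the example and are not taken from the proposal.

```python
from enum import Enum


class LocusCall(Enum):
    TARGET_PRESENT = "target gene present"
    EMPTY_SITE = "empty site (target absent)"
    UNEXPECTED_INSERT = "unexpected gene or IS unit; sequence the amplicon"
    INCONCLUSIVE = "inconclusive; repeat, e.g. at lower stringency"


def call_locus(control_ok, target_amplified, flank_size_bp,
               expected_with_target_bp, expected_empty_bp, tolerance_bp=100):
    """Interpret PCR results for one locus in one isolate.

    control_ok        -- True if the uvrC positive control amplified
    target_amplified  -- True if the target-gene PCR gave the expected amplicon
    flank_size_bp     -- observed flank-to-flank amplicon size, or None if no product
    expected_*_bp     -- hypothetical expected flank-to-flank sizes
    tolerance_bp      -- hypothetical allowance for gel sizing error
    """
    if not control_ok:
        # Without a working control, nothing can be concluded about the template.
        return LocusCall.INCONCLUSIVE
    if flank_size_bp is None:
        # No flanking product: rely on the target-gene PCR alone.
        return LocusCall.TARGET_PRESENT if target_amplified else LocusCall.INCONCLUSIVE

    def near(expected):
        return abs(flank_size_bp - expected) <= tolerance_bp

    if near(expected_with_target_bp):
        # Right size, but a failed target PCR may indicate a primer mismatch.
        return LocusCall.TARGET_PRESENT if target_amplified else LocusCall.INCONCLUSIVE
    if near(expected_empty_bp):
        # Empty site plus a positive target PCR is contradictory.
        return LocusCall.INCONCLUSIVE if target_amplified else LocusCall.EMPTY_SITE
    # Flanking amplicon matches neither expectation.
    return LocusCall.UNEXPECTED_INSERT


# Example with made-up sizes: 3,200 bp with the target, 1,000 bp for an empty site.
for observed in (3200, 1050, 4700):
    print(observed, call_locus(True, observed == 3200, observed, 3200, 1000).value)
```

Treating contradictory results as inconclusive, instead of forcing a call, mirrors the text's plan to repeat failed reactions under lower stringency before drawing conclusions.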
2) Use next generation sequencing to generate additional draft genome sequences of M. bovis
Four M. bovis field isolates will be selected for next generation sequence analysis, based upon the results obtained in Aim 1. It is planned that two mastitis and two respiratory tract isolates that reflect the most common genotype(s) in the US, as assessed by the strain variable gene sets characterized in Aim 1, will be sequenced. For example, if most US isolates contain the HB0801 signatures, field isolates harboring these will be sequenced to determine whether there is genome-wide similarity between the isolates (both to HB0801 and to other field isolates). By Illumina sequencing, approximately 15 gigabases of 100-bp reads can be generated per channel, representing over 150 million reads per sample. Based on an estimated genome size of 1 Mb, a single Illumina channel can be used to sequence 5 genomes at 200X coverage, which is sufficient to generate a draft genome sequence for each isolate. One limitation of this sequencing methodology is that the read length is too short to sequence across sequence repeats such as IS units (~1.5 kb) and the two rRNA operons. Nevertheless, this approach will enable the draft genome sequences to be assembled into contiguous stretches (contigs) between such repetitive sequences. Assembly of the Illumina reads will be performed using Velvet (56) or NextGENe (57), both of which have been used for Mycoplasma genomes in the PI's research. This will give a complete representation (with high depth of coverage) of the gene coding potential of an isolate genome, which can then be annotated and compared with the three previously published genome sequences. The contigs for each genome will be submitted to the National Center for Biotechnology Information to be passed through the PGAAP auto-annotation pipeline (58). This free service results in genomes that have all ORFs and stable RNAs identified and annotated.
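The sequencing-depth figures quoted in Aim 2 can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text (15 gigabases per channel, 100-bp reads, a 1 Mb genome, 5 genomes, 200X); the function names are illustrative, and the calculation assumes that reads are shared evenly among pooled genomes and ignores losses to quality filtering.

```python
CHANNEL_YIELD_BASES = 15_000_000_000   # ~15 gigabases per channel (from the text)
READ_LENGTH_BP = 100                   # 100-bp reads (from the text)
GENOME_SIZE_BP = 1_000_000             # estimated M. bovis genome size, ~1 Mb
GENOMES_PER_CHANNEL = 5
TARGET_COVERAGE = 200                  # 200X


def reads_per_channel(yield_bases, read_length_bp):
    """Number of reads implied by a total yield and a fixed read length."""
    return yield_bases // read_length_bp


def bases_required(genome_size_bp, n_genomes, coverage):
    """Total bases needed to reach a given mean coverage on every genome."""
    return genome_size_bp * n_genomes * coverage


def mean_coverage(yield_bases, genome_size_bp, n_genomes):
    """Mean coverage per genome if the yield is split evenly (an assumption)."""
    return yield_bases / (genome_size_bp * n_genomes)


reads = reads_per_channel(CHANNEL_YIELD_BASES, READ_LENGTH_BP)
needed = bases_required(GENOME_SIZE_BP, GENOMES_PER_CHANNEL, TARGET_COVERAGE)
available = mean_coverage(CHANNEL_YIELD_BASES, GENOME_SIZE_BP, GENOMES_PER_CHANNEL)

print(f"reads per channel: {reads:,}")
print(f"bases needed for 5 genomes at 200X: {needed:,}")
print(f"mean coverage if the whole channel is used: {available:,.0f}X")
```

Under these assumptions, 5 genomes at 200X need about 1 gigabase, a small fraction of the quoted channel yield, so the stated plan leaves considerable headroom.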
The resulting annotated contigs of each genome will then be compared by BLAST to determine which of the reference genomes is the most similar, both in terms of nucleotide similarity for most housekeeping genes and for regions of variation. For ease of comparison between isolates, the contigs will be ordered based on the genome organization of the most similar reference strain. It is not within the scope of this proposal to close each genome, and it is fully appreciated that such an ordering is purely hypothetical: it is intended to permit easier strain comparison and does not take into account structural re-arrangements that may be present (14). As has been done for the comparisons of the three M. bovis genomes, regions of variation, including individual genes, large genetic loci, and variable genes that exhibit considerable sequence diversity between isolates, will be noted. Any regions of the newly sequenced isolates that appear to encode novel genes (lacking similarity to the gene sets of the three M. bovis genomes) will be analyzed further. Such loci could encode hitherto unrecognized mobile genetic elements, strain variable restriction systems, additional members of multi-gene families, or genes that lack similarity to known sequences. Depending on the number and nature of any new genes found in the genomes, PCR will be carried out to assess their distribution among the 100 isolates that were initially characterized.
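Ordering draft contigs against the most similar reference genome, as described above, might look like the following sketch. It assumes that each contig's best BLAST hit has already been reduced to a pair of reference coordinates; the `BestHit` record, the `order_contigs` function and the example coordinates are hypothetical, and parsing of BLAST output is deliberately left out.

```python
from typing import List, NamedTuple, Optional, Tuple


class BestHit(NamedTuple):
    """Best reference alignment for one contig (hypothetical record layout)."""
    contig: str
    ref_start: Optional[int]   # None if the contig has no hit on the reference
    ref_end: Optional[int]     # ref_end < ref_start means reverse orientation


def order_contigs(hits: List[BestHit]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (ordered contigs with orientation, contigs with no placement)."""
    placed = [h for h in hits if h.ref_start is not None and h.ref_end is not None]
    unplaced = [h.contig for h in hits if h.ref_start is None or h.ref_end is None]
    # Sort by the leftmost reference coordinate covered by each contig.
    placed.sort(key=lambda h: min(h.ref_start, h.ref_end))
    layout = [(h.contig, "+" if h.ref_start <= h.ref_end else "-") for h in placed]
    return layout, unplaced


# Made-up coordinates for illustration only.
example = [
    BestHit("contig_3", 640_000, 910_000),
    BestHit("contig_1", 250_000, 1_200),     # aligns in reverse orientation
    BestHit("contig_2", 255_000, 630_000),
    BestHit("contig_4", None, None),         # candidate novel region
]
layout, unplaced = order_contigs(example)
print(layout)
print(unplaced)
```

Contigs without a reference hit are kept separate because, as the text notes, they may carry novel genes worth analyzing further. The resulting order remains hypothetical and cannot reveal structural re-arrangements.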

Progress 04/04/14 to 12/31/14

Outputs
Target Audience: Microbiologists, mycoplasmologists, molecular genetics and infectious disease researchers, and animal health scientists.
Changes/Problems: We had originally proposed Illumina sequencing, which would be expected to give genomes in 30-50 contigs (sections) each. With the availability of “PacBio” long read sequencing, we had the opportunity to potentially produce finished (closed/complete) genomes for each of the target strains. Although this was successful for M. dispar and M. californicum, and for the majority of the chromosome of M. bovis, it has not yet been possible to get complete assembly across the highly plastic vsp-gene region. The high frequency re-arrangements in this region would be problematic for most next-gen methods. Although it was not in our initial proposal to “close/assemble” this region, we are currently exploring alternate cloning strategies to reach this goal.
What opportunities for training and professional development has the project provided? Use of a new sequencing technology has necessitated self-training with different sequence assembly/alignment software. We have also implemented new annotation pipelines that became available during the last year. The PI has been invited to the biennial European Mycoplasma conference to give an overview of mycoplasma sequencing approaches.
How have the results been disseminated to communities of interest? One genome sequence/annotation is publicly available through GenBank, and a manuscript has been published in an open access journal of the American Society for Microbiology. A second manuscript should be submitted within 6 weeks, and a final manuscript later in 2015.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals?
We received approximately 100 M. bovis isolates from Dr Larry Fox (Washington State University). Single colony isolates (“clones”) of each isolate, together with a population of each isolate, were separately cultured and the ~200 lineages banked. In addition, 8 contemporary M. bovis isolates (US origin) were obtained from the ARS Culture Collection (also known as the NRRL Collection). High quality genomic DNA has been prepared from these clonal isolates.
Polymerase chain reaction (PCR) was used to assess the distribution of different genes and loci within this collection of isolates. We have initially screened for genes that were present in two respiratory isolates (China origin) but absent from the 1960s mastitis isolate from the US. PCR-based genotyping has generated amplicons corresponding to two segments of the “China” isolates in many, but not all, of the isolates analyzed so far (including the NRRL strain that has been submitted for genome sequencing [see below]). In addition, we have assessed variability in the surface protein LppD, which exhibits significant divergence between the two China isolates (which are almost identical) and the single US isolate for which the lppD sequence is known. Using primers specific to the Chinese isolate sequences, an internal portion of lppD was successfully amplified from >50% of strains.
Large genomic DNA preparations have also been made from the type strains of M. californicum and M. dispar, a mastitis and a respiratory pathogen of cattle, respectively. These were sequenced at the National Center for Genome Resources (NCGR) in New Mexico. Although Illumina platform genome sequencing was proposed in the original proposal, recent advances have made the Pacific Biosciences system cost effective. The longer read lengths of the “PacBio” technology often enable genomes to be closed into a single sequence. Because of the large number of Insertion Sequence repeats in M. bovis, it was anticipated that Illumina sequencing would generate a genome in ~30-50 sections.
To date, one M. bovis strain genome is in an advanced state of completion (2 sections/contigs): one large section encompassing 98% of the genome (999.7 kb) and one smaller section that contains the highly variable vsp locus (which is known to be subject to high frequency re-arrangement). The genome has been auto-annotated and is under manual refinement. Initial analysis has revealed that signature regions that had previously been found only in the genomes of two Chinese isolates are present in the US isolate (but not in the PG45 reference genome, which is of US origin). The genome contains a distinctive repertoire of IS insertion sites, together with a large conjugative element (ICEB-1). Notable among the features characterized to date is the presence of genes for two surface proteins that are predicted to be functional in this strain but are defective (pseudogenes) in the three genomes characterized previously.
The genome sequence of the Mycoplasma dispar reference strain is completely assembled; preliminary analysis of the 1,084 kb genome revealed many genes shared with the respiratory pathogens M. hyopneumoniae and M. ovipneumoniae, including at least 7 gene pairs that encode predicted adherence-related proteins. These are the first examples of such candidates for this species. It is anticipated that a fully annotated genome sequence will be submitted in early 2015.
The M. californicum sequence and annotation have been completed; the genome is in the GenBank database and a manuscript reporting the genomic features has been published in an Open Access journal. This was the first genome for this mastitis-related mycoplasma species.
The sequence data generated for these isolates represent a change in knowledge, and each will form/has formed the basis of publications and GenBank submissions. The high success rate of assembly with these genomes has led our laboratory to adopt a change in action by using this sequencing platform instead of the previously used next gen sequencing methods.

Publications

Type: Journal Articles Status: Published Year Published: 2014 Citation: Complete Genome Sequence of the Bovine Mastitis Pathogen Mycoplasma californicum Strain ST-6T (ATCC 33461T). Calcutt MJ, Foecking MF, Fox LK. Genome Announc. 2014 Jul 3;2(4). pii: e00648-14. doi: 10.1128/genomeA.00648-14. PMID: 24994797