Performing Department
Animal Dairy & Veterinary Sciences
Non Technical Summary
The major histocompatibility complex (MHC) is a large, gene-dense region of the genome that contains many genes important for immunity. The genes for which this region is named, the MHC genes, are highly variable, and this variation is critical for the health of populations. In many species, this variation is derived from sequence differences among the genes. However, in cattle and other ruminants, this variation appears to be generated by a combination of sequence variation and differences in gene content. In other words, different animals have different genes. These differences arose through gene duplication and deletion. The variation among individuals, together with the duplicated genes, complicates study of this region, genome assembly in particular. This project will use new technology to find the best ways to sequence through this region and determine which genes are present in a small number of samples. We will explore two very different sequencing methods to see which produces the best data. We will also use a new method to isolate just that part of the genome from the rest of the DNA for sequencing, which will make the process much more cost-efficient. Previous selection methods yield small pieces of DNA that are unsuitable for this project. The goals of this project are to (1) determine the best ways to sequence and assemble this part of the genome, (2) generate more data demonstrating the breadth of variation, and (3) generate preliminary data demonstrating that these methods will work, in order to strengthen a full-scale grant proposal to the USDA. Successful completion of this project will strengthen USU's position as a leader in this field, and will significantly strengthen a proposal to the USDA for external funding. Our group and others have speculated on this topic and done some investigation, but only at a cursory level. This will be the first concerted effort to understand how bovine MHC diversity is created and maintained, and to understand the genomic structure of this region.
Animal Health Component
0%
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Goals / Objectives
The goal of the larger project is to sequence the entire MHC region from several disparate haplotypes in order to discover the range of structural variation in the bovine MHC, and thereby to better understand how diversity is generated and maintained and how it affects the function of the bovine immune system. The purpose of the work proposed here is to: (1) determine the best approach to prepare and sequence libraries for this project, (2) generate more data demonstrating that the hypothesized variation exists, and (3) demonstrate that the methods proposed will yield high-quality sequence assemblies to strengthen a full proposal to the USDA.
Project Methods
The first task is to identify appropriate samples for full-length MHC sequencing. These samples will be identified among the cattle at the USU dairy by MHC genotyping, using our already-established genotyping protocol. Briefly, exon 2 from the MHC-II genes and exons 2 and 3 from the MHC-I genes will be amplified by PCR using the Fluidigm Access Array system. Sequencing adapters and indexes will be added during the amplification process. After amplification, the PCR products will be cleaned, pooled, and sequenced on the Illumina MiSeq. The sequencing data will be processed to identify the alleles present in each sample, and therefore its genotype. From these, we will select appropriate homozygous and heterozygous individuals for further analysis. Of the candidates identified, we will select the two haplotypes that differ most in putative genomic structure, with the goal of representing as many of the predicted gene loci as possible.
Sequencing
One of the primary issues to be worked out in this project is the relative efficiency and effectiveness of two different sequencing approaches: (1) 10x Genomics linked-read genomic sequencing, and (2) Oxford Nanopore long-read sequencing.
(1) Typically, long sequencing reads are required to generate phased haplotype assemblies. Linked-read sequencing technology from 10x Genomics uses microfluidics to partition and barcode DNA, which is then sequenced on traditional short-read sequencing platforms. The result is that reads originating from the same original DNA molecule in the sample carry the same barcode, allowing the data from each of those molecules to be assembled individually. Fully sequenced phased haplotypes of the HLA region (human MHC) have been generated using this technology. Samples will be submitted to the Genomics and Bioinformatics Core at the Huntsman Cancer Institute for sequencing using this method. These data will be assembled using Supernova, a software package developed by 10x Genomics specifically for assembly of linked-read sequencing data into diploid genome assemblies.
(2) The second approach is nanopore sequencing using the Oxford Nanopore MinION. This sequencer is capable of extremely long sequence reads, limited (in theory) only by the length of the DNA molecules. Reads of tens to hundreds of kb are routine, and a read over 2 million bp was recently reported. MinION sequencing has been shown to be capable of producing de novo genome assemblies that reveal large structural variants, and of enabling assembly and phasing of the entire HLA region. While the longer reads of the MinION will be very valuable to this project, the accuracy of nanopore sequencing reads is relatively low. For this reason, these samples will also be sequenced on the Illumina NextSeq using traditional methods for error correction purposes. The MinION data will be assembled with and without the NextSeq data to determine whether this error correction step is necessary.
Target selection
The other methodological issue to be resolved in this project is that of target selection. This region of the genome is approximately 3.3 million bp in length, which is only 0.12% of the entire genome. Sequencing costs can be reduced significantly if that region can be targeted and sequenced without the rest of the genome. However, current methods of target selection are based on either PCR amplification or probe hybridization and capture of the desired regions. Both of these approaches yield DNA molecules a few thousand base pairs in length at best and are wholly unsuitable for this project. Sage Science has recently released an instrument (SageHLS) capable of selecting large genomic regions in fragments up to 500 kb. An application of this system, called CATCH, starts with whole cells and uses CRISPR-Cas9 to cleave and capture the desired section of the genome. Guide RNA molecules are designed to target and cleave the genome at specific sites, and the cleaved DNA is collected by the SageHLS system. This has already been paired with 10x Genomics linked-read sequencing for sequencing and assembly of genomic regions, including HLA.
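The target-selection figures above can be checked with a short calculation. The following is a minimal sketch, not part of the project's methods: the region length (3.3 million bp) and the maximum fragment size (500 kb) come from the text, while the bovine genome size of roughly 2.7 billion bp is an assumption introduced here for illustration. The function names are hypothetical.

```python
import math

# Assumption (not stated in the text): bovine genome size of about 2.7 Gb.
ASSUMED_GENOME_BP = 2_700_000_000
# From the text: approximate length of the bovine MHC region.
MHC_REGION_BP = 3_300_000
# From the text: upper fragment size selectable with the SageHLS instrument.
MAX_FRAGMENT_BP = 500_000


def target_fraction(region_bp: int, genome_bp: int) -> float:
    """Fraction of the genome occupied by the target region."""
    return region_bp / genome_bp


def min_fragments(region_bp: int, max_fragment_bp: int) -> int:
    """Smallest number of non-overlapping fragments that can span the region."""
    return math.ceil(region_bp / max_fragment_bp)


fraction = target_fraction(MHC_REGION_BP, ASSUMED_GENOME_BP)
fragments = min_fragments(MHC_REGION_BP, MAX_FRAGMENT_BP)

print(f"Target fraction of genome: {fraction:.2%}")
print(f"Minimum fragments at {MAX_FRAGMENT_BP:,} bp each: {fragments}")
```

Under the assumed genome size, the fraction agrees with the 0.12% quoted above. The fragment count is only a lower bound for the idealized case of maximum-length, non-overlapping fragments; actual guide RNA placement would depend on the sequence available at each cut site.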