SEED Grant: Genomic Content and Structure of the Bovine Major Histocompatibility Complex

Sponsoring Institution

State Agricultural Experiment Station

Project Status

COMPLETE

Funding Source

STATE

Reporting Frequency

Annual

Accession No.

1019855

Grant No.

(N/A)

Cumulative Award Amt.

(N/A)

Proposal No.

(N/A)

Multistate No.

(N/A)

Project Start Date

May 23, 2019

Project End Date

Oct 31, 2020

Grant Year

(N/A)

Program Code

[(N/A)]- (N/A)

Recipient Organization
UTAH STATE UNIVERSITY
(N/A)
LOGAN, UT 84322

Performing Department
Animal Dairy & Veterinary Sciences

Non Technical Summary
The major histocompatibility complex (MHC) is a large, gene-dense region of the genome that contains many genes important for immunity. The genes for which this region is named, the MHC genes, are highly variable, and this variation is critical for the health of populations. In many species, this variation is derived from sequence differences among the genes. In cattle and other ruminants, however, it appears to be generated by a combination of sequence variation and differences in gene content. In other words, different animals have different genes. These differences arose through gene duplication and deletion. The variation among individuals and the duplicated genes complicate study of this region, genome assembly in particular. This project will use new technology to find the best ways to sequence through this region and determine which genes are present in a small number of samples. We will explore two very different sequencing methods to see which produces the best data. We will also use a new method to separate just that part of the genome from the rest of the DNA for sequencing, which will make the process much more cost-efficient. Previous selection methods yield small pieces of DNA that are unsuitable for this project. The goals of this project are to (1) determine the best ways to sequence and assemble this part of the genome, (2) generate more data demonstrating the breadth of variation, and (3) generate preliminary data demonstrating that these methods will work in order to strengthen a full-scale grant proposal to the USDA. Successful completion of this project will strengthen USU's position as a leader in this field and will significantly strengthen a proposal to the USDA for external funding. Our group and others have speculated on this topic and done some investigation, but only at a cursory level. This will be the first concerted effort to understand how bovine MHC diversity is created and maintained, and to understand the genomic structure of this region.

Animal Health Component

(N/A)

Research Effort Categories

Basic

100%

Applied

(N/A)

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
304	3310	1040	25%
304	3310	1090	25%
304	3410	1040	25%
304	3410	1090	25%

Knowledge Area
304 - Animal Genome;

Subject Of Investigation
3310 - Beef cattle, live animal; 3410 - Dairy cattle, live animal;

Field Of Science
1040 - Molecular biology; 1090 - Immunology;

Keywords

major histocompatibility complex

genomics

next generation sequencing

Goals / Objectives
The goal of the larger project is to sequence the entire MHC region from several disparate haplotypes in order to discover the range of structural variation in the bovine MHC region, to better understand how diversity is generated and maintained, and to determine how that diversity impacts the function of the bovine immune system. The purpose of the work proposed here is to: (1) determine the best approach to prepare and sequence libraries for this project, (2) generate more data demonstrating that the hypothesized variation exists, and (3) demonstrate that the methods proposed will yield high-quality sequence assemblies to strengthen a full proposal to the USDA.

Project Methods
The first task to be undertaken is to identify appropriate samples for full-length MHC sequencing. These samples will be identified among the cattle at the USU dairy by MHC genotyping. We will use our already-established genotyping protocol. Briefly, exon 2 from the MHC-II genes and exons 2 and 3 from the MHC-I genes will be amplified by PCR using the Fluidigm Access Array system. Sequencing adapters and indexes will be added during the amplification process. After amplification, the PCR products will be cleaned, pooled, and sequenced on the Illumina MiSeq. The sequencing data will be processed to identify the alleles present, and therefore the genotype, of each sample. From these, we will select appropriate homozygous and heterozygous individuals for further analysis. Of the candidates identified, we will select the two haplotypes that are the most different, based on the putative genomic structure, with the goal of having as many of the predicted gene loci represented as possible.

Sequencing
One of the primary issues to be worked out in this project is the relative efficiency and effectiveness of two different sequencing approaches. These two approaches are (1) 10x Genomics linked-read genomic sequencing, and (2) Oxford Nanopore long-read sequencing.
(1) Typically, long sequencing reads are required to generate phased haplotype assemblies. Linked-read sequencing technology from 10x Genomics uses microfluidics to partition and barcode DNA, which is then sequenced on traditional short-read sequencing platforms. The result is that reads originating from each original molecule of DNA in the sample are individually barcoded, allowing the data from each of those molecules to be assembled individually. Fully sequenced phased haplotypes of the HLA region (human MHC) have been generated using this technology. Samples will be submitted to the Genomics and Bioinformatics Core at the Huntsman Cancer Institute for sequencing using this method. These data will be assembled using Supernova, a software package developed by 10x Genomics specifically for assembly of linked-read sequencing data into diploid genome assemblies.
(2) The second approach to be used is nanopore sequencing using the Oxford Nanopore MinION. This sequencer is capable of extremely long sequence reads, limited (in theory) only by the length of the DNA molecules. Reads of tens to hundreds of kb are routine, and a read over 2 million bp was recently reported. MinION sequencing has been shown to be capable of producing de novo genome assemblies that reveal large structural variants and of enabling assembly and phasing of the entire HLA region. While the longer reads of the MinION will be very valuable to this project, the accuracy of nanopore sequencing reads is relatively low. For this reason, these samples will also be sequenced on the Illumina NextSeq using traditional methods for error correction purposes. The MinION data will be assembled with and without the NextSeq data to determine whether this error correction step is necessary.

Target selection
The other methodological issue to be resolved in this project is that of target selection. This region of the genome is approximately 3.3 million bp in length, which is only 0.12% of the entire genome. Sequencing costs can be reduced significantly if that region can be targeted and sequenced without the rest of the genome. However, current methods of target selection are based on either PCR amplification or probe hybridization and capture of the desired regions. Both of these approaches yield DNA molecules a few thousand base pairs in length at best and are wholly unsuitable for this project. Sage Science has recently released an instrument (SageHLS) capable of selecting large genomic regions in fragments up to 500 kb. An application of this system, called CATCH, starts with whole cells and uses CRISPR-Cas9 to cleave and capture the desired section of the genome. Guide RNA molecules are designed to target and cleave the genome at specific sites, and the cleaved DNA is collected by the SageHLS system. This has already been paired with 10x Genomics linked-read sequencing for sequencing and assembly of genomic regions, including HLA.
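The arithmetic behind the target-selection argument can be sketched as follows. This is a minimal illustration only: the 3.3 million bp region length comes from the text above, while the whole-genome size of 2.7 billion bp is an assumption chosen here to be consistent with the stated 0.12%, and the function names are hypothetical.

```python
# Region length is taken from the text above; the genome size is an
# ASSUMPTION (the source states only the resulting percentage).
MHC_REGION_BP = 3_300_000
ASSUMED_GENOME_BP = 2_700_000_000


def target_fraction(region_bp: int, genome_bp: int) -> float:
    """Fraction of the genome occupied by the target region."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return region_bp / genome_bp


def expected_on_target_bases(total_bases: float, fraction: float) -> float:
    """Bases expected to fall in the region from an untargeted library."""
    return total_bases * fraction


fraction = target_fraction(MHC_REGION_BP, ASSUMED_GENOME_BP)
print(f"Target fraction: {fraction:.2%}")  # Target fraction: 0.12%

# Hypothetical run yield of 10 Gb, used only to illustrate the scale.
on_target = expected_on_target_bases(10e9, fraction)
print(f"On-target bases from 10 Gb untargeted: {on_target / 1e6:.1f} Mb")
```

Under these assumptions, almost all of the output of an untargeted whole-genome run falls outside the region of interest, which is the motivation for selecting the target before sequencing.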

Progress 05/23/19 to 10/31/20

Outputs
Target Audience: This study will be of interest to geneticists and veterinary immunologists. A second target audience for this project is federal funding agencies, which will be addressed through a grant proposal.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? One bioinformatics graduate student has worked on the assembly.
How have the results been disseminated to communities of interest? Thus far, no dissemination has taken place.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? DNA samples were collected from approximately 300 cattle at the USU dairy. These samples were used to genotype the cattle at the MHC locus to identify homozygous individuals for sequencing of the full MHC region. For two individuals, a full genomic sequence was obtained using the MinION sequencer. Other samples were used to test other methods of sequencing that focus on the MHC region. One approach used the Sage HLS instrument, along with CRISPR/Cas9 and guide RNAs on either side of the MHC region, to isolate just that region for sequencing. This approach worked, but not well: the enrichment was insufficient, and the resulting short-read sequences could not be reliably assembled. The second approach used CRISPR/Cas9 to enrich the sequencing libraries for reads in the desired region. These libraries were sequenced on the Oxford Nanopore MinION. Guide RNAs were designed to tile across the entire MHC region at intervals of approximately 20,000 bp. This library prep method ideally adds sequencing adapters only at cleavage sites, so that sequencing data comes from only those regions. In reality, it enriches the data for the desired region such that a good run yields as much as 3% of data in that region, which is approximately 100 times greater than a random library without this enrichment process. Using this method, assemblies were obtained for 4 MHC-I haplotypes. In most cases, the sequence is not one complete contig throughout the MHC-I or MHC-II region, but is in 2 separate contigs with a break in between. One of the MHC-I haplotypes, AH012A, is a very common haplotype and is associated with multiple MHC-II haplotypes. Several individuals with the AH012A haplotype were sequenced, and from these, data were obtained for 3 MHC-II haplotypes. In all, sequence assemblies were obtained for 6 MHC-II haplotypes, although these are, in general, not as complete as those for the MHC-I haplotypes.
In the MHC-I region, there appears to be a general structure that all of the haplotypes follow. In the classical region, starting from one end, there is an MHC-I gene that has never been observed to be expressed but appears to be a complete, intact gene, followed by the NC1 gene approximately 25,000 bp downstream. Between the NC1 locus and the other end of the classical region, there are a variable number of classical MHC-I genes in varying arrangements. The rearrangements that duplicate and delete the MHC-I genes appear to be confined to this region, although the mechanism for this is not known. All of the haplotypes analyzed have this same structure. One major goal of this project was to identify the best way to sequence this region. The CRISPR/Cas9-MinION approach worked well and is the best approach for this work moving forward.
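The tiling and enrichment figures reported above can be explored with a short sketch. It is illustrative only: the function names are hypothetical, positions are relative coordinates rather than real genome coordinates, and actual guide RNA design depends on PAM availability and specificity, which this does not model. The background fraction used for fold enrichment is an input, because the report does not state which span was treated as on-target for that comparison.

```python
def nominal_cut_sites(region_length_bp: int, interval_bp: int) -> list[int]:
    """Evenly spaced nominal cut positions across a region (relative coords)."""
    if interval_bp <= 0:
        raise ValueError("interval_bp must be positive")
    return list(range(0, region_length_bp + 1, interval_bp))


def fold_enrichment(on_target_fraction: float, background_fraction: float) -> float:
    """Ratio of the observed on-target fraction to the untargeted expectation."""
    if background_fraction <= 0:
        raise ValueError("background_fraction must be positive")
    return on_target_fraction / background_fraction


# 3.3 Mb region length is from the project methods; 20 kb is the
# approximate guide spacing reported above.
sites = nominal_cut_sites(3_300_000, 20_000)
print(len(sites))  # 166 nominal positions, including both region ends

# Fold enrichment for a 3% on-target yield under a caller-supplied
# background fraction (hypothetical value, for illustration only).
example_background = 0.0003
print(round(fold_enrichment(0.03, example_background)))
```

The fold value is sensitive to the background fraction chosen, so any comparison between runs should use the same definition of the on-target span.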

Publications

Progress 05/23/19 to 09/30/19

Outputs
Target Audience: This study will be of interest to geneticists and veterinary immunologists. A second target audience for this project is federal funding agencies, which will be addressed through a grant proposal.
Changes/Problems: The 10x Genomics Linked-Reads technology in the original proposal is no longer available due to legal action against the manufacturer. This was replaced by the TELL-Seq technology from Universal Sequencing.
What opportunities for training and professional development has the project provided? One bioinformatics graduate student has worked on the assembly.
How have the results been disseminated to communities of interest? Thus far, no dissemination has taken place.
What do you plan to do during the next reporting period to accomplish the goals? More blood samples will be collected for further runs on the SageHLS to better optimize this protocol. Although it has worked to some degree, there is still room for improvement. The data collected so far and the data still to be collected will be analyzed, and the different approaches will be compared to determine which is the best protocol for this work. At the end, if sufficient funds are available, the original two samples that were sequenced fully on the MinION will also be sequenced using Illumina sequencing technology, and those data will be combined with the MinION data to produce two high-quality whole-genome assemblies for submission to public databases.

Impacts
What was accomplished under these goals? We have collected DNA samples from nearly 300 cattle at the USU dairy. These samples have been used to determine the MHC genotypes of those animals. These genotypes were used to select individuals for full-length genomic sequencing of the MHC region according to the original proposal. Blood was collected from selected individuals for further analysis. Genomic DNA was isolated and sequenced on the MinION. Full genomic DNA sequence has been generated for 2 individuals with two different haplotypes. These data are currently being assembled. The Sage HLS instrument has been used to isolate the MHC region from genomic DNA. This instrument begins with fresh live cells and uses CRISPR/Cas9 to cleave out the targeted region. Several samples have been run through this instrument to date. The resulting DNA has been analyzed in two different ways. First, Illumina sequencing libraries have been prepared using the TELL-Seq library system from Universal Sequencing. The original proposal planned to use the 10x Genomics Linked-Reads technology, but that is no longer a viable solution due to legal action against 10x Genomics. The TELL-Seq approach is similar and was deemed a suitable replacement. These libraries have been sequenced, and analysis is ongoing. The second approach for sequencing this DNA is the MinION. This sequencer requires much more DNA than the typical yield from the SageHLS instrument. Alterations to the protocol to account for the reduced starting material have been tried with some success. These data are also currently in the analysis phase.

Publications