Big Data: Biocomputing, Bioinformatics, and Biological Discovery

Goals / Objectives
The Institute for Genomics, Biocomputing & Biotechnology (IGBB) at Mississippi State University (MS State) and the ARS's Genomics & Bioinformatics Research Unit (GBRU) will continue their collaborative efforts to improve the biocomputing infrastructure of MS State and the ARS while advancing big data biological research. Goals of the current project include (but are not limited to) [a] acquiring computer hardware/software that will facilitate computational biology research by the IGBB and the ARS; [b] assisting the ARS while it works to establish its own supercomputing facility; [c] generating novel computer scripts and adapting existing scripts for big data analyses; [d] conducting big data genomics and proteomics research on organisms of agricultural importance in collaboration with the GBRU and other ARS units; [e] using differential gene expression analysis to explore health and fertility issues in livestock; [f] using biomolecular data/tools produced in the study of model organisms to advance research on agricultural species; [g] establishing and testing both laboratory and computational pipelines/protocols for producing, analyzing, and protecting sensitive genomic data and accompanying metadata; and [h] participating in training ARS and other scientists in big data management and analysis.

Project Methods
Pacific Biosciences (PacBio, long-read), Oxford Nanopore (long-read), and Illumina (short-read) sequencing technologies will be used to generate DNA and transcriptome sequence data. Proteomics data will be generated using the IGBB's LTQ Orbitrap Velos mass spectrometer. The IGBB will use its supercomputing capacity and expertise to conduct genome assembly, SNP calling, comparative genomics research, and proteome analyses. RT-qPCR will be used to validate RNA-Seq-based differential gene expression results. Computational tools developed for this project will be tested on existing and new data sets, with supporting laboratory data generated as needed. Workshops will be developed by IGBB faculty/staff to train ARS and other scientists in advanced data management and analysis techniques.
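
As an illustration of the RT-qPCR validation step, the sketch below shows one common way such a comparison might be made. It assumes the 2^-ΔΔCt (Livak) method with roughly 100% amplification efficiency, which the project description does not specify. The function names, the Ct values, and the RNA-Seq log2 fold change are all hypothetical and serve only as an example.

```python
from statistics import mean


def ddct_log2_fold_change(target_treated, ref_treated, target_control, ref_control):
    """Log2 fold change (treated vs. control) under the assumed 2^-ddCt method.

    Each argument is a list of Ct values from replicate wells:
    target gene and reference (housekeeping) gene, in each condition.
    """
    # Normalize the target gene to the reference gene within each condition.
    d_ct_treated = mean(target_treated) - mean(ref_treated)
    d_ct_control = mean(target_control) - mean(ref_control)
    # Difference between conditions; fold change = 2 ** (-dd_ct).
    dd_ct = d_ct_treated - d_ct_control
    return -dd_ct


def directions_agree(qpcr_log2fc, rnaseq_log2fc):
    """True if both assays report the same direction of change."""
    return (qpcr_log2fc > 0) == (rnaseq_log2fc > 0)


# Hypothetical Ct values for one gene (three replicate wells each).
qpcr_log2fc = ddct_log2_fold_change(
    target_treated=[22.0, 22.5, 21.5],
    ref_treated=[18.0, 18.25, 17.75],
    target_control=[24.0, 24.5, 23.5],
    ref_control=[18.0, 18.0, 18.0],
)

# Hypothetical RNA-Seq estimate for the same gene.
rnaseq_log2fc = 1.7

print(f"RT-qPCR log2FC: {qpcr_log2fc:.2f}")
print(f"Direction agrees with RNA-Seq: {directions_agree(qpcr_log2fc, rnaseq_log2fc)}")
```

In practice, agreement would be assessed across a panel of genes (for example, by correlating the two sets of log2 fold changes) rather than for a single gene.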