DSFAS: Integrating multiomics and high-throughput phenotypic datasets through machine learning to improve animal resilience and welfare

Recipient Organization
PURDUE UNIVERSITY
(N/A)
WEST LAFAYETTE,IN 47907

Performing Department
Animal Science

Non Technical Summary
Animal welfare directly impacts the economic return, environmental efficiency, and consumer acceptability of animal products. As animal welfare is a multifaceted concept, there is a clear need to identify and integrate quantifiable measurements of physical, behavioral, emotional, and physiological aspects to accurately predict future welfare status. This multidisciplinary proposal aims to integrate large-scale datasets generated in precision dairy and pig farms with multiomic datasets through machine learning methods to optimize management practices and genomic selection for improved animal welfare and resilience. Our specific objectives are to: 1) Evaluate computational and statistical algorithms and develop tools to process high-throughput datasets and derive novel indicators and predictors of animal welfare; 2) Perform data mining analyses to identify novel indicators of animal welfare based on a multitude of high-throughput datasets collected in precision farms; and, 3) Integrate multiple phenotypic and multiomic variables to accurately predict animal welfare status and the genetic merit of breeding animals. For broader applicability and greater representativeness of our findings, we will focus on different welfare issues in pigs and dairy cattle: heat stress, pre-weaning mortality, and overall resilience; and temperament in automated milking systems (milking robots). Various datasets and stakeholder support are available for the research, and the proposal aligns well with the USDA strategic goals and DSFAS program area priorities. Altogether, we will recommend best practices, build the methods and tools to process and integrate large-scale datasets, and generate innovative and trustworthy strategies for management and genomic-enhanced breeding for improved welfare and resilience in livestock.

Animal Health Component

30%

Research Effort Categories

Basic

70%

Applied

30%

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
303	3410	1080	25%
303	3510	1080	25%
315	3510	1020	20%
306	3510	1080	15%
307	3410	1081	15%

Knowledge Area
303 - Genetic Improvement of Animals; 307 - Animal Management Systems; 306 - Environmental Stress in Animals; 315 - Animal Welfare/Well-Being and Protection;

Subject Of Investigation
3510 - Swine, live animal; 3410 - Dairy cattle, live animal;

Field Of Science
1081 - Breeding; 1080 - Genetics; 1020 - Physiology;

Keywords

Goals / Objectives
The long-term goal of our research group is to develop methods, tools, and biological knowledge needed to improve livestock welfare. The overall objective of this proposal is to use machine learning techniques to derive novel indicators of welfare and predict animal welfare status and overall resilience based on precision technology and multiomic datasets. Our specific objectives are to: 1. Evaluate computational and statistical algorithms and develop tools to process high-throughput datasets and derive novel indicators and predictors of animal welfare; 2. Perform data mining analysis to identify novel indicators of animal welfare based on a multitude of high-throughput datasets collected in precision farms; and, 3. Integrate multiple phenotypic and multiomic variables to accurately predict future performance (animal welfare) and the genetic merit of breeding animals.

Project Methods
Objective #1: Develop and evaluate computational and statistical algorithms and tools to process high-throughput datasets and derive novel indicators and predictors of animal welfare. 1.1. Description of datasets. Dairy cattle. Pedigree, phenotypic (variables measured on individual animals), and genomic datasets for Holstein cattle will be obtained from different sources. The first data source will be the Homestead Dairy Farms (Plymouth, Indiana), which already has Non-disclosure and Data-sharing Agreements with Purdue researchers (Drs. Brito and Boerman). This is the largest milking robot farm in North America, currently operating 36 milking robot stations. There is pedigree, genotypic, and phenotypic information for >15,000 cows, of which >8,000 have been milked through milking robots. In addition to milking robot datasets, there are also pedometers and activity collars, video imaging in both calf and cow barns (24-h monitoring), automated milk intake recorders, daily milk production, reproduction, and health records. The other sources of dairy cattle datasets will be ST Genetics farms that have precision technologies and are spread across the U.S. In summary, there will be over 50,000 animals with pedigree, genotypes, and phenotypic variables to be used for the analyses proposed here. All animals were either genotyped with, or imputed to, a 50K SNP (single nucleotide polymorphism) chip panel. Pigs. Several datasets are available for pigs. These datasets have been generated at the Livestock Behavior Research Unit (USDA-ARS), at Purdue University, in ongoing research projects funded through state and federal sources, and through partnerships with private companies (see Letters of Support). First, there are datasets characterizing the physiological and thermoregulatory response of pigs to increasing ambient temperatures. These come from studies with animal numbers ranging from 24 to 108 pigs.
During the course of these experiments, pigs were exposed to either thermoneutral or cycling heat stress conditions, and body temperatures were monitored continuously (at 15-min intervals, 24 hours per day) using vaginal or intraperitoneal implants. In addition, depending on the experimental objectives, blood samples were obtained before, during, and after the heat stress exposure to characterize stress, post-absorptive metabolism, and immune and stress biomarkers (metabolomics). Measures of animal performance, including feed intake and growth, were also collected. In addition to the datasets already available, a NIFA-funded experiment was conducted between June and July 2021 in which thermoregulatory, behavioral, and production data were collected on a total of 1,600 lactating sows. Fecal samples for microbiome analysis were also collected and will be analyzed for these sows. Hair cortisol analysis will also be conducted in 1,500 of these sows with support from this proposal. Furthermore, epigenomic analyses have been conducted (PDs Brito and Johnson) in pigs that were gestated by sows under thermoneutral or heat stress conditions (see Preliminary Data section). Daily feed intake records and more than 100 production, reproduction, management, and health variables from at least 50 pig farms will be provided by Jyga Technologies. These datasets will be mainly used to investigate overall resilience in pigs based on variations in daily feed intake. 1.2. Video analytics system development. We will develop video analytics algorithms to extract movement, facial measurements, and animal interactions from cattle and pigs in an open pen. Multiple algorithms are necessary, each tailored to the specific viewpoint and the target information to extract. Tracking and pose estimation will provide information about animal location, posture, and movement; pattern matching will characterize facial expressions; and activity recognition will quantify social interactions. 1.3.
Analytics of precision technology and multiomic datasets. Strict data editing and quality control will be performed to eliminate non-informative or incomplete records. Raw phenotypes for each variable will be extracted from the precision technologies used (e.g., milking robots, activity monitors). Additional variables will be obtained from the herd management systems, such as conformation, health, reproduction, and production records. Machine learning algorithms will be utilized to analyze the high-throughput phenotypes collected. These algorithms will include support vector machines (SVMs), random forests, and deep learning models based on neural networks, among others. Python is widely used for machine learning, and we will apply established Python packages such as scikit-learn, TensorFlow, and Keras. 2. Objective #2. Perform data mining analysis to identify novel indicators of animal welfare based on a multitude of high-throughput datasets collected in precision farms. In this objective, we will focus on data mining of the variables generated in Objective #1. For animal resilience, we will focus on variables including feed intake, growth rate, normality of behavior, and speed of acquisition of predictive positive feed rewards (i.e., adaptation to the autofeeder or milking system). After quality control and processing of the raw datasets, we will merge all the datasets into a common platform (leveraged through a research project funded by Purdue University; see Letter of Support and Fig. 5). This will be done using Python and R scripts. Each data source will be evaluated separately to better understand all the variables generated in Objective #1 and define the biological meaning of numerous variables. This will be led by the animal welfare science and multiomic specialists.
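The quality control and merging steps described above could look roughly like the following. This is a minimal sketch using only the Python standard library; it is not the project's actual pipeline. The record layout, the field names (animal_id, milk_kg, visits, steps), and the zero-variance rule for "non-informative" variables are all assumptions made for illustration.

```python
from statistics import pvariance

def drop_incomplete(records, required):
    """Keep only records where every required field is present and not None."""
    return [r for r in records if all(r.get(k) is not None for k in required)]

def informative_fields(records, fields):
    """Return the fields whose values vary across records (variance > 0).

    Treating zero-variance variables as non-informative is an assumption.
    """
    return [f for f in fields if pvariance([r[f] for r in records]) > 0]

def merge_by_animal(left, right, key="animal_id"):
    """Inner-join two lists of records on the animal identifier."""
    right_index = {r[key]: r for r in right}
    return [{**l, **right_index[l[key]]} for l in left if l[key] in right_index]

# Hypothetical milking robot records (illustrative values only).
robot = [
    {"animal_id": "A1", "milk_kg": 31.2, "visits": 3},
    {"animal_id": "A2", "milk_kg": None, "visits": 3},  # incomplete: removed
    {"animal_id": "A3", "milk_kg": 28.4, "visits": 3},
]
# Hypothetical activity collar records (illustrative values only).
collar = [
    {"animal_id": "A1", "steps": 4100},
    {"animal_id": "A3", "steps": 3650},
    {"animal_id": "A4", "steps": 5020},  # no robot record: dropped by the join
]

clean_robot = drop_incomplete(robot, ["milk_kg", "visits"])
kept = informative_fields(clean_robot, ["milk_kg", "visits"])
merged = merge_by_animal(clean_robot, collar)
```

An inner join is used here so that only animals present in both sources remain. Whether unmatched animals should instead be retained with missing values is a design choice the source does not specify.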
Various computational tools will be used to mine these datasets, especially R packages and Python scripts developed by the researchers, such as: https://zenodo.org/record/3366107#.YQLLcy2cZR4. Various traits will be derived by combining the biological expertise of the team with that of the machine learning experts. 3. Objective #3. Integrate multiple phenotypic and "-omic" variables to accurately predict future performance (animal welfare) and the genetic merit of breeding animals. In this objective, we will use the tools, datasets, and biological knowledge generated in Objectives #1 and #2 to integrate all the information available to predict the welfare status of individual animals and their genetic merit for traits related to welfare and resilience. Considering the large number of welfare risks in the livestock industry, we will focus on: 1) heat stress in pigs; 2) pre-weaning mortality in pigs and cattle; 3) overall resilience [measured based on daily variations in feed intake (pigs), milking robot variables, milk intake (calves), milk traits (dairy cows), health records, and walking ability as a lameness indicator; Fig. 2]; and 4) dairy cattle temperament in automated milking systems (milking robots). Classification and regression methods will be implemented for categorical and continuous outcomes, respectively, including techniques based on penalized regression and hierarchical modeling [130], as well as machine learning algorithms such as support vector machines and support vector regression, random forests, multilayer perceptron neural networks, and deep neural networks [133]. Statistical learning approaches suitable for high-dimensional data will be implemented and compared, including partial least-squares and penalized regression techniques such as lasso and elastic net [132].
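For the overall resilience trait mentioned above, measured from daily variations in feed intake, one possible indicator is the log-transformed variance of the deviations between observed intake and a smoothed expected intake. The source does not define the indicator, so the sketch below is only an illustration. The choice of indicator, the centered moving average, the window size, and the intake values are all assumptions.

```python
import math
from statistics import mean, variance

def rolling_mean(values, window=3):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    return [mean(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def log_variance_of_deviations(daily_intake, window=3):
    """Natural log of the sample variance of deviations from the moving average.

    Under the assumption made here, lower values mean steadier intake.
    """
    expected = rolling_mean(daily_intake, window)
    deviations = [obs - exp for obs, exp in zip(daily_intake, expected)]
    return math.log(variance(deviations))

# Hypothetical daily feed intake (kg/day) for two animals.
stable = [2.0, 2.1, 2.0, 2.2, 2.1, 2.2, 2.3, 2.2]
disturbed = [2.0, 2.1, 0.9, 2.2, 2.1, 1.0, 2.3, 2.2]  # two sharp drops

lnvar_stable = log_variance_of_deviations(stable)
lnvar_disturbed = log_variance_of_deviations(disturbed)
```

With these illustrative series, the animal with sharp intake drops receives the larger value. A production version would need a defensible model of expected intake (for example, a fitted growth or lactation curve) rather than a short moving average.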
The main outcomes of this objective will be the development of tools and the identification of the most accurate and least biased models for predicting the four welfare risks described above and the genomic breeding values for these welfare and resilience traits.
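The source does not state how "most accurate" and "least biased" will be measured. One common convention in prediction studies, assumed here, takes accuracy as the correlation between predicted and observed values in a validation set. Bias is then assessed from the slope of the regression of observed on predicted values, where a slope of 1 indicates no inflation or deflation of predictions. The sketch below uses hypothetical validation values and model names.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def regression_slope(observed, predicted):
    """Slope of observed regressed on predicted (1.0 means no inflation)."""
    n = len(predicted)
    mp, mo = sum(predicted) / n, sum(observed) / n
    spo = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    spp = sum((p - mp) ** 2 for p in predicted)
    return spo / spp

# Hypothetical validation data (illustrative values only).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [1.1, 1.9, 3.2, 3.8, 5.0]    # close to the observed scale
model_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # perfectly ranked but inflated

summary = {
    name: (pearson(observed, pred), regression_slope(observed, pred))
    for name, pred in (("model_a", model_a), ("model_b", model_b))
}
```

In this toy case model_b has the higher correlation but a slope of 0.5, which shows why the two criteria are reported separately: a model can rank animals well while overstating the spread of their predicted values.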