Recipient Organization
NORTH CAROLINA STATE UNIV
(N/A)
RALEIGH, NC 27695
Performing Department
Plant Pathology
Non Technical Summary
Data science has become a popular term in recent years, with several definitions available, some of them in conflict. Regardless of how data science is defined, the enormous volume of data now available, arriving in different forms and at high speed, requires a novel approach to data acquisition, management, and information extraction. There are two major differences from past concepts about data: (a) today the majority of data are unstructured, such as text or images, and (b) traditional statistical methods may not be suitable for the enormous data volume, so new methods, such as machine learning, are likely required. Fields such as health care have already taken advantage of the opportunities created by big data by developing, for instance, personalized medical care initiatives. In that case, mining unstructured text data delivers previously unknown information, patterns, and trends. Would unstructured data be equally important to agriculture? Most data in agricultural science are generated by well-designed experiments and are therefore of high quality; however, they are often few in number, a serious limitation when data are used to predict events such as pest occurrence and spread. Investigations are needed to examine whether unstructured data, such as text from social media, can enhance the available experimental data and improve predictability. In a first step, a dictionary will be developed to derive meaning from words and sentences commonly used for plant pests and diseases, in order to classify documents and create summaries of content. Then methods used in other fields, such as health care, will be implemented to combine unstructured data with the experimental data typically generated in agricultural sciences. Methods will be amended as needed to tailor them to agricultural problems.
A second, related objective is to continue developing models that predict plant pest and disease outbreaks. Those models are important for alerting farmers and society in general to potential risks and for minimizing economic losses in crops. However, using those models generates more data, because extension personnel discuss them on social media and farmers click on websites to get the latest pest alerts. Using those unstructured data to improve model predictability may save farmers a significant amount of money by improving the accuracy and precision of alerts.
Lastly, data science in conjunction with sensors offers an exciting opportunity to generate large datasets of phenotypic responses of plants to pest and pathogen attacks. Plant sciences have made tremendous progress in identifying and characterizing the interactions of pests and pathogens with plants at the genetic and molecular levels; however, there has been no similar effort at the phenotypic level. Part of the problem is that collecting phenotypic data is labor intensive and requires land and other resources that are not always available at the scale needed, so the data may be limited and describe only specific events at a few points in time. Sensors are becoming increasingly common and inexpensive. Recent applications of (micro-)sensors include health devices that continuously monitor patients' conditions or athletes' muscle movement and performance, to name just a few. Modifying those sensors to allow continuous gathering of phenotypic plant data is an exciting prospect that will shed light on plant behavior in ways we have never been able to describe before. Those data can provide information about how plants defend against pests and pathogens at different stages of attack. This information is useful for identifying critical responses at the genetic and molecular levels and for accelerating breeding for resistance to pests and diseases that cause economic losses in crops.
Animal Health Component
50%
Research Effort Categories
Basic
20%
Applied
50%
Developmental
30%
Goals / Objectives
Goal: Promote the field of Data Science and develop new statistical models and methods for plant epidemiology.
Objectives:
1. Develop models to predict pest outbreaks, with emphasis on the use of Bayesian statistical methods. Bayesian methods allow for seamless inclusion of new data as they become available.
2. Develop methods to (a) extract information related to plant pests from unstructured data available on social media and other Internet sources, (b) combine unstructured and structured data, and (c) evaluate the value of information from unstructured data in improving the information generated from structured data.
3. Develop algorithms to describe phenotypic responses in plants challenged with pathogens whose pathogenic mechanisms have been described at the genomic and molecular levels. The algorithms require large datasets, which will be generated using micro-sensors.
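The statement in Objective 1 that Bayesian methods allow seamless inclusion of new data can be illustrated with a minimal sketch. It assumes a deliberately simple Beta-Binomial model of the probability that a surveyed field shows an outbreak; the class name, the uniform prior, and the survey counts are all hypothetical and are not taken from the project, whose actual models would be considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) belief about the per-field outbreak probability."""
    alpha: float
    beta: float

    def update(self, outbreaks: int, fields_surveyed: int) -> "BetaPosterior":
        """Return the posterior after observing a new batch of survey data."""
        if not 0 <= outbreaks <= fields_surveyed:
            raise ValueError("outbreaks must be between 0 and fields_surveyed")
        # Conjugate update: add successes to alpha and failures to beta.
        return BetaPosterior(
            self.alpha + outbreaks,
            self.beta + (fields_surveyed - outbreaks),
        )

    @property
    def mean(self) -> float:
        """Posterior mean of the outbreak probability."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical data: a uniform prior and two seasons of made-up survey counts.
prior = BetaPosterior(1.0, 1.0)
season_1 = prior.update(outbreaks=3, fields_surveyed=20)
season_2 = season_1.update(outbreaks=5, fields_surveyed=30)

# Updating season by season gives the same belief as pooling all the data.
pooled = prior.update(outbreaks=8, fields_surveyed=50)
print(season_2 == pooled, round(season_2.mean, 3))
```

In this sketch each season's posterior becomes the prior for the next season, so new survey data can be folded in without refitting the model on the full history.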
Project Methods
Objective 1. Develop models to predict pest outbreaks
Models will be developed for pests and diseases that cause significant economic losses in crops and for which no models are currently available. Data used for model development will be derived from available surveys or generated by new studies. Emphasis will be given to developing models that predict pest and disease outbreaks at the state or continental level. Bayesian statistical methods will be implemented for model development. Bayesian methods allow for seamless inclusion of new data as they become available. Models will be evaluated for accuracy and precision of prediction with newly collected data. Effort will be made to make the models available to extension personnel and the public and to record users' feedback.
Objective 2. Develop methods to (a) extract information related to plant pests from unstructured data available on social media and other Internet sources, (b) combine unstructured and structured data, and (c) evaluate the value of information from unstructured data in improving the information generated from structured data.
Documents from social media, websites, and other digital resources will be collected and used to extract common words and phrases used for plant pests and diseases. A dictionary will be developed. Subsequently, words and phrases will be indexed and analyzed. Specific case studies in which small and large volumes of quantitative/experimental data are available will be compared with information extracted from unstructured data, in order to draw conclusions about the value of unstructured data relative to data collected with experimental approaches in agricultural sciences. Information from unstructured data will be combined with structured data using Bayesian hierarchical methods. Pest models developed in Objective 1 will be used as examples for combining unstructured with structured data.
Objective 3. Develop algorithms to describe phenotypic responses in plants challenged with pathogens whose pathogenic mechanisms have been described at the genomic and molecular levels.
In collaboration with researchers from Electrical and Mechanical Engineering who are involved in the development and application of sensors, I will investigate the use of micro-sensors for seamless collection of data on plant responses to pest and pathogen attacks. Sensors allow the collection of the large datasets required to develop machine learning algorithms. Machine learning algorithms will be developed to identify patterns and trends in plant responses.
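The dictionary step described under Objective 2 (extracting pest and disease terms from documents, indexing them, and classifying the documents) might look roughly like the sketch below. The dictionary entries, category labels, function names, and the example post are all hypothetical illustrations; the project's actual dictionary and classification method are not specified in this summary.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary mapping a term to a category label.
PEST_DICTIONARY = {
    "downy mildew": "disease",
    "late blight": "disease",
    "aphid": "pest",
    "armyworm": "pest",
}


def normalize(text: str) -> str:
    """Lowercase the text and keep only alphabetic tokens, joined by spaces."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def count_terms(text: str, dictionary: dict[str, str]) -> Counter:
    """Count whole-word occurrences of each dictionary term in the text."""
    normalized = normalize(text)
    counts: Counter = Counter()
    for term in dictionary:
        # Allow an optional plural "s" so "aphids" matches "aphid".
        pattern = r"\b" + re.escape(term) + r"s?\b"
        hits = len(re.findall(pattern, normalized))
        if hits:
            counts[term] = hits
    return counts


def classify(text: str, dictionary: dict[str, str]) -> str:
    """Label a document by the category with the most term matches."""
    totals: Counter = Counter()
    for term, n in count_terms(text, dictionary).items():
        totals[dictionary[term]] += n
    if not totals:
        return "unrelated"
    return totals.most_common(1)[0][0]


# Made-up social media post used only to exercise the functions.
post = "Found aphids and late blight on my tomatoes; late blight is spreading fast!"
print(count_terms(post, PEST_DICTIONARY))
print(classify(post, PEST_DICTIONARY))
```

Simple term counting is used here only because it is easy to inspect; the per-document counts it produces are the kind of indexed quantity that could later be combined with structured survey data.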