Progress 01/15/22 to 08/29/22
Outputs
Target Audience: Our target audience included researchers interested in precision soil sampling, and they were reached through peer-reviewed publication.
Changes/Problems: The rate of expenditure may appear slow from the funding agency's perspective for the following reasons:
- Both the PI and the Co-PI have been spending very economically by making use of support from their respective departments. In particular, when the project started on 01/15/2022, the Ph.D. student from the PI's group was already in the middle of a Research Assistantship appointment funded by our industry partner, which covered his Spring 2022 semester. Therefore, he started studying deep learning and developing machine-learning models for this project without the need to spend grant funds on his stipend and tuition for Spring 2022.
- During Summer 2022, the PI accepted an offer to move from South Dakota State University to the Florida Institute of Technology. In addition, the Ph.D. student obtained a paid summer internship to work on a related machine-learning project at our industry partner. Therefore, the PI did not spend any funds during this transition period.
- During the initial period of the project, January to May 2022, the winter weather in South Dakota prevented us from organizing the proposed field trips for data collection. Instead, the trips took place in Summer 2022. During this time, the Co-PI's department had a budget surplus and offered to cover the Co-PI's M.S. student for the summer. The Co-PI did spend funds on his summer salary.
Despite the above events, there is no significant deviation from the proposed research plan and goals. We have made very promising progress, with a large amount of field data collected during Summer 2022. We also had one paper accepted for publication in Computers and Electronics in Agriculture and have another paper under review. Both papers acknowledge the support from this grant.
This was accomplished while preserving a significant portion of the grant for future research activities to be conducted in this project.
What opportunities for training and professional development has the project provided? An M.S. student (Sravanthi Bachina) and a Ph.D. student (Praneel Acharya) are working on this project. The M.S. student, from Co-PI Kris Osterloh's research group, is in charge of data collection and labeling (Objective 1). The Ph.D. student, from PI Kim Nguyen's group, is in charge of developing the machine-learning models and validating the performance metrics (Objectives 2 and 3). The training and professional development activities for these students include acquiring knowledge and skills related to machine learning and data processing, collecting soil sampling data, and developing and applying machine-learning techniques to the data to accomplish the project goals.
How have the results been disseminated to communities of interest? The techniques we have been developing in this project have been applied to data obtained in a different project. The results have been reported in two papers, both of which were submitted to the journal Computers and Electronics in Agriculture. One paper was recently accepted for publication, while the other is currently under review.
What do you plan to do during the next reporting period to accomplish the goals? The PI has moved from South Dakota State University to the Florida Institute of Technology. The PI is requesting to transfer the grant to his new institution to continue the proposed research. This Final Report is a required component of the grant transfer process. After the grant transfer process is completed, the remaining research tasks will be performed in close collaboration between the two institutions: Florida Institute of Technology (PI, Objectives 2 and 3) and South Dakota State University (Co-PI, subaward, Objective 1).
Below are the objectives to be accomplished in the remainder of the project period:
Objective 1: Establish a cyberinfrastructure of landscape data and soil sampling annotations for the training of deep convolutional neural networks. In Summer 2022, we arranged a number of field trips to the Edinger Brothers Partnership farm in Mount Vernon, SD, to collect soil sampling data from over 60 fields, most of which are at least 150 acres. We were also granted access to their soil sampling data collected from 2011 to date. Since non-dynamic soil properties (texture, rock fragments, etc.) vary by landscape, the fields were selected so that they represent a diverse group of common landscapes. In the next phase of the project, the final delineation of training data will be completed using geographic information system (GIS) software. The collected digital elevation model (DEM) will also be used to generate a suite of landscape derivatives such as slope, aspect, and curvature. These derivatives will be used as features in the deep-learning pipeline. DEM derivatives are straightforward and can be automatically generated, unlike hillslope profile positions. Thus, strong relationships between hillslope profiles and soil properties will create a more robust final product. The collected data will be annotated and fed into a deep-learning pipeline that refines and extracts the optimal locations for soil sampling given a landscape area, as described in Objective 2.
Objective 2: Develop a deep-learning pipeline to learn, analyze, and refine landscape data for automated selection of soil sampling locations. In the next phase of this research task, we will focus on developing and testing the remaining proposed modules of the machine-learning pipeline:
Region proposal: This module takes a feature map from the feature extraction process as input.
At each coordinate of a feature map, we will place a fixed number of rectangles of different dimensions, called anchor boxes. Anchor boxes are a set of pre-defined boxes and are crucial for optimal sampling zone detection in our deep-learning framework. They can be thought of as zones where the network initially predicts the probability that a set of coordinates contains desirable soil sampling zones (i.e., an initial guess). The algorithm then refines and resizes these anchor boxes as it learns more about the characteristics of optimal sampling spots with guidance from the ground truth. Each anchor box in a feature map covers a region of an input landscape that might contain desirable sampling zones. Therefore, a region proposal neural network needs to be formulated and well trained with labeled ground truth data to effectively find regions that contain optimal sampling zones with high probability. This is another step of data refinement.
Region-of-Interest (RoI) pooling: The regions produced by the region proposal process are not all the same size. The main goal of RoI pooling is to separate individual proposed regions and resize them to a fixed size while still maintaining all key features. The outputs of this RoI pooling process are well-defined regions of the same size that may contain desirable soil sampling zones.
Inference: This module is a final refinement step that follows the previous processes. Here, we will formulate another multilayer neural network that takes the above RoIs as inputs and outputs regions containing optimal sampling locations with high certainty and high precision. As in the region proposal module, the neural network will be trained via an optimization scheme. However, because the landscape data input (produced by RoI pooling) is much more refined and specific, the results are expected to be more precise and to agree more closely with the ground truth.
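The anchor-box placement in the region proposal step described above can be illustrated with a minimal sketch. It is not the project's implementation: the function name `generate_anchor_boxes`, the `stride` between feature-map cells, and the particular `scales` and `aspect_ratios` are all assumptions chosen for illustration, following the convention commonly used in region-proposal detectors.

```python
from itertools import product


def generate_anchor_boxes(feature_height, feature_width, stride, scales, aspect_ratios):
    """Place a fixed set of anchor boxes at every feature-map coordinate.

    Boxes are returned as (x_min, y_min, x_max, y_max) in input-landscape
    pixel coordinates. All parameter names are hypothetical.
    """
    anchors = []
    for row, col in product(range(feature_height), range(feature_width)):
        # Center of this feature-map cell projected back onto the input raster.
        center_x = (col + 0.5) * stride
        center_y = (row + 0.5) * stride
        for scale, ratio in product(scales, aspect_ratios):
            # ratio is width / height; the box area stays equal to scale ** 2.
            width = scale * ratio ** 0.5
            height = scale / ratio ** 0.5
            anchors.append((
                center_x - width / 2,
                center_y - height / 2,
                center_x + width / 2,
                center_y + height / 2,
            ))
    return anchors


# Example: a 2 x 3 feature map, 2 scales x 3 aspect ratios = 6 anchors per cell.
anchors = generate_anchor_boxes(2, 3, stride=16, scales=(32, 64), aspect_ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 36
```

In this sketch the number of anchors per coordinate is fixed by the product of the scale and aspect-ratio lists; the region proposal network would then score and resize these initial guesses against the labeled ground truth.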
How closely the results match the ground truth is measured by a set of performance metrics, which determine the number of times this entire deep-learning pipeline will be iterated. The only constraint is that additional iterations require additional training data and time.
Objective 3: Design and implement a set of metrics to assess the success rate of the optimal sampling site prediction tool. To compute the performance of the developed algorithm, we will run the trained network on a set of evaluation landscape data. Each evaluation landscape has a ground truth, in which every optimal soil sampling spot is manually labeled with a bounding box by the Co-PI, a soil pedologist. The ground truth represents the best achievable performance, obtained through careful inspection by a domain expert. We will pass each landscape dataset through the trained deep-learning pipeline, compare the results with the ground truth, and store the following information for each evaluation landscape:
- MD: the total number of optimal soil sampling spots missed by the trained model for a given landscape compared to its ground truth.
- FD: the total number of incorrect detections made by the trained model for a given landscape compared to its ground truth. These incorrect detections occur when the trained model detects a desirable sampling spot that the ground truth does not consider optimal.
- DT: the total number of detections made by the trained model for the given input landscape. Note that not all detections contain optimal sampling spots. For example, FD consists of detections counted in DT that are not optimal sampling spots when compared to the ground truth.
- TD: the true detection number, computed as the difference between the total number of detections by the trained model for the given landscape (DT) and the total number of incorrect detections for that landscape (FD).
Finally, the accuracy is computed as TD / (TD + FD + MD). This is a simple set of metrics we will start with.
We will formulate and implement more complicated performance metrics if needed. With this set of metrics, we penalize the algorithm for every missed detection and every wrong detection. The accuracy will be 1.0 if the results of the sampling site detection algorithm exactly match the ground truth. With multiple input landscape datasets, the overall accuracy is the average accuracy over all the experimental landscapes. The accuracy defined here will be used as a performance metric to quantify the ability of the model to detect optimal soil sampling sites.
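As a worked illustration of the accuracy definition above, the following minimal sketch computes TD = DT - FD and TD / (TD + FD + MD) for one landscape and then averages over several landscapes. The function names and the example counts are hypothetical, and how detections are matched to ground-truth boxes is not specified in this report, so the counts are taken as given inputs. Returning 1.0 when a landscape has no detections and no missed spots is an assumption of this sketch.

```python
def landscape_accuracy(total_detections, false_detections, missed_detections):
    """Accuracy for one evaluation landscape: TD / (TD + FD + MD).

    total_detections  -> DT, every detection made by the trained model
    false_detections  -> FD, detections the ground truth does not consider optimal
    missed_detections -> MD, ground-truth spots the model failed to detect
    """
    if min(total_detections, false_detections, missed_detections) < 0:
        raise ValueError("counts must be non-negative")
    if false_detections > total_detections:
        raise ValueError("FD cannot exceed DT")
    true_detections = total_detections - false_detections  # TD = DT - FD
    denominator = true_detections + false_detections + missed_detections
    if denominator == 0:
        # Assumption: nothing to detect and nothing detected counts as an exact match.
        return 1.0
    return true_detections / denominator


def overall_accuracy(per_landscape_counts):
    """Average the per-landscape accuracy over all evaluation landscapes."""
    scores = [landscape_accuracy(dt, fd, md) for dt, fd, md in per_landscape_counts]
    return sum(scores) / len(scores)


# Hypothetical counts (DT, FD, MD) for two evaluation landscapes.
counts = [(10, 2, 1), (5, 0, 0)]
print(round(landscape_accuracy(10, 2, 1), 3))  # 0.727, i.e. 8 / 11
print(round(overall_accuracy(counts), 3))      # 0.864
```

The second landscape has no false or missed detections, so its accuracy is 1.0, which matches the statement that an exact match with the ground truth yields an accuracy of 1.0.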
Impacts
What was accomplished under these goals?
Objective 1: Establish a cyberinfrastructure of landscape data and soil sampling annotations for the training of deep convolutional neural networks (50% Accomplished). The first phase of the project focused on data collection and on establishing a cyberinfrastructure of landscape data and soil sampling annotations. The official project start date was 01/15/2022. We were not able to make field trips to collect soil sampling data from January to May 2022 because of winter weather in South Dakota. In Summer 2022, we arranged a number of field trips to the Edinger Brothers Partnership farm in Mount Vernon, SD, to collect soil sampling data from over 60 fields, most of which are at least 150 acres. We were also granted access to their soil sampling data collected from 2011 to date. In Fall 2022, we will focus on organizing, analyzing, and labeling the collected data to make it ready for training the machine-learning models developed in Objective 2.
Objective 2: Develop a deep-learning pipeline to learn, analyze, and refine landscape data for automated selection of soil sampling locations (20% Accomplished). We accomplished about 20% of this task. During the reporting period, we used a small soil sampling data set, since the proposed data collection had not yet been completed. We developed the feature extraction module, a neural network designed to extract the important characteristics of an optimal sampling zone that distinguish it from zones that are not desirable for soil sampling. The feature extraction process reduces the data size, since only zones with desirable characteristics are output from this module. Hence, instead of dealing directly with the original landscape data, the optimal sampling zone detection algorithm works with feature maps of reduced size, i.e., there is much less information to process, which reduces processing time and computational cost. Our algorithm achieves this data refinement while preserving the unique characteristics of each optimal sampling zone present in the input landscape. However, this module is only a rough refinement process. To avoid over-filtering, the artificial neural network's parameters will be tuned so that only coordinates that, with high certainty, lack desirable characteristics are removed from further processing. The remaining modules to be developed and tested in this task are region proposal, Region-of-Interest pooling, and inference, followed by testing of the entire machine-learning pipeline.
Objective 3: Design and implement a set of metrics to assess the success rate of the optimal sampling site prediction tool (0% Accomplished). We will start working on this research task after the first two objectives have been accomplished.
Publications
- Type: Journal Articles
  Status: Accepted
  Year Published: 2022
  Citation: Acharya, P., T. Burgers, K.D. Nguyen. 2022. AI-enabled droplet detection and tracking for agricultural spraying systems. Computers and Electronics in Agriculture. Accepted.
- Type: Journal Articles
  Status: Under Review
  Year Published: 2022
  Citation: Acharya, P., K.D. Nguyen. 2022. A deep-learning framework for spray pattern segmentation and estimation in agricultural spraying systems. Computers and Electronics in Agriculture.