Recipient Organization
VISIMO LLC
400 MAIN ST
CORAOPOLIS, PA 15108-1629
Performing Department
(N/A)
Non Technical Summary
VISIMO proposes to develop a risk management platform, referred to as Demeter, that leverages a custom Machine Learning (ML) model to enhance the efficiency, profitability, and safety of small and mid-sized farms. Demeter will advance the state of farm safety by identifying risks, suggesting mitigating actions, and providing an economic cost-benefit analysis for each suggested mitigation. The proposed application connects to agricultural-related manufacturing technology by improving the safety of agricultural practices, increasing productivity, improving operational management, and training farmers and producers.
Farming is among the most dangerous occupations in the United States, with an annual death rate of 20 per 100,000 persons (U.S. Bureau of Labor Statistics, 2021). Nationally, workers in the Agriculture, Forestry, and Fishing (AFF) industries are up to 33 times more likely to die on the job than workers in other industries (U.S. Bureau of Labor Statistics, 2021). The average annual cost of occupational injuries in agriculture is $8.3B, including both medical costs and lost productivity, and an estimated 167 agricultural workers per day lose productivity due to occupational injuries (Agricultural Safety & Health Council of America [ASHCA], 2015). Per-year personal injury cost rates in farming are 30% higher than the national average across all industries (Leigh et al., 2001).
Children under the age of 16 are especially affected by agricultural hazards. Farms, ranches, and other agricultural operations are often both worksites and homes (Reames, L., Jan 31, 2023 VISIMO interview). Of the 731,000 youths working in agriculture, 65% work on a family farm (ASHCA, 2015), and a total of 893,000 youths lived on U.S. farms (NIOSH, 2016). In agriculture, a child fatality occurs approximately every three days (NCCRAHS, 2020; Goldcamp et al., 2004; Hendricks & Hard, 2014), and 38 children are injured on farms daily (ASHCA, 2015).
Young workers are 7.8 times more likely to be fatally injured in agriculture than in all other industries combined (14.57 vs. 1.87 fatalities per 100,000 full-time equivalent (FTE) workers) (NIOSH, 2019).
Despite high injury rates in agriculture, no comprehensive dataset for agricultural hazard analysis exists, and existing datasets are limited in their composition. AgInjuryNews (Weichelt & Gorucu, 2018), which tracks agricultural injuries by compiling news articles, is one of the most comprehensive existing data sources, but it tracks only a small set of binary variables that do not fully capture agricultural risk.
VISIMO proposes the development of a customizable decision-support tool, Demeter, which will allow producers to assess and mitigate risk in real time, increasing the safety, efficiency, and productivity of small and mid-size farms. The primary outcome of the Phase II effort will be a prototype of the Demeter application. Demeter will accurately assess risk and provide mitigation suggestions, each with a cost-benefit analysis. As development progresses, VISIMO will refine the Graph Neural Network (GNN) and the arithmetic risk formula developed in Phase I to improve risk mitigation. Over the period of performance, VISIMO will collect between 5,000 and 7,000 data entries on small and mid-size beef and dairy farms.
In addition to the prototype, Phase II will produce a dataset of agricultural risks far more detailed and voluminous than any existing dataset. Current datasets rely heavily on limited information identified after a safety incident. In many cases, not all variables are known because the data source (e.g., a news article) does not include this information.
By creating this new dataset, VISIMO will enable new research and education.
VISIMO's primary customers include extension agencies, cooperatives, insurers, banks and credit agencies, professional associations, and educational organizations that seek to address occupational safety issues affecting U.S. agriculture.
Animal Health Component
60%
Research Effort Categories
Basic
20%
Applied
60%
Developmental
20%
Goals / Objectives
VISIMO proposes the development of a customizable decision-support tool, Demeter, which will allow producers to assess and mitigate risk in real time, increasing the safety, efficiency, and productivity of small and mid-size farms. The primary outcome of the Phase II effort will be a prototype of the Demeter application. Demeter will accurately assess risk and provide mitigation suggestions, each with a cost-benefit analysis. As development progresses, VISIMO will refine the Graph Neural Network (GNN) and the arithmetic risk formula to improve risk mitigation. Over the period of performance, VISIMO will collect between 5,000 and 7,000 data entries on small and mid-size beef and dairy farms.
Phase II Technical Objectives:
Gather Requirements: Formally document R&D specifications for the work, identifying necessary features. Establish metrics and benchmarks for future validation. Recruit testers and end users.
Enhance PWA and Backend: Gather preferences from potential end users and design a user-friendly User Interface (UI) for the Progressive Web Application (PWA). Test functionality. Iterate on the online backend using Django with GraphQL, and use React JS for the frontend.
Develop NLP and Object Detection Tools: Add features that incorporate user-uploaded images and written hazard descriptions, using Natural Language Processing (NLP) to process the text. These features will be established early to support ongoing data collection. Models will be trained on real user data to support NLP and Computer Vision (CV).
Conduct Alpha Testing: Identify end users and deliver an Alpha version of the PWA to testers; assist with installation and use. Collect user feedback, focusing on functionality issues and support, continually gathering data and feedback to iteratively improve components of Demeter. Compile data points collected in Alpha testing for use in updating the synthetic data and training the GNN.
Conduct Beta Testing: Deliver a Beta version of the PWA to testers and assist with installation, use, and support. Collect feedback from users, focusing on usability concerns. Analyze usability concerns and prioritize enhancements, gathering feedback to iteratively improve components of the application. Continue compiling data points collected during Beta testing, updating the synthetic data for GNN training.
Perform Iterative Improvements: Release version 1.0 of the PWA to a general audience, allowing users to provide feedback. Review Phase I assumptions in synthetic data generation against incoming real data and make corrections. Use the data to continue improving the existing ML components of the application.
Proposed Phase II Success Metrics:
Process Based:
Triplet Loss: Measures how well the embedding space captures similarities and differences between hazards.
Data Quantity: Number of data entries collected.
Application Performance: Metrics tracking speed, bug frequency, and bug prominence.
Usage Statistics: Quantity and demographics of users.
Outcome Based:
Precision: Fraction of predicted accidents that are actual accidents.
Recall: Fraction of actual accidents that are predicted as accidents.
User Satisfaction: Survey results about user experience.
MAE: Mean absolute error between the arithmetic formula's risk scores and Subject Matter Expert (SME) risk scores; lower values indicate closer agreement.
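Triplet loss, listed above as a process-based metric, is commonly defined as max(0, d(anchor, positive) - d(anchor, negative) + margin). The sketch below assumes Euclidean distance and a margin of 0.2; the source does not state which distance or margin Demeter uses, and the hazard embeddings are hypothetical. The loss is zero when an anchor hazard's embedding is closer to a similar hazard (the positive) than to a dissimilar one (the negative) by at least the margin.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Standard triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    Assumes Euclidean distance; the margin value is illustrative.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings of three hazard observations.
fence_damaged = [0.9, 0.1]   # anchor
fence_missing = [0.8, 0.2]   # similar hazard (positive)
signs_missing = [0.1, 0.9]   # dissimilar hazard (negative)

# Well-separated embedding: the similar hazard is much closer, so the loss is zero.
print(triplet_loss(fence_damaged, fence_missing, signs_missing))  # 0.0
# Swapping positive and negative produces a large, positive loss.
print(triplet_loss(fence_damaged, signs_missing, fence_missing))
```

A lower average triplet loss over many such triples indicates an embedding space that places similar hazards near each other and dissimilar hazards apart.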
Project Methods
The R&D phase will be divided into three steps. Step I is the creation of a mobile Progressive Web Application (PWA). The PWA will be built using ReactJS for the frontend and Django, a Python web development framework, for the backend.
The goal of Step II is to gather as much ground-truth data as possible, and VISIMO will encourage end users to submit records daily. A single data record is a hierarchy consisting of: (1) an activity, (2) one or more categories, and (3) one or more hazard observations for each category. A hazard is an object, structure, or condition that might negatively affect safety. As an example of a full observation, a common activity on dairy farms is scraping and cleaning manure. Categories under this activity include manure storage ponds and above-ground manure storage. Specific hazards within the manure storage pond category include the condition of the fence surrounding the pond and whether appropriate warning signs have been posted around it. Based on user feedback from VISIMO's construction Job Safety Analysis (JSA) tool, producers will need to spend less than three minutes entering variables before a task to receive mitigation suggestions, each with an accompanying cost-benefit analysis. After a period of work, users will record which mitigations they acted on and whether a safety incident occurred.
VISIMO expects Demeter's ML components to require around 5,000 to 7,000 individual data entries for training. Because Demeter's scope is currently limited to beef and dairy farms, it uses only 100 variables, which significantly reduces the number of data entries required for model training. With an estimated safety incident rate below 1%, collecting at least 5,000 data entries also helps ensure that records with injuries will be present in the data.
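The record hierarchy described in Step II (an activity, one or more categories, and one or more hazard observations per category) can be sketched as nested data classes. This is an illustrative model only: the class and field names, the string-valued hazard answers, and the "Access ladder secured" hazard are hypothetical, and the two post-work fields stand in for the follow-up users provide after a period of work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HazardObservation:
    """An object, structure, or condition that might negatively affect safety."""
    name: str
    answer: str  # hypothetical: the value picked from a drop-down menu


@dataclass
class Category:
    name: str
    hazards: List[HazardObservation] = field(default_factory=list)


@dataclass
class Record:
    """One data record: an activity, its categories, and their hazard observations."""
    activity: str
    categories: List[Category] = field(default_factory=list)
    # Filled in after the period of work (hypothetical field names).
    mitigations_acted_on: List[str] = field(default_factory=list)
    incident_occurred: Optional[bool] = None  # None until the user reports back

    def is_valid(self) -> bool:
        """At least one category, and at least one hazard in every category."""
        return bool(self.categories) and all(c.hazards for c in self.categories)

    def hazard_count(self) -> int:
        return sum(len(c.hazards) for c in self.categories)


record = Record(
    activity="Scraping and cleaning manure",
    categories=[
        Category("Manure storage pond", [
            HazardObservation("Fence condition", "damaged"),
            HazardObservation("Warning signs posted", "no"),
        ]),
        Category("Above-ground manure storage", [
            HazardObservation("Access ladder secured", "yes"),  # hypothetical hazard
        ]),
    ],
)
print(record.is_valid(), record.hazard_count())  # True 3
```

Because every hazard sits under a category and every category under an activity, the same structure maps directly onto a tree, which is the form a graph-based model consumes.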
VISIMO expects to collect 5,000 to 7,000 data entries within 18 months through its committed collection partners, Virginia Future Farmers of America (FFA) and the Virginia (VA) Farm Bureau. This collection window will span all four seasons, which will affect the collected data. For example, during the winter, pasture management activities are less frequent, and total agricultural injuries increase by 17% (Farm Injury Resource Center, 2018). In spring and fall, hazards related to milking cows will be more frequent, as some farms milk seasonally instead of year-round (NIWA, 2023). Understanding and accounting for these seasonal effects will help eliminate bias and improve Demeter's performance. While collecting data, VISIMO will establish a means of uploading image data to the database and of accepting written descriptions of on-site hazards, in addition to Demeter's standard drop-down menu options for risk analysis. These alternative forms of data will lay the foundation for the Computer Vision (CV) and Natural Language Processing (NLP) components discussed in Step III.
In Step III, VISIMO will incorporate ML into Demeter's risk estimation capabilities, using the data gathered in Step II to train its ML models. Performing risk analysis with ML will enable the tool to model the intricate, hidden relationships between hazard variables and to produce more accurate risk estimates. For ML modeling, VISIMO will implement a Graph Neural Network (GNN), because GNNs are designed to handle complex, nested data structures (i.e., graphs), such as the hierarchical tree structure described above that defines each observation. When evaluating an individual node in the observation structure, a traditional neural network cannot reference non-local information to make decisions (Zhou et al., 2020); that is, such networks cannot look at the rest of the tree while evaluating a specific item.
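The difference between local and non-local information can be illustrated with a toy propagation sketch over a hypothetical observation tree. This is not Demeter's GNN: it uses plain mean aggregation with no learned weights or nonlinearity, and the node names are assumptions. A model that scores each node in isolation sees only that node's own value; after k rounds of message passing, each node's value reflects every node within k hops.

```python
from typing import Dict, List, Tuple

# Hypothetical observation tree: activity -> categories -> hazards.
EDGES: List[Tuple[str, str]] = [
    ("activity", "pond"), ("activity", "tank"),
    ("pond", "fence"), ("pond", "signs"),
    ("tank", "ladder"),
]


def adjacency(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Undirected adjacency lists for the tree."""
    adj: Dict[str, List[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def message_pass(h: Dict[str, float], adj: Dict[str, List[str]], rounds: int) -> Dict[str, float]:
    """Each round, every node's value becomes the mean of itself and its neighbors.

    A toy stand-in for a GNN layer: no learned weights, no nonlinearity.
    """
    for _ in range(rounds):
        # The comprehension reads the previous round's values before h is rebound.
        h = {n: (h[n] + sum(h[m] for m in adj[n])) / (1 + len(adj[n])) for n in h}
    return h


adj = adjacency(EDGES)
x = {node: 0.0 for node in adj}
x["ladder"] = 1.0  # only the ladder hazard starts with a risk signal

# A node-by-node model looking only at "fence" sees 0.0. With message passing,
# the ladder's signal reaches "fence" (4 hops away) after 4 rounds.
for k in range(5):
    print(k, message_pass(x, adj, k)["fence"] > 0)
```

A real GNN layer replaces the plain mean with learned weight matrices and a nonlinearity, which is what lets it learn which neighboring hazards matter rather than averaging them uniformly.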
GNNs, on the other hand, make continual use of non-local information during both training and prediction, because they learn the relationships between different nodes of the tree and can identify complex, non-linear relationships among these variables.
Phase II will focus on obtaining real-world data to train and test the model built in Phase I, as well as on end user and purchaser testing to ensure the developed prototype meets the needs of the market. Demeter's Phase II use will create an unprecedented dataset within agriculture, providing researchers and practitioners with access to hundreds of variables and thousands of case studies. VISIMO has secured commitments from individual producers to allow data collection on their farms during Phase II, as well as commitments from VA FFA and VA Farm Bureau to collect data for Demeter. VA FFA will draw on its 300 teachers, who work with the state's 520 dairy and 23,000 beef farms, and VA Farm Bureau will collect data during its 150 annual farm visits for safety auditing services.
Precision measures how well the model avoids false positives, i.e., how likely a scenario that the model identifies as high-risk is to actually be high-risk. Recall measures how well the model avoids false negatives, i.e., the probability that a truly high-risk scenario will be flagged as high-risk by the model. Evaluated on the synthetic data, the GNN achieved a precision of 0.3521 and a recall of 0.8333. Because the synthetic data models the complex relationships found in real data, the GNN is likely to achieve similar results when trained on real data.
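The precision and recall definitions above can be computed from paired lists of true labels and predictions, as in this minimal sketch. The ten-observation example is hypothetical and unrelated to Demeter's reported results.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # high-risk, flagged
    fp = sum(1 for t, p in pairs if not t and p)    # not high-risk, flagged
    fn = sum(1 for t, p in pairs if t and not p)    # high-risk, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for ten observations (True = high-risk).
y_true = [True, True, True, False, False, False, False, False, False, False]
y_pred = [True, True, False, True, True, True, False, False, False, False]

p, r = precision_recall(y_true, y_pred)
print(round(p, 4), round(r, 4))  # 0.4 0.6667
```

Flagging additional scenarios as high-risk can add true or false positives but never false negatives, which is why a model can reach high recall while its precision remains low.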
Continually training the model on a combination of the synthetic data created by subject matter experts (SMEs) and a larger volume of real data from VISIMO's committed Phase II partners increases the likelihood of higher performance.
The second method for evaluating the GNN compares its performance to that of the arithmetic formula, using the synthetic data as an evaluation set. VISIMO developed this arithmetic formula (the third result of Phase I) to provide useful risk estimates to users before sufficient data has been collected to train the ML models. Because the formula is based on SME estimates and uses straightforward calculations, it can provide risk estimates in the early stages of Demeter's deployment, when comparatively little data has been collected.
Although agriculture is riskier than other industries, injuries are relatively rare, with only 25 to 50 injuries expected in a dataset of 100,000 records. SME input therefore enables risk estimation in the initial phases of data collection, before incidents have been recorded. The SMEs assigned risk values on the same scale as the arithmetic formula, and VISIMO computed the mean absolute error (MAE) between the arithmetic risk scores and the SME risk scores. The MAE indicates how closely the formula agrees with SME scoring. The arithmetic formula had an MAE of 0.2110 on a scale of 0 to 1, suggesting it was effective in estimating SME scoring. After validating that the arithmetic formula could approximate SME scoring, VISIMO used it as an additional means of evaluating the GNN. Because the formula outputs a value between 0 and 1, whereas the GNN outputs a binary prediction, the formula scores were converted to binary predictions: any observation with a score greater than 0.39 was considered high-risk, while any observation with a score below 0.39 was considered not high-risk (the threshold of 0.39 was determined to be optimal via simple calculations).
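The MAE comparison and the 0.39 threshold conversion described above can be sketched as follows. The scores are hypothetical, and treating a score exactly equal to 0.39 as not high-risk is an assumption, since the text does not specify how ties are handled.

```python
from typing import List, Sequence


def mean_absolute_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Average absolute difference between paired scores."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def to_high_risk(scores: Sequence[float], threshold: float = 0.39) -> List[bool]:
    """Convert 0-1 formula scores to binary high-risk predictions.

    Scores above the threshold are high-risk. A score exactly equal to the
    threshold is treated as not high-risk (an assumption; ties are unspecified).
    """
    return [s > threshold for s in scores]


formula_scores = [0.10, 0.45, 0.80, 0.30]  # hypothetical formula outputs
sme_scores = [0.20, 0.30, 0.90, 0.35]      # hypothetical SME ratings, same 0-1 scale

print(round(mean_absolute_error(formula_scores, sme_scores), 4))  # 0.1
print(to_high_risk(formula_scores))  # [False, True, True, False]
```

The resulting binary predictions can then be compared with true labels using ordinary precision and recall, putting the formula and the GNN on the same footing.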
The formula was then applied to the synthetic data, achieving a recall of 0.7666 and a precision of 0.0653. The GNN surpassed these scores, with a recall of 0.8333 and a precision of 0.3521 (more than five times the formula's precision), suggesting that the GNN model is likely to perform more accurately than the SME-validated formula on real-world data.