Source: SEEDLLINKED LLC submitted to NRP
CROWDSOURCING VARIETY PERFORMANCE DATA TO BOOST PERFORMANCE, VARIETY ADOPTION AND FARMER EMPOWERMENT
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1015902
Grant No.
2018-33610-28279
Cumulative Award Amt.
$99,475.00
Proposal No.
2018-00206
Multistate No.
(N/A)
Project Start Date
Jul 1, 2018
Project End Date
Dec 31, 2019
Grant Year
2018
Program Code
[8.12]- Small and Mid-Size Farms
Recipient Organization
SEEDLLINKED LLC
1761 DEWEESE ST
FORT COLLINS,CO 80526
Performing Department
(N/A)
Non Technical Summary
In today's tight agricultural economy, one of the most important decisions a farmer makes each year is what type of seed to plant. Optimum yield, maturity, and disease resistance for a specific environment secure farm revenue through better production and decreased use of inputs. However, in a highly consolidated industry where the top 3 players control 60% of the seed market, data is tightly controlled and transparency is profoundly lacking. Seed options are decreasing, with varieties developed to maximize profit margins for the seed companies rather than performance for the farmer. The results are low adoption rates for new varieties, yield plateaus on the farm, a lack of options, price increases, and high-input agro-industrial ecosystems rather than diverse management- and prevention-based agricultural systems, such as organic agriculture. The global vegetable farming community, a $543 billion global market at the farmgate, will benefit from having granular data to make better variety decisions and so achieve more secure production.
The customers: At SeedLinked we identify an information gap between growing regional vegetable seed companies/plant breeders and seed users (farmers and gardeners), in addition to a lack of market access for regional seed companies due to consolidation. SeedLinked will focus its minimum viable product (MVP) on the vegetable seed market, as it is the most mature in online distribution. Our two main customer groups are seed companies/breeders and seed users. The second customer group, seed users, has two pools: vegetable farmers and gardeners. Due to the lack of available data in today's US vegetable seed market, worth $3.8 billion, seed users tend to base variety choice on personal experience, trusted recommendation, or a relationship with a local seed dealer.
This is especially true for small to medium-sized farms, which make up 85% of all farms in America today, often because they lack access to the expensive digital farming platforms mostly available to large-scale farming enterprises. In addition, the vegetable market has a large gardener customer base, smaller in market value than farmers but 1,500 times larger in number: 32 million households plant vegetables every year. This vast gardener customer base is a very large data opportunity. Gardener experience, and therefore data quality, is too often underestimated.
The value proposition: Following a citizen-science-based approach, SeedLinked proposes to combine an intuitive smartphone application with observations collected by seed users through triadic comparisons to generate valuable data that helps guide seed selection and choice on both ends. On one end, breeders will have access to variety performance from hundreds to thousands of locations (contrast this with an average of a dozen data locations for most current breeding programs). On the other end, seed users will have access to granular data from their own bioregion, such as their USDA hardiness zone, indicating what seed will work best for their specific regional needs. SeedLinked will provide a tool to improve farm management by providing a seed recommendation platform; generate empowerment by directly engaging the farmer in variety testing and breeding; and increase seed knowledge. SeedLinked also demonstrates a new seed enterprise model that brings new access to markets and increased opportunities for independent breeders and emerging seed companies.
Innovation: By crowdsourcing large amounts of data from the end user, we are proposing a radically different approach to seed selection.
SeedLinked is developing a platform that can take weather data, soil data, and crowdsourced observations from participants following a triadic model (Bradley-Terry) and, using ranking models (Plackett-Luce ranking), generate recommendations for what to plant, and ultimately provide a valuable tool to help guide important decision making in the plant breeding process. Our major challenge is to demonstrate that crowdsourcing variety performance data through triadic comparisons made by farmers and gardeners using our application will reveal statistically significant differences among varieties. To demonstrate the feasibility of our phase 1 hypothesis, the ontology, the data collection processes and their accuracy, and the information design will be tested for maximum variety discriminative ability of the platform, in parallel with conventional trialing. If phase 1 data indicate significant accuracy, this approach would be validated and a major breakthrough in variety selection, testing, and recommendation would result.
Animal Health Component
80%
Research Effort Categories
Basic
(N/A)
Applied
80%
Developmental
20%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
205                    7310                              1081                      100%
Goals / Objectives
Our major challenge is to demonstrate that crowdsourced variety performance data, aggregated from farmers and gardeners via a web and mobile platform, can predict performance as well as, or better than, conventional trialing.
Four tasks will be performed to fulfill this goal:
1) Successfully combine Plackett-Luce ranking and triadic comparison models with internal software in an intuitive platform using an iterative lean method.
2) Design a proper trait ontology, data architecture, and data collection processes for maximum observation accuracy and ease. Data accuracy (CV, P value, ranking) and adoption (% of active participants) will be the measures of success.
3) Successfully test 12 varieties of peppers and carrots using a large pool of farmers (>50) and gardeners (>100) with the crowdsourced triadic method, in parallel with a randomized complete block design (RCBD) trial system, with a positive correlation above 0.6 for at least 3 traits.
4) Assess the variety discriminative ability of the platform by evaluating reliability and validity.
These 4 tasks are designed to answer the following questions: What data architecture and trait ontology are needed in the back end of a crowdsourcing platform? What design, architecture, and information design are needed to generate meaningful information for the user? Can users effectively submit accurate varietal data to the platform? How strong is the discriminative ability of triadic and Plackett-Luce ranking models? How many users are needed to have significant discriminative ability?
Tasks 1, 2, and 4 will be done at SeedLinked with advice from Dr. Dawson. Task 3 will be performed on UW research stations and Dr. Dawson's farm networks.
The project will run from June 2018 to November 2019, an 18-month period, in order to have a full growing season (early April to the end of October) to validate the approach.
Critical milestones to bring the product to market:
  • Lean data structure: using digital means to collect a minimal set of indicators at a frequent rate that allows monitoring of what is going on.
  • Crowdsourcing variety data: a front end and a back-end algorithm that can centralize and analyze data.
  • Big data: a method to aggregate different sources of data, testing the Plackett-Luce ranking model to consolidate many sources of data and define a performance ranking.
  • Information design: making data available in formats that allow users to derive insights, inform decisions, and boost adoption.
Project Methods
SeedLinked plans to apply a citizen science approach first proposed by Van Etten (2011). The approach is based on the Plackett-Luce ranking model to consolidate many sources of data and define a performance ranking. User self-reviews, standard variety trials, and a tricot approach (described below) would be merged into one recommendation or ranking tied to environmental data. Simko & Pechenick (2010) and Lobell et al. (2011) present methods to aggregate different sources of crop trial data that show great potential. Recently, some preliminary work has been conducted in developing countries by Bioversity, led by Dr. Van Etten. In Nicaragua and India, two projects using crowdsourcing principles to collect variety performance data on wheat and common bean demonstrated that farmer evaluations converge on those of seed professionals taken on the same plots (Van Etten, 2017). This same project demonstrated higher adoption rates of newly released varieties than those common to the traditional plant breeding models dominating today.
The method that will be tested in phase 1 is the Triadic Comparisons of Technologies model, which we refer to here as the tricot approach. Tricot can be used to assess a range of agricultural technologies. Triadic comparisons are a proven method in ethnobiological research (Martin, 2004). When a Bradley-Terry (BT) model (Bradley and Terry 1952) is fitted, a large sample of data may lead to a correct result even when individual data entries vary strongly (low reliability), as long as an unbiased aggregate measure can be calculated from the data (high validity) (Van Etten, 2011). Such models are not yet common in agricultural applications, but have been demonstrated in Central America and India (Van Etten, 2017).
SeedLinked plans to have farmers observe a set of three varieties and evaluate different aspects of their performance at different points in time, using a simple ranking format called triadic comparisons.
The total number of varieties in the trial can be much greater than three, and would be partitioned out to farmers in groups of three following an incomplete block experimental design. Each farmer then communicates their observations on the three varieties they are testing on their own farm via their mobile device (fig 2). The farmer-generated observation data is analyzed using statistical methods for ranking data. Given an adequate number of partial rankings, a preference scale for all varieties included in the experiment can be constructed by fitting a Bradley-Terry model.
The obtained values are relative indices called abilities. Even though abilities are relative values and may be negative, they have a linear relationship with the underlying yields (Coe, 2002). When more varieties are analyzed, the Bradley-Terry model estimates the abilities from multiple comparisons between varieties using a logistic regression model and provides additional statistics, such as the standard errors and confidence intervals of the estimated abilities.
Assessing accuracy: Accuracy has 2 components: 1) Reliability: results are consistent when repeated; 2) Validity: results are in line with a larger group of results. In crowdsourcing models, and certainly for the wisdom of the crowd concept, validity is the more important of the two components. In the triadic model, when more observers are engaged, the frequency of each pairwise combination of 2 varieties increases, leading to more accuracy. Accuracy also increases when the number of varieties tested increases, provided the number of testers increases as well. However, if more varieties are tested while the number of testers stays constant, the frequency of each pairwise comparison decreases, and so does the accuracy. To assess the variety discriminative ability of our model, it will be key to evaluate the number of users needed, as well as the number of varieties that can be tested.
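As an illustration of the ranking step described above, the following is a minimal Python sketch, not the project's implementation (the planned analysis uses R packages). It assumes each participant reports only the best and the worst of three varieties, expands that answer into three pairwise outcomes, and fits Bradley-Terry abilities with the standard minorization-maximization (MM) update. The function names and the toy responses are hypothetical.

```python
import math
from collections import defaultdict

def triad_to_pairs(triad, best, worst):
    """Expand one best/worst answer on three varieties into (winner, loser) pairs."""
    middle = next(v for v in triad if v not in (best, worst))
    return [(best, middle), (best, worst), (middle, worst)]

def fit_bradley_terry(pairs, iterations=1000, tol=1e-10):
    """Return centred log-abilities estimated with the MM algorithm."""
    wins = defaultdict(float)   # number of wins per variety
    n = defaultdict(float)      # number of comparisons per unordered pair
    items = set()
    for winner, loser in pairs:
        wins[winner] += 1
        n[frozenset((winner, loser))] += 1
        items.update((winner, loser))
    items = sorted(items)
    # Precondition: a variety that never wins has no finite ability estimate.
    if any(wins[i] == 0 for i in items):
        raise ValueError("every variety needs at least one win")
    p = {i: 1.0 for i in items}
    for _ in range(iterations):
        new_p = {}
        for i in items:
            denom = sum(n[frozenset((i, j))] / (p[i] + p[j])
                        for j in items if j != i)
            new_p[i] = wins[i] / denom
        scale = len(items) / sum(new_p.values())   # fix the arbitrary scale
        new_p = {i: v * scale for i, v in new_p.items()}
        delta = max(abs(new_p[i] - p[i]) for i in items)
        p = new_p
        if delta < tol:
            break
    logs = {i: math.log(v) for i, v in p.items()}
    mean = sum(logs.values()) / len(logs)
    return {i: logs[i] - mean for i in items}

# Toy data: (varieties received, reported best, reported worst).
responses = [
    (("A", "B", "C"), "A", "C"),
    (("A", "B", "D"), "A", "D"),
    (("B", "C", "D"), "B", "D"),
    (("A", "C", "D"), "C", "D"),
    (("A", "B", "C"), "B", "C"),
    (("B", "C", "D"), "D", "C"),
]
pairs = [pair for triad, best, worst in responses
         for pair in triad_to_pairs(triad, best, worst)]
abilities = fit_bradley_terry(pairs)
for variety, ability in sorted(abilities.items(), key=lambda kv: -kv[1]):
    print(variety, round(ability, 3))
```

The abilities are centred on zero, so negative values simply indicate below-average varieties. Standard errors and confidence intervals, which the text mentions, are not computed in this sketch.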
Proper traits and a proper trait ontology will be key to the design, as certain observations and traits appear to have different discriminative ability.
For statistical analysis, we will use the R software (R Core Team 2014) with the following packages and functions: Kendall's tau coefficient with the Kendall function of package Kendall (McLeod 2011); MBT models with the glm function; paired comparison matrices with the patt.design function of package prefmod (Hatzinger & Dittrich 2012); extracted p-values with the stars.pval function of package gtools (Warnes et al. 2014); and Kendall's W with the kendall function of package irr (Gamer, Lemon & Puspendra 2012). For the discriminative ability, we will use TM models with the thurstone function of package eba (Wickelmaier & Schmid 2004) and BT models with the functions countsToBinomial and BTm of package BradleyTerry2 (Turner & Firth 2012).
As soon as the crop season is complete and all data are aggregated, we will test the Plackett-Luce model in more depth and validate its ranking against conventional trialing conducted in parallel. We will use Kendall's tau coefficient, standard deviation, and Kendall's W. We will assess the accuracy and feasibility of the method. Then a pilot test of the most important functionality will be conducted. The discriminative ability will be assessed following the method of Steinke et al.
Finally, we will assess the possible integration/acceptance of a smartphone in daily farming practice, working closely with collaborating farmers and gardeners. Currently, the vast majority of people use apps in their daily life to capture data, from sport or exercise tracking to weather or finance. Most farmers participating in the Seed to Kitchen trials have smartphones and use them to coordinate marketing, schedules, field work and other daily tasks on their farm. Several have asked for an application to enter data for the Seed to Kitchen trials because they always have their phone with them but not their paper datasheets.
Platforms and apps require fast adoption for commercial success. Through ubiquitous computing, large amounts of data can be collected. However, it is key that these large data sets be converted into meaningful insights through proper information design. New data visualization formats that are user friendly and give value-added information to users will be critical. Following the lean method, we will iterate on front-end wireframe visualizations in direct collaboration with end users to refine those functionalities.
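The two agreement statistics named in the methods, Kendall's tau and Kendall's W, can be sketched in a few lines of Python. This is an illustrative sketch only: it computes tau-a and W without tie corrections, whereas the R packages cited above handle ties, and the function names and sample rankings are hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a between two equally long lists of scores (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

def kendall_w(rank_matrix):
    """Kendall's W for m observers who each rank the same n items 1..n (no ties)."""
    m = len(rank_matrix)
    n = len(rank_matrix[0])
    totals = [sum(row[i] for row in rank_matrix) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical rankings of four varieties: conventional trial vs crowdsourced.
conventional = [1, 2, 3, 4]
crowdsourced = [1, 3, 2, 4]
print("tau:", round(kendall_tau(conventional, crowdsourced), 3))

# Hypothetical rankings of three varieties by three observers.
observers = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]
print("W:", round(kendall_w(observers), 3))
```

Tau compares two rankings of the same varieties (for example crowdsourced against conventional), while W summarizes how strongly a whole group of observers agrees with one another.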

Progress 07/01/18 to 12/31/19

Outputs
Target Audience: The SeedLinked platform serves 2 audiences, and so has 2 distinct profiles: vegetable growers (small farmers and gardeners) and trial managers (seed companies, universities, non-profits, farmer organizations). In 2019, we planned to reach 50 farmers and 100 gardeners. We ended up with about 6 times as many farmers and 5 times as many gardeners as planned:
  • Number of farmers: 319
  • Number of gardeners: 535
  • Population group/age: 30% of users were in the 35-44 age group and 26% were 65 or older; 62% of participants were female (Mailchimp analytics, 2019).
We had 28 trial managers (TMs) using the platform across 57 trials, when we had planned for 2 trials with only UW Madison as a trial manager. Seed banks, universities, non-profit organizations and seed companies used the platform as a beta test.
Changes/Problems:
  • Weather is the number one unknown in farming and trialing. 2019 saw a very cold spring that was detrimental to Cucurbitaceae such as cucumber. We had more crop failure and data variability in our cucumber trial as a result.
  • Five times more people joined the program than planned, which created an overwhelming flow of feedback and software issues early on. However, it greatly helped the iteration process toward a more intuitive platform and the prioritization of feature development.
  • Four times more crops were tested than initially planned. This made it harder to define a proper ontology. However, it gave us the opportunity to test the concept and measure variability on many more crops, locations and users.
What opportunities for training and professional development has the project provided?
For growers using the platform, we focused on 3 initiatives:
  • 2 webinars conducted in April 2019 in two languages, English and French; more than 50 growers attended.
  • A grower factsheet created to help use the platform, sent to more than 400 participants.
  • 8 tutorial videos created to help use the platform; some tutorials have more than 400 views.
For organizations managing trials, we held 27 one-on-one demo sessions via video conference or in person. In addition to training for users of the platform, three graduate students at UW Madison have been significantly involved in the SeedLinked trials and platform. This has provided them with training on participatory research methods and opportunities to build their professional networks.
How have the results been disseminated to communities of interest?
The SeedLinked platform is itself a tool to disseminate seed knowledge by sharing crowdsourced knowledge. The information design and functionalities were built to share trial results in real time. Support videos were made to help users. In addition, we went to 5 conferences and were covered by 2 media outlets:
  • Garden Talk, WPR, Feb 2019, with more than 500K listeners
  • BC Seed Gathering conference, Vancouver, Nov 6-8: 2 workshops/presentations
  • Stone Barns Young Farmers Conference (19YFC), Dec 4-6
  • Edible Madison quarterly print edition, Dec 2019, with more than 25K readers
  • NEVF conference, Dec 10: The World of Seed Farmers to Farmers session (N>50)
  • Cornell University Plant Breeding Seminar Series, Ithaca, NY, October 28
  • Great Lakes Fruit and Vegetable Expo, Grand Rapids, Michigan, December 10
7 planned presentations and conferences in 2020:
  • Organic Seed Alliance conference 2020: half-day trialing workshop (N=50) and one-hour presentation during the main conference
  • Organic Vegetable Production Conference, Madison, WI, Feb 1, 2020: variety trialing roundtable
  • Garden Expo, Madison, WI, Feb 7, 2020: variety trialing for gardeners
  • Washington State University Crop and Soil Science Seminar Series, Feb 10, 2020
  • Michigan State University Horticulture Seminar Series, Feb 20, 2020
  • Clemson University Organic Plant Breeding Institute, May 4, 2020
  • UC Davis Plant Sciences Symposium, May 6, 2020
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported

Impacts
What was accomplished under these goals?
Software development
  • Trait ontology: we created a common ontology with 11 optional traits, 4 physiological dates, one binary question and an open comment box.
  • An API-driven web application using AngularJS, Node.js and MySQL was built to create trials, invite users to trials, manage planting lists, and review and analyze the data generated. We designed and built a MySQL database.
  • 2 profiles were created in the platform: trial managers, who create trials, and participants, who review.
  • 2 apps were built to crowdsource data, for Android and iOS, published to the Play Store and the App Store. We built an API to link the PC platform to the mobile phone app. The app made review and picture collection ubiquitous and easy. Each picture is linked to a variety, a location and a time.
Model integration
We integrated Plackett-Luce ranking via API with the ClimMob portal and the PlackettLuce R package. 9 triadic trials across 9 vegetable crops were run with it in 2019 through 2 organizations: UW Madison and Seed Savers Exchange. Because existing testing groups were using a 1 to 5 scale for scoring, we also created a separate trial design within SeedLinked to accommodate this type of scoring. We were then able to analyze user preference between the two review designs: best/worst out of 3, or 1-5 scale scoring on an unrestricted number of varieties. We also transformed the 1-5 scores into triadic rankings to combine data sets. As flavor is becoming one of the most important traits, we also built a QR taste test option into the platform.
Design
We followed an iterative process with wireframes to design the platform. We interviewed 85 users (in person and via phone) during the season to get feedback and improve the UI/UX.
Adoption and participation
17,380 single reviews were submitted to the platform across 37 states/regions in the USA and Canada by 864 growers.
The participation rate (people who reviewed and completed a trial vs. the total number of people who initially began the trial) ranges from 48% to 70% depending on the network/organization of growers. On average, farmers have a 10% higher participation rate than gardeners. We also saw very large differences across crops. Some crops appeared to be more difficult to grow and review. Tomato and lettuce were the most successful, with 72% and 81%; carrot and cucumber were the hardest, with 52%. In comparison, in 2018, using paper data sheets, the response rate was much lower at the end of the season, and increased to 59% only with a lot of additional work by the trial management team calling farmers to get responses, which delayed reporting of trial results and increased cost. This meant that results were available only after the point where most farmers had purchased seed for the following season. With SeedLinked, data is shared in real time when a trial is completed, allowing growers to make informed decisions before purchasing seed.
Appearance was the trait most frequently rated by gardeners, at a response rate of 75%, while yield and harvest date were the least frequently rated, at 54%. Vigor, disease, flavor, marketability and earliness averaged 64%. The trend was the same for farmers. Disease resistance appeared harder for gardeners to rate, as completion rates dropped.
Testimonials:
Erin, farmer: "if you are not already using the SeedLinked app I highly recommend it. I have an old apple phone and downloaded the app yesterday and it is so user friendly, intuitive, and pretty."
Rob: "SeedLinked has drastically improved the way trial results are recorded. It's very easy"
Mark: "I liked using the app for uploading photos in the field. It made it really easy to complete the trial without writing anything down."
By crop: We determined the discriminative ability, defined as the number of varieties that can be statistically distinguished from the best variety (p < 0.05), using PlackettLuce coefficients; the closer to one, the better. Basil had the highest discriminative ability (0.75), followed by snap bean, lettuce and Corno di Toro pepper (0.6 to 0.67), then carrot and tomato (0.46 and 0.4 respectively). Finally, snow pea, Asian cucumber and bell pepper had the lowest (0.31, 0.25, and 0.2). Cucumber was strongly affected by the cold weather, which created a lot of variability. Different levels of genetic variability within a trial also affected the ability to discriminate among varieties within the trials.
By trait: Looking at overall trait precision across basil, carrot, Corno di Toro pepper, lettuce, bell pepper and tomato, disease resistance and yield had the lowest agreement among observers. This result is in line with Steinke et al. (2017). Vigor, appearance and earliness were the easiest to review, similar to the results of Steinke et al. for plant architecture (possibly equivalent to appearance) and vigor. Surprisingly, flavor shows strong agreement among reviewers and across crops. This confirms the results from the Seed to Kitchen Collaborative that, contrary to popular belief, flavor is actually a trait that shows consistency in chef evaluations. Finally, the overall score (combining all trait rankings together) had the highest agreement across all traits.
Correlation: Only Asian cucumber, carrot and Corno di Toro pepper had both a triadic collaborative trial and randomized complete block design (RCBD) trials at 2 locations. The 2 RCBD trials were in Wisconsin, one near Madison in hardiness zone 5a and one in northern WI in hardiness zone 3b. The Corno di Toro pepper triadic trial covered 9 states and 9 hardiness zones, the carrot triadic trial 10 states and 7 hardiness zones, and the Asian cucumber triadic trial 11 states and 7 hardiness zones. However, across the 3 crops, 65% to 69% of reviewers were in the same hardiness zones as the RCBD trials.
Corno di Toro pepper: N = 46 for the triadic trial. The correlation (Pearson) between RCBD yield (marketable weight in grams) and the triadic PL coefficient for yield is: R-squared = 0.35 with P value = 0.21. However, if we combine triadic reviews (yield, vigor, disease, appearance and earliness): R-squared = 0.57 with P value = 0.085. The Kendall correlation factor (ranking correlation) is 0.46.
Equation: Avg. RCBD Yield = 291.914 * Avg. PL Coefficient + 626.012
So a 20% increase in the PL coefficient or in the 1 to 5 score (overall traits) corresponds to a 21% increase in quantitative yield (grams) for Corno di Toro pepper. When the score increases from 3 to 3.5, yield increases by 133 grams.
Carrot: N = 38. The correlation (Pearson) between RCBD yield (marketable weight in grams per linear meter) and the triadic PL coefficient for yield is: R-squared = 0.42 with P value = 0.24. However, if we combine triadic reviews (yield, vigor, disease, appearance and earliness): R-squared = 0.74 with P value = 0.06. The Kendall correlation factor (ranking correlation) is 0.40.
Equation: Avg. RCBD Yield = 1656.82 * Avg. PL Coefficient + 5615.21
When the carrot overall score increases from 3 to 3.5, yield increases by 1 kg per linear meter.
We see stronger agreement with the quantitative yield from replicated trials in 2 locations when multiple visual agronomic reviews are combined. Perfect agreement is impossible, as the triadic trials covered 30 to 35 sites while the replicated trials covered only 2 sites. These results demonstrate how powerful crowdsourced visual rating is for discriminating between varieties and helping farmers and seed stakeholders make better decisions.
Conclusion: Many more people than expected participated. 8 out of 9 crops showed strong discriminative ability with N from 15 to 46. All traits showed some significant differences, with appearance, vigor and overall score showing the strongest agreement across observers. Some traits, like yield, show strong agreement in one crop and weak agreement in another.
Visual rating can therefore vary from crop to crop. We saw very strong ranking and performance agreement between RCBD and triadic trials for carrot and pepper. The wisdom of the crowd principle, applied via triadic comparison and scoring, is thus demonstrated, opening tremendous potential for a platform like SeedLinked.
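As a worked illustration of the two linear fits reported for Corno di Toro pepper and carrot, the following Python sketch evaluates them for a given PL coefficient. The slopes and intercepts are the ones stated in the report; the dictionary and function names are hypothetical. Inputs are PL coefficients only, because the report does not state how 1 to 5 scores map onto coefficients, so the outputs are not meant to reproduce the per-score figures quoted in the text.

```python
# Reported fits: Avg. RCBD Yield = slope * Avg. PL Coefficient + intercept.
# Pepper yield is in grams; carrot yield is in grams per linear meter.
REPORTED_FITS = {
    "corno_di_toro_pepper": (291.914, 626.012),
    "carrot": (1656.82, 5615.21),
}

def predicted_yield(crop, pl_coefficient):
    """Yield predicted by the reported linear fit for one crop."""
    slope, intercept = REPORTED_FITS[crop]
    return slope * pl_coefficient + intercept

def yield_change(crop, delta_coefficient):
    """Change in predicted yield for a given change in PL coefficient."""
    slope, _ = REPORTED_FITS[crop]
    return slope * delta_coefficient

for crop in REPORTED_FITS:
    baseline = predicted_yield(crop, 0.0)   # yield at an average coefficient of 0
    step = yield_change(crop, 0.1)          # effect of a +0.1 change in coefficient
    print(crop, round(baseline, 1), round(step, 1))
```

With R-squared values of 0.57 and 0.74 and P values above 0.05, these lines are best read as rough trends rather than precise predictors.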

Publications

  • Type: Websites Status: Published Year Published: 2019 Citation: www.seedlinked.com