Source: UNIV OF IDAHO submitted to NRP
FACT: FIELD CROP VARIETY DATA HUB FOR INFORMATION ACCESS AND KNOWLEDGE DISCOVERY USING HISTORICAL AND CURRENT TRIAL DATA
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1025692
Grant No.
2021-67021-34255
Cumulative Award Amt.
$493,800.00
Proposal No.
2020-08868
Multistate No.
(N/A)
Project Start Date
Mar 1, 2021
Project End Date
Feb 28, 2025
Grant Year
2021
Program Code
[A1541]- Food and Agriculture Cyberinformatics and Tools
Recipient Organization
UNIV OF IDAHO
875 PERIMETER DRIVE
MOSCOW,ID 83844-9803
Performing Department
Statistical Programs
Non Technical Summary
Over fifty U.S. states and territories previously had or currently have public variety testing programs to evaluate performance of new varieties of regionally impactful field and horticultural crops. These variety testing programs are renowned across the United States for providing unbiased information to farmers, agriculture professionals, extension educators and the agricultural research community about agronomic performance of varieties in local conditions. Crop variety performance data are an invaluable resource to multiple field crop industries and represent a considerable historical investment. While summaries of individual trials conducted by these programs are available online in annual reports for anyone with internet access, comparing or combining data across years or different programs is challenging. Furthermore, the full versions of data sets from previous field trials are at risk of permanent loss as program managers retire or relocate. Data organization is a critical part of fully utilizing and leveraging data sets for farmer decision making and meeting research aims. The goal of our project is to organize variety testing data from Idaho, Oregon and Washington public variety testing programs into a data hub to enable access by farmers, researchers and other industry professionals. This region hosts robust variety testing programs for wheat, barley, chickpeas, dry peas, lentils and canola, which are all crops commonly grown in this region and form a cornerstone of U.S. small grains production.
Our methods are to first collect and curate the data sets for inclusion in a database. We will correct errors, merge data sets, and establish crop-specific standardized templates for reporting variety trials. We will establish web-based tools to browse, query and download trial data both interactively and through an application programming interface.
We will also build statistical tools for single trial and multi-trial analysis, prediction of varietal performance in new locations, and quantification of environmental effects on crop performance. This data hub will protect data integrity, ensure high-quality historical crop performance data is widely available, and facilitate novel research that would benefit from access to such data. The establishment of this data hub, the first of its kind in the United States, can serve as an example to other variety testing programs. Ultimately, this variety testing data hub is intended to improve industry competitiveness by providing an advanced tool for aiding farmer decision making and enabling downstream research applications across a broad array of disciplines.
Animal Health Component
20%
Research Effort Categories
Basic
0%
Applied
20%
Developmental
80%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
903                    2410                              1080                      75%
903                    2410                              2090                      25%
Goals / Objectives
Our goal is for all Northwest public crop variety testing data to be organized in a well-managed database that enables data access and usage for the agricultural industry and researchers. Towards this goal we have three project objectives. Objective 1 is to establish data standards and a framework for the organization of Pacific Northwest variety testing data. Objective 2 is to assemble and build a database of variety testing data for Pacific Northwest annual field cropping systems. Objective 3 is to enable access to this database through a multifaceted set of tools for the targeted stakeholders: researchers, farmers and other industry professionals.
Project Methods
For objective 1, we will establish data standards and the framework for the database. Data from variety testing programs will be acquired and examined across multiple years, with particular focus on identifying common data structures, naming conventions, controlled vocabularies and relationships. Based on this examination, we will develop a framework of standardized templates for organizing the trial data and associated metadata. A database schema will be created in MySQL defining table structures, relationships between tables, and constraints on individual parameters. The indicators that this objective has been met are (1) trial data and metadata templates established for all crops; and (2) a MySQL database established for these data. Guided by the data standards and framework established previously, objective 2 focuses on preparing the raw data for inclusion in the variety testing database - a suite of steps collectively referred to as "data curation". This process varies for each data set, but most data sets require error checking, instituting common formats for certain data types (e.g., calendar date), checking for duplicate rows, instituting common column naming practices, transposing tables, and merging common data sets. Trial data will be grouped by variety testing program and crop for curation. During this process, controlled vocabularies will be established for all categorical variables indicating acceptable values. This step is extremely helpful for avoiding the proliferation of near duplicates (e.g., "Westbred 470", "WestBred 470", "WestBred470"). These processes will be carried out using the command line tools R, bash and Python, along with OpenRefine and other software. The use of command line tools will result in a clear record of data provenance. This level of tracking is helpful for data and research reproducibility, making it possible for anyone to retrace and/or recreate the steps we took.
Our indicator of completion for all curation steps will be cleaned data files and the scripts that document these processes saved to a remote repository under version control.
A similar process will be undertaken for gathering trial metadata. These metadata will include, at minimum, the program which conducted the trial, location (e.g., nearest town or research station), year, nursery type, experimental design, planting date and harvest date. When available, trial metadata will also include plot size, fertilizer regimes, chemical applications, soil test results, soil type, geo-coordinates, and additional agronomic conditions (e.g., irrigation status). Collecting these data will likely require some manual data entry. Curated data and metadata will be imported through a web interface as comma-separated values files and stored in a MySQL database. By completing objectives 1 and 2, we will establish a set of practices for reporting trial data. The extent to which variety testing programs adopt these practices is how we will evaluate the success of our efforts in achieving a change in data handling behavior.
For objective 3, we will build web-based querying tools so all imported data can be easily accessed through browsing, searching and filtering on single or multiple conditions. This includes annual trial data, trial metadata, and information on cultivars/unreleased germplasm. The Drupal platform will be used as the web interface. Drupal is open-source and feature-rich, with customizable user permissions, search tools, and responsive layouts. Data access will also be enabled through an API. We will include statistical analytical capabilities within the data hub to conduct single and multi-trial analysis and visualize those results. This will be accomplished with the R programming language. These tools will be farmer-focused additions that use trial summaries based on the analytical approaches described previously.
While these visualizations rely on advanced statistical modeling techniques, understanding the output does not require a deep understanding of the underlying statistics. Data visualization tools would allow users to compare varietal performance across years and sites, rank varieties by the trait of their choosing, and provide custom summary plots for user-specified years and locations. We will work with our advisory panel to ensure these tools meet farmer, industry, and research needs. The deployment of the database and these accompanying tools to a public website is the measure of success for meeting this objective.
Once we complete objective 3 and the data hub is live, our efforts to achieve changes in knowledge, action and condition can be realized. With the widespread availability of variety testing data, we intend that farmers and researchers alike will use this data hub to access information from current and historic variety trials. They will be able to leverage the analytical capabilities built into the hub and use that knowledge to improve the state of regional agriculture through improved planting decisions and downstream research. The extent to which our efforts result in the desired changes in knowledge, action and condition can be evaluated by (1) communicating with our advisory panel (who represent our stakeholders) about usage and utility; and (2) tracking website traffic, page views, downloads, and similar activities on our data hub directly with tools such as Google Analytics.
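The curation steps described for objective 2 (common date formats, duplicate checks, and a controlled vocabulary that collapses near-duplicate variety names) can be illustrated with a minimal Python sketch. This is not the project's curation code: the column names, the list of accepted date layouts, and the one-entry vocabulary are assumptions made for illustration only.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical controlled vocabulary of canonical variety names.
CANONICAL_VARIETIES = ["WestBred 470"]


def name_key(name):
    """Collapse case, spaces and punctuation so near duplicates share a key."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


VARIETY_LOOKUP = {name_key(v): v for v in CANONICAL_VARIETIES}

# Date layouts assumed to occur in the raw files (illustrative only).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y")


def standardize_date(text):
    """Return an ISO 8601 date string, or raise ValueError if no layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def curate(rows):
    """Normalize variety names and dates, then drop exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        key = name_key(row["variety"])
        if key not in VARIETY_LOOKUP:
            raise ValueError(f"variety not in vocabulary: {row['variety']!r}")
        fixed = dict(
            row,
            variety=VARIETY_LOOKUP[key],
            planting_date=standardize_date(row["planting_date"]),
        )
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint not in seen:  # keep only the first copy of each row
            seen.add(fingerprint)
            cleaned.append(fixed)
    return cleaned


# Made-up example rows; the first two become identical after cleaning.
RAW = """variety,planting_date,yield_bu_ac
Westbred 470,04/15/2021,88.1
WestBred470,2021-04-15,88.1
WestBred 470,"Apr 15, 2021",91.3
"""

cleaned = curate(list(csv.DictReader(io.StringIO(RAW))))
for row in cleaned:
    print(row)
```

Raising an error on an unknown variety, rather than guessing, keeps the vocabulary as the single point where new names are approved, which fits the goal of a clear provenance record.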

Progress 03/01/21 to 02/28/25

Outputs
Target Audience: For the project period, we reached two target audiences through release of our project data tools for Pacific Northwest wheat variety trial data. 1. The agricultural research community was reached through release of the wheat database. This was presented at the annual meetings of the American Society of Agronomy in 2024. This audience consists of agronomists, crop geneticists, soil scientists, and crop scientists in different career stages from undergraduate to emeritus. We were invited by the T3 Triticeae Toolbox to add our data to their database. 2. We also reached Pacific Northwest producers through release of the wheat database and beta versions of the phone application. The app was shared with Idaho wheat producers for feedback. This audience consists of small grains producers who farm in Idaho, Oregon and/or Washington and other people that support this industry (e.g. extension educators, equipment and supplies companies, other governmental agency officials). 3. Members of the general public were also reached through our project website (www.westernagdata.org) and through coverage in local news outlets including the Farm Bureau, the Lewiston Tribune and the Idaho Wheat Commission.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? In August of 2024, Emily Galvin, a technician for this project, attended the Posit PBC conference in Seattle, Washington. This is a premier conference for R and Python programmers who work in data science. Emily's attendance at the conference and at a one-day workshop associated with the conference raised her skill set and confidence in R programming.
How have the results been disseminated to communities of interest? In the last year, the data were distributed to research communities by sharing them with the T3/Triticeae Database.
The data were also distributed to the target producer audience by attending the annual Tri-State Grain Growers' Meeting held November 19-21 in Coeur d'Alene, Idaho. We also requested and received additional funding from the Idaho Barley Commission for 2024-2025 to launch a barley version of our mobile phone app. In the application process, we continued to reach and inform our target audience of our work, outputs and how it can support small grains producers in the Pacific Northwest.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? All goals and objectives have been achieved during the four years this project has been funded. In our final year of the project, we finalized the phone app, completed curation and upload of all remaining data, and trained all project members in usage of the commercial database specific for variety testing, Genovix.

Publications

  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2025 Citation: Piaskowski JL. 2025. Coefficient of Variation and Variety Testing Field Trials. SCC-33 Multi-state Project for variety testing. Feb 3, 2025 (Melbourne, FL and online).
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2024 Citation: Piaskowski JL. 2024. Coefficient of variation and variety testing field trials. Conference on Applied Statistics in Agriculture and Natural Resources. May 13-16, 2024, Ames, IA.


Progress 03/01/23 to 02/29/24

Outputs
Target Audience: In the reporting period, we reached two target audiences through release of our project data tools for Pacific Northwest wheat variety trial data. 1. The agricultural research community was reached through release of the wheat database. This was presented at the annual meeting of the American Society of Agronomy. This audience consists of agronomists, crop geneticists, soil scientists, and crop scientists in different career stages from undergraduate to emeritus. We were invited by the T3 Triticeae Toolbox to add our data to their database. 2. We also reached Pacific Northwest producers through release of the wheat database and beta versions of the phone application. The app was shared with Idaho wheat producers for feedback. This audience consists of small grains producers who farm in Idaho, Oregon and/or Washington and other people that support this industry (e.g. extension educators, equipment and supplies companies, other governmental agency officials). Members of the general public were also reached through our project website. This was fully revamped during the reporting period.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Emily Galvin, a technician who is responsible for project data curation, virtually attended the Posit conference, an annual event focused on R programming. This conference has been and continues to be a very important and valuable event communicating new developments in the R programming language and applied data science in the R ecosystem (R is the primary language used by this project for data curation).
How have the results been disseminated to communities of interest? Representatives from this project attended the Tri-State Grain Growers meeting, the premier annual event for small grain producers in Idaho, Oregon and Washington.
We also communicated directly with regional commodity groups: the Idaho Wheat Commission, the Washington Grains Commission and the Oregon Wheat Commission.
What do you plan to do during the next reporting period to accomplish the goals? During the next (and final) project reporting period, we plan to launch the remaining public web databases for barley, canola and cool-season legumes. This involves fully populating our internal Genovix instance with raw data from variety testing trials and using that data to populate the public web interface. The web interfaces need to be constructed, following the template established for the wheat database. These data will also be statistically summarized across multiple conditions and presented in an interactive phone app, using the base code developed for the wheat phone app. We will finalize the phone app for release after receiving feedback from our industry partners (currently being solicited). We are also working to transition this project to be managed by the regional variety testing leaders who have been participating in this project. These individuals will be responsible for data upload and regular data maintenance so the project can continue once our NIFA funding ends.

Impacts
What was accomplished under these goals? Objective 1 was met during previous project reporting periods. In this reporting period, we completed many aspects of Objective 2. The wheat and canola internal databases (using the commercial software 'Genovix') were fully populated with wheat and canola data. We have begun to add the barley variety testing data. We met objective 3 goals by launching the public wheat web interface and launching a beta version of a phone app (available for iPhone and Android phones) showcasing small grains variety testing results.

Publications

  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2023 Citation: Piaskowski, J. L., Galvin, E., Marshall, J., Schroeder, K. L., Walsh, O. S., Finkelnburg, D., Davis, J. B., Graebner, R., Neely, C. B., & Jones, S. S. (2023) Tools and Computing Infrastructure for a Wheat Variety Testing Datahub [Abstract]. ASA, CSSA, SSSA International Annual Meeting, St. Louis, MO. https://scisoc.confex.com/scisoc/2023am/meetingapp.cgi/Paper/149882
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2023 Citation: Piaskowski, J. L., Galvin, E., Tay, J. (2023) R Packages for Curating Agricultural Data Sets [Abstract]. ASA, CSSA, SSSA International Annual Meeting, St. Louis, MO. https://scisoc.confex.com/scisoc/2023am/meetingapp.cgi/Paper/149956


Progress 03/01/22 to 02/28/23

Outputs
Target Audience: We have two major target audiences: (1) researchers in agriculture and other adjacent sciences, and (2) industry representatives such as producers, crop consultants, food processors, packing houses, et cetera. Last year, we officially branded this project as "WAVE: Western Agricultural Variety Explorer". During the reporting period, we had three major outreach events. We attended the 2022 Tri-State Grain Growers meeting (https://www.wawg.org/convention/), held in Coeur d'Alene, Idaho in December, 2022. This is an important regional event for producers and industry representatives in grain production from Idaho, Oregon and Washington. At that event, we provided a public face to the project, providing updates and information and receiving feedback and/or suggestions from our primary stakeholders. This was attended by project Co-PIs Julia Piaskowski, Juliet Marshall, Ryan Graebner, Kurt Schroeder and Clark Neely. The primary audience we reached at this event was industry stakeholders (several hundred attend this convention). The second event we attended was the Pacific Northwest Wheat Quality Council, also held in Coeur d'Alene, Idaho in January, 2023. Project lead PI Julia Piaskowski was invited to speak on a panel regarding how the WAVE database funded by this grant will help address challenges in improving wheat end-use quality. The primary audience we reached at this event was researchers. We also reached both researchers and industry stakeholders through a regional extension publication, Dryland Field Day Abstracts (https://s3.wp.wsu.edu/uploads/sites/3122/2022/09/FDA-2022-complete.pdf), a joint publication of Washington State University, University of Idaho and Oregon State University, providing research highlights relevant to regional producers. We also requested funds from the Northwest Potato Research Consortium (https://www.nwpotatoresearch.com/) and spoke at their annual meeting about the project.
They ultimately declined to fund the addition of Pacific Northwest potato variety testing data to our database. However, the potato variety testing team remains interested in a collaboration with WAVE.
Changes/Problems: Throughout the project, we have struggled to find a good database fit for our needs. In the first year of the project, we pivoted from using a custom in-house database to Breedbase, an open-source product from Cornell University. This database ultimately did not serve our needs well, so after considering several different options, we decided to use Genovix. This is the newest product from Agronomix, a company with a long history of providing variety testing databases to public and private plant breeding programs. While no commercial database solution will be a perfect fit, this one does appear to be meeting our needs thus far. It is more expensive than other variety testing solutions, but ultimately, the costs are reasonable and affordable when shared across several programs. We think the improvement in efficiency is worth the additional costs. The costs at this moment are $15,000 per year for 5 concurrent licenses. We can scale up or down as needed depending on the number of participating variety testing programs and the costs will adjust linearly. Overall, the project is proceeding slightly slower than originally planned, although there is regular progress. The database difficulties are a major cause of this delay, along with the generally unpredictable nature of data curation. We plan to apply for an extension of the project period of one year.
What opportunities for training and professional development has the project provided? Both the lead Project PI and staff hired for this project (Emily Galvin) attended professional conferences for the R programming language that enabled them to listen to talks relevant to their work and meet other individuals doing similar work. Specifically, we attended the 2022 UseR!
conference (https://user2022.r-project.org/), the 2022 R/Shiny Conference (https://appsilon.com/appsilon-shiny-conference-2022-announcement/, original conference website since taken down) and the 2022 RStudio::Conference (https://posit.co/blog/rstudio-conf-2022-is-open-for-registration/, original conference website since taken down).
How have the results been disseminated to communities of interest? We continue to use our website, www.westernagdata.org (also, .com and .net) to advertise our final product. As mentioned earlier in this report, we also attended 2 major regional events that we used to communicate project progress with key stakeholders and invite feedback from them.
What do you plan to do during the next reporting period to accomplish the goals? The next year is primarily focused on Objectives 2 and 3. We have largely completed data assemblage and curation (although this is a never-ending task since new data are generated every year). The next step is populating the Genovix database instances with our curated data. This is largely completed for the wheat database. In this next year, we plan to launch the public facing aspects of the WAVE database, focusing on wheat first, and then moving on to the other crops (each having its own database). We will link the Genovix database app to the researcher portal for data querying and download, and to the industry app for data exploration. We are in the process of rebuilding the industry-focused app using Flutter, an open-source framework for building mobile apps. Our goal is to have a public launch of the wheat variety testing data tools in WAVE in the next 6 to 9 months. As time allows, we also want to launch the other crop database tools. We also want to transition to a more sustainable funding model (i.e. one not reliant on obtaining Federal grants). Once we have a usable product, we will begin approaching the appropriate parties about long-term support of this project.
It is likely we will need one more year to finish aspects of this project such as finalizing the database launch, training variety testing managers on usage of Genovix, and final data curation. We have sufficient funds in the current project budget for a one-year extension.

Impacts
What was accomplished under these goals? While objective 1 was largely completed in year one of the project, we continued to refine data standards in year two. We have cleaned the majority of the highest priority data: variety testing programs for wheat, barley, brassica oil seeds and cool-season legumes. We continue to pursue adjacent data that is second highest priority: disease rating and end-use quality data of the primary crops and variety testing data from Northern Utah (this was requested by Southern Idaho grain producers). Third level priority data includes other crops: alfalfa, potato and sugar beets. For objective 2, we had to change databases from Breedbase to Genovix because Genovix is a better fit for our data model, a more mature software product, and has more tools that the project PIs want. While it is more expensive than Breedbase, its overall costs are reasonable and supportable when the project period ends. We have thus far constructed the wheat database. While not without challenges, Genovix ultimately provides a rich set of tools that help us meet our goals. For objective 3, we hired a graphic designer to redesign the website (www.westernagdata.org) and the industry-targeted app. That person provided a comprehensive collection of professional and user-friendly designs that we are working to implement. In this next year, we plan to launch the public facing aspects of this database, focusing on wheat first, and then moving on to the other crops (each having its own database).

Publications


Progress 03/01/21 to 02/28/22

Outputs
Target Audience: We communicated with Idaho wheat growers through the Idaho Wheat Commission (June, 2021) about the project. We also reached regional pulse producers through the Dry Pea and Lentil Council scientific advisory board (September, 2021). We were able to communicate with regional small grains producers at the Tri-State Grain Growers conference held in Spokane, Washington (Nov/Dec, 2021). We also gained valuable insight from growers and industry representatives from Washington, Oregon, and Idaho for wheat, barley, pulses and canola who serve on our project advisory board.
Changes/Problems: We had originally planned to develop our own database system (that is, a schema, user interface, API, and set of upload tools), but we decided to use an existing open-source product, BreedBase (https://breedbase.org/). There are several advantages to this approach: BreedBase was written for breeding and genetics programs. It was developed for our situation of many institutions wanting to share data through a common interface while maintaining separate workspaces. The database design team has already encountered and solved many of the problems we are experiencing. They have already developed tools we need: data upload, an API, error-checking, etc. It is quite affordable compared to other database solutions available. It is 'BrAPI-compliant', meaning that it will work with other software solutions that follow this data standard (notably, FieldBook). It's a mature product with an extensive user base that has tested the product. It's also under continual development to keep pace with advances and changes in computing. As an open-source product, we can transfer it to our own servers; there is no vendor lock-in. There are drawbacks to this software: its permissions levels are crude; and to some extent, it is slightly too full featured for a variety testing program (for example, we don't need to track pedigree information).
We have developed some work-arounds to deal with these drawbacks. Weighing all the advantages and disadvantages, we do think this is the best solution to meet project needs.
What opportunities for training and professional development has the project provided? The project requires continual self-guided training in R programming for the data curators and project director. The project director and project curator attended the 2021 R Users Conference (held online).
How have the results been disseminated to communities of interest? Presentations Given:
  • Pacific Northwest Variety Testing DataHub. Presentation to the U.S. Dry Pea & Lentil Council (online). September 2, 2021.
  • Idaho Wheat Variety Testing Database, an update. Presentation to the Idaho Wheat Commission (Moscow, ID). June 7, 2021.
What do you plan to do during the next reporting period to accomplish the goals? Objective 1 is an ongoing effort that is continually refined with additions of new data. We will continue to incorporate new data into the project and use that to update our controlled vocabularies. We are seeking additional data that supports this project such as disease resistance data, crop end-use quality data and data from local private industry. We will curate canola and pulse data. Objective 2: We will launch 'BreedBase' database instances for wheat, barley and canola and populate them with data. We may be able to accomplish this for other crops (time permitting). Objective 3: We will connect the industry-focused app with the running BreedBase instances. We also plan several improvements to the core app code: load testing of the app along with accompanying code changes to improve performance, writing unit tests to enable automated checking of the app code, and employing a UX designer to advise us on usability and overall design. We also plan to implement suggested changes from the advisory board, which include minor design changes and major additions to the functionality.

Impacts
What was accomplished under these goals? Objective 1: Data standards were established and refined for wheat and barley. We continued to develop controlled vocabularies that define acceptable variables and allowable values for each variable. Objective 2: Data were assembled from 6 of the 7 target variety testing programs. All programs were trained in the minimal data standards expected. We decided to use an open-source database solution, 'BreedBase', for data organization. This is a mature project that remains under continual development and use across a wide range of breeding and genetic programs with needs similar to our own. Objective 3: We produced an updated version of the industry-focused app. We assembled and met with an advisory board representing small grains and oilseeds agricultural interests throughout the Pacific Northwest; this advisory board provided valuable feedback on the app and suggestions for how to improve it.
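A controlled vocabulary of the kind described under Objective 1, one that defines acceptable variables and the allowable values for each, could be checked against incoming records along the following lines. This is only a sketch: the variable names and value sets are hypothetical and are not taken from the project's actual data standards.

```python
# Hypothetical vocabulary: each variable maps to its set of allowable values,
# or to None when any non-empty value is accepted.
VOCABULARY = {
    "crop": {"wheat", "barley"},
    "nursery_type": {"dryland", "irrigated"},
    "location": None,
}


def validate(record):
    """Return a list of human-readable problems found in one trial record."""
    problems = []
    # Flag variables that the vocabulary does not define.
    for variable in record:
        if variable not in VOCABULARY:
            problems.append(f"unknown variable: {variable}")
    # Flag missing variables and values outside the allowed set.
    for variable, allowed in VOCABULARY.items():
        value = record.get(variable, "")
        if value == "":
            problems.append(f"missing value: {variable}")
        elif allowed is not None and value not in allowed:
            problems.append(f"value not allowed for {variable}: {value}")
    return problems


# Made-up example records.
good = {"crop": "wheat", "nursery_type": "dryland", "location": "Moscow"}
bad = {"crop": "Wheat", "nursery_type": "dryland", "plot": "12"}

print(validate(good))
print(validate(bad))
```

Returning a list of problems instead of stopping at the first one lets a data contributor fix every issue in a file in a single pass.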

Publications