Source: UNIVERSITY OF CALIFORNIA, DAVIS submitted to
PREDICTION MODELS FOR GRAIN NUTRIENT LEVELS IN THE U.S. MAIZE NAM PANEL, AND GENOMIC ASSOCIATIONS UNDER DROUGHT AND HEAT STRESS IN ZIMBABWE
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
1021465
Grant No.
2017-67011-30938
Project No.
CA-D-PLS-2563-CG
Proposal No.
2018-09561
Multistate No.
(N/A)
Program Code
A7101
Project Start Date
Sep 15, 2019
Project End Date
Jan 14, 2022
Grant Year
2020
Project Director
Diepenbrock, C. H.
Recipient Organization
UNIVERSITY OF CALIFORNIA, DAVIS
410 MRAK HALL
DAVIS,CA 95616-8671
Performing Department
Plant Sciences
Non Technical Summary
This project supports the mission of the Agricultural Experiment Station by addressing the Hatch Act area(s) of: plant and animal production, protection, and health; human nutrition; sustainable agriculture; and molecular biology. Maize varieties with increased grain carotenoid (provitamin A) and tocochromanol (vitamin E) concentration have been identified as an effective intervention for improved nutritional status in developing nations and certain population segments of developed nations, including the U.S. These traits are expensive to quantify in the lab and are controlled by relatively few, large-effect loci in the genome, many of them coding for enzymes in the direct biochemical pathway that produces these nutritional compounds. Thus, prediction models based on genetic markers that target these loci may provide a highly accurate, more efficient, and lower-cost strategy for improving levels of these nutrients in maize through breeding. This project will develop highly accurate genomics-based prediction models in a panel of approximately 5,000 maize inbred lines. The panel was developed by crossing 25 diverse parents for many generations to break the genome into smaller pieces through chromosomal crossover events, or recombination, and thus increase resolution for finding the important genetic loci. The prediction model for this panel will be built upon and expanded from previous findings in a substantially smaller maize inbred diversity panel for use in breeding for improved levels of carotenoids and tocochromanols. Further, there is an urgent need both within provitamin A breeding efforts and more generally to develop varieties that will be highly productive even as climate change takes place at the global level and exerts its impacts regionally. In this light, 350 subtropically adapted maize hybrids are being evaluated in southeast Zimbabwe under combined drought and heat stress vs. 
heat stress only (well-watered) conditions, in collaboration with the International Maize and Wheat Improvement Center (CIMMYT). Genetic and genomic analyses will be conducted for yield and other agronomic traits of interest, and methods will be shared with and taught interactively to students and fellow researchers in the U.S., Zimbabwe, and eastern Africa. Key objectives will be to identify the genetic loci contributing to natural variation in yield performance under combined drought and heat stress vs. heat only (well-watered), and to predict yield performance based on genetic information. Within this work, inbred and hybrid lines exhibiting favorable performance will be identified and incorporated into breeding programs for orange, provitamin A-dense maize, and the identified genetic loci will be used (including through use of genomic prediction models) to refine selection strategies. These outcomes are of immediate interest to maize breeders and consumers in the U.S., southern Africa, and elsewhere in ensuring a food supply that is sufficient in quantity and quality in the long term.
Animal Health Component
0%
Research Effort Categories
Basic
100%
Applied
0%
Developmental
0%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
203                    1510                              1081                      40%
201                    1510                              1081                      40%
201                    1510                              1010                      20%
Goals / Objectives
1. Assess predictive ability (GP) for carotenoid levels in diverse maize inbreds of relevance to U.S. maize breeding.
Status: Genomic prediction in the U.S. nested association mapping (NAM) panel is complete for carotenoids and tocochromanols, comparing genome-wide approaches with others that target varying numbers of quantitative trait loci for these classes of nutrients. I have tested three types of regression (ridge regression, LASSO, and elastic net). My group and I will test random forest and reproducing kernel Hilbert spaces and examine predictions between the U.S. NAM panel and 281-line inbred diversity panel. We will also test the use of these two panels for genomics-assisted prediction of grain carotenoid levels, as measured in a CIMMYT (International Maize and Wheat Improvement Center) inbred association panel evaluated in Mexico.
2. Evaluate 350 diverse maize hybrids under managed conditions of combined drought and heat stress vs. heat only (well-watered) in a target environment for orange, provitamin A-dense maize, in collaboration with CIMMYT/HarvestPlus-Zimbabwe.
Status: Three hybrid trials (two seasons in one location, one season in a second location) in southeast Zimbabwe are complete. This included all steps from generation of hybrid seed to field phenotyping to shipment of dry grain samples to the U.S. I have analyzed the key agronomic traits from season one, and my group and I are currently calculating best linear unbiased predictors that incorporate the season two data.
3. Identify associated genetic loci (via genome-wide association study; GWAS) and assess genomics-assisted predictive ability (GP) for yield and other agronomic traits under managed conditions of combined drought and heat stress vs. heat only (well-watered) in Zimbabwe.
4. Identify hybrids having (and/or favorable alleles associated with) high yield and favorable agronomic properties under managed conditions of combined drought and heat stress and/or heat only (well-watered).
5. Integrate findings into HarvestPlus breeding programs for orange, provitamin A-dense maize.
Note: This project has been transferred twice (and the funds had not yet been deposited at the second institution, of three). The first transfer was requested due primarily to a backlog in the collaborating lab for nutritional analysis. We have decided in this second transfer to support a Ph.D. student in completing genetics and genomics analyses of the yield and agronomic data that have been collected from three field seasons (and analyses of relevant diversity panels in the U.S., as detailed above), rather than supporting nutritional analysis on a subset of the samples. These activities represent both a highly relevant portion of the student's predoctoral training and the key remaining step needed for completion of this project. Grain carotenoid traits (which tend to be highly heritable) were examined in Mexico for many of these lines in inbred form (as the hybrids being used in this project were generated from member lines of three inbred association panels, one of which was for carotenoids; Suwarno et al. 2015, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4544543/). While a genome-wide association study (GWAS) has been conducted in this inbred panel, marker-assisted selection efforts are still limited to one or a couple of markers, and GP/GS strategies [still not yet conducted or underway for these traits in tropical/subtropical material, to our knowledge] could offer direct utility in selection programs given the few to several genes exhibiting major or moderate effects for the multiple carotenoid traits that are of interest in breeding programs. We propose to examine the predictive ability of genomic prediction models for carotenoid traits in this inbred panel (as trained within the panel itself in a cross-validation approach, or using the models that we have developed in diverse U.S. inbreds), as an alternative to having directly measured carotenoids in the Zimbabwe panel.
Project Methods
Genome-wide association studies (GWAS) will be conducted within R (R Core Team 2017) for yield and agronomic traits using the Genome Association and Prediction Integrated Tool (GAPIT; Lipka et al. 2012). Principal component analysis will be conducted on the genotypic data, and a kinship matrix (based on the same data) and/or an optimal number of PCs will be included in the final GWAS model to account for population structure. Flowering time will be tested as a covariate given that avoidance (e.g., through earliness) is a known mechanism of drought response.
Genomic prediction/selection strategies will also be implemented in R. Types of regression that have been tested for nutritional traits in maize (example in Owens et al., 2014) include ridge-regression best linear unbiased prediction (RR-BLUP; Endelman, 2011), least absolute shrinkage and selection operator (LASSO; Tibshirani, 1996), and elastic net analysis, which implements a weighted average of the RR and LASSO penalties (Zou and Hastie, 2005). Two approaches with ties to machine learning that have shown superiority in recent studies (Spindel et al. 2015, Heslot et al. 2014, Crossa et al. 2010) will also be tested: Random Forest (Breiman 2001) and Reproducing Kernel Hilbert Spaces (Gianola and van Kaam 2008). Predictive abilities will be calculated through five-fold cross-validation, as the correlation of predicted and observed values (Zhou et al., 2017). Iterations of this procedure will be conducted (as in Ovenden et al., 2018) to ensure that a single randomized assignment of genotypes to folds is not influential.
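The cross-validation procedure described above can be illustrated with a minimal sketch. This is not the project's R code: it is written in Python with NumPy, uses a plain ridge regression with a fixed, arbitrary penalty as a stand-in for RR-BLUP, and runs on simulated marker data. The function names (ridge_fit, repeated_kfold_ability), the penalty value, and the choice to pool predictions across folds before computing the correlation are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression on centered data; returns (intercept, marker effects)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Dual form: solve an n x n system, which is cheaper when markers outnumber lines.
    dual = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), yc)
    beta = Xc.T @ dual
    return y_mean - x_mean @ beta, beta

def repeated_kfold_ability(X, y, lam=1.0, k=5, iterations=20, seed=0):
    """Predictive ability (Pearson r of predicted vs. observed) per iteration."""
    rng = np.random.default_rng(seed)
    n = len(y)
    abilities = []
    for _ in range(iterations):
        # A fresh random assignment of genotypes to folds in every iteration.
        folds = np.array_split(rng.permutation(n), k)
        pred = np.empty(n)
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            b0, beta = ridge_fit(X[train_idx], y[train_idx], lam)
            pred[test_idx] = b0 + X[test_idx] @ beta
        abilities.append(np.corrcoef(pred, y)[0, 1])
    return np.array(abilities)

# Simulated example (hypothetical data): 200 lines, 500 markers coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 500)).astype(float)
true_effects = np.zeros(500)
true_effects[:10] = rng.normal(size=10)  # a few large-effect loci
y = X @ true_effects + rng.normal(size=200)

abilities = repeated_kfold_ability(X, y, lam=50.0)
print(f"mean predictive ability: {abilities.mean():.3f} (SD {abilities.std():.3f})")
```

Reporting the mean and spread across iterations, rather than a single run, is what guards against one fold assignment being influential.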

Progress 09/15/19 to 01/14/22

Outputs
Target Audience:
Training graduate and undergraduate students: Three Plant Biology Ph.D. students (one of whom was also an undergraduate researcher in our lab) and one undergraduate student (a Computer Science major) have received training in quantitative genetics and/or abiotic stress tolerance on this project. PD Diepenbrock also received pre-doctoral training (in Plant Breeding & Genetics) on this grant prior to its conversion to a standard grant.
Teaching graduate and undergraduate students: PD Diepenbrock has given guest lectures that include this work in the following courses:
  • Biotechnology 091 (one lecture, two years)
  • Plant Sciences 100C (two lectures and one discussion, two years)
  • Horticulture & Agronomy 200A (one lecture, two years)
  • Nutrition 219B (one lecture, one year)
  • Plant Biology 291 (one lecture, one year)
  • Plant Biology graduate recruitment (one talk, two years)
Disseminating this work through invited research talks and seminars (many of which were delivered virtually, whether as planned and/or as required due to the COVID-19 pandemic):
  • Invited research talk, Plant & Animal Genome XXIX, San Diego, CA, Jan. 2022. Crop Science Society of America (CSSA) Translational Genomics Session; held virtually due to COVID-19.
  • Invited virtual seminar, Food for Future conference, University of Cologne, Sept. 2021.
  • Invited virtual seminar, Plant Biology Club, North Carolina State University, Feb. 2021.
  • Invited virtual seminar, Corn Breeding Research Meeting, Feb. 2021.
  • Invited virtual seminar, College of Agriculture, Food and Environmental Sciences, California Polytechnic State University, Oct. 2020.
  • Invited virtual seminar, Department of Crop Sciences, Univ. Illinois. "Genetics of carotenoids, kernel color, and vitamin E in maize grain." Aug. 2020.
  • Invited virtual research talk, UC Davis Plant Science Symposium, Davis, CA. "Provitamin A and vitamin E in maize: genomics to market." May 2020.
  • Invited virtual seminar, Plant Biology Graduate Group, Davis, CA. "Genetics of carotenoids, kernel color, and vitamin E in maize grain." Jan. 2020.
  • Invited research talk, Plant & Animal Genome XXVIII, San Diego, CA. "Genetics of carotenoids, kernel color, and vitamin E in maize grain." Flavor, Nutrition, and Post-Harvest Genomics Workshop. Jan. 2020.
  • Invited seminar, Agricultural and Environmental Chemistry Graduate Group, Davis, CA. "Opportunities and challenges in plant breeding for nutritional quality." Dec. 2019.
  • Invited seminar, UC Davis Institute for Global Nutrition, Davis, CA. "Opportunities and challenges in plant breeding for nutritional quality." Nov. 2019.
Changes/Problems:
--A project meeting in Mexico could not be held for this project due to the pandemic; meetings took place via Zoom instead.
--Manuscript preparation and submission timelines have been longer than projected for reasons that are described in the Changes/Problems sections of the annual progress reports. However, the two major components of this project (Zimbabwe field trials and genomic prediction) have been progressing, and concrete timelines for manuscript submission are in place as final analyses are wrapped up.
What opportunities for training and professional development has the project provided?
One graduate student and one undergraduate student were trained on the carotenoid genomic prediction component (with the graduate student on that component also gaining valuable experience as they mentored the undergraduate student). Another graduate student was trained on the Zimbabwe field trial component. This project has facilitated cross-cultural collaboration in which these students are directly involved. One of the graduate students hosted a maize nutritional quality-focused collaborator from a CGIAR institution (who is a collaborator and will be a co-author on the carotenoid genomic prediction work) for a virtual seminar visit, which also involved a meeting with our lab group in which students could ask questions regarding both research-related aspects and careers more generally.
How have the results been disseminated to communities of interest?
Several invited talks and guest lectures (as detailed in the target audiences and products section) were delivered to researchers (including in the domains of plant biology, plant breeding, plant genetics, and/or human nutrition) and graduate and undergraduate students. The graduate student working on the genomic prediction component of this project was registered to present a poster on their work on this project at the Plant & Animal Genome Conference in Jan. 2022, which was cancelled due to COVID-19 (coincident with a surge in the omicron variant). PD Diepenbrock was scheduled to give a talk which included this project at the same conference in the CSSA: Translational Genomics workshop. While the conference was cancelled, this workshop was still held virtually, and Diepenbrock delivered her talk via live Zoom.
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported

Impacts
What was accomplished under these goals?
Goal 1: Within-environment genomic prediction analyses are complete for RR-BLUP, LASSO, and elastic net. Across-environment genomic prediction analyses are now complete for RR-BLUP. RKHS has been encoded and integrated into the master genomic prediction script (with intermediate checks underway before results are produced on the association panel under study). The two students working on this component also have plans to implement one additional method for genomic prediction with which they have relevant experience. Further details with regard to within-environment results are in the latest annual progress report. Predictive abilities in across-environment analyses were found to be similar to those in within-environment analyses (with limited exceptions), which would be a major factor in support of using genomic prediction in breeding for carotenoid-dense tropical/subtropical maize.
Goal 2: 350 diverse maize hybrids were evaluated under combined drought and high-temperature conditions vs. high-temperature conditions only (well-watered). These trials were conducted in Chiredzi, Zimbabwe (two seasons) and Chisumbanje, Zimbabwe (one season). Note that results from one combined drought and high-temperature treatment and two high temperature only (well-watered) treatments are the focus of the manuscript in preparation, based on repeatabilities across replicates within those treatments in a given field season. Traits measured in the field include grain yield, moisture, flowering time (days to anthesis and days to silking), plant height, ear height, number of ears per plant, and (in certain treatments) chlorophyll content via SPAD meter and visual senescence scores. Spatial model fitting was conducted to account for field spatial effects, and best linear unbiased predictors were calculated and used in genome-wide association studies and genomic prediction (Goal 3). Trait relationships within and across treatments were additionally examined in the form of Pearson correlations between untransformed best linear unbiased predictors, and are summarized below (as also reported in the most recent progress report).
  • Plant height, ear height, grain yield, and days to anthesis and silking exhibited moderate to high repeatabilities in both treatments.
  • Plant height, ear height, and grain yield exhibited moderate heritabilities in the well-watered (high temperature only) treatment, for which data were available from multiple environments.
  • Within and across treatments, plant height and ear height were highly positively correlated, as were days to anthesis and days to silking (DTA and DTS). In the drought treatment, DTA and DTS were moderately negatively correlated with grain yield (r = -0.509 and -0.486, respectively).
  • Within the well-watered (high temperature only) treatment, number of ears per plant exhibited repeatabilities and heritabilities of 0.37 to 0.49 and was highly correlated with grain yield (r = 0.635).
  • Yield in the well-watered (high temperature only) treatment exhibited correlation of r = 0.449 with yield in the drought treatment.
Goal 3: GWAS and GP analyses are complete with full phenotypic data from Zimbabwe field trials and genotyping-by-sequencing (GBS) data for 250 of the 346 experimental hybrids under study. Further details with regard to results of these analyses are in the latest annual progress report. GBS data were only located in the collaborating institution's database for 250 of 346 experimental hybrids, which appears to be limiting resolution in one or both of these analysis types. Efforts are underway to locate GBS data for as many of the remaining accessions as possible (at which point these analyses will be re-run prior to manuscript submission; full analytical pipelines are in place to do so in an efficient manner).
Goal 4: Field phenotypic data and untransformed best linear unbiased predictors (from Goal 2) have been examined to identify favorable hybrids with high yield and favorable agronomic properties in one or both treatments. While yield in the two treatments exhibited correlation of r = 0.449, it appears that this relationship may have primarily been driven by genotypes that were low-performing in both treatments, and thus may not be indicative of a large number of genotypes performing well in both treatments. Indeed, of the 350 genotypes evaluated, 19 genotypes were in the bottom 10% (lowest-performing 34 lines) for both treatments, whereas only three genotypes were in the top 10% (highest-performing 34 lines) for both treatments.
Goal 5: Periodic updates are being provided to the collaborative teams in subtropical/tropical breeding programs. Those collaborators who are participating in the genomic prediction component were to be co-authors on the poster to be presented in Jan. 2022.
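The Goal 4 comparison (how many genotypes fall in the lowest or highest group in both treatments) can be sketched as follows. This is an illustrative Python/NumPy example, not the project's code; the function name tail_overlap and the ten-value toy vectors are hypothetical, and the group size is passed explicitly (the report used groups of 34 lines).

```python
import numpy as np

def tail_overlap(values_a, values_b, group_size):
    """Indices of genotypes in the lowest / highest group_size of BOTH vectors."""
    order_a, order_b = np.argsort(values_a), np.argsort(values_b)
    bottom_both = np.intersect1d(order_a[:group_size], order_b[:group_size])
    top_both = np.intersect1d(order_a[-group_size:], order_b[-group_size:])
    return bottom_both, top_both

# Toy yield BLUPs for ten genotypes in two treatments (made-up numbers).
well_watered = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
drought = np.array([1.5, 0.5, 9.0, 4.0, 5.0, 6.0, 7.0, 8.0, 2.5, 10.0])

bottom_both, top_both = tail_overlap(well_watered, drought, group_size=2)
r = np.corrcoef(well_watered, drought)[0, 1]
print("low in both:", bottom_both, "high in both:", top_both, "r =", round(r, 3))
```

In the toy data, two genotypes are shared at the bottom and only one at the top, which mirrors the pattern described above: a positive overall correlation can be driven mostly by consistently poor performers.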

Publications

  • Type: Conference Papers and Presentations Status: Other Year Published: 2022 Citation: LaPorte, M., Suwarno, W., Crossa, J., Palacios Rojas, N., and Diepenbrock, C.H. 2022. Predicting carotenoid breeding values and kernel color in maize. Poster abstract for Plant and Animal Genome Conference, San Diego, CA, Jan. 8-12, 2022 (conference cancelled due to COVID-19, such that the poster was not presented and NIFA support could not be acknowledged; we plan to present this work at the 2023 conference instead).
  • Type: Conference Papers and Presentations Status: Other Year Published: 2020 Citation: Diepenbrock, C.H. Invited research talk, Plant & Animal Genome XXVIII, San Diego, CA. Genetics of carotenoids, kernel color, and vitamin E in maize grain. Flavor, Nutrition, and Post-Harvest Genomics Workshop. Jan. 2020.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2020 Citation: Invited research talk, UC Davis Plant Science Symposium, Davis, CA. Provitamin A and vitamin E in maize: genomics to market. May 2020.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2022 Citation: Invited research talk, Plant & Animal Genome XXIX, San Diego, CA, Jan. 2022. Crop Science Society of America (CSSA) Translational Genomics Session; held virtually due to COVID-19.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Invited virtual seminar, Food for Future conference, University of Cologne, Sept. 2021
  • Type: Conference Papers and Presentations Status: Other Year Published: 2021 Citation: Invited virtual seminar, Corn Breeding Research Meeting (held virtually), Feb. 2021.


Progress 09/15/20 to 09/14/21

Outputs
Target Audience:
Presentations Upcoming: PD Diepenbrock and the graduate student working on the genomic prediction component will present the work involved in this project at the Plant and Animal Genome Conference (January 8 to 12, 2022). Diepenbrock will present in the Crop Science Society of America-sponsored Translational Genomics session. The graduate student will present a poster, which is linked to the genome-wide association study and genomic selection session.
Additional presentations by Diepenbrock in which this work was discussed are listed below. All of these presentations, with the exception of one guest lecture, were delivered virtually due to the COVID-19 pandemic. These presentations reached (in total) hundreds of graduate and undergraduate students across several disciplines, as well as faculty members and plant breeding/genetics colleagues with a broad range of expertise.
  • Guest lectures in UC Davis undergraduate courses (Biotechnology 091, Plant Sciences 100C), graduate courses (Horticulture & Agronomy 200A, Nutrition 219B, Plant Biology 291), and Plant Biology graduate recruitment
  • Invited seminar, Food for Future conference, University of Cologne, Sept. 2021
  • Invited seminar, Plant Biology Club, North Carolina State University, Feb. 2021
  • Invited seminar, Corn Breeding Research Meeting, Feb. 2021
  • Invited seminar, College of Agriculture, Food and Environmental Sciences, California Polytechnic State University, Oct. 2020
Changes/Problems:
This work was delayed while the genotypic data were being curated by a collaborator, while a risk assessment of the software that we use for spatial model fitting (of field phenotypic data) was being conducted, and while we transitioned to remote work in the midst of the COVID-19 pandemic; all of these points have now been resolved. Across-environment genomic prediction analyses were temporarily affected by a rare, partial outage of a supercomputing cluster for which repair was delayed due to supply chain shortages. This issue is now resolved. The trip that was planned (and included in the grant budget) to conduct a working meeting with researchers at the International Maize and Wheat Improvement Center (CIMMYT)-El Batán, Mexico, will unfortunately not be able to take place during the project term, due to the ongoing nature of the pandemic and related restrictions on travel. We have continued interactions with that team via e-mail and virtual teleconference.
What opportunities for training and professional development has the project provided?
Two graduate students are being trained on this project: one working on the carotenoid genomic prediction component (while largely funded by a separate fellowship), and one working on the Zimbabwe field trial component. This project has provided exposure and relevant work in the applied breeding realm, in which the students are able to directly collaborate with and review literature related to breeding for nutritional quality and abiotic stress tolerance in tropical/subtropical maize. One undergraduate intern will additionally work on this project this winter, and the graduate student working on the carotenoid genomic prediction component will gain a valuable mentoring opportunity therein (for which they are well-prepared).
How have the results been disseminated to communities of interest?
Preliminary results have been disseminated through ongoing communication with collaborators in tropical/subtropical maize breeding programs. These results have also been discussed in conference presentations, invited seminars, and guest lectures to graduate and undergraduate students.
What do you plan to do during the next reporting period to accomplish the goals?
Complete preparation and submission of two manuscripts to peer-reviewed journals. Please note that the remaining reporting period will end in January 2022, and submission is projected to take place in Q2 of calendar year 2022 for the Zimbabwe field trial component and later in 2022 or Q1 of calendar year 2023 for the carotenoid genomic prediction component. (The graduate student working on the latter component will need to complete a practicum experience for their separate fellowship in Summer 2022, which requires that they not work on their dissertation research that is based at their home institution during that time; this may somewhat slow down the publication timeline, and the timeline designated above is somewhat conservative accordingly.) Complete upcoming presentations at the Plant and Animal Genome conference in January 2022. Continue to describe this work in invited seminars and guest lectures. This grant will continue to be acknowledged as a funding source throughout the above-listed activities.

Impacts
What was accomplished under these goals?
Goal 2 was complete prior to this project term.
Progress on goals 3 and 4: Traits from the Zimbabwe field trials have undergone spatial model fitting and genome-wide association studies (GWAS) and genomic prediction (GP). We are finalizing the within- and across-environment analyses (as certain additional traits were collected only in certain environments or treatments--e.g., based on whether the trait was segregating in that environment and/or treatment), and preparing final graphics for publication. Selected results in the form of text (and captions for completed figures and tables) are enclosed below; please note that these are unpublished data. Treatments discussed herein are a combined drought and high-temperature treatment and a well-watered (high temperature only) treatment. A manuscript draft has been in preparation concomitantly, which we plan to submit to Crop Science or another breeding/genetics journal in Q2 of 2022, with acknowledgment of this grant as a funding source. We are also following up on marker-trait associations detected in GWAS. While linkage disequilibrium decay has been found to take place at shorter distances in diverse tropical/subtropical maize inbreds (including a panel directly relevant to this study), we anticipate that the GWAS and candidate gene identification portions of this study will be de-emphasized in the final manuscript due to the genetic architecture of the traits of interest being relatively polygenic (e.g., compared to certain nutritional quality traits), and the extent to which the agronomic trait outcomes themselves and potential utility of GP models to predict breeding values for these traits are of interest.
Plant height, ear height, grain yield, and days to anthesis and silking exhibited moderate to high repeatabilities in both treatments. Plant height, ear height, and grain yield exhibited moderate heritabilities in the well-watered (high temperature only) treatment, for which data were available from multiple environments. Within and across treatments, plant height and ear height were highly positively correlated, as were days to anthesis and days to silking (DTA and DTS). In the drought treatment, DTA and DTS were moderately negatively correlated with grain yield (r = -0.509 and -0.486, respectively). Within the well-watered (high temperature only) treatment, number of ears per plant exhibited repeatabilities and heritabilities of 0.37 to 0.49 and was highly correlated with grain yield (r = 0.635). Yield in the well-watered (high temperature only) treatment exhibited correlation of r = 0.449 with yield in the drought treatment.
Table 1. Line-mean repeatabilities (within environment) and heritabilities (across environments) by treatment and season, estimated using the Cullis method (Hung et al. 2012, eq. 4). Note that the drought treatment in Season 2 was not analyzed further due to low repeatability across replicates within treatment. Season 1: Chiredzi (the location of the field experiment station in southeast Zimbabwe) in the first field season of 2016. Season 2: Chiredzi in the second field season of 2016.
Figure 1. Heat map depicting correlations of untransformed best linear unbiased predictors (BLUPs) among agronomic traits assayed in one or both treatments.
Table 2. Correlations among untransformed BLUPs (same as depicted in Figure 1) in tabular format. Thick boxes are placed around traits assayed within the same treatment (within or across environments).
Table 3. Accuracy in genomic prediction; five-fold cross-validation, 20 iterations; run in TASSEL.
Progress on goals 1 and 5: Genomic prediction (GP) analyses for grain carotenoid traits are complete within each of four field environments, via three regression types: RR-BLUP, LASSO, and elastic net. Selected results in the form of text (and captions for completed figures and tables) are enclosed below. To summarize, predictive abilities varied by trait and environment and were generally moderate to high. RR-BLUP exhibited reasonable performance with occasional outperformance by another of the methods tested (e.g., for a given trait-environment combination). The graduate student working on this component is now conducting across-environment analyses and preparing to test use of different marker sets (e.g., targeting two or 12 genes identified for carotenoid traits vs. genome-wide markers) and relevant ancillary data sets (including for purposes of additional validation and/or cross-validation). An undergraduate intern will focus this winter (on a research credit basis) on implementing additional regression types, namely those in the realm of machine learning. This intern will be mentored by the graduate student who is working on this component. This component now has an expanded scope compared to its initial conception, in part due to an opportunity to partner even more directly with a tropical/subtropical maize breeding program in its implementation--which is important for translational impact (and can also affect which types of validation are most relevant and informative). This work will likely thus result in a larger publication shortly after the end of the project term, with acknowledgment of this grant as a funding source.
Figure 2. Predictive ability for grain carotenoid traits in genomic prediction in four field environments. Regression types tested: ridge regression-best linear unbiased prediction (RR-BLUP), least absolute shrinkage and selection operator (LASSO), and elastic net. An alpha value (along the x-axis) of 0 corresponds to RR-BLUP, 1 corresponds to LASSO, and intermediate values represent elastic net. Trait abbreviations: lutein (LUT), zeaxanthin (ZEA), beta-cryptoxanthin (BCRY), beta-carotene (BCAR), and provitamin A (PROA). BCAR9 and BCAR13 further specify isomers of beta-carotene. Please note that these are unpublished data.
PD Diepenbrock and the graduate student working on the GP component have published two papers in 2021 that will directly inform the carotenoid genomic prediction component of this project. Namely, twelve genes (eleven in Diepenbrock et al. 2021 and one in LaPorte et al. 2021) have been identified in association with natural variation for carotenoid traits in maize grain. This larger set could complement the two genes that have been deployed in marker-assisted selection efforts for accelerated gains. Additionally, these gains could be achieved for more priority traits, allowing simultaneous improvement of zeaxanthin and lutein (important for eye health) and kernel color, alongside provitamin A. Genomic prediction/selection could be a viable strategy for the integration of this larger gene set (and potential smaller effects throughout genetic backgrounds) into genomics-assisted selection strategies. Note that M. Vachev and M. Fenn also worked on the latter manuscript while rotating in PD Diepenbrock's lab as first-year graduate students.
C.H. Diepenbrock†, D.C. Ilut, M. Magallanes-Lundback, C.B. Kandianis, A.E. Lipka, P.J. Bradbury, J.B. Holland, J. Hamilton, E. Wooldridge, B. Vaillancourt, E. Góngora-Castillo, J.G. Wallace, J. Cepela, M. Mateos-Hernandez, B.F. Owens, T. Tiede, E.S. Buckler, T. Rocheford, C.R. Buell, M.A. Gore†, and D. DellaPenna†. 2021. Eleven biosynthetic genes explain the majority of natural variation in carotenoid levels in maize grain. The Plant Cell 33(4): 882-900.
M.-F. LaPorte, M. Vachev*, M. Fenn*, C.H. Diepenbrock. Simultaneous dissection of grain carotenoid levels and kernel color in biparental maize populations with yellow-to-orange grain. bioRxiv 2021.09.01.458275. Accepted to a peer-reviewed journal contingent on minor revisions, which were submitted on Nov. 19, 2021.
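For readers unfamiliar with the alpha value in the Figure 2 caption, the elastic net objective can be written as below. This is the parameterization used by common elastic net software and is given here as an assumption about the general form, not as a statement of the exact settings used in this project; the symbols (n lines, response y_i, marker vector x_i, intercept beta_0, marker effects beta, penalty strength lambda) are introduced only for this illustration.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  + \alpha\,\lVert\beta\rVert_1\right],
  \qquad 0 \le \alpha \le 1
```

Setting alpha = 0 leaves only the squared (ridge) penalty, alpha = 1 leaves only the absolute-value (LASSO) penalty, and intermediate values weight the two, which is consistent with the x-axis described for Figure 2.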

Publications

  • Type: Conference Papers and Presentations Status: Under Review Year Published: 2021 Citation: LaPorte, M., and Diepenbrock, C.H. 2021. Predicting carotenoid breeding values and kernel color in maize. Poster abstract for Plant and Animal Genome Conference, San Diego, CA, Jan. 8-12, 2022 (upcoming).


Progress 09/15/19 to 09/14/20

Outputs
Target Audience: Two Plant Biology Ph.D. students and two undergraduate students who are being trained in quantitative genetics methods, including on this project. We are interacting with two tropical/subtropical maize breeding teams (in Zimbabwe and Mexico) related to the implementation of genomic prediction models and further use of the results in this study. This work was additionally included in presentations at the Plant & Animal Genome conference and seminars both internally at UC Davis (multiple departments and graduate groups, across the Institute for Global Nutrition, Agricultural and Environmental Chemistry, Plant Biology, etc.) and externally (at the University of Illinois and California Polytechnic State University; both virtual presentations). These presentations were tailored for undergraduates, graduate school applicants, and graduate students, as well as postdoctoral researchers, faculty members, etc.
Changes/Problems: We faced delays in obtaining one of the involved (i.e. genotypic) data sets from a collaborating institution (due to the need for de novo collation for this germplasm panel), and in renewing licensing for an important software package for spatial model fitting. Both of those are now in place, and the core and ancillary analyses are well underway. We still plan to integrate Reproducing Kernel Hilbert Spaces (RKHS) and Random Forest into the regression types that we test routinely in genomic prediction. With COVID ongoing, the number of logistical hurdles has been such that it has not been feasible to interact with a rotation student who could bring this capacity on board. However, we plan to integrate RKHS via the BGLR R package, and Random Forest via the randomForest R package. Each of these will take only a small new section of code in its final formulation, but will require a certain extent of cross-validation of one or more involved (hyper)parameters even if default values are otherwise used in the program's execution.
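As a rough illustration of the hyperparameter cross-validation mentioned above, the following minimal Python sketch tunes a single penalty by five-fold cross-validation and scores each candidate by predictive ability (the correlation between predicted and observed values in held-out folds). It is not the project's code: ridge regression with a fixed penalty stands in for RKHS and Random Forest, the marker and phenotype data are simulated, and the grid of penalty values and all function names are assumptions made for this example.

```python
import random
from statistics import mean


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Fit ridge regression on centered data; return (intercept, effects)."""
    n, p = len(X), len(X[0])
    col_means = [mean(row[j] for row in X) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with the penalty on the diagonal: (X'X + lam*I) b = X'y
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve(A, rhs)
    intercept = y_mean - sum(m * b for m, b in zip(col_means, beta))
    return intercept, beta


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cv_ability(X, y, lam, k=5, seed=0):
    """Mean predictive ability across k folds for one penalty value."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b0, beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        pred = [b0 + sum(x * b for x, b in zip(X[i], beta)) for i in test]
        scores.append(pearson(pred, [y[i] for i in test]))
    return mean(scores)


# Simulated data (hypothetical): 100 lines, 20 markers coded 0/1/2.
rng = random.Random(1)
n_lines, n_markers = 100, 20
X = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_lines)]
effects = [rng.gauss(0, 1) for _ in range(n_markers)]
y = [sum(x * e for x, e in zip(row, effects)) + rng.gauss(0, 1) for row in X]

grid = [0.1, 1.0, 10.0, 100.0]  # candidate penalty values (assumed)
results = {lam: cv_ability(X, y, lam) for lam in grid}
best_lam = max(results, key=results.get)
print(results, best_lam)
```

The same fold assignment is reused for every candidate value so that differences in predictive ability reflect the hyperparameter rather than the partitioning. In the R packages named above, the analogous tuning targets would be package-specific (for example, a kernel bandwidth or the number of variables tried at each split).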
The collaborative teams interacting on this work are additionally in regular contact, and we are prepared to spend the next few months on preparation of manuscripts from this work.
What opportunities for training and professional development has the project provided? Two Plant Biology Ph.D. students and two undergraduate students (along with a couple of rotating first-year graduate students, in a smaller-scale manner) are being trained in quantitative genetics methods--including but not limited to GWAS and GP--on this project. We are also using this opportunity to review procedures for (and principles of) spatial model fitting and estimation of best linear unbiased predictors. Our entire research group had the opportunity to meet (and ask questions, including related to careers, of) one of the collaborators on this project (who is based at an international breeding center in Mexico), and one of the graduate students hosted them for a virtual seminar at UC Davis. Once in-person workshops are enabled again, we plan to co-develop a free interactive workshop (focused on GWAS and GP methods) targeted for graduate and undergraduate students in other research groups as well, which will use this data set as one of a couple of examples. PD Diepenbrock has taught such workshops in the past and would like to incorporate this data set (as the stress component that it involves would represent an additional layer of complexity), with the additional regression types included so that participants have the code base on hand to test these in their future analyses.
How have the results been disseminated to communities of interest? This work was included in presentations at the Plant & Animal Genome conference and seminars both internally at UC Davis (multiple departments and graduate groups, across the Institute for Global Nutrition, Agricultural and Environmental Chemistry, Plant Biology, etc.)
and externally (at the University of Illinois and California Polytechnic State University; both virtual presentations). These presentations were tailored for undergraduates, graduate school applicants, and graduate students, as well as postdoctoral researchers, faculty members, etc. The graduate students have not yet been able to present their work at conferences due to the pandemic, but we are prepared for them to do so once those activities are enabled again. Sharing and discussion of results with collaborators is also taking place.
What do you plan to do during the next reporting period to accomplish the goals? We are prepared to spend the next few months on completion of final statistical analyses and preparation of manuscripts from this work. We will additionally continue to work with the two involved breeding teams to integrate the findings and developed models into their workflows, for purposes of longevity (with continued collaboration as well) beyond the project term.

Impacts
What was accomplished under these goals? Parts 1 and 2 were already largely in place prior to this project term, with the exception that we will also test Reproducing Kernel Hilbert Spaces and Random Forest as regression types in genomic prediction once we have optimized those (being developed for the first time within our group, for use in the 350-line and adjacent carotenoid panels as well). 3. Identify associated genetic loci (via genome-wide association study; GWAS) and assess genomics-assisted predictive ability (GP) for yield and other agronomic traits under managed conditions of combined drought and heat stress vs. heat only (well-watered) in Zimbabwe. Trait distributions were examined both before and after calculation of best linear unbiased predictors (BLUPs). The terminal drought stress treatment--targeted to commence by time of flowering, as a particularly susceptible stage for maize--was successful in both of the environments being used in these analyses, with substantial yield reductions observed in the combined drought + heat treatment compared to heat only (well-watered). We generated BLUPs for the diverse experimental genotypes evaluated in this study, in each treatment, across the two field environments. The field design was an augmented alpha (0,1) lattice design, with two master checks in each block of 16 plots, and two replicates per treatment. The model was fit in ASReml-R v4 with check and treatment as fixed effects, and the following as random effects: genotype, environment, treatment by environment, replicate within treatment within environment, block within replicate within treatment within environment, genotype by treatment, and genotype by environment.
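One way to write the model described above is sketched below in LaTeX. The symbols and index structure are assumptions introduced for illustration (the report does not give notation), and the exact parameterization of the check term in the fitted model may differ.

```latex
\begin{aligned}
y_{ijklmn} ={}& \mu + c_i + \tau_j
  && \text{(fixed: check, treatment)} \\
 &+ g_k + e_l + (\tau e)_{jl} + r_{m(jl)} + b_{n(mjl)}
  && \text{(random: genotype, environment, design terms)} \\
 &+ (g\tau)_{kj} + (ge)_{kl} + \varepsilon_{ijklmn}
  && \text{(random: interactions, residual)}
\end{aligned}
```

Here $c_i$ is the check effect, $\tau_j$ the treatment effect, $g_k$ genotype, $e_l$ environment, $(\tau e)_{jl}$ treatment by environment, $r_{m(jl)}$ replicate within treatment within environment, $b_{n(mjl)}$ block within replicate within treatment within environment, and $(g\tau)_{kj}$ and $(ge)_{kl}$ the genotype-by-treatment and genotype-by-environment interactions. Each random term is assumed to be independent and normally distributed with mean zero and its own variance component.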
We have fit this model for yield and will do so shortly for the other traits of interest in this panel (flowering time, plant height, ear height, etc.), having used yield to refine and test our analytical workflow and conduct overall examinations of trial outcomes. We have conducted a genome-wide association study (GWAS) via TASSEL ('Trait Analysis by aSSociation, Evolution, and Linkage'; Bradbury et al. 2007, https://pubmed.ncbi.nlm.nih.gov/17586829/) and GAPIT (Genomic Association and Prediction Integrated Tool; Lipka et al. 2012, https://academic.oup.com/bioinformatics/article/28/18/2397/252743). Both TASSEL and GAPIT are open-source software programs for the analysis of genetic diversity and associations. A total of 365,433 single-nucleotide polymorphisms (assayed via genotyping-by-sequencing; GBS) passed quality filters, including a minor allele frequency filter at 0.05. This diversity panel was indeed quite unstructured, with the first three principal components cumulatively explaining less than 10% of the variance. We have GBS data (and successful matching of genotype and phenotype) for 250 panel accessions thus far, and have requested the existing data available for some of the remaining panel accessions so that we can conduct these analyses again with the entire panel included. A couple of hits preliminarily looked promising in the drought/heat or heat only (well-watered) treatments (one appeared to be statistically significant compared to the Bonferroni threshold, which is quite conservative; 0.05/365,433 = 1.368 x 10^-7). GWAS will additionally be conducted for the other, aforementioned traits of interest, which could be expected to be more oligogenic. We also plan to conduct GWAS on the difference between performance in the two treatments, to test for genetic associations with that difference.
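The Bonferroni threshold quoted above can be reproduced with a few lines of Python. This is a minimal sketch: the function names and the two example p-values are hypothetical and are not results from this study.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if a single-marker p-value falls below the corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


n_snps = 365_433  # SNPs passing quality filters, as reported above
threshold = bonferroni_threshold(0.05, n_snps)
print(f"{threshold:.3e}")                 # 1.368e-07
print(round(-math.log10(threshold), 2))   # 6.86, the cutoff on a Manhattan plot axis

# Hypothetical p-values, for illustration only.
print(is_significant(5.0e-8, 0.05, n_snps))  # True
print(is_significant(1.0e-6, 0.05, n_snps))  # False
```

Because the correction treats all markers as independent tests, it tends to be conservative when markers are in linkage disequilibrium, which is consistent with the caution expressed above.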
We have now run genomic prediction for yield performance in both TASSEL and R (the latter using the rrBLUP package; Endelman 2011, https://acsess.onlinelibrary.wiley.com/doi/full/10.3835/plantgenome2011.08.0024). Preliminary predictive abilities for yield in the well-watered treatment just exceed 0.4 in five-fold cross-validation (with small standard errors across folds and iterations). Preliminary predictive abilities in the water-limited treatment have been somewhat lower, just exceeding 0.3 (still with small standard errors). Alongside additional regression types, we are also preparing to test multiple additional configurations of within- and across-environment models, with the goal of identifying models with increased predictive abilities and examining scenarios in which predictive ability breaks down; examples of these are found in Cuevas et al. (2016; https://doi.org/10.3835/plantgenome2016.03.0024). We have additionally tested genomic prediction methods for carotenoid traits in a somewhat related diversity panel, which we had planned to examine within the scope of this study, as mentioned in Part 5. (The 350-line panel examined in this work was a combination of three diversity panels, which were previously examined separately for carotenoids, drought, and low nitrogen.) Preliminary predictive abilities for carotenoid traits that are priorities for human health (lutein, zeaxanthin, beta-cryptoxanthin, beta-carotene, and provitamin A) in that panel were 0.54 to 0.75 (ranges are across the traits) in one location in 2010; 0.45 to 0.67 in the same location in 2011; and 0.41 to 0.50 in another location in 2012, with small standard errors in all cases (SEs of 0.006 to 0.017). The same within- and across-environment scenarios mentioned above will be further tested here as well. 4. 
Identify hybrids having (and/or favorable alleles associated with) high yield and favorable agronomic properties under managed conditions of combined drought and heat stress and/or heat only (well-watered). In examination of BLUPs, eight genotypes were in the top 10% of genotypes (i.e. top 25 genotypes) in both treatments in terms of their yield performance. The panel genotypes were additionally highly competitive with local checks. We have extracted the identities of several favorable genotypes, and will ground-truth them against breeder/collaborator observations (and recent selections) once the examination of yield stability across environments and other analyses that examine individual accessions by name are complete, to avoid introducing bias between these analytical steps. 5. Integrate findings into HarvestPlus breeding programs for orange, provitamin A-dense maize. Promising inbreds have already been identified based on their performance in the isolation crossing block used to generate seed for this project. The hybrids identified in the section described above will be integrated into breeding efforts in Zimbabwe and potentially also in partnering (e.g. other CIMMYT and regional) maize breeding programs. We additionally plan to test the use of genomic prediction models in selection itself (within the breeding programs, which have expressed interest in doing so), and to determine to what extent those models recapitulate and/or suggest somewhat different selection candidates (in terms of breeding values) than those being identified and advanced through phenotypic selection. Finally, we will further examine candidate genes proximal to genomic associations for yield and other agronomic traits in one or both treatments.
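The overlap check described under goal 4 (genotypes ranked in the top 10% for yield in both treatments) amounts to a set intersection of two ranked lists. The Python sketch below shows one way to do this; the genotype names and BLUP values are simulated placeholders rather than data from this study, and the function name is an assumption.

```python
import random


def top_fraction(blups, fraction=0.10):
    """Return the set of genotype names ranked in the top `fraction` by BLUP."""
    n_top = max(1, round(len(blups) * fraction))
    ranked = sorted(blups, key=blups.get, reverse=True)
    return set(ranked[:n_top])


# Simulated yield BLUPs for 250 hypothetical genotypes in each treatment.
# A shared component makes performance partly correlated across treatments.
rng = random.Random(0)
names = [f"G{i:03d}" for i in range(1, 251)]
shared = {g: rng.gauss(0, 1) for g in names}
blup_heat = {g: shared[g] + rng.gauss(0, 1) for g in names}
blup_drought_heat = {g: shared[g] + rng.gauss(0, 1) for g in names}

top_heat = top_fraction(blup_heat)                   # 25 genotypes
top_drought_heat = top_fraction(blup_drought_heat)   # 25 genotypes
in_both = sorted(top_heat & top_drought_heat)

print(len(top_heat), len(top_drought_heat), len(in_both))
print(in_both)
```

Keeping the step limited to names and ranks makes it straightforward to defer any comparison with breeder selections until the remaining analyses are complete, as described above.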

Publications

  • Type: Journal Articles Status: Other Year Published: 2021 Citation: Two manuscripts are in preparation. We had faced delays in obtaining one of the involved (i.e. genotypic) data sets from a collaborating institution (due to the need for de novo collation for this germplasm panel), and in renewing licensing for an important software for spatial model fitting. Both of those are now in place, and the core and ancillary analyses are well underway.