Progress 01/01/22 to 12/31/24
Outputs Target Audience:
1. Computational Biology Community: Presented posters and student seminars to the Cornell computational biology program and at the Plant and Animal Genome Conferences.
2. Applied Plant Breeders in precision agriculture: Presented posters at the Plant and Animal Genome Conference and the National Association of Plant Breeders Conference.
3. Underserved communities in science, technology, engineering, and mathematics (STEM): National Convention for National Society of Black Engineers and Society for the Advancement of Chicanos/Hispanics and Native Americans in Science (SACNAS) National Diversity in STEM Conference.
Changes/Problems:
Taylor graduated in Spring of 2024, 10 months before this Fellowship ended. Taylor was offered and accepted a position as a Senior Data Scientist working in Applied Biological AI & Strategy & Gene Editing at Corteva Agriscience.
What opportunities for training and professional development has the project provided?
Mentoring: One-on-one mentorship program with a Bayer Crop Sciences professional; weekly 2-hour writing group with Black PhD students
Workshops: Scientific Writing for Journal Articles, Creating an Industry-Ready Resume
Seminars: Cornell Plant Breeding and Genetics Seminar, Cornell Department of Computational Biology Seminars
Attendance of Conferences: National Society of Black Engineers 49th Convention, Plant and Animal Genome Conference, Corteva DELTA, National Association of Plant Breeders, Society for the Advancement of Chicanos/Hispanics and Native Americans in Science conference
How have the results been disseminated to communities of interest?
In 2022, I gave poster presentations at the International Conference on Arabidopsis Research, the National Society of Black Engineers 49th Convention, and the Plant and Animal Genome Conference. To share my research and give K-12 students an idea of scientific life, I gave a Black History Month presentation for Black and Indigenous students of the Ithaca area middle school.
To share findings with the greater computational biology community, I gave a student seminar to the Cornell University Department of Computational Biology and at the AfroBiotech Symposium. In 2023, I presented my research at the Plant and Animal Genome Conference, the National Convention for the National Society of Black Engineers, the National Association of Plant Breeders Conference, and the Society for the Advancement of Chicanos/Hispanics and Native Americans in Science (SACNAS) National Diversity in STEM Conference. My paper on regulatory network-based machine learning for gene expression was posted to bioRxiv, a preprint server, to allow open access to my work and results. In 2024, I gave a dissertation seminar to the Cornell community on my thesis work, "Learning Regulatory Contributions to Gene Expression Variation in Grasses."
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
I developed PLExBench, the first benchmark suite designed for evaluating gene expression prediction in plants, focusing on Arabidopsis thaliana and Zea mays. PLExBench provides a standardized set of tasks that enables rigorous comparison of state-of-the-art prediction methods, offering a critical tool for assessing their strengths and weaknesses. Additionally, I applied deep learning models to dissect cis-regulatory contributions to tissue-specific gene expression in maize. Leveraging perturbation-driven experimental data, I systematically uncovered key regulatory mechanisms and demonstrated the power of deep learning approaches in accurately predicting tissue-specific gene expression profiles.
Publications
- Type:
Other Journal Articles
Status:
Published
Year Published:
2024
Citation:
Wrightsman T, Ferebee TH, Romay MC, AuBuchon-Elder T, Phillips AR, Syring M, Kellogg EA, Buckler ES (2024). Current genomic deep learning architectures generalize across grass species but not alleles. bioRxiv. https://doi.org/10.1101/2024.04.11.589024
- Type:
Theses/Dissertations
Status:
Published
Year Published:
2024
Citation:
Ferebee TH (2024). Learning Regulatory Contributions to Gene Expression Variation in Grasses.
|
Progress 01/01/23 to 12/31/23
Outputs Target Audience:
1. Computational Biology Community: Presented posters and student seminars to the Cornell computational biology program and at the Plant and Animal Genome Conferences.
2. Applied Plant Breeders in precision agriculture: Presented posters at the Plant and Animal Genome Conference and the National Association of Plant Breeders Conference.
3. Underserved communities in science, technology, engineering, and mathematics (STEM): National Convention for National Society of Black Engineers and Society for the Advancement of Chicanos/Hispanics and Native Americans in Science (SACNAS) National Diversity in STEM Conference.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Mentoring: One-on-one mentorship program with a Bayer Crop Sciences professional; weekly 2-hour writing group with Black PhD students
Workshops: Scientific Writing for Journal Articles, Creating an Industry-Ready Resume
Seminars: Cornell Plant Breeding and Genetics Seminar, Cornell Department of Computational Biology Seminars
Attendance of Conferences: National Society of Black Engineers 49th Convention, Plant and Animal Genome Conference, Corteva DELTA, National Association of Plant Breeders, Society for the Advancement of Chicanos/Hispanics and Native Americans in Science conference
How have the results been disseminated to communities of interest?
In 2023, I presented my research at the Plant and Animal Genome Conference, the National Convention for the National Society of Black Engineers, the National Association of Plant Breeders Conference, and the Society for the Advancement of Chicanos/Hispanics and Native Americans in Science (SACNAS) National Diversity in STEM Conference. My paper on regulatory network-based machine learning for gene expression was posted to bioRxiv, a preprint server, to allow open access to my work and results.
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
I developed PLExBench, the first benchmark suite designed for evaluating gene expression prediction in plants, focusing on Arabidopsis thaliana and Zea mays. PLExBench provides a standardized set of tasks that enables rigorous comparison of state-of-the-art prediction methods, offering a critical tool for assessing their strengths and weaknesses. Additionally, I applied deep learning models to dissect cis-regulatory contributions to tissue-specific gene expression in maize. Leveraging perturbation-driven experimental data, I systematically uncovered key regulatory mechanisms and demonstrated the power of deep learning approaches in accurately predicting tissue-specific gene expression profiles.
Publications
- Type:
Other Journal Articles
Status:
Published
Year Published:
2023
Citation:
Schulz AJ, Zhai J, AuBuchon-Elder T, El-Walid M, Ferebee TH, Gilmore EH, Hufford MB, Johnson LC, Kellogg EA, La T, Long E, Miller ZR, Romay MC, Seetharam AS, Stitzer MC, Wrightsman T, Buckler ES, Monier B, Hsu SK (2023). Fishing for a reelGene: evaluating gene models with evolution and machine learning. bioRxiv. https://doi.org/10.1101/2023.09.19.558246
- Type:
Other Journal Articles
Status:
Published
Year Published:
2023
Citation:
Ferebee TH, Buckler ES (2023). Exploring the utility of regulatory network-based machine learning for gene expression prediction in maize. bioRxiv https://doi.org/10.1101/2023.05.11.540406
- Type:
Peer Reviewed Journal Articles
Status:
Published
Year Published:
2023
Citation:
Khaipho-Burch M, Ferebee T, Giri A, Ramstein G, Monier B, Yi E, Romay MC, Buckler ES (2023). Elucidating the patterns of pleiotropy and its biological relevance in maize. PLoS Genetics 19(3):e1010664. https://doi.org/10.1371/journal.pgen.1010664.
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2023
Citation:
Ferebee, T. H., & Buckler, E. S. On the effectiveness of graph neural networks for maize gene expression prediction [Poster]. Plant and Animal Genome Conference, San Diego, United States of America.
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2023
Citation:
Ferebee, T. H., & Buckler, E. S. Applications of machine learning for understanding regulatory contributions to gene expression variation in grasses [Poster]. National Society of Black Engineers 49th Convention, Kansas City, United States of America.
|
Progress 01/01/22 to 12/31/22
Outputs Target Audience:
I have interacted with the following communities:
Computational Biology Community: Presented posters and student seminars to the Cornell computational biology program and at the Plant and Animal Genome Conferences and the International Conference on Arabidopsis Research.
Applied Plant Breeders in precision agriculture: Presented posters to the Plant and Animal Genome Conference and the International Conference on Arabidopsis Research communities.
Underserved communities in science, technology, engineering, and mathematics (STEM): Gave a 30-minute talk to Black Women in Computational Biology and to Black and Indigenous elementary students of the Ithaca area, and presented a poster at the 49th National Convention for National Society of Black Engineers.
Changes/Problems:
I contracted COVID-19, which slowed progress for around 9 weeks. I plan to catch up on the unmet goals in the next reporting period.
What opportunities for training and professional development has the project provided?
Mentoring: One-on-one mentorship program with a Bayer Crop Sciences professional; weekly 2-hour writing group with Black PhD students
LinkedIn Learning Modules: Crafting and Sharing Your Personal Story, Creating the Perfect Elevator Pitch, PowerPoint for Mac Essential Trainings, 8 Easy Ways to Make Your PowerPoint Stand Out, Intermediate SQL and Data Analysis
Workshops: Scientific Writing for Journal Articles, Creating an Industry-Ready Resume
Seminars: Cornell Plant Breeding and Genetics Seminar, Cornell Department of Computational Biology Seminars, Resisting Erasure: Black Woman Scholars In Defense of Themselves, Application of Deep Learning on Graphs
Attendance of Conferences: National Society of Black Engineers 49th Convention, Plant and Animal Genome Conference, International Conference on Arabidopsis Research
How have the results been disseminated to communities of interest?
I have given poster presentations at the International Conference on Arabidopsis Research, the National Society of Black Engineers 49th Convention, and the Plant and Animal Genome Conference. To share my research and give K-12 students an idea of scientific life, I have given a Black History Month presentation for Black and Indigenous students of the Ithaca area middle school. To share findings with the greater computational biology community, I have given a student seminar to the Cornell University Department of Computational Biology and at the AfroBiotech Symposium.
What do you plan to do during the next reporting period to accomplish the goals?
Goals for the next reporting period include 1) developing and testing the next set of models for predicting gene expression and 2) preparing journal submissions. For the first goal, we would first like to test feature representations for the regulatory sequences. These representations will allow us to ask questions about the effects of perturbations in regulatory features, which is one of the goals not yet met. The models to be tested are autoencoders, local linear embeddings, and matrix factorizations. Next, we will use the expression data to determine plant contexts, such as plant tissue. We will then combine the best-performing sequence representations and the plant context expression data to predict gene expression between different contexts. This portion of the model will provide important information on how a possible dysregulation could impact the plant, as stated in one of the unmet objectives. Finally, to complete the second main goal, these results will be gathered into a research journal article for publication.
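As a rough illustration of the matrix factorization option named in the plan above, the sketch below embeds regulatory sequences in a low-dimensional space. It is not the project's implementation. Representing each sequence by its k-mer counts, the choice of k, the use of a truncated SVD, and the function names `kmer_counts` and `svd_embedding` are all assumptions made for this example, and the sequences are toy data.

```python
from itertools import product

import numpy as np


def kmer_counts(sequences, k=3):
    """Count every DNA k-mer in each sequence (rows: sequences, columns: k-mers).

    Assumption for this sketch: sequences are summarized by k-mer counts.
    Windows containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros((len(sequences), len(kmers)))
    for row, seq in enumerate(sequences):
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            col = index.get(seq[start:start + k])
            if col is not None:
                counts[row, col] += 1
    return counts


def svd_embedding(matrix, n_components=2):
    """Factorize the column-centered matrix with an SVD and keep the top components."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy regulatory sequences (hypothetical, for illustration only).
toy_sequences = [
    "ACGTACGTTTGACA",
    "ACGTACGTTTGACC",
    "GGGCCCGGGCCCAA",
    "TATAAATATAAAGG",
]
embedding = svd_embedding(kmer_counts(toy_sequences, k=3), n_components=2)
print(embedding.shape)
```

Each row of `embedding` is a compact representation of one sequence. Editing a sequence and recomputing its row is one simple way to ask how a perturbation in a regulatory feature moves the representation.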
Impacts What was accomplished under these goals?
In this reporting period, we aimed to explore the efficacy of using machine learning (ML) to predict gene expression in maize using data from sorghum, rice, and thale cress. Gene expression prediction models can give genetic engineers and breeders insight into the interactive effects of modifying gene targets. To this end, we created machine learning gene expression prediction models (graph autoencoders) based on the structure of the relationships between genes. As a baseline for comparison, we also evaluated a simpler model with no network information. To address our first goal of data collection and processing, we collected RNA sequencing data from Zea mays, Sorghum bicolor, and Oryza sativa. We also collected a gene regulatory network, heavily verified with experimental data, from thale cress. We used these data to predict gene expression within and between the species. Within species, non-network models and network deep learning models predicted abundance with correlations of 0.7 and 0.6, respectively. When predicting out-of-species expression abundance in unseen experiments, the network deep learning model predicted expression with a correlation of 0.5. Our results suggest that gene expression prediction accounting for discrete biological network structure can be used to predict target gene expression values in unseen experiments with moderate accuracy. For the next set of aims, we collected gene expression, transcription-factor binding sites, and regulatory sequences from over 30 grass species so that we can begin to predict gene expression in those species. We performed quality control on the sequencing data and built and implemented a gene expression quantification pipeline.
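The correlations reported above compare predicted and observed expression abundance. A minimal sketch of that kind of evaluation follows. It assumes Pearson correlation computed separately for each held-out experiment, which the report does not specify, and the function names and toy values are hypothetical.

```python
import numpy as np


def pearson_r(observed, predicted):
    """Pearson correlation between two 1-D arrays (NaN if either is constant)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    o = observed - observed.mean()
    p = predicted - predicted.mean()
    denom = np.sqrt((o ** 2).sum() * (p ** 2).sum())
    if denom == 0:
        return float("nan")
    return float((o * p).sum() / denom)


def per_experiment_correlation(observed, predicted):
    """Correlate predictions with observations one experiment (column) at a time.

    Assumption for this sketch: rows are genes and columns are experiments.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.array([
        pearson_r(observed[:, j], predicted[:, j])
        for j in range(observed.shape[1])
    ])


# Toy values (hypothetical): 4 genes measured in 2 unseen experiments.
observed = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
predicted = np.array([[1.2, 4.0], [1.9, 3.5], [3.5, 2.0], [3.8, 2.5]])
print(per_experiment_correlation(observed, predicted))
```

Scoring the network model and the non-network baseline on the same held-out experiments in this way would keep the comparison like for like.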
Publications
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2023
Citation:
Ferebee, T. H., & Buckler, E. S. Applications of machine learning for understanding regulatory contributions to gene expression variation in grasses [Poster]. National Society of Black Engineers 49th Convention, Kansas City, United States of America.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2023
Citation:
Ferebee, T. H., & Buckler, E. S. On the effectiveness of graph neural networks for maize gene expression prediction [Poster]. Plant and Animal Genome Conference, San Diego, United States of America.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2022
Citation:
Ferebee, T. H., & Buckler, E. S. Cross-species prediction of maize gene expression using graph neural networks [Poster]. International Conference on Arabidopsis Research, Belfast, Northern Ireland.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2022
Citation:
Ferebee, T. H., & Buckler, E. S. Cross-species prediction of maize gene expression using graph neural networks [Talk]. Cornell Computational Biology Seminar Series, Ithaca, United States of America.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2022
Citation:
Ferebee, T. H., & Buckler, E. S. Cross-species prediction of maize gene expression using graph neural networks [Talk]. AfroBiotech Conference, Virtual, United States of America.
|