mirror of https://github.com/ami-sc/AgAdapt.git synced 2024-07-04 10:47:49 +02:00
Multimodal data fusion for maize phenotype prediction across environments
Emilio Soriano 2ba1cd3e5b
Add license
2023-11-22 15:10:26 -06:00
Code Add XGBoost Model Training script. 2022-04-19 08:46:07 -05:00
Diagrams Add Genotype Data Processing Diagram. 2022-05-26 03:28:37 -05:00
Notebooks Revert "SNP feature selection" 2022-11-16 17:20:06 -06:00
Results Revert "SNP feature selection" 2022-11-16 17:20:06 -06:00
.gitignore Ignore source files for diagrams. 2022-05-26 03:28:18 -05:00
AgAdapt_Logo.png Add project logo. 2022-02-06 11:51:04 -06:00
LICENSE Add license 2023-11-22 15:10:26 -06:00
README.md Add project description. 2022-02-06 11:51:13 -06:00

Multimodal Data Fusion for Maize Phenotype Prediction across Environments


Summary

The AgAdapt algorithm aims to predict phenotypes from multimodal data using as few predictor features as possible.

A challenging problem in biology is incorporating large-scale data from multiple sources into machine learning models that predict organism traits. We use deep-learning dimensionality reduction techniques to condense large datasets into a small set of meaningful predictor variables. Gradient-boosting regression models are then trained on those variables.
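The two-stage idea (condense the inputs, then boost on the condensed features) can be illustrated with a minimal, standard-library-only sketch. It is not the AgAdapt implementation: a tiny tied-weight linear autoencoder stands in for the deep-learning reduction step, hand-rolled decision stumps stand in for a gradient-boosting library such as XGBoost, and the data, function names, and hyperparameters are all hypothetical.

```python
import random

def encode(W, x):
    """Project one input vector onto the latent space: z = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train_autoencoder(X, n_latent=2, epochs=300, lr=0.01, seed=0):
    """Fit a tied-weight linear autoencoder (x_hat = W^T W x) by SGD."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_latent)]
    for _ in range(epochs):
        for x in X:
            z = encode(W, x)
            # Reconstruction error e = W^T z - x
            err = [sum(W[k][j] * z[k] for k in range(n_latent)) - x[j]
                   for j in range(n_in)]
            We = [sum(W[k][j] * err[j] for j in range(n_in))
                  for k in range(n_latent)]
            # Gradient of 0.5 * ||e||^2 with respect to W[k][j]
            for k in range(n_latent):
                for j in range(n_in):
                    W[k][j] -= lr * (z[k] * err[j] + We[k] * x[j])
    return W

def fit_stump(Z, residuals):
    """Pick the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(Z[0])):
        for t in sorted({z[j] for z in Z})[:-1]:  # keep the right side non-empty
            left = [r for z, r in zip(Z, residuals) if z[j] <= t]
            right = [r for z, r in zip(Z, residuals) if z[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def fit_boosting(Z, y, n_rounds=30, lr=0.3):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(Z, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + lr * (lm if z[j] <= t else rm) for p, z in zip(pred, Z)]
    return base, lr, stumps

def predict(model, z):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if z[j] <= t else rm) for j, t, lm, rm in stumps)

def mse(y, preds):
    return sum((a - b) ** 2 for a, b in zip(y, preds)) / len(y)

# Synthetic data (hypothetical): 6 observed features driven by 2 hidden factors.
rng = random.Random(1)
factors = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(60)]
mix = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(6)]
X = [[m[0] * a + m[1] * b + rng.gauss(0, 0.05) for m in mix] for a, b in factors]
y = [3 * a - 2 * b + rng.gauss(0, 0.1) for a, b in factors]

W = train_autoencoder(X)            # stage 1: learn a 2-dimensional encoding
Z = [encode(W, x) for x in X]       # condensed predictor variables
model = fit_boosting(Z, y)          # stage 2: boost on the condensed features

baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict(model, z) for z in Z])
print(f"baseline MSE: {baseline_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

The errors printed here are training errors on toy data, so they only show that the pipeline runs end to end. A real evaluation would hold out environments or genotypes and would likely replace both stand-ins with a deep-learning framework and a dedicated boosting library.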

Our AgAdapt algorithm can serve as a tool for efficient crop production and breeding.

References

General References

[1] McFarland, B.A., AlKhalifah, N., Bohn, M. et al. Maize genomes to fields (G2F): 2014–2017 field seasons: genotype, phenotype, climatic, soil, and inbred ear image datasets. BMC Res Notes 13, 71 (2020). https://doi.org/10.1186/s13104-020-4922-8

[2] C J Battey, Gabrielle C Coffing, Andrew D Kern, Visualizing population structure with variational autoencoders, G3 Genes|Genomes|Genetics, Volume 11, Issue 1, January 2021, jkaa036, https://doi.org/10.1093/g3journal/jkaa036

Software and Packages

[1] Peter J. Bradbury, Zhiwu Zhang, Dallas E. Kroon, Terry M. Casstevens, Yogesh Ramdoss, Edward S. Buckler, TASSEL: software for association mapping of complex traits in diverse samples, Bioinformatics, Volume 23, Issue 19, 1 October 2007, Pages 2633–2635, https://doi.org/10.1093/bioinformatics/btm308

[2] Jombart T (2008). adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics, 24, 1403-1405. https://doi.org/10.1093/bioinformatics/btn129

[3] Knaus, B.J. and Grünwald, N.J. (2017), vcfR: a package to manipulate and visualize variant call format data in R. Mol Ecol Resour, 17: 44-53. https://doi.org/10.1111/1755-0998.12549