Research

I work in multidisciplinary teams to develop and apply AI/ML, as well as mathematical and computational technologies, to biology and health. Rapid advances in areas like machine learning-based artificial intelligence (AI) mean that we have a larger and more powerful range of tools than ever before to support scientific advances in biology and translate them into impact in areas like personalised medicine.

I recently joined the University of Melbourne (School of Mathematics & Statistics) as a Research Fellow in Biological Data Science, where I'm working within Melbourne Integrative Genomics (MIG) and the ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS) to advance AI-related research. I am working with Dr. Heejung Shim (Shim Lab) on statistical and machine learning methods to analyse complex and large-scale biological data.

My PhD research developed probabilistic machine learning and deep learning methods for personalised medicine applications. It aimed to meet the statistical inference needs of individual-level analyses while making effective use of large datasets that capture the complex relationships explaining patient outcomes. My PhD was advised by Prof. Samuel Kaski (Aalto University, University of Manchester) at the Finnish Center for Artificial Intelligence (FCAI) and the Probabilistic Machine Learning (PML) group at Aalto University, in collaboration with the INTERVENE consortium and the Institute for Molecular Medicine Finland (FIMM). This work made extensive use of large-scale health and biological data sources (genetic biobanks, population-scale health registers, electronic health records, etc.), including the FinRegistry, UK Biobank and FinnGen datasets.

I started my research career at the University of Sydney (School of Mathematics and Statistics), where I received First Class Honours in Applied Mathematics and worked with Prof. Eduardo Altmann and Dr. Lamiae Azizi on complex systems and network science research.

Publication list

Preprint

Sophie Wharrie, Lisa Eick, Lotta Mäkinen, Andrea Ganna, Samuel Kaski, FinnGen, Bayesian Meta-Learning for Improving Generalizability of Health Prediction Models With Similar Causal Mechanisms, arXiv preprint, 2024, https://arxiv.org/abs/2310.12595

Journal/conference publication

Sophie Wharrie, Zhiyu Yang, Andrea Ganna, Samuel Kaski. (2023). Characterizing personalized effects of family information on disease risk using graph representation learning. Proceedings of the 8th Machine Learning for Healthcare Conference, New York, USA, in Proceedings of Machine Learning Research (PMLR), 219:824-845. https://proceedings.mlr.press/v219/wharrie23a.html

Journal/conference publication

Sophie Wharrie, Zhiyu Yang, Vishnu Raj, Remo Monti, Rahul Gupta, Ying Wang, Alicia Martin, Luke J O’Connor, Samuel Kaski, Pekka Marttinen, Pier Francesco Palamara, Christoph Lippert, Andrea Ganna, HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes, Bioinformatics, Volume 39, Issue 9, September 2023, https://doi.org/10.1093/bioinformatics/btad535

Journal/conference publication

Sophie Wharrie, Lamiae Azizi, Eduardo G. Altmann, Micro-, meso-, macroscales: The effect of triangles on communities in networks, Physical Review E, Volume 100, Issue 2, August 2019, https://link.aps.org/doi/10.1103/PhysRevE.100.022315

Workshop paper

Sophie Wharrie, Zhiyu Yang, Vishnu Raj, Remo Monti, Rahul Gupta, Ying Wang, Alicia Martin, Luke J O’Connor, Samuel Kaski, Pekka Marttinen, Pier Francesco Palamara, Christoph Lippert, Andrea Ganna, HAPNEST: An efficient tool for generating large-scale genetics datasets from limited training data, NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research, New Orleans, USA, 2022

Journal/conference publication

Remo Monti, Lisa Eick, Georgi Hudjashov, Kristi Läll, Stavroula Kanoni, Brooke N Wolford, Benjamin Wingfield, Oliver Pain, Sophie Wharrie, Bradley Jermy, Aoife McMahon, Tuomo Hartonen, Henrike O Heyne, Nina Mars, Genes & Health Research Team, Kristian Hveem, Michael Inouye, David A van Heel, Reedik Mägi, Pekka Marttinen, Samuli Ripatti, Andrea Ganna, Christoph Lippert, Evaluation of polygenic scoring methods in five biobanks reveals greater variability between biobanks than between methods and highlights benefits of ensemble learning. The American Journal of Human Genetics, 111.7 (2024): 1431-1447.

© 2025 Sophie Wharrie