research

AI/ML for Health and Precision Medicine

Developing new methods and software for improving prevention, diagnosis and treatment of disease


Research Interests

Artificial intelligence and machine learning (AI/ML) tools can work alongside human experts to improve the prevention, diagnosis and treatment of disease. Applications range from drug discovery and more effective clinical trials to targeted treatment advice and digital health tools that help patients and clinicians personalise care.

I work with international teams of collaborators in the mathematical and computer sciences and the life sciences to develop AI/ML methods and software for healthcare and precision medicine. Developing machine learning solutions in this domain is challenging for several reasons: medical datasets vary considerably between patients, data privacy must be protected, and models need to be personalised, interpretable and easy to integrate into clinical and research workflows.

My research interests include:

Challenges for AI/ML in precision medicine: How can we make AI/ML fit for purpose, so that precision medicine can benefit from this technology? For example, I have worked with UK Biobank data to develop a new probabilistic machine learning technique that makes AI/ML work better for individual patients by modelling similarities and differences in the causal mechanisms of disease.

Deep learning for large-scale biomedical data: I worked with the Institute of Molecular Medicine Finland (FIMM) to develop deep learning methods for Finnish health registry data (nationwide electronic health records for more than 7 million patients). Our approach improves predictions of various health outcomes and aims to explain how family factors influence disease, by modelling the effect of family history as a graph representation learning task for time series data.

Building software tools for computational biology: For example, I work with the INTERVENE consortium to develop new software tools for genetic risk scoring and for generating synthetic datasets for genetics-based precision medicine applications.

Working in collaboration with

Academic publication list

Preprint

Sophie Wharrie, Samuel Kaski, Meta-Learning With Hierarchical Models Based on Similarity of Causal Mechanisms, arXiv preprint, 2024, https://arxiv.org/abs/2310.12595

Journal/conference publication

Sophie Wharrie, Zhiyu Yang, Andrea Ganna, Samuel Kaski, Characterizing personalized effects of family information on disease risk using graph representation learning, Proceedings of the 8th Machine Learning for Healthcare Conference, New York, USA, Proceedings of Machine Learning Research (PMLR), 219:824-845, 2023, https://proceedings.mlr.press/v219/wharrie23a.html

Journal/conference publication

Sophie Wharrie, Zhiyu Yang, Vishnu Raj, Remo Monti, Rahul Gupta, Ying Wang, Alicia Martin, Luke J O’Connor, Samuel Kaski, Pekka Marttinen, Pier Francesco Palamara, Christoph Lippert, Andrea Ganna, HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes, Bioinformatics, Volume 39, Issue 9, September 2023, https://doi.org/10.1093/bioinformatics/btad535

Journal/conference publication

Sophie Wharrie, Lamiae Azizi, Eduardo G. Altmann, Micro-, meso-, macroscales: The effect of triangles on communities in networks, Physical Review E, Volume 100, Issue 2, August 2019, https://link.aps.org/doi/10.1103/PhysRevE.100.022315

Workshop paper

Sophie Wharrie, Zhiyu Yang, Vishnu Raj, Remo Monti, Rahul Gupta, Ying Wang, Alicia Martin, Luke J O’Connor, Samuel Kaski, Pekka Marttinen, Pier Francesco Palamara, Christoph Lippert, Andrea Ganna, HAPNEST: An efficient tool for generating large-scale genetics datasets from limited training data, NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research, New Orleans, USA, 2022

Journal/conference publication

Remo Monti, Lisa Eick, Georgi Hudjashov, Kristi Läll, Stavroula Kanoni, Brooke N Wolford, Benjamin Wingfield, Oliver Pain, Sophie Wharrie, Bradley Jermy, Aoife McMahon, Tuomo Hartonen, Henrike O Heyne, Nina Mars, Genes & Health Research Team, Kristian Hveem, Michael Inouye, David A van Heel, Reedik Mägi, Pekka Marttinen, Samuli Ripatti, Andrea Ganna, Christoph Lippert, Evaluation of polygenic scoring methods in five biobanks reveals greater variability between biobanks than between methods and highlights benefits of ensemble learning, The American Journal of Human Genetics, 2024

© 2024 Sophie Wharrie