I develop artificial intelligence and machine learning (AI/ML), as well as mathematical and computational technologies, for biology and health. I am a Research Fellow in Biological Data Science at the University of Melbourne, and was previously a Doctoral Researcher at the Finnish Center for Artificial Intelligence (FCAI).

About Me

RESEARCH

AI in biotech & healthtech

My research develops AI/ML and mathematical and computational technologies for data-driven innovation in biology and health


Learn More

research highlights

Improving Generalisability of Health Prediction Models

Machine learning is a powerful tool for health prediction, but models often generalise poorly when making predictions for new patients. A promising strategy is to pool learning across related supervised learning tasks. A new Bayesian meta-learning approach is introduced that also models the similarity between the causal mechanisms of the tasks. The approach is applied in a case study of stroke prediction using health record and genetics data from the UK Biobank and FinnGen datasets.

Learn More

Published in

Preprint, Under review

Keywords

Meta-learning, Bayesian hierarchical models, causal inference, deep learning, transfer learning, OOD generalisation, robust ML

Modeling disease risk in families with graph neural networks

Electronic health records (EHRs) spanning multiple generations offer a new way of examining health trends in families. In collaboration with the Institute of Molecular Medicine Finland, an AI system was developed to analyze a network of EHR data from over 7 million patients. The findings demonstrate that a geometric deep learning approach is beneficial for modeling the shared genetic, environmental, and lifestyle factors influencing disease risk in families.

Learn More

Published in

Machine Learning for Healthcare Conference (MLHC), PMLR 2023

Keywords

graph neural networks, geometric deep learning, deep learning for sequential data, electronic health records, genetics, familial factors of disease


Synthetic data for genetics research

HAPNEST is a new software tool that efficiently generates large synthetic datasets that closely mimic real genetic and phenotypic data. This work was carried out with the Europe-wide INTERVENE consortium, enabling researchers to test new computational methods for polygenic risk scoring across diverse ancestry groups while protecting sensitive health information. The software and a synthetic dataset of 6.8 million common variants and nine phenotypes for over 1 million individuals have been made publicly available.

Learn More

Published in

Bioinformatics

Keywords

computational biology, statistical genetics, simulation-based inference, generative modeling, polygenic risk scoring

BLOG

© 2025 Sophie Wharrie
