Open research projects
The projects listed below are part of the upcoming call for applications, opening on August 19. Additional projects may be added throughout the call period.
Adaptive Learning Dynamics of the Immune System
Abstract
The goal of this project is to invert the paradigm of applying extrinsic AI/ML algorithms to practically understand biological data. Instead, we want to study how intrinsic biological learning functions work by employing recent theoretical understanding of the dynamics of artificial neural networks. The main setting we are going to investigate is the immune system. Recent theoretical progress suggests that self-adaptation capabilities could be key to enhancing its function considerably. Yet, this requires a better understanding of the main mechanisms, obtained by combining adaptive network models with statistical data assimilation techniques for macroscopic observables. In this project we aim to lay substantial groundwork for this program, combining methods from biology, computation, data science, dynamics, and machine learning. We are going to focus on building benchmark models for this context and validating them via simulation with forward uncertainty propagation and bifurcation analysis.
Domain: Medicine & Health / Life Sciences
Supervisors: Christian Kuehn, TUM, Fabian Theis, Helmholtz Munich/TUM
AI-Guided Design of Disordered Protein Regions
Abstract
Intrinsically disordered protein regions (IDRs) are widespread in the human proteome but defy the classical structure–function paradigm. IDRs often engage in regulatory interactions with structured protein domains, where their dynamic interfaces can modulate access to functional sites, regulate enzymatic activity, or mediate complex assembly. The goal of this project is to develop novel computational strategies for designing IDRs that dynamically interact with structured protein domains across kinetic and thermodynamic ranges. To this end, the project will explore combinations of state-of-the-art generative models to design IDR sequences with tunable regulatory properties. Experimental validation will be performed using biochemical assays and solution NMR spectroscopy to assess binding affinity, kinetics, and specificity of the designed IDRs. These experiments will be primarily carried out by a dedicated postdoc in close collaboration with the PhD candidate. The doctoral candidate will focus on computational model development, while integrating experimental feedback to iteratively improve model performance in a “lab-in-the-loop” workflow.
Domain: Medicine & Health / Life Sciences
Supervisors: Iva Pritisanac, Helmholtz Munich, Thomas Reid Alderson, Helmholtz Munich
Closed-loop dynamical control of brain activation patterns in mice and neuronal organoids
Abstract
This project will develop a predictive model of brain activation patterns using calcium imaging and electrophysiology data from mouse cortex and human organoids. Combining global pharmacological modulation with precise sensory and optogenetic stimulation will enhance the contrastive learning framework for dynamics identification developed by the Schneider Laboratory, enabling it to recognize increasingly complex neural patterns (goal 1). The project will then deploy a predictive model for closed-loop control of targeted brain regions (goal 2). The study will validate the model’s generalizability while developing applications for memory restoration through pattern-specific interventions. The methodology parallels clinical neuroimaging approaches (fMRI/TMS), with the potential for advancing personalized neuromodulation therapies. Additionally, findings will inform human organoid engineering for tissue replacement and brain-machine interface applications.
Domain: Medicine & Health / Life Sciences
Supervisors: Steffen Schneider, Helmholtz Munich, Gil Westmeyer, Helmholtz Munich/TUM
Statistical Methods for Multi-Modal Data Analysis in Human Disease Research
Abstract
Non-communicable diseases (NCDs) are responsible for 70% of global deaths, with cardiovascular diseases, cancers, respiratory diseases, and diabetes being the most prevalent. The growing burden of these diseases necessitates a shift from reactive treatment to predictive and preventive healthcare. This PhD research aims to develop novel statistical methodologies to analyze multi-modal longitudinal omics data, integrating infrared (IR) molecular fingerprinting, mass spectrometry (MS)-based proteomics, and nuclear magnetic resonance (NMR)-based metabolomics. Using data from the Health for Hungary (H4H) and German National Cohort (NAKO) studies, this research will focus on statistical trajectory modeling and machine learning to detect early disease markers and predict disease onset. The objectives include designing statistical models for disease progression, integrating multi-modal data sources, applying machine learning algorithms for disease prediction, optimizing statistical study design, and validating findings with independent datasets. The methodology includes, among other approaches, mixed-effects models and functional data analysis for trajectory modeling. Multi-modal data integration will leverage dimension reduction techniques. Machine learning approaches such as ensemble learning and interpretable AI methods will enhance predictive modeling. Statistical study design optimization will include sample size determination, missing data imputation, and cross-validation strategies. This research is expected to contribute novel statistical frameworks for disease trajectory analysis, improve predictive modeling of disease progression, and establish scalable methodologies for large-scale health studies. The outcomes will support early intervention strategies, enhance personalized healthcare, and contribute to global efforts in preventive medicine.
Domain: Life Sciences
Supervisors: Göran Kauermann, LMU, Ferenc Krausz & Kosmas Kepesidis, LMU, Annette Peters, Helmholtz Munich
Uncovering the mechanisms of lung remodeling following acute respiratory lung infection using spatial transcriptomics
Abstract
Lung anatomical structures are often severely damaged following an acute respiratory infection, sometimes leading to death in the most extreme cases. Yet, remarkably, the lung demonstrates a significant capacity for regeneration and repair. The mechanisms underlying this resilience remain poorly understood. In this project, we aim to develop computational models that integrate spatial, temporal, and perturbation data to better understand how the lung undergoes remodeling. Situated at the intersection of computational biology, clinical practice, and pathology, this project holds the potential to uncover novel therapeutic strategies to promote lung repair.
Domain: Medicine & Health / Life Sciences
Supervisors: Malte Lücken, Helmholtz Munich, Emmanuel Saliba, HIRI
Multimodal AI Models for Patient Stratification and Prognostic Biomarkers in Osteoarthritis
Abstract
Osteoarthritis (OA) affects over 500 million people worldwide, yet the development of disease-modifying treatments remains a major challenge due to the condition’s heterogeneity, slow progression, and complex regulatory landscape. Within the framework of the PROBE consortium, this PhD project aims to harness advanced artificial intelligence (AI) and multimodal data integration to improve patient stratification, prognosis, and the identification of novel endpoints for OA clinical trials. The candidate will analyze deeply phenotyped datasets spanning clinical, imaging, and multi-omics modalities, leveraging a secure federated data platform developed within PROBE. Using methods ranging from advanced probabilistic models such as multi-omics factor analysis to multi-modal AI models trained with contrastive learning, the project will generate latent representations that act as composite biomarkers, capturing the complexity of OA progression. These representations will enable the unsupervised identification of patient subgroups, prediction of disease trajectories, and translation into scalable proxy biomarkers for larger cohorts. Close collaboration with consortium partners and iterative alignment with uni-modal foundation models developed in other PROBE work packages will ensure methodological robustness and clinical relevance. Ultimately, this project will deliver both methodological advances in AI for multimodal biomedical data and translational insights to guide therapeutic development, supporting more effective and patient-centered treatment strategies for OA.
Domain: Life Sciences
Supervisors: Matthias Heinig, Helmholtz Munich, Eleftheria Zeggini, Helmholtz Munich
Decoding and targeting the PDAC ecosystem DEFEAT-PDAC
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest and most therapy-resistant cancers, characterized by late detection, rapid metastasis, and poor response to conventional treatments. Standard chemotherapy has remained largely unchanged over the past three decades, offering only marginal survival benefits (6–12 months) and causing severe side effects. Most patients are ineligible for surgery, and recurrence is almost inevitable. While immunotherapies and targeted treatments have advanced outcomes for other cancers, they have shown limited success in PDAC due to its complex and adaptable tumor ecosystem, composed of cancer cells, fibroblasts, and immune cells that collectively promote resistance and aggression. However, recent scientific and technological breakthroughs offer renewed hope. Innovations such as AI-guided drug discovery, protein and cell engineering, and single-cell analytics are opening new therapeutic avenues. Notably, the development of inhibitors of RAS, a target once considered undruggable, and evidence linking T cell activity to long-term survival suggest promising immunotherapeutic directions. Nonetheless, these approaches have so far benefitted only a small subset of patients, underscoring the need for deeper mechanistic insights and individualized strategies.
Domain: Medicine & Health
Supervisors: Fabian Theis, Helmholtz Munich/TUM, Dieter Saur, TUM
Deep Learning Integration of Genomic Sequences, Transcriptomics and Interaction Networks for Phenotype Prediction in Eukaryotes - PhenoPred
Abstract
Predicting phenotypes from genotypes is a grand challenge of biology with substantial translational implications. Using deep learning, we have already made significant progress in capturing genotype-phenotype relationships in prokaryotes. Due to the unique amount of phenotypic and molecular data for essentially all molecular modalities, S. cerevisiae is the ideal model to translate these achievements to eukaryotes. We propose to develop and validate a multi-modal deep learning framework that builds on genomic sequences, transcriptomes, environmental parameters, and regulatory and physical interaction networks to predict growth phenotypes and cell cycle properties. Powerful, AI-ready datasets, e.g. describing 200 distinct phenotypes for 4,000 deletion mutants and unpublished annotated time-lapse imaging data, enable model training and validation. For robust integration across molecular modalities, we will develop new deep learning architectures for complex multimodal data, including hierarchical attention networks to capture relationships in molecular networks. By integrating foundation models pre-trained on large datasets, we will obtain immediate insights into yeast phenotypes and establish a scalable foundation for broader applications. Beyond immediate applications in infection research and biotechnology, our model will set the stage for future expansion towards human cells using transfer learning approaches and subsequent integrative models towards predictive medicine.
Domain: Life Sciences
Supervisors: Pascal Falter-Braun, Helmholtz Munich/LMU, Kurt Schmoller, Helmholtz Munich
Genomic traffic control – mapping and preventing transcription-replication conflicts in cancer genomes
Abstract
Cells must copy their DNA while simultaneously transcribing it, creating a fundamental scheduling problem: replication and transcription use large, processive molecular machines that often compete for access to the same template. When these machineries collide, transcription–replication conflicts (TRCs) can arise, contributing to replication stress and DNA damage, which can result in diseases like cancer. Yet we still lack genome-wide methods to map and predict where such conflicts occur.
This project takes a computational approach to investigate TRCs. We will integrate diverse multi-omic datasets on transcriptional activity and replication dynamics to build a predictive “roadmap” of how these processes interact. In parallel, we will generate ground-truth data with TRC-seq, a novel protocol based on Nanopore long-read sequencing technology, and develop new computational tools for analyzing the resulting data.
Our core innovation is a hybrid framework: an agent-based simulator of replisome–RNA polymerase II (RNAPII) dynamics coupled with machine learning models trained on genomic sequence and multi-omic tracks. By iteratively fitting simulations to experimental data through inference techniques, we will obtain both mechanistic insight and predictive power. Finally, applying these models to cancer genomes will reveal how TRCs drive mutational patterns and epigenomic alterations.
Beyond the biological problem, this project addresses major computational challenges in modeling stochastic genome-scale processes, integrating heterogeneous omics data, and analyzing single-molecule long-read sequencing. It aims to establish generalizable methods at the interface of computational biology, machine learning, and genome science.
Domain: Life Sciences
Supervisors: Antonio Scialdone, Helmholtz Munich/LMU, Stephan Hamperl, Helmholtz Munich
Towards a multiscale tissue foundation model
Abstract
The rapid decrease in genomic sequencing costs has led to an abundance of molecular data, including omics profiles and spatial omics. These data are often matched with histopathological slides and, in part, with coarser-grained medical imaging of related organ tissue. Traditional modelling methods struggle to capture the complexity of these biological systems, prompting interest in developing multimodal foundation models that integrate diverse data types to tackle broad biomedical challenges. We have recently developed such models on the cellular and local tissue level. Here we propose to combine expertise on AI models on the cellular level with the larger scale of histological and organ architecture data. We aim to build a multi-scale foundation model that integrates molecular, spatial omics, histopathology, and imaging data to offer a holistic view of cellular processes and tissue microenvironments. Leveraging in-house and public data with >1 billion imaging readouts, >150 million omics profiles, 100,000+ H&E-stained images and MR images of tissue microstructure, combined with scalable computing resources, we aim to build such a model to advance biological understanding and health research. Initially we will start with a smaller-scale integration to combine data across scales along certain bimodal samples that allow imputation (‘bridging’) of modalities. Later we propose to follow up across the large sample base towards a foundational embedding. The resulting model is not only exciting for application reasons but also allows us to study complex computational challenges such as how to model partially observed modalities, how to do robust imputation including uncertainty mapping, and how to bridge scales using approaches from weak supervision and model distillation.
Domain: Life Sciences
Supervisors: Fabian Theis, Helmholtz Munich, Daniel Rückert, TUM
AI Augmented Target Discovery in Translational Kidney Disease
Abstract
We propose that artificial intelligence (AI) models trained on large-scale datasets integrating genetic and compound perturbations in kidney organoids—analyzed through single-cell RNA sequencing and complementary modalities such as imaging—can identify perturbations that shift disease phenotypes toward healthy states. This approach aims to advance our understanding of kidney disease mechanisms and support the design of next-generation organoid models. Our broader goal is to establish a comprehensive translational kidney disease atlas that connects organoid and mouse model data with human disease biology, providing a shared resource for both experimental and computational researchers. In this project, we aim to develop and refine AI models capable of guiding therapeutic discovery by predicting interventions that restore cellular health.
Domain: Medicine & Health
Supervisors: Fabian Theis, Helmholtz Munich, Malte Lücken, Helmholtz Munich
MemBrain-structure: AI-driven End-to-End Pipeline for Membrane Protein Structure Determination by Cryo-ET
Abstract
Cryo-electron tomography (cryo-ET) enables visualization of macromolecular structures in their native cellular environment, a key goal in structural biology and pharmacology. However, high-resolution structures of membrane proteins are rare due to the manual effort needed to identify, align, and average low-contrast particles. MemBrain-structure combines AI from Helmholtz Munich with cryo-ET development at MDC Berlin to create the first generalizable, end-to-end pipeline converting raw tilt-series into membrane protein structures. We will generate training data and develop a supervised, then generalizable, model for detecting and aligning subtomograms. The model will function standalone and integrate into TomoBEAR, an established cryo-ET processing workflow. All data, models, and documentation will be open-source, enabling broad community adoption. This workflow will streamline structure determination of key membrane proteins and enhance our understanding of disease-relevant biological processes.
Domain: Life Sciences
Supervisors: Tingying Peng, Helmholtz Munich, Mikhail Kudryashev, Max Delbrück Center for Molecular Medicine
Deep Learning the Genetic Code: AI-Driven Design of Cell-Type Specific Gene Therapies
Abstract
This doctoral project offers an unparalleled opportunity at the cutting edge of Artificial Intelligence and Genomic Medicine, focused on designing next-generation, cell-type-specific gene therapies. The core challenge is to computationally decipher and engineer the complex genetic "regulatory code": the sequences (promoters, enhancers, UTRs) that dictate when and where a synthetic gene is expressed, a crucial step for therapeutic efficacy and safety. The successful candidate will join a unique, high-value collaboration between two leading institutions: the methodological expertise of Helmholtz Center Munich (HMGU) in developing deep learning models for biological sequences, and the translational power of Roche in Genomic Medicine and clinical development. The PhD candidate will develop and adapt cutting-edge Deep Learning and DNA language models (such as Borzoi and Enformer) to design and optimize recombinant adeno-associated virus (rAAV) payloads for precise expression. A key benefit for the candidate is direct access to Roche’s extensive, proprietary omics data and advanced synthetic biology screening platforms, providing real-world feedback to fuel a design-build-test-learn cycle. The direct link to clinical development offers a profound sense of purpose: successful designs have a clear path toward the clinic, allowing the candidate's research to make a tangible difference for patients suffering from previously untreatable diseases. This position trains a future leader at the intersection of data science, synthetic biology, and translational medicine, placing them at the forefront of AI-guided therapeutic development.
Domain: Medicine & Health
Supervisors: Johannes Lindner/Julien Gagneur, TUM, Simon Ausländer, Fabian Schmich, Roche
