Cryo-EM and Protein Sequence Representation Learning
Project overview
Investigated multimodal representation learning between cryo-EM density maps and protein sequences. The goal was to study whether 3D structural density information and amino-acid sequence information could be aligned in a shared latent space.
Problem
Aligning the structural-density and sequence modalities is hard when only a limited number of labeled map-sequence pairs is available.
Data modalities
Cryo-EM density maps and protein sequences.
Approaches tried
A contrastive learning baseline and JEPA-style (Joint-Embedding Predictive Architecture) latent prediction.
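For illustration, a contrastive baseline of this kind is often built on a symmetric InfoNCE objective. The sketch below is a minimal standard-library version under stated assumptions, not the project's actual implementation: the function names, the temperature of 0.1, and the convention that row i of each input comes from the same protein are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax_diag(logits):
    """For each row i, return the log-probability assigned to column i."""
    out = []
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        out.append(row[i] - log_z)
    return out


def symmetric_info_nce(map_emb, seq_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired embeddings.

    Assumption: map_emb[i] and seq_emb[i] belong to the same protein, and
    every other pairing in the batch is treated as a negative.
    """
    maps = [l2_normalize(v) for v in map_emb]
    seqs = [l2_normalize(v) for v in seq_emb]
    # Cosine similarities between every map and every sequence, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(m, s)) / temperature for s in seqs]
        for m in maps
    ]
    logits_t = [list(col) for col in zip(*logits)]
    map_to_seq = -sum(log_softmax_diag(logits)) / len(logits)
    seq_to_map = -sum(log_softmax_diag(logits_t)) / len(logits_t)
    return 0.5 * (map_to_seq + seq_to_map)


# Toy check: correctly paired embeddings should score a lower loss than mismatched ones.
aligned_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In practice the embeddings would come from two trained encoders and the loss would be computed in an autodiff framework; the plain-Python form here only shows the arithmetic.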
Findings
Model selection and regularization choices were driven mainly by the limited dataset size and by diagnostics for representation collapse and overfitting.
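One common way to check for representation collapse is to track the per-dimension standard deviation of the normalized embeddings across a batch: values near zero mean every input maps to nearly the same point. The sketch below assumes this particular diagnostic, which the summary does not specify; the function names and the 1e-3 threshold are hypothetical.

```python
import math
from statistics import pstdev


def embedding_std_per_dim(embeddings):
    """Per-dimension population standard deviation of L2-normalized embeddings."""
    normed = []
    for v in embeddings:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        normed.append([x / norm for x in v])
    # zip(*normed) iterates over embedding dimensions (columns).
    return [pstdev(col) for col in zip(*normed)]


def looks_collapsed(embeddings, threshold=1e-3):
    """Flag a batch whose mean per-dimension std is below a hypothetical threshold."""
    stds = embedding_std_per_dim(embeddings)
    return sum(stds) / len(stds) < threshold


# Toy check: identical embeddings are flagged, well-spread embeddings are not.
collapsed_batch = [[1.0, 2.0, 3.0]] * 4
spread_batch = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
collapsed_flag = looks_collapsed(collapsed_batch)
spread_flag = looks_collapsed(spread_batch)
```

Logging this value for both encoders during training, alongside the gap between training and validation loss, gives a cheap early signal of collapse or overfitting on a small dataset.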