Cryo-EM and Protein Sequence Representation Learning

Project overview

Investigated multimodal representation learning between cryo-EM density maps and protein sequences. The goal was to test whether 3D structural density information and amino-acid sequence information can be aligned in a shared latent space.

Problem

Aligning the structural-density and sequence modalities is difficult when only a limited number of labeled map-sequence pairs is available.

Data modalities

Cryo-EM density maps and protein sequences.

Approaches tried

A contrastive learning baseline and JEPA-style (joint-embedding predictive architecture) latent prediction.
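
As a rough illustration of the two objectives, the sketch below places a symmetric InfoNCE loss, one common form of contrastive baseline, next to a latent-prediction loss in the JEPA style. It is not the project's code: the function names, the temperature of 0.07, the linear predictor, and the random arrays standing in for encoder outputs are all assumptions made for the example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(map_emb, seq_emb, temperature=0.07):
    """Contrastive loss; row i of map_emb and row i of seq_emb are assumed paired."""
    m = l2_normalize(map_emb)
    s = l2_normalize(seq_emb)
    logits = m @ s.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(logits))
    # Map -> sequence: each row should pick out its own column.
    loss_m2s = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Sequence -> map: each column should pick out its own row.
    loss_s2m = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_m2s + loss_s2m))

def latent_prediction_loss(context_emb, target_emb, predictor_weights):
    """JEPA-style objective: regress the target latent from the context latent.

    In a real training loop the target would be treated as a constant
    (stop-gradient); plain NumPy has no gradients, so that is implicit here.
    """
    predicted = context_emb @ predictor_weights  # hypothetical linear predictor
    return float(np.mean((predicted - target_emb) ** 2))

# Random stand-ins for encoder outputs (assumption: 8 pairs, 16-dim latents).
rng = np.random.default_rng(0)
seq = rng.normal(size=(8, 16))
maps = seq + 0.01 * rng.normal(size=(8, 16))    # nearly aligned with seq

aligned = symmetric_info_nce(maps, seq)         # correct pairing
shuffled = symmetric_info_nce(maps[::-1], seq)  # deliberately wrong pairing
identity_loss = latent_prediction_loss(seq, seq, np.eye(16))
```

The contrastive loss uses the other pairs in the batch as negatives, whereas the latent-prediction loss uses none. That difference is a commonly cited reason why prediction-based objectives need explicit monitoring for collapse.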

Findings

The small dataset was the main constraint. Diagnostics for representation collapse and overfitting were central to model selection and to the choice of regularization.
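
One way a collapse check can be written is sketched below. It reports the per-dimension standard deviation of a batch of embeddings and an effective rank computed from the entropy of the normalized singular-value spectrum. Both are common choices rather than the diagnostics confirmed for this project, and the function name, array shapes, and synthetic inputs are hypothetical.

```python
import numpy as np

def collapse_diagnostics(emb):
    """emb: (n_samples, dim) array of embeddings from a single encoder."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    per_dim_std = centered.std(axis=0)
    # Squared singular values of the centered batch give the variance spectrum.
    sing = np.linalg.svd(centered, compute_uv=False)
    energy = sing ** 2
    p = energy / max(energy.sum(), 1e-12)
    p = p[p > 0]                      # drop zeros so log(p) stays finite
    # Effective rank: exponential of the entropy of the normalized spectrum.
    effective_rank = float(np.exp(-(p * np.log(p)).sum()))
    return {
        "mean_std": float(per_dim_std.mean()),
        "min_std": float(per_dim_std.min()),
        "effective_rank": effective_rank,
    }

# Synthetic stand-ins (assumption: 256 samples, 32-dim latents).
rng = np.random.default_rng(0)
healthy = rng.normal(size=(256, 32))
# Nearly rank-1 embeddings imitate a dimensionally collapsed encoder.
collapsed = (rng.normal(size=(256, 1)) @ rng.normal(size=(1, 32))
             + 1e-3 * rng.normal(size=(256, 32)))

healthy_report = collapse_diagnostics(healthy)
collapsed_report = collapse_diagnostics(collapsed)
```

An effective rank close to the embedding dimension suggests the latent space is being used broadly, while a value near 1 indicates that almost all variance lies along a single direction.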

Links

PDF (placeholder URL) · Code (placeholder URL)