DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM

¹ShanghaiTech University  ²Cellverse Co., Ltd.
*Indicates Equal Contribution, Indicates Corresponding Author
For pre-training, we construct a large-scale curated dataset containing 529 types of protein data with over 270,000 cryo-EM movies or micrographs. Based on this dataset, we present DRACO, a denoising-reconstruction autoencoder for cryo-EM. A pre-trained DRACO naturally serves as a generalizable cryo-EM image denoiser and as a foundation for adapting models to various downstream tasks such as micrograph curation and particle picking.

Abstract

Foundation models in computer vision have demonstrated exceptional performance on zero-shot and few-shot tasks by extracting multi-purpose features from large-scale datasets through self-supervised pre-training. However, these models often overlook how severely high levels of noise corrupt cryogenic electron microscopy (cryo-EM) images. We introduce DRACO, a Denoising-Reconstruction Autoencoder for CryO-EM, inspired by the Noise2Noise (N2N) approach. We process cryo-EM movies into odd and even images and treat them as independent noisy observations. We then apply a hybrid denoising-reconstruction training scheme, masking both images to create denoising and reconstruction tasks. Because dataset quality is essential for DRACO's pre-training, we build a high-quality, diverse dataset of over 270,000 movies or micrographs, curated from an uncurated public database. After pre-training, DRACO naturally serves as a generalizable cryo-EM image denoiser and as a foundation model for various downstream cryo-EM tasks. DRACO outperforms state-of-the-art baselines on denoising, micrograph curation, and particle picking. We will release the code, pre-trained models, and the curated dataset to stimulate further research.
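The odd/even splitting and masking described in the abstract can be sketched in NumPy. This is a minimal sketch under stated assumptions, not DRACO's implementation. The function names, the 16-pixel patch size, the 50% mask ratio, the zero-filling of masked patches, and the weight `lam` are all hypothetical. We also assume the model's output for the masked odd image is compared against the even image. Under that assumption, visible patches give an N2N-style denoising term and masked patches give an MAE-style reconstruction term.

```python
import numpy as np

def split_odd_even(movie: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Sum alternating frames of a (T, H, W) movie into two images.

    Each frame carries independent shot noise, so the two sums can be
    treated as independent noisy observations of the same specimen (N2N).
    """
    odd = movie[0::2].sum(axis=0)   # frames 1, 3, 5, ... (1-based)
    even = movie[1::2].sum(axis=0)  # frames 2, 4, 6, ...
    return odd, even

def random_patch_mask(h: int, w: int, patch: int, ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Patch-level boolean mask of shape (h // patch, w // patch); True = masked."""
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    n = (h // patch) * (w // patch)
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(round(ratio * n)), replace=False)] = True
    return mask.reshape(h // patch, w // patch)

def to_pixels(mask: np.ndarray, patch: int) -> np.ndarray:
    """Upsample a patch-level mask to pixel resolution."""
    return mask.repeat(patch, axis=0).repeat(patch, axis=1)

def hybrid_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray,
                patch: int, lam: float = 1.0) -> float:
    """Hypothetical hybrid objective: pred is the model output for the masked
    odd image, target is the (independently noisy) even image."""
    pix = to_pixels(mask, patch)
    err = (pred - target) ** 2
    denoise = err[~pix].mean()  # visible patches: N2N-style denoising term
    recon = err[pix].mean()     # masked patches: MAE-style reconstruction term
    return float(denoise + lam * recon)

rng = np.random.default_rng(0)
movie = rng.poisson(1.0, size=(8, 64, 64)).astype(np.float32)  # toy movie
odd, even = split_odd_even(movie)
mask = random_patch_mask(64, 64, patch=16, ratio=0.5, rng=rng)
masked_odd = np.where(to_pixels(mask, 16), 0.0, odd)  # model input (zero-filled)
pred = masked_odd  # placeholder for a network's prediction
loss = hybrid_loss(pred, even, mask, patch=16)
```

A symmetric variant would presumably also feed the masked even image and target the odd one. MAE-style encoders usually drop masked tokens rather than zero-filling them, so the zero-fill here is only a simplification for illustration.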

We visualize the denoising results of DRACO and state-of-the-art baselines. Our results show the largest SNR improvement without losing particle structure details. In contrast, low-pass filtering severely blurs the particles, MAE introduces strong patch-wise artifacts, and Topaz yields either only minor SNR improvements or blurred results.
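The page does not say how SNR was measured. One classic estimator from electron microscopy uses two independent noisy observations of the same signal. If x₁ = s + n₁ and x₂ = s + n₂, with independent zero-mean noise of equal variance, their correlation coefficient is ρ = σ²_s / (σ²_s + σ²_n), so SNR = ρ / (1 − ρ). The sketch below assumes this estimator; `cross_correlation_snr` is a hypothetical helper, not part of DRACO.

```python
import numpy as np

def cross_correlation_snr(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Estimate SNR from two noisy observations of the same signal.

    Assumes the noise in img_a and img_b is independent, zero-mean, and of
    equal variance; then rho = var(s) / (var(s) + var(n)) and
    SNR = var(s) / var(n) = rho / (1 - rho).
    """
    a = img_a - img_a.mean()
    b = img_b - img_b.mean()
    rho = float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
    return rho / (1.0 - rho)

# Synthetic check: unit-variance signal plus noise with std 2 -> true SNR 0.25.
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, size=(512, 512))
obs_a = signal + rng.normal(0.0, 2.0, size=signal.shape)
obs_b = signal + rng.normal(0.0, 2.0, size=signal.shape)
estimate = cross_correlation_snr(obs_a, obs_b)  # close to 0.25
```

Applying this to denoised outputs requires denoising the odd and even images independently. Even then, any structure the denoiser adds to both outputs is counted as signal, so such an estimate can be optimistic.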
We show the picking results of DRACO and the baselines on test datasets ranging from small transport proteins to huge ribosomes. Blue, red, and yellow circles denote true positives, false positives, and false negatives, respectively.
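Labeling picks as true positives, false positives, and false negatives implies some matching between picked and ground-truth particle centers. A common convention, assumed here rather than taken from the paper, counts a pick as a true positive when it lies within a given radius of a not-yet-matched ground-truth center. The greedy closest-first matching, the `radius` parameter, and both helper names below are hypothetical.

```python
import math

Point = tuple[float, float]

def match_picks(picks: list[Point], truth: list[Point],
                radius: float) -> tuple[int, int, int]:
    """Greedily match picks to ground-truth centers, closest pairs first.

    Returns (true_positives, false_positives, false_negatives).
    Each pick and each ground-truth particle is matched at most once.
    """
    candidates = []
    for i, p in enumerate(picks):
        for j, t in enumerate(truth):
            d = math.dist(p, t)
            if d <= radius:
                candidates.append((d, i, j))
    candidates.sort()  # closest pairs are considered first
    matched_picks, matched_truth = set(), set()
    for _, i, j in candidates:
        if i not in matched_picks and j not in matched_truth:
            matched_picks.add(i)
            matched_truth.add(j)
    tp = len(matched_picks)
    return tp, len(picks) - tp, len(truth) - tp

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard detection metrics; empty denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

picks = [(10.0, 10.0), (50.0, 52.0), (90.0, 90.0)]
truth = [(12.0, 9.0), (50.0, 50.0), (200.0, 200.0)]
tp, fp, fn = match_picks(picks, truth, radius=5.0)  # (2, 1, 1)
```

Greedy matching is simple but not always optimal when particles are crowded. An optimal one-to-one assignment, for example with `scipy.optimize.linear_sum_assignment` on the distance matrix, is a drop-in alternative.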
Denoising a cryo-ET HIV tilt series with DRACO. Figures a and b show the HIV tilt series before and after denoising by DRACO. Using IMOD, we reconstruct 3D volumes of HIV from both the original and the denoised series; slices of these volumes are shown in Figures c and d. Note the horizontal stripes in these images, which are artifacts caused by the missing wedge in cryo-ET. Figure e shows the slice from Figure c after denoising with DRACO.

Video Presentation (Coming Soon)

Poster (Coming Soon)

BibTeX


@inproceedings{shen2024draco,
  title={Draco: Denoising Reconstruction Autoencoder for CryO-EM},
  author={Shen, Yingjun and Dai, Haizhao and Chen, Qihe and Zeng, Yan and Zhang, Jiakai and Pei, Yuan and Yu, Jingyi},
  booktitle={Proceedings of the 38th International Conference on Neural Information Processing Systems},
  year={2024}
}