easymode training data collection
The easymode training data collection currently consists of 4630 tomograms from 63 distinct sources. It covers a wide range of acquisition parameters (pixel size, defocus, dose) as well as sample types and thicknesses. The data were acquired with K2, K3, and Falcon4i detectors and with different energy filters, mostly at 300 keV but some at 200 keV. The collection spans 25+ species, most of them eukaryotic, with a large fraction human.
Assembling this collection would not have been possible without EMPIAR and the CryoET Data Portal, from which more than half of the data sources were obtained.
Data availability
Our goal is to share the training data and annotations publicly. For the time being this is somewhat tricky: we have a large number of private contributions that we are not able to release, and we are not yet sure how to re-publish the public data we used in a way that is appropriate towards the original authors. For now, the training subtomograms and labels are available upon request.
Contributing
We also accept contributions to the training collection. If you find that easymode performs poorly on your data, we would appreciate it if you submitted the challenging tomograms using the easymode report tool. You can do so either via the command line interface or via the Pom data browser.
Acknowledgements
Public datasets
We gratefully acknowledge the following EMPIAR and CryoET Data Portal datasets:
EMPIAR: 10164, 10466, 10491, 10493, 10499, 10988, 10989, 11058, 11078, 11111, 11198, 11538, 11561, 11747, 11830, 11845, 11896, 11897, 11899, 12176, 12425, 12457, 12460, 12612, 13145, 13281, 13289.
CryoET Data Portal: 10004, 10431, 10434, 10440, 10444, 10452, 10455.
We also thank the following people for their private data contributions and help:
Tom Dendooven, Alia dos Santos, Piotr Kolata, Alexander Scrutton, Forson Gao, Cong Yu, Paula Paredes Vergara, Kashish Singh, Eric Wang, Andriko von Kügelgen, David Barford, Oda Schiøtz, Sebastian Tacke, Elisa Lisicki, Tatjana Taubitz, and Stefan Raunser contributed data. Members of the LMB electron microscopy and scientific computing facilities provided support: Shaoxia Chen, Giuseppe Cannone, Grigory Sharov, Anna Yeates, Bilal Ahsan, Haaris Sadari, Jake Grimmett, Toby Darling, and Ivan Clayson.
Contact
All data processing, curation, and annotation for this project was done by Mart G. F. Last (mlast@mrc-lmb.cam.ac.uk) at the MRC Laboratory of Molecular Biology, Cambridge, UK, in the group of Matteo Allegretti.