Abstract
The transition from anaerobic to aerobic life was a pivotal adaptation in Earth’s history, yet its timing and genomic drivers remain poorly resolved. Traditional approaches that rely on oxygen-utilizing genes are less reliable for obligate anaerobes and fragmentary environmental genomes, where the absence of a gene may reflect poor assembly rather than phenotype. We developed a machine learning model (GBDT40-LR) that predicts microbial oxygen requirements from 40 broadly conserved genes, 35 of which have no direct role in oxygen metabolism. This approach overcomes the biases that incompleteness introduces in environmental genomes. Applied to 80,787 bacterial genomes [including metagenome-assembled genomes (MAGs)], the model classified 42,014 as aerobes and 38,775 as anaerobes, enabling large-scale ancestral reconstruction. Molecular clock dating indicates that aerobic bacteria emerged before the Great Oxidation Event (GOE, 2.5 to 2.3 Ga), likely around 2.7 Ga. Aerobic lineages subsequently diversified during the GOE and the Neoproterozoic Oxygenation Event (NOE, 0.8 to 0.55 Ga), while anaerobe diversity persisted throughout Earth’s oxygenation. This establishes that aerobic bacteria predated planetary oxygenation, potentially by 200 to 400 My, providing insights into phenotypic evolution and the prolonged coexistence of anaerobes and aerobes.
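The abstract does not describe the internals of GBDT40-LR. Assuming the name refers to the common GBDT + LR stacking scheme (a gradient-boosted decision tree ensemble whose leaf assignments are one-hot encoded and passed to a logistic regression), a minimal standard-library Python sketch might look like the following. Everything in it is hypothetical: the synthetic genomes from `make_genomes`, the rule that only genes 0 to 9 carry signal, the `completeness` dropout used to mimic fragmentary MAGs, the depth-2 trees, and the hyperparameters (`n_trees`, `lr`, `epochs`, `step`). It is not the authors' implementation and reproduces none of their results.

```python
import math
import random

def make_genomes(n, n_genes=40, completeness=1.0, seed=0):
    # Hypothetical synthetic data: genes 0-9 are informative (p=0.8 in aerobes,
    # 0.2 in anaerobes), the rest are noise (p=0.5); each present gene is missed
    # with probability 1 - completeness to mimic a fragmentary assembly.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        aerobe = rng.random() < 0.5
        probs = [(0.8 if aerobe else 0.2) if j < 10 else 0.5 for j in range(n_genes)]
        X.append([int(rng.random() < p and rng.random() < completeness) for p in probs])
        y.append(int(aerobe))
    return X, y

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def best_split(X, r, idx):
    # Gene whose presence/absence split of samples idx most reduces the squared
    # error of residuals r (equivalently, maximizes S0^2/n0 + S1^2/n1).
    best_j, best_gain = None, 0.0
    for j in range(len(X[0])):
        s, n = [0.0, 0.0], [0, 0]
        for i in idx:
            s[X[i][j]] += r[i]
            n[X[i][j]] += 1
        if min(n) == 0:
            continue
        gain = s[0] ** 2 / n[0] + s[1] ** 2 / n[1]
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j

def leaf_index(tree, x):
    # Leaf 0-3 reached by genome x in a depth-2 tree; a gene of None means "no split".
    j, kids, _ = tree
    b = x[j] if j is not None else 0
    return 2 * b + (x[kids[b]] if kids[b] is not None else 0)

def fit_tree(X, r):
    # Depth-2 regression tree fitted to residuals r; each leaf stores the mean residual.
    j = best_split(X, r, range(len(X)))
    kids = []
    for b in (0, 1):
        side = [i for i, x in enumerate(X) if (x[j] if j is not None else 0) == b]
        kids.append(best_split(X, r, side))
    sums, counts = [0.0] * 4, [0] * 4
    for x, ri in zip(X, r):
        leaf = leaf_index((j, kids, None), x)
        sums[leaf] += ri
        counts[leaf] += 1
    return (j, kids, [s / c if c else 0.0 for s, c in zip(sums, counts)])

def fit_gbdt(X, y, n_trees=20, lr=0.3):
    # Gradient boosting for log-loss: each tree fits the residuals y - sigmoid(F).
    F, trees = [0.0] * len(X), []
    for _ in range(n_trees):
        r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        tree = fit_tree(X, r)
        trees.append(tree)
        F = [fi + lr * tree[2][leaf_index(tree, x)] for fi, x in zip(F, X)]
    return trees

def leaf_features(trees, x):
    # Sparse one-hot encoding: index of the active leaf in each tree (4 slots per tree).
    return [4 * t + leaf_index(tree, x) for t, tree in enumerate(trees)]

def fit_lr(Z, y, n_features, epochs=200, step=0.5):
    # Logistic regression on the sparse leaf encoding via batch gradient descent.
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_features, 0.0
        for z, yi in zip(Z, y):
            err = sigmoid(b + sum(w[k] for k in z)) - yi
            for k in z:
                gw[k] += err
            gb += err
        w = [wk - step * g / len(Z) for wk, g in zip(w, gw)]
        b -= step * gb / len(Z)
    return w, b

def predict(trees, w, b, x):
    # 1 = aerobe, 0 = anaerobe (the labels used by make_genomes).
    return int(sigmoid(b + sum(w[k] for k in leaf_features(trees, x))) >= 0.5)

if __name__ == "__main__":
    X_tr, y_tr = make_genomes(300, seed=1)
    trees = fit_gbdt(X_tr, y_tr)
    w, b = fit_lr([leaf_features(trees, x) for x in X_tr], y_tr, 4 * len(trees))
    for c in (1.0, 0.7):
        X_te, y_te = make_genomes(300, completeness=c, seed=2)
        acc = sum(predict(trees, w, b, x) == yi for x, yi in zip(X_te, y_te)) / len(y_te)
        print(f"completeness={c}: held-out accuracy={acc:.2f}")
```

Passing leaf indices rather than raw gene calls to the logistic regression lets the linear model weight gene combinations captured by each tree. How accuracy changes at lower `completeness` in this toy depends only on the assumed synthetic rule and says nothing about the robustness of the published model.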
| Field | Value |
|---|---|
| Original language | English |
| Article number | e2515709123 |
| Number of pages | 11 |
| Journal | Proceedings of the National Academy of Sciences |
| Volume | 123 |
| Issue number | 4 |
| Early online date | 20 Jan 2026 |
| Publication status | E-pub ahead of print - 20 Jan 2026 |
Keywords
- Machine learning
- Phenotype prediction
- Oxygen requirement
- Molecular clock