Description
Microbial communities play key roles across the oil supply chain, mediating both detrimental processes, such as microbially influenced corrosion and biological souring, and beneficial applications, such as bioremediation and wastewater treatment. However, information on the genomic diversity and functional potential of these microorganisms remains scattered across datasets, which limits comparative and integrative analyses and hinders biotechnological exploration. Here, we present the Petroleum-associated Genome Database (PaGeD), an open-source platform for exploring the taxonomic and functional diversity of oil-associated microbiomes. PaGeD compiles 3,334 curated prokaryotic genomes retrieved from oil reservoirs, produced waters, hydrocarbon-polluted sites, and related environments worldwide. These genomes were clustered into 2,522 species-level units, over half of which correspond to previously undescribed taxa, highlighting the underexplored microbial diversity of petroleum systems. To demonstrate how PaGeD can be used to unravel specific metabolic traits of oil-associated microbes, together with their geographical distribution and phylogenetic affiliation, we used a curated set of 144 KEGG Orthology (KO) groups related to hydrocarbon degradation to profile the hydrocarbon degradation potential encoded in these genomes. Central hydrocarbon degradation pathways (e.g., catechol and benzoyl-CoA degradation) were broadly distributed across habitats and phylogenetic groups, whereas peripheral pathways were restricted to specific taxa. Co-occurrence network analysis revealed a modular architecture, with generalist genomes and core genes forming a central hub, while rare genes and specialized degraders occupied peripheral positions. Despite the geographic and ecological heterogeneity of the samples, functional profiles were largely conserved, suggesting convergent evolution driven by hydrocarbon exposure.
PaGeD provides a scalable framework for identifying potential keystone taxa and functional markers in petroleum environments. Future versions will incorporate additional functional traits of high industrial relevance, such as genes related to biosurfactant production, biofilm formation, corrosion potential, halotolerance, heavy-metal and pressure resistance, mobile genetic elements, and biosynthetic gene clusters. These additions will further strengthen PaGeD's usefulness for both basic research and industrial applications. PaGeD is freely available at https://paged.cpqba.unicamp.br/.
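The gene-profiling and co-occurrence analysis summarized above can be illustrated with a minimal sketch. It is not PaGeD's actual pipeline: the genome IDs and gene labels (catA, bcrC, alkB, nahAc) are hypothetical stand-ins for KO annotations, the 0.5 prevalence cutoff separating "core" from "rare" genes is an arbitrary assumption, and co-occurrence is simply counted as the number of genomes encoding both genes, without the statistical filtering a real network analysis would apply.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: genome ID -> set of detected hydrocarbon-degradation genes.
# Real PaGeD profiles use KEGG Orthology (KO) annotations; these labels are illustrative only.
PROFILES: dict[str, set[str]] = {
    "genome_1": {"catA", "bcrC", "alkB"},
    "genome_2": {"catA", "bcrC"},
    "genome_3": {"catA", "bcrC", "nahAc"},
    "genome_4": {"catA"},
}


def gene_prevalence(profiles: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of genomes in which each gene is present (presence/absence only)."""
    counts = Counter(gene for genes in profiles.values() for gene in genes)
    n_genomes = len(profiles)
    return {gene: count / n_genomes for gene, count in counts.items()}


def classify_genes(prevalence: dict[str, float], core_cutoff: float = 0.5) -> dict[str, str]:
    """Label genes 'core' or 'rare' using an arbitrary prevalence cutoff (assumption)."""
    return {gene: "core" if p >= core_cutoff else "rare" for gene, p in prevalence.items()}


def cooccurrence_edges(profiles: dict[str, set[str]]) -> Counter:
    """Weighted gene-gene edges: number of genomes in which both genes are present."""
    edges: Counter = Counter()
    for genes in profiles.values():
        # sorted() gives each unordered gene pair a single canonical key
        for pair in combinations(sorted(genes), 2):
            edges[pair] += 1
    return edges


def node_degree(edges: Counter) -> dict[str, int]:
    """Number of distinct co-occurrence partners per gene (unweighted degree)."""
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


prevalence = gene_prevalence(PROFILES)
labels = classify_genes(prevalence)
edges = cooccurrence_edges(PROFILES)
degree = node_degree(edges)

# Report genes from most to least prevalent, with their label and network degree
for gene in sorted(prevalence, key=prevalence.get, reverse=True):
    print(f"{gene}\tprevalence={prevalence[gene]:.2f}\t{labels[gene]}\tdegree={degree[gene]}")
```

In this toy example, the widespread genes catA and bcrC are each linked to every other gene, while the rare alkB and nahAc have fewer partners, which is the kind of hub-and-periphery structure the network analysis looks for. A real analysis over thousands of genomes would typically add a significance test or correlation threshold before keeping an edge.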