Web of Life, release 2.0 (WoLr2)
======

Last updated: Feb 12, 2023

The WoL ("Web of Life") project aims to provide curated catalogs of microbial genomes and genes with phylogenetic trees, taxonomic hierarchies and functional annotations.

WoLr2 is a significant upgrade from WoLr1. Its collection of 15,953 bacterial and archaeal genomes was sampled from the non-redundant genomes hosted at NCBI (RefSeq and GenBank, complete and draft) to evenly represent microbial diversity. A high-quality reference phylogeny was reconstructed using the uDance workflow. The taxonomy was curated according to the phylogeny, based on GTDB (default) and NCBI, and a mapping to the Greengenes2 taxonomy is provided. Functional annotations of the protein-coding genes are available following UniRef, GO, EggNOG, Pfam, KEGG, and MetaCyc.

- Citation: Zhu Q, Mai U, Pfeiffer W, et al. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat Commun. 2019;10(1):5477. doi: 10.1038/s41467-019-13443-4.
- A manuscript describing uDance and the phylogeny is currently under review.

Statistics:
- Number of genomes: 15,953
- Total length (bp): 48,809,171,826
- Numbers of taxonomic units:
  - Domains: 2 (Bacteria and Archaea)
  - Phyla: 124
  - Classes: 321
  - Orders: 914
  - Families: 2,057
  - Genera: 6,811
  - Species: 12,258

Files and directories:
- wol2sop.sh: Shell script automating the standard WoL2 + Woltka workflow for microbiome data analysis.
  - Woltka is hosted at: https://github.com/qiyunzhu/woltka
- genomes/: DNA sequences and metadata of the 15,953 bacterial and archaeal genomes.
- phylogeny/: Phylogenetic tree of the genomes, inferred using uDance based on 380 marker genes.
- taxonomy/: Taxonomic classification of the genomes following the GTDB (default) and NCBI systems, curated according to the phylogeny. A mapping to the Greengenes2 taxonomy is also provided.
- rrnas/: Ribosomal RNA (rRNA) genes identified from the genomes.
- proteins/: Protein-coding genes identified from the genomes.
- function/: Functional annotations of the protein sequences.
- databases/: Pre-compiled databases for multiple bioinformatics programs used in microbiome data analysis.

Contact:
- Project leader: Dr. Qiyun Zhu (qiyun.zhu@asu.edu)
- Regarding phylogeny: Dr. Siavash Mirarab (smirarabbaygi@eng.ucsd.edu)
- Senior PI: Dr. Rob Knight (robknight@eng.ucsd.edu)
- Knight Lab
  Departments of Pediatrics
  University of California San Diego
  9500 Gilman Drive, MC 0763
  La Jolla, CA 92093-0763
  USA
  Tel: (858) 822-2379
  Fax: (858) 246-1981
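The "Numbers of taxonomic units" under Statistics are presumably counts of unique taxon names per rank in the curated taxonomy/ classification. As a minimal sketch of how such per-rank counts could be tallied, the Python below assumes a hypothetical tab-delimited lineage table with one genome ID and one GTDB-style lineage string (d__...;p__...;c__...;o__...;f__...;g__...;s__...) per line. The actual file names and layout inside taxonomy/ may differ, and the helper names parse_lineage and count_taxa are illustrative only; they are not part of WoL or Woltka. The genome IDs in EXAMPLE are made up.

```python
from collections import defaultdict

# GTDB-style rank prefixes, ordered from domain to species.
RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def parse_lineage(lineage):
    """Split a GTDB-style lineage string into a {rank: name} dict.

    Empty placeholders such as "s__" and unknown prefixes are skipped.
    """
    taxa = {}
    for field in lineage.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if sep and name and prefix in RANKS:
            taxa[RANKS[prefix]] = name
    return taxa


def count_taxa(lines):
    """Count unique taxon names per rank.

    Assumes (hypothetically) that each line is "genome_id<TAB>lineage".
    Ranks with no named taxa are omitted from the result.
    """
    seen = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        _genome_id, _, lineage = line.partition("\t")
        for rank, name in parse_lineage(lineage).items():
            seen[rank].add(name)
    # Report ranks in domain-to-species order.
    return {rank: len(seen[rank]) for rank in RANKS.values() if rank in seen}


# Toy records with made-up genome IDs; the third has an empty species field.
EXAMPLE = [
    "G000000001\td__Bacteria;p__Bacillota;c__Bacilli;o__Bacillales;"
    "f__Bacillaceae;g__Bacillus;s__Bacillus subtilis",
    "G000000002\td__Bacteria;p__Bacillota;c__Bacilli;o__Bacillales;"
    "f__Bacillaceae;g__Bacillus;s__Bacillus cereus",
    "G000000003\td__Archaea;p__Methanobacteriota;c__Methanobacteria;"
    "o__Methanobacteriales;f__Methanobacteriaceae;g__Methanobrevibacter;s__",
]

if __name__ == "__main__":
    print(count_taxa(EXAMPLE))
```

Using sets per rank means a taxon shared by many genomes is counted once, which is what a "number of taxonomic units" denotes; reading lines lazily keeps memory proportional to the number of distinct taxa rather than the number of genomes.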