eggNOG annotation of ORFs
======

eggNOG is a database of gene orthology and functional annotations.

- Website: http://eggnog5.embl.de/
- Citation: Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic acids research. 2019 Jan 8;47(D1):D309-14.

eggNOG release 5.0 was used as the reference for annotation.

- Source: http://eggnog5.embl.de/download/eggnog_5.0/

eggNOG-mapper v2.1.7 was used to perform annotation. Command:

```
emapper.py --cpu # -i input.faa -o output
```

Statistics:

- Total number of matches: 119,723,081
- Number of annotated ORFs: 41,602,267
- Number of matched seed orthologs: 12,449,486
- Number of matched OGs (narrowest): 1,794,179
- Number of matched OGs (all): 2,704,875

Database files:

- orf-to-seed.map: Mapping of ORFs to seed orthologs. It is a unique mapping. The seed orthologs were used to derive other annotation fields such as OGs.
- seed-to-og.map: Mapping of seed orthologs to Orthologous Groups (OGs). OGs are the basic units of the eggNOG database. Only the narrowest (i.e., most derived, lowest) OGs in the taxonomic lineage are reported.
  Note: one seed may be mapped to multiple narrowest OGs (although rare).
- seed-to-og_all.map: Mapping of seed orthologs to OGs of all taxonomic ranks.
- seed-to-category.map: Mapping of seed orthologs to COG categories (single letters).
- seed_description.txt: Description of each seed.
- seed-to-gene.map: Mapping of seed orthologs to gene names ("Preferred name"). It is a unique mapping.
- og-to-category.map: Mapping of OGs to COG categories.
  Note: one OG may be mapped to multiple categories.
- og_taxid.txt: NCBI TaxID of each OG.
  Note: if one OG is mapped to root (1) and another TaxID (usually 2 or 2157), the latter is kept.
- og_description.txt: Description of each OG.
  Note: Same as above.
- cog_category.txt: Definition of COG categories.
  - Source: https://www.ncbi.nlm.nih.gov/research/cog/
- idmaps/: Mapping of seed orthologs to external databases, including:
  - GOs, EC, KEGG_ko, KEGG_Pathway, KEGG_Module, KEGG_Reaction, KEGG_rclass, BRITE, KEGG_TC, CAZy, BiGG_Reaction, PFAMs

Collapsing order:

```
ORF > seed > OG > category
        v
    category
```

Raw eggNOG-mapper output files:

- emapper.hits: DIAMOND hit table (-outfmt 6).
- emapper.seed_orthologs: Mapping to seed orthologs. Columns:
  - qseqid, sseqid, evalue, bitscore, qstart, qend, sstart, send, pident, qcov, scov
- emapper.annotations: Annotation results. Columns:
  - query, seed_ortholog, evalue, score, eggNOG_OGs, max_annot_lvl, COG_category, Description, Preferred_name, GOs, EC, KEGG_ko, KEGG_Pathway, KEGG_Module, KEGG_Reaction, KEGG_rclass, BRITE, KEGG_TC, CAZy, BiGG_Reaction, PFAMs
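
Example of collapsing:

The collapsing order (ORF > seed > OG > category) can be illustrated with a minimal Python sketch. It is not the procedure used to build these files. It assumes each `.map` file holds two tab-delimited columns (`key<TAB>value`, one pair per line), which should be verified against the actual files before use. The helper names `read_map` and `collapse`, and all IDs in the toy data, are hypothetical.

```python
import io
from collections import defaultdict


def read_map(handle):
    """Read 'key<TAB>value' lines into {key: set(values)}.

    Assumption: two tab-delimited columns, one pair per line. The real
    .map files may use a different layout and should be checked first.
    """
    mapping = defaultdict(set)
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        key, value = line.split("\t", 1)
        mapping[key].add(value)
    return dict(mapping)


def collapse(counts, mapping):
    """Sum counts from one level onto the next level.

    A key with several targets adds its full count to each target
    (a design choice for this sketch); unmapped keys are dropped.
    """
    collapsed = defaultdict(int)
    for key, count in counts.items():
        for target in mapping.get(key, ()):
            collapsed[target] += count
    return dict(collapsed)


# Hypothetical toy data standing in for the real mapping files.
orf_to_seed = read_map(io.StringIO("orf1\tseedA\norf2\tseedA\norf3\tseedB\n"))
seed_to_og = read_map(io.StringIO("seedA\tOG1\nseedB\tOG2\n"))
og_to_category = read_map(io.StringIO("OG1\tC\nOG2\tC\nOG2\tE\n"))

orf_counts = {"orf1": 1, "orf2": 1, "orf3": 1}
seed_counts = collapse(orf_counts, orf_to_seed)
og_counts = collapse(seed_counts, seed_to_og)
category_counts = collapse(og_counts, og_to_category)
print(category_counts)
```

With real data, `io.StringIO(...)` would be replaced by an open file handle, e.g. `open("orf-to-seed.map")`. Because one OG may be mapped to multiple categories, counting the full value for each target inflates category totals; dividing the count evenly among targets is an alternative.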