MetaCyc annotation of ORFs
======

MetaCyc is a database of metabolic pathways and enzymes.

- Website: https://metacyc.org/
- Citation: Caspi R, Billington R, Keseler IM, Kothari A, Krummenacker M, Midford PE, Ong WK, Paley S, Subhraveti P, Karp PD. The MetaCyc database of metabolic pathways and enzymes-a 2019 update. Nucleic acids research. 2020 Jan 8;48(D1):D445-53.

MetaCyc release 26 was used as the reference for annotation.

DIAMOND v2.0.15 was used to perform sequence alignment.

Commands:

```
diamond makedb --threads # --in db.faa --db db
diamond blastp --threads # --db db --query input.faa --out output.m8 \
  --sensitive --iterate -e 0.001 --top 3
```

Statistics:

- Total number of matches: 24,013,082
- Number of annotated ORFs: 21,280,624
- Number of unique MetaCyc units:
  - Proteins: 12,548
  - GO terms: 5,856
  - Genes: 12,536
  - Enzymatic reactions: 11,558
  - Regulations: 2,577
  - Regulators: 822
  - Reactions: 7,852
  - EC numbers: 3,630
  - Compounds consumed: 5,655
  - Compounds produced: 6,471
  - Compounds: 8,294
  - Compound types: 1,851
  - Pathways: 2,542
  - Pathway types: 613
  - All classes: 10,465
  - Pathway-to-gene_list mappings: 412

Database files:

- orf-to-protein.map: Mapping of ORFs to MetaCyc proteins. Proteins are the entrance to the MetaCyc system.
- For other files refer to "Typical filename patterns" of ../README.

Collapsing order:

```
                ORF
                 v
         go < protein > gene > pathway
                 v
   regulation < enzrxn
                 v
          ec < reaction > compound (left/right) > type
                 v
        type < pathway > taxonomic range
                 v
           super pathway
                 v
                type
```

Notes:

The parameter setting of the DIAMOND search was adopted from eggNOG-mapper:

- Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Molecular biology and evolution. 2021 Dec;38(12):5825-9.


The official search tool for BioCyc (a superset of MetaCyc) is BLAST, with an e-value threshold of 10 by default:

- https://biocyc.org/ECOLI/blast.html?refdb=ALL

The orthologs in BioCyc were determined using bidirectional BLAST, with an e-value threshold of 0.001, which is consistent with the current DIAMOND parameter setting:

- Karp PD, Billington R, Caspi R, Fulcher CA, Latendresse M, Kothari A, Keseler IM, Krummenacker M, Midford PE, Ong Q, Ong WK. The BioCyc collection of microbial genomes and metabolic pathways. Briefings in bioinformatics. 2019 Jul;20(4):1085-93.
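
Example:

The following is a minimal sketch, not the pipeline actually used here, of how the match and ORF counts under "Statistics" and one step of the collapsing order (ORF to protein to gene) could be derived from `output.m8`. It assumes the file is in DIAMOND's default BLAST tabular format, with the query (ORF) ID in column 1 and the subject (MetaCyc protein) ID in column 2. The function names, the sample IDs, and the in-memory protein-to-gene dictionary are all hypothetical; how the real map files were produced is not documented in this README.

```python
import io
from collections import defaultdict


def summarize_matches(handle):
    """Count matches and group subject IDs by query ID.

    Assumes tab-delimited BLAST tabular rows: column 1 is the ORF,
    column 2 is the matched protein. Returns (n_matches, orf_to_proteins).
    """
    n_matches = 0
    orf_to_proteins = defaultdict(list)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        orf, protein = fields[0], fields[1]
        n_matches += 1
        if protein not in orf_to_proteins[orf]:
            orf_to_proteins[orf].append(protein)
    return n_matches, dict(orf_to_proteins)


def collapse(orf_to_units, unit_to_targets):
    """Translate an ORF-to-unit mapping through a unit-to-target mapping.

    For example, ORF -> protein combined with protein -> gene gives
    ORF -> gene. ORFs whose units have no target are dropped.
    """
    result = {}
    for orf, units in orf_to_units.items():
        targets = []
        for unit in units:
            for target in unit_to_targets.get(unit, []):
                if target not in targets:
                    targets.append(target)
        if targets:
            result[orf] = targets
    return result


# Hypothetical three-row sample standing in for output.m8.
sample = (
    "orf1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "orf1\tP2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150\n"
    "orf2\tP1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-60\t220\n"
)
n_matches, orf_to_proteins = summarize_matches(io.StringIO(sample))
protein_to_genes = {"P1": ["G1"], "P2": ["G1"]}  # hypothetical mapping
orf_to_genes = collapse(orf_to_proteins, protein_to_genes)

print("matches:", n_matches)
print("annotated ORFs:", len(orf_to_proteins))
print("ORF to gene:", orf_to_genes)
```

With `--top 3`, a single ORF can keep several hits, which is why the number of matches can exceed the number of annotated ORFs and why each ORF maps to a list rather than a single protein.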