MetaCyc annotation of ORFs
======

MetaCyc is a database of metabolic pathways and enzymes.

 - Website: https://metacyc.org/

 - Citation: Caspi R, Billington R, Keseler IM, Kothari A, Krummenacker M,
   Midford PE, Ong WK, Paley S, Subhraveti P, Karp PD. The MetaCyc database
   of metabolic pathways and enzymes-a 2019 update. Nucleic acids research.
   2020 Jan 8;48(D1):D445-53.

MetaCyc release 26 was used as the reference for annotation.

DIAMOND v2.0.15 was used to perform sequence alignment. Commands (`#` stands
for the number of threads):

```
diamond makedb --threads # --in db.faa --db db
diamond blastp --threads # --db db --query input.faa --out output.m8 \
  --sensitive --iterate -e 0.001 --top 3
```
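
Without an explicit output format option, DIAMOND writes BLAST tabular output with 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). As a minimal sketch, assuming that default layout, the following Python snippet tallies the total number of matches and the number of distinct ORFs with at least one match. The function name `summarize_m8` and the sample rows are hypothetical and serve only as illustration.

```python
import io


def summarize_m8(handle):
    """Count alignment rows and distinct query ORFs in BLAST tabular output."""
    n_matches = 0
    orfs = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}")
        n_matches += 1
        orfs.add(fields[0])  # column 1 is the query (ORF) identifier
    return n_matches, len(orfs)


# Hypothetical rows; real input would come from open("output.m8").
sample = io.StringIO(
    "orf1\tprotA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "orf1\tprotB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150\n"
    "orf2\tprotA\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-30\t120\n"
)
print(summarize_m8(sample))  # (3, 2)
```

Because `--top 3` keeps every hit scoring within 3% of the best hit, one ORF may have several matches, so the first count can exceed the second.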

Statistics:

 - Total number of matches: 24,013,082
 - Number of annotated ORFs: 21,280,624
 - Number of unique MetaCyc units:
   - Proteins: 12,548
   - GO terms: 5,856
   - Genes: 12,536
   - Enzymatic reactions: 11,558
   - Regulations: 2,577
   - Regulators: 822
   - Reactions: 7,852
   - EC numbers: 3,630
   - Compounds consumed: 5,655
   - Compounds produced: 6,471
   - Compounds: 8,294
   - Compound types: 1,851
   - Pathways: 2,542
   - Pathway types: 613
   - All classes: 10,465
   - Pathway-to-gene_list mappings: 412

Database files:

 - orf-to-protein.map: Mapping of ORFs to MetaCyc proteins. Proteins are the
   entry point into the MetaCyc system: every other unit is reached from an
   ORF through its protein.

 - For other files, refer to the "Typical filename patterns" section of
   ../README.
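
A mapping file of this kind could be loaded into memory roughly as follows. This is only a sketch: the layout of orf-to-protein.map is not described here, so the snippet assumes a tab-delimited file with the ORF identifier in the first field and one or more protein identifiers after it. The function name `read_map` and the sample identifiers are hypothetical.

```python
import io


def read_map(handle):
    """Read a mapping file into a dict of ORF -> list of protein IDs.

    Assumed layout (not confirmed by this README): one line per ORF,
    tab-delimited, ORF first, then one or more protein identifiers.
    """
    mapping = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or incomplete lines
        mapping.setdefault(fields[0], []).extend(fields[1:])
    return mapping


# Hypothetical content; real input would come from open("orf-to-protein.map").
sample = io.StringIO("orf1\tP1\norf2\tP1\tP2\n")
print(read_map(sample))  # {'orf1': ['P1'], 'orf2': ['P1', 'P2']}
```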

Collapsing order:

```
              ORF
               v
       go < protein > gene > pathway
               v
regulation < enzrxn
               v
       ec < reaction > compound (left/right) > type
               v
     type < pathway > taxonomic range
               v
         super pathway
               v
              type
```
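
The diagram can be read as a chain of mappings, each taking units at one level to units at the next. One collapsing step might look like the Python sketch below. The function `collapse`, the toy identifiers, and the choice to add a source unit's full count to every target it maps to (rather than splitting it) are assumptions made for illustration, not a description of how the files here were produced.

```python
from collections import defaultdict


def collapse(counts, mapping):
    """Sum counts of source units into the target units they map to.

    counts:  dict of source unit -> count
    mapping: dict of source unit -> iterable of target units
    Sources absent from the mapping are dropped. A source mapped to several
    targets contributes its full count to each (an assumed convention).
    """
    result = defaultdict(int)
    for source, count in counts.items():
        for target in mapping.get(source, ()):
            result[target] += count
    return dict(result)


# Hypothetical toy data following ORF > protein > enzrxn.
orf_counts = {"orf1": 5, "orf2": 3, "orf3": 2}
orf_to_protein = {"orf1": ["P1"], "orf2": ["P1", "P2"]}
protein_to_enzrxn = {"P1": ["E1"], "P2": ["E1", "E2"]}

protein_counts = collapse(orf_counts, orf_to_protein)
enzrxn_counts = collapse(protein_counts, protein_to_enzrxn)
print(protein_counts)  # {'P1': 8, 'P2': 3}
print(enzrxn_counts)   # {'E1': 11, 'E2': 3}
```

Applying the same function repeatedly with the mapping for each arrow walks a profile down the diagram one level at a time; `orf3` has no protein match in the toy data and therefore drops out at the first step.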

Notes:

The DIAMOND search parameters were adopted from eggNOG-mapper:

 - Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J.
   eggNOG-mapper v2: functional annotation, orthology assignments, and domain
   prediction at the metagenomic scale. Molecular biology and evolution. 2021
   Dec;38(12):5825-9.

The official search tool for BioCyc (a superset of MetaCyc) is BLAST, with an
e-value threshold of 10 by default:

 - https://biocyc.org/ECOLI/blast.html?refdb=ALL

The orthologs in BioCyc were determined using bidirectional BLAST with an
e-value threshold of 0.001, which matches the `-e 0.001` setting used in the
DIAMOND search here:

 - Karp PD, Billington R, Caspi R, Fulcher CA, Latendresse M, Kothari A,
   Keseler IM, Krummenacker M, Midford PE, Ong Q, Ong WK. The BioCyc collection
   of microbial genomes and metabolic pathways. Briefings in bioinformatics.
   2019 Jul;20(4):1085-93.