This data was compiled by Adam Robbins-Pianka, Daniel McDonald, and Greg Caporaso. Questions should be directed to the QIIME Forum (http://forum.qiime.org).

This directory contains the IMG amino acid data compiled in a format for convenient use with QIIME 1.5.0-dev (www.qiime.org). This data set should be considered to be in BETA TESTING status. Please post any issues that you notice to the QIIME Forum.

Data was compiled from IMG v350, downloaded on March 4, 2012.

N.B.: These files differ from those in the 7 Oct 2012 data. A bug was discovered that resulted in many sequences (about half) being left out of aa_seqs.faa and gene_ec.txt.

Files in this directory are as follows:

aa_seqs.faa : amino acid sequences for all IMG-annotated genes, derived by exporting all genes with annotated KOs from a custom database created from the .genes.faa files. The query is as follows:

    select gene_oid, aa_seq from gene_aa_seqs
    where gene_oid in (select distinct(gene_oid) from gene_ko)

gene_ec_numeric.tsv : semicolon-separated EC codes associated with all ids in aa_seqs.faa (EC codes in numeric format). EC codes were pulled from the .ko.tab.txt files.

gene_ko_pathway.txt : semicolon-separated KO codes associated with all ids in aa_seqs.faa.

gene_ec.txt : semicolon-separated EC codes associated with all ids in aa_seqs.faa (EC codes in text format). The top three levels of the EC hierarchy were parsed from ftp://ftp.expasy.org/databases/enzyme/enzclass.txt (downloaded Sept 14, 2012), and the most specific level was parsed from ftp://ftp.expasy.org/databases/enzyme/enzyme.dat (downloaded Sept 14, 2012).
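As a rough illustration of how the semicolon-separated annotation files might be consumed, here is a minimal Python sketch. It assumes (this README does not state the layout) that each line of gene_ec_numeric.tsv, gene_ko_pathway.txt, or gene_ec.txt holds a gene id, a tab, and then the semicolon-separated codes, and that each FASTA header in aa_seqs.faa begins with the gene_oid. The function names are hypothetical and are not part of QIIME.

```python
from typing import Dict, List, TextIO


def parse_semicolon_mapping(handle: TextIO) -> Dict[str, List[str]]:
    """Map each gene id to its list of semicolon-separated codes.

    Assumed layout (not confirmed by the README): gene id, a tab, then
    codes separated by ';'. Blank lines are skipped, and an empty code
    field yields an empty list.
    """
    mapping: Dict[str, List[str]] = {}
    for raw in handle:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        gene_id, _, codes = line.partition("\t")
        # Drop empty pieces, e.g. from a trailing ';' or a missing field
        mapping[gene_id] = [c.strip() for c in codes.split(";") if c.strip()]
    return mapping


def read_fasta_ids(handle: TextIO) -> List[str]:
    """Return the first whitespace-delimited token of each FASTA header."""
    ids: List[str] = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # ignore a bare '>' header
                ids.append(fields[0])
    return ids


def ids_without_annotation(fasta_ids: List[str],
                           mapping: Dict[str, List[str]]) -> List[str]:
    """Hypothetical sanity check: FASTA ids that have no annotation line."""
    return sorted(set(fasta_ids) - set(mapping))
```

Comparing the FASTA ids against the parsed mapping is one way to spot ids that lack an annotation line. Note that such a check cannot reveal sequences missing from both files, which is the situation the N.B. above describes.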