Refined data release for the WoL project
======

Last updated: April 20, 2020

This is a refined version of the original data release of the "Web of Life" (WoL) project, which is available from the Globus endpoint "WebOfLife".

This directory hosts necessary data files for utilizing WoL in microbiome data analysis. They include reference genome sequences, coordinates of protein-coding genes, phylogenetic trees, taxonomic and functional annotations.

## Files and directories

- trees/tree.nwk: Reference phylogeny of 10,575 microbial genomes.
- genomes/concat.fna.xz: Concatenated DNA sequences of the reference genomes.
- proteins/coords.txt.xz: Coordinates of protein-coding genes on the genomes.
- taxonomy: Taxonomic classification of the genomes, including NCBI and GTDB, original and curated based on the tree.
- function: Functional annotation of the proteins, including UniRef, MetaCyc, GO and others.

## Instructions

Using the provided genome sequences, one may build reference databases for specific metagenomics tools, which can then be used to analyze microbiome data. For example, the following commands build a Bowtie2 index of the genomes, which should resemble the pre-built Bowtie2 index provided in the full data release:

```
mkdir -p databases/bowtie2
xzcat genomes/concat.fna.xz > /tmp/input.fna
bowtie2-build --seed 42 --threads 16 /tmp/input.fna databases/bowtie2/WoLr1
rm /tmp/input.fna
```

Once the database is built, one can run Bowtie2 to align input sequencing data to the reference genomes.
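
Before aligning, it can be worth confirming that the build finished, since an interrupted `bowtie2-build` may leave a partial index. The sketch below is not part of the WoL release: `check_bt2_index` is a hypothetical helper that assumes the index prefix from the build step and relies on Bowtie2's convention of writing six files per index, with the `.bt2` extension for small indexes or `.bt2l` for large ones.

```shell
# Hypothetical helper (not shipped with WoL or Bowtie2).
# Succeeds only if all six index files exist and are non-empty
# for the given prefix, as either a small (.bt2) or large (.bt2l) index.
check_bt2_index() {
    prefix=$1
    for ext in bt2 bt2l; do
        missing=0
        for part in 1 2 3 4 rev.1 rev.2; do
            if [ ! -s "${prefix}.${part}.${ext}" ]; then
                missing=1
                break
            fi
        done
        # A complete set with one extension is enough.
        if [ "$missing" -eq 0 ]; then
            return 0
        fi
    done
    echo "Incomplete or missing Bowtie2 index: ${prefix}" >&2
    return 1
}
```

Calling `check_bt2_index databases/bowtie2/WoLr1` returns success only when one complete set of files is present, in which case the index is ready for alignment.
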
For example, the following commands run Bowtie2 with parameters optimized for shotgun metagenomic data, as suggested in the SHOGUN pipeline:

```
bowtie2 -p 16 -x databases/bowtie2/WoLr1 -f input.fa -S output.sam \
  --very-sensitive -k 16 --np 1 --mp "1,1" --rdg "0,1" --rfg "0,1" \
  --score-min "L,0,-0.05" --no-head --no-unal --seed 42
```

The resulting alignment file can be further analyzed using the program Woltka, together with the reference tree and the taxonomic and functional annotations. A tutorial is available on the Woltka website:

- https://github.com/qiyunzhu/woltka

## Contact

- Project leader: Dr. Qiyun Zhu (qiyun.zhu@asu.edu)
- Senior PI: Dr. Rob Knight (robknight@ucsd.edu)
- Knight Lab
  Departments of Pediatrics
  University of California San Diego
  9500 Gilman Drive, MC 0763
  La Jolla, CA 92093-0763 USA
  Tel: (858) 822-2379
  Fax: (858) 246-1981