Pfam annotation of ORFs
======

Pfam is a database of protein families.

- Website: https://pfam.xfam.org/
- Citation: Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer EL, Tosatto SC, Paladin L, Raj S, Richardson LJ, Finn RD. Pfam: The protein families database in 2021. Nucleic Acids Research. 2021 Jan 8; 49(D1):D412-9.

Pfam release 35.0 (2021-11) was used to annotate the ORFs. Specifically, the Pfam-A profile HMMs (i.e., Pfam-A.hmm) were used.

- Source: http://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam35.0/

PfamScan v1.6 was used to perform the annotation. Command:

```
pfam_scan.pl -cpu # -fasta input.faa -dir db > output.tblout
```

Statistics:

- Total number of matches: 54,185,026
- Number of annotated ORFs: 35,888,191
- Number of unique Pfams: 13,574

Database files:

- orf-to-pfam.map: Mapping of ORFs to Pfams. An ORF may map to multiple Pfams, which are sorted by their locations in the ORF. A Pfam may occur multiple times in an ORF.
- orf-to-pfam_uniq.map: Mapping of ORFs to Pfams, excluding duplicate Pfams (i.e., a Pfam that repeats within an ORF is listed only once). Note: There could still be multiple unique Pfams mapped to one ORF.
- pfam_name.txt: Names of Pfams.
- pfam_description.txt: Descriptions of Pfams.
- pfam_type.txt: Types of Pfams. There are six types:
  - Family (n=7,788), Domain (5,155), Repeat (451), Coiled-coil (110), Motif (55), Disordered (15).
- pfam-to-clan.map: Mapping of Pfams to clans, which are higher-order groups. This mapping is unique (each Pfam maps to at most one clan).
- clan_name.txt: Names of clans.
- clan_description.txt: Descriptions of clans.
- pfam-to-interpro.map: Mapping of Pfams to InterPro entries.
- pfam-to-go/: Mappings of Pfams to Gene Ontology (GO) terms.

Additional files:

- pfamscan.tsv: PfamScan output file, converted from HMMER tblout into TSV format, excluding header. Columns:

Collapsing order:

```
ORF > Pfam > Clan
       v      v
      name   name
```
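
The collapsing order can be illustrated with a minimal Python sketch. It is one possible reading, not the tool or procedure actually used for this database. It assumes that each `.map` file is tab-delimited, with the source ID in the first column and one or more target IDs in the remaining columns; this layout is not stated above. The helper names `read_map` and `collapse`, the identifiers (`orf1`, `PF_A`, `CL_X`) and the counts are all hypothetical.

```python
from collections import defaultdict


def read_map(lines):
    """Parse tab-delimited lines into {source_id: [target_id, ...]}.

    Assumed layout (not confirmed by this README): first column is the
    source ID, remaining columns are the IDs it maps to.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, *values = line.split("\t")
        mapping[key] = values
    return mapping


def collapse(counts, mapping):
    """Sum per-ID counts into the groups given by `mapping`.

    IDs absent from the mapping are dropped. An ID mapped to several
    groups contributes its full count to each of them.
    """
    out = defaultdict(int)
    for key, n in counts.items():
        for group in mapping.get(key, []):
            out[group] += n
    return dict(out)


# Toy data standing in for orf-to-pfam_uniq.map and pfam-to-clan.map.
orf2pfam = read_map(["orf1\tPF_A\tPF_B", "orf2\tPF_A"])
pfam2clan = read_map(["PF_A\tCL_X"])

# Hypothetical per-ORF counts; orf3 has no Pfam annotation.
orf_counts = {"orf1": 3, "orf2": 2, "orf3": 5}

pfam_counts = collapse(orf_counts, orf2pfam)    # ORF > Pfam
clan_counts = collapse(pfam_counts, pfam2clan)  # Pfam > Clan

print(pfam_counts)  # {'PF_A': 5, 'PF_B': 3}
print(clan_counts)  # {'CL_X': 5}
```

With real files, `read_map(open("orf-to-pfam_uniq.map"))` would take the place of the toy lists, provided the layout assumption holds. The de-duplicated map is used here so that a Pfam repeated within one ORF is not counted twice. Whether unmapped IDs should be dropped or kept, and whether counts should be split among multiple targets, are design choices this sketch does not settle.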