qMSA: quadruple multiple sequence alignment generation == Introduction == This is the package for qMSA (main program: script/qMSA.py), which is an extension of DeepMSA (main program: script/build_MSA.py) for MSA generation in structure prediction tasks. qMSA is based on a different 4-stage algorithm instead of the 3-stage algorithm in DeepMSA. == Example usage == scripts/qMSA.py \ -hhblitsdb=/nfs/amino-projects/zhanglabs/seqdb/uniclust/UniRef30_2020_01 \ -jackhmmerdb=/nfs/amino-projects/zhanglabs/seqdb/uniref90/uniref90.clean.fasta \ -hhblits3db=/scratch/aminoproject_fluxoe/zhanglabs/seqdb/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \ -hmmsearchdb=/nfs/amino-projects/zhanglabs/seqdb/mgnify/mgy_clusters.clean.fasta:/nfs/amino-projects/zhanglabs/seqdb/JGI/IMGVR/linclust.fasta:/nfs/amino-projects/zhanglabs/seqdb/tsa/curated/cdhit.fasta \ -tmpdir=/tmp/$USER/$tag \ seq.fasta Here, -tmpdir is used to specify the temporary folder. In the above example, $tag can be the name of your protein. -hhblitsdb must point to a hhsuite2 format database. As of uniclust30 (aka UniRef30) 2019_11 or later, Soeding lab has stop supporting hhsuite2 in uniclust. To convert the new hhsuite3 format uniclust30 database to old hhsuite2 format, you can use scripts/hhblitsdb3to2.py. Note that it is not always possible to perfectly convert hhsuite3 database to hhsuite2 format due to difference in sequence length limitation. Therefore, some query that hit huge protein in the database will cause hhblits2 error. That is why qMSA implements a backup hhblits3 subroutine for searching -hhblitsdb when hhblits2 fails. -hhblits3db can be any hhsuite3 database. In this case, it is bfd (https://bfd.mmseqs.com/). bfd = metaclust + uniprot + plass plass = Soil Reference Catalog + Marine Eukaryotic Reference Catalog This script could require 15GB of physical memory when submitting the job by qsub or sbatch. -hmmsearchdb in the above case includes: mgnify (aka EBI metagenomics), IMG/VR, and NCBI TSV.