hhblits(uniclust30) [neff_cut1]----> jackhmmer(uniref90, top 30000 raw hits build custom db for hhblits) [neff_cut2]----> hmmsearch(metaclust, top 30000 raw hits build custom db for hhblits)

build_MSA.py
    neff_cut1=neff_cut2=64

build_MSA2 series
    The MSA is selected from the best of the three search rounds (if run).
    build_MSA2.py    Chengxin's original script; neff_cut1=128, neff_cut2=64; prefers to stop at the 2nd round search (uniref90).
    build_MSA2.0.py  forces the search to run to the 2nd round, if neff [...]

[...] jackhmmer(uniref90, top 30000 raw hits build custom db for hhblits) [neff_cut2]----> hmmsearch(metaclust, top 30000 raw hits build custom db for hhblits)

main filter
    1. In each round, the step that builds the custom db for hhblits may take a long time, so a psiblast filter with a cut_e-value is added: all raw sequences hit by hmmer (<30000 for the 2nd round and <15000 for the 3rd) with psiblast e-value [...]
    [...] timeout=18min, then do not build the db for hhblits; run psiblast and add sequences to the query (in some special cases, kclust may run for a long time!). If the number of clusters is greater than a cutoff (3000 for the 2nd round, 1500 for the 3rd), rank the clusters by psiblast e-value, treating the minimal e-value in a cluster as the cluster's representative e-value, and take the top cutoff clusters.

    build_MSA5.py    as described above; neff_cut1=128, neff_cut2=64; prefers to stop at the 2nd round search (uniref90).
    build_MSA5.0.py  forces the search to run to the 2nd round, if neff [...]

[...] timeout=18min, then do not build the db for hhblits; run psiblast and add sequences to the query (in some special cases, kclust may run for a long time!). This reduces the contact prediction accuracy, so it is changed to use only the hhblits MSA.

Bugs were found in kClust.py and kClust_filter.py in this package. The a3m folder is now cleared before building the custom database for hhblits. In the old version, the hmmsearch round (3rd round) built the 2nd round hits for hhblits again, which doubled the time for some targets that needed a deeper round (results may potentially have been affected!).
If both custom databases need to be searched, the better way is to merge the second into the first one rather than build it twice.

build_MSA7 series
    build_MSA7.yang.py  Yang's build_MSA script, based on build_MSA.py; it just changes neff_cut1 to 128 and neff_cut2 to 128.
    build_MSA2.py       further filters the final MSA to 75% and removes hmmer fragments with less than 30% cov.
    build_MSA7.6.py     based on build_MSA6.fix.py, with the parameters and workflow changed to those of build_MSA7.yang.py: both neff_cut values are set to 128, no 30% cov filter is applied to the raw hits after hmmer, and the final MSA is only filtered to 60% cov.

main results
    With the 2 filters added, performance is slightly lower. For long runs (2nd and 3rd round searches), the build_MSA5 series and build_MSA6 series run faster than the build_MSA2 series, with precision down by 1%, and the maximum running time drops a lot. However, some targets (those that stop at the 2nd round in build_MSA2) take longer, because they go on to the next round (3rd round). So for shallow runs (1st or 2nd round) the times do not differ greatly, while for deeper runs (2nd or 3rd round) the times are quite different.
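
The cluster-ranking filter described for the build_MSA5 series (rank clusters by psiblast e-value, use the minimal e-value in a cluster as its representative, keep the top cutoff clusters) can be sketched roughly as follows. This is only an illustration and not the code in kClust_filter.py: the function name select_top_clusters and the two dictionaries (cluster id -> member ids, sequence id -> e-value) are hypothetical, and the real scripts may store this information differently.

```python
from typing import Dict, List


def select_top_clusters(
    clusters: Dict[str, List[str]],
    evalues: Dict[str, float],
    cutoff: int,
) -> List[str]:
    """Return the ids of the clusters to keep.

    clusters: hypothetical mapping of cluster id -> member sequence ids.
    evalues:  hypothetical mapping of sequence id -> psiblast e-value.
    cutoff:   maximum number of clusters to keep
              (the notes use 3000 for the 2nd round and 1500 for the 3rd).

    If the number of clusters does not exceed the cutoff, all cluster ids
    are returned in their original order. Otherwise the clusters are ranked
    by representative e-value (lowest first) and the top `cutoff` are kept.
    """
    if len(clusters) <= cutoff:
        return list(clusters)

    def representative_evalue(cluster_id: str) -> float:
        # The minimal member e-value represents the whole cluster.
        # Members without a recorded e-value are assumed to rank last.
        return min(
            (evalues.get(member, float("inf")) for member in clusters[cluster_id]),
            default=float("inf"),
        )

    ranked = sorted(clusters, key=representative_evalue)
    return ranked[:cutoff]
```

For example, with clusters c1 = [a, b], c2 = [c], c3 = [d, e] and a cutoff of 2, the cluster containing the single best hit is kept first even if its other members score poorly, since only the minimal e-value of each cluster is compared.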