hhblits(uniclust30)  [neff_cut1]---->   jackhmmer(uniref90, top 30000 raw hits build custom db for hhblits)
                     [neff_cut2]---->   hmmsearch(metaclust, top 30000 raw hits build custom db for hhblits)

build_MSA.py      neff_cut1=neff_cut2=64
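
sketch of the staged search above, in Python. only the order of the rounds and the two neff cutoffs come from the diagram; the three
search callables and the (msa, neff) return shape are hypothetical stand-ins, not the real functions in build_MSA.py.

```python
from typing import Callable, List, Tuple

# Hypothetical result type: (aligned sequences, Neff of that MSA).
SearchResult = Tuple[List[str], float]


def staged_search(
    run_hhblits: Callable[[], SearchResult],
    run_jackhmmer: Callable[[], SearchResult],
    run_hmmsearch: Callable[[], SearchResult],
    neff_cut1: float = 64,
    neff_cut2: float = 64,
) -> List[SearchResult]:
    """Run a deeper round only while the current MSA is below its Neff cutoff."""
    results = [run_hhblits()]          # round 1: uniclust30
    if results[-1][1] >= neff_cut1:
        return results
    results.append(run_jackhmmer())    # round 2: uniref90
    if results[-1][1] >= neff_cut2:
        return results
    results.append(run_hmmsearch())    # round 3: metaclust
    return results
```

the returned list holds one entry per round that was actually run.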

build_MSA2 series

the final MSA is selected as the best of the three search rounds (among those that were run).

build_MSA2.py     Chengxin's original script, neff_cut1=128, neff_cut2=64, prefers to stop at the 2nd round search (uniref90)
build_MSA2.0.py   force the search through the 2nd round; if neff<neff_cut2=64, then start the 3rd (metagenome) search
build_MSA2.1.py   change both neff_cut to 128
build_MSA2.2.py   change both neff_cut to 128, force the search through the 2nd round; if neff<neff_cut2=128, then start the 3rd (metagenome) search
build_MSA2.3.py   change both neff_cut to 128, force the search through the 2nd and 3rd rounds
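
the variants above differ only in the two cutoffs and in which rounds are forced, so they can be sketched as a small table.
this is an illustration, not code from the scripts: the Variant class and rounds_run() are hypothetical names, and "forced" is
read here as "always run that round regardless of neff".

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Variant:
    neff_cut1: float
    neff_cut2: float
    force_round2: bool = False
    force_round3: bool = False


# Cutoffs as listed above. neff_cut1 for build_MSA2.0.py is not stated in
# the notes; 128 is assumed (it has no effect once round 2 is forced).
VARIANTS: Dict[str, Variant] = {
    "build_MSA2.py": Variant(128, 64),
    "build_MSA2.0.py": Variant(128, 64, force_round2=True),
    "build_MSA2.1.py": Variant(128, 128),
    "build_MSA2.2.py": Variant(128, 128, force_round2=True),
    "build_MSA2.3.py": Variant(128, 128, force_round2=True, force_round3=True),
}


def rounds_run(v: Variant, neff1: float, neff2: float) -> int:
    """Number of search rounds run, given Neff after round 1 and after round 2."""
    if not (v.force_round2 or neff1 < v.neff_cut1):
        return 1
    if not (v.force_round3 or neff2 < v.neff_cut2):
        return 2
    return 3
```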


build_MSA3 and build_MSA4 series are development versions


build_MSA5 series

hhblits(uniclust30)  [neff_cut1]---->   jackhmmer(uniref90, top 30000 raw hits build custom db for hhblits)
                     [neff_cut2]---->   hmmsearch(metaclust, top 30000 raw hits build custom db for hhblits)
        main filter  1. building the custom db for hhblits in each step may take a long time, so a psiblast filter with an e-value cutoff is added:
                     all raw sequences (hit by hmmer; <30000 for 2nd and <15000 for 3rd) with psiblast e-value<cut_e-value[0.001] are selected.
                     2. in the kClust step, if kClust running time>timeout=18min, then do not build the db for hhblits; run psiblast and add the sequences to the query
                     (in some special cases, kClust may run for a long time!). if the number of clusters is greater than a cutoff (3000 for 2nd, 1500 for 3rd),
                     rank the clusters by psiblast e-value, taking the minimal e-value in a cluster as the cluster's representative e-value, and keep the top
                     cutoff clusters.
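
sketch of the two selection rules above, in Python. the function names and the (id, e-value) tuples are hypothetical; the raw-hit cap is
read here as "look at the first max_raw hits only", and every cluster is assumed to be non-empty.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (sequence id, psiblast e-value)


def filter_hits(hits: List[Hit], max_raw: int, evalue_cut: float = 0.001) -> List[Hit]:
    """Keep raw hits (capped at max_raw) whose e-value is below the cutoff."""
    return [h for h in hits[:max_raw] if h[1] < evalue_cut]


def top_clusters(clusters: Dict[str, List[Hit]], cutoff: int) -> List[str]:
    """Rank clusters by their smallest member e-value and keep the best `cutoff`."""
    if len(clusters) <= cutoff:
        return list(clusters)  # few enough clusters: keep all, original order
    ranked = sorted(clusters, key=lambda c: min(e for _, e in clusters[c]))
    return ranked[:cutoff]
```

per the notes, max_raw would be 30000 and cutoff 3000 in the 2nd round, and 15000 and 1500 in the 3rd round.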
                     
build_MSA5.py     as described above, neff_cut1=128, neff_cut2=64, prefers to stop at the 2nd round search (uniref90)
build_MSA5.0.py   force the search through the 2nd round; if neff<neff_cut2=64, then start the 3rd (metagenome) search
build_MSA5.1.py   change both neff_cut to 128
build_MSA5.2.py   change both neff_cut to 128, force the search through the 2nd round; if neff<neff_cut2=128, then start the 3rd (metagenome) search
build_MSA5.3.py   change both neff_cut to 128, force the search through the 2nd and 3rd rounds


build_MSA6 series
build_MSA6.py     based on build_MSA5.py; if raw hits<1000, do not use the psiblast filter on the raw hits.
                  
build_MSA6.e.py   based on build_MSA5.py; if raw hits<1000, do not use the psiblast filter on the raw hits. psiblast e-value_cut=0.01

build_MSA6.fix.py in the former version, if kClust running time>timeout=18min, the db for hhblits was not built and psiblast was run to add sequences to the query
                  (in some special cases, kClust may run for a long time!). this reduces the contact prediction accuracy, so it was changed to just use
                  the hhblits MSA.

bugs found in kClust.py and kClust_filter.py
in this package, both scripts clear the a3m folder before building the custom database for hhblits. in the old version, the hmmsearch round
                               (3rd round) would build the 2nd round hits for hhblits again, which doubles the time for targets that need to run
                               the deeper round; the results may potentially be affected!

if we want to search both custom databases, a better way is to merge the second into the first one, not build it twice.


build_MSA7 series

build_MSA7.yang.py is Yang's build_MSA script, which is based on build_MSA.py and just changes neff_cut1 to 128 and neff_cut2 to 128;
                   build_MSA2.py further filters the final MSA to 75% and removes hmmer fragments with less than 30% cov;

build_MSA7.6.py    is based on build_MSA6.fix.py, with the parameters and workflow changed to follow build_MSA7.yang.py. both neff_cut set to 128,
                   no 30% cov filter on the raw hits after hmmer, and only a filter to 60% cov at the end.
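
sketch of a coverage filter, in Python. the notes do not define cov; it is assumed here to be the fraction of non-gap columns in an
equal-length aligned row (not raw a3m with lowercase insertions), and the function names are hypothetical.

```python
from typing import List, Tuple

Seq = Tuple[str, str]  # (header, aligned sequence; '-' marks a gap)


def coverage(aligned: str) -> float:
    """Fraction of alignment columns that are not gaps."""
    return sum(c != "-" for c in aligned) / len(aligned) if aligned else 0.0


def filter_by_cov(msa: List[Seq], min_cov: float) -> List[Seq]:
    """Keep the query (first entry) and every sequence with coverage >= min_cov."""
    if not msa:
        return []
    return msa[:1] + [s for s in msa[1:] if coverage(s[1]) >= min_cov]
```

under this reading, filter_by_cov(msa, 0.6) would correspond to the final 60% cov filter.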

main results: with the 2 added filters, the performance is slightly lower. for long runs (2nd and 3rd search),
build_MSA5 series and build_MSA6 series run faster than build_MSA2 series, with 1% lower precision,
and the maximum running time drops a lot. however, some targets (which stop at the 2nd round in build_MSA2) take longer,
because they go on to the next round (3rd round). so for shallow runs (1st or 2nd round) the times are not much different, while
for deeper runs (2nd or 3rd round) the times are quite different.