HMMER 3.1b2 release notes
HMMER3.1 beta test, release 2
http://hmmer.org/
TJW, Sun Feb 22 07:59:45 2015
________________________________________________________________

3.1b2 includes the following large changes:

 -:- New heuristic for accelerating nhmmer roughly 10-fold.

     We have developed a new algorithm that accelerates DNA search in
     nhmmer. The acceleration can be tuned, such that greater speed
     will tend to decrease sensitivity. The default settings yield
     roughly 10-fold acceleration while retaining nearly complete
     sensitivity among hits with E-value < 1e-3 (with a modest loss
     in sensitivity among marginal hits with E > 1e-3).

     This algorithm requires that the sequence database first be
     preprocessed into a binary file format. The new tool makehmmerdb
     performs this task.

 -:- New method of deciding if a sequence is a fragment.

     If hmmbuild determines that a sequence is a fragment, all leading
     and trailing gap symbols (all gaps before the first residue and
     after the last residue) are treated as missing data symbols, and
     thus do not count as observed gaps.

     In H3.0 and H3.1b1, a sequence was called a fragment if its
     length was less than a specified fraction of the alignment
     length. In the case of alignments with many sequences, this often
     resulted in all sequences being labeled as fragments, which could
     lead to unexpected terminal match states when a small fraction of
     sequences contained a long terminal extension. Now, a sequence is
     labeled a fragment if its range in the alignment (the number of
     alignment columns between the first and last positions of the
     sequence) is not greater than a specified fraction of the full
     alignment length. This should improve HMMER's ability to model
     alignments with ragged ends.

Other changes include:

 -:- The DNA search tool, nhmmer, depends on a value MAXL, which
     hmmbuild computes as an upper bound on the length at which
     HMMER expects to see an instance of the model.
     This value could previously become excessively large when
     building a model from an alignment with many long insertions.
     The MAXL value computed by hmmbuild for DNA alignments is now
     limited to 20*M, where M is the number of match states.

 -:- A new tool, called hmmlogo, computes letter height and indel
     parameters that can be used to produce a profile HMM logo. This
     tool can be thought of as a command-line interface for the data
     underlying the Skylign logo server (skylign.org).

Bugfixes:

 -:- #h100 hmmalign would segfault on a zero length input sequence.

 -:- #h101 hmmsearch would segfault when searching a DNA HMM against
     a protein db (on Linux only).

 -:- #h102 Marginal hits late in a target sequence database could be
     filtered out in an nhmmer search. This was due to a score filter
     that (a) was intended to accelerate search, but had essentially
     no impact on speed, and (b) was overly aggressive. The filter
     has been removed.

 -:- #h103 Error printing very small E-values. Closely related to
     #h98, but occurring in the main thread (#h98 fixed the same
     problem in worker threads).

 -:- #h104 HMMER would not compile on OpenBSD, because netinet/in.h
     was not included. This header file is included via arpa/inet.h
     on most other systems, but not on OpenBSD.

 -:- #h105 Errors encountered while running 'make clean' and
     'make distclean' in binary builds. This was the result of the
     Makefile trying to remove the userguide folder and LICENSE.txt
     file, which are already removed in the release process. The
     Makefile now accounts for this possibility.

 -:- #h106 H3 failed to read some old H2 HMM files. This happened
     when (1) there was an empty DESC field in the file, or (2) the
     model was not normalized. Both cases have been resolved.

 -:- #h107 hmmsim only worked for amino acid models. It now works
     for nucleotide models as well.
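
Illustration of the fragment rule change described under "New method
of deciding if a sequence is a fragment". This is a minimal Python
sketch, not HMMER source code. The function names, the gap character
set, the inclusive way the aligned range is counted, and the 0.5
threshold are all assumptions made for illustration; only the two
comparisons (length "less than" a fraction of the alignment length
before, range "not greater than" that fraction now) come from the
notes above.

```python
# Hypothetical sketch of the old and new fragment rules; not HMMER code.

GAP_CHARS = set("-.~_")  # assumed gap symbols for this illustration


def residue_count(aligned_seq):
    """Number of residues (non-gap characters) in one aligned row."""
    return sum(1 for c in aligned_seq if c not in GAP_CHARS)


def aligned_span(aligned_seq):
    """Columns from the first to the last residue, counted inclusively.

    Returns 0 for a row that contains only gap characters.
    """
    positions = [i for i, c in enumerate(aligned_seq) if c not in GAP_CHARS]
    if not positions:
        return 0
    return positions[-1] - positions[0] + 1


def is_fragment_old(aligned_seq, fragthresh=0.5):
    """Old rule (H3.0, H3.1b1): sequence length < fraction of alignment length."""
    return residue_count(aligned_seq) < fragthresh * len(aligned_seq)


def is_fragment_new(aligned_seq, fragthresh=0.5):
    """New rule (3.1b2): aligned range is not greater than the fraction."""
    return aligned_span(aligned_seq) <= fragthresh * len(aligned_seq)


# A row that covers the whole 20-column alignment but has a long internal gap.
full_span_row = "ACGT" + "-" * 12 + "ACGT"
# A row that only covers the first 6 of 20 columns.
short_row = "ACGTAC" + "-" * 14

for name, row in [("full_span_row", full_span_row), ("short_row", short_row)]:
    print(name, "old:", is_fragment_old(row), "new:", is_fragment_new(row))
```

Under these assumptions the first row has only 8 residues in 20
columns, so the old rule calls it a fragment, while the new rule does
not because its range covers all 20 columns. The second row is a
fragment under both rules.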