====================================================================================================
                          Protein Secondary Structure Prediction (SSpro8)
                                    8-Class DSSP Representation
                            Method Description & Project Documentation
====================================================================================================

Author(s) : Christophe Magnan (cmagnan@ics.uci.edu)
Copyright : Institute for Genomics and Bioinformatics
            University of California, Irvine
Modified  : 2015/07/02

====================================================================================================
                                         Method Description
====================================================================================================

SSpro8 is an 8-class secondary structure predictor included in the SCRATCH-1D suite of predictors.
It is the 8-class version of the well-known SSpro 3-class secondary structure predictor. It predicts
the secondary structure state of each amino acid in a protein sequence using all the secondary
structure states considered in the DSSP representation:

- H (alpha helix)
- B (residue in isolated beta-bridge)
- E (extended strand, participates in beta ladder)
- G (3-helix - 3/10 helix)
- I (5-helix - pi helix)
- T (hydrogen bonded turn)
- S (bend)
- C (loop or irregular, coil)

SSpro8 is a 3-step predictor in which the output of each step becomes the input of the next step.
Each step is performed by an independent program delivered with SSpro8. SSpro8 can thus be
considered a wrapper tool enriched with its own set of prediction models (recurrent neural
networks). A brief description of the three main components of SSpro8 is provided below:

- a sequence profile generator (PROFILpro software) that extracts the evolutionary information of
  the protein
- a set of 100 BRNNs (neural networks trained with the 1D-BRNN software) that perform the first
  ab-initio prediction directly from the sequence profiles generated by PROFILpro
- a homology-based secondary structure predictor (HOMOLpro software) that improves the initial
  ab-initio predictions with homology-based ones when homologs can be found in the Protein Data
  Bank (PDB)

SSpro8 is delivered as part of the SCRATCH-1D suite of predictors together with ACCpro, ACCpro20,
SSpro, PROFILpro, HOMOLpro, and 1D-BRNN. SCRATCH-1D runs all the predictors on multiple sequences
using multiple cores in a single run, which significantly reduces the computation time compared with
running each predictor separately. Scripts to run SSpro8 separately are nevertheless provided
(documentation below) in the 'bin' folder of the SSpro8 release included in the SCRATCH-1D package,
but we highly recommend using the SCRATCH-1D 'bin' scripts directly to run all the predictors at
once and save a significant amount of time.

====================================================================================================
                                       Project Documentation
====================================================================================================

This section describes the project folder and explains how to use SSpro8.

========================================= Project Folder =========================================

A brief description of the project folders is given below.
- bin     Main scripts to run SSpro8
- data    SSpro8 prediction models (BRNNs)
- doc     Documentation of the software
- env     Bash profile for running SSpro8
- lib     SSpro8 scripts to predict secondary structure
- tmp     Temporary work folder for the software

========================================= Software Usage =========================================

SSpro8 comes with a single main script to run the predictor: bin/sequence_to_ss8.sh

Usage : ./sequence_to_ss8.sh input_fasta output_predictions [num_threads]

With:

- input_fasta           Input protein sequences in FASTA file format
- output_predictions    Predicted secondary structures
- num_threads           Number of cores to use to process the dataset (default=1)

Three additional scripts are provided for specific cases only:

- sequence_to_ss8_ab.sh : returns the SSpro8 ab-initio predictions only. The homology analysis is
  not performed, so the predictions are not improved by that stage. This script is provided for
  evaluation purposes only. Usage is identical to 'sequence_to_ss8.sh'.

- profiles_to_ss8.sh & profiles_to_ss8_ab.sh : scripts to run SSpro8 directly from the sequence
  profiles instead of the protein amino-acid sequences. These scripts are used by SCRATCH-1D to
  optimize computation time and are not expected to be used directly. Usage is similar to the
  other scripts, but the input FASTA file must be replaced by the profiles generated by PROFILpro.

======================================= Input Files Format =======================================

Input files must be in the standard FASTA file format. The number of input sequences is limited
only by the amount of RAM available on the machine running the program. When profiles are provided
as input instead of protein sequences, the input file format is the same as that of the PROFILpro
output files. Please refer to the PROFILpro documentation for more details.
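
The usage and input format described above can be sketched as follows. This is a minimal
illustration only, assuming it is run from the 'bin' folder. The file names
(example_input.fasta, example_output.ss8) and the two sequences are hypothetical and are not part
of the SSpro8 release. Only the script name and the order of its arguments come from this
documentation.

```shell
#!/usr/bin/env bash
# Hypothetical file names chosen for this example (not shipped with SSpro8).
input_fasta="example_input.fasta"
output_predictions="example_output.ss8"
num_threads=1

# Write a small FASTA file: one '>' header line per record, followed by the sequence.
# Both sequences are made up for illustration.
cat > "$input_fasta" <<'EOF'
>seq1 illustrative sequence
MKVLAAGIVGLLLAQPSA
>seq2 illustrative sequence
GSHMTEYKLVVVGAGGVGKS
EOF

# Count the records: every FASTA record starts with a '>' header line.
n_seqs=$(grep -c '^>' "$input_fasta")
echo "sequences: $n_seqs"

# Run the predictor only if the script is present in the current folder
# (assumption: the current folder is the SSpro8 'bin' folder).
if [ -x ./sequence_to_ss8.sh ]; then
    ./sequence_to_ss8.sh "$input_fasta" "$output_predictions" "$num_threads"
fi
```

Counting the header lines before the run is a cheap way to confirm that the file holds the
expected number of sequences.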
==================================== Output Files Description ====================================

Output files are in the same file format as the input files, with each protein amino-acid sequence
replaced by its predicted secondary structure. Headers are reported as provided in the input.

====================================================================================================
                                           Release Notes
====================================================================================================

Version 5.2 (2015)
  Author      : Christophe Magnan
  Description : Minor revision
  Comments    : Repackaged for SCRATCH-1D release 1.1

Version 5.1 (2013)
  Author(s)   : Christophe Magnan
  Description : Retrained predictor and updated datasets together with new code and packaging
  Comments    : Profiles and homology analysis now performed by the PROFILpro and HOMOLpro software

Versions 4.0 & 4.1 (2005)
  Author(s)   : Jianlin Cheng
  Description : Retrained predictor with updated datasets
  Comments    : No longer available

Versions < 4.0
  Author(s)   : Gianluca Pollastri
  Description : Initial versions of the predictor
  Comments    : No longer available

====================================================================================================