====================================================================================================
Protein Secondary Structure Prediction (SSpro)
3-Class DSSP Representation
Method Description & Project Documentation
====================================================================================================

Author(s) : Christophe Magnan (cmagnan@ics.uci.edu)
Copyright : Institute for Genomics and Bioinformatics
            University of California, Irvine
Modified  : 2015/07/02

====================================================================================================
Method Description
====================================================================================================

SSpro is a widely used protein secondary structure predictor. From an input protein amino-acid
sequence, SSpro predicts the secondary structure state of each amino acid in the sequence using
the 3-class secondary structure representation deduced from the 8-class DSSP representation:

- E (Beta Sheet)    DSSP classes E + B
- H (Helix)         DSSP classes H + G
- C (Coil)          DSSP classes C + I + S + T

SSpro is a 3-step predictor where the output of each step becomes the input of the next step.
Each step is performed by an independent program delivered with SSpro. SSpro can thus be
considered a wrapper tool enriched with its own set of prediction models (recurrent neural
networks). A brief description of the three main components of SSpro is provided below:

- a sequence profile generator (PROFILpro software) to extract the protein evolutionary
  information.

- a set of 100 BRNNs (neural networks trained with 1D-BRNN software) performing the first
  ab-initio prediction directly from the sequence profiles generated by PROFILpro.

- a homology-based secondary structure predictor (HOMOLpro software) improving the initial
  ab-initio predictions with homology-based ones when homologs can be found in the Protein
  Data Bank (PDB).

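The 8-class to 3-class reduction listed above can be sketched in a few lines of Python. This is
only an illustration of the mapping table given in this document, not code taken from SSpro: the
names DSSP8_TO_3 and reduce_dssp8 are hypothetical, and rejecting unknown symbols is an
assumption made for the example.

```python
# Illustrative sketch of the 8-class -> 3-class reduction described above.
# DSSP8_TO_3 and reduce_dssp8 are hypothetical names, not part of SSpro.
DSSP8_TO_3 = {
    "E": "E", "B": "E",                      # Beta Sheet
    "H": "H", "G": "H",                      # Helix
    "C": "C", "I": "C", "S": "C", "T": "C",  # Coil
}


def reduce_dssp8(ss8: str) -> str:
    """Map an 8-class secondary structure string to the 3-class E/H/C alphabet."""
    try:
        return "".join(DSSP8_TO_3[symbol] for symbol in ss8)
    except KeyError as err:
        # Assumption for this sketch: any symbol outside the table is an error.
        raise ValueError(f"unexpected DSSP symbol: {err.args[0]!r}") from None


print(reduce_dssp8("CHHHGGTTEEBSIC"))  # prints CHHHHHCCEEECCC
```

Note that coil is written 'C' here, following the notation of this document; files that encode
coil with another symbol would need an extra entry in the table.
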
SSpro is now delivered as part of the SCRATCH-1D suite of predictors together with SSpro8, ACCpro,
ACCpro20, PROFILpro, HOMOLpro, and 1D-BRNN. SSpro is no longer made available as a standalone
tool. SCRATCH-1D can run the SSpro, SSpro8, ACCpro, and ACCpro20 predictors on multiple sequences
using multiple cores in a single run, which significantly reduces the computation time needed to
obtain the predictions compared with running each predictor separately. Scripts to run SSpro
separately are nevertheless provided (see the documentation below) in the 'bin' folder of the
SSpro release included in the SCRATCH-1D package, but we highly recommend using the SCRATCH-1D
'bin' scripts directly to run all the predictors at once and save a significant amount of time.

====================================================================================================
Project Documentation
====================================================================================================

This section describes the project folder and how to use SSpro.

========================================= Project Folder =========================================

A brief description of the project folders is given below.

- bin     Main scripts to run SSpro
- data    SSpro prediction models (BRNNs)
- doc     Documentation of the software
- env     Bash profile for running SSpro
- lib     SSpro scripts to predict secondary structure
- tmp     Temporary work folder for the software

========================================= Software Usage =========================================

SSpro comes with one main script to run the predictor : bin/sequence_to_ss.sh

Usage : ./sequence_to_ss.sh input_fasta output_predictions [num_threads]

With:

- input_fasta           Input protein sequences in FASTA file format
- output_predictions    Predicted secondary structures
- num_threads           Number of cores to use to process the dataset (default=1)

Three additional scripts are provided for specific cases only:

- sequence_to_ss_ab.sh : returns SSpro ab-initio predictions only. The homology analysis is not
  performed, so the predictions are not improved by this second prediction stage. This script
  is only provided for evaluation purposes. Usage is identical to 'sequence_to_ss.sh'.

- profiles_to_ss.sh & profiles_to_ss_ab.sh : scripts to run SSpro directly from the sequence
  profiles instead of the protein amino-acid sequences. These scripts are used by SCRATCH-1D to
  optimize computation time and are not expected to be used directly. Usage is similar to the
  other scripts, but the input FASTA file must be replaced by the profiles generated by
  PROFILpro.

======================================= Input Files Format =======================================

Input files must be in the standard FASTA file format. There is no limit on the number of input
sequences to process other than the amount of RAM available on the machine running the program.
When profiles are provided as input instead of protein sequences, the input file format is the
same as the PROFILpro output files; please refer to the PROFILpro documentation for more details.

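Before a long run, it can be useful to check that an input file parses as FASTA and to count its
sequences. The reader below is a minimal sketch for that purpose. It is not the parser used by
SSpro, which may be stricter or more permissive; the function name read_fasta and the example
records are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up example records, for illustration only.
example = [">protein_1", "MKTAYIAKQR", "QISFVKSHFS", ">protein_2", "GSHMLEDPVD"]
for name, seq in read_fasta(example):
    print(name, len(seq))
```

The function accepts any iterable of lines, so an open file handle can be passed in place of the
example list.
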
==================================== Output Files Description ====================================

Output files use the same file format as the input files, with each protein amino-acid sequence
replaced by its predicted secondary structure. Headers are reported as provided in the input.

====================================================================================================
Release Notes
====================================================================================================

Version 5.2 (2015)

Author      : Christophe Magnan
Description : Minor revision
Comments    : Repackaged for SCRATCH-1D release 1.1

Version 5.1 (2013)

Author(s)   : Christophe Magnan
Description : Retrained predictor and updated datasets together with new code and packaging
Comments    : Profiles and homology analysis now performed by the PROFILpro and HOMOLpro programs

Versions 4.0 & 4.1 (2005)

Author(s)   : Jianlin Cheng
Description : Retrained predictor with updated datasets
Comments    : No longer available

Versions < 4.0

Author(s)   : Gianluca Pollastri
Description : Initial versions of the predictor
Comments    : No longer available

====================================================================================================