====================================================================================================
                    Protein Relative Solvent Accessibility Prediction (ACCpro20)
                                 Accessibility Thresholds 0% to 95%
                             Method Description & Project Documentation
====================================================================================================

Author(s) : Christophe Magnan (cmagnan@ics.uci.edu)
Copyright : Institute for Genomics and Bioinformatics
            University of California, Irvine
Modified  : 2015/07/02

====================================================================================================
                                         Method Description
====================================================================================================

ACCpro20 is a 20-class protein relative solvent accessibility predictor included in the SCRATCH-1D
suite of predictors. It is the 20-class version of the well-known ACCpro solvent accessibility
predictor and directly predicts the relative solvent accessibility of each amino acid in a protein
sequence using twenty thresholds: 0 (0% to 5% exposed), 5 (5% to 10% exposed), ... , 95 (95% to
100% exposed). For comparison, ACCpro labels amino acids less than 25% exposed as buried and all
others as exposed.

ACCpro20 is a 3-step predictor in which the output of each step becomes the input of the next step.
Each step is performed by an independent program delivered with ACCpro20. ACCpro20 can thus be
considered a wrapper tool with its own set of prediction models (recurrent neural networks). A
brief description of the three main components of ACCpro20 is provided below:

- a sequence profile generator (PROFILpro software) to extract the protein evolutionary information

- a set of 100 BRNNs (neural networks trained with the 1D-BRNN software) performing the first
  ab-initio prediction directly from the sequence profiles generated by PROFILpro

- a homology-based solvent accessibility predictor (HOMOLpro software) improving the initial
  ab-initio predictions with homology-based ones when homologs can be found in the Protein Data Bank

ACCpro20 is now delivered as part of the SCRATCH-1D suite of predictors together with SSpro, SSpro8,
ACCpro, PROFILpro, HOMOLpro, and 1D-BRNN. ACCpro20 is no longer made available as a standalone tool.
SCRATCH-1D can run the SSpro, SSpro8, ACCpro, and ACCpro20 predictors on multiple sequences using
multiple cores in a single run, which significantly reduces the computation time compared with
running each predictor separately. Scripts to run ACCpro20 separately are nevertheless provided
(documented below) in the 'bin' folder of the ACCpro20 release included in the SCRATCH-1D package,
but we strongly recommend using the SCRATCH-1D 'bin' scripts directly to run all the predictors at
once and save a significant amount of time.

====================================================================================================
                                       Project Documentation
====================================================================================================

This section describes the project folder and explains how to use ACCpro20.

========================================= Project Folder =========================================

A brief description of the project folders is given below.

- bin     Main scripts to run ACCpro20
- data    ACCpro20 prediction models (BRNNs)
- doc     Documentation of the software
- env     Bash profile for running ACCpro20
- lib     ACCpro20 scripts to predict the relative solvent accessibility
- tmp     Temporary work folder for the software

========================================= Software Usage =========================================

ACCpro20 comes with one main script to run the predictor: bin/sequence_to_acc20.sh

Usage : ./sequence_to_acc20.sh input_fasta output_predictions [num_threads]

With:

- input_fasta           Input protein sequences in FASTA file format
- output_predictions    Predicted relative solvent accessibility
- num_threads           Number of cores to use to process the dataset (default=1)

Three additional scripts are provided for specific cases only:

- sequence_to_acc20_ab.sh : returns ACCpro20 ab-initio predictions only. The homology analysis is
  not performed, so the predictions are not refined by this second prediction stage. This script is
  provided for evaluation purposes only. Usage is identical to 'sequence_to_acc20.sh'.

- profiles_to_acc20.sh & profiles_to_acc20_ab.sh : scripts to run ACCpro20 directly from the
  sequence profiles instead of the protein amino-acid sequences. These scripts are used by
  SCRATCH-1D to optimize computation time and are not expected to be used directly. Usage is similar
  to the other scripts, but the input FASTA file must be replaced by the profiles generated by
  PROFILpro.

======================================= Input Files Format =======================================

Input files must be in the standard FASTA file format. The number of input sequences is limited
only by the amount of RAM available on the machine running the program. When profiles are provided
as input instead of protein sequences, the input file format is the same as that of PROFILpro output
files. Please refer to the PROFILpro documentation for more details.

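As an illustration of the usage line and input format described above, the following Python sketch
assembles the command line for 'sequence_to_acc20.sh' and counts the records in a FASTA input. It is
not part of the ACCpro20 release: the function names, the 'bin_dir' parameter, and the file names
are hypothetical and only serve the example.

```python
def count_fasta_records(text):
    """Count FASTA records, i.e. lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def build_acc20_command(input_fasta, output_predictions, num_threads=1, bin_dir="bin"):
    """Build the argument list for sequence_to_acc20.sh.

    The argument order follows the documented usage line:
        ./sequence_to_acc20.sh input_fasta output_predictions [num_threads]
    'bin_dir' is an assumption; point it at the 'bin' folder of your installation.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    script = f"{bin_dir}/sequence_to_acc20.sh"
    return [script, str(input_fasta), str(output_predictions), str(num_threads)]


# Example with hypothetical file names; nothing is executed here.
cmd = build_acc20_command("proteins.fasta", "proteins.acc20", num_threads=4)
```

The resulting list can be passed to Python's subprocess.run(cmd, check=True) once the script path
points at a real installation.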
==================================== Output Files Description ====================================

Output files use the same file format as the input files, except that each protein amino-acid
sequence is replaced by its predicted solvent accessibility. Relative accessibility values
(0...95) are space-separated. Headers are reported as provided in the input.

====================================================================================================
                                           Release Notes
====================================================================================================

Version 5.2 (2015)

Author(s)   : Christophe Magnan
Description : Minor revision
Comments    : Repackaged for SCRATCH-1D release 1.1

Version 5.1 (2013)

Author(s)   : Christophe Magnan
Description : Retrained predictor and updated datasets together with new code and packaging
Comments    : Profiles and homology analysis now performed by the PROFILpro and HOMOLpro software

Versions 4.0 & 4.1 (2005)

Author(s)   : Jianlin Cheng
Description : Retrained predictor with updated datasets
Comments    : No longer available

Versions < 4.0

Author(s)   : Gianluca Pollastri
Description : Initial versions of the predictor
Comments    : No longer available

====================================================================================================
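
Example: reading ACCpro20 output files. The Python sketch below parses predictions in the format
given under "Output Files Description" (a header line followed by space-separated values from 0 to
95) and maps each value back to its exposure range. It is an illustration only, not code shipped
with ACCpro20: the function names are hypothetical, and the parser assumes that the values of one
record may span one or more lines. The two-state helper applies the 25% cutoff that ACCpro uses;
treating classes 0 to 20 as buried is an assumption derived from the class ranges.

```python
def parse_acc20_output(text):
    """Parse ACCpro20 output text into {header: [class values]}.

    Assumes each record is a '>' header line followed by one or more lines of
    space-separated integers (0, 5, ..., 95). Duplicate headers overwrite.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].extend(int(token) for token in line.split())
    return records


def exposure_range(threshold):
    """Return the (low, high) percent exposure covered by one class value."""
    if threshold not in range(0, 100, 5):
        raise ValueError(f"unexpected ACCpro20 class: {threshold}")
    return threshold, threshold + 5


def is_buried_at_25(threshold):
    """Two-state label at the 25% cutoff (assumed mapping to ACCpro's classes)."""
    return threshold < 25
```

For instance, a record containing the values "0 5 95 20" describes four residues, the third of
which is predicted to be 95% to 100% exposed.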