Features Tutorial: Run Scientific Benchmark

Introduction

The features scientific benchmark compares batches of structures from different sources with respect to local chemical and geometric features. This tutorial describes how to run it. The general steps are:

  1. For each sample source obtain or generate a batch of structure conformations
  2. For each sample source extract features into a features database
  3. For each analysis script compare 2 or more sample sources

Requirements

Batch Requirements

Each batch of structures should be as large and representative as possible. If you are generating structure predictions to compare against experimental data, here are some guidelines:

Computational Requirements

The features scientific benchmark can be run locally or on an MPI-based cluster. The computational time and space requirements differ between the stages of the analysis.

Database Support Requirements

The features scientific benchmark stores feature data in databases. It currently works with SQLite3; support for MySQL and PostgreSQL databases is under development.

Cluster Environment Requirements

The features scientific benchmark supports single-threaded, MPI, and Condor computational environments. The feature extraction process only uses the rosetta_scripts application and the jd2 job distributor, so if you are able to get those to work on your platform, it should be possible to get the features scientific benchmark to work as well. Specific configuration information for the following job schedulers is provided.

Generate Feature Databases

Generating the features database involves extracting feature information from each structure. Usually this requires specifying the following information:

Inputting Structures

The coordinates of the structures used to extract the feature information can be supplied in any format recognized by the rosetta_scripts application. NOTE: The input format determines the tags in the structures table in the resulting features database. This is important because the structures table connects a struct_id, which identifies a structure within the features database, with a tag, which identifies the structure outside of the database. Here are command line flags that are relevant to the different input types:
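The struct_id to tag mapping described above can be inspected directly in an SQLite3 features database. The following is a minimal Python sketch, not part of the benchmark itself. It assumes only what the text states, namely a structures table with struct_id and tag columns; a real features database may use different column types and additional columns. The in-memory database and the tag values are hypothetical stand-ins so that the example runs on its own.

```python
import sqlite3


def tags_by_struct_id(conn):
    """Return a {struct_id: tag} dict read from the structures table."""
    rows = conn.execute(
        "SELECT struct_id, tag FROM structures ORDER BY struct_id"
    )
    return dict(rows)


# Toy stand-in for a features database (assumption: the real schema
# has at least these two columns; the tags below are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE structures (struct_id INTEGER PRIMARY KEY, tag TEXT)"
)
conn.executemany(
    "INSERT INTO structures (struct_id, tag) VALUES (?, ?)",
    [(1, "example_0001"), (2, "example_0002")],
)

mapping = tags_by_struct_id(conn)
print(mapping)
```

To try this against an actual features database, replace ":memory:" with the path to the database file and drop the CREATE TABLE and INSERT statements.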

Specify Features Reporters

Each FeaturesReporter is responsible for extracting a certain type of feature to the features database. Select a set of FeaturesReporters and then include them as subtags of the <ReportToDB/> mover tag in the rosetta_scripts script.
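As an illustration of the subtag layout, the sketch below uses Python's standard xml.etree.ElementTree module to assemble a ReportToDB element with one subtag per selected reporter. It is a hedged example rather than the benchmark's own tooling: the reporter names (ResidueFeatures, HBondFeatures), the attribute names (name, database_name), and the database file name are assumptions chosen for illustration and should be checked against the ReportToDB documentation for your Rosetta version.

```python
import xml.etree.ElementTree as ET


def build_report_to_db(reporters, database_name="features.db3"):
    """Build a ReportToDB element with one subtag per reporter name.

    The attribute names used here are assumptions, not confirmed by
    this tutorial.
    """
    mover = ET.Element(
        "ReportToDB",
        {"name": "features_reporter", "database_name": database_name},
    )
    for reporter in reporters:
        # Each selected FeaturesReporter becomes an empty subtag.
        ET.SubElement(mover, reporter)
    return mover


# Hypothetical selection of reporters.
mover = build_report_to_db(["ResidueFeatures", "HBondFeatures"])
xml_text = ET.tostring(mover, encoding="unicode")
print(xml_text)
```

The printed fragment would then be placed in the movers section of the rosetta_scripts script; writing the XML by hand is equally valid.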

Sample Source Templates

The features scientific benchmark provides sample source templates, which set up the configuration needed to extract features from an input dataset.

Each sample_source template is a folder in rosetta_tests/features/sample_sources/ with the following files:

Specify which sample sources to use by editing rosetta/main/test/scientific/cluster/features/sample_sources/benchmark.list. See the Sample Sources page for details about each sample source.

Run the features scientific benchmark using the features.py script:

    rosetta_tests/features.py [OPTIONS] [RUN]

Options for features.py

These are the command line options used to run features.py: