Features Tutorial: Run features analysis

Comparing sample sources takes one or more feature databases and a set of analysis scripts, and generates plots and statistics.

Run compare_sample_sources.R

The compare_sample_sources.R script runs feature analysis scripts on feature databases. It takes as input:

It then passes this configuration information to each analysis script. Usually the analysis script will generate the plots and statistics in

    path_to_output_dir/<sample_source1>[_<sample_source2>[_...]]/<output_format>/
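The path pattern above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not code taken from compare_sample_sources.R; the function name analysis_output_dir and the example ids are hypothetical, and the sketch follows only the pattern quoted above (the suggested layout later on this page also shows a per-script directory level, which is not modeled here).

```python
from pathlib import PurePosixPath


def analysis_output_dir(output_dir, sample_source_ids, output_format):
    """Join the sample source ids with underscores and nest the output format beneath them."""
    if not sample_source_ids:
        raise ValueError("at least one sample source id is required")
    comparison = "_".join(sample_source_ids)
    return PurePosixPath(output_dir) / comparison / output_format


# Hypothetical ids and output format, used only to show the resulting shape.
print(analysis_output_dir("build", ["top8000", "top8000_relax"], "output_pdf_print"))
# prints: build/top8000_top8000_relax/output_pdf_print
```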

There are several ways to run compare_sample_sources.R, which are described below. If you forget later, try

    compare_sample_sources.R --help

Suggested Directory Layout

When using sqlite3 feature databases, I like to set up the following directory structure. (Note: I generate the sample_source directories as a result of a cluster run using rosetta_tests/features/features.py and the templates in rosetta_tests/features/sample_sources/. This is described in detail here.)

    project/
       sample_source_id1_r#######_YYMMDD/                   #e.g. Top8000
           features_sample_source_id1_r#######_YYMMDD.db3   #experimental, X-ray 
           features.xml
           flags
           <log_files>
       sample_source_id2_r#######_YYMMDD/                   #e.g. Top8000_relax
           features_sample_source_id2_r#######_YYMMDD.db3   #relaxed natives
           features.xml
           flags
           <log_files>
       sample_source_id3_r#######_YYMMDD/                   #e.g. Top8000_relax_new
           features_sample_source_id3_r#######_YYMMDD.db3   #relaxed_natives with
           features.xml                                     #alternative score function
           flags
           <log_files>
       features/                                            #run compare_sample_sources.R from here
           compare_sample_sources_iscript.R                 #autogenerated to re-run interactively
           analysis_configurations/
                feature_analysis1.json                      #These are passed in
                feature_analysis2.json                      #with the --config flag
                feature_analysis3.json
           build/
                 sample_source_id1_sample_source_id2_sample_source_id3/
                       analysis_script_id1/
                            output_pdf_huge/
                                <plots.pdf>
                            output_pdf_print/
                                <plots.pdf>
                            ...
                       analysis_script_id2/
                            ...

Notes and thoughts on the directory layout:

Run One Script at a Time

To run a single analysis script on one or more databases,

    cd project/features
    path/to/rosetta/rosetta_tests/features/compare_sample_sources.R [OPTIONS] --script <analysis_script.R> path/to/features_<ss_id1>.db3 [path/to/features_<ss_id2>.db3 [...]]

Note:

    >source("compare_sample_sources_iscript.R")

will rerun the script and leave you to do further interactive analysis.
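If the sample sources follow the suggested directory layout, the single-script invocation above can be assembled automatically. The following Python sketch is one way to do that under stated assumptions: the helper names find_feature_databases and build_command are hypothetical, the directory names in the demonstration are made up, and only the --script flag shown above is used.

```python
from pathlib import Path
import tempfile


def find_feature_databases(project_dir):
    """Return features_*.db3 files found one level below project_dir, sorted by path."""
    return sorted(Path(project_dir).glob("*/features_*.db3"))


def build_command(runner, analysis_script, databases, options=()):
    """Assemble: runner [OPTIONS] --script <analysis_script.R> <db> [<db> ...]."""
    if not databases:
        raise ValueError("at least one feature database is required")
    return [str(runner), *options, "--script", str(analysis_script), *map(str, databases)]


# Demonstration on a throwaway directory tree that mimics the suggested layout.
with tempfile.TemporaryDirectory() as project:
    for name in ("top8000_r00000_000000", "top8000_relax_r00000_000000"):
        source_dir = Path(project) / name
        source_dir.mkdir()
        (source_dir / f"features_{name}.db3").touch()
    (Path(project) / "features").mkdir()  # holds no database, so it is skipped

    databases = find_feature_databases(project)
    command = build_command(
        "path/to/rosetta/rosetta_tests/features/compare_sample_sources.R",
        "analysis_script.R",
        databases,
    )
    print(" ".join(command))
```

The resulting list can be passed to subprocess.run with project/features as the working directory, which avoids quoting problems when a path contains spaces.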

Run a Configuration File

To run one or more features analyses with an analysis configuration file,

    cd project/features
    path/to/rosetta/rosetta_tests/features/compare_sample_sources.R --config analysis_configurations/<features_analysis>.json

The advantages of running compare_sample_sources.R from a configuration file include:

The analysis configuration is written in .json format, which is basically nested Python dictionaries and lists with numbers and strings as leaves. Here is an example configuration that was used to generate Figure 3 in the Rosetta Scientific Benchmarking paper:

{  
    "output_dir" : "build/general_analysis",  
    "sample_source_comparisons" : [  
        {  
            "sample_sources" : [  
                {  
                    "database_path" : "/home/momeara/scr2/data/sp2_8b/top8000_sp2_r47244_120205/features_top8000_sp2_r47244_120205.db3",  
                    "id" : "Native",  
                    "reference" : true  
                },  
                {  
                    "database_path" : "/home/momeara/scr2/dun10_vs_bicubic/top8000_relax_r46440_111213/features_top8000_relax_r46440_111213.db3",  
                    "id" : "Relaxed Native Score12",  
                    "reference" : false,  
                    "model" : "Score12"  
                },  
                {  
                    "database_path" : "/home/momeara/scr2/data/sp2_no_lj_correction/top8000_relax_sp2_no_lj_correction_r48561_120518/features_top8000_relax_sp2_no_lj_correction_r48561_120518.db3",  
                    "id" : "Relaxed Native NewHB",  
                    "reference" : false,  
                    "model" : "NewHB"  
                },  
                {  
                    "database_path" : "/home/momeara/scr2/data/sp2_8b/top8000_relax_sp2_r47244_120205/features_top8000_relax_sp2_r47244_120205.db3",  
                    "id" : "Relaxed Native NewHB LJcorr",  
                    "reference" : false,  
                    "model" : "NewHB"  
                }  
            ],  
            "analysis_scripts" : [  
                "scripts/analysis/plots/hbonds/hydroxyl_sites/OHdonor_AHdist.R"  
            ],  
            "output_formats" : [  
                "output_small_eps"  
            ]  
        }  
    ]  
}  
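Because the configuration is plain JSON, it can be inspected with Python's standard json module before a long run. The sketch below is illustrative only: it uses the field names from the example above, the shortened database paths are placeholders, and the summarize helper is hypothetical rather than part of the features tooling.

```python
import json

# A trimmed-down configuration with placeholder database paths.
config_text = """
{
    "output_dir": "build/general_analysis",
    "sample_source_comparisons": [
        {
            "sample_sources": [
                {"database_path": "features_native.db3", "id": "Native", "reference": true},
                {"database_path": "features_relaxed.db3", "id": "Relaxed Native Score12",
                 "reference": false, "model": "Score12"}
            ],
            "analysis_scripts": ["scripts/analysis/plots/hbonds/hydroxyl_sites/OHdonor_AHdist.R"],
            "output_formats": ["output_small_eps"]
        }
    ]
}
"""


def summarize(config):
    """Return one summary dict per entry in sample_source_comparisons."""
    summaries = []
    for comparison in config["sample_source_comparisons"]:
        sources = comparison["sample_sources"]
        summaries.append({
            "ids": [s["id"] for s in sources],
            "reference_ids": [s["id"] for s in sources if s.get("reference")],
            "scripts": comparison["analysis_scripts"],
            "formats": comparison["output_formats"],
        })
    return summaries


config = json.loads(config_text)  # JSON objects become dicts, arrays become lists
summary = summarize(config)
for entry in summary:
    print(entry)
```

json.loads raises json.JSONDecodeError on malformed input such as a trailing comma, so running a check like this first catches syntax mistakes early.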

Notes: