High-Performance Computing at the NIH

Biotoolbox on Helix
Biotoolbox is a collection of Perl scripts that use BioPerl modules for bioinformatics analysis. It includes tools for processing microarray and next-generation sequencing data, converting data file formats, querying datasets, and performing general high-level analysis of datasets.

This toolbox relies on storing genome annotation, microarray, and next-generation sequencing data in local BioPerl databases, which allows data to be retrieved relative to any annotated feature in the database. Referencing genomic annotation and features from a database is convenient but not required: simple BED-style input files are also supported for data collection.
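As a rough illustration of what a simple BED-style input file looks like, the sketch below writes a small tab-separated file with chromosome, start, end, and name columns. The file name, coordinates, and region names are made-up placeholders, and which columns the individual biotoolbox scripts actually read is an assumption to check against the biotoolbox documentation.

```shell
# Minimal sketch: create a 4-column, tab-separated BED-style file.
# File name, coordinates, and names are hypothetical placeholders,
# not real annotations from any database.
bed_file="example_regions.bed"

printf 'chrI\t1000\t2000\tregion_1\n'  >  "$bed_file"
printf 'chrI\t5000\t6500\tregion_2\n'  >> "$bed_file"
printf 'chrII\t300\t900\tregion_3\n'   >> "$bed_file"

# Sanity check: every line has 4 fields and start is less than end.
awk -F'\t' 'NF != 4 || $2+0 >= $3+0 { bad = 1 } END { exit bad+0 }' "$bed_file" \
  && echo "$bed_file looks well-formed"
```

In the standard BED convention the start coordinate is 0-based and the end coordinate is exclusive, so the first line above describes 1000 bases.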

 

The environment variables for biotoolbox must be set before the scripts are used. The easiest way to do this is with the module commands, as in the example below.

helix% module avail biotoolbox

---------- /usr/local/Modules/3.2.9/modulefiles --------------------
biotoolbox/1.8.0 biotoolbox/1.8.6 biotoolbox/1.9.4

helix% module load biotoolbox

helix% module list
Currently Loaded Modulefiles:
  1) biotoolbox/1.9.4
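The session above loads the default version. To pin one of the other versions listed by module avail, for example to reproduce an earlier analysis, the version can be given explicitly. This is a minimal sketch that assumes a standard Environment Modules setup; the guard around the module command is only there so the snippet fails gracefully in a shell where modules are not initialized.

```shell
# Sketch: load a specific biotoolbox version instead of the default.
# The version string is taken from the `module avail` listing above.
biotoolbox_version="1.8.6"

if type module >/dev/null 2>&1; then
    module load "biotoolbox/${biotoolbox_version}"
    module list
else
    echo "module command not available in this shell"
fi
```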

How To Use
Sample session:
helix% module load biotoolbox

helix% print_feature_types.pl cerevisiae_20100109
  Found 28 feature types in database 'cerevisiae_20100109'
   There are 2 feature types with source 'Publication'
      TF_binding_site
      uORF
   There are 25 feature types with source 'SGD'
      ARS
      CDS
      LTR_retrotransposon
      binding_site
      centromere
      chromosome
      external_transcribed_spacer_region
      five_prime_UTR_intron
      gene
      ....

helix% get_datasets.pl --db cerevisiae_20100109 --feature gene:SGD --data none --out gene_list
A program to collect feature data from the database
Generating a new feature list from database 'cerevisiae_20100109'...
Searching for gene:SGD
Found 6607 features in the database.
Kept 5778 features.
wrote file './gene_list.txt'
Completed in 0.0 minutes

 

Documentation

http://code.google.com/p/biotoolbox/