High-Performance Computing at the NIH

BedTools on Helix

The BEDTools utilities address common genomics tasks such as finding feature overlaps and computing coverage. In addition, several BEDTools utilities can be "streamed" (piped) together into sophisticated pipelines that answer complicated research questions. The following are examples of common questions that BEDTools can address.

- Intersecting two BED files in search of overlapping features
- Merging overlapping features
- Screening for paired-end (PE) overlaps between PE sequences and existing genomic features
- Calculating the depth and breadth of sequence coverage across defined "windows" in a genome

Because every BEDTools utility accepts input from "stdin", several commands can be streamed (piped) together to carry out more complicated analyses. The tools also give fine control over how output is reported. BEDTools supports sequence alignments in BAM format as well as features in GFF and "blocked" BED format.

Program Location

/usr/local/bedtools/bin

The environment variables must be set before BEDTools can be used. The easiest way to do this is with the modules command, 'module load bedtools', as in the example below.

Frequently used module commands:

$ module load AppName
$ module load AppName/AppVersion
$ module unload AppName
$ module avail
$ module avail AppName
$ module list
$ module switch AppName AppName/AppVersion
$ module display AppName

If you use this application often, then in addition to using the 'module' command, you can set the environment variables in your /home/UserID/.bashrc or /home/UserID/.cshrc file so that they are set automatically when you log in and you do not need to set them every time.

For bash users:

$ export PATH=/usr/local/bedtools_2.7.1/bin:$PATH

For tcsh/csh users:

% set path=(/usr/local/bedtools_2.7.1/bin ${path})
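Because .bashrc may be sourced more than once per session, a plain export can add the same directory to PATH repeatedly. A minimal bash-only sketch that prepends the directory only if it exists and is not already listed is shown below; the helper name path_prepend is hypothetical, and the directory is the one from the bash example above.

```shell
# Hypothetical ~/.bashrc helper (bash only): prepend a directory to PATH
# if the directory exists and is not already on PATH.
path_prepend() {
  local dir=$1
  [ -d "$dir" ] || return 0              # skip silently if the directory is absent
  case ":$PATH:" in
    *":$dir:"*) ;;                       # already on PATH: leave it alone
    *) PATH="$dir:$PATH" ;;              # otherwise put it first
  esac
  export PATH
}

path_prepend /usr/local/bedtools_2.7.1/bin
```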

List of programs:

bamToBed
closestBed
complementBed
coverageBed
fastaFromBed
genomeCoverageBed
intersectBed
linksBed
maskFastaFromBed
mergeBed
pairToBed
pairToPair
shuffleBed
slopBed
sortBed
subtractBed
windowBed
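After loading the module, a short loop can confirm that each program in the list above is actually on your PATH. This sketch relies only on the bash builtin command -v; the function name check_bedtools is hypothetical.

```shell
# Print "<program><TAB>found" or "<program><TAB>missing" for each BEDTools program.
check_bedtools() {
  local prog
  for prog in bamToBed closestBed complementBed coverageBed fastaFromBed \
              genomeCoverageBed intersectBed linksBed maskFastaFromBed mergeBed \
              pairToBed pairToPair shuffleBed slopBed sortBed subtractBed windowBed; do
    if command -v "$prog" >/dev/null 2>&1; then
      printf '%s\tfound\n' "$prog"
    else
      printf '%s\tmissing\n' "$prog"
    fi
  done
}

check_bedtools
```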

Example Usage

Here are some brief examples of common usage (see the documentation included in the BEDTools tarball for more examples).

First, use 'module' to add BEDTools to your PATH:

$ module load bedtools

Find overlaps between segmental duplications and exons

$ intersectBed -a segdups.bed -b exons.bed

Find overlaps between aligned sequences and exons

$ intersectBed -a sequences.bed -b exons.bed

Find those aligned sequences that do not overlap exons

$ intersectBed -a sequences.bed -b exons.bed -v
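To make the overlap rule concrete: BED intervals use 0-based, half-open coordinates, so [s1, e1) and [s2, e2) on the same chromosome overlap when s1 < e2 and s2 < e1. The awk function below is a naive illustration of what the -v example reports (A features with no overlap in B). It is not how intersectBed is implemented; it compares every pair of features, so it is only suitable for tiny files, and the name naive_no_overlap is hypothetical.

```shell
# Naive illustration of `intersectBed -a A -b B -v`:
# print each A feature that overlaps no B feature.
# Usage: naive_no_overlap A.bed B.bed
naive_no_overlap() {
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] {                # first file read is B: store its intervals
      n++; chrom[n] = $1; start[n] = $2 + 0; end[n] = $3 + 0
      next
    }
    {                                    # second file is A: test against every B
      hit = 0
      for (i = 1; i <= n; i++)
        if ($1 == chrom[i] && $2 + 0 < end[i] && start[i] < $3 + 0) { hit = 1; break }
      if (!hit) print
    }' "$2" "$1"
}
```

Note that intervals which merely touch (one ends where the other starts) do not overlap under this rule.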

Find overlaps between both ends of paired-end reads and exons

$ pairToBed -a illumina_pairs.bedpe -b exons.bed -type both
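The BEDPE file passed with -a describes both ends of each pair on one line: columns 1-3 are the first end (chrom1, start1, end1), columns 4-6 the second end (chrom2, start2, end2), and column 7 is an optional pair name. With -type both, a pair is reported only when both of its ends overlap a feature in exons.bed. The hypothetical helper below simply splits a BEDPE file into one BED file per end, which can help when inspecting the two ends separately.

```shell
# Hypothetical helper: split a BEDPE file into <prefix>.end1.bed and <prefix>.end2.bed.
# Usage: split_bedpe pairs.bedpe prefix
split_bedpe() {
  local bedpe=$1 prefix=$2
  awk -F'\t' -v OFS='\t' -v p="$prefix" '
    {
      name = (NF >= 7) ? $7 : "pair" NR  # fall back to a line-number name
      print $1, $2, $3, name > (p ".end1.bed")
      print $4, $5, $6, name > (p ".end2.bed")
    }' "$bedpe"
}
```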

Compute the depth and breadth of coverage of aligned reads across 10kb windows spanning a genome

$ coverageBed -a reads.bed -b windows.10kb.bed
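The coverage example assumes a file of 10kb windows already exists. One way to build it, sketched here under the assumption that you have a genome file (two tab-separated columns: chromosome name and length, the format complementBed's -g option expects), is a short awk loop. The helper name make_windows is hypothetical.

```shell
# Hypothetical helper: tile each chromosome in a genome file into
# non-overlapping windows (default 10,000 bp). The last window on each
# chromosome is truncated at the chromosome end.
# Usage: make_windows genome_file [window_size]
make_windows() {
  local genome=$1 size=${2:-10000}
  awk -F'\t' -v OFS='\t' -v w="$size" '
    {
      len = $2 + 0
      for (s = 0; s < len; s += w) {
        e = s + w
        if (e > len) e = len
        print $1, s, e
      }
    }' "$genome"
}
```

For example, make_windows hg18.genome 10000 > windows.10kb.bed would produce the window file used above.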

Mask all of the human genome (hg18) except for your targeted capture probes (plus 500bp in each direction from each probe)

$ slopBed -i probes.bed -g hg18.genome -b 500 > probes.added500bp.bed
$ complementBed -i probes.added500bp.bed -g hg18.genome > probes.added500bp.complement.bed
$ maskFastaFromBed -fi hg18.fa -bed probes.added500bp.complement.bed -fo hg18.allButProbes.masked.fa
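slopBed and complementBed both need a genome file giving each chromosome's length (tab-separated name and length). If you do not already have hg18.genome, the sketch below derives one from the FASTA file itself. It assumes the chromosome name is the first word of each ">" header line; the function name fasta_to_genome is hypothetical. If samtools is available, running cut -f1,2 on the .fai index created by samtools faidx yields the same two columns.

```shell
# Hypothetical helper: write "<name><TAB><length>" for every sequence in a FASTA file.
# Usage: fasta_to_genome hg18.fa > hg18.genome
fasta_to_genome() {
  awk -v OFS='\t' '
    /^>/ {
      if (name != "") print name, len    # emit the previous sequence
      name = substr($1, 2); len = 0      # first word of the header, minus ">"
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }  # count bases, ignoring stray whitespace
    END { if (name != "") print name, len }' "$1"
}
```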