eXpress is a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences. Example applications include transcript-level RNA-Seq quantification, allele-specific/haplotype expression analysis (from RNA-Seq), transcription factor binding quantification in ChIP-Seq, and analysis of metagenomic data. It is based on an online-EM algorithm  that results in space (memory) requirements proportional to the total size of the target sequences and time requirements that are proportional to the number of sampled fragments. Thus, in applications such as RNA-Seq, eXpress can accurately quantify much larger samples than other currently available tools greatly reducing computing infrastructure requirements. eXpress can be used to build lightweight high-throughput sequencing processing pipelines when coupled with a streaming aligner (such as Bowtie), as output can be piped directly into eXpress, effectively eliminating the need to store read alignments in memory or on disk.
The environment variable(s) need to be set properly first. The easiest way to do this is by using the modules commands, 'module load express', as in the example below.
In the following two sub-sections, you will run eXpress on a sample RNA-Seq dataset with simulated reads from UGT3A2 and the HOXC cluster using the human genome build hg18. Both the transcript sequences (transcripts.fasta) and raw reads (reads_1.fastq, reads_2.fastq) can be found in the /usr/local/express/sample_data directory. For this example to work, you will need to have both bowtie and samtools in your $PATH (you can use 'module' command), but in general any aligner will work and the conversion to BAM is not necessary unless you have insufficient disk space to store the uncompressed SAM.
Before you begin, you must prepare your Bowtie index. Since you wish to allow many multi-mappings, it is useful to build the index with a small offrate (in this case 1). The smaller the offrate, the larger the index and the faster the mapping. If you have disk space to spare, always use an offrate of 1.
1. Build the index with the following commands.
$ mkdir /data/userID/express $ cp -rp /usr/local/express/sample_data /data/userID/express $ cd /data/userID/express/sample_data $ module load bowtie/0.12.8 # this one use bowtie instead of bowtie2 $ module load samtools $ bowtie-build --offrate 1 transcripts.fasta transcript
This command will populate your directory with several index files that allow Bowtie to more easily align reads to the transcripts.
You can now map the reads to the transcript sequences using the following Bowtie command, which outputs in SAM (-S), allows for unlimited multi-mappings (-a), a maximum insert distance of 800 bp between the paired-ends (-X 800), and 3 mismatches (-v 3). The first three options (a,S,X) are highly recommended for best results. You should also allow for many mismatches, since eXpress models sequencing errors. Furthermore, you will want to take advantage of multiple processors when mapping large files using the -poption. See the Bowtie Manual for more details on various parameters and options.
2. The SAM output from Bowtie is piped into SAMtools in order to compress it to BAM format. This conversion is optional, but will greatly reduce the size of the alignment file.
$ bowtie -aS -X 800 --offrate 1 -v 3 transcript -1 reads_1.fastq -2 reads_2.fastq | samtools view -Sb - > hits.bam
3. Once you have aligned your reads to the transcriptome and stored them in a SAM or BAM file, you can run eXpress in default mode with the command:
$ module load express $ express transcripts.fasta hits.bam
4. If you do not wish to store an intermediate SAM/BAM file, you can pipe the Bowtie output directly into eXpress with the command:
$ module load bowtie/0.12.8 samtools express $ bowtie -aS -X 800 --offrate 1 -v 3 transcript -1 reads_1.fastq -2 reads_2.fastq | express transcripts.fasta