High-Performance Computing at the NIH


VCFtools on Helix

VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts that perform common tasks on VCF files, such as validation, merging, intersection, and complements. The Perl tools support all versions of the VCF specification (3.2, 3.3, and 4.0); nevertheless, users are encouraged to use the latest version, VCFv4.0. VCFtools has mainly been used with diploid data, but the Perl tools aim to support polyploid data as well.
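Because the Perl tools accept VCF 3.2, 3.3, and 4.0 files, it can be useful to check which version a file declares before processing it. The shell function below is a minimal sketch and not part of VCFtools: the name vcf_version is hypothetical, and it assumes the version is declared on a ##fileformat= header line (such as ##fileformat=VCFv4.0 in the annotated output further down this page). It reads plain or bgzip-compressed files through gzip -dcf.

```shell
# vcf_version FILE: print the version declared on the ##fileformat= line.
# Hypothetical helper, not a VCFtools command. Works on plain or
# bgzip-compressed files: gzip -dcf passes uncompressed input through unchanged.
vcf_version() {
  gzip -dcf "$1" | awk -F'=' '
    /^##fileformat=/ { print $2; exit }   # found the version line
    !/^##/           { exit }             # meta-information ended without one
  '
}
```

For example, vcf_version cmp-test-a.vcf.gz prints the version string declared in that file, or nothing if the header has no ##fileformat= line.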

VCFtools is developed and maintained by Adam Auton, Peter Danecek, and collaborators. See the VCFtools paper.

Programs location

/usr/local/vcftools/bin

Please note: tabix and bgzip are installed in the same directory.

The VCFtools environment must be set up correctly before running the programs. To do this, type 'module load vcftools', as in the examples below. This needs to be done only once per login session.

How To Use

The example files used below can be copied from /usr/local/vcftools/examples/.

Example 1: Running compare-vcf

helix% module load vcftools
helix% cd /data/userID/vcftools/run1
helix% cp /usr/local/vcftools/examples/cmp-test-* .
helix% compare-vcf cmp-test-a.vcf.gz cmp-test-b.vcf.gz
Number of sites found only in
6 cmp-test-a.vcf.gz (100.0%) cmp-test-b.vcf.gz (100.0%)
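The compare-vcf output above reads as: 6 sites occur in both files, which is 100% of the sites in each file. The idea of counting shared sites can be sketched with standard tools. The functions sites and shared_sites below are hypothetical helpers, not VCFtools commands, and they key each site on CHROM and POS only. That is a simplification: compare-vcf's own matching rules may differ, so treat this as an illustration rather than a replacement.

```shell
# sites FILE: list the unique CHROM:POS keys of the data lines in a VCF file
# (plain or bgzip-compressed; header lines starting with # are skipped).
sites() {
  gzip -dcf "$1" | awk -F'\t' '!/^#/ { print $1 ":" $2 }' | sort -u
}

# shared_sites A B: count CHROM:POS keys present in both files. Each file
# contributes every key at most once, so a key seen twice is in both files.
shared_sites() {
  { sites "$1"; sites "$2"; } | sort | uniq -d | awk 'END { print NR }'
}
```

Run in the same directory as the example, shared_sites cmp-test-a.vcf.gz cmp-test-b.vcf.gz counts the positions the two files have in common.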

Example 2: Running vcf-concat


helix% module load vcftools
helix% cd /data/userID/vcftools/run1
helix% cp /usr/local/vcftools/examples/concat-* .
helix% vcf-concat concat-a.vcf.gz concat-b.vcf.gz concat-c.vcf.gz | bgzip -c > out.vcf.gz
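A bgzip-compressed VCF such as out.vcf.gz is usually indexed next, for example with tabix -p vcf out.vcf.gz (the same preset used in Example 3). tabix requires the records to be grouped by chromosome and sorted by position within each chromosome. To our understanding vcf-concat appends the inputs in the order they are given, so a sorted result depends on listing the files in the right order. The check_sorted function below is a hypothetical helper, a minimal sketch using only gzip and awk; it is not part of VCFtools.

```shell
# check_sorted FILE: print "sorted" and return 0 if the data lines are grouped
# by chromosome and sorted by position within each chromosome; otherwise
# report the first offending line and return 1. Hypothetical helper.
check_sorted() {
  gzip -dcf "$1" | awk -F'\t' '
    /^#/ { next }                              # skip header lines
    {
      c = $1 ""                                # compare chromosome names as strings
      if (c != chrom) {                        # entering a new chromosome block
        if (c in seen) {
          printf "chromosome %s appears in more than one block (line %d)\n", c, NR
          bad = 1; exit
        }
        seen[c] = 1; chrom = c; last = 0
      }
      if ($2 + 0 < last) {
        printf "position out of order at %s:%s (line %d)\n", c, $2, NR
        bad = 1; exit
      }
      last = $2 + 0
    }
    END {
      if (bad) exit 1
      print "sorted"
    }'
}
```

For example, check_sorted out.vcf.gz && tabix -p vcf out.vcf.gz builds the index only when the check passes.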

Example 3: Running vcf-annotate


helix% module load vcftools
helix% cd /data/userID/vcftools/run1
helix% cp /usr/local/vcftools/examples/concat-a.vcf .
helix% cp /usr/local/vcftools/examples/annotate.txt .
helix% bgzip -c annotate.txt > annotate.txt.gz
helix% tabix -p vcf annotate.txt.gz
helix% vcf-annotate -c FROM,TO,CHROM,INFO/HM2,INFO/GN,INFO/DP \
-d key=INFO,ID=HM2,Number=0,Type=Flag,Description="HapMap2 membership" \
-d key=INFO,ID=GN,Number=1,Type=String,Description="Gene Name" \
-d key=INFO,ID=DP,Number=0,Type=Integer,Description="Depth,etc" \
concat-a.vcf -a annotate.txt.gz
##fileformat=VCFv4.0
##INFO=
##FORMAT=
##FORMAT=
##FORMAT=
##FILTER=
##INFO=
##INFO=
##source_20110110.1=/usr/local/vcftools/bin/vcf-annotate -c FROM,TO,CHROM,INFO/HM2,INFO/GN,INFO/DP -d key=INFO,ID=HM2,Number=0,Type=Flag,Description=HapMap2 membership -d key=INFO,ID=GN,Number=1,Type=String,Description=Gene Name -d key=INFO,ID=DP,Number=0,Type=Integer,Description=Depth,etc concat-a.vcf -a annotate.txt.gz
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A
1 100 . GTTT G 1806 q10 DP=5;GN=gene1;HM2 GT:GQ:DP 0/1:409:35
1 110 . CAAA C 1792 PASS DP=6 GT:GQ:DP 0/1:245:32
1 120 . GA G 628 q10 DP=21 GT:GQ:DP 1/1:21:21
1 130 . GAA G 1016 PASS DP=7;HM2 GT:GQ:DP 0/1:212:22
1 140 . GT G 727 PASS DP=8 GT:GQ:DP 0/1:150:30
1 150 . TAAAA TA,T 246 PASS DP=9 GT:GQ:DP 1/2:12:10
1 160 . TAAAA TA,T 246 PASS DP=10 GT:GQ:DP 1/2:12:10
2 100 . GTTT G 1806 q10 DP=35 GT:GQ:DP 0/1:409:35
2 110 . CAAA C 1792 PASS GN=gene2;HM2 GT:GQ:DP 0/1:245:32
2 120 . GA G 628 q10 GN=gene2;HM2 GT:GQ:DP 1/1:21:21
2 130 . GAA G 1016 PASS GN=gene2;HM2 GT:GQ:DP 0/1:212:22
2 140 . GT G 727 PASS GN=gene2;HM2 GT:GQ:DP 0/1:150:30
2 150 . TAAAA TA,T 246 PASS GN=gene2;HM2 GT:GQ:DP 1/2:12:10
2 160 . TAAAA TA,T 246 PASS DP=11;GN=gene3 GT:GQ:DP 1/2:12:10
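In the annotated output above, the INFO column holds semicolon-separated entries: values such as GN=gene1 and DP=5 appear as KEY=value pairs, while the HM2 flag appears as a bare key. To pull one annotation back out of such output, a small awk filter is enough. The info_value function below is a hypothetical helper (not a VCFtools command), sketched under the assumption that the input is tab-delimited VCF with INFO in the eighth column, as the VCF specification defines.

```shell
# info_value KEY: read VCF text on stdin and print CHROM, POS and the value of
# the INFO entry named KEY for every record that has it. Flags such as HM2
# carry no value, so the word "true" is printed for them instead.
info_value() {
  awk -F'\t' -v key="$1" '
    /^#/ { next }                              # skip header lines
    {
      n = split($8, entries, ";")              # INFO is column 8
      for (i = 1; i <= n; i++) {
        eq = index(entries[i], "=")
        name = eq ? substr(entries[i], 1, eq - 1) : entries[i]
        if (name == key) {
          value = eq ? substr(entries[i], eq + 1) : "true"
          print $1 "\t" $2 "\t" value
          break
        }
      }
    }'
}
```

For example, piping the vcf-annotate command above into info_value GN lists each record that received a gene name, together with that name.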

Documentation

http://vcftools.sourceforge.net/docs.html