VCFtools on Helix
VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts for common tasks with VCF files, such as validation, merging, intersection, and complements. The Perl tools support all versions of the VCF specification (3.2, 3.3, and 4.0); nevertheless, users are encouraged to use the latest version, VCFv4.0. VCFtools has been used mainly with diploid data, but the Perl tools aim to support polyploid data as well.
VCFtools is maintained and developed by Adam Auton, Peter Danecek and collaborators. VCFtools paper.
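Because the Perl tools accept several VCF versions, it can help to check which version a file declares before processing it. The following is a minimal sketch, assuming the version is recorded on the first header line as `##fileformat=...` (as in the vcf-annotate output on this page). The function name `vcf_version` and the temporary sample file are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: print the version declared on the ##fileformat header line.
vcf_version() {
    # Look only at the first line and strip the "##fileformat=" prefix.
    head -n 1 "$1" | sed -n 's/^##fileformat=//p'
}

# Build a tiny illustrative file (not one of the Helix example files).
sample=$(mktemp)
printf '##fileformat=VCFv4.0\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$sample"

# Print the declared version of the illustrative file.
vcf_version "$sample"
```

For a bgzip-compressed file, the same check can be applied to the decompressed stream, for example by piping `gzip -dc file.vcf.gz` into `head -n 1`.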
Programs location
/usr/local/vcftools/bin
Note that tabix and bgzip are located in the same directory. The environment for VCFtools must be set correctly before running the programs. This can be done by typing 'module load vcftools', as in the examples below, and only needs to be done once per login session.
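One way to confirm that the environment is set is to check that the programs resolve on the PATH. This is a sketch only: the function name `check_tools` is hypothetical, and the tool names passed to it are the ones mentioned on this page.

```shell
# Hypothetical helper: report any of the named commands that are not on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool" >&2
            missing=1
        fi
    done
    # Exit status 0 if every tool was found, 1 otherwise.
    return "$missing"
}

# After 'module load vcftools' these are expected to resolve.
check_tools tabix bgzip vcf-annotate || echo "run 'module load vcftools' first"
```

The function returns a non-zero status when anything is missing, so it can also be used as a guard at the top of a batch script.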
How To Use
The example files used below can be copied from /usr/local/vcftools/examples/
Example 1: Running compare-vcf
helix% module load vcftools
helix% cd /data/userID/vcftools/run1
helix% cp /usr/local/vcftools/examples/cmp-test-* .
helix% compare-vcf cmp-test-a.vcf.gz cmp-test-b.vcf.gz
Number of sites found only in
6 cmp-test-a.vcf.gz (100.0%) cmp-test-b.vcf.gz (100.0%)
Example 2: Running vcf-concat
helix% module load vcftools
helix% cd /data/userID/vcftools/run1
helix% cp /usr/local/vcftools/examples/concat-* .
helix% vcf-concat concat-a.vcf.gz concat-b.vcf.gz concat-c.vcf.gz | bgzip -c > out.vcf.gz
Example 3: Running vcf-annotate
helix% module load vcftools
helix% cd /data/userID/vcftools/run1
helix% cp /usr/local/vcftools/examples/concat-a.vcf .
helix% cp /usr/local/vcftools/examples/annotate.txt .
helix% bgzip -c annotate.txt > annotate.txt.gz
helix% tabix -p vcf annotate.txt.gz
helix% vcf-annotate -c FROM,TO,CHROM,INFO/HM2,INFO/GN,INFO/DP \
    -d key=INFO,ID=HM2,Number=0,Type=Flag,Description="HapMap2 membership" \
    -d key=INFO,ID=GN,Number=1,Type=String,Description="Gene Name" \
    -d key=INFO,ID=DP,Number=0,Type=Integer,Description="Depth,etc" \
    concat-a.vcf -a annotate.txt.gz
##fileformat=VCFv4.0
##INFO=
##FORMAT=
##FORMAT=
##FORMAT=
##FILTER=
##INFO=
##INFO=
##source_20110110.1=/usr/local/vcftools/bin/vcf-annotate -c FROM,TO,CHROM,INFO/HM2,INFO/GN,INFO/DP -d key=INFO,ID=HM2,Number=0,Type=Flag,Description=HapMap2 membership -d key=INFO,ID=GN,Number=1,Type=String,Description=Gene Name -d key=INFO,ID=DP,Number=0,Type=Integer,Description=Depth,etc concat-a.vcf -a annotate.txt.gz
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A
1 100 . GTTT G 1806 q10 DP=5;GN=gene1;HM2 GT:GQ:DP 0/1:409:35
1 110 . CAAA C 1792 PASS DP=6 GT:GQ:DP 0/1:245:32
1 120 . GA G 628 q10 DP=21 GT:GQ:DP 1/1:21:21
1 130 . GAA G 1016 PASS DP=7;HM2 GT:GQ:DP 0/1:212:22
1 140 . GT G 727 PASS DP=8 GT:GQ:DP 0/1:150:30
1 150 . TAAAA TA,T 246 PASS DP=9 GT:GQ:DP 1/2:12:10
1 160 . TAAAA TA,T 246 PASS DP=10 GT:GQ:DP 1/2:12:10
2 100 . GTTT G 1806 q10 DP=35 GT:GQ:DP 0/1:409:35
2 110 . CAAA C 1792 PASS GN=gene2;HM2 GT:GQ:DP 0/1:245:32
2 120 . GA G 628 q10 GN=gene2;HM2 GT:GQ:DP 1/1:21:21
2 130 . GAA G 1016 PASS GN=gene2;HM2 GT:GQ:DP 0/1:212:22
2 140 . GT G 727 PASS GN=gene2;HM2 GT:GQ:DP 0/1:150:30
2 150 . TAAAA TA,T 246 PASS GN=gene2;HM2 GT:GQ:DP 1/2:12:10
2 160 . TAAAA TA,T 246 PASS DP=11;GN=gene3 GT:GQ:DP 1/2:12:10
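To check the annotated output, the INFO column can be inspected with standard shell tools. The sketch below counts records carrying the HM2 flag. It uses a few records copied from the output above, written to a temporary file; the function name `count_hm2` is hypothetical, and the code assumes INFO is the eighth whitespace-separated column, as in the listing.

```shell
# Illustrative records copied from the output above (whitespace-separated).
records=$(mktemp)
cat > "$records" <<'EOF'
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A
1 100 . GTTT G 1806 q10 DP=5;GN=gene1;HM2 GT:GQ:DP 0/1:409:35
1 110 . CAAA C 1792 PASS DP=6 GT:GQ:DP 0/1:245:32
1 130 . GAA G 1016 PASS DP=7;HM2 GT:GQ:DP 0/1:212:22
2 110 . CAAA C 1792 PASS GN=gene2;HM2 GT:GQ:DP 0/1:245:32
EOF

# Hypothetical helper: count data records whose INFO column (field 8)
# contains the HM2 flag as one of its semicolon-separated entries.
count_hm2() {
    awk '!/^#/ {
             n = split($8, tags, ";")
             for (i = 1; i <= n; i++)
                 if (tags[i] == "HM2") { c++; break }
         }
         END { print c + 0 }' "$1"
}

count_hm2 "$records"   # prints 3 for the four records above
```

Splitting INFO on semicolons, rather than matching the substring HM2 anywhere on the line, avoids counting records where the text appears inside another tag's value.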

