snpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).
Typical usage:
- Input: The inputs are predicted variants (SNPs, insertions, deletions and MNPs). The input file is usually obtained as a result of a sequencing experiment, and it is usually in variant call format (VCF).
- Output: SnpEff analyzes the input variants. It annotates the variants and calculates the effects they produce on known genes (e.g. amino acid changes). A list of the effects and annotations that SnpEff can calculate is available in the SnpEff documentation.
How to Use
snpEff uses environment modules. At the prompt, type:
module load snpEff
To see the help menu, type:
snpEff
In addition to the many options shown in the help menu, one extra option is available that is not listed there:
- -m: amount of memory to allocate
By default, snpEff uses 4 GB of memory. For large VCF input files, this may not be enough. To allocate 20 GB of memory, use:
snpEff -m 20g
SnpSift is a collection of tools to manipulate VCF (variant call format) files. Here's what you can do:
- Filter: You can filter using arbitrary expressions, for instance "(QUAL > 30) | (exists INDEL) | ( countHet() > 2 )". The actual expressions can be quite complex, so it allows for a lot of flexibility.
- Annotate: You can add 'ID' fields from another database (e.g. variants from dbSNP).
- CaseControl: You can compare how many variants are in 'case' and in 'control' groups. Also calculates p-values (Fisher exact test).
- Intervals: Filter variants that intersect with intervals.
- Intervals (intidx): Filter variants that intersect with intervals. Index the VCF file using memory mapped I/O to speed up the search. This is intended for huge VCF files and a small number of intervals to retrieve.
- Join: Join by generic genomic regions (intersecting or closest).
- RmRefGen: Remove reference genotype (i.e. replace '0/0' genotypes by '.')
- TsTv: Calculate transition to transversion ratio.
- Extract fields: Extract fields from a VCF file to a TXT (tab separated) format.
- Variant type: Adds SNP/MNP/INS/DEL to info field. It also adds "HOM/HET" if there is only one sample.
- GWAS Catalog: Annotate using GWAS Catalog.
- dbNSFP: Annotate using dbNSFP, an integrated database of functional predictions from multiple algorithms (SIFT, Polyphen2, LRT, MutationTaster, PhyloP, GERP++, etc.).
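To make the TsTv item in the list above concrete, the following is a minimal awk sketch of what a transition to transversion ratio measures. It is not how SnpSift computes the value: the function name tstv_ratio and the small demo VCF are hypothetical, and the sketch only counts single-base substitutions with a single ALT allele.

```shell
#!/bin/sh
# Hypothetical helper (not part of SnpSift): count transitions
# (A<->G, C<->T) and transversions (all other base changes) among
# single-base substitutions in a VCF file, then print the ratio.
tstv_ratio() {
    awk -F'\t' '
        /^#/ { next }                                  # skip header lines
        length($4) == 1 && length($5) == 1 {
            ref = toupper($4); alt = toupper($5)
            # ignore anything that is not a real base change
            if (ref !~ /^[ACGT]$/ || alt !~ /^[ACGT]$/ || ref == alt) next
            pair = ref alt
            if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC")
                ts++
            else
                tv++
        }
        END {
            if (tv > 0) printf "ts=%d tv=%d ratio=%.2f\n", ts, tv, ts / tv
            else        printf "ts=%d tv=%d ratio=NA\n", ts, tv
        }
    ' "$1"
}

# Made-up demo input: two transitions, one transversion, one deletion.
demo=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$demo"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo"
printf '1\t100\t.\tA\tG\t50\tPASS\t.\n' >> "$demo"
printf '1\t200\t.\tC\tT\t50\tPASS\t.\n' >> "$demo"
printf '1\t300\t.\tA\tC\t50\tPASS\t.\n' >> "$demo"
printf '1\t400\t.\tAT\tA\t50\tPASS\t.\n' >> "$demo"
tstv_ratio "$demo"
rm -f "$demo"
```

The deletion at position 400 is skipped because its REF allele is longer than one base, so only the three substitutions contribute to the ratio.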
Annotate against the UCSC hg19 build:
module load snpEff
ln -s $SNPEFFHOME/example/file.vcf .
snpEff -m 8g -v hg19 file.vcf > file.eff.vcf
Pull out 'HIGH IMPACT' variants:
cat file.eff.vcf | java -jar $SNPEFFHOME/SnpSift.jar filter "( EFF[*].IMPACT = 'HIGH' )" > file.filtered.vcf
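As a rough cross-check on the filter above, the sketch below counts records whose EFF annotation carries a HIGH impact, using only awk. It is an approximation and not a replacement for SnpSift: the function name count_high_impact is hypothetical, the demo records are made up, and the pattern assumes the EFF layout Effect(Impact|...) in the INFO column. Other SnpEff versions may write their annotations in a different field.

```shell
#!/bin/sh
# Hypothetical helper (not part of SnpSift): count VCF records whose
# INFO column has an EFF entry with impact HIGH.
# Assumption: EFF values look like EFFECT(IMPACT|...) and sit in column 8.
count_high_impact() {
    awk -F'\t' '
        /^#/ { next }                                  # skip header lines
        $8 ~ /(^|;)EFF=[^;]*\(HIGH\|/ { n++ }          # any effect marked HIGH
        END { print n + 0 }
    ' "$1"
}

# Made-up demo input: one HIGH impact record and one LOW impact record.
demo=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo"
printf '1\t100\t.\tC\tT\t50\tPASS\tDP=20;EFF=STOP_GAINED(HIGH|NONSENSE|Cag/Tag|Q236*|749|NOC2L||CODING|NM_015658|)\n' >> "$demo"
printf '1\t200\t.\tG\tA\t50\tPASS\tDP=31;EFF=SYNONYMOUS_CODING(LOW|SILENT|ctG/ctA|L12|749|NOC2L||CODING|NM_015658|)\n' >> "$demo"
count_high_impact "$demo"
rm -f "$demo"
```

If the layout assumption holds, running count_high_impact on file.eff.vcf should give the same number as counting the non-header lines in file.filtered.vcf. A mismatch suggests the annotation format differs from the one assumed here.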
Annotate against the dbNSFP database:
java -jar /usr/local/apps/snpEff/3.1h/SnpSift.jar dbnsfp -v /fdb/dbNSFP2/dbNSFP2.0b3.txt file.eff.vcf > file.annotated.vcf