FAQ

Easy installation of VEP

The commands below install VEP through conda.

For the LOFTEE plugin, cpanm, perl, and VEP must be installed in the same location. With conda, for example, all three tools should be in the same conda environment so that VEP can locate the others.
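
One way to confirm this is to compare the directories the three tools resolve to on PATH. The sketch below is not part of CWAS-Plus2 or VEP: same_bin_dir is a hypothetical helper name, and it assumes that "same location" means the same bin directory (for conda, usually the bin directory of the active environment).

```shell
# Hypothetical helper: report the directory of each tool and fail if they differ.
same_bin_dir() {
    first=""
    for tool in "$@"; do
        # command -v prints the path the shell would execute for this name
        path=$(command -v "$tool") || { echo "$tool: not found" >&2; return 1; }
        dir=$(dirname "$path")
        echo "$tool -> $dir"
        if [ -z "$first" ]; then
            first=$dir
        elif [ "$dir" != "$first" ]; then
            echo "$tool is installed in a different location" >&2
            return 1
        fi
    done
}

# Usage (after activating the conda environment):
#   same_bin_dir perl cpanm vep
```

If the function returns a non-zero status, at least one tool is missing or comes from a different directory than the first one listed.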

The following commands install VEP version 110. To install a different version, replace 110 with the desired version number.

(These commands are adapted from the VEP documentation.)

conda install -c bioconda perl-bioperl
conda install perl-App-cpanminus
conda install -c bioconda ensembl-vep=110
cpanm --force Bio::Perl

wget https://github.com/ucscGenomeBrowser/kent/archive/v335_base.tar.gz
tar xzf v335_base.tar.gz

export KENT_SRC=$PWD/kent-335_base/src
export MACHTYPE=$(uname -m)
export CFLAGS="-fPIC"
export MYSQLINC=$(mysql_config --include | sed -e 's/^-I//g')
export MYSQLLIBS=$(mysql_config --libs)

cd $KENT_SRC/lib
echo 'CFLAGS="-fPIC"' > ../inc/localEnvironment.mk

make clean && make
cd ../jkOwnLib
make clean && make

ln -s $KENT_SRC/lib/x86_64/* $KENT_SRC/lib/

cpanm Bio::DB::BigFile
cpanm DBD::SQLite

The following message may appear when running VEP:

Compress::Raw::Zlib version 2.201 required--this is only version 2.105

Try running the following command.

conda update -c conda-forge perl-compress-raw-zlib

To check whether VEP is installed, run vep --help. If VEP is installed, a message like the following appears.

#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

Versions:
  ensembl              : 110.584a8f3
  ensembl-funcgen      : 110.24e6da6
  ensembl-io           : 110.b1a0d57
  ensembl-variation    : 110.d34d25e
  ensembl-vep          : 110.1

Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl

http://www.ensembl.org/info/docs/tools/vep/script/index.html

Usage:
./vep [--cache|--offline|--database] [arguments]

Basic options
=============

--help                 Display this message and quit

-i | --input_file      Input file
-o | --output_file     Output file
--force_overwrite      Force overwriting of output file
--species [species]    Species to use [default: "human"]

--everything           Shortcut switch to turn on commonly used options. See web
                      documentation for details [default: off]
--fork [num_forks]     Use forking to improve script runtime

For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html

How to configure ANNOTATION_KEY_CONFIG yaml file

The ANNOTATION_KEY_CONFIG YAML file contains two sections, functional_score and functional_annotation. Each entry lists a file in BED format together with a short name that represents it. This short name is used in later analyses.

When choosing a short name, avoid underscores (‘_’). Underscores separate the domains within a single category. For example, the category ‘A_B_C_D_E’ is recognized as five domains, but a category named ‘A_B_C_D_E_F’ is split into six domains, which causes an error during association testing.

An example ANNOTATION_KEY_CONFIG YAML file is shown below.

functional_score:
  bed1.bed.gz: annot1
  bed2.bed.gz: annot2
functional_annotation:
  bed3.bed.gz: annot_3 # Do not use underscores like this. Users can use 'annot3' instead.
  bed4.bed.gz: annot4
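
Short names that contain an underscore can be caught before configuration with a small check. The sketch below is only an illustration: find_underscore_names is a hypothetical helper, and it assumes the simple layout shown above (indented "file: name" entries, with optional trailing comments), not general YAML.

```shell
# Hypothetical helper: print every short name that contains an underscore.
# $1: path to the ANNOTATION_KEY_CONFIG YAML file
find_underscore_names() {
    awk '
        /^[ \t]+[^# \t]/ {                     # indented "file: name" entries only
            line = $0
            sub(/[ \t]*#.*$/, "", line)        # drop a trailing comment
            sub(/[ \t]+$/, "", line)           # drop trailing whitespace
            n = split(line, parts, /:[ \t]*/)
            name = parts[n]                    # text after the last colon
            if (name ~ /_/) print name
        }
    ' "$1"
}

# Usage:
#   find_underscore_names annotation_keys.yaml
```

Only the short name is checked, so underscores in the BED file name itself are not reported. An empty result means no short name contains an underscore.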

The files must be located inside ANNOTATION_DATA_DIR. For the preparation step, they must also be indexed with tabix.

For example, a BED file can be sorted and indexed as follows.

sort -k1,1V -k2,2n -k3,3n -t$'\t' bed1.bed | bgzip -c > sorted.bed1.bed.gz
tabix -p bed sorted.bed1.bed.gz
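
When there are several BED files, the same two commands can be wrapped in a loop. This is a minimal sketch rather than a CWAS-Plus2 feature: index_beds is a hypothetical helper, it assumes bgzip and tabix (from htslib) and GNU sort are on PATH, and it keeps the sorted.<name>.gz naming used in the example above.

```shell
# Hypothetical helper: sort, compress, and index every plain .bed file in a directory.
# $1: directory that holds the .bed files
index_beds() {
    dir=$1
    for bed in "$dir"/*.bed; do
        [ -e "$bed" ] || continue              # no .bed files: nothing to do
        out="$dir/sorted.$(basename "$bed").gz"
        # Sort by chromosome (version order), then start, then end
        sort -t "$(printf '\t')" -k1,1V -k2,2n -k3,3n "$bed" | bgzip -c > "$out"
        tabix -p bed "$out"
    done
}

# Usage:
#   index_beds /path/to/annotation_data_dir
```

The file names listed in ANNOTATION_KEY_CONFIG should then refer to the compressed files that were actually produced.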

How to add or remove a gene set

Modify the text file used for gene sets (GENE_MATRIX).

The gene set file contains a gene ID column, a gene name column, and one column for each set of genes. To add or remove a gene set, add or remove the corresponding column.

gene_id	gene_name	ProteinCoding
ENSG00000000003.15	TSPAN6	1
ENSG00000000005.6	TNMD	1
ENSG00000000419.14	DPM1	1
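
Adding a column by hand is error-prone for a full gene list, so it can help to script it. The sketch below rests on assumptions that the example above does not confirm: add_gene_set is a hypothetical helper, genes are matched on the gene_name column, and membership is coded as 1 with 0 for non-members.

```shell
# Hypothetical helper: append one gene set column to a tab-separated gene matrix.
# $1: gene matrix file   $2: new column name   $3: file with one gene name per line
# Assumption: 1 marks membership and 0 marks non-membership.
add_gene_set() {
    awk -F '\t' -v OFS='\t' -v col="$2" '
        FILENAME == ARGV[1] { member[$1] = 1; next }    # first file: the gene list
        FNR == 1            { print $0, col; next }     # header row of the matrix
                            { print $0, (($2 in member) ? 1 : 0) }
    ' "$3" "$1"
}

# Usage (write to a new file, then point GENE_MATRIX at it):
#   add_gene_set gene_matrix.txt MySet my_genes.txt > gene_matrix.new.txt
```

The original file is left untouched, so the result can be inspected before it replaces the gene matrix.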

Then, configure again.

cwas configuration -f

If the annotated VCF (*annotated.vcf.gz) already exists and the gene set is the only thing that changed, users can start from the categorization step, because that is the first step in which gene sets are used.

How to add or remove a functional annotation or score

Modify the ANNOTATION_KEY_CONFIG YAML file: add a line for each new annotation or score, or remove the lines that are no longer needed.

Then, configure again.

cwas configuration -f

Annotation in CWAS-Plus2 consists of two steps: (1) VEP annotation and (2) BED custom annotation. If the output of a step already exists, that step is skipped.

If the VEP-annotated file (*.vep.vcf.gz) already exists, users can start from the annotation step. CWAS-Plus2 then skips VEP annotation.

cwas annotation -v INPUT.vcf -o_dir OUTPUT_DIR -p 8

Before running annotation, however, remove the annotated VCF (*annotated.vcf.gz). If the annotated VCF exists, CWAS-Plus2 also skips the BED custom annotation step.
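
The cleanup can be scripted so that only the custom-annotated output is removed and the VEP output is kept. This is a sketch under assumptions: clear_annotated is a hypothetical helper, and it assumes the annotated VCF sits directly inside the output directory passed to the annotation command.

```shell
# Hypothetical helper: delete *annotated.vcf.gz files and leave *.vep.vcf.gz in place.
# $1: output directory used for annotation
clear_annotated() {
    for f in "$1"/*annotated.vcf.gz; do
        [ -e "$f" ] || continue                # nothing matched the pattern
        echo "removing $f"
        rm -- "$f"
    done
}

# Usage:
#   clear_annotated OUTPUT_DIR
#   cwas annotation -v INPUT.vcf -o_dir OUTPUT_DIR -p 8
```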