Categorization

The variants are categorized according to their annotation results. To count the number of variants that fall into each category in each sample, CWAS-Plus2 checks whether a variant is annotated to a given annotation term (for example, whether a variant is annotated to a gene that is one of the disease risk genes). The annotation terms are classified into five major groups, and each category is defined by a combination of one term from each group. During categorization, CWAS-Plus2 excludes redundant categories that contain exactly the same variants. For example, missense variants under the ‘single-nucleotide variant (SNV)’ term and missense variants under the ‘All (SNV and insertion-deletion)’ term are identical, because all missense variants are SNVs.
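
The counting and redundancy-removal logic described above can be illustrated with a minimal Python sketch. The group names, terms, variant IDs, and sample IDs below are hypothetical toy values, not the tool's actual annotation groups. The sketch also assumes that redundancy is detected empirically (two categories holding identical, non-empty variant sets in the data); the tool itself may define redundant categories by rule rather than by inspecting the data.

```python
from collections import Counter
from itertools import product

# Hypothetical groups and terms (assumption: illustrative names only).
# The catch-all term "All" matches every variant.
GROUPS = {
    "variant_type": ["All", "SNV", "Indel"],
    "gene_set": ["All", "RiskGene"],
    "conservation": ["All", "Conserved"],
    "gene_region": ["All", "Coding", "Noncoding"],
    "functional_annotation": ["All", "Missense"],
}

# Toy annotated variants: (sample ID, terms the variant is annotated to).
VARIANTS = {
    "var1": ("S1", {"variant_type": {"SNV"}, "gene_set": {"RiskGene"},
                    "gene_region": {"Coding"},
                    "functional_annotation": {"Missense"}}),
    "var2": ("S1", {"variant_type": {"SNV"}, "conservation": {"Conserved"},
                    "gene_region": {"Noncoding"}}),
    "var3": ("S2", {"variant_type": {"Indel"}, "gene_set": {"RiskGene"},
                    "gene_region": {"Coding"}}),
}


def matches(terms, group, term):
    """True if a variant carries `term` in `group`; "All" always matches."""
    return term == "All" or term in terms.get(group, set())


def categorize(variants, groups):
    """Map every category (one term per group) to the IDs of its variants."""
    names = list(groups)
    categories = {}
    for combo in product(*(groups[g] for g in names)):
        categories[combo] = frozenset(
            vid for vid, (_, terms) in variants.items()
            if all(matches(terms, g, t) for g, t in zip(names, combo))
        )
    return categories


def drop_redundant(categories):
    """Keep only the first category (in product order) per distinct variant set.

    Empty categories are skipped to keep the toy output short.
    """
    kept, seen = {}, set()
    for combo, members in categories.items():
        if members and members not in seen:
            seen.add(members)
            kept[combo] = members
    return kept


def count_per_sample(categories, variants):
    """Count the variants of each category separately for every sample."""
    return {combo: Counter(variants[v][0] for v in members)
            for combo, members in categories.items()}


if __name__ == "__main__":
    kept = drop_redundant(categorize(VARIANTS, GROUPS))
    for combo, counts in count_per_sample(kept, VARIANTS).items():
        print("_".join(combo), dict(counts))
```

Here the category combining ‘SNV’ with ‘Missense’ is dropped because it holds the same variant as the category combining ‘All’ with ‘Missense’, mirroring the missense example in the paragraph above.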

The parameters of the command are as follows:

-i, --input_file: Path to the annotated VCF file produced by the annotation step. The file name contains the pattern .annotated.vcf. The file may be either gzipped or uncompressed.
-o_dir, --output_directory: Path to the directory where the output files will be saved. By default, outputs are saved in $CWAS_WORKSPACE.
-p, --num_proc: Number of worker processes used for categorization. By default, 1. Processing a large input VCF file (e.g., over 10 million variants) with multiple cores can exhaust RAM and crash the run, so using a small number of cores and monitoring memory usage is recommended.

cwas categorization -i INPUT.annotated.vcf.gz -o_dir OUTPUT_DIR -p 8
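
Because the -i input may be gzipped or not, a script that prepares inputs for cwas categorization could open the file transparently and count its variant records, which helps judge whether a file is large enough (e.g., over 10 million variants) to warrant a small -p. The helpers below, open_annotated_vcf and count_variant_records, are hypothetical and not part of CWAS-Plus; detecting gzip by its magic bytes rather than by the .gz extension is one reasonable choice among several.

```python
import gzip
from pathlib import Path


def open_annotated_vcf(path):
    """Open an annotated VCF as text, gzipped or not (hypothetical helper)."""
    path = Path(path)
    if ".annotated.vcf" not in path.name:
        raise ValueError(f"{path.name} does not contain '.annotated.vcf'")
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic bytes
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def count_variant_records(path):
    """Count non-header lines, i.e. variant records, in the annotated VCF."""
    with open_annotated_vcf(path) as fh:
        return sum(1 for line in fh if not line.startswith("#"))
```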