Workshop for Variant Prioritization

Today we are going to:

Find genes associated with a phenotype using the NCBI tools

Find variants of a gene of interest

Find variants associated with a disease

Filter variants

Solving a rare disease case from a trio

A patient presented at the hospital with hyperammonemia after giving birth. The patient and her parents have been sequenced, and the VCF file (hg19) has been provided for you.

Hint: this is a compound heterozygote, with one of the two mutations outside the coding region.

Can you find out what might be the causal variant?
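
Under a compound heterozygous model, the patient carries two different heterozygous variants in the same gene, and each parent carries only one of them. A minimal sketch of that inheritance check, assuming the variants have already been annotated with gene names and flattened into a tab-separated table of gene, variant, patient GT, father GT, and mother GT (a hypothetical layout with made-up rows; trio_table and find_compound_het are illustrative names, not part of any tool):

```shell
#!/bin/sh
# Hypothetical, made-up trio table: gene, variant id, patient GT, father GT, mother GT.
trio_table() {
  printf 'GENE_A\tv1\t0/1\t0/1\t0/0\n'
  printf 'GENE_A\tv2\t0/1\t0/0\t0/1\n'
  printf 'GENE_B\tv3\t0/1\t0/1\t0/0\n'
  printf 'GENE_B\tv4\t0/1\t0/1\t0/0\n'
  printf 'GENE_C\tv5\t1/1\t0/1\t0/1\n'
}

# Report genes where the patient has at least one heterozygous variant inherited
# only from the father AND at least one inherited only from the mother.
find_compound_het() {
  awk -F'\t' '
    $3 == "0/1" && $4 == "0/1" && $5 == "0/0" { from_dad[$1]++ }
    $3 == "0/1" && $4 == "0/0" && $5 == "0/1" { from_mom[$1]++ }
    END { for (g in from_dad) if (g in from_mom) print g }'
}

# GENE_B has two paternal hits only; GENE_C is homozygous, not compound het.
trio_table | find_compound_het
```

The sketch only recognizes unphased 0/1 calls; a real filter would also need to handle 1/0 and phased genotypes, multi-allelic sites, and missing calls.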

Identifying disease-causing mutations in a paired cancer sample

A patient presented at the hospital with . The patient's tumor has been sequenced. The VCF (HG38) file has been provided for you.

Log into BioHPC

First, log into a compute node. Then copy the session files to your working directory and move into it:

cp -r /archive/nanocourse/genome_analysis/shared/session3 .
cd session3

Practice VCF manipulation skills with BCFtools

We will practice on a VCF from CEPH pedigree 1463, a family with 17 members. Load the required modules and compress the VCF with bgzip:

module load bcftools htslib/gcc/1.8
bgzip -c ceph1463.vcf > ceph1463.vcf.gz

The "-c" option writes the compressed data to standard output, so the original file is left unchanged.
If you don't need to keep the original file, compress it in place instead:

bgzip ceph1463.vcf

To decompress it again (this replaces ceph1463.vcf.gz with ceph1463.vcf):

bgzip -d ceph1463.vcf.gz
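Before continuing, make sure ceph1463.vcf.gz exists (if you decompressed it, re-run the bgzip -c command above). Index it with tabix and create a working directory:

tabix ceph1463.vcf.gz
mkdir vcf_playground

Each bcftools subcommand prints its usage and options with --help:

bcftools --help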
bcftools query --help
bcftools stats --help
bcftools filter --help
bcftools view --help

The commands below try out several of these subcommands. Refer to the help output or the bcftools documentation for details on the options used.

List the sample names, then write summary statistics to a file and page through them:

bcftools query -l ceph1463.vcf.gz
bcftools stats ceph1463.vcf.gz > vcf_playground/ceph1463.stats.out
less vcf_playground/ceph1463.stats.out
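
Print the chromosome, position, and genotypes for every site. The template inside the square brackets is repeated once per sample, and less -S keeps long lines unwrapped:

bcftools query -f '%CHROM\t%POS\t[%GT ]\n' ceph1463.vcf.gz | less -S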
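
To make the bracket expansion concrete, the sketch below reproduces the same output shape with plain awk on a made-up three-sample VCF. It is only an illustration, not a substitute for bcftools query: toy_vcf and query_gt are hypothetical helper names, and the sketch assumes GT is the first FORMAT subfield (the VCF specification requires this whenever GT is present).

```shell
#!/bin/sh
# Made-up two-site, three-sample VCF (illustration only).
toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf '10\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/0:9\t1/1:15\n'
  printf 'X\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/0:20\t0/1:18\t0/0:11\n'
}

# Rough analogue of: bcftools query -f '%CHROM\t%POS\t[%GT ]\n'
# CHROM and POS are printed once; the "GT + space" template repeats for each
# sample column (column 10 onward). Assumes GT is the first ":" subfield.
query_gt() {
  awk -F'\t' '!/^#/ {
    line = $1 "\t" $2 "\t"
    for (i = 10; i <= NF; i++) { split($i, sub_f, ":"); line = line sub_f[1] " " }
    print line
  }'
}

toy_vcf | query_gt
```

Each output line ends with a trailing space, just as the bracketed "%GT " template produces one after every sample.
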

Show only the records in one region of chromosome 10 (positions 96,447,911 to 96,613,017):

bcftools filter --targets 10:96447911-96613017 ceph1463.vcf.gz | less
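
Count the variant records, then count them again after applying --SnpGap 10, which filters SNPs within 10 bp of an indel:

zcat ceph1463.vcf.gz | grep -v "^#" | wc -l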
bcftools filter --SnpGap 10 ceph1463.vcf.gz | grep -v "^#" | wc -l
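
The record count works because every VCF header line, both the ## meta-information lines and the #CHROM column line, starts with "#". A minimal sketch on a made-up VCF (toy_vcf is a hypothetical helper) shows the pipeline next to the equivalent single grep -vc. On a real bgzipped file, bcftools view -H (which suppresses the header) piped to wc -l gives the same count without zcat.

```shell
#!/bin/sh
# Made-up VCF: two header lines and three records (illustration only).
toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '10\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf '10\t105\t.\tAT\tA\t50\tPASS\t.\n'
  printf 'X\t200\t.\tC\tT\t50\tPASS\t.\n'
}

# Same logic as: zcat file.vcf.gz | grep -v "^#" | wc -l
# $(( )) strips the padding some wc implementations add.
n_pipe=$(( $(toy_vcf | grep -v '^#' | wc -l) ))

# grep -c counts lines; with -v it counts the non-header lines directly.
n_grep=$(toy_vcf | grep -vc '^#')

echo "records: $n_pipe (pipeline), $n_grep (grep -vc)"
```
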
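
Extract NA12878 and her parents, NA12891 and NA12892, into a new compressed VCF (-O z) and confirm the sample list:

bcftools view --samples NA12891,NA12892,NA12878 ceph1463.vcf.gz -O z -o vcf_playground/maternal_family.vcf.gz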
bcftools query -l vcf_playground/maternal_family.vcf.gz

Keep only the indels, then summarize them and view their genotypes:

bcftools filter -i 'TYPE="indel"' vcf_playground/maternal_family.vcf.gz -O z -o vcf_playground/maternal_indels.vcf.gz
bcftools stats vcf_playground/maternal_indels.vcf.gz | grep -A 8 "SN, Summary numbers:"
bcftools query -f '%CHROM\t%POS\t[%GT ]\n' vcf_playground/maternal_indels.vcf.gz | less -S
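
Count the indel sites where all three samples are homozygous reference (0/0). These sites remain because subsetting with --samples does not drop sites whose alternate allele is carried only by other family members:

bcftools query -f '[%GT ]\n' vcf_playground/maternal_indels.vcf.gz | grep '0/0 0/0 0/0' | wc -l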
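
The --private option keeps only sites where the non-reference allele is carried exclusively by the listed samples. Because these are all of the samples in the file, the all-reference indel sites are removed; recount them to confirm:

bcftools view --private -s NA12891,NA12892,NA12878 vcf_playground/maternal_indels.vcf.gz -O z -o vcf_playground/maternal_only_indels.vcf.gz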
bcftools query -f '[%GT ]\n' vcf_playground/maternal_only_indels.vcf.gz | grep '0/0 0/0 0/0' | wc -l
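
The grep pattern '0/0 0/0 0/0' only matches unphased genotypes and assumes exactly three samples. As a sketch of a more robust count, the awk function below checks every column and accepts both 0/0 and phased 0|0; the rows are made up to mimic the query output, and gt_rows and count_all_homref are hypothetical names.

```shell
#!/bin/sh
# Made-up rows in the shape of `bcftools query -f '[%GT ]\n'` output for three
# samples (note the trailing space after each genotype). Illustration only.
gt_rows() {
  printf '0/0 0/0 0/0 \n'
  printf '0/1 0/0 0/0 \n'
  printf '0|0 0|0 0|0 \n'
  printf '1/1 0/1 ./. \n'
}

# Count rows where every sample is homozygous reference, phased or unphased.
count_all_homref() {
  awk '{
    all_ref = (NF > 0)
    for (i = 1; i <= NF; i++)
      if ($i != "0/0" && $i != "0|0") all_ref = 0
    if (all_ref) n++
  } END { print n + 0 }'
}

echo "awk count:  $(gt_rows | count_all_homref)"
# The plain grep pattern misses the phased row.
echo "grep count: $(gt_rows | grep -c '0/0 0/0 0/0')"
```

Because it loops over all fields, the awk version also works unchanged for files with more than three samples.
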
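
Index the trio VCF and list its contigs. Then split it into chromosome X and everything else (despite the file name, --targets ^X keeps every contig except X), concatenate the two parts, and check the contigs of the recombined file:

tabix vcf_playground/maternal_family.vcf.gz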
tabix -l vcf_playground/maternal_family.vcf.gz
bcftools view vcf_playground/maternal_family.vcf.gz --targets ^X -O z -o vcf_playground/maternal_family_autosomes.vcf.gz
tabix vcf_playground/maternal_family_autosomes.vcf.gz
tabix -l vcf_playground/maternal_family_autosomes.vcf.gz
bcftools view vcf_playground/maternal_family.vcf.gz --targets X -O z -o vcf_playground/maternal_family_X.vcf.gz
tabix vcf_playground/maternal_family_X.vcf.gz
tabix -l vcf_playground/maternal_family_X.vcf.gz
bcftools concat vcf_playground/maternal_family_autosomes.vcf.gz vcf_playground/maternal_family_X.vcf.gz -O z -o vcf_playground/maternal_family_recombined.vcf.gz
tabix vcf_playground/maternal_family_recombined.vcf.gz
tabix -l vcf_playground/maternal_family_recombined.vcf.gz
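
Splitting with --targets ^X and --targets X partitions the records, so the two parts together should account for every record in the trio VCF. The awk sketch below illustrates that bookkeeping on a made-up VCF; it is not a replacement for bcftools, which also handles the header and index. toy_vcf and count_records are hypothetical names, and the MT record is included only to show that any non-X contig ends up in the "autosomes" part. On the real files, comparing the record counts (for example from bcftools view -H piped to wc -l) is the equivalent check.

```shell
#!/bin/sh
# Made-up VCF records (illustration only).
toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t500\t.\tG\tA\t50\tPASS\t.\n'
  printf '10\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf 'X\t200\t.\tC\tT\t50\tPASS\t.\n'
  printf 'MT\t300\t.\tT\tC\t50\tPASS\t.\n'
}

# Like --targets ^X: keep header lines and every record whose CHROM is not X.
not_x=$(toy_vcf | awk -F'\t' '/^#/ || $1 != "X"')
# Like --targets X: keep header lines and only the X records.
only_x=$(toy_vcf | awk -F'\t' '/^#/ || $1 == "X"')

# Count data (non-header) lines read from stdin.
count_records() { awk '!/^#/ { n++ } END { print n + 0 }'; }

n_total=$(toy_vcf | count_records)
n_not_x=$(printf '%s\n' "$not_x" | count_records)
n_x=$(printf '%s\n' "$only_x" | count_records)
echo "total=$n_total not_x=$n_not_x x=$n_x"

# Contigs in the non-X part: 1, 10 and MT, so it is not strictly autosomal.
printf '%s\n' "$not_x" | awk -F'\t' '!/^#/ { print $1 }'
```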