Workshop for Variant Calling

Today we are going to:

Log into BioHPC

First, we will log into the log into a compute node,install a necessary package and then log into a compute node

cp /archive/nanocourse/May2018/shared/vc_calling_session2/* /archive/nanocourse/May2018/trainXX/
cd /archive/nanocourse/May2018/trainXX/

Run Germline Variant Calling Program Strelka 2

First we are going to run variant calling using a program called Stelka2 on 2 engineered cell line mixtures available from Horizon Genomics, HD728 and HD752. These sample were sequenced to ~1000X coverage.

Here are the commands that you will need to run Strelka2

module load strelka/2.9.0 samtools/1.6 manta/1.3.1
mkdir manta strelka
configManta.py --bam HD752.nanocourse.bam --bam HD728.nanocourse.bam --referenceFasta /project/shared/bicf_workflow_ref/GRCh38/genome.fa --runDir manta
manta/runWorkflow.py -m local -j 8
configureStrelkaGermlineWorkflow.py --bam HD752.nanocourse.bam --bam HD728.nanocourse.bam --referenceFasta /project/shared/bicf_workflow_ref/GRCh38/genome.fa --indelCandidates manta/results/variants/candidateSmallIndels.vcf.gz --runDir strelka
strelka/runWorkflow.py -m local -j 8
bcftools norm -c s -f /project/shared/bicf_workflow_ref/GRCh38/genome.fa -w 10 -O z -o Horizon.strelka2.vcf.gz strelka/results/variants/variants.vcf.gz

Run Somatic Variant Calling Program Shimmer

module load snpeff/4.3q shimmer/0.1.1 samtools/1.6  vcftools/0.1.14
shimmer.pl --minqual 25 --ref /project/shared/bicf_workflow_ref/GRCh38/genome.fa HD752.nanocourse.bam HD728.nanocourse.bam --outdir shimmer 2> shimmer.err
module unload shimmer/0.1.1
vcf-annotate -n --fill-type shimmer/somatic_diffs.vcf | java -jar $SNPEFF_HOME/SnpSift.jar filter '(GEN[*].DP >= 10)'| bgzip > Horizon.shimmer.vcf.gz

Run Structural Variant Calling Program Delly

module load  delly2/v0.7.7-multi samtools/1.6 bedtools/2.26.0 snpeff/4.3q vcftools/0.1.14
delly2 call -t BND -o delly_translocations.bcf -q 30 -g /project/shared/bicf_workflow_ref/GRCh38/genome.fa HD728.nanocourse.bam HD752.nanocourse.bam
delly2 call -t DUP -o delly_duplications.bcf -q 30 -g /project/shared/bicf_workflow_ref/GRCh38/genome.fa HD728.nanocourse.bam HD752.nanocourse.bam
delly2 call -t INV -o delly_inversions.bcf -q 30 -g /project/shared/bicf_workflow_ref/GRCh38/genome.fa HD728.nanocourse.bam HD752.nanocourse.bam
delly2 call -t DEL -o delly_deletion.bcf -q 30 -g /project/shared/bicf_workflow_ref/GRCh38/genome.fa HD728.nanocourse.bam HD752.nanocourse.bam
delly2 call -t INS -o delly_insertion.bcf -q 30 -g /project/shared/bicf_workflow_ref/GRCh38/genome.fa HD728.nanocourse.bam HD752.nanocourse.bam
delly2 filter -t BND -o  delly_tra.bcf -f somatic -s samples.tsv delly_translocations.bcf
delly2 filter -t DUP -o  delly_dup.bcf -f somatic -s samples.tsv delly_duplications.bcf
delly2 filter -t INV -o  delly_inv.bcf -f somatic -s samples.tsv delly_inversions.bcf
delly2 filter -t DEL -o  delly_del.bcf -f somatic -s samples.tsv delly_deletion.bcf
delly2 filter -t INS -o  delly_ins.bcf -f somatic -s samples.tsv delly_insertion.bcf
bcftools concat -a -O v delly_dup.bcf delly_inv.bcf delly_tra.bcf delly_del.bcf delly_ins.bcf| vcf-sort -t temp > delly.vcf

Run the Germline Program on Astrocyte

While you are running the pipeline on the command line, you can run the full analysis pipelines using our point and click workflows on bioHPC!

  1. Go to Astrocyte
  2. Create a new project test
  3. Add data to your project including fastq and design file
  4. Start Variant Germline Workflow using the Fastq files and the germline.design.txt file

Examine the BAM File in IGV

module load IGV/2.3.90
sh igv.sh 

You can open the BAM file in IGV. Open IGV -- Load the Genome using: Genomes->Load from Server->Select Human (hg38) Next upload the BAM files using File->Load From File

Try to answer the following questions:

Load VCF into IGV

Compare your visual results to the variants identified manually from these position by loading the VCF file into IGV

Load BAMs and Variants into Iobio