GenomeComb



Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb, with documentation at https://derijkp.github.io/genomecomb. For up-to-date versions, go there. These pages remain here only for the data on the older scientific application (or for anyone who really needs a long-obsolete version of the software).

process_reports

Format

cg process_reports ?options? sampledir ?dbdir? ?reports?

Summary

Calculates a number of statistics on a sample and stores them in the reports subdir

Description

The cg process_reports command calculates a number of statistics on a sample and stores these in the reports subdir of the sample. If a type of report cannot be made (e.g. fastqstats if there are no fastqs for the sample), it is skipped. Most reports are functional rather than fancy: a tsv file with the fields sample, source (i.e. the program used to make the report), parameter and value.
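Because the reports are plain tsv files, they are easy to post-process. The following is a minimal Python sketch of reading one, assuming the file has a header line with the four fields named above (sample, source, parameter, value). The rows, the parameter names (total, mapped) and the helper read_report are made up for illustration and are not taken from actual genomecomb output.

```python
import csv
import io

# Hypothetical report content. The column names follow the description above;
# the rows and parameter names are invented for this example.
EXAMPLE_REPORT = (
    "sample\tsource\tparameter\tvalue\n"
    "sample1\tflagstat\ttotal\t1000\n"
    "sample1\tflagstat\tmapped\t950\n"
)


def read_report(handle):
    """Return {(source, parameter): value} for one report tsv (values stay strings)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["source"], row["parameter"]): row["value"] for row in reader}


# In practice, handle would be an open file from the reports subdir.
stats = read_report(io.StringIO(EXAMPLE_REPORT))
mapped_fraction = int(stats[("flagstat", "mapped")]) / int(stats[("flagstat", "total")])
print(f"mapped fraction: {mapped_fraction:.2f}")
```

Keying on (source, parameter) keeps values from different programs apart if several reports are merged into one dictionary.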

The following report types can be selected:

fastqstats: stats about number, length, quality, ... of reads in the fastq files, made using fastq-stats (files report_fastq_fw-source.tsv and report_fastq_rev-sample.tsv)
fastqc: fastqc analysis with graphs etc. per fastq file (in the fastqc subdir)
flagstat_reads: stats about bamfiles in the sampledir, made using samtools flagstat (file report_flagstat_reads-source.tsv), based on primary alignments only (so it counts reads)
flagstat_alignments: stats about bamfiles in the sampledir, made using samtools flagstat (file report_flagstat_alignments-source.tsv). This counts alignments, not reads (it includes secondary alignments)
histodepth: create a histogram of the sequencing depth. If a targetfile is present, histograms for on- and off-target regions will be separated. The report_* version contains coverage statistics at various depth cutoffs
vars: number of variants, quality variants ($coverage >= 20 and $quality >= 50), etc. in the various var files (file report_vars-source.tsv)
hsmetrics: picard hsmetrics analysis of target coverage
covered: how many bases are covered in the various region files (5x coverage, 20x coverage, gatk sequenced, ...), per chromosome and in total (file report_covered-source.tsv)
histo: "histogram" of coverage (file crsbwa-sample.histo)
predictgender: predict gender of a sample

Arguments

sampledir
processed sample directory (containing variant files, bam files, etc.)
dbdir
directory containing reference data (genome sequence, annotation, ...)
reports
select which reports to make (can also be given as an option). You can use basic (default) to make all basic reports (currently all except predictgender), or all to make all report types.

Options

-dbdir dbdir
dir containing the reference databases
-r reports (also --reports)
list of reports to make

This command can be distributed on a cluster or run using multiple threads with job options (more info with cg help joboptions).
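As an illustration of how the arguments and options above combine, here is a small Python sketch that assembles a command line following the Format shown earlier. The helper build_process_reports_cmd and both paths are hypothetical. Passing several reports as one space-separated argument is an assumption, not something this page states; check the current genomecomb documentation before relying on it.

```python
import shlex


def build_process_reports_cmd(sampledir, dbdir=None, reports=None):
    """Assemble an argument list for cg process_reports (hypothetical helper)."""
    cmd = ["cg", "process_reports"]
    if dbdir is not None:
        cmd += ["-dbdir", dbdir]
    if reports:
        # Assumption: multiple reports go into a single space-separated argument.
        cmd += ["-r", " ".join(reports)]
    cmd.append(sampledir)
    return cmd


# Both paths are placeholders for a real sample directory and reference dir.
cmd = build_process_reports_cmd(
    "samples/sample1",
    dbdir="/path/to/refdb",
    reports=["flagstat_reads", "vars"],
)
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run(cmd, check=True) directly, which avoids shell quoting problems when a path contains spaces.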

Category

Process