GenomeComb
Genomecomb has moved to GitHub: https://github.com/derijkp/genomecomb, with documentation at https://derijkp.github.io/genomecomb. For up-to-date versions, go there. These pages remain here only for the data on the older scientific application (or for anyone who really needs a long-obsolete version of the software).
cg pmulticompar ?options? multicomparfile sampledir/varfile ...
Compare multiple variant files
This command combines multiple variant files into one multicompar file: a tab-separated file containing a wide table used to compare variants between different samples. Each line in the table contains general variant info (position, type, annotation, etc.) and columns with variant info specific to each sample (genotype, coverage, etc.). The latter have column headings of the form field-<sample>.
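As a minimal sketch of how the wide table is organised, the per-sample columns can be separated from the general ones by splitting each column name at its first -. The function name and the example header below are hypothetical, and the sketch assumes that field names themselves contain no -, while the <sample> part may (as in zyg-gatk-sample1).

```python
def split_multicompar_header(header):
    """Split multicompar column names into general columns and
    per-sample columns of the form field-<sample>, grouped by sample.

    Assumption (not stated by genomecomb): the field name contains no '-',
    so the first '-' separates field from sample; the sample part may
    contain further '-' characters.
    """
    general = []
    per_sample = {}
    for column in header:
        field, sep, sample = column.partition("-")
        if sep and sample:
            # e.g. 'zyg-gatk-sample1' -> field 'zyg', sample 'gatk-sample1'
            per_sample.setdefault(sample, {})[field] = column
        else:
            # no '-': a general column shared by all samples
            general.append(column)
    return general, per_sample


# Hypothetical header, for illustration only
header = ["chromosome", "begin", "end", "type", "ref", "alt",
          "zyg-gatk-sample1", "coverage-gatk-sample1", "zyg-sam-sample1"]
general, per_sample = split_multicompar_header(header)
print(general)
print(per_sample)
```

Grouping by sample makes it easy to pull out, for example, all zyg columns to compare genotypes across samples or methods.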
The <sample> name used is extracted from the filename of the variant file by removing the file extension and the start of the filename up to and including the first -. (Variant files are expected to follow the convention var-<sample>.tsv.) <sample> can simply be the sample name, but it may also include information about e.g. the sequencing or analysis method, in the form method-samplename. For example, var-gatk-sample1.tsv and var-sam-sample1.tsv indicate variants called using gatk and samtools respectively for the same sample (sample1), which allows comparison of different methods.
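The naming rule above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch removes only a single extension (.tsv); compressed variant files with a double extension would need an extra step.

```python
import os


def sample_from_varfile(path):
    """Derive <sample> from a variant file name such as var-gatk-sample1.tsv:
    drop the (single) file extension, then drop everything up to and
    including the first '-'."""
    name = os.path.basename(path)
    stem, _ext = os.path.splitext(name)
    _prefix, sep, sample = stem.partition("-")
    if not sep:
        raise ValueError(f"no '-' in variant file name: {name}")
    return sample


print(sample_from_varfile("var-sample1.tsv"))       # sample1
print(sample_from_varfile("var-gatk-sample1.tsv"))  # gatk-sample1
```

Because only the first - is used as separator, the method prefix stays part of the sample name, so gatk-sample1 and sam-sample1 end up as separate column sets.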
When some samples contain a variant that others do not, the information in the multicompar file cannot be completed by combining variant files alone: a variant missing from one variant file may mean either that the sample is reference at that position or that the position was not sequenced. Other information on the variant (coverage, etc.) will also be missing for samples whose variant file does not contain it. multicompar tries to complete this missing information using files (if present) that accompany the variant file (in the same directory and with a similar filename). The following files can be used for completing multicompar data:
If data cannot be completed using accompanying files, the missing information is indicated by a question mark (?) as the value in the table.
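The completion rule can be illustrated with a small Python sketch. It is not pmulticompar's actual implementation: the function name, the in-memory representation of sequenced regions and the status labels other than ? are hypothetical. Only the logic follows the text: a variant present in the sample's variant file is a variant; otherwise an accompanying region file decides between reference and not sequenced; without such information the value stays ?.

```python
from bisect import bisect_right


def completion_status(variant, sample_variants, covered_regions):
    """Illustrative status of one sample for one multicompar row.

    variant: (chromosome, begin, end) key of the row.
    sample_variants: set of variant keys present in this sample's variant file.
    covered_regions: dict chromosome -> sorted, non-overlapping list of
        half-open (begin, end) sequenced regions, or None when no
        accompanying region file was found (hypothetical representation).
    """
    if variant in sample_variants:
        return "variant"
    if covered_regions is None:
        # nothing to complete the data with
        return "?"
    chrom, begin, end = variant
    regions = covered_regions.get(chrom, [])
    # index of the last region starting at or before the variant start
    i = bisect_right(regions, (begin, float("inf"))) - 1
    if i >= 0 and regions[i][0] <= begin and end <= regions[i][1]:
        # sequenced but not called: reference
        return "reference"
    return "not sequenced"


regions = {"chr1": [(100, 200), (300, 400)]}
print(completion_status(("chr1", 150, 151), set(), regions))  # reference
print(completion_status(("chr1", 150, 151), set(), None))     # ?
```

Binary search over sorted regions keeps each lookup logarithmic in the number of regions, which matters when a whole-genome table is completed row by row.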
The same command can also add new samples to an existing multicompar file. For this, pmulticompar needs the original variant files and accompanying files of the samples already in the multicompar file. It finds these files (based on filename) most easily if everything is organised in a projectdir (hint: you can add samples from another projectdir without duplication using symbolic links). However, pmulticompar will also look in several places relative to the multicompar file: the same directory, a samples directory in the same directory, the parent directory, and a samples directory in the parent directory.
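The search order listed above can be sketched as follows. The function names are hypothetical, and the optional per-sample subdirectory (samples/<sample>/<filename>) is an assumed projectdir layout, not something this page specifies; the actual search in pmulticompar may differ in detail.

```python
from pathlib import Path


def candidate_dirs(multicompar_file):
    """Directories to search, in the order named in the text: the
    multicompar file's own directory, a samples directory next to it,
    the parent directory, and a samples directory in the parent."""
    here = Path(multicompar_file).absolute().parent
    return [here, here / "samples", here.parent, here.parent / "samples"]


def find_sample_file(multicompar_file, filename, sample=None):
    """Return the first existing file called `filename` in the candidate
    directories, or None. If `sample` is given, also try a per-sample
    subdirectory (assumed layout: <dir>/<sample>/<filename>)."""
    for directory in candidate_dirs(multicompar_file):
        options = [directory / filename]
        if sample is not None:
            options.append(directory / sample / filename)
        for path in options:
            if path.is_file():
                return path
    return None
```

Since Path.is_file() follows symbolic links, a sample linked into the projectdir (as the hint above suggests) is found exactly like a real file.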
cg pmulticompar is a parallel and much faster version of multicompar. It also integrates the reannot step. It is, however, more finicky about e.g. directory structure (it needs to find the original sample variant files) and does not support all of the accompanying information files that the old multicompar does.
This command can be distributed on a cluster or run using multiple cores with job options (more info with cg help joboptions).