GenomeComb

Multicompar

Format

cg multicompar ?options? multicomparfile sampledir/varfile ...

Summary

Compare multiple variant files

Description

This command creates a multicompar file: a tab-separated file containing a wide table used to compare variants between different samples. Each line contains general variant information (position, type, annotation, etc.), followed by columns with variant information specific to each sample (genotype, coverage, etc.). The sample-specific columns have headings of the form field-<source>. The source will usually be the sample name, but may also be of the form method-samplename to indicate the sequencing or analysis method, which allows different methods to be compared on the same sample.
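The column naming scheme can be illustrated with a small sketch. It only shows how general columns are followed by field-<source> columns for each source. The names in GENERAL_FIELDS and SAMPLE_FIELDS and the function multicompar_header are hypothetical and chosen for illustration; they are not taken from the GenomeComb code.

```python
# Hypothetical sketch of how multicompar-style column headings are formed.
# The general and per-sample field names below are assumptions for illustration.

GENERAL_FIELDS = ["chromosome", "begin", "end", "type", "ref", "alt"]
SAMPLE_FIELDS = ["sequenced", "zyg", "coverage"]


def multicompar_header(sources):
    """Return general columns followed by field-<source> columns for each source."""
    header = list(GENERAL_FIELDS)
    for source in sources:
        for field in SAMPLE_FIELDS:
            header.append(f"{field}-{source}")
    return header


if __name__ == "__main__":
    # One plain sample name and one method-samplename source for the same sample
    print("\t".join(multicompar_header(["sample1", "gatk-sample1"])))
```

With the sources sample1 and gatk-sample1, this gives columns such as coverage-sample1 and coverage-gatk-sample1, so results from two methods on the same sample sit side by side.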

The source that is added after the fieldnames (-samplename, -method-samplename) is extracted from the filenames:

When some samples contain a variant that others do not, the multicompar file cannot be made complete by comparing variant lists alone: a variant missing from a variant list may mean either that the sample is reference at that position or that the position was not sequenced. Information about coverage, etc. will also be missing. multicompar marks this missing information with a question mark in the table. If the information is available in other files, it can be added using the -reannot option or the multicompar_reannot command (see cg multicompar_reannot help for more info).
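The following sketch shows why the question marks appear when variant lists are merged: every variant seen in any sample gets a row, and a sample that does not list that variant gets "?" in its columns. The function merge_variants and the layout of the variant key tuple are hypothetical. The actual command reads tab-separated files and may fill these values differently.

```python
# Hypothetical sketch: merge per-sample variant lists into wide rows, writing "?"
# where a sample does not list the variant (reference or not sequenced: unknown).

UNKNOWN = "?"


def merge_variants(samples, sample_fields):
    """samples maps source -> {variant_key: {field: value}}.

    Returns a sorted list of (variant_key, row) pairs, where row maps
    "field-source" to a value, or to "?" when that sample lacks the variant.
    """
    all_keys = set()
    for variants in samples.values():
        all_keys.update(variants)

    rows = []
    for key in sorted(all_keys):
        row = {}
        for source, variants in samples.items():
            info = variants.get(key, {})  # empty dict: variant not in this sample's list
            for field in sample_fields:
                row[f"{field}-{source}"] = info.get(field, UNKNOWN)
        rows.append((key, row))
    return rows


if __name__ == "__main__":
    samples = {
        "sample1": {("chr1", 100, 101, "snp", "A", "G"): {"zyg": "t", "coverage": "25"}},
        "sample2": {("chr1", 200, 201, "snp", "C", "T"): {"zyg": "m", "coverage": "30"}},
    }
    for key, row in merge_variants(samples, ["zyg", "coverage"]):
        print(key, row)
```

In this example, the row for the chr1:100 variant has "?" for zyg-sample2 and coverage-sample2. Replacing such values with real data from other files is the job of -reannot or multicompar_reannot, which this sketch does not attempt.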

Fields whose names start with l: are treated as lists that depend on the different alt genotypes (a comma-separated list). The -listfields option can be used to define columns to be treated this way.
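What it means for a field to be "dependent on the alt genotypes" becomes clearer when a line with several alternative alleles is split into one line per allele, as with -split 1. The sketch below is a minimal illustration under these assumptions: each list field holds one comma-separated element per allele in alt, and the element matching each allele is kept. The functions and the field names l:freq and af are hypothetical. How GenomeComb handles lists whose lengths do not match is not documented here, so the fallback shown is also an assumption.

```python
# Hypothetical sketch: split a record with several alternative alleles into one
# record per allele. List fields (names starting with "l:", plus any names passed
# as listfields, mirroring the -listfields option) contribute the element that
# matches each allele; all other fields are copied unchanged.


def is_list_field(name, listfields=()):
    return name.startswith("l:") or name in listfields


def split_alts(record, listfields=()):
    """Return one dict per comma-separated allele in record["alt"]."""
    alts = record["alt"].split(",")
    result = []
    for i, alt in enumerate(alts):
        new = {}
        for name, value in record.items():
            if name == "alt":
                new[name] = alt
            elif is_list_field(name, listfields):
                items = value.split(",")
                # Assumption: a list whose length does not match the number of
                # alleles is copied unchanged rather than indexed.
                new[name] = items[i] if len(items) == len(alts) else value
            else:
                new[name] = value
        result.append(new)
    return result


if __name__ == "__main__":
    record = {"chromosome": "chr1", "begin": "100", "ref": "A", "alt": "G,T",
              "l:freq": "0.2,0.05", "af": "0.3,0.1", "coverage": "40"}
    for part in split_alts(record, listfields=["af"]):
        print(part)
```

Here l:freq is split because of its prefix and af only because it was passed in listfields. coverage, which is not a list field, is copied unchanged to both lines.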

Arguments

multicomparfile
Result file; it will be created if it does not exist.
sampledir/varfile
Directory or file containing the variants of a new sample to be added. More than one can be added in one command.

Options

-reannot
Also do reannotation (see cg multicompar_reannot)
-reannotregonly
Also do reannotation, but only region data (sequenced) will be updated
-targetsfile variantfile
All variants in variantfile will be included in the final multicompar file, even if they are not present in any of the samples. A column is added that indicates for each variant whether it was in the targetsfile. The name of this column is the part of the filename after the last dash. If variantfile contains a "name" column, its content is used in the targets column instead of a 1.
-split
If 1, multiple alternative alleles are put on separate lines and treated mostly as separate variants. Use 0 for the default (alternative alleles are on the same line, separated by commas in the alt field).
-listfields listfields
Fields given by listfields will be regarded as lists (comma-separated) dependent on the different alt genotypes.
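The behaviour described for -targetsfile can be sketched as follows. The column name is the part of the filename after the last dash. Every target variant is kept even if no sample has it, and the value is the target's name or 1. Several details are assumptions and not documented above: that the file extension is stripped from the column name, that variants not in the targets file get an empty value, and the variant key layout. The functions targets_column_name and targets_values are hypothetical.

```python
import os

# Hypothetical sketch of the targets column described for -targetsfile.
# Variant keys, the extension handling and the value for non-target variants
# are assumptions for illustration.


def targets_column_name(path):
    """Return the part of the file name after the last dash, without extension."""
    base = os.path.basename(path)
    after_dash = base.rpartition("-")[2]
    # Assumption: everything from the first dot on (.tsv, .tsv.gz, ...) is the extension
    return after_dash.split(".", 1)[0]


def targets_values(sample_keys, targets):
    """sample_keys: variant keys found in the samples.
    targets: {variant_key: name or None} read from the targets file.

    Every target is included even if no sample has it. Targets get their name,
    or "1" when there is no name; other variants get "" (assumption).
    """
    column = {}
    for key in set(sample_keys) | set(targets):
        if key in targets:
            column[key] = targets[key] or "1"
        else:
            column[key] = ""
    return column


if __name__ == "__main__":
    name = targets_column_name("/data/reg_hg19-exome.tsv")
    col = targets_values([("chr1", 100)], {("chr1", 200): "BRCA1", ("chr1", 300): None})
    print(name, sorted(col.items()))
```

Taking the union of sample and target keys is what makes target variants appear in the final file even when no sample has them.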

Category

Compare