GenomeComb



Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb, with documentation at https://derijkp.github.io/genomecomb. Go there for up-to-date versions. These pages remain here only for the data on the older scientific application (or for anyone who really needs a long-obsolete version of the software).

long

Format

cg long ?infile? ?outfile?

Summary

Converts data in tsv format from wide format (data for each sample in separate columns) to long format (data for each sample on separate lines).

Description

Genomecomb usually expects tsv data in wide format, where some fields are common to all samples (e.g. chromosome, begin, end, ... for variants) and other fields have a separate value for each sample (e.g. quality, coverage, ...). In wide format, these sample-specific columns are named by appending "-samplename" to the field name. In long format, the sample-specific data for each sample is on a separate line, and the common fields are repeated on every line. "cg long" converts from wide to long format; "cg wide" does the reverse.
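The wide-to-long conversion described above can be sketched in Python. This is a minimal illustration, not genomecomb's own implementation. It assumes that a column is sample-specific when its name contains "-" (the part before the first "-" is taken as the field name, the rest as the sample name) and that the input is plain, uncompressed tsv. The function name wide_to_long is hypothetical, and the column order of the real command's output may differ.

```python
import csv


def wide_to_long(lines):
    """Convert wide-format tsv lines to long-format rows (sketch).

    Assumption: a column whose name contains "-" is sample-specific;
    the part before the first "-" is the field, the rest the sample.
    Columns without "-" are common fields, repeated on every output line.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    common = [i for i, name in enumerate(header) if "-" not in name]
    samples = []  # sample names, in order of first appearance
    fields = []   # sample-specific field names, in order of first appearance
    pos = {}      # (field, sample) -> column index in the wide table
    for i, name in enumerate(header):
        if "-" in name:
            field, sample = name.split("-", 1)
            if sample not in samples:
                samples.append(sample)
            if field not in fields:
                fields.append(field)
            pos[(field, sample)] = i
    rows = [[header[i] for i in common] + ["sample"] + fields]
    for row in reader:
        base = [row[i] for i in common]
        for sample in samples:
            # a field missing for this sample becomes an empty value
            values = [row[pos[(f, sample)]] if (f, sample) in pos else ""
                      for f in fields]
            rows.append(base + [sample] + values)
    return rows


wide = [
    "chromosome\tbegin\tend\tquality-s1\tcoverage-s1\tquality-s2\tcoverage-s2",
    "chr1\t100\t101\t50\t20\t40\t15",
]
for row in wide_to_long(wide):
    print("\t".join(row))
```

Each wide line yields one long line per sample (here s1 and s2), with chromosome, begin and end repeated and a "sample" column identifying the sample.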

Arguments

infile
File to be converted; if not given, stdin is used. The file may be compressed.
outfile
Write results to outfile; if not given, stdout is used.

Options

-norm 1/0
If 1, output "normalized" long tables: the main output will contain only the variant data (the common fields) with a numeric id, without the sample-specific data. An extra tsv file (<outfile>.sampledata.tsv) is generated that contains the sample-specific data, with one line per variant (identified by the id) and sample (identified by a "sample" column).
-samplefields samplefields
Split sample names that consist of several parts separated by "-" (as is typical in genomecomb output, e.g. -gatk-rdsbwa-sample1) into separate fields, with the field names given by samplefields.
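The effect of the two options can be sketched in Python. This is an illustration under stated assumptions, not genomecomb's code: the functions normalize and split_sample, the field names varcaller and aligner, and the numbering of ids from 1 are all hypothetical choices of this sketch. normalize splits an already long table into a variant table (id plus common fields) and a sample-data table (id, sample and the sample-specific fields), mirroring the two files described for -norm. split_sample splits a sample name on "-" into the fields named by a samplefields list.

```python
def normalize(header, rows, common):
    """Split a long table into a variant table and a sample-data table (sketch).

    common: names of the fields shared by all samples (hypothetical parameter).
    Each distinct combination of common values gets a numeric id, here
    numbered from 1 (an assumption of this sketch).
    """
    ci = [header.index(f) for f in common]
    si = [i for i in range(len(header)) if i not in ci]
    ids = {}
    variants = [["id"] + list(common)]
    sampledata = [["id"] + [header[i] for i in si]]
    for row in rows:
        key = tuple(row[i] for i in ci)
        if key not in ids:
            ids[key] = len(ids) + 1
            variants.append([str(ids[key])] + list(key))
        sampledata.append([str(ids[key])] + [row[i] for i in si])
    return variants, sampledata


def split_sample(name, samplefields):
    """Split a sample name like "gatk-rdsbwa-sample1" into named parts (sketch).

    A leading "-" (as in a column suffix) is ignored. The last field takes
    any remaining "-"-separated parts; a name with too few parts is an error
    (the real command's handling of such names may differ).
    """
    parts = name.lstrip("-").split("-", len(samplefields) - 1)
    if len(parts) != len(samplefields):
        raise ValueError(f"cannot split {name!r} into {samplefields}")
    return dict(zip(samplefields, parts))


header = ["chromosome", "begin", "end", "sample", "quality"]
rows = [
    ["chr1", "100", "101", "gatk-rdsbwa-s1", "50"],
    ["chr1", "100", "101", "gatk-rdsbwa-s2", "40"],
    ["chr2", "5", "6", "gatk-rdsbwa-s1", "30"],
]
variants, sampledata = normalize(header, rows, ["chromosome", "begin", "end"])
for row in variants + sampledata:
    print("\t".join(row))
print(split_sample("-gatk-rdsbwa-sample1", ["varcaller", "aligner", "sample"]))
# prints {'varcaller': 'gatk', 'aligner': 'rdsbwa', 'sample': 'sample1'}
```

In this sketch, the variant on chr1 appears once in the variant table but twice in the sample-data table (once per sample), both lines sharing id 1.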

Category

Format Conversion