GenomeComb

long

Format

cg long ?infile? ?outfile?

Summary

Converts data in tsv format from wide format (data for each sample in separate columns) to long format (data for each sample on separate lines).

Description

Genomecomb usually expects data in tsv files in wide format, where some fields are common to all samples (e.g. chromosome, begin, end, ... for variants) and other fields have a separate value for each sample (e.g. quality, coverage, ...). In wide format, the sample-specific columns are named by appending "-samplename" to the field name. In long format, the sample-specific data for each sample is on a separate line, and the common fields are repeated on each line. "cg long" converts from wide to long format; "cg wide" can be used for the reverse.
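The relation between the two layouts can be illustrated with a small Python sketch. This is not genomecomb's implementation, only a minimal model of the transformation described above. The function name wide_to_long, the example field and sample names, the added "sample" column and its position, and the rule of splitting column names at the first "-" are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical wide-format input: three common fields, two samples (s1, s2).
WIDE = (
    "chromosome\tbegin\tend\tquality-s1\tcoverage-s1\tquality-s2\tcoverage-s2\n"
    "chr1\t100\t101\t30\t12\t40\t20\n"
)

def wide_to_long(wide_text):
    """Return one dict per (input line, sample) combination."""
    reader = csv.DictReader(io.StringIO(wide_text), delimiter="\t")
    # Assumption: columns without "-" are common fields,
    # columns named "field-sample" are sample specific.
    common = [name for name in reader.fieldnames if "-" not in name]
    fields, samples = [], []
    for name in reader.fieldnames:
        if "-" in name:
            field, sample = name.split("-", 1)
            if field not in fields:
                fields.append(field)
            if sample not in samples:
                samples.append(sample)
    rows = []
    for record in reader:
        for sample in samples:
            row = {"sample": sample}
            # The common fields are repeated on every output line.
            row.update({name: record[name] for name in common})
            for field in fields:
                row[field] = record.get(f"{field}-{sample}", "")
            rows.append(row)
    return rows

rows = wide_to_long(WIDE)
for row in rows:
    print("\t".join(row.values()))
```

Each input line yields one output line per sample, so a file with N lines and S samples grows to N x S lines in this model.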

Arguments

infile
File to be converted. If not given, stdin is used. The file may be compressed.
outfile
File to write the results to. If not given, stdout is used.

Options

-norm 1/0
If 1, output "normalized" long tables: the main output contains only the variant data, with a numeric id added to each variant, and without the sample-specific data. An extra tsv file (<outfile>.sampledata.tsv) is generated that contains the sample-specific data, with one line per combination of variant (identified by the id) and sample (identified by a "sample" column).
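The split into two tables can be sketched in Python as follows. This is a conceptual model only, not genomecomb's code: the function name normalize_wide, the example columns, ids starting at 1, and the column order are assumptions, and the actual files written by "cg long -norm 1" may differ in these details.

```python
import csv
import io

# Hypothetical wide-format input with two variants and two samples.
WIDE_NORM = (
    "chromosome\tbegin\tend\tquality-s1\tcoverage-s1\tquality-s2\tcoverage-s2\n"
    "chr1\t100\t101\t30\t12\t40\t20\n"
    "chr2\t500\t501\t25\t8\t35\t15\n"
)

def normalize_wide(wide_text):
    """Return (variants, sampledata) as two lists of dicts linked by "id"."""
    reader = csv.DictReader(io.StringIO(wide_text), delimiter="\t")
    common = [name for name in reader.fieldnames if "-" not in name]
    # (column, field, sample) for every sample-specific column;
    # assumption: the name is split at the first "-".
    per_sample = [(name, *name.split("-", 1))
                  for name in reader.fieldnames if "-" in name]
    variants, sampledata = [], []
    for variant_id, record in enumerate(reader, start=1):
        # Variant table: numeric id plus the common fields only.
        variants.append({"id": variant_id,
                         **{name: record[name] for name in common}})
        # Sample table: one line per (variant id, sample).
        by_sample = {}
        for column, field, sample in per_sample:
            line = by_sample.setdefault(
                sample, {"id": variant_id, "sample": sample})
            line[field] = record[column]
        sampledata.extend(by_sample.values())
    return variants, sampledata

variants, sampledata = normalize_wide(WIDE_NORM)
```

In this model the common fields are stored once per variant instead of once per sample, and the two tables can be joined again on the id.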

Category

Conversion