GenomeComb



Genomecomb moved to github on https://github.com/derijkp/genomecomb with documentation on https://derijkp.github.io/genomecomb. For up to date versions, go there. These pages only remain here for the data on the older scientific application (or if someone really needs a long obsolete version of the software)

Select

Format

cg multiselect ?options? ?datafile? ...

Summary

Command for using cg select on multiple files, and combine the results

Description

cg multiselect performs a cg select on all given datafiles separately. It can output a separate result file for each datafile, or combine the results into one output file in various ways.

cg multiselect supports several (but not all) cg select options. For more info on thes options, refer to the cg select help.

Arguments

datafile
file to be scanned. File may be compressed.

Options

-q query
only lines fullfilling conditions in query will be written to outfile (see further)
-qf queryfile
only lines fullfilling conditions in queryfile will be written to outfile (see further)
-f fields
only write given fields to result.
-rf removefields
write all, except given fields to result.
-samples samples
Only the given list of samples (space separated) will be included in the output.
-s sortfields
sort on given fields (uses natural sort, so that e.g. 'chr1 chr2 chr10' will be sorted correctly)
-sr sortfields
sort on given fields in reverse order
-hc 0/1/2
is the header in the last comment line
-rc 0/1
remove comment
-samplingskip number
sample data, skipping number rows
-g groupfields
with this option a summary table is returned.
-gc groupcols
show other columns instead of count when using the -g option.
-o filename (-outfile)
Gives the name of the result file. If not given, output is written to stdout
-split 1/0
indicate if in the datafiles, multiple alternative genotypes are split over different lines (default 1)
-combine multicompar/cat/files
How to combine results (see further)

Combine results

The -combine option determines how the seperate cg select results are combined. Each query is done separately first before combining the result, so the query should be set up so that it can be run on each file separately and that the results are in the proper format to be combined. The following types of combination are supported:

multicompar

The results are combined into a multicompar file using cg multicompar. The result files must thus be variant files or multicompar files. The command does not get extra info (sreg, varall) on the files, so multicompar is run without reannot: If a variant line is not in the result file, the data for the samples coming from that file will be filled with unknown (?) completely.

cat

Results will be concatenated using cg cat with the merge option: If they have different headers, the final result contains data for each field that occurs in at least one of the source files. If a given field is not present in one of the source files, it will be empty for each line coming from this file.

This option can be useful for e.g. combining summary queries per sample

    cg multiselect -combine cat -g 'sample * sequenced v type *' tmp/multicompar1.tsv tmp/multicompar2.tsv

files

Create a separate resultfile for each datafile. Each resultfile will be named using the original filename with the filename given in the -o option as a prefix.

Category

Query