GenomeComb

Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb, with documentation at https://derijkp.github.io/genomecomb. For up-to-date versions, go there. These pages remain here only for the data on the older scientific application (or in case someone really needs a long-obsolete version of the software).

Reference

Format

cg subcommand ?options? ....

Description

This help page gives a (reference style) overview of all genomecomb functions. For an introductory text to genomecomb and its formats, use

cg help intro

All genomecomb functions are called using the cg command and a subcommand. The available subcommands are listed on this page (in categories) with a short description. To get further info on how to use the subcommands and their parameters, use

cg help subcommand

or

cg subcommand -h

Options

The following options are generic and available for all subcommands. However, they must always precede the subcommand-specific options.

-v number (--verbose)
Setting this to 1 or 2 (instead of the default 0) makes some subcommands chattier about their progress. If the given number is 1, logging messages are shown (warnings, start of subtask, etc.). If the number is >= 2, progress counters are also shown (for commands that support them).
--stack 0/1
When the program returns an error, by default only the error message is shown, which is normally sufficient for errors in input format, etc. If --stack is set to 1, a full stack trace is shown on error (which may be useful for tracking down errors caused by bugs in the program).
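The ordering rule for the generic options can be made concrete with a small sketch. The Python helper below (build_cg_argv is a hypothetical name, not part of genomecomb) only assembles an argument list in the order described above: subcommand, then generic options, then subcommand-specific options, then file arguments. The file name variants.tsv is a placeholder, and nothing is executed.

```python
def build_cg_argv(subcommand, args=(), verbose=0, stack=0, sub_options=()):
    """Assemble a cg command line with generic options placed first.

    Hypothetical helper for illustration only: it builds the argument
    list but does not run genomecomb.
    """
    argv = ["cg", subcommand]
    # Generic options (-v, --stack) go directly after the subcommand ...
    if verbose:
        argv += ["-v", str(verbose)]
    if stack:
        argv += ["--stack", str(stack)]
    # ... and before any subcommand-specific options and file arguments.
    argv += list(sub_options)
    argv += list(args)
    return argv


argv = build_cg_argv("checktsv", args=["variants.tsv"], verbose=1, stack=1)
print(" ".join(argv))  # cg checktsv -v 1 --stack 1 variants.tsv
```

On a system where genomecomb is installed, a list like this could be handed to subprocess.run; that step is left out here so the sketch stays self-contained.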

Available subcommands

Process

process_project
process a sequencing project directory (projectdir), generating full analysis information (variant calls, multicompar, reports, ...) starting from raw sample data from various sources.
process_multicompar
process a sequencing project directory. This expects a genomecomb directory with samples already processed and makes annotated multicompar data
process_reports
Calculates a number of statistics on a sample in the reports subdir
process_sample
Processes one sample directory (projectdir), generating full analysis information (variant calls, multicompar, reports, ...) starting from raw sample data that can come from various sources.
project_addsample
Add a sample directory to a project directory.
renamesamples
Converts the sample names in a file or entire directory to other names.

Query

select
Command for very flexible selection, sorting and conversion of tab separated files
viz
graphical visualization of tsv files (even very large ones) as a table without loading everything into memory
multiselect
Command for using cg select on multiple files and combining the results
groupby
Group lines in a tsv file, based on 1 or more identical fields

Analysis

exportplink
make a plink "Transposed fileset" from the genome data
homwes
finds regions of homozygosity based on a variant file
homwes_compare
finds regions of homozygosity based on a variant file

Regions

multireg
Compare multiple regions files
regselect
select all regions or variants in region_file1 that overlap with regions in region_file2
covered
Find number of bases covered by regions in a region file
regcollapse
Collapses overlapping regions in one (sorted) or more region files
regcommon
list regions common between region_file1 and region_file2
regextract
Find regions with a minimum or maximum coverage in a bam, bcol or tsv file.
regjoin
join regions in 1 or 2 regionfiles
regsubtract
subtract regions in region_file2 from region_file1
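To illustrate what the region operations listed above compute, here is a minimal Python sketch of collapsing, counting covered bases, and subtracting. It is not genomecomb's implementation. It assumes regions are (chromosome, begin, end) tuples with zero-based, half-open coordinates, that touching regions are merged along with overlapping ones, and that chromosomes are ordered by plain string comparison; the function names are made up for this example.

```python
from collections import defaultdict


def collapse(regions):
    """Merge overlapping or touching (chrom, begin, end) regions."""
    merged = []
    for chrom, begin, end in sorted(regions):
        if merged and merged[-1][0] == chrom and begin <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, begin, end))
    return merged


def covered(regions):
    """Count the bases covered by the regions, counting overlaps once."""
    return sum(end - begin for _, begin, end in collapse(regions))


def subtract(regions1, regions2):
    """Return the parts of regions1 that are not covered by regions2."""
    by_chrom = defaultdict(list)
    for chrom, begin, end in collapse(regions2):
        by_chrom[chrom].append((begin, end))
    result = []
    for chrom, begin, end in collapse(regions1):
        pos = begin  # start of the part not yet removed
        for b, e in by_chrom[chrom]:
            if e <= pos:
                continue  # entirely before the remaining part
            if b >= end:
                break  # entirely after this region
            if b > pos:
                result.append((chrom, pos, b))
            pos = max(pos, e)
            if pos >= end:
                break
        if pos < end:
            result.append((chrom, pos, end))
    return result


regions_a = [("chr1", 10, 20), ("chr1", 15, 30), ("chr2", 0, 5)]
regions_b = [("chr1", 12, 18), ("chr1", 25, 40)]
print(collapse(regions_a))             # [('chr1', 10, 30), ('chr2', 0, 5)]
print(covered(regions_a))              # 25
print(subtract(regions_a, regions_b))  # [('chr1', 10, 12), ('chr1', 18, 25), ('chr2', 0, 5)]
```

The commands themselves work on sorted region files rather than in-memory lists, so this only shows the intended results on a small example.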

Annotation

annotate
Annotate a variant file with region, gene or variant data
download_biograph
Download gene rank data from biograph.
download_mart
Download data from biomart
geneannot2reg
Creates a (region) annotation file based on a genelist with annotations.

Validation

genome_seq
Returns sequences of regions in the genome (fasta file), optionally masked for snps/repeats
makeprimers
Make sequencing primers for Sanger validation experiments
makesequenom
Make input files for designing sequenom validation experiments
primercheck
check primer sets for multiple amplicons and snps
validatesv

tsv

cat
Concatenate tab separated files and print on the standard output (stripping headers of files other than the first).
checktsv
Checks the given tsv file for errors
collapsealleles
Convert a split variant file to unsplit by collapsing alleles of the same variant into one line.
graph
Visualization of large tsv files as a scatter plot
index
make indices for a tsv file
less
view a (possibly compressed) file using the pager less.
paste
merge lines of tab separated files.
split
split a tab separated file in multiple tab separated files based on the content of a (usually chromosome) field.
tsvjoin
join two tsv files based on common fields (must be sorted).
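As an illustration of the header handling described for cat above, the following Python sketch concatenates several tsv streams and keeps only the header of the first one. It assumes each file has exactly one header line and no leading comment lines, which may be simpler than what the real command handles; cat_tsv is a hypothetical name used only here.

```python
import io


def cat_tsv(sources):
    """Yield the lines of several tsv streams, keeping only the first header."""
    for index, source in enumerate(sources):
        for lineno, line in enumerate(source):
            if index > 0 and lineno == 0:
                continue  # skip the header of every file after the first
            # Make sure every emitted line ends with a newline.
            yield line if line.endswith("\n") else line + "\n"


first = io.StringIO("chromosome\tbegin\tend\nchr1\t10\t20\n")
second = io.StringIO("chromosome\tbegin\tend\nchr2\t5\t8\n")
combined = "".join(cat_tsv([first, second]))
print(combined, end="")
```

The in-memory streams stand in for files; any iterable of lines, such as an open file object, can be passed in the same way.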

Conversion

sam_clipamplicons
Clip primers from aligned sequences in a sam file (by changing bases to N and quality to 0) given a set of target amplicons
liftover
Use liftover to convert a variant file from one build to another
liftregion
Use liftover to convert a region file from one build to another
liftsample
Use liftover to convert the files from a sample (variant, region, ...) from one build to another
liftchain2tsv
Converts data in UCSC chain file format (for liftOver) to a tsv (tab separated) format to be used in e.g. cg liftregion.
correctvariants
Complete or correct a variants file
bamreorder
Changes the order of the contigs/chromosomes in a bam file

Format Conversion

bam2fastq
Extracts reads from a bamfile into fastq. The actual extraction is based on picard's SamToFastq or samtools bam2fq, but the data is prepared first (among other things, it is sorted by name to avoid the problems caused by position-sorted fq files)
bam2reg
Extract regions with a minimum coverage from a bam file.
bcol
several commands for managing bcol files
bcol_get
Get a list of values from a bcol file
bcol_make
Creates a bcol file
bcol_table
Outputs the data in a bcol file as tab-separated
bcol_update
updates bcol file(s) in the old format to the new
bed2tsv
Converts data in bed format to tsv format
cg2sreg
Extracts sequenced region data from a Complete Genomics format file in tsv (tab separated, simple feature table) format. The command will also sort the tsv appropriately.
cg2tsv
Converts data in Complete Genomics format to tsv (tab separated, simple feature table) format. The command will also sort the tsv appropriately. The cggenefile is optional. It contains the CG gene annotations, which can be (best) left out as these annotations (and more) can be done in genomecomb.
clc2tsv
Converts output from the clc bio assembly cell snp caller to sft (simple feature table) format
colvalue
Converts data in tsv format from a long key-value format to a wide column-value format
convcgcnv
Convert the Complete Genomics CNV files to the cg format
convcgsv
Convert the Complete Genomics SV files to the cg format
csv2tsv
Converts comma-separated value (csv) data to tab-separated (tsv)
gene2reg
Extract gene elements from data in genepred-like format to a (tab separated) region file.
gff2tsv
Converts data in gff format to gene sft (simple feature table) format
gtf2tsv
Converts data in gtf format to gene sft (simple feature table) format
keyvalue
Converts data in tsv format from wide format (data for each sample in separate columns) to keyvalue format.
long
Converts data in tsv format from wide format (data for each sample in separate columns) to long format (data for each sample in separate lines)
process_rtgsample
Convert rtg data to cg format
rtg2tsv
Converts data in rtg format to sft (simple feature table) format
tsv2bed
Converts data in tab-separated (tsv) format to bed format. By default it will create a minimal bed file by extracting chromosome, begin and end from the input using the default fields.
vcf2tsv
Converts data in vcf format to genomecomb tab-separated variant file (tsv). The command will also sort the tsv appropriately.
wide
Converts data in tsv format from long format (data for each sample in separate lines) to wide format (data for each sample in separate columns)
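The difference between the wide and long layouts mentioned above can be shown with a short Python sketch. It assumes that per-sample columns in the wide layout are named field-sample (for example zyg-sample1) and that the long layout gets a leading sample column; both the naming convention and the output column order are assumptions made for this illustration, and wide_to_long is a hypothetical name, not the tool's code.

```python
def wide_to_long(header, rows):
    """Turn one line per feature into one line per feature and sample."""
    # Columns without a "-" are treated as shared by all samples.
    common = [i for i, name in enumerate(header) if "-" not in name]
    samples = {}  # sample name -> {field name: column index}
    fields = []   # per-sample field names, in order of first appearance
    for i, name in enumerate(header):
        if "-" in name:
            field, sample = name.split("-", 1)
            samples.setdefault(sample, {})[field] = i
            if field not in fields:
                fields.append(field)
    long_header = ["sample"] + [header[i] for i in common] + fields
    long_rows = []
    for row in rows:
        for sample, cols in samples.items():
            # A field missing for a sample is emitted as an empty string.
            values = [row[cols[f]] if f in cols else "" for f in fields]
            long_rows.append([sample] + [row[i] for i in common] + values)
    return long_header, long_rows


wide_header = ["chromosome", "begin", "end", "zyg-sample1", "zyg-sample2"]
wide_rows = [["chr1", "10", "11", "m", "r"]]
long_header, long_rows = wide_to_long(wide_header, wide_rows)
print("\t".join(long_header))
for row in long_rows:
    print("\t".join(row))
```

Each wide line yields one long line per sample, so a file with two samples doubles in line count while losing the per-sample column suffixes.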

Compare

multicompar
Compare multiple variant files
multicompar_reannot
Reannotate multicompar file
pmulticompar
Compare multiple variant files

Report

coverage_report
generates a coverage report for the region of interest. A histo file is generated at the location of the bam file
depth_histo
makes a histogram of the depth in the given bamfile, subdivided into on- and off-target regions
hsmetrics
Creates a hsmetrics file using picard CalculateHsMetrics
predictgender
predicts gender for a sample

Structural variants

process_sv
Do all steps on a Complete Genomics sample to generate structural variant calls
svmulticompar
Compare multiple structural variant files

Compression

bgzip
Compresses files using bgzip
job_update
updates the logfile of a command using joboptions
lz4
Compresses files using lz4
lz4ra
decompresses part of an lz4 compressed file
razip
Compresses files using razip
unzip
Decompresses
zcat
pipe contents of one or more (potentially compressed) files to stdout.

Dev

help
display help
qsub
submit a command to the cluster (grid engine).
sh
runs a shell in which genomecomb commands can be input and executed interactively.
source
runs a script (Tcl) with all commands/extensions from genomecomb available.
tsvdiff
compare tsv files

Tools

cplinked
create a copy of a directory where each file in it is a softlink to the src.
hardsync
creates a hardlinked copy of a directory in another location