cg process_sv ?options? cgdir resultdir dbdir
Do all steps on a Complete Genomics sample to generate structural
variant calls
cg process_sv detects structural variants based on discordant read
pairs in the Complete Genomics (CGI) mapping files. This structural
variant caller was developed on some of the earliest Complete
Genomics samples, and has become a bit superfluous since Complete
Genomics includes structural variant calls in their standard output.
There are some reasons that it may still be useful:
- On the original data it was more sensitive, picking up smaller
deletions and some small insertions. However, changes in the
experimental setup of more recent runs have nullified this
advantage, leading to a large number of false positives (the
deletions < 600 and the insertions that the CGI sv caller does not
detect have to be discarded anyway because of their high false
discovery rate).
- It can be used to analyse samples that did not yet have CGI calls
- The intermediate files provide a nice visualisation in
combination with cg_viz
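To make the notion of a discordant read pair mentioned above more concrete, the following is a minimal Python sketch. It is not the algorithm used by svfind: the `MatePair` record, the `classify_pair` function, the labels and the distance thresholds are all hypothetical and chosen only for illustration. It assumes that a pair is concordant when both mates map to the same chromosome at a distance within an expected range.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    """Mapping locations of the two mates of one pair (hypothetical record)."""
    chr1: str
    pos1: int
    chr2: str
    pos2: int


def classify_pair(pair, min_dist=200, max_dist=600):
    """Give a coarse label to a mate pair.

    min_dist and max_dist are illustrative thresholds, not values
    taken from cg process_sv.
    """
    if pair.chr1 != pair.chr2:
        # Mates on different chromosomes cannot be explained by the library
        return "interchromosomal_candidate"
    dist = abs(pair.pos2 - pair.pos1)
    if dist > max_dist:
        # Mates further apart on the reference than expected:
        # the sample may lack sequence in between
        return "deletion_candidate"
    if dist < min_dist:
        # Mates closer together on the reference than expected:
        # the sample may carry extra sequence in between
        return "insertion_candidate"
    return "concordant"


pairs = [
    MatePair("chr1", 1000, "chr1", 1400),
    MatePair("chr1", 5000, "chr1", 9000),
    MatePair("chr1", 7000, "chr1", 7050),
    MatePair("chr1", 8000, "chr2", 8300),
]
labels = [classify_pair(pair) for pair in pairs]
```

A real caller needs many pairs supporting the same event before reporting it. A single discordant pair, as labelled here, is only a hint.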
cg process_sv successively runs the following steps:
- map2sv creates tab-separated files containing the
mapping locations of each mate pair sorted on the location of the
first mate and distributed per chromosome.
- tsv_index indexes these files.
- svfind is used for the actual detection of variants
from the mate-pair files, resulting in a sv prediction file per
chromosome
- cg cat adds all of these together in one result file
- cg select is used to produce the final result file
(sv-samplename.tsv) by filtering out (very) low quality calls
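The data flow of the steps above can be sketched in Python as follows. This illustrates the idea only and is not the actual implementation of map2sv, cg cat or cg select: the function names, the dictionary keys (`chr1`, `pos1`, `quality`) and the quality cutoff are assumptions made for the example.

```python
from collections import defaultdict


def distribute_per_chromosome(mate_pairs):
    """Group mate pairs by the chromosome of the first mate and sort
    each group on the location of the first mate."""
    per_chromosome = defaultdict(list)
    for pair in mate_pairs:
        per_chromosome[pair["chr1"]].append(pair)
    for pairs in per_chromosome.values():
        pairs.sort(key=lambda pair: pair["pos1"])
    return dict(per_chromosome)


def combine_and_filter(calls_per_chromosome, min_quality):
    """Concatenate the per-chromosome call lists into one list and
    drop calls below a (hypothetical) quality cutoff."""
    combined = []
    for chromosome in sorted(calls_per_chromosome):
        combined.extend(calls_per_chromosome[chromosome])
    return [call for call in combined if call["quality"] >= min_quality]


mate_pairs = [
    {"chr1": "chr2", "pos1": 500, "chr2": "chr2", "pos2": 900},
    {"chr1": "chr1", "pos1": 300, "chr2": "chr1", "pos2": 700},
    {"chr1": "chr1", "pos1": 100, "chr2": "chr1", "pos2": 5000},
]
per_chromosome = distribute_per_chromosome(mate_pairs)

calls_per_chromosome = {
    "chr1": [{"chr1": "chr1", "pos1": 100, "quality": 40}],
    "chr2": [{"chr1": "chr2", "pos1": 500, "quality": 2}],
}
final_calls = combine_and_filter(calls_per_chromosome, min_quality=10)
```

Splitting the work per chromosome keeps each intermediate file small and lets the detection step run on the chromosomes independently, which is what makes distribution over several processes possible.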
- cgdir
- directory containing the CGI mapping files
- resultdir
- "sample directory" where results will be written.
The resultfile will be named sv-samplename.tsv (where samplename is
the name of the resultdir). All intermediate data is stored in a
subdir sv in resultdir.
- dbdir
- directory containing reference data (genome sequence,
- -force 0/1
- force full reanalysis even if some files already exist
This command supports job options (more info with cg help joboptions):
- -d number
- distribute subjobs of command over number processes
- -d sge
- use grid engine to distribute subjobs
cg process_sv GS27657-FS3-L01 NA19238cg /complgen/refseq/hg19
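For scripted use, the command line and the expected result file can be derived from the arguments, as in the Python sketch below. The helper names `build_command` and `expected_result_file` are hypothetical and not part of cg. The sketch uses only the options documented above (-force and -d) and assumes that the result file is written directly in resultdir.

```python
import posixpath


def build_command(cgdir, resultdir, dbdir, distribute=None, force=None):
    """Build the argument list for a cg process_sv call.

    distribute: a number of processes or "sge" (passed to -d)
    force: True/False (passed to -force as 1/0)
    Options are placed before the positional arguments.
    """
    command = ["cg", "process_sv"]
    if force is not None:
        command += ["-force", str(int(force))]
    if distribute is not None:
        command += ["-d", str(distribute)]
    command += [cgdir, resultdir, dbdir]
    return command


def expected_result_file(resultdir):
    """Return the path of sv-samplename.tsv, where samplename is the
    name of resultdir (assumed to be located directly in resultdir)."""
    resultdir = posixpath.normpath(resultdir)
    samplename = posixpath.basename(resultdir)
    return posixpath.join(resultdir, "sv-" + samplename + ".tsv")


command = build_command(
    "GS27657-FS3-L01", "NA19238cg", "/complgen/refseq/hg19", distribute=4
)
result_file = expected_result_file("NA19238cg")
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a system where cg is installed. For the example above, `result_file` is `NA19238cg/sv-NA19238cg.tsv`.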