GenomeComb

liftover

Format

cg liftover variantfile resultfile liftoverfile

Summary

Use liftover to convert a variant file from one build to another

Description

Converts a variantfile from one build to another based on a liftover file. The liftover file is a tab-separated file indicating which regions in the source genome (chromosome,begin,end,strand) should be converted to which regions in the destination genome (destchromosome,destbegin,destend,deststrand). If a UCSC liftover chainfile is given as a liftover file, it is converted into a liftover file using the command "cg liftchain2tsv chainfile liftoverfile".
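The region mapping described above can be illustrated with a minimal Python sketch. It is not the cg implementation: the example table, the helper names (load_liftover, lift_region) and the coordinate convention (zero-based, half-open, source and destination regions of equal size) are assumptions made for illustration only. Only the column names come from the description.

```python
import csv
import io

# Hypothetical liftover table; only the column names are taken from the text.
LIFTOVER_TSV = (
    "chromosome\tbegin\tend\tstrand\tdestchromosome\tdestbegin\tdestend\tdeststrand\n"
    "chr1\t100\t200\t+\tchr1\t150\t250\t+\n"
    "chr1\t300\t400\t+\tchr2\t1000\t1100\t-\n"
)

def load_liftover(text):
    """Parse a tab-separated liftover table into a list of dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for key in ("begin", "end", "destbegin", "destend"):
            row[key] = int(row[key])
        rows.append(row)
    return rows

def lift_region(rows, chromosome, begin, end):
    """Return (destchromosome, destbegin, destend, deststrand), or None if unmapped.

    Assumes zero-based half-open coordinates and that the query lies
    entirely inside a single source region.
    """
    for row in rows:
        inside = row["begin"] <= begin and end <= row["end"]
        if row["chromosome"] != chromosome or not inside:
            continue
        offset_begin = begin - row["begin"]
        offset_end = end - row["begin"]
        if row["strand"] == row["deststrand"]:
            # Same orientation: shift by the offset from the region start.
            return (row["destchromosome"], row["destbegin"] + offset_begin,
                    row["destbegin"] + offset_end, row["deststrand"])
        # Opposite orientation: measure the offsets back from the destination end.
        return (row["destchromosome"], row["destend"] - offset_end,
                row["destend"] - offset_begin, row["deststrand"])
    return None

ROWS = load_liftover(LIFTOVER_TSV)
same_strand = lift_region(ROWS, "chr1", 120, 121)
flipped = lift_region(ROWS, "chr1", 310, 311)
unmapped = lift_region(ROWS, "chr1", 250, 251)
```

In this sketch a query that does not fall inside any source region returns None, which corresponds to a variant that cannot be converted.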

cg liftover tries to correct the variants if there are changes in strand or reference. It will adapt the fields ref, alt, sequenced*, zyg* according to the changes. It uses the file <liftoverfile base>.refchanges.tsv for this. This file must contain the differences in reference sequence between the source and destination genome, and can be generated using the command "cg liftfindchanges srcgenome destgenome liftoverfile" if not available.
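As an illustration of the strand part of this correction, the following Python sketch reverse-complements ref and alt when a region changes strand. It is a simplified assumption about what such a correction involves, not the code cg liftover runs: the helper names are hypothetical, multiple alt alleles are assumed to be comma separated, and the reference changes from the refchanges file as well as the sequenced* and zyg* fields are not handled.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def correct_alleles(ref, alt, strand_changed):
    """Return (ref, alt) adapted to the destination strand.

    Hypothetical helper: alt may hold several comma-separated alleles
    (an assumption); each allele is reverse-complemented separately.
    """
    if not strand_changed:
        return ref, alt
    new_alt = ",".join(reverse_complement(allele) for allele in alt.split(","))
    return reverse_complement(ref), new_alt

snp = correct_alleles("A", "G", True)
multi = correct_alleles("AC", "A,ACG", True)
untouched = correct_alleles("A", "G", False)
```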

A liftover conversion is likely to cause loss of information: not all variants/regions can be properly converted. Variants that were dropped in the conversion will be available in resultfile.unmapped. Regions that were not present in the source genome will obviously not have variants.

Arguments

variantfile
file in sft format (with variant data)
resultfile
resulting converted file
liftoverfile
liftover file guiding the conversion. Liftover files are normally in tsv format. If a (UCSC liftover) chainfile is given, it will be converted, saved next to the chain file and used. Examples: hg18/hg18ToHg19.over.tsv to convert from hg18 to hg19, and hg19/hg19ToHg18.over.tsv to convert from hg19 to hg18

Options

-regionfile file
a regionfile is a tsv file indicating which regions of the genome are actually sequenced. If given, it is used to add changes in the reference sequence in sequenced regions to variants. If variantfile is a multisample file, the regionfile must also be multisample (columns named sreg-sample or sample, indicating for each region whether it is sequenced in the specific sample)
-correctvariants 0/1
normally cg liftover tries to correct variant data as far as possible; use 0 to skip this correction
-split 0/1
indicates whether variantfile (and resultfile) is split (alternative alleles of a position on separate lines) or not (one multiallelic line)
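The difference between a split and a multiallelic file can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout and the helper split_multiallelic are hypothetical, and the alt field of a multiallelic line is assumed to hold comma-separated alleles. Per-sample fields such as sequenced* and zyg* would need to be recalculated per allele, which is not shown.

```python
def split_multiallelic(variant):
    """Return one record per alternative allele; other fields are copied unchanged."""
    return [dict(variant, alt=allele) for allele in variant["alt"].split(",")]

# Hypothetical multiallelic record with two alternative alleles.
multi = {"chromosome": "chr1", "begin": 100, "end": 101,
         "type": "snp", "ref": "A", "alt": "C,T"}
split_lines = split_multiallelic(multi)
```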

Category

Conversion