GenomeComb



Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb, with documentation at https://derijkp.github.io/genomecomb. For up-to-date versions, go there. These pages remain here only for the data on the older scientific application (or for anyone who really needs a long-obsolete version of the software).

samtools based variant file

Variant files created by genomecomb using the samtools variant caller are in the usual genomecomb tab-separated variant file format (tsv). Besides the usual fields of a variant tsv file, they can also contain the following columns:

totalcoverage
Raw read depth (DP in vcf INFO); samtools does not count all mapping reads for totalcoverage (bad reads are filtered out)
coverage
number of high-quality bases (DP in vcf FORMAT); bases with quality < 13 are filtered out as well
genoqual
genotype quality, encoded as a phred quality: -10*log10(p), where p is the probability that the genotype call is wrong
phased
1 if genotype is phased, 0 otherwise
genotypes
list of allele numbers (0 for ref, 1 for first alt, ...) (GT in vcf). The alleles are comma (,) separated for phased genotypes (| in vcf), and semicolon (;) separated for unphased genotypes (/ in vcf). In contrast to alleleSeq1 and alleleSeq2, this field allows for ploidies other than 2
loglikelihood
Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)
DV
number of high-quality non-reference bases
SP
Phred-scaled strand bias P-value
PL
List of Phred-scaled genotype likelihoods
DP4
number of high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases
MQ
Root-mean-square mapping quality of covering reads
FQ
Phred probability of all samples being the same
AF1
Max-likelihood estimate of the first ALT allele frequency (assuming HWE)
AC1
Max-likelihood estimate of the first ALT allele count (no HWE assumption)
totalallelecount
Total number of alleles in called genotypes
IS
Maximum number of reads supporting an indel and fraction of indel reads
allelecount
Allele count in genotypes for each ALT allele, in the same order as listed
G3
ML estimate of genotype frequencies
HWE
Chi^2 based HWE test P-value based on G3
CLR
Log ratio of genotype likelihoods with and without the constraint
UGT
The most probable unconstrained genotype configuration in the trio
CGT
The most probable constrained genotype configuration in the trio
PV4
P-values for strand bias, baseQ bias, mapQ bias and tail distance bias
INDEL
Indicates that the variant is an INDEL.
PC2
Phred probabilities of the nonRef allele frequency in group1 samples being larger, respectively smaller, than in group2 (two values).
PCHI2
Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.
QCHI2
Phred scaled PCHI2
PR
number of permutations yielding a smaller PCHI2.
QBD
Quality by Depth: QUAL/#reads
RPB
Read Position Bias
MDV
Maximum number of high-quality nonRef reads in samples
VDB
Variant Distance Bias (v2) for filtering splice-site artefacts in RNA-seq data. Note: this version may be broken.
cluster
variant is in a region containing many nearby variant calls
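
As an illustration of how a few of these columns could be interpreted, here is a minimal Python sketch. It is not part of genomecomb. It assumes a single-sample file whose columns carry the plain names listed above, and it assumes that DP4 is stored as four comma-separated integers (as in vcf); check both against an actual file. The example rows and all helper names (phred_to_prob, parse_genotypes, alt_fraction, summarize) are invented for this sketch.

```python
import csv
import io


def phred_to_prob(q):
    """Convert a phred-scaled quality q to a probability: p = 10^(-q/10)."""
    return 10 ** (-float(q) / 10)


def parse_genotypes(field):
    """Split a genotypes field into (allele numbers, phased flag).

    Per the column description: ',' separates phased alleles,
    ';' separates unphased ones. Non-numeric entries become None.
    """
    phased = "," in field
    parts = field.split("," if phased else ";")
    alleles = [int(a) if a.isdigit() else None for a in parts]
    return alleles, phased


def alt_fraction(dp4):
    """Fraction of high-quality bases supporting the alt allele.

    ASSUMPTION: DP4 is stored as 'ref_fwd,ref_rev,alt_fwd,alt_rev'.
    Returns None when there are no high-quality bases at all.
    """
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4.split(","))
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    return (alt_fwd + alt_rev) / total if total else None


# Hypothetical excerpt of a variant tsv file; all values are invented.
EXAMPLE = (
    "chromosome\tbegin\tend\ttype\tref\talt\tgenoqual\tgenotypes\tDP4\n"
    "chr1\t100\t101\tsnp\tA\tG\t30\t0;1\t5,5,6,4\n"
    "chr1\t200\t201\tsnp\tC\tT\t20\t1,1\t0,0,7,3\n"
)


def summarize(tsv_text):
    """Read tab-separated text and derive per-variant summary values."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {
            "pos": (r["chromosome"], int(r["begin"])),
            "p_wrong": phred_to_prob(r["genoqual"]),
            "genotype": parse_genotypes(r["genotypes"]),
            "alt_fraction": alt_fraction(r["DP4"]),
        }
        for r in rows
    ]


if __name__ == "__main__":
    for rec in summarize(EXAMPLE):
        print(rec)
```

A genoqual of 30 corresponds to a probability of about 0.001 that the genotype call is wrong. For a real file, replace the io.StringIO wrapper with an opened file handle. In files that combine several samples, the column names may differ from the plain names assumed here.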