GenomeComb



Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb, with documentation at https://derijkp.github.io/genomecomb. For up-to-date versions, go there. These pages remain here only for the data on the older scientific application (or for anyone who really needs a long-obsolete version of the software).

GATK-based variant file

Variant files created by genomecomb using the GATK variant caller are in the usual genomecomb tab-separated variant file format (tsv). Besides the usual fields of a variant tsv file, they can also contain the following fields:
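As a rough illustration of reading such a file outside genomecomb, the Python sketch below loads each variant line into a dictionary keyed by field name. It assumes that any comment lines at the top of the file start with "#" and that the first non-comment line holds the field names; the function name read_tsv and the example data are hypothetical, and all values are returned as strings.

```python
import csv
import io
import itertools
from typing import Dict, Iterable, Iterator


def read_tsv(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield each data line of a tab-separated variant file as a dict.

    Assumption: leading comment lines start with '#'; the first other
    line is the header with the field names.
    """
    body = itertools.dropwhile(lambda line: line.startswith("#"), lines)
    # QUOTE_NONE: treat quote characters as ordinary text, not CSV quoting
    yield from csv.DictReader(body, delimiter="\t", quoting=csv.QUOTE_NONE)


# Hypothetical example data, not real genomecomb output
example = io.StringIO(
    "# comment line\n"
    "chromosome\tbegin\tend\ttype\tref\talt\tgenoqual\tcoverage\n"
    "1\t1000\t1001\tsnp\tA\tG\t99\t35\n"
)
for row in read_tsv(example):
    print(row["chromosome"], row["begin"], row["genoqual"], row["coverage"])
```

Numeric fields such as genoqual or coverage still need an explicit int() or float() conversion before use.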

totalcoverage
Total Depth, counting all reads (DP in vcf INFO)
coverage
Read depth, counting only the filtered reads used for calling, and only those from this sample (DP in vcf FORMAT)
genoqual
genotype quality, encoded as a Phred quality: -10 * log10(p), where p is the probability that the genotype call is wrong
haploqual
haplotype qualities: two Phred qualities, comma-separated
phased
1 if genotype is phased, 0 otherwise
genotypes
list of allele numbers (0 for ref, 1 for first alt, ...) (GT in vcf). The alleles are comma (,) separated for phased genotypes (| in vcf), and semicolon (;) separated for unphased genotypes (/ in vcf). In contrast to alleleSeq1 and alleleSeq2, this field allows for ploidies other than 2.
alleledepth
Allelic depths for the ref and alt alleles in the order listed
PL
Normalized, Phred-scaled likelihoods for the AA, AB and BB genotypes, where A=ref and B=alt; not applicable if the site is not biallelic
allelecount
predicted allele count in genotypes, for each alt allele, in the same order as listed
frequency
predicted allele frequency for each alt allele in the same order as listed
totalallelecount
predicted total number of alleles in called genotypes
BaseQRankSum
Phred-scaled p-value from Wilcoxon rank sum test of Alt vs. Ref base qualities
DS
Were any of the samples downsampled?
Dels
Fraction of Reads Containing Spanning Deletions
FS
Phred-scaled p-value using Fisher's exact test to detect strand bias
HaplotypeScore
Consistency of the site with at most two segregating haplotypes
InbreedingCoeff
Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation
MLEAC
Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed
MLEAF
Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed
MQ
RMS Mapping Quality
MQ0
Total Mapping Quality Zero Reads
MQRankSum
Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities
NDA
Number of alternate alleles discovered (but not necessarily genotyped) at this site
QD
Variant Confidence/Quality by Depth
RPA
Number of times tandem repeat unit is repeated, for each allele (including reference)
RU
Tandem repeat unit (bases)
ReadPosRankSum
Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias
STR
Variant is a short tandem repeat
cluster
variant is in a region containing many nearby variant calls
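The genoqual field described above stores a Phred-scaled error probability. A minimal Python sketch of the conversion in both directions, based only on the formula Q = -10 * log10(p); the function names are illustrative and not part of genomecomb:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability p for a Phred quality q, where q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality for an error probability p (requires 0 < p <= 1)."""
    return -10 * math.log10(p)


# A genoqual of 30 corresponds to a 1 in 1000 chance that the genotype is wrong
for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

Every 10 Phred units correspond to a tenfold drop in the error probability.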
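The genotypes field described above encodes phasing through its separator. The Python sketch below splits a value according to that description (comma for phased, semicolon for unphased) and rebuilds a VCF-style GT string. It assumes that a non-numeric allele marks an unknown allele and maps it to None; the function names are hypothetical. When the phased column is present, it is the more direct source for the phase flag.

```python
from typing import List, Optional, Tuple


def parse_genotypes(value: str) -> Tuple[List[Optional[int]], bool]:
    """Split a genotypes value into allele numbers and a phased flag.

    Separators follow the field description: ',' for phased genotypes,
    ';' for unphased ones. Non-numeric alleles (assumed to mark an unknown
    allele) become None.
    """
    sep = ";" if ";" in value else ","
    phased = sep == "," and "," in value  # a single allele carries no phase
    alleles = [int(a) if a.isdigit() else None for a in value.split(sep)]
    return alleles, phased


def to_vcf_gt(alleles: List[Optional[int]], phased: bool) -> str:
    """Rebuild a VCF-style GT string: '|' if phased, '/' otherwise."""
    sep = "|" if phased else "/"
    return sep.join("." if a is None else str(a) for a in alleles)


# The last value is a triploid genotype, which alleleSeq1/alleleSeq2 cannot hold
for value in ("0;1", "1,0", "0;1;1"):
    alleles, phased = parse_genotypes(value)
    print(value, alleles, phased, to_vcf_gt(alleles, phased))
```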
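The alleledepth field described above lists read depths for the ref allele followed by each alt allele. A minimal Python sketch that turns it into per-alt allele fractions, assuming the values are comma-separated as in the vcf AD field; the function name is hypothetical. The same comma-splitting applies to the other per-alt-allele fields such as allelecount, frequency, MLEAC and MLEAF.

```python
from typing import List


def alt_allele_fractions(alleledepth: str) -> List[float]:
    """Fraction of reads supporting each alt allele.

    Assumes comma-separated values: ref depth first, then one depth per
    alt allele in the order listed.
    """
    depths = [int(d) for d in alleledepth.split(",")]
    total = sum(depths)
    if total == 0:
        return [0.0] * (len(depths) - 1)  # no reads: avoid dividing by zero
    return [d / total for d in depths[1:]]


print(alt_allele_fractions("12,28"))
print(alt_allele_fractions("5,5,10"))
```

This sketch divides by the sum of the depths in the field itself rather than by the coverage field, so the result depends only on the alleledepth value.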
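The PL field described above gives Phred-scaled likelihoods for the AA, AB and BB genotypes. A minimal Python sketch of undoing the Phred scaling, assuming a comma-separated value with the three numbers in AA,AB,BB order (as in vcf); the function name is hypothetical. The genotype with the lowest PL is the most likely one. The rescaled values are relative likelihoods, which equal genotype probabilities only under an assumed flat prior.

```python
from typing import Dict


def pl_to_likelihoods(pl: str) -> Dict[str, float]:
    """Turn a biallelic PL value into relative genotype likelihoods.

    Assumes three comma-separated integers for AA, AB, BB. Each PL is
    -10 * log10 of a likelihood, so 10 ** (-PL / 10) undoes the scaling;
    the results are rescaled to sum to 1.
    """
    values = [int(v) for v in pl.split(",")]
    if len(values) != 3:
        raise ValueError("expected 3 values (AA,AB,BB) for a biallelic site")
    raw = [10 ** (-v / 10) for v in values]
    total = sum(raw)
    return {gt: r / total for gt, r in zip(("AA", "AB", "BB"), raw)}


likelihoods = pl_to_likelihoods("30,0,50")
print(max(likelihoods, key=likelihoods.get), likelihoods)
```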