Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb
with documentation at https://derijkp.github.io/genomecomb.
For up-to-date versions, go there. These pages remain here only for the data on the older scientific
application (or in case someone really needs a long-obsolete version of the software).
Headers
This page briefly describes the fields in the example file (and in most
files produced by a genomecomb analysis). Some of these fields are
described more extensively in format_tsv.
Basic variant fields
- chromosome chromosome number
- begin start position of variant (zero based)
- end end position of variant (zero based)
- type single nucleotide variants (snp), short
insertions (ins), deletions (del) or substitutions (sub), or the
combinations of two different variant types at heterozygous
positions (e.g. del_snp).
- ref The reference allele at this position
- alt The alternative allele at this position
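As an illustration of how the basic variant fields might be consumed, the following Python sketch reads them from a tab-separated file. The sample rows, the helper name `read_variants`, and the derived `ref_length` value are hypothetical. The sketch also assumes that `end` is exclusive (a half-open interval, so the reference length is `end - begin`); this page does not state that, so check format_tsv for the authoritative coordinate convention.

```python
import csv
import io

# Hypothetical example rows; real genomecomb files contain many more columns.
EXAMPLE_TSV = (
    "chromosome\tbegin\tend\ttype\tref\talt\n"
    "21\t9411326\t9411327\tsnp\tC\tT\n"
    "21\t9411410\t9411410\tins\t\tA\n"
)

def read_variants(handle):
    """Yield one dict per variant with begin/end converted to integers."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        row["begin"] = int(row["begin"])
        row["end"] = int(row["end"])
        # Assumption: half-open interval, so a SNP spans 1 base and an insertion 0.
        row["ref_length"] = row["end"] - row["begin"]
        yield row

variants = list(read_variants(io.StringIO(EXAMPLE_TSV)))
for v in variants:
    print(v["chromosome"], v["begin"], v["end"], v["type"], v["ref_length"])
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.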
Sample specific for cg samples
- sequenced-cg-cg-testNA19238chr2122cg Position sequenced
by CG/CG? ("u" = unsequenced, "v"=variant,
"r"=reference)
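- zyg-cg-cg-testNA19238chr2122cg zygosity according to CG
(m=homozygous, t=heterozygous, c=compound, o=other,
r=reference, u=unsequenced/unspecified)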
- alleleSeq1-cg-cg-testNA19238chr2122cg First allele
called in CG genome.
- alleleSeq2-cg-cg-testNA19238chr2122cg Second allele
called in CG genome.
- totalScore1-cg-cg-testNA19238chr2122cg Variant score
at first allele.
- totalScore2-cg-cg-testNA19238chr2122cg Variant score
at second allele.
- coverage-cg-cg-testNA19238chr2122cg Coverage depth at
this position by CG.
- refscore-cg-cg-testNA19238chr2122cg Reference score at
this position.
- refcons-cg-cg-testNA19238chr2122cg Position lies in a poorly
called region (according to CG)
- cluster-cg-cg-testNA19238chr2122cg Presence within a
cluster of SNVs by CG
The same fields are present for NA19239 and NA19240.
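The sample-specific column names above consist of a field name, a hyphen, and an analysis name (for example `coverage-cg-cg-testNA19238chr2122cg`). A minimal Python sketch for separating the two parts follows. The helper names `split_column` and `group_by_analysis` are hypothetical, and the sketch assumes that field names themselves contain no hyphen, which holds for the fields listed on this page.

```python
from collections import defaultdict

def split_column(name):
    """Split 'field-analysis' into (field, analysis); analysis is None if absent."""
    field, sep, analysis = name.partition("-")
    return (field, analysis) if sep else (name, None)

def group_by_analysis(columns):
    """Map each analysis name to the list of fields available for it."""
    groups = defaultdict(list)
    for column in columns:
        field, analysis = split_column(column)
        if analysis is not None:
            groups[analysis].append(field)
    return dict(groups)

header = [
    "chromosome", "begin", "end",
    "sequenced-cg-cg-testNA19238chr2122cg",
    "zyg-cg-cg-testNA19238chr2122cg",
    "coverage-gatk-rdsbwa-testNA19240chr21il",
]
print(group_by_analysis(header))
```

Columns without a hyphen, such as the basic variant fields and the annotation fields, are left out of the grouping.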
Sample specific for gatk analysis
- sequenced-gatk-rdsbwa-testNA19240chr21il sequenced by
Illumina/GATK? ("u" = unsequenced, "v"=variant,
"r"=reference)
- zyg-gatk-rdsbwa-testNA19240chr21il zygosity according
to GATK (m=homozygous, t=heterozygous, c=compound, o=other,
r=reference, u=unsequenced/unspecified)
- alleleSeq1-gatk-rdsbwa-testNA19240chr21il First allele
called in Illumina genome by GATK.
- alleleSeq2-gatk-rdsbwa-testNA19240chr21il Second
allele called in Illumina genome by GATK.
- quality-gatk-rdsbwa-testNA19240chr21il Quality score
for this position as called by GATK on Illumina genome.
- phased-gatk-rdsbwa-testNA19240chr21il order of
genotypes in alleleSeq1 and alleleSeq2 is significant (phase is
known)
- genotypes-gatk-rdsbwa-testNA19240chr21il list of
genotypes, can contain more than 2
- alleledepth_ref-gatk-rdsbwa-testNA19240chr21il depth
of reference allele
- alleledepth-gatk-rdsbwa-testNA19240chr21il depth of
alternative alleles
- coverage-gatk-rdsbwa-testNA19240chr21il Coverage depth
at this position by Illumina/GATK.
- genoqual-gatk-rdsbwa-testNA19240chr21il quality of the
genotypes
- PL-gatk-rdsbwa-testNA19240chr21il
- BaseQRankSum-gatk-rdsbwa-testNA19240chr21il
- totalcoverage-gatk-rdsbwa-testNA19240chr21il
- DS-gatk-rdsbwa-testNA19240chr21il
- Dels-gatk-rdsbwa-testNA19240chr21il
- FS-gatk-rdsbwa-testNA19240chr21il
- HaplotypeScore-gatk-rdsbwa-testNA19240chr21il
- MQ-gatk-rdsbwa-testNA19240chr21il
- MQ0-gatk-rdsbwa-testNA19240chr21il
- MQRankSum-gatk-rdsbwa-testNA19240chr21il
- QD-gatk-rdsbwa-testNA19240chr21il
- ReadPosRankSum-gatk-rdsbwa-testNA19240chr21il
- SOR-gatk-rdsbwa-testNA19240chr21il
- cluster-gatk-rdsbwa-testNA19240chr21il Presence within
a cluster of SNVs by Illumina/GATK (if yes "cl", if no
"")
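The one-letter codes used in the sequenced and zyg fields can be translated into readable labels with a small lookup table. The Python sketch below copies the code meanings from the descriptions above; the names `SEQUENCED_CODES`, `ZYG_CODES`, and `describe_call` are hypothetical and only serve the illustration.

```python
# Code meanings as listed in the field descriptions on this page.
SEQUENCED_CODES = {"u": "unsequenced", "v": "variant", "r": "reference"}
ZYG_CODES = {
    "m": "homozygous",
    "t": "heterozygous",
    "c": "compound",
    "o": "other",
    "r": "reference",
    "u": "unsequenced/unspecified",
}

def describe_call(sequenced, zyg):
    """Return a readable description of one sample's sequenced/zyg codes."""
    seq_text = SEQUENCED_CODES.get(sequenced, "unknown code %r" % sequenced)
    zyg_text = ZYG_CODES.get(zyg, "unknown code %r" % zyg)
    return "%s, %s" % (seq_text, zyg_text)

print(describe_call("v", "t"))  # variant, heterozygous
```

Unrecognized codes are reported rather than raising an error, so that unexpected values in a file remain visible.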
Sample specific for samtools analysis
- sequenced-sam-rdsbwa-testNA19240chr21il sequenced by
Illumina/samtools? ("u" = unsequenced,
"v"=variant, "r"=reference)
- zyg-sam-rdsbwa-testNA19240chr21il zygosity according
to samtools (m=homozygous, t=heterozygous, c=compound, o=other,
r=reference, u=unsequenced/unspecified)
- alleleSeq1-sam-rdsbwa-testNA19240chr21il First allele
called in Illumina genome by samtools.
- alleleSeq2-sam-rdsbwa-testNA19240chr21il Second allele
called in Illumina genome by samtools.
- quality-sam-rdsbwa-testNA19240chr21il Quality score
for this position as called by samtools on Illumina genome.
- phased-sam-rdsbwa-testNA19240chr21il order of
genotypes in alleleSeq1 and alleleSeq2 is significant (phase is
known)
- genotypes-sam-rdsbwa-testNA19240chr21il list of
genotypes, can contain more than 2
- genoqual-sam-rdsbwa-testNA19240chr21il genotype
quality
- loglikelihood-sam-rdsbwa-testNA19240chr21il
- coverage-sam-rdsbwa-testNA19240chr21il Coverage depth
at this position by Illumina/samtools.
- DV-sam-rdsbwa-testNA19240chr21il
- SP-sam-rdsbwa-testNA19240chr21il
- PL-sam-rdsbwa-testNA19240chr21il
- totalcoverage-sam-rdsbwa-testNA19240chr21il
- DP4-sam-rdsbwa-testNA19240chr21il
- MQ-sam-rdsbwa-testNA19240chr21il
- FQ-sam-rdsbwa-testNA19240chr21il
- AF1-sam-rdsbwa-testNA19240chr21il
- AC1-sam-rdsbwa-testNA19240chr21il
- IS-sam-rdsbwa-testNA19240chr21il
- PV4-sam-rdsbwa-testNA19240chr21il
- PC2-sam-rdsbwa-testNA19240chr21il
- QBD-sam-rdsbwa-testNA19240chr21il
- RPB-sam-rdsbwa-testNA19240chr21il
- MDV-sam-rdsbwa-testNA19240chr21il
- VDB-sam-rdsbwa-testNA19240chr21il
- cluster-sam-rdsbwa-testNA19240chr21il
Annotation fields
- intGene_impact impact on gene transcripts (can be a list)
of the intGene gene set (integration of refGene, knownGene, gencode,
ensGene)
- intGene_gene gene name
- intGene_descr description of transcript and location
of variant in transcript in several ways (including hgvs)
- lincRNA_impact impact on long non-coding genes
- lincRNA_gene long non-coding gene name
- lincRNA_descr description of transcript and location
of variant in long non-coding transcript
- refGene_impact impact on refGene genes
- refGene_gene name of refGene genes
- refGene_descr list of refGene transcripts and
description of location and effect of variant on the transcript
- mirbase20_impact impact on mirbase20 miRNA
- mirbase20_mir mirbase20 miRNA variant is located in
- chainSelf Presence in self-chained region (yes =
"label from UCSC", no = "")
- cytoBand
- dgvMerged
- evofold
- gad
- genomicSuperDups Presence in segmental duplication
(yes = "label from UCSC", no = "")
- gwasCatalog_name
- gwasCatalog_score
- homopolymer_base
- homopolymer_size
- microsat Presence in a microsatellite
- oreganno
- phastConsElements46way_name
- phastConsElements46way_score
- phastConsElements46wayPlacental_name
- phastConsElements46wayPlacental_score
- phastConsElements46wayPrimates_name
- phastConsElements46wayPrimates_score
- rmsk Presence in a RepeatMasker region (yes =
"label from UCSC", no = "")
- simpleRepeat Presence in simple tandem repeat (yes =
"label from UCSC", no = "")
- targetScanS_name
- targetScanS_score
- tfbsConsSites_name
- tfbsConsSites_score
- tRNAs
- vistaEnhancers_name
- vistaEnhancers_score
- wgEncodeCaltechRnaSeq
- wgEncodeH3k4me1
- wgEncodeH3k4me3
- wgEncodeH3k27ac
- wgEncodeRegDnaseClusteredV3_name
- wgEncodeRegDnaseClusteredV3_score
- wgEncodeRegTfbsClusteredV3_name
- wgEncodeRegTfbsClusteredV3_score
- wgRna_name
- wgRna_score
- 1000g3
- cadd
- clinvar_acc
- clinvar_disease
- gnomad_max_freqp
- gnomad_nfe_freqp
- kaviar
- snp147
- snp147Common
- evs_ea_freqp
- evs_aa_freqp
- evs_ea_mfreqp
- evs_aa_mfreqp