A paper entitled "Error rate reduction through adaptive filtering improves variant detection in monozygotic twin, tumor-normal and HapMap whole-genomes" is currently in revision. The abstract can be found below.
It remains challenging to accurately distinguish single nucleotide variants (SNVs) from errors in whole-genome sequences. Here, we describe a set of filters, together with a freely accessible software tool, to selectively reduce error rates in whole-genomes. By sequencing the near-identical genomes from a monozygotic twin and considering shared SNVs as ‘true variants’ and discordant SNVs as ‘errors’, we optimized thresholds for various filters and assessed which filters or combinations of filters were effective in terms of sensitivity and specificity. Cumulative application of all effective filters enabled identification of the first genetic differences in a monozygotic twin using whole-genome sequencing, whereas an adapted, less stringent set of filters reliably identified somatic mutations in a highly rearranged tumor. Analysis of the NA19240 HapMap genome relative to a reference set of SNVs revealed that filtering substantially reduced the number of errors. Overall, application of our filtering pipeline greatly facilitates variant detection in whole-genomes.
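The twin-concordance idea from the abstract can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the tool's implementation. SNVs are keyed by a hypothetical `(chromosome, position, alternate allele)` tuple, and each call carries a hypothetical annotation dict. The two reported fractions (`shared_retained`, `discordant_removed`) are simplified stand-ins for the sensitivity and specificity measures used in the paper.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical SNV key: (chromosome, position, alternate allele).
SNV = Tuple[str, int, str]
Calls = Dict[SNV, dict]


def passing(calls: Calls, passes: Callable[[dict], bool]) -> Set[SNV]:
    """Return the SNVs whose annotations pass the filter."""
    return {snv for snv, ann in calls.items() if passes(ann)}


def concordance_metrics(twin_a: Calls, twin_b: Calls,
                        passes: Callable[[dict], bool]) -> Dict[str, float]:
    """Score a filter, treating shared SNVs as true and discordant ones as errors."""
    before_a, before_b = set(twin_a), set(twin_b)
    after_a, after_b = passing(twin_a, passes), passing(twin_b, passes)

    shared_before = len(before_a & before_b)
    discordant_before = len(before_a ^ before_b)  # called in only one twin
    shared_after = len(after_a & after_b)
    discordant_after = len(after_a ^ after_b)

    return {
        # Fraction of 'true variants' that survive filtering.
        "shared_retained": shared_after / shared_before if shared_before else 0.0,
        # Fraction of 'errors' eliminated by filtering.
        "discordant_removed": (1 - discordant_after / discordant_before
                               if discordant_before else 0.0),
    }


# Toy data for illustration only; not results from the paper.
twin_a = {("chr1", 100, "A"): {"depth": 30},
          ("chr1", 200, "G"): {"depth": 25},
          ("chr2", 50, "T"): {"depth": 5}}
twin_b = {("chr1", 100, "A"): {"depth": 28},
          ("chr1", 200, "G"): {"depth": 22},
          ("chr3", 75, "C"): {"depth": 8}}

print(concordance_metrics(twin_a, twin_b, lambda ann: ann["depth"] >= 10))
# {'shared_retained': 1.0, 'discordant_removed': 1.0}
```

Each twin is filtered independently. As a result, an overly stringent filter can turn a shared SNV into a discordant one. That lowers `shared_retained` and also lowers `discordant_removed`.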
In our analyses, we compared replicate genomes both within one platform (i.e. the monozygotic twins, both sequenced by Complete Genomics (CG)) and across platforms (i.e. the Yoruba NA19240 genome, sequenced with both Illumina and CG technology). We rigorously assessed the effect of several filter types (intrinsic sequence quality, repetitive nature of the DNA, and complementary mapping and SNV-calling algorithms) on the quality of the called SNVs and on the fraction of the genome masked by filtering. We used different quality metrics to do so:
On the Filter Selection pages, you can choose which genomes to compare (i.e. the twins or NA19240) and which coverage stringency to use. The options are a relaxed setting (minimal coverage depth of 10) or a stringent setting (minimal coverage depth of 20). After selecting the genomes to compare, you can apply different filters manually or use predefined settings.
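The selection logic described above can be sketched as one coverage threshold combined with any manually chosen or predefined filters. The minimum depths of 10 and 20 come from this page. Everything else is hypothetical: the annotation keys `depth`, `in_repeat` and `second_caller`, the filter functions, and the `PRESETS` names. These only stand in for the filter types named earlier and do not reflect the tool's actual interface.

```python
from typing import Callable, Dict, List, Optional

Annotation = Dict[str, object]
Filter = Callable[[Annotation], bool]

# Minimal coverage depth for the two stringency settings on the page.
MIN_DEPTH = {"relaxed": 10, "stringent": 20}


def coverage_filter(stringency: str) -> Filter:
    """Keep a call only if its read depth reaches the chosen minimum."""
    threshold = MIN_DEPTH[stringency]
    return lambda ann: ann.get("depth", 0) >= threshold


# Hypothetical filters standing in for the repeat and complementary-caller
# filter types; the annotation keys are assumptions, not the tool's format.
def outside_repeat(ann: Annotation) -> bool:
    return not ann.get("in_repeat", False)


def confirmed_by_second_caller(ann: Annotation) -> bool:
    return bool(ann.get("second_caller", False))


# Hypothetical named presets, analogous to the page's predefined settings.
PRESETS: Dict[str, List[Filter]] = {
    "repeat_only": [outside_repeat],
    "all": [outside_repeat, confirmed_by_second_caller],
}


def build_filter(stringency: str,
                 manual: Optional[List[Filter]] = None,
                 preset: Optional[str] = None) -> Filter:
    """Combine the coverage filter with manually chosen or preset filters.

    A call passes only if it passes every selected filter.
    """
    selected = [coverage_filter(stringency)]
    if preset is not None:
        selected.extend(PRESETS[preset])
    if manual:
        selected.extend(manual)
    return lambda ann: all(f(ann) for f in selected)


call = {"depth": 15, "in_repeat": False, "second_caller": True}
print(build_filter("relaxed", preset="all")(call))    # True
print(build_filter("stringent", preset="all")(call))  # False: depth 15 < 20
```

Keeping a call only when it passes every selected filter is one way to model the cumulative filtering described in the abstract. In this sketch, switching between relaxed and stringent changes only the coverage threshold and leaves the other filters untouched.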