GenomeComb

Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb, with documentation at https://derijkp.github.io/genomecomb. Go there for up-to-date versions. These pages remain here only for the data on the older scientific application (or in case someone really needs a long-obsolete version of the software).

Process_mastr

Format

cg process_mastr ?options? mastrdesigndir mastrprojectdir dbdir

Summary

Process a mastr sequencing project; results in a genomecomb project directory (mastrprojectdir) with multicompar, etc.

Description

This command is deprecated. The same functionality is available through the more generic process_project.

Arguments

mastrdesigndir
directory with data about the mastr design. The mastrname is the root name of the mastrdesigndir (e.g. the mastrname of mastrdesigndir test.mastr would be test). This directory should contain at least one file named amplicons-<mastrname>.tsv that contains the mastr design. This should be a tab-separated file with at least the columns name, chromosome, begin and end, giving the positions of the amplicons. If the fields primer1_end and primer2_begin are also given, the primers can be properly clipped.
mastrprojectdir
genomecomb style project directory that will be made containing a directory (with bams, var lists, ...) per sample, and a compar dir containing multicompar data on the project
dbdir
directory containing reference data (genome sequence, annotation, ...)
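
The layout expected for mastrdesigndir can be illustrated with a small shell sketch. It builds a throwaway design directory, derives the mastrname from the directory name, and checks that the header of the amplicons file has the required columns. The amplicon name and coordinates are made-up placeholders, and the header check is only an illustration, not something genomecomb itself provides.

```shell
# Hypothetical sketch: build a minimal design directory and check its header.
designdir="$(mktemp -d)/test.mastr"
mkdir -p "$designdir"

# The mastrname is the directory name without the .mastr extension
mastrname="$(basename "$designdir" .mastr)"
amplicons="$designdir/amplicons-$mastrname.tsv"

# Tab-separated design file; the values below are made-up placeholders
printf 'name\tchromosome\tbegin\tend\tprimer1_end\tprimer2_begin\n' > "$amplicons"
printf 'amp1\tchr1\t1000\t1200\t1020\t1180\n' >> "$amplicons"

# Collect any required column that is missing from the header line
missing="$(head -n 1 "$amplicons" | awk -F '\t' '
    { for (i = 1; i <= NF; i++) seen[$i] = 1 }
    END {
        n = split("name chromosome begin end", req, " ")
        for (i = 1; i <= n; i++) if (!(req[i] in seen)) printf "%s ", req[i]
    }')"

if [ -z "$missing" ]; then
    echo "ok: $amplicons has all required columns"
else
    echo "missing columns: $missing" >&2
fi
```

The optional primer1_end and primer2_begin columns are included here because, as described above, they allow the primers to be clipped.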

Options

-c 1/0
cleanup files produced in processing that are no longer needed after finishing
-split 1/0
split multiple alternative genotypes over different lines
-a aligner
alignment program used for mapping reads (bwa or bowtie2)
-m maxopenfiles (-maxopenfiles)
The number of files that a program can keep open at the same time is limited. pmulticompar distributes the subtasks so that the number of files open at the same time stays below this number. With this option, the maximum number of open files can be set manually (e.g. if the program does not deduce the proper limit, or if you want to affect the distribution).
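
When setting -m by hand, it can help to first look at the limit the shell itself reports. The sketch below reads the soft limit with ulimit -n and derives a value to pass. The three-quarters headroom and the fallback of 1000 are arbitrary assumptions made for illustration, not values recommended by genomecomb.

```shell
# Soft limit on open file descriptors for the current shell
limit="$(ulimit -n)"

# Hypothetical policy: use 3/4 of a numeric limit, else fall back to 1000
case "$limit" in
    unlimited|'') maxopen=1000 ;;
    *) maxopen=$(( limit * 3 / 4 )) ;;
esac

echo "soft limit: $limit, value to pass with -m: $maxopen"
# e.g. cg process_mastr -m "$maxopen" test.mastr testproject /complgen/refseq/hg19
```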

This command can be distributed on a cluster or over multiple cores using job options (more info with cg help joboptions).

Dependencies

Some of the programs needed in this workflow are not distributed with genomecomb. gatk and picard should be installed separately. Their installation locations can be given using the environment variables GATK and PICARD. These should point to the installation directory that contains the jar files. If these environment variables are not set, directories named gatk and picard will be searched for in the PATH.
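
Before starting a long run, the two environment variables can be checked with a few lines of shell. The helper has_jar below is hypothetical (it is not part of genomecomb). It accepts either a directory containing at least one jar file, as described above, or the path to a jar file itself, as used for GATK in the Example section.

```shell
# Hypothetical helper: succeed if $1 is a jar file or a directory holding one
has_jar() {
    case "$1" in
        *.jar) [ -f "$1" ] && return 0 ;;
    esac
    [ -d "$1" ] || return 1
    for f in "$1"/*.jar; do
        [ -e "$f" ] && return 0
    done
    return 1
}

for var in GATK PICARD; do
    # Indirect lookup of the variable named in $var (empty if unset)
    eval "dir=\${$var:-}"
    if has_jar "$dir"; then
        echo "$var ok: $dir"
    else
        echo "$var is unset or does not point to any jar files" >&2
    fi
done
```

This only checks the environment variables; it does not reproduce the fallback search for gatk and picard directories in the PATH.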

Example

export GATK=/opt/bio/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar
export PICARD=/opt/bio/picard-tools-1.87
cg process_conv_illmastr 130625_M01318_0013_000000000-A546C testproject
cg process_mastr -d 2 test.mastr testproject /complgen/refseq/hg19

Category

Deprecated