Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb,
with documentation at https://derijkp.github.io/genomecomb.
For up-to-date versions, go there. These pages remain here only for the data on the older scientific
application (or for anyone who really needs a long-obsolete version of the software).
Process_mastr
Format
cg process_mastr ?options? mastrdesigndir mastrprojectdir dbdir
Summary
process a mastr sequencing project, resulting in a genomecomb project
dir (mastrprojectdir) with multicompar data, etc.
Description
This command is deprecated. The same functionality is available
through the more generic process_project.
Arguments
- mastrdesigndir
- directory with data about the mastr design. The mastrname is
the root name of the mastrdesigndir (e.g. the mastrname of
mastrdesigndir test.mastr would be test). This directory should
contain at least a file named amplicons-<mastrname>.tsv
that contains the mastr design. This should be a tab-separated file
with at least the columns name, chromosome, begin and end, giving the
positions of the amplicons. If the fields primer1_end and
primer2_begin are also given, the primers can be properly clipped.
- mastrprojectdir
- genomecomb-style project directory that will be created,
containing a directory per sample (with bams, variant lists, ...) and
a compar directory containing multicompar data on the project
- dbdir
- directory containing reference data (genome sequence,
annotation, ...)
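The mastrdesigndir requirements above (mastrname derived from the directory's root name, an amplicons-<mastrname>.tsv file with required and optional columns) can be illustrated with a small reader. This is a minimal sketch, not genomecomb's own code: the helper names mastr_name, amplicons_path and read_amplicons are hypothetical, and treating primer1_end..primer2_begin as the region that remains after primer clipping is an assumption.

```python
import csv
import io
import os

# Columns the help text says the design file must contain.
REQUIRED_COLUMNS = ("name", "chromosome", "begin", "end")


def mastr_name(mastrdesigndir):
    """Root name of the design directory: 'designs/test.mastr' -> 'test'."""
    base = os.path.basename(os.path.normpath(mastrdesigndir))
    return os.path.splitext(base)[0]


def amplicons_path(mastrdesigndir):
    """Expected location of amplicons-<mastrname>.tsv inside the design dir."""
    name = mastr_name(mastrdesigndir)
    return os.path.join(mastrdesigndir, f"amplicons-{name}.tsv")


def read_amplicons(handle):
    """Parse a tab-separated amplicon design from an open text handle.

    Returns (amplicons, can_clip); can_clip is True only when both the
    primer1_end and primer2_begin columns are present.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in fields]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    can_clip = "primer1_end" in fields and "primer2_begin" in fields
    amplicons = []
    for row in reader:
        amp = {
            "name": row["name"],
            "chromosome": row["chromosome"],
            "begin": int(row["begin"]),
            "end": int(row["end"]),
        }
        if can_clip:
            # Assumption: the region left after clipping both primers
            # runs from primer1_end to primer2_begin.
            amp["clipped_begin"] = int(row["primer1_end"])
            amp["clipped_end"] = int(row["primer2_begin"])
        amplicons.append(amp)
    return amplicons, can_clip


if __name__ == "__main__":
    design = io.StringIO(
        "name\tchromosome\tbegin\tend\tprimer1_end\tprimer2_begin\n"
        "amp1\tchr1\t1000\t1200\t1020\t1180\n"
    )
    amps, can_clip = read_amplicons(design)
    print(amplicons_path("test.mastr"), can_clip, amps[0]["clipped_begin"])
```

Checking for the required columns up front gives a clear error before any processing starts, while the primer columns are treated as optional, matching the description of mastrdesigndir.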
Options
- -c 1/0
- clean up files produced during processing that are no longer
needed after finishing
- -split 1/0
- split multiple alternative genotypes over different lines
- -a aligner
- alignment program used for mapping reads (bwa or bowtie2)
- -m maxopenfiles (-maxopenfiles)
- The number of files that a program can keep open at the same
time is limited. pmulticompar distributes its subtasks so
that the number of files open at the same time stays below this
number. With this option, the maximum number of open files can be
set manually (e.g. if the program does not deduce the proper limit,
or you want to affect the distribution).
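The help text does not describe exactly how pmulticompar splits its work under the -maxopenfiles limit. As a hedged illustration of the general idea only, the sketch below reads the process's soft open-file limit and plans a generic multi-round (hierarchical) merge in which no single step opens more files than allowed. The function names, the reserve of 20 descriptors, and the placeholder names for intermediate files are assumptions; the resource module is Unix-only.

```python
import resource  # Unix-only standard library module


def default_maxopenfiles(reserve=20):
    """Soft open-file limit of this process minus some headroom.

    The reserve of 20 descriptors is an arbitrary assumption that leaves
    room for stdin/stdout, log files, etc.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        soft = 1024  # arbitrary fallback when no limit is set
    return max(3, soft - reserve)


def plan_merges(files, maxopenfiles):
    """Group inputs into merge rounds so that no merge exceeds the limit.

    Each merge opens its inputs plus one output file, so a batch holds at
    most maxopenfiles - 1 inputs. The outputs of one round (given
    placeholder names here) become the inputs of the next round.
    """
    if maxopenfiles < 3:
        raise ValueError("maxopenfiles must be at least 3")
    batch_size = maxopenfiles - 1
    rounds = []
    current = list(files)
    level = 0
    while len(current) > 1:
        batches = [current[i:i + batch_size]
                   for i in range(0, len(current), batch_size)]
        rounds.append(batches)
        current = [f"tmp_round{level}_{j}" for j in range(len(batches))]
        level += 1
    return rounds


if __name__ == "__main__":
    samples = [f"sample{i}.tsv" for i in range(10)]
    for n, batches in enumerate(plan_merges(samples, 4)):
        print(n, batches)
```

A lower limit produces more, smaller rounds (more intermediate files), which is the kind of trade-off that setting the maximum by hand lets you influence.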
This command can be distributed on a cluster or run using multiple cores
with job options (more info with cg
help joboptions)
Dependencies
Some of the programs needed in this workflow are not distributed
with genomecomb: gatk and picard must be installed separately.
Their installation location can be given using the environment
variables GATK and PICARD. These should point to the installation
directory that contains the jar files. If these environment variables
are not set, directories named gatk and picard are searched for in
the PATH.
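The lookup order for gatk and picard (environment variable first, then the PATH) can be sketched as follows. This is not genomecomb's implementation: find_tool_dir and list_jars are hypothetical helpers, and reading "searched in the PATH" as "look for a subdirectory named gatk or picard inside each PATH entry" is an assumption.

```python
import os


def find_tool_dir(envvar, dirname, environ=None):
    """Locate the installation directory of a Java tool such as gatk or picard.

    Checks the environment variable first, then looks for a directory
    called `dirname` via the PATH.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(envvar)
    if value:
        if not os.path.isdir(value):
            raise FileNotFoundError(f"{envvar}={value} is not a directory")
        return value
    # Assumption: "searched in the PATH" is read as looking for a
    # subdirectory named `dirname` inside each PATH entry.
    for entry in environ.get("PATH", "").split(os.pathsep):
        if not entry:
            continue
        candidate = os.path.join(entry, dirname)
        if os.path.isdir(candidate):
            return candidate
    raise FileNotFoundError(
        f"{envvar} is not set and no '{dirname}' directory was found in PATH")


def list_jars(tooldir):
    """Jar files in the installation directory, sorted by name."""
    return sorted(f for f in os.listdir(tooldir) if f.endswith(".jar"))


if __name__ == "__main__":
    for var, name in (("GATK", "gatk"), ("PICARD", "picard")):
        try:
            d = find_tool_dir(var, name)
            print(var, d, list_jars(d))
        except FileNotFoundError as err:
            print(err)
```

Passing the environment as an optional mapping keeps the lookup testable without modifying the real process environment.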
Example
export GATK=/opt/bio/GenomeAnalysisTK-2.4-9-g532efad
export PICARD=/opt/bio/picard-tools-1.87
cg process_conv_illmastr 130625_M01318_0013_000000000-A546C testproject
cg process_mastr -d 2 test.mastr testproject /complgen/refseq/hg19
Category
Deprecated