GenomeComb



Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb, with documentation at https://derijkp.github.io/genomecomb. For up-to-date versions, go there. These pages remain here only for the data on the older scientific application (or if someone really needs a long-obsolete version of the software).

Joboptions

Format

cg command ?joboptions? ...

Summary

Several commands understand these job options.

Description

Several genomecomb commands (indicated by "supports job options" in their help) consist of subtasks (jobs) that can be run in parallel. If such a command is interrupted, it can continue processing from the already processed data (completed subtasks) when the same command is run again. The --distribute or -d option can be used to select alternative processing methods.

direct run

Without a -d option (or with -d direct) the command runs jobs sequentially, stopping when an error occurs in a job. It skips any job whose targets were already completed (and whose input files were not changed after that completion). The command produces a logfile containing one line of information for each job (such as job name, status, start time, end time, duration, target files).
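As a sketch (the command and project names below are hypothetical; any command marked "supports job options" behaves this way), an interrupted direct run can simply be restarted and will skip the subtasks that already completed:

```shell
# hypothetical job-enabled command on a hypothetical project directory
cg process_project myproject/
# ... run is interrupted (e.g. Ctrl-C, machine reboot) ...

# rerun the exact same command: jobs whose targets are already complete
# (with unchanged inputs) are skipped, and processing continues from there
cg process_project myproject/
```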

distributed run

An analysis can be run faster by running multiple jobs at the same time, using multiple cores/threads on one machine (by giving the number of cores/threads that may be used to the -d option) or nodes on a cluster managed by the Grid Engine (option sge).
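For example (the command and project names are hypothetical, standing in for any job-enabled command):

```shell
# run up to 8 jobs at a time on the local machine
cg process_project -d 8 myproject/

# submit the jobs to a Grid Engine cluster instead
cg process_project -d sge myproject/
```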

Some jobs depend on results of previous jobs and have to wait for these to finish first, so the full capacity is not always used. If an error occurs in a job, the program will still continue with other jobs. Subsequent jobs depending on the results of the failed job will also fail, but others may run successfully. (In this way -d 1 is different from -d direct.)

The command used to start the analysis produces a logfile (with the extension .submitting) while submitting the jobs. Each job gets a line in the logfile, and also produces a number of small files (run script, logfile, error output) in directories called log_jobs (the first column in the logfile refers to their location). These small files are normally removed after a successful analysis (but not when there were errors, as they may be useful for debugging). When the command has finished submitting, the logfile is updated and gets the extension .running. When the run is completed, the logfile is updated again and gets the extension .finished if no jobs had errors, or .error if some did. You can update the logfile while jobs are running using the command:

cg job_update logfile

Using -d number starts up to number jobs in separate processes managed by the submitting command. The submitting command must keep running until all jobs are finished. It outputs some progress information, but you can get more detail by updating and checking the logfile.

Using "-d sge" the submitting command will submit all jobs to the Grid Engine as grid engine jobs with proper dependencies between them. The program itself stops when all jobs are submitted. You can check progress/errors by updating and checking the logfile.
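Checking on a submitted run might look like the following (the logfile name here is a hypothetical example; use the actual logfile your command produced):

```shell
# refresh the job statuses recorded in the logfile
cg job_update process_project.myproject.running

# quick look for failed jobs in the updated logfile
grep -i error process_project.myproject.running
```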

Options

List of all options supported by job-enabled commands:

-d value (--distribute)
distribute method, see the description above (also -distribute)
-dpriority value
(only for sge) run jobs with the given priority (default is 0)
-dqueue value
(only for sge) Use this option if you want to run on another queue than the default all.q
-dcleanup success/never/allways
Allows you to keep all run/log files in the log_jobs dirs even if a job is successful (never), or to always delete them (allways) instead of deleting them only after a successful run (success)
-dremove 0/1
remove logfiles from older runs (relevant info is included in new one, default 0)
-v number (--verbose)
This general option has a big effect on job-enabled commands: if it is > 0, they output a lot of information (to stderr) on dependencies, jobs started, etc.
-f 0/1 (--force)
use -f 1 to rerun all jobs, even if some already finished successfully (also -force)
--skipjoberrors 0/1
In direct mode, an error in one job causes the entire command to stop; use --skipjoberrors 1 to skip over these errors and complete the other jobs (as far as they do not depend on the job with the error) (also -skipjoberrors)
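Several of these options can be combined on one command line. A sketch (command, project, and queue names are hypothetical):

```shell
# distribute over the Grid Engine, on queue long.q with lowered priority,
# keeping the per-job files in log_jobs even after a successful run
cg process_project -d sge -dqueue long.q -dpriority -10 -dcleanup never myproject/
```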

Category

Format Conversion