GenomeComb
Genomecomb has moved to GitHub at https://github.com/derijkp/genomecomb, with documentation at https://derijkp.github.io/genomecomb. Go there for up-to-date versions. These pages remain here only for the data on the older scientific application (or in case someone really needs a long-obsolete version of the software).
cg command ?joboptions? ...
Several commands understand these job options
Several genomecomb commands (indicated by "supports job options" in their help) consist of subtasks (jobs) that can be run in parallel. If such a command is interrupted, running the same command again continues processing from the data already processed (the completed subtasks). The --distribute or -d option can be used to select alternative processing methods.
Without a -d option (or -d direct) the command will run jobs sequentially, stopping when an error occurs in a job. It will skip any job where the targets were already completed (and the files it uses as input were not changed after this completion). The command produces a logfile containing one line with information for each job (such as job name, status, starttime, endtime, duration, target files).
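The skip rule described above resembles the timestamp check used by build tools. The following is a minimal sketch of that idea, not genomecomb's actual implementation: job_is_current is a hypothetical helper, and comparing file modification times is an assumption about how "not changed after completion" could be tested.

```shell
# Hypothetical helper (not part of genomecomb): can a job be skipped?
# usage: job_is_current target input ...
# Succeeds (exit status 0) when the target exists and no input is newer than it.
job_is_current() {
    local target input
    target=$1
    shift
    # A missing target always means the job still has to run
    [ -e "$target" ] || return 1
    for input in "$@"; do
        # An input modified after the target was written invalidates the target
        if [ "$input" -nt "$target" ]; then
            return 1
        fi
    done
    return 0
}
```

It could be used as: if job_is_current result.tsv input.tsv; then echo skip; else echo run; fi (the file names are placeholders).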
An analysis can be run faster by running multiple jobs at the same time, either using multiple cores/threads on one machine (by giving the number of cores/threads that may be used to the -d option) or using nodes on a cluster managed by the Grid Engine (option sge).
Some jobs depend on results of previous jobs and have to wait for these to finish first, so the entire capacity is not always filled. If an error occurs in a job, the program will still continue with other jobs. Subsequent jobs depending on the results of the failed job will also fail, but others may run successfully. (In this way -d 1 is different from -d direct, which stops at the first error.)
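The way a failure propagates to dependent jobs while independent jobs keep running can be illustrated with a small sketch. It runs jobs one after another and is only an illustration of the behaviour described above, not genomecomb's scheduler; run_job, the job names, and the use of true/false as stand-in job commands are all hypothetical.

```shell
failed=""   # space-separated names of the jobs that have failed so far

# Hypothetical helper: run a job unless one of its dependencies failed.
# usage: run_job name "dep1 dep2 ..." command ?arg ...?
run_job() {
    local name deps dep
    name=$1
    deps=$2
    shift 2
    for dep in $deps; do
        case " $failed " in
            *" $dep "*)
                # A failed dependency makes this job fail without running it
                echo "$name: failed (depends on failed job $dep)"
                failed="$failed $name"
                return 0
                ;;
        esac
    done
    if "$@"; then
        echo "$name: finished"
    else
        echo "$name: failed"
        failed="$failed $name"
    fi
}

# Jobs are listed in dependency order; "false" simulates a job with an error
run_job map_sample1 "" true
run_job map_sample2 "" false
run_job call_sample1 "map_sample1" true
run_job call_sample2 "map_sample2" true
run_job compare "call_sample1 call_sample2" true
```

Here map_sample2 fails, so call_sample2 and compare are marked as failed, while call_sample1 still completes. With -d direct the run would instead have stopped at map_sample2.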
The command used to start the analysis will produce a logfile (with the extension .submitting) while submitting the jobs. Each job gets a line in the logfile, and also produces a number of small files (runscript, logfile, error output) in directories called log_jobs. (The first column in the logfile refers to their location.) These small files are normally removed after a successful analysis (but not when there were errors, as they may be useful for debugging). When the command has finished submitting, the logfile is updated and gets the extension .running. When the run is completed, the logfile is updated again and gets the extension .finished if no jobs had errors, or .error if any did. You can update the logfile while jobs are running using the command:
cg job_update logfile
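Because the state of a run is encoded in the logfile extension, a script can check it without opening the file. The following is a minimal sketch; run_state is a hypothetical helper that is not part of genomecomb, and it relies only on the four extensions named above.

```shell
# Hypothetical helper: report the state of a run from its logfile extension.
# usage: run_state logfile
run_state() {
    case "$1" in
        *.submitting) echo "submitting" ;;  # jobs are still being submitted
        *.running)    echo "running" ;;     # all jobs submitted, run in progress
        *.finished)   echo "finished" ;;    # run completed, no job had errors
        *.error)      echo "error" ;;       # run completed, at least one job failed
        *)            echo "unknown" ;;     # not a recognised logfile extension
    esac
}
```

For example, run_state mylog.running prints "running" (mylog is a placeholder name, not the name genomecomb gives its logfiles).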
Using -d number starts up to number jobs in separate processes managed by the submitting command. The submitting command must keep running until all jobs are finished. It will output some progress, but you can get more information by updating and checking the logfile.
Using "-d sge" the submitting command will submit all jobs to the Grid Engine as grid engine jobs with proper dependencies between the jobs. The program itself will stop when all jobs are submitted. You can check progress/errors by updating and checking the logfile.
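Which value to give to -d depends on the machine. As a rough sketch, a wrapper script could pick one automatically. pick_distribute is a hypothetical helper, not part of genomecomb; treating the presence of qsub as a sign of a Grid Engine cluster is an assumption (other schedulers also provide a qsub command), and using every available core may not be appropriate on a shared machine.

```shell
# Hypothetical helper (not part of genomecomb): suggest a value for the -d option.
pick_distribute() {
    if command -v qsub >/dev/null 2>&1; then
        # qsub is the Grid Engine submit command; finding it is only a hint
        # that this machine can submit jobs to a cluster
        echo sge
    else
        # Number of online cores; fall back to 1 if getconf cannot report it
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}

# Print (rather than run) the resulting command line, using the synopsis above
echo "cg command -d $(pick_distribute) ..."
```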
List of all options supported by job enabled commands: