GenomeComb



GenomeComb has moved to GitHub (https://github.com/derijkp/genomecomb), with documentation at https://derijkp.github.io/genomecomb. Go there for up-to-date versions. These pages remain here only for the data on the older scientific application (or for anyone who really needs a long-obsolete version of the software).


Installation

System Requirements

Binaries for genomecomb are now only available for 64-bit Linux systems (Linux-x86_64).

To remain compatible with older systems, all binaries have been cross-compiled against glibc 2.4. Systems with an older glibc are not supported (and should probably be updated anyway).
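If you are unsure which glibc a system has, a quick check is sketched below. It assumes a glibc-based system on which getconf GNU_LIBC_VERSION is available, and uses GNU sort -V to compare versions; version_ge is a hypothetical helper name, not part of genomecomb.

```shell
# Sketch: check that the local glibc is at least 2.4, the version the
# genomecomb binaries were cross-compiled against.

# version_ge A B: succeeds if version A >= version B (uses GNU sort -V)
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# On glibc systems, getconf GNU_LIBC_VERSION prints e.g. "glibc 2.17"
glibc_version=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')

if [ -n "$glibc_version" ] && version_ge "$glibc_version" 2.4; then
    echo "glibc $glibc_version: compatible"
else
    echo "glibc ${glibc_version:-not found}: not compatible" >&2
fi
```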

Install program

The different steps of an install are explained below. An example install using the command line is given at the bottom. This example also downloads the packages from the command line, so you do not need to download them first.

Download genomecomb

Download from the following link

Unpack

Unpacking the bundle creates a directory that includes most dependencies. The executable cg must stay in this directory, because it locates the dependencies relative to its own directory. You can add the directory to the PATH, or use a symbolic link to make cg available in a bin directory, e.g.:

ln -s /opt/genomecomb-0.98.8-Linux-x86_64/cg /usr/local/bin/cg
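The PATH alternative can be sketched as follows, assuming the example directory above and a Bourne-compatible shell; prepend_path is a hypothetical helper, not part of genomecomb.

```shell
# Sketch: make cg available by adding the genomecomb directory to PATH
# instead of creating a symlink. Adapt the directory to where you
# unpacked the bundle.
GENOMECOMB_DIR=/opt/genomecomb-0.98.8-Linux-x86_64

# prepend_path DIR: put DIR at the front of PATH unless it is already there
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_path "$GENOMECOMB_DIR"
# To make this permanent for bash users, add a line like this to ~/.bashrc:
#   export PATH=/opt/genomecomb-0.98.8-Linux-x86_64:$PATH
```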

Download databases

For some analyses, such as annotation, you need several databases. These must be available in a directory that you pass as a parameter (refdir). You can download them for the hg19 and hg38 builds of the human genome:

Install the databases by unpacking them in a directory.
Because the CADD-based annotation database is very large, it is provided separately. Install it by unpacking it directly in the relevant database directory (the one you unpacked before).
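Unpacking into the database directory can also be done without changing directory, using tar's -C option. The sketch below assumes the file names and the /opt/genomecomb/hg19 location of the example install; unpack_into is a hypothetical helper that refuses to run if the database directory does not exist yet.

```shell
# Sketch: unpack the separately provided CADD archive directly into the
# existing database directory using tar -C.

# unpack_into ARCHIVE DIR: extract a .tar.gz into an existing directory
unpack_into() {
    if [ ! -d "$2" ]; then
        echo "database directory $2 does not exist; unpack the main refdb first" >&2
        return 1
    fi
    tar -xzf "$1" -C "$2"
}

# Example for hg19, matching the example install:
#   unpack_into refdb_hg19-cadd-0.98.8.tar.gz /opt/genomecomb/hg19
```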

External software

Some of the externally developed software used by genomecomb is included in the distribution (e.g. samtools, fastq-mcf). Other tools must be installed on the system if you want to use them. freebayes is used if it is installed in one of the directories in the PATH (i.e. if you can run it by typing freebayes). For GATK and Picard, genomecomb searches the PATH for directories named gatk and picard, containing the GATK and Picard jar files respectively. You can also locate these directories explicitly using environment variables, e.g.:

export GATK=/opt/bio/GenomeAnalysisTK-3.6
export PICARD=/opt/bio/picard-tools-1.87
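To see which of these tools are available, you could run a check like the one below. It is a sketch: check_jar_dir is a hypothetical helper that only verifies that a directory contains .jar files, and the script does not reproduce genomecomb's own search of the PATH for gatk and picard directories.

```shell
# Sketch: report which optional external tools can be found.
# check_jar_dir is a hypothetical helper (not part of genomecomb).

# check_jar_dir NAME DIR: succeed if DIR contains at least one .jar file
check_jar_dir() {
    for jar in "$2"/*.jar; do
        if [ -f "$jar" ]; then
            echo "$1: jar files found in $2"
            return 0
        fi
    done
    echo "$1: no jar files found in '$2'" >&2
    return 1
}

# freebayes is only used if it can be run by name, i.e. it is in the PATH
if command -v freebayes >/dev/null 2>&1; then
    echo "freebayes: $(command -v freebayes)"
else
    echo "freebayes: not found in PATH"
fi

# Check the explicit locations only if the environment variables are set
if [ -n "${GATK:-}" ]; then
    check_jar_dir GATK "$GATK" || true
fi
if [ -n "${PICARD:-}" ]; then
    check_jar_dir PICARD "$PICARD" || true
fi
```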

Example files

To demonstrate selected queries, we have made our annotated comparison of chromosomes 21 and 22 of the individuals NA19238, NA19239 and NA19240 (NA19240 sequenced with different technologies) available: dataset_howto_query.tar.gz

Example install

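The following commands install genomecomb, the hg19 reference databases (including the CADD database) and the example dataset in the /opt/genomecomb directory. Writing to /opt and /usr/local/bin usually requires root privileges.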

mkdir /opt/genomecomb
cd /opt/genomecomb

wget http://sourceforge.net/projects/genomecomb/files/genomecomb-0.98.8-Linux-x86_64.tar.gz
tar xvzf genomecomb-0.98.8-Linux-x86_64.tar.gz
ln -s /opt/genomecomb/genomecomb-0.98.8-Linux-x86_64/cg /usr/local/bin/cg

wget http://genomecomb.bioinf.be/download/refdb_hg19-0.98.8.tar.gz
tar xvzf refdb_hg19-0.98.8.tar.gz
wget http://genomecomb.bioinf.be/download/refdb_hg19-cadd-0.98.8.tar.gz
cd /opt/genomecomb/hg19
tar xvzf ../refdb_hg19-cadd-0.98.8.tar.gz
cd /opt/genomecomb

wget http://genomecomb.bioinf.be/download/dataset_howto_query.tar.gz
tar xvzf dataset_howto_query.tar.gz

# delete downloaded files
rm genomecomb-0.98.8-Linux-x86_64.tar.gz
rm refdb_hg19-0.98.8.tar.gz refdb_hg19-cadd-0.98.8.tar.gz
rm dataset_howto_query.tar.gz
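Afterwards, a minimal sanity check could look like the sketch below, assuming the /opt/genomecomb prefix and hg19 build used above. verify_install is a hypothetical helper; it only checks that cg can be found and that the database directory exists, not that the databases are complete.

```shell
# Sketch: check that cg is reachable and the database directory exists.

# verify_install PREFIX BUILD: returns 0 only if both checks pass
verify_install() {
    status=0
    if command -v cg >/dev/null 2>&1; then
        echo "cg: $(command -v cg)"
    else
        echo "cg: not found in PATH" >&2
        status=1
    fi
    if [ -d "$1/$2" ]; then
        echo "$2 databases: $1/$2"
    else
        echo "$2 databases: directory $1/$2 missing" >&2
        status=1
    fi
    return "$status"
}

verify_install /opt/genomecomb hg19 || echo "install incomplete" >&2
```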

Build databases (optional)

Some databases are not included in the download, either because of their size or because they may not be redistributed (dbNSFP). The genomecomb bundle contains scripts that you can use to download the source data and build these databases yourself. This takes a lot of time and disk space, and is not necessary: the most important databases are included in the download.

The databases are created automatically in the directory /complgen/refseq/hg38 or /complgen/refseq/hg19; you can change this location by editing the script. The scripts can also be adapted to add other UCSC tracks or custom and in-house annotation databases, or to create databases for another genome build.