Basic Introduction

Method

THOR is an HMM-based approach to detect and analyze differential peaks in two sets of ChIP-seq data from distinct biological conditions with replicates. THOR performs genomic signal processing, peak calling and p-value calculation in an integrated framework.

For more information, please refer to:

Allhoff, M., Sere, K., Freitas, J., Zenke, M., Costa, I.G. (2016), Differential Peak Calling of ChIP-seq Signals with Replicates with THOR, Nucleic Acids Research, epub gkw680 [paper][supp].

Download & Installation

To install the latest version of THOR, we recommend you use pip:

pip install RGT --user

Further installation instructions, including installation without pip, are found here.

Important: THOR uses wigToBigWig, bedGraphToBigWig and bigWigMerge from the UCSC utilities website to handle BigWig files. THOR therefore automatically copies these tools into your bin directory. Note: The tools THOR copies are built for 64-bit Linux or macOS systems. If you have another system (e.g. 32-bit), please copy the appropriate binaries yourself and make sure that they are on your $PATH.

Example

You can find example files here. They contain reads aligned to selected genomic regions from a B-cell lymphoma study. These files are from two donors with follicular lymphoma (FL) and two control samples (CC). We are interested in the differential peaks between the biological conditions FL and CC. The files are based on our experiment LYMP-FL-CC described in our paper.

Run THOR

The following command finds differential peaks in the two conditions (here FL and CC):

rgt-THOR THOR.config

THOR.config lists all files THOR needs to run:

#rep1
FL5_H3K27ac.100k.bam
FL8_H3K27ac.100k.bam
#rep2
CC4_H3K27ac.100k.bam
CC5_H3K27ac.100k.bam
#chrom_sizes
hg19.chrom.sizes

THOR needs 3 types of files:

  • BAM files for the first biological condition,
  • BAM files for the second biological condition,
  • the chromosome sizes.

The file hg19.chrom.sizes is included in the example files as well. It is a tab-delimited file with one chromosome per line: the chromosome name followed by the chromosome size. See here how to get the chromosome sizes for other organisms.
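As a quick sanity check before running THOR, a chromosome sizes file can be read into a dictionary. The following is a minimal sketch, not part of RGT; the helper name parse_chrom_sizes is hypothetical, and the two example lines use the hg19 lengths of chr1 and chr2.

```python
from typing import Dict, Iterable


def parse_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Map chromosome name to size from tab-delimited lines (hypothetical helper)."""
    sizes: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Only the first two tab-separated fields are used: name and size.
        name, size = line.split("\t")[:2]
        sizes[name] = int(size)
    return sizes


# Two lines in the same layout as hg19.chrom.sizes.
example_lines = ["chr1\t249250621\n", "chr2\t243199373\n"]
sizes = parse_chrom_sizes(example_lines)
print(sizes)
```

To check a real file, pass an open file handle instead of the list, for example parse_chrom_sizes(open("hg19.chrom.sizes")).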

In the config file, we can also incorporate the genome (with header #genome) and control input-DNA files (with headers #inputs1 and #inputs2). This improves the accuracy of the differential peak estimates. See here for a more advanced example.
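When many experiments are run, it can be convenient to generate the config file from a script. The sketch below assembles the config text from the headers named above (#rep1, #rep2, #chrom_sizes, and optionally #genome, #inputs1, #inputs2). The function build_thor_config is a hypothetical helper, not part of RGT, and the order in which the optional sections are written is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def build_thor_config(
    rep1: Sequence[str],
    rep2: Sequence[str],
    chrom_sizes: str,
    genome: Optional[str] = None,
    inputs1: Optional[Sequence[str]] = None,
    inputs2: Optional[Sequence[str]] = None,
) -> str:
    """Return the text of a THOR config file (hypothetical helper)."""
    sections: List[Tuple[str, List[str]]] = [
        ("#rep1", list(rep1)),
        ("#rep2", list(rep2)),
    ]
    # Optional sections are only written when given; their order is an assumption.
    if genome is not None:
        sections.append(("#genome", [genome]))
    sections.append(("#chrom_sizes", [chrom_sizes]))
    if inputs1:
        sections.append(("#inputs1", list(inputs1)))
    if inputs2:
        sections.append(("#inputs2", list(inputs2)))

    lines: List[str] = []
    for header, paths in sections:
        lines.append(header)
        lines.extend(paths)
    return "\n".join(lines) + "\n"


# Reproduces the basic THOR.config of the example data.
config_text = build_thor_config(
    rep1=["FL5_H3K27ac.100k.bam", "FL8_H3K27ac.100k.bam"],
    rep2=["CC4_H3K27ac.100k.bam", "CC5_H3K27ac.100k.bam"],
    chrom_sizes="hg19.chrom.sizes",
)
print(config_text)
```

Writing config_text to a file named THOR.config gives the input for the rgt-THOR command shown earlier.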

Results

THOR creates several files. If we do not specify anything, THOR uses THOR-exp-<date> as experiment name by default.

The files are:

File                                  Description
THOR-exp-<date>-s1-rep0.bw            postprocessed ChIP-seq signal (in bigWig format) for the first BAM file in the first biological condition (here: FL5)
THOR-exp-<date>-s1-rep1.bw            postprocessed ChIP-seq signal (in bigWig format) for the second BAM file in the first biological condition (here: FL8)
THOR-exp-<date>-s2-rep0.bw            postprocessed ChIP-seq signal (in bigWig format) for the first BAM file in the second biological condition (here: CC4)
THOR-exp-<date>-s2-rep1.bw            postprocessed ChIP-seq signal (in bigWig format) for the second BAM file in the second biological condition (here: CC5)
THOR-exp-<date>-setup.info            information about the experiment setting
THOR-exp-<date>-diffpeaks.bed         differential peaks in a proprietary BED format
THOR-exp-<date>-diffpeaks.narrowPeak  differential peaks in narrowPeak format
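The naming scheme of the output files can be written down as a small function, which helps when a pipeline needs to check that a run has finished. This is a sketch under the assumption that the pattern shown for two replicates per condition (s1/s2 for the condition, rep0, rep1, ... for the BAM file) continues for further replicates; expected_outputs is a hypothetical helper, not part of RGT.

```python
from typing import List


def expected_outputs(experiment: str, n_rep1: int, n_rep2: int) -> List[str]:
    """List the output file names for an experiment name (hypothetical helper)."""
    # One bigWig per BAM file: s1 = first condition, s2 = second condition,
    # replicates numbered from 0.
    names = [f"{experiment}-s1-rep{i}.bw" for i in range(n_rep1)]
    names += [f"{experiment}-s2-rep{i}.bw" for i in range(n_rep2)]
    names += [
        f"{experiment}-setup.info",
        f"{experiment}-diffpeaks.bed",
        f"{experiment}-diffpeaks.narrowPeak",
    ]
    return names


# "my-experiment" is a made-up experiment name used for illustration.
outputs = expected_outputs("my-experiment", n_rep1=2, n_rep2=2)
for name in outputs:
    print(name)
```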

Please see here for a more detailed description of the output, in particular of the *setup.info, *bed and *narrowPeak files.

For downstream analysis of the BED file, we provide two tools. The first tool splits the BED file into differential peaks that gain signal in condition 1 and differential peaks that gain signal in condition 2. The second tool filters the BED file by p-value.

In the BED file, column 6 (strand) indicates whether a differential peak gains signal in condition 1 (+) or in condition 2 (-). Column 9 gives a colour code for the peaks (red for a differential peak in signal 1, green for a differential peak in signal 2). Column 11 holds a semicolon-separated list for each differential peak: the first (second) element is a comma-separated list of the counts of each replicate of the first (second) biological condition, and the third element is the calculated p-value.
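The column layout of the BED file can also be read directly in a script. The following is a minimal sketch, independent of the two tools mentioned above; the DiffPeak class and parse_diffpeak_line function are hypothetical names, the example line is made up, and columns 1 to 3 are assumed to follow the usual BED convention (chromosome, start, end). Counts are parsed as floats in case they are not integers, and the third element of column 11 is stored exactly as written in the file, without assuming how it is scaled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiffPeak:
    chrom: str
    start: int
    end: int
    strand: str            # "+" gains in condition 1, "-" gains in condition 2
    counts1: List[float]   # per-replicate counts, first condition
    counts2: List[float]   # per-replicate counts, second condition
    pvalue: float          # third element of column 11, as written in the file


def parse_diffpeak_line(line: str) -> DiffPeak:
    """Parse one tab-separated line of the diffpeaks BED file (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    # Column 11 (index 10): "counts1;counts2;pvalue"
    counts1, counts2, pvalue = fields[10].split(";")[:3]
    return DiffPeak(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        strand=fields[5],
        counts1=[float(c) for c in counts1.split(",")],
        counts2=[float(c) for c in counts2.split(",")],
        pvalue=float(pvalue),
    )


# Made-up lines for illustration only; not taken from real THOR output.
example_lines = [
    "chr1\t1000\t2000\tPeak1\t0\t+\t1000\t2000\t255,0,0\t0\t30,25;5,4;0.001",
    "chr1\t5000\t5600\tPeak2\t0\t-\t5000\t5600\t0,255,0\t0\t3,2;40,38;0.02",
]
peaks = [parse_diffpeak_line(line) for line in example_lines]

# Split by the strand column into peaks gaining in condition 1 or condition 2.
gain_cond1 = [p for p in peaks if p.strand == "+"]
gain_cond2 = [p for p in peaks if p.strand == "-"]
print(len(gain_cond1), len(gain_cond2))
```

Filtering by p-value then amounts to a list comprehension over peaks with a threshold of your choice on the pvalue field.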

Here, we provide a screenshot of the results based on IGV:

[Screenshot: exampledata_THOR]