Basic Introduction

Method

HINT1

HINT (Hmm-based IdeNtification of Transcription factor footprints) is a framework that uses open chromatin data to identify active transcription factor binding sites. The method was originally proposed to model active binding sites through the simultaneous genome-wide analysis of DNase-seq and ChIP-seq profiles of histone modifications (paper). The HMM takes as input a normalised signal and a slope signal for both DNase-seq and one histone mark. It can therefore detect the increase, top and decrease regions of both the histone modification and the DNase signals. We next modified HINT to work with DNase-seq data alone by removing the three histone-level states and by using a bias-corrected DNase-seq signal before the normalisation steps (paper). Recently, we extended HINT to ATAC-seq, a newer assay for identifying accessible DNA regions, taking protocol-specific characteristics into account.

Installation

If you have followed the generic instructions for the RGT suite installation, you can start using HINT with the latest version, 0.11.1. If you haven’t installed RGT yet, use the command

pip install --user RGT 

to install HINT. Further installation instructions are found here. If you have any questions, comments, installation problems or bug reports, please access our discussion group.

Note: You need to follow these instructions to download the genomic data for cleavage bias correction.
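Because pip install --user places scripts in a per-user directory, the rgt-hint command may not be reachable from your shell right after installation. The following is a minimal sketch for checking this; check_tool is a hypothetical helper name, and the $HOME/.local/bin location it mentions is the usual per-user script directory on Linux, which may differ on your system.

```shell
#!/bin/sh
# Report whether a command is reachable through PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        # Assumption: per-user pip scripts usually live in $HOME/.local/bin on Linux.
        echo "$1: not found (is \$HOME/.local/bin on your PATH?)"
        return 1
    fi
}

# "|| true" keeps the check advisory instead of aborting a calling script.
check_tool rgt-hint || true
```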

Basic Usage

We describe here how to detect footprints using HINT for ATAC-seq, DNase-seq and histone modification data. To perform footprinting, you need at least two files: one with the aligned reads of your chromatin data and another describing the regions in which to detect footprints. You can use a peak caller, such as MACS2, to define these regions of interest.
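As an illustration of preparing the region file, the sketch below reduces a peak caller's output to a sorted three-column BED file. It assumes the peak file is BED-like, with chromosome, start and end in the first three columns (as in MACS2's narrowPeak output); peaks_to_bed and the file names are hypothetical, and this trimming step is optional rather than something HINT requires.

```shell
#!/bin/sh
# Reduce a peak file to a sorted, three-column BED file (chrom, start, end).
# Header-like lines ("track", "browser", "#") are skipped.
peaks_to_bed() {
    awk 'BEGIN { OFS = "\t" }
         /^(track|browser|#)/ { next }
         NF >= 3 { print $1, $2, $3 }' "$1" |
        sort -k1,1 -k2,2n
}

# Usage (file names are placeholders):
#   peaks_to_bed sample_peaks.narrowPeak > regions.bed
```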

Footprinting for ATAC-seq data

Download here the example data for ATAC-seq, based on chromosome 1 of the GM12878 cell line. Execute the following commands to extract the data from the downloaded file:

tar xvfz HINT_ATACTest.tar.gz
cd HINT_ATACTest 

and run the following command to perform footprinting:

rgt-hint footprinting --atac-seq ATAC.bam ATACPeaks.bed

For simplicity, we use only the first 1000 peaks from chromosome 1. The above command will output a BED file containing the footprints in your current folder, with footprints as the file name prefix. You can also set the following arguments

--output-location=your_directory  --output-prefix=your_prefix

to tell HINT your preferred output directory and file name prefix. Each footprint, i.e. each line of the BED file, carries a tag-count score (number of reads). This score can be used to assess footprint quality (higher values indicate better candidates). In addition, a file with details of the reads and footprints is written to the same folder as the BED file.
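Since the tag-count score ranks candidates, one simple way to use it is to keep only the highest-scoring footprints. The sketch below assumes the score sits in the fifth column of the BED file, as is conventional for BED; check your own output before relying on this. top_footprints and the file names are hypothetical.

```shell
#!/bin/sh
# Print the N highest-scoring footprints from a BED file (default N = 10).
# Assumption: the tag-count score is in column 5.
top_footprints() {
    sort -k5,5nr "$1" | head -n "${2:-10}"
}

# Usage (file names are placeholders):
#   top_footprints footprints.bed 100 > top100.bed
```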

If your data is paired-end, you may want to try another model which is optimized for paired-end sequencing data:

rgt-hint footprinting --atac-seq --paired-end --output-prefix=fp_paired ATAC.bam ATACPeaks.bed

Note: HINT performs bias correction for ATAC-seq by default, so you must download the genomes following these instructions before footprinting, and specify the correct genome reference with the following argument:

--organism=genome_version

Currently, the default setting is hg19. See here for more information.
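The arguments described in this section can be combined in a single call. The sketch below only assembles and prints such a command so that it can be inspected first; it does not run HINT. hint_atac_cmd is a hypothetical helper, the results directory is a placeholder, and the helper assumes file names without spaces.

```shell
#!/bin/sh
# Assemble (but do not run) a HINT ATAC-seq footprinting command.
# Arguments: BAM file, peak BED file, genome, output directory, output prefix.
hint_atac_cmd() {
    bam=$1
    peaks=$2
    organism=${3:-hg19}      # hg19 is the documented default
    outdir=${4:-.}
    prefix=${5:-footprints}
    printf 'rgt-hint footprinting --atac-seq --paired-end --organism=%s --output-location=%s --output-prefix=%s %s %s\n' \
        "$organism" "$outdir" "$prefix" "$bam" "$peaks"
}

# Print the command for inspection; append "| sh" to execute it.
hint_atac_cmd ATAC.bam ATACPeaks.bed hg19 results fp_paired
```

Drop --paired-end from the format string if your data is single-end.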

Footprinting for DNase-seq

You can find here example DNase-seq data. Execute the following commands to extract the data from a compressed file:

tar xvfz HINT_DNaseTest.tar.gz
cd HINT_DNaseTest

and the following command to call the footprints:

rgt-hint footprinting --dnase-seq DNase.bam DNasePeaks.bed

We recommend using cleavage bias correction. This can be done with the following command:

rgt-hint footprinting --dnase-seq --bias-correction DNase.bam DNasePeaks.bed

Don’t forget to specify the correct genome reference using:

--organism=genome_version

Currently, the default setting is hg19.

Footprinting for histone modification data

Download here the example data for histone modification. Execute the following commands to extract data:

tar xvfz HINT_HistoneTest.tar.gz
cd HINT_HistoneTest 

and the following command to call footprints:

rgt-hint footprinting --histone histone.bam histonePeaks.bed

The complete tutorial and more detailed examples can be found here.

Citation

If you use HINT with DNase-seq or histone modification data, please cite the following publication:

Gusmao EG, Dieterich C, Zenke M and Costa IG. “Detection of active transcription factor binding sites with the combination of DNase hypersensitivity and histone modifications” Bioinformatics, 30(22):3143-3151, 2014. [Full Text]

If you use HINT for DNase-seq with bias correction, please cite:

Gusmao EG, Allhoff M, Zenke M and Costa IG. “Analysis of computational footprinting methods for DNase sequencing experiments” Nature Methods, 13(4):303-309, 2016. [Full Text]

If you use HINT with ATAC-seq, please cite the following publication:

to come