Basic Introduction

 

Method

HINT1

HINT (Hmm-based IdeNtification of Transcription factor footprints) integrates DNase I hypersensitivity and histone modification data to detect open chromatin regions and active binding sites. Transcription factor binding sites show a specific grammar of DNase I digestion and histone marks. We therefore devised a multivariate HMM that models this regulatory grammar by analyzing DNase-seq and histone modification ChIP-seq profiles simultaneously on a genome-wide level. The HMM takes as input a normalized signal and a slope signal for DNase-seq and for the histone marks. It can therefore detect the increase, top and decrease regions of both the histone modification and the DNase signals. The genomic regions annotated with the ‘footprint’ HMM state are our predictions and represent likely binding sites within that cell’s context. For more details on the method, please check our full paper. If you use this tool in your research we kindly ask you to cite the paper. For benchmarking data of our main publication please visit our lab’s website.
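To make the two input tracks concrete, the sketch below derives a normalized signal and a slope signal from a short run of per-base counts. It is only an illustration: the max-scaling normalization, the central-difference slope and the example counts are assumptions chosen for brevity, not the procedure used by HINT, which is described in the paper.

```python
def normalize(signal):
    """Scale raw counts to [0, 1] by dividing by the maximum (an illustrative choice)."""
    peak = max(signal)
    if peak == 0:
        return [0.0 for _ in signal]
    return [value / peak for value in signal]


def slope(signal):
    """Approximate the first derivative with central differences (one-sided at the ends)."""
    n = len(signal)
    if n < 2:
        return [0.0] * n
    result = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        result.append((signal[hi] - signal[lo]) / (hi - lo))
    return result


# Hypothetical per-base DNase-seq counts around a single peak.
raw_counts = [0, 2, 6, 10, 6, 2, 0]
norm = normalize(raw_counts)
slopes = slope(norm)

# Pair both tracks into one multivariate observation per position.
observations = list(zip(norm, slopes))
for position, (level, change) in enumerate(observations):
    if change > 0:
        trend = "increase"
    elif change < 0:
        trend = "decrease"
    else:
        trend = "top/flat"
    print(position, round(level, 2), round(change, 2), trend)
```

A positive slope marks the increase region, a slope near zero at a high normalized value marks the top, and a negative slope marks the decrease region.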

Download & Install

If you have followed the generic instructions for the RGT suite installation, then you can start using HINT.

If you did not install the full suite and wish to install only HINT, download here the latest RGT release and then type:

cd reg-gen-X.X.X
python setup.py install --rgt-tool=hint

Further installation instructions are found here.

If you have any questions, comments, installation problems or bug reports, please access our discussion group.

Basic Usage

Download here a simple example of input data for HINT. Execute the following command to find footprints on chromosome 22 of the K562 cell type using DNase, H3K4me1 and H3K4me3 data:

rgt-hint --output-location ./Output/ --output-fname DU_K562_HINT ./InputMatrix_HINT_DNase+Histone.txt

This command writes a BED file (./Output/DU_K562_HINT.bed) containing the footprints. Each line of the BED file describes one footprint and includes its tag-count score. This score can be used to assess footprint quality (the higher the value, the better). For more information on the input files and optional input parameters please type:

rgt-hint -h
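Since a higher tag-count score indicates a better footprint, a natural next step is to rank the footprints by that score. The following is a minimal sketch, assuming the score sits in the fifth BED column (the standard BED score field); check your own output file before relying on this. The example lines are made up for illustration.

```python
import io


def read_footprints(handle, score_column=4):
    """Parse BED lines into (chrom, start, end, score) tuples.

    score_column is zero-based; 4 (the fifth column) is an assumption.
    """
    footprints = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        footprints.append(
            (fields[0], int(fields[1]), int(fields[2]), float(fields[score_column]))
        )
    return footprints


def top_footprints(footprints, n):
    """Return the n footprints with the highest score, best first."""
    return sorted(footprints, key=lambda fp: fp[3], reverse=True)[:n]


# Hypothetical BED content; for real data use open("./Output/DU_K562_HINT.bed").
example_bed = io.StringIO(
    "chr22\t16000100\t16000115\tfp_a\t8\n"
    "chr22\t16000400\t16000412\tfp_b\t31\n"
    "chr22\t16000900\t16000920\tfp_c\t17\n"
)
footprints = read_footprints(example_bed)
best = top_footprints(footprints, 2)
for chrom, start, end, score in best:
    print(chrom, start, end, score)
```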

HINT also supports the use of only DNase-seq data to perform the footprinting. You can run an example with DNase-seq data only by typing:

rgt-hint --output-location ./Output/ --output-fname DU_K562_HINT ./InputMatrix_HINT_DNase.txt

To use HINT with your own DNase-seq data, open the experimental matrix file (“InputMatrix_HINT_DNase.txt”) and replace the DNase-seq path “./Input/DNase_chr22.bam” with the path to your own DNase-seq file. You also need to provide the hypersensitive (HS) regions as input. To do so, perform peak calling on your DNase-seq data and replace the experimental matrix entry “./Input/regions.bed” with the resulting peak file.
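If you prefer to script this change, the two path substitutions can be done as a plain text replacement. The sketch below is an assumption-laden illustration: it treats the matrix as opaque text and knows nothing about its column layout, the placeholder text stands in for the real file, and the new paths are hypothetical.

```python
def repoint_matrix(matrix_text, replacements):
    """Return matrix_text with each old path replaced by its new path.

    Raises ValueError if an expected path is missing, so a typo does not
    silently leave the example data in place.
    """
    for old_path, new_path in replacements.items():
        if old_path not in matrix_text:
            raise ValueError("path not found in matrix: " + old_path)
        matrix_text = matrix_text.replace(old_path, new_path)
    return matrix_text


# Placeholder text standing in for the matrix file; only the two paths matter here.
example_matrix = (
    "<other columns>\t./Input/DNase_chr22.bam\n"
    "<other columns>\t./Input/regions.bed\n"
)

# Hypothetical locations of your own BAM file and peak-calling result.
updated = repoint_matrix(
    example_matrix,
    {
        "./Input/DNase_chr22.bam": "/data/my_dnase.bam",
        "./Input/regions.bed": "/data/my_peaks.bed",
    },
)
print(updated)
```

For a real run, read the matrix file into a string, apply the function and write the result to a new file, keeping the original as a backup.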

We also make available HINT-BC, which performs DNase-seq cleavage bias correction in order to improve footprinting. To run HINT-BC, download the human genome FASTA file, extract it, place it in the rgtdata hg19 folder (~/rgtdata/hg19) and type the following command (in the example folder):

rgt-hint --default-bias-correction --output-location ./Output/ --output-fname DU_K562_HINTBC ./InputMatrix_HINTBC_DNase.txt

Finally, you can verify the quality of DNase-seq footprints for particular TFs using the protection score. For this, you need a bigBed file with motif predictions that overlap your footprints (regions_mpbs.bb). Use our motif matching tool if you need to generate these for your footprint predictions. Within the RGT code folder, locate the “tools” folder and type:

<PATH_TO_TOOLS_FOLDER>/protectionScore.py ./Output/DU_K562_HINTBC.bed ./Input/regions_mpbs.bb ./Input/DNase_chr22.bam ~/rgtdata/hg19/genome.fa Output/protection.txt

The output of the protection score script consists of a tab-separated table in which the first column contains the factor name and the second column contains the protection score. The higher the protection score, the more reliable the footprint predictions are. For more information on the input files and optional input parameters please type:

protectionScore.py -h
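Because the output is a two-column, tab-separated table (factor name, protection score), it is easy to rank factors by how reliable their footprint predictions are. The sketch below shows one way to do this; the factor names and scores are invented for illustration, and the header-skipping logic is a precaution rather than a statement about the actual file.

```python
import csv
import io


def read_protection_scores(handle):
    """Read a tab-separated table of (factor name, protection score) rows."""
    scores = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        try:
            scores[row[0]] = float(row[1])
        except ValueError:
            # Skip any row whose second column is not numeric, e.g. a header.
            continue
    return scores


# Hypothetical table; for real data use open("Output/protection.txt").
example_table = io.StringIO("CTCF\t5.2\nGATA1\t1.4\nSPI1\t3.3\n")
scores = read_protection_scores(example_table)

# Highest protection score first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for factor, score in ranked:
    print(factor, score)
```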

Further usage instructions and input parameters are found here.

Citation

If you use HINT/HINT-BC in your research, please cite the following publication:

Gusmao EG, Allhoff M, Zenke M and Costa IG. “Analysis of computational footprinting methods for DNase sequencing experiments”. Nature Methods, 13(4):303-309, 2016. [Full Text]

Bibtex:

@article{gusmao2016,
    author = {Gusmao, Eduardo G. and Allhoff, Manuel and Zenke, Martin and Costa, Ivan G.},
    citeulike-article-id = {13947931},
    citeulike-linkout-0 = {http://dx.doi.org/10.1038/nmeth.3772},
    citeulike-linkout-1 = {http://dx.doi.org/10.1038/nmeth.3772},
    day = {22},
    doi = {10.1038/nmeth.3772},
    journal = {Nat Meth},
    keywords = {dnase, footprinting},
    month = feb,
    posted-at = {2016-03-01 17:51:59},
    priority = {2},
    publisher = {Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.},
    title = {Analysis of computational footprinting methods for {DNase} sequencing experiments},
    url = {http://dx.doi.org/10.1038/nmeth.3772},
    volume = {13},
    number = {4},
    pages = {303--309},
    year = {2016}
}

You might also be interested in the original HINT method publication, which is:

Gusmao EG, Dieterich C, Zenke M and Costa IG. “Detection of active transcription factor binding sites with the combination of DNase hypersensitivity and histone modifications” Bioinformatics, 30(22):3143-3151, 2014. [Full Text]

Bibtex:

@article{gusmao2014,
    author = {Gusmao, Eduardo G. and Dieterich, Christoph and Zenke, Martin and Costa, Ivan G.},
    citeulike-article-id = {13340527},
    day = {15},
    doi = {10.1093/bioinformatics/btu519},
    issn = {1460-2059},
    journal = {Bioinformatics},
    keywords = {dnase1, histone\_marks, tfbs},
    month = nov,
    number = {22},
    pages = {3143--3151},
    pmid = {25086003},
    posted-at = {2014-08-29 05:42:07},
    priority = {2},
    publisher = {Oxford University Press},
    title = {Detection of active transcription factor binding sites with the combination of {DNase} hypersensitivity and histone modifications},
    url = {http://dx.doi.org/10.1093/bioinformatics/btu519},
    volume = {30},
    year = {2014}
}