HINT (Hmm-based IdeNtification of Transcription factor footprints) is a framework that integrates full profiles of both DNase I hypersensitivity (DNase-seq assays) and histone modifications (ChIP-seq assays) to detect open chromatin regions and active binding sites, a task also known as digital DNA footprinting. Within transcription factor binding sites, there is a specific grammar of DNase I cleavage and histone mark patterns. We have therefore devised a multivariate Hidden Markov Model (HMM) that models this regulatory grammar by simultaneously analyzing the DNase-seq and histone modification ChIP-seq profiles on a genome-wide level.
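To make the idea of a multivariate HMM concrete, the following is a minimal sketch of Viterbi decoding over two signals per position (for example, one DNase-seq value and one histone ChIP-seq value). It is not HINT's implementation: the two states, their means and variances, the transition probabilities, and the function names `log_gaussian_diag` and `viterbi` are all assumptions made up for illustration, and the emissions are assumed to be Gaussian with diagonal covariance.

```python
import numpy as np


def log_gaussian_diag(obs, means, variances):
    """Log density of every observation under every state's diagonal Gaussian.

    obs has shape (T, D); means and variances have shape (K, D).
    Returns an array of shape (T, K).
    """
    diff = obs[:, None, :] - means[None, :, :]
    terms = np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]
    return -0.5 * terms.sum(axis=2)


def viterbi(obs, log_start, log_trans, means, variances):
    """Return the most likely state path for a multivariate Gaussian HMM."""
    log_emit = log_gaussian_diag(obs, means, variances)
    T, K = log_emit.shape
    score = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = log_start + log_emit[0]
    for t in range(1, T):
        # cand[i, j]: best score ending in state i at t-1, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(K)] + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(score[-1])
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path


# Toy, hypothetical parameters: state 0 = background, state 1 = footprint-like.
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
log_start = np.log(np.array([0.5, 0.5]))
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))

# Each row is one genomic position: (DNase-like signal, histone-like signal).
obs = np.array([[0.0, 0.0], [0.1, -0.1], [3.0, 3.0], [2.9, 3.1], [0.0, 0.0]])
path = viterbi(obs, log_start, log_trans, means, variances)
print(path.tolist())
```

Because both signals enter a single emission density, a state is chosen only when the DNase-like and histone-like values jointly fit it, which is the point of analyzing the profiles simultaneously rather than one at a time.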
First, the DNase-seq and ChIP-seq signals (provided as BAM files of aligned reads) are normalized using a within- and between-dataset approach. This normalization accounts for, among other factors, signal length and intensity, signal variability within and between datasets, and the amount of sequencing performed, and it minimizes HMM-related numerical problems. After the signals are normalized, the HMM is applied to user-defined genomic regions (given in a BED file). Good candidate regions are significant peaks of DNase-seq, histone modification ChIP-seq, or a combination of both. Finally, HINT reports the digital footprints it finds (regions likely to be bound by cis-acting regulatory factors) in a BED (or bigBed) file.
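The first and last steps of this workflow can be sketched in a few lines. The example below is illustrative only and does not reproduce HINT's code: `scale_by_depth` uses plain reads-per-million scaling as a stand-in for the within- and between-dataset normalization, and `path_to_bed` assumes a per-base state path in which state `1` marks a footprint. Both function names, the state label, and the sample coordinates are hypothetical.

```python
import numpy as np


def scale_by_depth(counts, total_reads):
    """Reads-per-million scaling (a simple stand-in, not HINT's normalization)."""
    return np.asarray(counts, dtype=float) * 1e6 / total_reads


def path_to_bed(chrom, region_start, path, footprint_state=1):
    """Turn runs of `footprint_state` in a per-base state path into BED rows.

    BED coordinates are 0-based and half-open: [start, end).
    """
    rows = []
    run_start = None
    for offset, state in enumerate(path):
        if state == footprint_state and run_start is None:
            run_start = offset  # a footprint run begins here
        elif state != footprint_state and run_start is not None:
            rows.append((chrom, region_start + run_start, region_start + offset))
            run_start = None
    if run_start is not None:
        # The run extends to the end of the region.
        rows.append((chrom, region_start + run_start, region_start + len(path)))
    return rows


scaled = scale_by_depth([2, 0, 4], total_reads=2_000_000)

# Hypothetical state path for a region starting at chr1:1000.
rows = path_to_bed("chr1", 1000, [0, 0, 1, 1, 1, 0, 1])
bed_text = "\n".join(f"{chrom}\t{start}\t{end}" for chrom, start, end in rows)
print(bed_text)
```

Each output row is tab-separated chromosome, start, and end, which is the minimal three-column BED layout.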
For more details on the method, please check our full paper. If you use this tool in your research we kindly ask you to cite the paper. For benchmarking data regarding our main publication please visit our lab’s website.
Benchmarking data includes:
- Footprints for our method and many competing methods.
- HMM models.
- Transcription factor binding sites obtained by motif matching (used to create the evaluation datasets).
- ChIP-seq peaks for the transcription factors tested (used to create the evaluation datasets).