HINT can be executed with the following command:
rgt-hint [options] <experiment_matrix>
<experiment_matrix>: Required input for the program. It describes the input regions and aligned reads.
[options]: Additional input files, paths, parameters or output options.
|Option||Type||Default||Description|
|<experiment_matrix>||File||None||The experiment matrix should contain one region file in which the footprinting will be performed, one DNase-seq BAM file and one or more histone modification BAM file(s) per group. See the RGT experiment matrix format for more details.|
|--hmm-file||FILE_1_1[[,…,FILE_N_1];…;FILE_1_M[,…,FILE_N_M]]||Default HMM||List of HMM files separated by commas. If only one file is given, that HMM is applied to all histone signals; otherwise, the list must contain as many files as there are histone files. The order of the list should match the order of the histones in the experiment matrix file. If the argument is not given, a default HMM is used. If multiple input groups are used, further lists can be passed, separated by semicolons. The number of lists should equal the number of input groups.|
|--bias-table||FILE1_F,FILE1_R[;…;FILEM_F,FILEM_R]||Default bias tables||List of files (one set per input group, separated by semicolons) with all possible k-mers (for any k) and their bias estimates. Each input group should have two files: one for the forward and one for the reverse (negative) strand. Each line should contain a k-mer and its bias estimate, separated by a tab. Leave an empty set for histone-only groups, e.g. FILE1;;FILE3.|
|--organism||String||hg19||Describes the organism in which the analysis is being performed. All default files, such as genomes, will be based on the chosen organism and the data.config file. See the rgtdata and data.config documentation for more information. This option is used only if bigBed output is requested.|
|--estimate-bias-correction||Boolean||False||Applies DNase-seq cleavage bias correction with k-mer bias estimated from the given DNase-seq data (SLOW HINT-BC).|
|--default-bias-correction||Boolean||False||Applies DNase-seq cleavage bias correction with default k-mer bias estimates (FAST HINT-BC).|
|--output-location||Path||<input_path>||Path where the output files will be written.|
|--footprint-name||String||footprints||Name of the footprint (result) file (without extension).|
|--print-bb||Boolean||False||If used, the output (footprints) will be a bigBed (.bb) file.|
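The separator rules for list-valued options (commas within a group, semicolons between groups, an empty set for a skipped group) are easy to get wrong by hand. The following Python sketch assembles an argument list from nested lists. It is only an illustration: the helper names and all file names are hypothetical, long options are written with two leading hyphens, and the Boolean options are assumed to be bare switches that take no value.

```python
import shlex


def join_groups(groups):
    """Join items within a group by ',' and groups by ';'.

    An empty group yields an empty set, e.g. 'A,B;;C,D'.
    """
    return ";".join(",".join(files) for files in groups)


def build_hint_command(matrix, hmm_groups=None, bias_groups=None,
                       default_bias_correction=False,
                       output_location=None, footprint_name=None):
    """Return the argv list for an rgt-hint call (nothing is executed)."""
    argv = ["rgt-hint"]
    if hmm_groups:
        argv += ["--hmm-file", join_groups(hmm_groups)]
    if bias_groups:
        argv += ["--bias-table", join_groups(bias_groups)]
    if default_bias_correction:
        # Assumption: Boolean options are flags without a value.
        argv.append("--default-bias-correction")
    if output_location:
        argv += ["--output-location", output_location]
    if footprint_name:
        argv += ["--footprint-name", footprint_name]
    argv.append(matrix)  # the experiment matrix is the positional argument
    return argv


# Hypothetical file names: three input groups, the second one histone-only.
cmd = build_hint_command(
    "experiment_matrix.txt",
    bias_groups=[["g1_F.txt", "g1_R.txt"], [], ["g3_F.txt", "g3_R.txt"]],
    output_location="./Output",
)
# shlex.join quotes the semicolon-separated value for safe use in a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run` avoids shell quoting altogether. When typing the command directly in a shell, values containing semicolons need to be quoted.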
Special Input File Formats
The experiment matrix should contain one region (BED) file in which the footprinting will be performed, one DNase-seq BAM file and one or more histone modification BAM file(s) per group. Each group is defined by the last column (“group”) of the experiment matrix. Below is an example of a standard experiment matrix for finding footprints inside hypersensitivity (HS) regions using (analysis 1) DNase, H3K4me1 and H3K4me3 aligned read (BAM) files and (analysis 2) DNase-seq only, in the same set of regions. The “data” column of the experiment matrix must be set to one of the following: “HS” (for the regions in which the footprinting will be performed), “DNASE” (for DNase-seq data) or “HISTONE” (for histone modification data).
name     type     file                       data     group
HS1      regions  ./Input/regions.bed        HS       FP1
DNase    reads    ./Input/DNase_chr22.bam    DNASE    FP1
H3K4me1  reads    ./Input/H3K4me1_chr22.bam  HISTONE  FP1
H3K4me3  reads    ./Input/H3K4me3_chr22.bam  HISTONE  FP1
HS2      regions  ./Input/regions.bed        HS       FP2
DNase    reads    ./Input/DNase_chr22.bam    DNASE    FP2
Learn more about Experiment Matrix Format.
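To make the role of the “data” and “group” columns concrete, here is a minimal Python sketch that reads the example matrix above and collects the files of each group by data type. It assumes whitespace-separated columns with a header row. The function name is hypothetical and the code is not part of RGT.

```python
from collections import defaultdict

VALID_DATA = {"HS", "DNASE", "HISTONE"}

EXAMPLE_MATRIX = """\
name type file data group
HS1 regions ./Input/regions.bed HS FP1
DNase reads ./Input/DNase_chr22.bam DNASE FP1
H3K4me1 reads ./Input/H3K4me1_chr22.bam HISTONE FP1
H3K4me3 reads ./Input/H3K4me3_chr22.bam HISTONE FP1
HS2 regions ./Input/regions.bed HS FP2
DNase reads ./Input/DNase_chr22.bam DNASE FP2
"""


def read_experiment_matrix(text):
    """Group the files of an experiment matrix by group and data type."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    groups = defaultdict(lambda: {"HS": [], "DNASE": [], "HISTONE": []})
    for fields in body:
        record = dict(zip(header, fields))
        if record["data"] not in VALID_DATA:
            raise ValueError(f"unexpected data value: {record['data']!r}")
        groups[record["group"]][record["data"]].append(record["file"])
    return dict(groups)


groups = read_experiment_matrix(EXAMPLE_MATRIX)
for name, files in groups.items():
    print(name, files)
```

In this example, group FP1 carries two histone files and group FP2 carries none, which corresponds to the two analyses described above.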
Each HMM file (with extension .hmm) describes a 4-dimensional HMM over the normalized and slope signals of the DNase and histone modification data.
A valid HMM file follows these rules:
- The first line describes the number of states.
- The next two lines describe the initial probabilities, given as a space-separated list of numbers ordered by the HMM states (see order below).
- The lines from the one containing “transitions” up to the one containing “emissions” contain the transition matrix. Each line holds a space-separated list of probabilities of going from the state represented by that line (first matrix line = first state, etc.; see order below) to the state represented by each column.
- The following lines starting at the line containing “emissions” until the end of the file contain the emission probabilities. Each line represents the emissions of a particular state given the order of states (see below). Within a line, the numbers before the # symbol represent the signal distribution’s means in a certain order of signals (see order below). The numbers after the # symbol represent a vectorized form (by row) of the covariance matrix given all the signals. Rows and columns of the covariance matrix also follow the order of the signals presented below.
- The order of the states in the file above is always:
  - DH-HMM: BACKGROUND – UP(H) – TOP(H) – DOWN(H) – UP(D) – TOP(D) – DOWN(D) – FOOTPRINT
- The order of the signals in the file above is always:
  - DH-HMM: DNase normalized – DNase slope – Histone normalized – Histone slope.
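The rules above can be sketched as a small Python parser. It is an illustration only, not the RGT loader. It assumes that the number of states is the last token of the first line, that the lines containing “transitions” and “emissions” are label lines without numbers, and that non-numeric tokens before the transition matrix can be ignored. The toy input is a made-up 2-state, 2-signal file that only exercises the layout. A real DH-HMM file has 8 states and 4 signals in the order listed above.

```python
DH_HMM_STATES = ["BACKGROUND", "UP(H)", "TOP(H)", "DOWN(H)",
                 "UP(D)", "TOP(D)", "DOWN(D)", "FOOTPRINT"]
DH_HMM_SIGNALS = ["DNase normalized", "DNase slope",
                  "Histone normalized", "Histone slope"]


def _floats(tokens):
    """Convert tokens to floats, skipping labels that are not numbers."""
    values = []
    for token in tokens:
        try:
            values.append(float(token))
        except ValueError:
            pass
    return values


def parse_hmm(text):
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    n_states = int(lines[0].split()[-1])
    t_idx = next(i for i, line in enumerate(lines) if "transitions" in line)
    e_idx = next(i for i, line in enumerate(lines) if "emissions" in line)

    # Initial probabilities: numbers between the first line and "transitions".
    initial = _floats(tok for line in lines[1:t_idx] for tok in line.split())
    # Transition matrix: one row per state.
    transitions = [_floats(line.split()) for line in lines[t_idx + 1:e_idx]]

    emissions = []
    for line in lines[e_idx + 1:]:
        left, right = line.split("#", 1)
        means = _floats(left.split())
        flat = _floats(right.split())
        dim = len(means)
        # The covariance matrix is stored vectorized by row.
        cov = [flat[r * dim:(r + 1) * dim] for r in range(dim)]
        emissions.append({"means": means, "cov": cov})

    if not (len(initial) == len(transitions) == len(emissions) == n_states):
        raise ValueError("section sizes do not match the number of states")
    return {"n_states": n_states, "initial": initial,
            "transitions": transitions, "emissions": emissions}


# Made-up toy input (2 states, 2 signals); not a trained model.
TOY_HMM = """\
states 2
initial
1.0 0.0
transitions
0.9 0.1
0.2 0.8
emissions
0.0 1.0 # 1.0 0.0 0.0 1.0
2.0 3.0 # 2.0 0.5 0.5 2.0
"""

model = parse_hmm(TOY_HMM)
print(model["transitions"])
```

For a real DH-HMM file, row `i` of the transition matrix and emission `i` would correspond to `DH_HMM_STATES[i]`, and entry `j` of each mean vector to `DH_HMM_SIGNALS[j]`.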
An example of an HMM file trained using DNase+H3K4me3 in cell type K562 can be seen below:
HINT outputs a BED file (or a bigBed file, if requested by the user) containing all the footprints found within the queried regions.
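For downstream use, the footprint file can be read like any other BED file. The Python sketch below assumes only the three standard leading BED columns (chromosome, start, end) and ignores any further columns, since their content is not specified here. The coordinates are made up for illustration.

```python
def read_footprints(text):
    """Return (chrom, start, end) tuples from BED-formatted text."""
    footprints = []
    for line in text.splitlines():
        # Skip blank lines and optional BED header or comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        footprints.append((chrom, int(start), int(end)))
    return footprints


# Made-up coordinates, not real HINT output.
toy_bed = "chr22\t100\t120\nchr22\t300\t315\n"
fps = read_footprints(toy_bed)
lengths = [end - start for _, start, end in fps]
print(fps, lengths)
```

With the default option values, the text would come from a file named `footprints.bed` in the output location.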