we will use ATAC-seq data obtained from Lara-Astiaso et al 2014 for footprinting and differential analysis. We have performed low-level analysis steps including reads alignment and peaks calling. The sequencing data can be found here.
1. Download here and extract the pre-processed data:
tar xvfz Blood_Mus.tar.gz cd Blood_Mus
2. Execute the following commands to call footprints for CD4 and Lsk cells:
rgt-hint footprinting --atac-seq --organism=mm10 ./CD4/ATAC.bam ./CD4/ATACPeaks.bed --output-prefix=CD4Footprints rgt-hint footprinting --atac-seq --organism=mm10 ./Lsk/ATAC.bam ./Lsk/ATACPeaks.bed --output-prefix=LskFootprints
Each command will generate an output file in the current folder, containing the genomic locations of the footprints. HINT also produces a file with ending “.info”, which has general statistics from the analysis as no. of footprints, the total number of reads and so on.
We can use IGV to vizualise ATAC-seq signals and footprint predictions in particular loci. First, we can use a special HINT command to generate genomic profiles (bigWig files).
rgt-hint tracks --bc --bigWig --organism=mm10 ./CD4/ATAC.bam ./CD4/ATACPeaks.bed --output-prefix=CD4 rgt-hint tracks --bc --bigWig --organism=mm10 ./Lsk/ATAC.bam ./Lsk/ATACPeaks.bed --output-prefix=Lsk
This bigwig file contains the number of ATAC-seq (or DNAse-seq) reads at each genomic position as estimated by HINT after signal normalization and cleavage bias correction. This is, therefore, more accurate than simply looking a coverage profiles of a bam file.
Open all bigwig and footprint files (bed) generated above in IGV. Remember to set the genome version to mm10 beforehand. Check for example the genomic profiles around the gene Zap70 in the below, which is part of T cell receptor. We observe that this gene has several open chromatin regions for these two cell types. Moreover, CD4 T cells show higher level of chromatin accessibility in the start region and around around exon 3 of gene Zap70 than Lsk cells.
3. An important question when doing footprint analysis is to evaluate which TF motifs overlap with footprints and evaluate the ATAC-seq profiles around these motifs. RGT suite also offers a tool for finding motif predicted binding sites (mpbs). For example, we analyze here motifs from factors SPI1 and ELK4, which were discussed in Lara-Astiaso et al. 2014 to be associated respectively associated with LSK and CD4 cells.
Execute the following commands to do motif matching for footprints using motifs from HOCOMOCO:
rgt-motifanalysis matching --organism=mm10 --input-files CD4Footprints.bed LskFootprints.bed
The above commands will generate a BED file for each input file, containing the matched motif instances for each footprint region. The 4th column contains the motif name and the 5th column the bit-score of the motif match. Check here for more details about motif matching.
4. Finally, we use HINT to generate average ATAC-seq profiles around binding sites of particular TF. This analysis allows us to inspect the chromatin accessibility and the underlying sequence. Moreover, by comparing the cut profiles from two ATAC-seq libraries (i.s. LSK vs T CD4 cells ), we can get insights on changes in binding in two cells. For this, execute the following commands:
rgt-hint differential --organism=mm10 --mpbs-file1=./match/LskFootprints_mpbs.bed --mpbs-file2=./match/CD4Footprints_mpbs.bed --reads-file1=./Lsk/ATAC.bam --reads-file2=./CD4/ATAC.bam --condition1=Lsk --condition2=CD4
The above command will make a new folder ‘Lsk_CD4’ and output the ATAC-seq profile for each of the motifs founds in the provided mpbs bed files as shown in the below. Let’s check the profiles in the comparison LSK and CD4, you will see that ELK4 has the higher number of ATAC-seq counts in CD4 cells, while SPI1 has more ATAC-seq in LSK cells. Higher ATAC counts indicate the higher activity of the factor in that particular cell. This fits with the results discussed in Lara-Astiaso that SPI1 is more relevant/active in LSK, while ELK4 in CD4 cells.