Motif Analysis on PU.1 peaks

Here we will perform a motif enrichment analysis on the PU.1 peaks of the dendritic cell types CDP and cDC. By the end of this tutorial you should have a list of the putative transcription factors that are enriched in the DNA sequences around the PU.1 peaks of both cell types.

First, you need to install the Motif Analysis tool and the JASPAR/UNIPROBE logo graphs. We assume that you have obtained the mouse genome (mm9) using the RGT Genomic Data, as described in the previous tutorial. Furthermore, all commands in this tutorial should be executed within the example RGT-Viz folder from the previous tutorial.

The motif enrichment analysis consists of two parts. In the first part, we will perform a motif match in order to detect putative binding sites in the input regions and also in random genomic regions automatically generated by the tool. In the second part, the tool will count the number of putative binding sites inside the input and random regions and perform a statistical test to determine which transcription factors are more likely to be enriched.
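To make the idea of the second part concrete, the sketch below compares how often a motif hits the input regions versus the random regions. It is only an illustration of the comparison and is not the statistical test the tool runs: the function name hit_frequencies and the counts are hypothetical, and it assumes the comparison is made on the fraction of regions that contain at least one hit.

```shell
# Compare how often a motif hits input regions versus random regions.
# Arguments: input regions with a hit, input regions total,
#            random regions with a hit, random regions total.
hit_frequencies() {
    awk -v a="$1" -v n="$2" -v c="$3" -v m="$4" 'BEGIN {
        if (n == 0 || m == 0) { print "NA"; exit }
        fin = a / n          # frequency in the input regions
        fbg = c / m          # frequency in the random (background) regions
        if (fbg == 0) { printf "input=%.3f background=%.3f ratio=NA\n", fin, fbg; exit }
        printf "input=%.3f background=%.3f ratio=%.2f\n", fin, fbg, fin / fbg
    }'
}

# Hypothetical counts, for illustration only.
hit_frequencies 300 1000 200 2000
# prints: input=0.300 background=0.100 ratio=3.00
```

A motif whose frequency in the input regions is clearly higher than in the random regions is a candidate for enrichment; the statistical test then decides whether the difference is larger than expected by chance.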

Performing the motif matching

We are ready to perform the first step of the motif analysis, the motif matching. In this step we find putative binding sites of a number of transcription factors in the mouse genome (mm9). To perform the motif match, type the following command:

rgt-motifanalysis --matching --organism mm9 --rand-proportion 2 --input-matrix Matrix_CDP_cDC.txt

Let’s check each part of the above command:

  • rgt-motifanalysis --matching: This is the motif match command call. To see all the options available for the motif match analysis, type:
rgt-motifanalysis --matching -h
  • --organism mm9: Sets the organism being analyzed to mm9.
  • --rand-proportion 2: Since we plan to perform a motif enrichment after the matching, we also need putative binding sites in random regions. This option tells the tool to generate a set of random regions two times the size of the input region set and to match motifs in them as well. In real scenarios this proportion should be set to 10 or more for proper statistical accuracy.
  • --input-matrix Matrix_CDP_cDC.txt: The experimental matrix for this analysis. The motif matching tool only considers the “region” input files from the experimental matrix. For more information please refer to this manual.

After running the motif match command, the output is written to a folder named “match”. In this folder you will find the putative binding sites (MPBS; Motif-Predicted Binding Sites) for the two input region sets defined in the experimental matrix, as well as for the random regions.
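A quick sanity check on the matching output is to count the entries in each BED file of the “match” folder. The following is a minimal sketch: the helper names count_bed_regions and summarize_bed_dir are hypothetical, and the script only assumes that the output files carry a .bed extension, as match/random_regions.bed does.

```shell
# Count data lines in a BED file, skipping empty lines and
# "track", "browser" or "#" header lines.
count_bed_regions() {
    awk 'NF > 0 && $1 !~ /^(track|browser|#)/ { n++ } END { print n + 0 }' "$1"
}

# Print one "<file><TAB><count>" line per BED file in a directory (default: match).
summarize_bed_dir() {
    local dir="${1:-match}"
    local f
    for f in "$dir"/*.bed; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '%s\t%s\n' "$f" "$(count_bed_regions "$f")"
    done
}

# Only print the summary when the folder exists.
if [ -d match ]; then
    summarize_bed_dir match
fi
```

With --rand-proportion 2, the random regions file should contain roughly twice as many entries as the input region files combined or individually, depending on how the tool builds the background; an empty MPBS file usually points to a problem with the genome or motif data setup.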

Performing the motif enrichment

With the results of the motif matching, we can perform the motif enrichment. This analysis determines which transcription factors are enriched in our input regions. To perform the motif enrichment, type the following command:

rgt-motifanalysis --enrichment --organism mm9 --input-matrix Matrix_CDP_cDC.txt match/random_regions.bed

Let’s check each part of the above command:

  • rgt-motifanalysis --enrichment: This is the motif enrichment command call. To see all the options available for the motif enrichment analysis, type:
rgt-motifanalysis --enrichment -h
  • --organism mm9: Sets the organism being analyzed to mm9.
  • --input-matrix Matrix_CDP_cDC.txt: The experimental matrix containing the input files, as in the motif matching step.
  • match/random_regions.bed: The only required positional argument is the path to the background file used for enrichment, in this case the random regions file created in the previous step. Only the regions file should be passed; the tool finds the corresponding MPBS file automatically.
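Because the enrichment depends on files written by the matching step, it can be convenient to chain the two commands so that the second only runs if the first succeeds. The wrapper below is a sketch under that assumption: run_motif_pipeline and RGT_CMD are hypothetical names introduced here, while the flags are exactly those used in this tutorial.

```shell
# Command to invoke; overridable (for example RGT_CMD=echo for a dry run).
RGT_CMD="${RGT_CMD:-rgt-motifanalysis}"

# Run matching, then enrichment, stopping at the first failure.
# Arguments: experimental matrix, organism, random proportion.
run_motif_pipeline() {
    local matrix="$1" organism="$2" proportion="$3"
    # Step 1: motif matching, which also generates the random background regions.
    "$RGT_CMD" --matching --organism "$organism" \
        --rand-proportion "$proportion" --input-matrix "$matrix" || return 1
    # Step 2: enrichment against the random regions written by step 1.
    "$RGT_CMD" --enrichment --organism "$organism" \
        --input-matrix "$matrix" match/random_regions.bed || return 1
}

# Usage with the values from this tutorial:
# run_motif_pipeline Matrix_CDP_cDC.txt mm9 2
```

Setting RGT_CMD=echo before calling the function prints the two command lines instead of running them, which is a cheap way to check the arguments.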

Results of the motif enrichment

The motif enrichment tool outputs files inside the following folder structure: <output_folder>/<name_of_region>. In our example, the results will be inside: <example_RGT-Viz_folder>/cDC_PU1 and <example_RGT-Viz_folder>/CDP_PU1.

Let’s explore the motif enrichment output in the cDC_PU1 folder:

  • mpbs_ev.bed: Contains all the putative binding sites found inside the PU.1 peaks of the cDC cell type.
  • fulltest_statistics.html and fulltest_statistics.txt: Contain the results of the analysis in HTML and txt (tab-separated) format. The HTML table looks like this:

[Figure: RGT_MotifEnrichment_Figure-1]

You can observe that, as expected, PU.1 motifs are ranked at the top (UP00085_1_Sfpi1_primary and MA0080.3.Spi1). Other transcription factors are also enriched in these regions, such as Erg, IRF1, Ets1, ETF1, EHF and many others. These transcription factors are putative co-binding or regulatory partners of PU.1 and may be connected with PU.1 in its regulatory network within the cDC cell type.
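The same ranking can be inspected from the command line using the tab-separated file. The sketch below assumes that the first line of fulltest_statistics.txt is a header row and makes no assumption about the column names; top_rows and motif_rows are hypothetical helper names.

```shell
# Print the header line plus the first N data rows of a tab-separated file.
# Assumes the first line of the file is a header row.
top_rows() {
    local file="$1" n="${2:-10}"
    head -n "$((n + 1))" "$file"
}

# Print the rows that mention a motif name, case-insensitively.
motif_rows() {
    local file="$1" pattern="$2"
    grep -i -- "$pattern" "$file"
}

stats=cDC_PU1/fulltest_statistics.txt
if [ -f "$stats" ]; then
    top_rows "$stats" 5
    motif_rows "$stats" 'Spi1' || echo "no rows matching Spi1"
fi
```

Running the same two calls on CDP_PU1/fulltest_statistics.txt allows a side-by-side comparison of the top-ranked motifs in both cell types.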

For more information on the motif analysis, please refer to the tool section on this website.