Basic Introduction

 

Method

We provide a flexible framework for motif analyses. The following analyses are currently available:
Motif Matching: Matches transcription factor motifs from the main repositories (such as Jaspar, Uniprobe and Hocomoco) against a set of genomic regions of interest.
Motif Enrichment: Uses Fisher's exact test to check whether a set of genomic regions is enriched for the motifs of particular transcription factors.
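To illustrate what motif matching involves, here is a minimal Python sketch of position weight matrix (PWM) scanning. It assumes a toy count matrix and a fixed score threshold. The matrix, the functions pfm_to_log_odds and scan, and the threshold are hypothetical and not part of RGT. The tool itself relies on MOODS, which derives its thresholds differently, and a real scan would also cover the reverse-complement strand.

```python
import math

# Hypothetical toy position frequency matrix (counts per position).
# Real motifs would come from repositories such as Jaspar, Uniprobe or Hocomoco.
TOY_PFM = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]

def pfm_to_log_odds(pfm, background=None, pseudocount=0.5):
    """Convert a count matrix into a log-odds matrix (bits) against a background."""
    background = background or {b: 0.25 for b in "ACGT"}  # uniform background (assumption)
    pwm = []
    for column in pfm:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                    for b in "ACGT"})
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other ambiguous bases
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

pwm = pfm_to_log_odds(TOY_PFM)
print(scan("TTACGTGGACGTAA", pwm, threshold=4.0))  # hits at positions 2 and 8
```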

 

Installation

First, install the MOODS package, which our setup procedure cannot install automatically:

pip install --user https://github.com/fabioticconi/MOODS/tarball/pypi-ready

You can then install the RGT suite:

pip install RGT --user

You must now set up your RGT Data folder, unless you have already done so for another tool.

For Motif Analysis specifically, you also need to create the Weblogos (sequence logos) in the RGT Data folder; otherwise they will be missing from the enrichment output files. More details here, but the basic command, run from within the Data folder, is:

python setupLogoData.py --all

Further installation instructions, including installation without pip, are found here.

Tutorial

Full-site Test

Download here. Execute the following commands to perform motif matching followed by motif enrichment:

cd RGT_MotifAnalysis_FullSiteTest
rgt-motifanalysis --matching input/regions_K562.bed input/background.bed 
rgt-motifanalysis --enrichment input/background.bed input/regions_K562.bed

Promoter Test

Download here. Execute the following commands to perform motif matching followed by motif enrichment:

cd RGT_MotifAnalysis_PromoterTest
rgt-motifanalysis --matching --gene-list input/genes.txt input/background.bed 
rgt-motifanalysis --enrichment input/background.bed match/target_regions.bed

Gene association Test

Download here. Execute the following commands to perform motif matching followed by motif enrichment:

cd RGT_MotifAnalysis_GeneAssocTest
rgt-motifanalysis --matching --input-matrix input_matrix.txt --rand-proportion 10
rgt-motifanalysis --enrichment --input-matrix input_matrix.txt match/random_regions.bed

The matching command reads the experimental matrix, which specifies the paths to the genomic regions and the genes on which to perform the association test. It also creates a background of random regions whose number is 10 times the number of regions in the largest input region set. This step may take between 10 and 30 minutes; if it is taking too long, use --rand-proportion 1.
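The background-generation step can be sketched roughly as follows. This sketch assumes that regions are (chrom, start, end) tuples with 0-based, half-open coordinates, and that random regions reuse the lengths of the input regions. make_random_background is a hypothetical helper, not part of RGT. Unlike a production tool, it does not avoid assembly gaps or overlaps with the input regions.

```python
import random

def make_random_background(region_sets, chrom_sizes, proportion=10, seed=None):
    """Draw proportion x len(largest input set) random regions (hypothetical helper).

    region_sets: list of lists of (chrom, start, end) tuples, 0-based half-open.
    chrom_sizes: dict mapping chromosome name to its length in bp.
    Region lengths are resampled from the input regions (an assumption).
    """
    rng = random.Random(seed)
    target = proportion * max(len(regions) for regions in region_sets)
    lengths = [end - start for regions in region_sets for _, start, end in regions]
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]  # longer chromosomes receive more regions
    if max(weights) < min(lengths):
        raise ValueError("no chromosome is long enough for any input region")
    background = []
    while len(background) < target:
        length = rng.choice(lengths)
        chrom = rng.choices(chroms, weights=weights)[0]
        if chrom_sizes[chrom] < length:
            continue  # this length does not fit on this chromosome; draw again
        start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        background.append((chrom, start, start + length))
    return background

input_sets = [[("chr1", 100, 300), ("chr1", 500, 650)], [("chr2", 10, 60)]]
sizes = {"chr1": 10_000, "chr2": 5_000}
bg = make_random_background(input_sets, sizes, proportion=10, seed=1)
print(len(bg))  # 20: 10 x the 2 regions in the largest input set
```

Lowering proportion shrinks the background linearly, which is why a smaller value such as 1 shortens the run time.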

The enrichment command calculates the enrichment statistics for all the input regions against the random background. This step should take about 5 minutes to complete.
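The statistic behind motif enrichment can be sketched as a one-sided Fisher's exact test. It is applied to a 2x2 table that counts input and background regions with and without a match for a given motif. The table layout and the helper fisher_exact_greater are assumptions for illustration. The tool's actual counting scheme and any multiple-testing correction may differ. The p-value sums hypergeometric tail probabilities using math.comb (Python 3.8+).

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided (greater) Fisher's exact test p-value for a 2x2 table.

    Assumed layout for this sketch:
                     with motif   without motif
        input             a              b
        background        c              d
    """
    n = a + b + c + d
    n_input = a + b   # regions in the input set
    n_motif = a + c   # regions (input + background) with at least one motif match
    # P(X >= a), where X is hypergeometric: n_input draws from n regions,
    # of which n_motif are "successes".
    upper = min(n_input, n_motif)
    tail = sum(comb(n_motif, k) * comb(n - n_motif, n_input - k)
               for k in range(a, upper + 1))
    return tail / comb(n, n_input)

print(fisher_exact_greater(3, 1, 1, 3))  # 17/70, about 0.243
```

A small p-value indicates that the motif occurs in more input regions than expected from the background proportion.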

Further usage instructions are found here.