The genome data of hg38 should be already configured (How to configure genome data).
Genomic region test requires one RNA sequence and a set of interesting target regions:
- RNA sequence: A file in FASTA format. For known lncRNAs, you can download their sequences from the UCSC Genome Table Browser.
- Target regions: A BED file, which contains the regions of interest.
To demonstrate the genomic region test, we use MEG3 as an example. In the example files (download here), you can find the folder “MEG3_hg38“. There are several files:
- MEG3_sequence.fa: RNA sequence of MEG3 in FASTA format.
- MEG3_hg38_CHOP.bed: Potential target regions of MEG3.
You can use the following command to see all the arguments related to the genomic region test:
Analysis by default setting
Given the RNA sequence and interested regions, it is easy to perform the test with the default arguments.
rgt-TDF regiontest -r MEG3_sequence.fa -bed MEG3_hg38_CHOP.bed -rn MEG3 -organism hg38 -o genomic_region_test/MEG3
where -r is the lncRNA sequence, -bed is the region of interest, -rn defines the lncRNA name, –organism defines the organism, and -o is the output directory.
Then, the result webpages and graphics are stored in genomic_region_test/MEG3/
You can simply open genomic_region_test/MEG3/index.html to see all the results and graphics. (see Demo here)
There are three main pages for each test.
- RNA page: Shows statistics and graphics of candidate DBDs.
- Target region page: Shows DBSs statistics for all target regions and rankings.
- Parameters page: Shows the parameters used by TDF.
The main page is the RNA, which will display the main statistics regarding the candidate DBDs. Here, you can find which DBD binds significantly to the target regions.
The second page shows the statistics of target regions potentially forming triplexes with a given RNA. It is possible to define the region with the highest binding possibility and its associated gene.
Advanced options for the genomic region test
We will describe a few relevant options of TDF. See the tool usage for description of all options.
How to change the number of randomization?
The default randomization processes is performed for 10,000 times. You can change it by the argument -n.
rgt-TDF regiontest -r MEG3_sequence.fa -bed MEG3_hg38_CHOP.bed -rn MEG3 -organism hg38 -o genomic_region_test/MEG3 -n 100
How to change minimum length and error tolerance of triple helix binding sites
When the predicted binding sites are too less to analyze, you can decrease the minimum length (-l) or increase the percentage error tolerance (-e) as follows:
rgt-TDF regiontest -r MEG3_sequence.fa -bed MEG3_hg38_CHOP.bed -rn MEG3 -organism hg38 -o genomic_region_test/MEG3 -n 100 -l 14 -e 20
How to output target regions and DBSs in BED format
Sometimes we want to output the interested regions in BED format for further investigation. TDF can output target promoters and TTSs on target promoters by adding an argument, -obed:
rgt-TDF regiontest -r MEG3_sequence.fa -bed MEG3_hg38_CHOP.bed -rn MEG3 -organism hg38 -o genomic_region_test/MEG3 -n 100 -obed
Three BED files will appear in the output directory:
- MEG3_target_region_dbs.bed: The target regions which have at least one TTS within it.
- MEG3_dbss.bed: All TTSs between lncRNA and target regions.
The file name MEG3 is defined by output directory name.
Remove temporary files
TDF generates some temporary files such as .fa and .txp. They can be removed in order to save memory usage by the argument, -rt:
rgt-TDF regiontest -r MEG3_sequence.fa -bed MEG3_hg38_CHOP.bed -rn MEG3 -organism hg38 -o genomic_region_test/MEG3 -n 100 -obed -rt