Description of Experimental Matrix

The experimental matrix is a basic input for many RGT tools. It is a tab-separated plain text file in which you can define:

  • Genomic Regions: Particular genomic regions of interest. Usually represented by BED files.
  • Aligned Reads: Derived from sequencing methods (such as RNA-seq and ChIP-seq). Usually represented by BAM files.
  • Gene Sets: Lists of genes of interest.

Header

The header (the first line of the file that is not a comment) defines each column. It has three mandatory elements (name, type and file), followed by any number of additional fields. Additional fields may be required by the tool being used, or may serve only to document the experiment for the user. The three mandatory elements must use these fixed names; additional fields may have any name.

The three mandatory elements represent:

  • name: Unique name for each subsequent file.
  • type: File type. Can be “regions” for genomic regions, “reads” for aligned reads or “genes” for gene sets.
  • file: The path to the file (relative or absolute).
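
The header rules above can be illustrated with a minimal Python sketch. It is not the parser used by RGT; the function name `parse_header` is hypothetical, and the sketch takes the "followed by" wording literally by requiring the three mandatory columns to come first, in that order. The real tools may be more lenient.

```python
# Hypothetical helper, not part of RGT: checks a header line against the
# rules described above (assumption: mandatory columns come first, in order).
MANDATORY = ["name", "type", "file"]


def parse_header(line):
    """Split a tab-separated header line and check the mandatory columns."""
    columns = line.rstrip("\n").split("\t")
    if columns[:3] != MANDATORY:
        raise ValueError(
            "header must start with the columns: " + ", ".join(MANDATORY)
        )
    # Everything after the mandatory columns is a free-form additional field.
    extra = columns[3:]
    return columns, extra


columns, extra = parse_header("name\ttype\tfile\tcell\tfactor\n")
print(columns)  # all column names
print(extra)    # only the additional fields
```

A header that omits or reorders the mandatory columns raises `ValueError` in this sketch.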

Files

After the header line, each line is interpreted as a file entry. Each column must contain a value matching the corresponding header field. Most tools use these file formats:

  • regions: BED format.
  • reads: BAM format.
  • genes: Plain text file containing one gene name per line. The gene names must match the gene symbols as can be obtained here.
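
As a rough illustration of how file entries relate to the header, the sketch below turns entry lines into dictionaries and checks the constraints stated in this document: one value per column, a type of "regions", "reads" or "genes", and unique names. The function `parse_entries` and the sample entries are illustrative assumptions, not RGT's own implementation.

```python
# Hypothetical helper, not part of RGT: validates entry lines against a header.
VALID_TYPES = {"regions", "reads", "genes"}


def parse_entries(lines, columns):
    """Turn tab-separated entry lines into dicts keyed by column name."""
    entries = []
    seen_names = set()
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(columns):
            raise ValueError(
                f"entry {number}: expected {len(columns)} fields, got {len(fields)}"
            )
        entry = dict(zip(columns, fields))
        if entry["type"] not in VALID_TYPES:
            raise ValueError(f"entry {number}: unknown type {entry['type']!r}")
        if entry["name"] in seen_names:
            raise ValueError(f"entry {number}: duplicate name {entry['name']!r}")
        seen_names.add(entry["name"])
        entries.append(entry)
    return entries


# Sample header and entries, one of each type.
columns = ["name", "type", "file", "cell", "factor"]
lines = [
    "K_GABP\tregions\tK562/gabp_peaks.bed\tK562\tGABP",
    "K_DNASE\treads\tK562/DNase.bam\tK562\tDNase",
    "K_UP_REG\tgenes\tK562/up_reg.txt\tK562\tup_reg",
]
entries = parse_entries(lines, columns)
for entry in entries:
    print(entry["name"], entry["type"], entry["file"])
```

The sketch does not open the referenced files, so it checks the matrix itself rather than the BED, BAM or gene list contents.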

Example One – Experimental Matrix from RGT Tutorial

Below is the experimental matrix used in the RGT Tutorial (for the RGT-viz tool):

# Header below with two additional fields
name type file cell factor
# First section -> CDP data
CDP_PU1 regions ./data/PU1_CDP_500.bed CDP PU1
PU1_CDP_WT reads ./data/CDP_PU1.bw CDP PU.1
H3K4me1_CDP reads ./data/CDP_WT_H3K4me1.bw CDP H3K4me1
H3K4me3_CDP_WT reads ./data/CDP_WT_H3K4me3.bw CDP H3K4me3
H3K27me3_CDP_WT reads ./data/CDP_WT_H3K27me3.bw CDP H3K27me3
# Second section -> cDC data
cDC_PU1 regions ./data/PU1_cDC_500.bed cDC PU1
PU1_cDC_WT reads ./data/cDC_PU1.bw cDC PU.1
H3K4me1_cDC reads ./data/cDC_WT_H3K4me1.bw cDC H3K4me1
H3K4me3_cDC_WT reads ./data/cDC_WT_H3K4me3.bw cDC H3K4me3
H3K27me3_cDC_WT reads ./data/cDC_WT_H3K27me3.bw cDC H3K27me3

All lines starting with ‘#’ are comments and are ignored by the experimental matrix parser. The above experimental matrix is divided into two sections:

  • First section: Contains the PU.1 regions (BED file) from cell type CDP in the first line below the section comment. The following lines represent ChIP-seq data (signal – BW file) from PU.1 and histone modifications on cell type CDP.
  • Second section: Contains the PU.1 regions (BED file) from cell type cDC in the first line below the section comment. The following lines represent ChIP-seq data (signal – BW file) from PU.1 and histone modifications on cell type cDC.
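
The comment handling and the two-section layout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `read_matrix` and `group_by` are hypothetical helpers rather than RGT functions, fields are joined with explicit tab characters, and only a subset of the tutorial entries is included to keep the example short.

```python
from collections import defaultdict

# A shortened copy of the tutorial matrix, with tab-separated fields.
MATRIX_TEXT = "\n".join([
    "# Header below with two additional fields",
    "name\ttype\tfile\tcell\tfactor",
    "# First section -> CDP data",
    "CDP_PU1\tregions\t./data/PU1_CDP_500.bed\tCDP\tPU1",
    "PU1_CDP_WT\treads\t./data/CDP_PU1.bw\tCDP\tPU.1",
    "# Second section -> cDC data",
    "cDC_PU1\tregions\t./data/PU1_cDC_500.bed\tcDC\tPU1",
    "PU1_cDC_WT\treads\t./data/cDC_PU1.bw\tcDC\tPU.1",
])


def read_matrix(text):
    """Drop comments and blank lines, then map each entry to its header."""
    rows = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    columns = rows[0].split("\t")  # first remaining line is the header
    return [dict(zip(columns, row.split("\t"))) for row in rows[1:]]


def group_by(entries, field):
    """Collect entry names under each distinct value of the given field."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[field]].append(entry["name"])
    return dict(groups)


entries = read_matrix(MATRIX_TEXT)
by_cell = group_by(entries, "cell")
for cell, names in by_cell.items():
    print(cell, names)
```

Grouping on the additional "cell" field recovers the two sections without relying on the section comments, which the parser discards.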

Example Two

Below is another example of an experimental matrix:

# Header below with two additional fields
name type file cell factor
# Regions
K_GABP regions K562/gabp_peaks.bed K562 GABP
K_GATA2 regions K562/gata2_peaks.bed K562 GATA2
K_MYC regions K562/myc_peaks.bed K562 MYC
H_GABP regions ESC/gabp_peaks.bed H1-hESC GABP
H_GATA2 regions ESC/gata2_peaks.bed H1-hESC GATA2
H_MYC regions ESC/myc_peaks.bed H1-hESC MYC
# Reads
K_DNASE reads K562/DNase.bam K562 DNase
K_H3K4ME1 reads K562/H3K4me1.bam K562 H3K4me1
K_H3K4ME3 reads K562/H3K4me3.bam K562 H3K4me3
H_DNASE reads ESC/DNase.bam H1-hESC DNase
H_H3K4ME1 reads ESC/H3K4me1.bam H1-hESC H3K4me1
H_H3K4ME3 reads ESC/H3K4me3.bam H1-hESC H3K4me3
# Genes
K_UP_REG genes K562/up_reg.txt K562 up_reg
K_DW_REG genes K562/down_reg.txt K562 dw_reg
H_UP_REG genes ESC/up_reg.txt H1-hESC up_reg
H_DW_REG genes ESC/down_reg.txt H1-hESC dw_reg

The experimental matrix above describes data for two cell types: K562 and H1-hESC. It lists regions enriched for the transcription factors GABP, GATA2 and MYC, aligned reads from DNase-seq and from ChIP-seq of the histone modifications H3K4me1 and H3K4me3, and lists of genes that are up- and down-regulated in these two cell types.

The organization of the experimental matrix and the accepted input types may differ between RGT tools. Please check the tool’s manual for more details.