FIRE tutorial

Installing and using FIRE: a basic tutorial

Installation

The current FIRE implementation only works in a Unix environment (including Cygwin in Windows; see the required Cygwin packages). If you encounter a bug in the FIRE package, please contact us. Other people have done it before, and we have always been able to work out a solution.

First, get access to the source code by registering at this address. Download the code (FIRE-x.x.zip) to your machine. If you want, you can use wget to get the code:

wget LINK (LINK is the link to the FIRE-x.x.zip file provided after registration)

Unzip the .zip file using unzip (files will be unzipped in a FIRE-x.x/ directory)

unzip FIRE-x.x.zip

Then, change directory to FIRE-x.x/ and run the configuration tool:

cd FIRE-x.x
./configure

If './configure' complains that PCRE is not installed, the simplest thing to do is to install the PCRE library we provide, using:

sh install-pcre.sh

You can rerun './configure' to make sure that the PCRE library is now available. If everything looks OK, you can compile FIRE using make:

make

Examine the output of make to make sure that there were no compilation errors (warnings can be ignored).

Create a FIREDIR environment variable that points to where FIRE is installed (i.e., the current directory). On many Unix platforms, the command will be something like this:

export FIREDIR=/path/to/FIRE-x.x

If you haven't changed directory, a simple way to set FIREDIR is to use `pwd`

export FIREDIR=`pwd`

IMPORTANT: In the command line above, make sure that you include the backquotes (`) around pwd. Also, the export command is not always available; in some shells (e.g., csh or tcsh), it is replaced by setenv:

setenv FIREDIR /path/to/FIRE-x.x

or

setenv FIREDIR `pwd`
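Before going further, it can help to confirm that FIREDIR is set and points at the installation. The following is a minimal sketch for sh-compatible shells only (not csh or tcsh). The function name check_firedir is hypothetical and not part of FIRE, and the check assumes that fire.pl sits at the top level of the FIRE-x.x/ directory.

```shell
# check_firedir: hypothetical helper, not part of the FIRE distribution.
# Succeeds only if FIREDIR is set and the directory it names contains fire.pl
# (assumed here to live at the top level of the FIRE-x.x/ directory).
check_firedir() {
    if [ -z "${FIREDIR:-}" ]; then
        echo "FIREDIR is not set" >&2
        return 1
    fi
    if [ ! -f "$FIREDIR/fire.pl" ]; then
        echo "fire.pl not found in $FIREDIR" >&2
        return 1
    fi
    echo "FIREDIR looks fine: $FIREDIR"
}
```

A typical use would be: check_firedir && cd "$FIREDIR"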

Download the pre-packaged sequence file(s) for your organism(s), then simply unzip the file in the FIRE-x.x/ directory.

For example:

wget https://tavazoielab.c2b2.columbia.edu/FIRE/yeast_data.zip
unzip yeast_data.zip

The above commands will create a FIRE_DATA directory containing a YEAST subdirectory, and put the relevant files in subdirectories. Of course, you don't need to download the files for every species, just the ones you want to work with. Assuming everything went smoothly, FIRE can now be used.

Using FIRE

The current implementation of FIRE is meant to be executed from the directory where all the scripts reside (the FIRE-x.x/ directory if you've followed the instructions above).

The basic command line syntax for FIRE is:

perl fire.pl --expfile=<inp> --species=<sp> --exptype=<type>

where <inp> is the input expression profile, <sp> is the species, and <type> indicates whether the expression profile is discrete (e.g., cluster indices) or continuous (e.g., expression values obtained from a single microarray experiment). If the expression values are discrete, please ensure that they start from 0 and include every integer value from 0 to n.
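The requirement on discrete values (start at 0, no gaps up to n) can be checked before running FIRE. The sketch below is one way to do it with awk. The expression file layout is not spelled out here, so the sketch assumes a tab-delimited file with one header line, identifiers in column 1 and cluster indices in column 2; adjust it if your file differs. The function name check_discrete_bins is hypothetical and not part of FIRE.

```shell
# check_discrete_bins: hypothetical helper, not part of FIRE.
# Assumes a tab-delimited expression file with one header line,
# identifiers in column 1 and integer cluster indices in column 2.
# Prints "OK" if the indices cover 0..n without gaps, else lists the problems.
check_discrete_bins() {
    awk -F'\t' '
        NR == 1 { next }   # skip the assumed header line
        $2 !~ /^[0-9]+$/ { printf "line %d: not a non-negative integer: %s\n", NR, $2; bad = 1; next }
        { seen[$2 + 0] = 1; if ($2 + 0 > max) max = $2 + 0 }
        END {
            for (i = 0; i <= max; i++)
                if (!(i in seen)) { printf "missing cluster index %d\n", i; bad = 1 }
            if (!bad) print "OK"
            exit bad + 0
        }
    ' "$1"
}
```

For example, check_discrete_bins yeast_gasch_IclustPos.txt would report any cluster index that is skipped between 0 and the largest index found.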

The species name can be chosen from: yeast, pombe, bayanus, calbicans, malaria, worm, drosophila, arabidopsis, ciona, human, mouse, rat, chicken, ecoli_tu.

For example, the following command line will reproduce our results for the yeast stress clustering partition:

perl fire.pl --expfile=yeast_gasch_IclustPos.txt --species=yeast --exptype=discrete

The following command line will reproduce our results for the P. falciparum IDC phase analysis:

perl fire.pl --expfile=malaria_phase.txt --species=malaria --exptype=continuous --sortmotifsbyphase=1

The corresponding pre-packaged sequence files must of course have been downloaded (see above).

The fire.pl script automatically stores the output files in a new directory named by appending _FIRE to the expression profile file name. In this directory, it creates three sub-directories:

yeast_gasch_IclustPos.txt_FIRE/DNA
yeast_gasch_IclustPos.txt_FIRE/RNA
yeast_gasch_IclustPos.txt_FIRE/DNA_RNA
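After a run, the naming rule above makes it easy to verify that the expected result directories exist. The following is a small sketch; check_fire_output is a hypothetical helper name, not a FIRE script.

```shell
# check_fire_output: hypothetical helper, not part of FIRE.
# Given the expression file name, reports which of the three result
# sub-directories (DNA, RNA, DNA_RNA) exist under <expfile>_FIRE.
check_fire_output() {
    out="${1}_FIRE"
    status=0
    for sub in DNA RNA DNA_RNA; do
        if [ -d "$out/$sub" ]; then
            echo "found   $out/$sub"
        else
            echo "missing $out/$sub"
            status=1
        fi
    done
    return $status
}
```

For the yeast example, the call would be: check_fire_output yeast_gasch_IclustPos.txt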

A convenient way to browse the results is to execute the makeresultindex.pl script on the expression file:

perl MORESCRIPTS/makeresultindex.pl yeast_gasch_IclustPos.txt "Clustered stress microarray dataset (Gasch et al, 2000)"

This script creates an HTML file named index.htm in the _FIRE directory (yeast_gasch_IclustPos.txt_FIRE/index.htm). Simply point your web browser to that file (it should look like this one)

Otherwise, the following files are most interesting (in yeast_gasch_IclustPos.txt_FIRE):

DNA_RNA/yeast_gasch_IclustPos.txt.summary.pdf (or .eps): P-value heatmap combining DNA and RNA motifs (main figure). The log10 p-values (enrichment if positive, depletion if negative) can be found in DNA_RNA/yeast_gasch_IclustPos.txt.matrix

DNA_RNA/yeast_gasch_IclustPos.txt.fullmimatrix.pdf (or .eps): Interaction heatmap showing modules of co-occurring motifs

.pdf/.eps files in DNA_RNA/yeast_gasch_IclustPos.txt.summary_OUT/: Motif maps

.pdf/.eps files in DNA_RNA/yeast_gasch_IclustPos.txt.mimatrix_OUT/: Combined motif maps for co-localizing motifs

DNA/yeast_gasch_IclustPos.txt.signif.motifs.rep: IMPORTANT: In this file, the first column indicates the motif and the second column contains the bin/cluster number. The next-to-last column contains the fraction of genes that have the motif in that bin/cluster, and the last column indicates the number of genes in that bin/cluster.

DNA/yeast_gasch_IclustPos.txt.GOmotifs: Best GO enrichments for DNA motifs

DNA/yeast_gasch_IclustPos.txt.GOmotifs.full: All GO enrichments for DNA motifs

DNA/yeast_gasch_IclustPos.txt.motifreport: All motif occurrences, sorted by putative functionality (see paper)

RNA/yeast_gasch_IclustPos.txt.signif.motifs.rep: IMPORTANT: In this file, the first column indicates the motif and the second column contains the bin/cluster number. The next-to-last column contains the fraction of genes that have the motif in that bin/cluster, and the last column indicates the number of genes in that bin/cluster.

RNA/yeast_gasch_IclustPos.txt.GOmotifs: Best GO enrichments for RNA motifs

RNA/yeast_gasch_IclustPos.txt.GOmotifs.full: All GO enrichments for RNA motifs

RNA/yeast_gasch_IclustPos.txt.motifreport: All motif occurrences, sorted by putative functionality (see paper)
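The column layout described for the .signif.motifs.rep files lends itself to a quick extraction with awk. The sketch below pulls out the four documented columns. It assumes the file is tab-delimited, which is not stated above, and the function name motif_fractions is hypothetical.

```shell
# motif_fractions: hypothetical helper, not part of FIRE.
# Assumes the .signif.motifs.rep file is tab-delimited. Column positions
# follow the description of the file: motif (1st), bin/cluster (2nd),
# fraction of genes with the motif (next to last), genes in the bin (last).
# Output: motif, bin, fraction and gene count, separated by tabs.
motif_fractions() {
    awk -F'\t' 'NF >= 4 { printf "%s\t%s\t%s\t%s\n", $1, $2, $(NF-1), $NF }' "$1"
}
```

For the yeast example, the call would be: motif_fractions yeast_gasch_IclustPos.txt_FIRE/DNA/yeast_gasch_IclustPos.txt.signif.motifs.rep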

Finally, here are some useful options to the fire.pl script:

--jn_t=INT: integer between 0 and 10; defines the robustness index threshold (default is 6)

--k=INT: defines the length of the k-mer seeds (default is 7)

--doskipdiscovery=1 --motiffile_dna=FILE --motiffile_rna=FILE: uses pre-specified lists of DNA and RNA motifs (regular expressions), instead of all k-mers

--submit=1: submits the job to the grid (DNA and RNA analyses are executed in parallel)

--expfiles="*.txt": processes all .txt expression files in the current directory

--domisearch=0: skips the motif discovery part and redoes all other steps (useful to generate new figures with a different robustness threshold, in combination with --jn_t=INT)

--sortmotifbyphase=1: skips module discovery and sorts motifs by phase (useful only for certain continuous expression profiles)

--dodna=0 --dorna=0 --dodnarna=0: skips the DNA, RNA or DNA/RNA combination phase (useful to redo only parts of the analysis)
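As an illustration of combining --domisearch=0 with --jn_t, the sketch below prints (without running) one fire.pl command per robustness threshold, so the commands can be reviewed first. The function name print_threshold_reruns is hypothetical; only flags listed above are used. Whether a rerun replaces figures from an earlier run is not covered here, so consider copying results you want to keep.

```shell
# print_threshold_reruns: hypothetical helper, not part of FIRE.
# Usage: print_threshold_reruns EXPFILE SPECIES EXPTYPE THRESHOLD...
# Prints (does not run) one fire.pl command per robustness threshold,
# each skipping motif discovery via --domisearch=0.
print_threshold_reruns() {
    expfile=$1; species=$2; exptype=$3
    shift 3
    for t in "$@"; do
        echo "perl fire.pl --expfile=$expfile --species=$species --exptype=$exptype --domisearch=0 --jn_t=$t"
    done
}
```

For example, print_threshold_reruns yeast_gasch_IclustPos.txt yeast discrete 4 8 prints two commands, which can then be run one at a time from the FIRE directory.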

Checking the format of FIRE input files

If FIRE outputs a lot of error messages, or does not behave as expected, one possibility is that your input files are not in the right format. We provide a simple script to check FIRE input files:

perl TOOLS/FIRE_analyse_input_files.pl -species yeast -expfile FIRE_DATA/YEAST/EXPFILES/yeast_gasch_IclustPos.txt

Output:
---------------------------
Checking the expression file
Expression file is OK.

Checking the fasta file
Fasta file is OK.

Found fasta sequence for 6110 / 6152 identifiers in expression file.
---------------------------

Note that you can also check your own fasta file using -fastafile FILE instead of -species TXT.
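The last line of the report shows how many identifiers in the expression file have a matching sequence. The sketch below turns that line into a percentage. It relies only on the "Found fasta sequence for X / Y identifiers" wording shown in the sample output, and it assumes the checker writes its report to standard output. The function name fasta_coverage is hypothetical.

```shell
# fasta_coverage: hypothetical helper, not part of FIRE.
# Reads the checker's report on standard input and prints the percentage of
# expression-file identifiers that have a fasta sequence, based on the
# "Found fasta sequence for X / Y identifiers" line of the sample output.
fasta_coverage() {
    awk '/^Found fasta sequence for/ {
        found = $5; total = $7
        if (total > 0) printf "%.1f\n", 100 * found / total
    }'
}
```

It can be used at the end of a pipe: perl TOOLS/FIRE_analyse_input_files.pl -species yeast -expfile FIRE_DATA/YEAST/EXPFILES/yeast_gasch_IclustPos.txt | fasta_coverage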

How do I analyze my own sequences using FIRE?

Analyzing your own sequences (not the prepackaged ones we provide) is of course possible in FIRE.

Assume your expression profile is in expression.txt and your sequences are in mysequences.fa, each 2000 bp long (we recommend that all your sequences have the same length, but this is not required). The command line you would have to type to analyze these (continuous) data is:

perl fire.pl --expfiles=expression.txt --exptype=continuous --fastafile_dna=mysequences.fa --seqlen_dna=2000 --nodups=1
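To find out whether your sequences all have the same length, and which value to pass to --seqlen_dna, you can tabulate the lengths in the fasta file. The following is a minimal awk-based sketch; fasta_lengths is a hypothetical helper name, and it assumes a standard fasta file in which header lines start with ">".

```shell
# fasta_lengths: hypothetical helper, not part of FIRE.
# Prints each distinct sequence length found in a fasta file with the number
# of sequences of that length ("length<TAB>count"), sorted by length.
fasta_lengths() {
    awk '
        /^>/ { if (seen) count[len]++; len = 0; seen = 1; next }   # new record starts
        { gsub(/[ \t\r]/, ""); len += length($0) }                 # add up sequence lines
        END {
            if (seen) count[len]++
            for (n in count) printf "%d\t%d\n", n, count[n]
        }
    ' "$1" | sort -n
}
```

If fasta_lengths mysequences.fa prints a single line, all sequences share that length.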

Note the --nodups=1 parameter. It tells FIRE not to try to remove duplicates from the expression data. This is not ideal, because duplicate promoters can introduce artefacts in FIRE results. The parameter is needed here because, by default, FIRE looks for a file called mysequences.fa.homologies, which does not exist yet.

Here is how to create this .homologies file. It involves BLAST comparisons, so you need to have BLAST standalone installed on your machine.

Then you need to modify SCRIPTS/MyBlast.pm, and make $self->{BLAST_DIR} point to the directory that contains the BLAST programs (blastall, formatdb, etc). For example:

$self->{BLAST_DIR} = "/home/elemento/PERL_MODULES/PROGRAMS/BLAST";

You will then format your sequence file using formatdb:

formatdb -i mysequences.fa -p F -o T

Then, create the .homologies file using the following command:

perl TOOLS/detect_homologous_sequences.pl --fastafile=mysequences.fa

The above command will create a mysequences.fa.homologies file (there is no need to redirect the output of the script, unlike in previous versions).

Then you can run FIRE without the --nodups=1 option above.
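The choice between running with or without --nodups=1 can be made automatically, depending on whether the .homologies file exists. The sketch below assumes the file sits next to the fasta file under the name FASTA.homologies, as in the example above. The function name nodups_option is hypothetical.

```shell
# nodups_option: hypothetical helper, not part of FIRE.
# Prints "--nodups=1" when FASTA.homologies is absent, and nothing otherwise,
# so that duplicate removal is only disabled when it cannot be performed.
nodups_option() {
    if [ ! -f "$1.homologies" ]; then
        echo "--nodups=1"
    fi
}

# Intended use (leave the substitution unquoted so that an empty result
# adds no argument):
#   perl fire.pl --expfiles=expression.txt --exptype=continuous \
#       --fastafile_dna=mysequences.fa --seqlen_dna=2000 $(nodups_option mysequences.fa)
```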

Using FIRE gapped

FIRE gapped is an additional wrapper that enables users to discover bipartite motifs (i.e., motifs with a gap in the middle). For this, you need to use the fire_gapped.pl script. The command-line arguments are the same as for fire.pl, with the exception of 'kungapped' and 'gap':

perl fire_gapped.pl --expfile=<inp> --species=<sp> --exptype=<type> --kungapped=6 --gap=0-10
