Installing and using TEISER: a basic tutorial
Installation
Download the code (e.g., TEISERvx.x.zip) to your machine. If you want, you can use wget to get the code:
wget TEISER/TEISERvx.x.zip
Unzip the archive with unzip (the files are extracted into a TEISERvx.x/ directory):
unzip TEISERvx.x.zip
Then, go to TEISERvx.x/ and run "make":
cd TEISERvx.x
make
Initializing the structural seeds
TEISER starts by evaluating a predefined set of structural motifs, which we call seeds. The significant seeds are then further optimized and elongated into more informative motifs. You can initialize the seed space as you see fit, but we used the following criteria:
- stem length: from 4bp to 7bp.
- loop length: from 4nt to 9nt.
- number of informative bases: from 4nt to 6nt.
- information of the motif: from 14 to 20.
To create this set, you should run the following command ($TEISERDIR is the TEISER home directory where teiser.pl is located):
$TEISERDIR/Programs/seed_creator -min_stem_length INT -max_stem_length INT -min_loop_length INT -max_loop_length INT -min_inf_seq INT -max_inf_seq INT -max_inf FLOAT -min_inf FLOAT -outfile FILE
This program creates the seeds that satisfy the given constraints and packages them into separate files, each containing 250,000 independent seeds. We recommend depositing the seed files in the seeds folder of the TEISER_Data directory. For example, for the above parameters, set the 'outfile' parameter to "$TEISERDIR/TEISER_Data/seeds/seeds.4-7.4-9.4-6.14". This folder contains a file called "seedfiles.txt", which must list all the generated files (with paths given relative to the TEISER home directory); if it does not, modify this file as needed. For each species, in the species_data folder, this file is set as a parameter, which enables TEISER to locate all the necessary seeds.
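As an illustration, the seed criteria and the seedfiles.txt check described above can be sketched as two small shell functions. The flag names and the outfile value come from the text above; the run helper, the DRY_RUN variable and the function names are hypothetical. The check also assumes that seedfiles.txt holds one path per line and that the generated files share the outfile value as a name prefix, which you should verify against your own seeds folder.

```sh
# TEISER home directory; defaults to the current directory when unset.
TEISERDIR="${TEISERDIR:-$PWD}"

# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}

# Call seed_creator with the criteria listed above
# (stem 4-7bp, loop 4-9nt, 4-6 informative bases, information 14-20).
create_seeds() {
    run "$TEISERDIR/Programs/seed_creator" \
        -min_stem_length 4 -max_stem_length 7 \
        -min_loop_length 4 -max_loop_length 9 \
        -min_inf_seq 4 -max_inf_seq 6 \
        -max_inf 20 -min_inf 14 \
        -outfile "$TEISERDIR/TEISER_Data/seeds/seeds.4-7.4-9.4-6.14"
}

# Print seed files found on disk that seedfiles.txt does not list.
# Assumptions: one path per line, relative to $TEISERDIR, and generated
# files named with the -outfile value as a prefix.
unlisted_seedfiles() {
    prefix="${1:-TEISER_Data/seeds/seeds.4-7.4-9.4-6.14}"
    (
        cd "$TEISERDIR" || exit 1
        for f in "$prefix"*; do
            [ -e "$f" ] || continue
            grep -qxF -- "$f" TEISER_Data/seeds/seedfiles.txt 2>/dev/null || echo "$f"
        done
    )
}
```

Calling create_seeds with DRY_RUN=1 only prints the command, which is a cheap way to check the parameters before generating the seeds. Any path printed by unlisted_seedfiles would then need to be added to seedfiles.txt.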
Using TEISER
The current implementation of TEISER is meant to be executed from the directory where all the scripts reside (the TEISERvx.x/ directory if you've followed the instructions above). If you want to run TEISER from another directory, you should define the TEISERDIR variable:
export TEISERDIR=/path/to/TEISER/
If you don't set the TEISERDIR variable, TEISER assumes the current path as the TEISER home directory.
The basic command line syntax for TEISER is:
perl teiser_parallel.pl --expfile=<inp> --species=<sp> --exptype=<type> --ebins=<int> --submit=<0/1>
where <inp> indicates the input genome profile, <sp> indicates the species, <type> indicates whether the genome profile is discrete (e.g., cluster indices) or continuous (e.g., expression values obtained from a single microarray experiment) and <int> indicates the number of bins used for the quantization of the genome profile. If you set "--submit=1", TEISER will submit the required jobs to the available nodes. With the seed parameters listed earlier, 274 seed files are generated. Each file is submitted twice: once for searching the upstream sequences and once for the downstream sequences. If the submit option does not run successfully (i.e., no job ids are reported), modify Scripts/PBS.pm according to the settings of your platform.
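A minimal sketch of a complete invocation, and of the number of jobs to expect, is given below. The option names are those of the syntax line above. The run helper, the DRY_RUN variable, the function names, the example file name and the bin count of 30 are hypothetical, and the literal values "discrete" and "continuous" for --exptype are an assumption based on the description above.

```sh
# TEISER home directory; defaults to the current directory when unset.
TEISERDIR="${TEISERDIR:-$PWD}"

# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}

# Arguments: expfile, species, exptype, ebins, submit (0 or 1).
run_teiser() {
    run perl "$TEISERDIR/teiser_parallel.pl" \
        --expfile="$1" --species="$2" --exptype="$3" \
        --ebins="$4" --submit="$5"
}

# Each seed file is submitted twice: once for the upstream and once for
# the downstream sequences.
expected_jobs() {
    echo $(( $1 * 2 ))
}
```

For example, "DRY_RUN=1 run_teiser expfile.txt human continuous 30 1" prints the command without running it, and "expected_jobs 274" gives the 548 jobs that the 274 seed files would produce.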
TEISER creates an "expfile_META/" directory, in which the results for each seed package are saved. The combined results are in turn saved into "expfile_TEISER/". The results themselves are grouped into "DN" (downstream), "UP" (upstream) and "UP_DN" (both). The output files include:
- expfile.summary.pdf(eps): p-value heatmap combining significant categories (main figure).
- expfile.mimatrix.pdf(eps): p-value heatmap showing motif-motif interactions.
- expfile.page.pdf(eps): p-value heatmap showing pathways likely targeted by each element.
- expfile.motifs.pdf(eps): the identified motifs along with their structure.
We strongly recommend that you calculate false-discovery rates for each dataset that you use. For this, you can randomly shuffle the values assigned to each gene in your dataset and re-run TEISER. If you do find structural motifs deemed significant in this step, choose a threshold in z-score, robustness or a combination of both that will result in an acceptable FDR when applied to results from both the real input data and the shuffled one (e.g. <10%).
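The shuffling step and the resulting FDR estimate can be sketched as follows. This is only an illustration under stated assumptions: the input profile is taken to be tab-delimited with one header line, gene identifiers in the first column and values in the second. The score files passed to estimate_fdr are hypothetical (one score per line, e.g. z-scores you have extracted yourself from the real and the shuffled runs), since the layout of the result files is not described here.

```sh
# Shuffle the value column of a genome profile, keeping gene ids in place.
# Arguments: input file, output file, optional random seed.
shuffle_expfile() {
    awk -F '\t' -v OFS='\t' -v seed="${3:-}" '
        BEGIN { if (seed != "") srand(seed); else srand() }
        NR == 1 { print; next }            # assumed header line, kept as-is
        { id[++n] = $1; val[n] = $2 }
        END {
            # Fisher-Yates shuffle of the values only
            for (i = n; i > 1; i--) {
                j = int(rand() * i) + 1
                tmp = val[i]; val[i] = val[j]; val[j] = tmp
            }
            for (i = 1; i <= n; i++) print id[i], val[i]
        }
    ' "$1" > "$2"
}

# Empirical FDR at a score threshold: motifs passing in the shuffled run
# divided by motifs passing in the real run.
# Arguments: real scores file, shuffled scores file, threshold.
estimate_fdr() {
    awk -v t="$3" '
        FILENAME == ARGV[1] { if ($1 + 0 >= t) real++; next }
        { if ($1 + 0 >= t) null++ }
        END {
            if (real == 0) { print "NA"; exit }
            printf "%.3f\n", null / real
        }
    ' "$1" "$2"
}
```

Running TEISER on the shuffled file and raising the threshold until estimate_fdr reports a value below the chosen cutoff (e.g. 0.10) mirrors the procedure described above. A combined z-score and robustness criterion would need a slightly richer input than a single score per line.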
How do I analyze custom genomes?
Go to the TEISER_Data/species_data/ folder and open the "human" file as an example. Basically, you need these files to start the program and the rest are optional:
- upstream sequences: e.g. human_5utr_1000.fa
- downstream sequences: e.g. human_3utr_1000.fa
- homology files: e.g. human_5utr_1000.fa.homologies (see FIRE's tutorial for creating these files.)
- pathway annotations: e.g. human_go_index.txt and human_go_names.txt
- alternative sequences for conservation score: e.g. mouse_5utr_1000.fa, mouse_3utr_1000.fa and human_mouse_orthologs.txt
Put these files in a folder with your species' name and create a file with the same name in the "species_data" folder. In this file, you should set the following parameters using the pathnames of the files you have created:
- fastafile_up
- fastafile_dn
- goindexfile
- gonamesfile
- seedfiles
- fastafile_ort_up
- fastafile_ort_dn
- homologyfile
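Before running TEISER on a new species, it can help to check that every path set in the species file actually exists. The sketch below is hypothetical and rests on two assumptions that should be checked against the "human" file: each line holds a parameter name followed by whitespace and a path, and relative paths are resolved against the TEISER home directory.

```sh
# TEISER home directory; defaults to the current directory when unset.
TEISERDIR="${TEISERDIR:-$PWD}"

# Report parameters that are not set and paths that do not exist.
# Argument: the species file. Returns 1 if any listed file is missing.
check_species_file() {
    status=0
    for key in fastafile_up fastafile_dn goindexfile gonamesfile \
               seedfiles fastafile_ort_up fastafile_ort_dn homologyfile; do
        # Assumed layout: "<parameter> <path>" on one line
        value=$(awk -v k="$key" '$1 == k { print $2; exit }' "$1")
        if [ -z "$value" ]; then
            echo "not set: $key"
            continue
        fi
        case "$value" in
            /*) full="$value" ;;              # absolute path
            *)  full="$TEISERDIR/$value" ;;   # assumed relative to TEISERDIR
        esac
        if [ ! -e "$full" ]; then
            echo "missing file for $key: $value"
            status=1
        fi
    done
    return $status
}
```

A parameter reported as "not set" is not necessarily an error, since some of the files are optional.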