Fast and systematic genome-wide discovery of conserved
regulatory elements using a non-alignment based approach

Olivier Elemento and Saeed Tavazoie
Lewis-Sigler Institute for Integrative Genomics



Update 04/20/2007: new, improved FastCompare distribution, following the publication of "Fastcompare: a non-alignment approach for genome-scale discovery of DNA and mRNA regulatory elements using network-level conservation" (book chapter in the Methods in Molecular Biology series on Comparative Genomics, Humana Press, edited by Nick Bergman).


Binaries and source code

- FastCompare C source code
- FastCompare executable for Linux
- REcompare C source code
- Comprehensive tutorial on using FastCompare

Sequence data (FASTA format)
- S. cerevisiae / S. bayanus, 4,358 orthologous 1,000 bp upstream regions (source: SGD)
- S. cerevisiae / S. paradoxus, 4,695 orthologous 1,000 bp upstream regions (source: SGD)
- S. cerevisiae / S. castellii, 4,113 orthologous 1,000 bp upstream regions (source: SGD)
- C. elegans / C. briggsae, 10,894 orthologous 2,000 bp upstream regions (source: ENSEMBL)
- D. melanogaster / D. pseudoobscura, 11,306 orthologous 2,000 bp and 5,000 bp upstream regions (source: ENSEMBL, Baylor College)
- H. sapiens / M. musculus, 15,983 orthologous 2,000 bp and 5,000 bp upstream regions (source: ENSEMBL)
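
To illustrate how paired FASTA files of orthologous upstream regions can be compared without alignment, here is a minimal Python sketch. It is not the FastCompare implementation. It assumes (hypothetically) that orthologous regions carry the same identifier in both files, looks at one strand only, and scores a k-mer by the hypergeometric probability of seeing at least the observed number of ortholog pairs that contain the k-mer in both species. All function names are illustrative.

```python
from math import comb


def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {k: "".join(v) for k, v in seqs.items()}


def genes_with_kmer(seqs, kmer):
    """Set of identifiers whose sequence contains kmer (given strand only)."""
    return {g for g, s in seqs.items() if kmer in s}


def hypergeom_tail(overlap, n1, n2, total):
    """P(X >= overlap) for two gene sets of sizes n1 and n2 drawn from total."""
    upper = min(n1, n2)
    hits = sum(comb(n1, i) * comb(total - n1, n2 - i)
               for i in range(overlap, upper + 1))
    return hits / comb(total, n2)


def conservation(kmer, seqs_a, seqs_b):
    """Return (overlap, p-value) for kmer across ortholog pairs.

    Assumption: orthologs share the same identifier in seqs_a and seqs_b.
    """
    shared = seqs_a.keys() & seqs_b.keys()
    a = genes_with_kmer({g: seqs_a[g] for g in shared}, kmer)
    b = genes_with_kmer({g: seqs_b[g] for g in shared}, kmer)
    overlap = len(a & b)
    return overlap, hypergeom_tail(overlap, len(a), len(b), len(shared))
```

Calling conservation("ACGTAC", seqs_a, seqs_b) on two parsed files returns the number of ortholog pairs containing the k-mer in both species and the associated tail probability; a smaller probability suggests stronger conservation under these assumptions.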





Conserved regulatory elements between S. cerevisiae and S. bayanus

- raw sorted lists of k-mers (and gapped k-mers)
- highest scoring 379 k-mers (k=7,8,9), with support by independent biological data and orientation/position biases
- highest scoring interactions, with support by independent biological data (functional categories) and median distances

Conserved regulatory elements between S. cerevisiae and S. paradoxus
- raw sorted lists of k-mers (and gapped k-mers)
- highest scoring 400 k-mers (k=7,8,9), with support by independent biological data

Conserved regulatory elements between S. cerevisiae and S. castellii
- raw sorted lists of k-mers
- highest scoring 376 k-mers (k=7,8,9), with support by independent biological data and orientation/position biases




Conserved regulatory elements between C. elegans and C. briggsae
- raw sorted lists of k-mers (and gapped k-mers)
- highest scoring 375 k-mers (k=7,8,9), with support by independent biological data and orientation/position biases
- highest scoring interactions, with support by independent biological data (functional categories) and median distances




Conserved regulatory elements between D. melanogaster and D. pseudoobscura, 2,000 bp upstream regions
- raw sorted lists of k-mers (and gapped k-mers)
- highest scoring 371 k-mers (k=7,8,9), with support by independent biological data and orientation/position biases
- highest scoring interactions, with support by independent biological data (functional categories) and median distances




Conserved regulatory elements between D. melanogaster and D. pseudoobscura, 5,000 bp upstream regions
- raw sorted lists of k-mers (k=7,8,9)




Conserved regulatory elements between H. sapiens and M. musculus, 2,000 bp upstream regions
- raw sorted lists of k-mers (and gapped k-mers)
- highest scoring 272 k-mers (k=7,8,9), with support by independent biological data and orientation/position biases
- highest scoring interactions, with support by independent biological data (functional categories) and median distances

Conserved regulatory elements between H. sapiens and M. musculus, 5,000 bp upstream regions
- raw sorted lists of k-mers (and gapped k-mers)

Conserved regulatory elements between H. sapiens and R. norvegicus, 2,000 bp upstream regions
- raw sorted lists of k-mers (and gapped k-mers)

Conserved regulatory elements between M. musculus and R. norvegicus, 2,000 bp upstream regions
- raw sorted lists of k-mers (and gapped k-mers)