MEME suite

The MEME suite is a collection of tools for the discovery and analysis of sequence motifs.

Motif discovery

MEME

Multiple Expectation Maximizations for Motif Elicitation (MEME) is a tool for discovering motifs in a group of related DNA or protein sequences.^[1] MEME takes a group of DNA or protein sequences as input and outputs as many motifs as requested, up to a user-specified statistical confidence threshold. It uses statistical modeling techniques to automatically choose the best width, number of occurrences, and description for each motif.^[2]

GLAM2

Gapped Local Alignment of Motifs (GLAM2) is a tool for discovering gapped motifs in a group of DNA or protein sequences. Unlike MEME, GLAM2 does not try to find several different motifs in a single run. Instead, it performs replicates: it searches for the best possible motif several times.^[3]

DREME

Discriminative Regular Expression Motif Elicitation (DREME) is a tool for discovering motifs in large collections of sequences. DREME is computationally efficient and is therefore suitable for motif search on large data sets derived from ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments. In the interest of computational efficiency, DREME finds only motifs that can be expressed in the IUPAC alphabet, which contains the standard DNA alphabet ACGT as well as eleven 'wildcard' characters (for example, R indicates either A or G).
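To illustrate what "expressible in the IUPAC alphabet" means, the following is a minimal Python sketch that expands an IUPAC motif into a regular expression and finds its occurrences. The function names `iupac_to_regex` and `find_sites` are hypothetical helpers written for this example; they are not part of DREME and do not reflect how DREME searches for motifs.

```python
import re

# Standard IUPAC nucleotide codes: the 4 bases plus 11 ambiguity ("wildcard") codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif string into an equivalent regular expression."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        # A single base stays literal; a wildcard becomes a character class.
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)

def find_sites(motif: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile(f"(?={iupac_to_regex(motif)})")
    return [m.start() for m in pattern.finditer(sequence.upper())]

if __name__ == "__main__":
    print(iupac_to_regex("GATAR"))
    print(find_sites("GATAR", "TTGATAAGGATAG"))
```

Because each motif position is a fixed set of allowed bases, matching reduces to simple pattern matching, which is one reason such motifs are cheap to search for.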

MEME-ChIP

MEME-ChIP is a tool for discovering motifs in data sets derived from ChIP-seq (Chromatin immunoprecipitation followed by sequencing) experiments. ^[4]

Motif search

FIMO

Find Individual Motif Occurrences (FIMO) is a tool for finding instances of motifs in a sequence database. FIMO searches the database for the provided motifs and reports a p-value and a q-value for each match.^[5]
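The core idea of scanning a sequence for motif occurrences can be sketched as sliding a position-specific scoring matrix along the sequence. The Python example below is only an illustration under stated assumptions: the motif probabilities, the uniform background, and the helper names `log_odds_matrix` and `scan` are all invented for this sketch, and the conversion of scores into the statistical significance values that FIMO reports is omitted.

```python
import math

def log_odds_matrix(ppm, background):
    """Convert per-position base probabilities into log2-odds scores."""
    return [
        {base: math.log2(column[base] / background[base]) for base in column}
        for column in ppm
    ]

def scan(sequence, pssm):
    """Score every window of the motif's width; return (start, score) pairs."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

# Hypothetical 3-column motif with consensus AGC (all probabilities non-zero).
EXAMPLE_PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

if __name__ == "__main__":
    pssm = log_odds_matrix(EXAMPLE_PPM, UNIFORM)
    best_start, best_score = max(scan("TTAGCTT", pssm), key=lambda hit: hit[1])
    print(best_start, round(best_score, 3))
```

A positive score means the window is more likely under the motif model than under the background; a real scanner would then ask how often a score that high arises by chance.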

GLAM2SCAN

GLAM2SCAN is a tool for finding occurrences of a GLAM2 motif in a sequence database. ^[6]

MAST

Motif Alignment & Search Tool (MAST) is a tool for searching biological sequence databases for sequences that contain an occurrence of each motif in a given set of motifs. MAST scores the matches and reports p-values for four types of events:

Position p-value: The p-value of a match of a given position within a sequence to a motif is defined as the probability of a randomly selected position in a randomly generated sequence having a match score at least as large as that of the given position. Note: If MAST is combining reverse complement DNA strands, the position p-value is not corrected for multiple tests.
Sequence p-value: The p-value of a match of a sequence to a motif is defined as the probability of a randomly generated sequence of the same length having a match score at least as large as the largest match score of any position in the sequence.
Combined p-value: The p-value of a match of a sequence to a group of motifs is defined as the probability of a randomly generated sequence of the same length having sequence p-values whose product is at least as small as the product of the sequence p-values of the matches of the motifs to the given sequence.
E-value: The E-value of the match of a sequence in a database to a group of motifs is defined as the expected number of sequences in a random database of the same size that would match the motifs as well as the sequence does and is equal to the combined p-value of the sequence times the number of sequences in the database.
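The last two definitions can be made concrete with a small numerical sketch. It assumes that, under the null model, the individual sequence p-values are independent and uniformly distributed on (0, 1); in that case the probability that the product of n such values is at most an observed product p has the standard closed form p · Σ_{i=0}^{n-1} (−ln p)^i / i!. The function names are hypothetical, and the sketch illustrates the definitions rather than MAST's own implementation.

```python
import math

def product_p_value(p_values):
    """P(product of n independent Uniform(0,1) variables <= observed product)."""
    n = len(p_values)
    product = math.prod(p_values)
    if product <= 0.0:
        return 0.0
    neg_log = -math.log(product)
    # Accumulate sum_{i=0}^{n-1} (neg_log ** i) / i! term by term.
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= neg_log / i
        total += term
    return min(1.0, product * total)

def e_value(combined_p, num_sequences):
    """Expected number of sequences in a random database matching at least as well."""
    return combined_p * num_sequences

if __name__ == "__main__":
    combined = product_p_value([0.1, 0.1])
    print(combined, e_value(combined, 5000))
```

With a single motif the combined p-value equals the sequence p-value; with several motifs it is larger than the raw product, because many different combinations of p-values give an equally small product.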

Motif enrichment analysis

SpaMo

Spaced Motif Analysis Tool (SpaMo) is a tool for inferring interactions between transcription factors. SpaMo takes a set of sequences (typically sequences surrounding ChIP-seq peaks), a motif represented in these sequences, and a database of known motifs. SpaMo searches the database for instances of database motifs enriched in sites neighboring the given motif. These enrichments suggest physical interaction between the factors that bind each motif. ^[7]

CentriMo

Central Motif Enrichment Analysis (CentriMo) is a tool for inferring direct DNA binding from ChIP-seq data. CentriMo is based on the observation that the positional distribution of binding sites matching the direct-binding motif tends to be unimodal, well centered and maximal in the precise center of the ChIP-seq peak regions. CentriMo takes a set of sequences and plots the occurrence of motifs relative to the ChIP-seq peak. Motifs that occur exclusively at the peak provide good evidence of direct binding, while motifs that do not occur in a consistent position relative to the peak may not bind directly. ^[8]
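The idea of central enrichment can be sketched by counting how many motif sites fall inside a small window around the peak center and comparing that count with what a uniform placement of sites would give. The Python example below is a simplified illustration: the offsets are made-up data, the helper names are hypothetical, edge effects due to motif width are ignored, and the binomial test shown here is an assumption of this sketch rather than a description of CentriMo's exact procedure.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def central_enrichment(site_offsets, region_half_width, window_half_width):
    """Test whether sites concentrate near the peak center.

    site_offsets: signed distance of each motif site from its peak center.
    Returns (sites in window, total sites, p-value under a uniform null).
    """
    total = len(site_offsets)
    in_window = sum(1 for d in site_offsets if abs(d) <= window_half_width)
    # Probability that a uniformly placed site lands inside the central window.
    p_null = (2 * window_half_width + 1) / (2 * region_half_width + 1)
    return in_window, total, binomial_tail(in_window, total, p_null)

if __name__ == "__main__":
    offsets = [0, 1, -2, 3, -1, 40, -35, 2]  # hypothetical site positions
    print(central_enrichment(offsets, region_half_width=50, window_half_width=5))
```

A small p-value indicates that sites cluster at the center more than chance would explain, which matches the unimodal, centered distribution expected of a direct-binding motif.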

Motif cluster search

MCAST

Motif Cluster Alignment and Search Tool (MCAST) is a tool for searching a sequence database for statistically significant clusters of non-overlapping occurrences of a set of motifs. Such clusters may represent regulatory modules.

Motif comparison

Tomtom

Tomtom is a tool for comparing a DNA motif to a database of known motifs. It searches the database for motifs whose similarity to the query motif is statistically significant. Tomtom is useful for determining whether a discovered motif is novel or is a variation of a known motif.
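Comparing two motifs generally involves sliding one along the other and scoring the overlapping columns. The Python sketch below does this with a mean Euclidean distance between columns; the distance function, the `min_overlap` parameter, and the helper names are choices made for this illustration only, and the assessment of statistical significance is left out.

```python
import math

def sharp_column(base, p=0.97):
    """Hypothetical helper: a column strongly favoring one base."""
    rest = (1.0 - p) / 3.0
    return {b: (p if b == base else rest) for b in "ACGT"}

def column_distance(a, b):
    """Euclidean distance between two columns of base probabilities."""
    return math.sqrt(sum((a[base] - b[base]) ** 2 for base in "ACGT"))

def best_alignment(query, target, min_overlap=2):
    """Slide query along target; return (mean distance, offset) of the best overlap."""
    best = None
    for offset in range(-(len(query) - min_overlap), len(target) - min_overlap + 1):
        pairs = [
            (query[i], target[i + offset])
            for i in range(len(query))
            if 0 <= i + offset < len(target)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_distance(q, t) for q, t in pairs) / len(pairs)
        if best is None or score < best[0]:
            best = (score, offset)
    return best

if __name__ == "__main__":
    query = [sharp_column(b) for b in "ACG"]
    target = [sharp_column(b) for b in "TACGT"]
    print(best_alignment(query, target))
```

Here the query is an exact sub-motif of the target, so the best alignment has distance zero at offset 1. Requiring a minimum overlap prevents trivially good scores from alignments that share only a single column.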

Motif function analysis

GOMO

Gene Ontology for MOtifs (GOMO) is a tool for identifying possible roles for DNA binding motifs. It does so by checking the genes that have the motif in their upstream regions against a Gene Ontology database. If the motif occurs upstream of genes related to a particular function (for example, lactose digestion) significantly more often than expected by chance, this suggests that the transcription factor that binds the motif may regulate that function (for example, by promoting transcription of proteins that digest lactose).

References

[1] T. L. Bailey and C. Elkan, "Unsupervised learning of multiple motifs in biopolymers using EM", Machine Learning, 21:51-80, 1995.
[2] T. L. Bailey, "DREME: Motif discovery in transcription factor ChIP-seq data", Bioinformatics, 27(12):1653-1659, 2011.
[3] M. C. Frith, N. F. W. Saunders, B. Kobe, and T. L. Bailey, "Discovering sequence motifs with arbitrary insertions and deletions", PLoS Computational Biology, 4(5):e1000071, 2008.
[4] P. Machanick and T. L. Bailey, "MEME-ChIP: motif analysis of large DNA datasets", Bioinformatics, 27(12):1696-1697, 2011.
[5] C. E. Grant, T. L. Bailey, and W. S. Noble, "FIMO: Scanning for occurrences of a given motif", Bioinformatics, 27(7):1017-1018, 2011.
[6] M. C. Frith, N. F. W. Saunders, B. Kobe, and T. L. Bailey, "Discovering sequence motifs with arbitrary insertions and deletions", PLoS Computational Biology, 4(5):e1000071, 2008.
[7] T. Whitington, M. C. Frith, J. Johnson, and T. L. Bailey, "Inferring transcription factor complexes from ChIP-seq data", Nucleic Acids Research, 39(15):e98, 2011.
[8] T. L. Bailey and P. Machanick, "Inferring direct DNA binding from ChIP-seq", Nucleic Acids Research, 40(17):e128, 2012.

External links

Official website.
