Welcome to the homepage of grappe
What is grappe
grappe (French for "bunch of grapes") is a pattern matching program developed by the Adage project at LORIA.
Brief introduction to grappe
grappe is a pattern matching program.
It looks, in a text, for a set of patterns containing don't care
symbols (wildcards) of unbounded or bounded length. The size of
patterns, as well as their length, is unlimited. grappe is expected to
be particularly efficient when searching for a large number of patterns,
each containing don't cares. However, grappe is also competitive with
other programs (egrep, agrep) on small pattern sets.
Patterns may occur with substitution errors. The number of errors and
their positions in a pattern have to be specified by the user (in
grappe 3.0 this feature is restricted to one error per keyword).
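As a rough illustration of what a substitution error means, the following Python sketch reports the positions where a keyword occurs with at most one substituted letter. It is not grappe's algorithm, only a naive sliding-window check; the function names `hamming_at_most` and `find_with_substitutions` are hypothetical and chosen for this example.

```python
def hamming_at_most(word, window, max_errors):
    """Return True if window equals word up to max_errors substituted letters."""
    if len(word) != len(window):
        return False
    mismatches = sum(1 for a, b in zip(word, window) if a != b)
    return mismatches <= max_errors


def find_with_substitutions(keyword, text, max_errors=1):
    """Return the start positions in text where keyword occurs with at most
    max_errors substitutions (naive scan, for illustration only)."""
    k = len(keyword)
    return [
        i
        for i in range(len(text) - k + 1)
        if hamming_at_most(keyword, text[i:i + k], max_errors)
    ]


if __name__ == "__main__":
    # "bbc" differs from "abc" in exactly one position, so it is reported.
    print(find_with_substitutions("abc", "xxbbcxx"))
```

Only substitutions are counted here; insertions and deletions are outside the scope of this sketch.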
A number of other useful options are present. As an example, the user
can specify letters occurring at a given position in a pattern by
using multiple choice, or negation.
A special option allows compiling a version of grappe for DNA and RNA sequences.
How it works
The simplest usage of grappe is to call
grappe <patterns> <textfile>
<textfile> is a text file you want to search in.
<patterns> is a description of patterns you want to search for.
You can consult the README file for a more detailed description of grappe usage.
Here are some examples.
- abac matches any text containing an occurrence of abac
- ab#ac matches any text containing an occurrence of ab followed, at an arbitrary distance, by an occurrence of ac. For example, babcac and abac are matched, whereas acab is not.
- ab(1,5)ac matches any text containing an occurrence of ab followed, at a distance between 1 and 5, by an occurrence of ac. For example, abaac and abacac are matched, but abac and abacadbdac are not.
- a[bc]d matches any text containing an occurrence of abd or an occurrence of acd
- ac[^d] matches any text containing an occurrence of ac immediately followed by any letter of the underlying alphabet other than d
- ?abc matches any text containing an occurrence of abc or an occurrence of any word at substitution distance 1 from abc, e.g. bbc, acc, or abd
- abaa|accdc matches any text containing either an occurrence of abaa or an occurrence of accdc
- ?[gtc]tgcttacgtg#?tayta(1,3)?gct#?tgct#t[^a]ta|?tatagcgg#tgct[tcg] matches any text containing an occurrence of one of the two patterns separated by the | sign
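To make the meaning of the examples above concrete, here is a small Python sketch that translates a subset of the pattern syntax into a Python regular expression and checks the matched and unmatched strings listed above. It only mimics the described semantics and says nothing about how grappe works internally. The names `to_regex` and `matches` are hypothetical, the `?` error marker is deliberately not handled, and the text is treated as a single string without records.

```python
import re

# A bounded gap such as (1,5).
_GAP = re.compile(r"\((\d+),(\d+)\)")


def to_regex(pattern):
    """Translate a subset of the pattern syntax into a Python regex string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "#":                      # unbounded don't care
            out.append(".*")
            i += 1
        elif ch == "(":                    # bounded don't care (m,n)
            m = _GAP.match(pattern, i)
            if m is None:
                raise ValueError("malformed gap at position %d" % i)
            out.append(".{%s,%s}" % (m.group(1), m.group(2)))
            i = m.end()
        elif ch == "[":                    # [bc] or [^d], classes of plain letters
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])
            i = end + 1
        elif ch == "|":                    # alternative between patterns
            out.append("|")
            i += 1
        elif ch == "?":
            raise ValueError("substitution errors (?) are not handled by this sketch")
        else:                              # ordinary letter
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


def matches(pattern, text):
    """Return True if text contains an occurrence of the pattern."""
    # Assumption: gaps may span line breaks, hence re.DOTALL.
    return re.search(to_regex(pattern), text, re.DOTALL) is not None


if __name__ == "__main__":
    for pattern, text in [("ab#ac", "babcac"), ("ab#ac", "acab"),
                          ("ab(1,5)ac", "abaac"), ("ab(1,5)ac", "abac")]:
        print(pattern, text, matches(pattern, text))
```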
grappe has a special version for working with DNA and RNA
sequences (see the installation instructions for how to compile this version). This version is case-insensitive, works with the five-letter alphabet A, C, G, T, U (U is a synonym of T), and skips end-of-line characters. In addition, this version recognizes the IUPAC nucleic acid codes, which are standard abbreviations for nucleotide combinations.
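The behaviour described for the DNA version can be pictured with the following Python sketch: it expands IUPAC codes in a pattern into sets of nucleotides, ignores case and line breaks in the text, and treats U as T. The IUPAC table is the standard one; everything else (the names `normalize`, `iupac_to_regex`, and `dna_search`, and the assumption that the codes appear in the pattern rather than in the text) is hypothetical and not taken from grappe itself.

```python
import re

# Standard IUPAC nucleic acid codes, with U treated as a synonym of T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def normalize(sequence):
    """Upper-case the sequence, drop line breaks, and replace U by T."""
    return sequence.upper().replace("\n", "").replace("\r", "").replace("U", "T")


def iupac_to_regex(pattern):
    """Turn each IUPAC letter of the pattern into a character class."""
    return "".join("[" + IUPAC[code] + "]" for code in pattern.upper())


def dna_search(pattern, text):
    """Return True if the normalized text contains a match of the pattern."""
    return re.search(iupac_to_regex(pattern), normalize(text)) is not None


if __name__ == "__main__":
    # y stands for C or T, so "tayta" matches TACTA and TATTA.
    print(dna_search("tayta", "ggTACTAgg"))
```

A pattern letter outside the table raises a `KeyError` in this sketch; a real tool would need to report such input more gracefully.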
The usage of grappe is the following:
grappe [-r] [-l] [-c] [-d <eor_chars>] ([-e] <patterns> | -f <pattern-file>) [<textfiles>]
Both the generic and the DNA versions of grappe have additional options:
- -r treats "end of record" characters as ordinary characters
- -l matches each line of the text independently
- -c counts the number of successfully matched records (lines) of the text
- -d <eor_chars> redefines the "end of record" characters. By default, a record is a line of a Unix, MS-DOS, or Macintosh file.
- -e <patterns> introduces the patterns (this option is useful when the patterns begin with -; otherwise it is unnecessary)
- -f <pattern-file> specifies the file containing the patterns
Besides, the full version of grappe has one more option:
- -i forces grappe to be case-insensitive
Note that you must specify exactly one set of patterns. If no text file is specified, grappe tries to match the standard input. You can specify a set of text files by listing them or by using shell wildcards. For example,
grappe 'void' *.c
looks for the pattern void in all C files in the current directory.
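The record-oriented options (-c and -d) can be pictured with a short Python sketch. It splits a text into records, using Unix, MS-DOS, and Macintosh line endings by default or a user-supplied set of "end of record" characters, and counts the records that contain a keyword. This is an assumption about the intended semantics, not a description of grappe's implementation; the names `split_records` and `count_matching_records` are hypothetical, and a plain substring test stands in for real pattern matching.

```python
import re

# Default record separators: MS-DOS (\r\n), Unix (\n), Macintosh (\r).
DEFAULT_EOR = re.compile(r"\r\n|\n|\r")


def split_records(text, eor_chars=None):
    """Split text into records.

    With eor_chars=None the default line endings are used; otherwise every
    character of eor_chars ends a record (assumed meaning of -d).
    """
    if eor_chars is None:
        records = DEFAULT_EOR.split(text)
    else:
        records = [""]
        for ch in text:
            if ch in eor_chars:
                records.append("")
            else:
                records[-1] += ch
    # A trailing separator does not start a new, empty record.
    if records and records[-1] == "":
        records.pop()
    return records


def count_matching_records(keyword, text, eor_chars=None):
    """Count the records containing keyword (assumed meaning of -c)."""
    return sum(1 for record in split_records(text, eor_chars) if keyword in record)


if __name__ == "__main__":
    sample = "void a\r\nint b\rvoid c\n"
    print(split_records(sample))
    print(count_matching_records("void", sample))
```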
Download grappe now!
You can download the grappe archive now.
Install grappe now!
Assume you want to have two versions of grappe - a generic
one and one for working with DNA/RNA files.
After downloading and unzipping the grappe archive, type the following commands.
tar xfvz grappe-3.0.tar.gz
After that, a generic version will be located in subdirectory generic of grappe-3.0, and the DNA version in subdirectory dna.
Now go ahead and use the version you want.
Get in touch
Although grappe has been tested for a long time, bugs may remain. If you find a bug in grappe, please send a bug report to Gregory Kucherov and Sébastien Briais. Don't forget to include in your mail the command line that provoked the bug (i.e. the set of patterns, the options, the version of grappe and, if possible, the text files). Also, don't forget to mention which computer and operating system you use and the message you got (segmentation fault, assertion failed, ...). We will try to fix the bug as soon as possible.
If you have any suggestions, comments or criticism, don't hesitate to contact Gregory Kucherov and Sébastien Briais.
Authors and references
grappe is based on an algorithm described in:
Gregory Kucherov, Michaël Rusinowitch,
Matching a Set of Strings with Variable Length Don't Cares,
6th Symposium on Combinatorial Pattern Matching,
Helsinki, July 1995, Lecture Notes in Computer Science,
vol. 937 (1995), pp. 230-247.
Extended version in Theoretical Computer Science, vol. 178 (1997), pp. 129-154.
You can download the PostScript document here.
The following people contributed to the programming of grappe: