Welcome to the homepage of grappe
What is grappe
grappe (French for "bunch of grapes") is a pattern matching program developed by the Adage project at LORIA.
Brief introduction to grappe
grappe is a pattern matching program.
It searches a text for a set of patterns containing don't care
symbols (wildcards) of unbounded or bounded length. The size of
patterns, as well as their length, is unlimited. grappe is expected to
be particularly efficient when searching for a large number of patterns,
each containing don't cares. However, grappe is also competitive with
other programs (egrep, agrep) on small pattern sets.
Patterns may occur with substitution errors. The number of errors and
where they may occur in a pattern have to be specified by the user (in
grappe 3.0 this feature is restricted to one error per keyword, see
below).
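To make the notion of a substitution error concrete, here is a minimal Python sketch. It is not grappe's algorithm, only a naive scan that checks whether some window of the text differs from a keyword in at most a given number of positions. The function names are hypothetical and chosen for this illustration.

```python
def substitution_distance(a: str, b: str) -> int:
    """Number of positions at which two words of equal length differ."""
    if len(a) != len(b):
        raise ValueError("substitution distance needs words of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def occurs_with_errors(keyword: str, text: str, max_errors: int = 1) -> bool:
    """True if some window of text is within max_errors substitutions of keyword."""
    n = len(keyword)
    # Slide a window of the keyword's length over the text (naive scan).
    return any(
        substitution_distance(keyword, text[i:i + n]) <= max_errors
        for i in range(len(text) - n + 1)
    )
```

With the default of one error, occurs_with_errors("abc", "xxbbcxx") holds because bbc differs from abc in a single position.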
A number of other useful options are present. For example, the user
can specify the letters allowed at a given position in a pattern by
using multiple choice or negation.
A special option allows a version of grappe to be compiled for DNA
processing.
How it works
The simplest usage of grappe is to call
grappe <patterns> <textfile>
<textfile> is a text file you want to search in.
<patterns> is a description of patterns you want to search for.
Here are some examples.
- abac matches any text containing an occurrence of abac
- ab#ac matches any text containing an occurrence of ab followed, at an arbitrary distance, by an occurrence of ac. For example, babcac and abac are matched whereas acab is not matched.
- ab(1,5)ac matches any text containing an occurrence of ab followed, at a distance between 1 and 5, by an occurrence of ac. For example, abaac and abacac are matched but abac and abacadbdac are not matched.
- a[bc]d matches any text containing an occurrence of abd or an occurrence of acd
- ac[^d] matches any text containing an occurrence of ac immediately followed by any letter of the underlying alphabet other than d.
- ?abc matches any text containing an occurrence of abc or an occurrence of any word at substitution distance 1 from abc, e.g. bbc, acc, or abd.
- abaa|accdc matches any text containing either an occurrence of abaa or an occurrence of accdc
- ?[gtc]tgcttacgtg#?tayta(1,3)?gct#?tgct#t[^a]ta|?tatagcgg#tgct[tcg] matches any text containing an occurrence of one of the two patterns separated by the | sign.
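The meaning of the don't care, multiple choice, negation and | constructs above can be checked with a small Python sketch that translates them into Python regular expressions. This is only an illustration of the semantics described in the examples, not how grappe works internally. The names grappe_to_regex and matches are hypothetical, and the ? construct (substitution errors) is deliberately left out.

```python
import re


def grappe_to_regex(pattern: str) -> str:
    """Translate a subset of the pattern syntax into a Python regex (illustration only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "#":                      # don't care of unbounded length
            out.append(".*")               # "." stays within one line
            i += 1
        elif ch == "(":                    # don't care of bounded length, "(m,n)"
            close = pattern.index(")", i)
            low, high = pattern[i + 1:close].split(",")
            out.append(".{%d,%d}" % (int(low), int(high)))
            i = close + 1
        elif ch == "[":                    # multiple choice [bc] or negation [^d]
            close = pattern.index("]", i)
            out.append(pattern[i:close + 1])
            i = close + 1
        elif ch == "|":                    # separator between alternative patterns
            out.append("|")
            i += 1
        elif ch == "?":
            raise ValueError("substitution errors are not handled in this sketch")
        else:                              # ordinary letter
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


def matches(pattern: str, text: str) -> bool:
    """True if the text contains an occurrence of the pattern."""
    return re.search(grappe_to_regex(pattern), text) is not None
```

For instance, matches("ab(1,5)ac", "abaac") is true, while matches("ab(1,5)ac", "abac") is false because the distance between ab and ac is 0.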
You can consult the README file for a more detailed description of grappe usage.
Options
First, grappe has a special version for working with DNA and RNA
sequences (see the installation instructions for how to compile this version). This version is case-insensitive, works with the five-letter alphabet A, C, G, T, U (U is a synonym of T), and skips end-of-line characters. In addition, it recognizes the so-called IUPAC nucleic acid codes, which are standard abbreviations for nucleotide combinations.
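As an illustration of what the IUPAC codes denote, the following Python sketch expands each code into the nucleotides it stands for and compares a keyword with a fragment of the same length. The table is the standard IUPAC one. How grappe applies it internally is not described on this page, so the helper names and the matching logic are assumptions made for this example.

```python
# Standard IUPAC nucleic acid codes, written in lower case.
IUPAC_CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}


def normalize(sequence: str) -> str:
    """Lower-case a sequence and map u to t (case-insensitive, U synonym of T)."""
    return sequence.lower().replace("u", "t")


def keyword_matches(keyword: str, fragment: str) -> bool:
    """True if fragment is one of the nucleotide words denoted by keyword."""
    keyword, fragment = normalize(keyword), normalize(fragment)
    if len(keyword) != len(fragment):
        return False
    # Unknown codes map to the empty set and therefore never match.
    return all(
        base in IUPAC_CODES.get(code, "")
        for code, base in zip(keyword, fragment)
    )
```

The keyword tayta, which appears in the long example pattern above, denotes both tacta and tatta because y stands for c or t.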
Both the generic and the DNA versions of grappe have additional options:
- -r treats "end of record" characters as ordinary characters
- -l matches each line of the text independently
- -c counts the number of successfully matched records (lines) of the text
- -d <eor_chars> redefines the "end of record" characters. By default, a record is a line of a Unix file, MS-DOS file or Macintosh file.
- -e <patterns> introduces the patterns (this option is useful when the patterns begin with -, otherwise it is unnecessary)
- -f <pattern-file> specifies the file containing the patterns
In addition, the full version of grappe has one more option:
- -i forces grappe to be case insensitive
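The notion of a record and the counting done by -c can be sketched in Python as follows. The sketch assumes the default record separators are the usual Unix, MS-DOS and Macintosh line endings, and it uses a plain substring test in place of grappe's pattern matching. The function names are hypothetical.

```python
import re


def split_records(text: str) -> list:
    """Split text into records on MS-DOS (\\r\\n), Macintosh (\\r) or Unix (\\n) line endings."""
    return re.split(r"\r\n|\r|\n", text)


def count_matching_records(keyword: str, text: str) -> int:
    """Count the records that contain the keyword at least once."""
    return sum(1 for record in split_records(text) if keyword in record)
```

A record that contains the keyword several times is still counted once, since the count is per record rather than per occurrence.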
Note that you must specify exactly one set of patterns. If no text file is specified, grappe reads the
standard input. You can specify a set of text files by listing them or by using shell
wildcards. For example,
grappe 'void' *.c
looks for the pattern void in all C files in the current directory.
The usage of grappe is the following:
grappe [-r] [-l] [-c] [-d <eor_chars>] ([-e] <patterns> | -f <pattern-file>) [<textfiles>]
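When grappe is driven from a script, the synopsis above can be turned into an argument list. The Python sketch below only assembles the list and does not run anything. The function name, the parameter names, and the assumption that the executable is called grappe and is on the search path are all hypothetical. Only the flags shown in the synopsis are used.

```python
from typing import List, Optional, Sequence


def grappe_argv(
    patterns: Optional[str] = None,
    pattern_file: Optional[str] = None,
    textfiles: Sequence[str] = (),
    raw_eor: bool = False,             # -r
    per_line: bool = False,            # -l
    count: bool = False,               # -c
    eor_chars: Optional[str] = None,   # -d <eor_chars>
) -> List[str]:
    """Build a command line following the synopsis; exactly one pattern source is required."""
    if (patterns is None) == (pattern_file is None):
        raise ValueError("give exactly one of patterns or pattern_file")
    argv = ["grappe"]                  # assumed executable name
    if raw_eor:
        argv.append("-r")
    if per_line:
        argv.append("-l")
    if count:
        argv.append("-c")
    if eor_chars is not None:
        argv += ["-d", eor_chars]
    if patterns is not None:
        # Always pass -e so that patterns beginning with "-" are safe.
        argv += ["-e", patterns]
    else:
        argv += ["-f", pattern_file]
    argv += list(textfiles)
    return argv
```

The resulting list can be handed to subprocess.run. Shell wildcards such as *.c are not expanded in that case, so expand them first, for example with glob.glob.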
Download grappe now!
You can download the grappe archive now.
Install grappe now!
Assume you want two versions of grappe: a generic
one and one for working with DNA/RNA files.
After downloading the grappe archive, type the following commands.
tar xfvz grappe-3.0.tar.gz
cd grappe-3.0
mkdir generic
cd generic
../configure
make
cd ..
mkdir dna
cd dna
../configure --enable-DNA
make
After that, the generic version will be located in the subdirectory generic of grappe-3.0, and the DNA version in the subdirectory dna.
Now go ahead and use the version you want.
Get in touch
Although grappe has been tested for a long time, bugs may still remain. If you find a bug in grappe, please send a bug report to Gregory Kucherov and Sébastien Briais. In your mail, please include the command line that provoked the bug (i.e. the set of patterns, the options, the version of grappe and, if possible, the text files). Please also mention which computer and operating system you use and the message you got (segmentation fault, assertion failed, ...). We will try to fix the bug as soon as possible.
If you have any suggestions, comments or criticism, don't hesitate to contact Gregory Kucherov and Sébastien Briais.
Authors and references
grappe is based on an algorithm described in
Gregory Kucherov, Michaël Rusinowitch,
Matching a Set of Strings with Variable Length Don't Cares,
6th Symposium on Combinatorial Pattern Matching,
Helsinki, July 1995, Lecture Notes in Computer Science,
vol. 937 (1995), pp. 230-247.
Extended version in Theoretical Computer Science, vol. 178 (1997), pp. 129-154.
You can download the postscript document here.
The following people contributed to the programming of grappe: