Regular Expression Processor
by Michela Becchi
This site provides the code I wrote to process regular expressions. My first goal is to allow line rate regular expression matching in the networking domain. However, the tool can be used in any context where accelerating regular expression matching is desirable (e.g., bibliographic search).
The tool allows:
- building deterministic and non deterministic finite automata (NFA and DFA) corresponding to any given set of regular expressions;
- compressing DFA through recently proposed schemes;
- evaluating finite automata based designs.
The tool supports regular expressions as defined in terms of formal language theory. Constructs such as lazy and greedy quantifiers, back-references, lookaheads and lookbehinds are not supported and will cause the tool to abort.
Code Summary
The code includes:
- NFA and DFA generation algorithms for sets of regular expressions
- DFA minimization (J. Hopcroft, "An nlogn algorithm for minimizing states in a finite automaton," in Theory of Machines and Computation, J. Kohavi, Ed. New York: Academic, 1971, pp. 189-196)
- Export/import of DFA to/from text format
- Export of NFA and DFA into DOT format for pictorial representation (http://www.graphviz.org/)
- NFA transformation algorithms
- epsilon transition removal
- state reduction
- Default transition compression algorithms for DFA:
- Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick Crowley and Jonathan Turner, "Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection," Proceedings of ACM SIGCOMM'06, Pisa, Italy, September 12-15, 2006.
- Michela Becchi and Patrick Crowley, "An Improved Algorithm to Accelerate Regular Expression Evaluation", Proceedings of the 2007 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), Orlando, FL, December, 2007.
- Hybrid-FA generation algorithm, as described in:
- Michela Becchi and Patrick Crowley, "A Hybrid Finite Automaton for Practical Deep Packet Inspection", In Proceedings of the International Conference on emerging Networking EXperiments and Technologies (CoNEXT), New York, NY, December, 2007.
- DFA, NFA and Hybrid-FA traversal routines
- Regular expression workload consisting of:
- a synthetic regular expression generator
- a traffic trace generator
- a memory layout generator (implementing different memory encoding schemes)
- a memory/cache simulator
- In particular, the workload is described in:
- Michela Becchi, Mark Franklin and Patrick Crowley, "A Workload for Evaluating Deep Packet Inspection Architectures", In Proceedings of the 2008 IEEE International Symposium on Worload Characterization (IISWC), Seattle , WA, September, 2008.
Note
- Only authenticated users with confirmed email-address can
- Download the code
- Edit the publications page
- Opening an account is free. Please send an email to Michela Becchi indicating your name and your institution if you want to be a registered user. I would appreciate if you could also let me briefly know why you are interested in regular expressions. After creating a new account, you will receive a confirmation email. Please follow the instructions to confirm your identity. You may also use the preferences page to modify your account data.
- I do not plan to include the code implementing the DFA compression algorithm described in
- Michela Becchi and Srihari Cadambi, "Memory-Efficient Regular Expression Search Using State Merging", in Proceedings of IEEE INFOCOM, Anchorage, AK, May 2007
- That code was written during an internship at NEC Labs, Princeton, NJ and is NEC property.
Acknowledgements
This work has been supported by National Science Foundation grants CCF-0430012 and CCF-0427794. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).