Regular expression implementations

Uncategorized

  • Bash

  • BSD grep

  • Code Search [src]

    Background in “Regular Expression Matching with a Trigram Index—or—How Google Code Search Worked” by Russ Cox (2012)

  • ECMAScript RegExp

    • XRegExp [src]: extended parsing for JavaScript RegExp
  • exrex [src]

    Author: Adam Tauber (2012)

    Generates all matching strings for a regexp, or random ones.

    TODO: Evaluate similar projects.

  • Flex [Wikipedia]

    • Flex 2.5.2 ported to OpenVMS [src v20, v30, …]
  • GNU Bison

    • Bison ported to OpenVMS [src v20 (Andrew Consortium Bison A2.3), v30 (Andrew Consortium Bison A2.3), …, v80]
  • GNU sed

    The version 2.03 readme indicates that sed uses GNU rx and, before that, used GNU regex.

    • GNU sed ported to OpenVMS [src v30 (sed 2.05) …, v80 (sed 2.03)]
  • I-Regexp [rfc]

    RFC 9485 I-Regexp: An Interoperable Regular Expression Format

  • ICgrep [site] [architecture]

    Mirror(?)

    Andrew Gallant says ICgrep implements (most of?) UTS #18 level 2 [HN].

  • .NET System.Text.RegularExpressions [docs]

    • dlclark/regexp2 [src]: port of .NET System.Text.RegularExpressions to Go, which has RE2 and ECMAScript compatibility modes
  • Shell globs

  • Truffle TRegex [src]

  • Unix egrep [history]

    Author: Al Aho

    First appeared in Seventh Edition Unix

  • Unix grep [history]

    First appeared in Fourth Edition Unix

  • Unix lex

  • Unix sed [Wikipedia]

  • Unix yacc [Wikipedia]

  • UTS #18: Unicode Regular Expressions [standard]

    Andrew Gallant has discussed why implementing UTS #18 level 2 is difficult and has rarely been done (e.g., by ICgrep).

Other

  • Mattias Wadman's libfa [src]: automata library in C to determinize, minimize, and translate regexps

    Author: Mattias Wadman

    many many years ago i worked at a network equipment company and did my unfinished master thesis about using software and hardware DFA:s for flow classification. We wanted to use it to do fancy QoE for tv/phone traffic but customer wanted to block file sharing 🙂 anyway it never ended up being used. but! years later i managed to convince my boss to open source most of the library code i wrote, can be found here https://github.com/wader/libfa

    main idea was to be able to do a union of FA:s and while determinize/minimize distinguish and keep track of original FA:s accepting states

    Aug 14, 2015 i asked for permission to open source it, was ok:ed Nov 18, 2015.

    the thesis has the date July 2, 2010 on the front page, not sure what that means 🙂

    so i maybe started working on the code early 2010 or so

  • Mike French's myrex [src] [HN]: converts regexp via NFA to an Elixir process network

  • fancy_regex [src] [docs]: hybrid NFA and backtracking engine that delegates to Rust regex when possible

  • Sri Panyam's tlex: used Russ Cox's “Regular Expression Matching Can Be Simple And Fast” article as a reference [HN]

  • Reini Urban's matcher for Asterisk [src archive] [HN]

    Author: Reini Urban (2003)

    2003 I added a dynamic regular expression matcher to asterisk, which has a weird syntax, but otherwise the matcher looked fine.

    In the end it was removed from CVS before a release without me noticing because the variable capturing was not thread-safe. Would have been trivial to fix.

    TODO: Archive it from CVS

    Influenced by Steffen Offermann's xstrcmp.c (1991) from the Snippets Collection.

    • rurban/tiny-matcher [src] [HN]: extends it
  • tiny-regex-c [src]

    Design is inspired by Rob Pike's regex-code for the book "Beautiful Code" available online here.

    Supports a subset of the syntax and semantics of the Python standard library implementation (the re-module).

    Formally verified using KLEE.

    • rurban/tiny-regex-c [src]
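
The libfa note above describes taking a union of automata and, while determinizing, keeping track of which original automaton each accepting state came from. The following is a minimal Python sketch of that idea only; it is not libfa's API, and it leaves out minimization. The names `determinize_union`, `classify`, `A0`, and `A1` are hypothetical, and each input automaton is assumed to be an NFA without epsilon moves, given as a `(start, transitions, accepting)` tuple.

```python
from collections import deque


def determinize_union(automata, alphabet):
    """Subset-construct the union of several NFAs, tagging each DFA state
    with the indices of the original automata that accept in that state.

    automata: list of (start, transitions, accepting), where transitions
    maps (state, symbol) -> set of next states (no epsilon moves).
    """
    # A DFA state is a frozenset of (automaton index, original state) pairs.
    start = frozenset((i, a[0]) for i, a in enumerate(automata))
    dfa_trans = {}
    tags = {}
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Record which original automata accept in this DFA state.
        tags[current] = frozenset(
            i for i, s in current if s in automata[i][2]
        )
        for sym in alphabet:
            nxt = frozenset(
                (i, t)
                for i, s in current
                for t in automata[i][1].get((s, sym), ())
            )
            if not nxt:
                continue  # no automaton can continue on this symbol
            dfa_trans[(current, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, dfa_trans, tags


def classify(dfa, text):
    """Run the tagged DFA and return the indices of the accepting automata."""
    start, trans, tags = dfa
    state = start
    for ch in text:
        state = trans.get((state, ch))
        if state is None:
            return frozenset()
    return tags[state]


# Automaton 0 accepts "ab*"; automaton 1 accepts "a" or "ac".
A0 = (0, {(0, "a"): {1}, (1, "b"): {1}}, {1})
A1 = (0, {(0, "a"): {1}, (1, "c"): {2}}, {1, 2})
dfa = determinize_union([A0, A1], "abc")
```

With this setup, `classify(dfa, "a")` reports both automata, while `classify(dfa, "abb")` and `classify(dfa, "ac")` report only the first and second respectively. A minimization pass that preserves the tags would have to treat states with different tag sets as distinguishable from the outset.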

Benchmarks