Some time ago, I released Tyre, a library for typed regular expressions that supports matching, printing and routing. While many people expressed interest, the syntax was (rightfully) considered too obtuse for practical use. Thanks to Petter A. ‘paurkedal’ Urkedal, this should now be fixed!

Around the same time Tyre was released, paurkedal released ppx_regexp, a ppx that lets you write matchers using PCRE syntax:

let check_line = function%pcre
   | {|(?<t>.*:\d\d) .* postfix/smtpd\[[0-9]+\]: connect from (?<host>[a-z0-9.-]+)|} ->
      Lwt_io.printlf "%s %s" t host
   | _ ->
      Lwt.return_unit

While Tyre provides lots of very nice properties, it forces users to write regular expressions using combinators, resulting in code that can be hard on the eyes:

let p s = Tyre.pcre s
let re = Tyre.(route [
  (p {|.*:\d\d|} <&> p {| .* postfix/smtpd\[[0-9]+\]: connect from |} *> p {|[a-z0-9.-]+|}) -->
    (fun (t, host) -> Lwt_io.printlf "%s %s" t host) ;
  p {|.*|} -->
    (fun _ -> Lwt.return_unit) ;
])

It makes composition very easy and safe, but programming with it is, to put it lightly, an acquired taste. ppx_regexp, on the other hand, is less typed and harder to compose, but very easy to use.

A PPX for Tyre

Paurkedal and I are happy to introduce ppx_tyre, which combines the best of both worlds. It lets you write regular expressions using PCRE syntax and obtain a typed version. For instance, the typed regular expression that parses strings like dim:4x3 is shown below. Note the call syntax (?&int), which lets you call arbitrary typed regular expressions, such as Tyre.int here. Such a call does not incur additional cost: the regular expression is inlined and optimized directly.

# open Tyre ;;
# let dim = [%tyre "dim:(?&int)x(?&int)"] ;;
val dim : (int * int) Tyre.t

As before, named captures can be used. Here is a small regular expression that captures a name followed by an optional port number. Note how the option turns the port into an int option. Nesting options (?) and repetitions (* and +) produces the corresponding OCaml datatype.

# let origin = [%tyre "(?<name>[[:alnum:]]+)(:(?<port>(?&int)))?"] ;;
val origin : < name : string; port : int option > Tyre.t 

Named captures can also be used to name alternatives. For instance, we might want to distinguish ids from names:

# let id_or_name = [%tyre "id:(?&id:int)|name:(?<name>[[:alnum:]]+)"] ;;
val id_or_name : [ `id of int | `name of string ] Tyre.t

Typed regular expressions obtained through the ppx are exactly like the ones built from combinators, and can thus be composed and mixed arbitrarily. Once you have a typed regular expression, you can either compile and match it with Tyre.compile and Tyre.exec; or you can use it in reverse, to print values with Tyre.eval.
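
To make that round trip concrete, here is a minimal sketch. It rebuilds the dim expression with plain combinators so that it runs without the ppx, then compiles, matches and prints it. It assumes the tyre package is installed; the names dim_re, parse_dim and show_dim are only illustrative and not part of Tyre.

```ocaml
(* Sketch only: assumes the tyre package is available (e.g. through opam). *)

(* Combinator counterpart of the dim expression written with the ppx above. *)
let dim : (int * int) Tyre.t =
  Tyre.(str "dim:" *> int <&> str "x" *> int)

(* Compile once, then reuse the compiled expression for every match. *)
let dim_re = Tyre.compile dim

(* Parsing direction: string to (int * int) option (hypothetical helper). *)
let parse_dim s =
  match Tyre.exec dim_re s with
  | Ok wh -> Some wh
  | Error _ -> None

(* Printing direction: (int * int) to string (hypothetical helper). *)
let show_dim wh = Tyre.eval dim wh

let () =
  match parse_dim "dim:4x3" with
  | Some (w, h) -> Printf.printf "width=%d height=%d\n" w h
  | None -> print_endline "no match"
```

The same value dim drives both directions, which is what keeps the parser and the printer in sync.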

Back to routes

Just like ppx_regexp, ppx_tyre supports routes! The routing shown at the beginning of the article is directly accepted by ppx_tyre using the syntax function%tyre. In addition, you can use all the goodies presented above in your tyre routes, such as calls to other regular expressions.

# let check_line = function%tyre
   | {|(?<t>(?&origin)) .* postfix/smtpd\[[0-9]+\]: connect from (?<host>[a-z0-9.-]+)|} ->
      Lwt_io.printlf "%s %i %s" t#name (Option.value ~default:0 t#port) host
   | _ ->
      Lwt.return_unit ;;
val check_line : unit Lwt.t Tyre.re

A route can then be matched using Tyre.exec. Just like ppx_regexp, tyre routes are very fast and use a linear-time matching algorithm (provided by ocaml-re) that is independent of the number of routes.
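
As a rough illustration of matching a route, the sketch below builds a small route table with the combinator API (Tyre.route and -->) rather than the ppx, and dispatches on it with Tyre.exec. It assumes the tyre package; the handlers return plain strings instead of Lwt promises to keep the example free of other dependencies, and routes and dispatch are hypothetical names.

```ocaml
(* Sketch only: assumes the tyre package is available. *)

(* Two routes sharing one compiled automaton; each handler receives the
   typed value extracted by its own expression. *)
let routes : string Tyre.re =
  Tyre.(route [
    (str "dim:" *> int <&> str "x" *> int)
      --> (fun (w, h) -> Printf.sprintf "area %d" (w * h));
    (str "id:" *> int)
      --> (fun id -> Printf.sprintf "id %d" id);
  ])

(* Hypothetical helper: run the routes, falling back when nothing matches. *)
let dispatch s =
  match Tyre.exec routes s with
  | Ok answer -> answer
  | Error _ -> "no route"

let () = print_endline (dispatch "dim:4x3")
```

All routes must return the same type, here string, which is why a catch-all case is usually the last entry in a real route table.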

What about ppx_regexp?

The ppx_regexp package is still supported, and gets a new version, 0.4, which fixes a bug related to the type of captures under alternatives. It also uses a brand new regexp parser (shared with ppx_tyre). ppx_regexp and ppx_tyre will keep living together for now, notably since they share part of their implementations!

The future

This announcement also marks the release of Tyre 0.4.2, which now uses the new Seq type from the standard library to represent matches under repetitions.
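
For readers curious about what Seq looks like in practice, here is a small sketch of a repetition. It assumes a Tyre version in which Tyre.rep yields a Seq.t, and an OCaml version whose standard library provides Seq and List.of_seq; ints, ints_re and parse_ints are hypothetical names.

```ocaml
(* Sketch only: assumes a tyre release where Tyre.rep returns a Seq.t. *)

(* Zero or more integers, each followed by a semicolon. *)
let ints : int Seq.t Tyre.t = Tyre.(rep (int <* str ";"))

let ints_re = Tyre.compile ints

(* Hypothetical helper: force the sequence of matches into a list. *)
let parse_ints s =
  match Tyre.exec ints_re s with
  | Ok seq -> Some (List.of_seq seq)
  | Error _ -> None

let () =
  match parse_ints "1;2;3;" with
  | Some l -> List.iter (Printf.printf "%d\n") l
  | None -> print_endline "no match"
```

Because the result is a sequence, callers can stop consuming it early instead of materializing every match up front.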

We hope this PPX will make Tyre much more palatable for people who find combinators difficult to digest. It’s the result of a pretty long discussion and we hope you’ll give us feedback on how to improve it further! In particular, we await opinions on this new regular expression parser and on the ergonomics of groups. Both packages are available through OPAM and their documentation can be found here.

Happy hacking!

You can discuss this on discuss and reddit.