Some times ago, I released Tyre, a library for Typed Regular Expressions that allows matching, printing and routing. While many people expressed interest, the syntax was (rightfully) considered too obtuse for practical usage. Thanks to Petter A. ‘paurkedal’ Urkedal, this should now be fixed!
Around the same time Tyre was released, paurkedal released ppx_regexp, a ppx that allows to write matcher using the PCRE syntax:
let check_line = function%pcre
| {|(?<t>.*:\d\d) .* postfix/smtpd\[[0-9]+\]: connect from (?<host>[a-z0-9.-]+)|} ->
Lwt_io.printlf "%s %s" t host
| _ ->
Lwt.return_unit
While tyre
provided lot’s of very nice properties, it forces users
to write regular expressions using combinators, resulting in code
that can be difficult on the eyes:
let p s = Tyre.regex (Re_perl.re ?opts s)
let re = Tyre.(route [
p".*:\d\d" <* p" .* postfix/smtpd\[[0-9]+\]: connect from " *> p"[a-z0-9.-]+" -->
(fun (t, host) -> Lwt_io.printlf "%s %s" t host) ;
p".*" -->
fun _ -> Lwt.return_unit ;
])
It makes composition very easy and safe, but programming with is, to put it
lightly, an acquired taste. ppx_regexp
on the other hand is less typed and
harder to compose, but very easy to use.
A PPX for Tyre
Paurkedal and I are happy to introduce ppx_tyre, which combines
the best of both world. It allows you to write regular expressions
using PCRE syntax and obtain a typed version. For instance, the
typed regular expressions to parser strings like dim:4x3
is shown below.
Note the call syntax (?&int)
which allows you to call arbitrary typed regular expressions, such as Tyre.int
here. Such a call does not incur additional cost:
the regular expression is inlined and optimized directly.
# open Tyre ;;
# let dim = [%tyre "dim:(?&int)x(?&int)"] ;;
val dim : (int * int) Tyre.t
As before, name capture can be used. Here is a small regular expression
to capture a named followed by an optional port number.
Note how the option turned the port into an int option
. Nesting of options (?
)
and repetitions (*
and +
) will produce the correct OCaml datatype.
# let origin = [%tyre "(?<name>[[:alnum:]]+)(:(?<port>(?&int)))?"] ;;
val origin : < name : string; port : int option > Tyre.t
Named captures can also be used to name alternatives. For instance, we might want to distinguish ids from names:
# let id_or_name = [%tyre "id:(?&id:int)|name:(?<name>[[:alnum:]]+)"] ;;
val id_or_name : [ `id of int | `name of string ] Tyre.t
Typed regular expressions obtained through the ppx are exactly like the ones
built from combinators, and can thus be composed and mixed arbitrarily.
Once you have a typed regular expression, you can either compile and match it
with Tyre.compile
and Tyre.exec
; or you can use it in reverse, to print
values with Tyre.eval
.
Back to routes
Just like ppx_regexp
, ppx_tyre
supports routes!
The routing shown at the beginning of the article is directly accepted by
ppx_tyre
using the syntax function%tyre
.
In addition, you can also use all the goodies presented above in your tyre
route, such as call to other regular expressions.
# let check_line = function%tyre
| {|(?<t>(?&origin)) .* postfix/smtpd\[[0-9]+\]: connect from (?<host>[a-z0-9.-]+)|} ->
Lwt_io.printlf "%s %i %s" t#name (Option.default 0 t#port) host
| _ ->
Lwt.return_unit ;;
val check_line : unit Lwt.t Tyre.re
A route can then be matched using Tyre.exec
. Just like ppx_regexp
, tyre routes
are very fast and uses a linear-time matching algorithm (provided by ocaml-re)
that is independent of the number of routes.
What about ppx_regexp
?
The ppx_regexp
package is still supported, and receives a new version 0.4 which fixes a bug related to the type of captures under alternatives. It also
uses a brand new regexp parser (shared with ppx_tyre
).
ppx_regexp
and ppx_tyre
will keep living together for now, notably since
they share part of their implementations!
The future
This announcement also marks the release of Tyre 0.4.2,
which now uses the new Seq
type from the standard library to represent matches
under repetitions.
We hope this PPX will make Tyre much more palatable for people that find combinators difficult to digest. It’s the result of a pretty long discussion and we hope you’ll give us feedback on how to improve it further! In particular, we await opinions on this new regular expression parser and on the ergonomics of groups. Both packages are available through OPAM and their documentation can be found here.
Happy hacking!