Sunday, May 20, 2012

Who is using Regexp::Grammars?

I've been using Regexp::Grammars for a couple of projects and I find this module really amazing: so much power in such a compact form! Actually, the whole thing is a gigantic hack, but an extremely clever one. The learning curve is steep, but once you are used to it, playing with callback actions, grammar inheritance trees or fancy "result distillation" strategies is a joy. The only drawback is that running callback functions within the Perl regexp engine is sometimes tricky: in Perl versions prior to 5.14 the regexp engine is not reentrant, and debugging can be acrobatic.

I just had a look at the reverse dependencies and was very surprised to find so few CPAN modules that depend on Regexp::Grammars, while its predecessor Parse::RecDescent has many more dependents. So I'm wondering why Regexp::Grammars is not more widespread. Some guesses:
  • the learning curve
  • the dependency on Perl 5.10 regexes
  • the regex engine reentrancy problem mentioned above
Any other opinions?


  1. Maybe you could write up a few use cases where you found it useful?

  2. ^ Yes, I would love to see some examples.

  3. I converted one of my PRD grammars and found problems with large inputs. Regexp::Grammars just blew up. From memory (and this was a while ago), I found PRD was actually quicker. Now, I didn't spend too long on the exercise once I found out that Regexp::Grammars wasn't a shinier, go-faster version of PRD, so the problems may be resolvable, but I was very disappointed, as I really could use something that has the flexibility of PRD. I hugely love the ability to skip over sections of documents that are of no interest to me via the use of nifty regexps, allowing me to write 'Island Grammars', and I was really hoping for something ten times better from the Wizard of Oz, Dr D. Sadly, it doesn't look like we have it. If someone with an IQ on the right side of 140(SB) can come up with examples and documentation for the rest of us to follow, then I'll look at Regexp::Grammars again.

  4. There is also Marpa for writing parsers...

  5. I gave a talk on it at the last FPW :)

    I used Regexp::Grammars to parse some RPN queries and other stuff, but the bottom-up approach was too slow for log parsing, for example. That's why I wrote persec (inspired by parsec combinators).

    I'll give a talk on parsec at FPW'12: cya soon :)

  6. I have used R::G for parsing a minilanguage (Language::Expr) and in a couple of other cases. It's very convenient compared to, say, Marpa, because of the integrated lexer (everything is inside the regex). Last time I tried, error reporting was not that good and performance suffered on long input. For future projects I might migrate to PRD or Marpa.

  7. I use R::G for all my non-trivial parsing.

    1. Well, computation time does go up considerably when parsing larger and more complex strings. For my current task, the time has gone up so much that I'm considering parallelizing my code so that each string is processed in its own thread.