Archive for Regular Expressions

Regular expressions and $~, $', $1, etc.

When I wanted to play around with Oniguruma (the forthcoming regular expression engine for Ruby 2.0) I used the Oniguruma gem, which kept me from needing to recompile Ruby or install 1.9. One catch with the gem, though, is that you can’t do this:

001:0> re = Oniguruma::ORegexp.new('e')
002:0> "test".match(re)
TypeError: wrong argument type Oniguruma::ORegexp (expected Regexp)
        from (irb):2:in `match'
        from (irb):2

Now, I’m really used to the String#match way of doing things. That’s been my idiom. I’ve never ever even thought of using Regexp#match. For some reason (dumb luck, probably) String#match makes more sense to me. So I thought I was pretty smart by adding this to my .irbrc:

require 'oniguruma'
include Oniguruma

# allow String#match to take Oniguruma regexps
class String
  alias_method :old_match, :match

  def match(regexp)
    case regexp
    when Oniguruma::ORegexp
      regexp.match(self)
    else
      old_match(regexp)
    end
  end
end

But what I found out (after Hpricot stopped working, reading its source code, googling for $' [fun], and playing around in irb) was that the global variables $~, $', $1, etc. were no longer being set. Oh wait, did I say global variables?

Surprise! They aren’t. They’re local. Their scope is wherever the match happens. So once I wrapped the match inside a method, those variables were set inside that method and weren’t visible outside it. But then, they aren’t really variables anyway. They’re “parser-level macros”, says taw. The confusion comes from a lot of people calling them global variables (because they certainly look that way).
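
To see the scoping for yourself, here is a minimal sketch that uses Ruby’s built-in Regexp, so no Oniguruma gem is needed. The method names are made up for illustration.

```ruby
# Hypothetical wrapper: the match happens inside this method's frame,
# so $~ is set here and not in whoever called us.
def wrapped_match(string, regexp)
  string.match(regexp)
end

# Calls the wrapper, then looks at $~ in its own frame.
def match_data_after_wrapped_call
  wrapped_match("test", /e/)
  $~
end

# Matches directly, so $~ is set in this frame.
def match_data_after_direct_call
  "test".match(/e/)
  $~
end

p match_data_after_wrapped_call   # prints nil
p match_data_after_direct_call    # prints #<MatchData "e">
```

The wrapper still returns a perfectly good MatchData; it is only the $~ family in the caller that goes missing, which is what bites code relying on them.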

It tripped me up.

I’ve stopped redefining String#match, and I’m getting used to ORegexp#match when I need an Oniguruma expression.
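
One way to sidestep the special variables altogether is to hold on to the object that match returns. The sketch below uses the built-in Regexp as a stand-in; I’m assuming the gem’s match result offers comparable accessors, so check its documentation before relying on that. The helper name is hypothetical.

```ruby
# Hypothetical helper: returns [before, matched, after], or nil on no match.
# Everything comes from the returned MatchData, never from $~, $` or $'.
def split_around(regexp, string)
  m = regexp.match(string)
  return nil unless m
  [m.pre_match, m[0], m.post_match]
end

p split_around(/(t)(e)/, "a test")   # prints ["a ", "te", "st"]
p split_around(/xyz/, "a test")      # prints nil
```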

I hope this information proves useful to someone someday.

Playing with Oniguruma

I’ve been busy reading Jeffrey Friedl’s Mastering Regular Expressions and getting a little sad that some of the coolest tricks available are not (yet) available in Ruby. Oniguruma, the regular expression engine coming in Ruby 2.0, is more feature-full and faster than what we have now, and makes whole swaths of Mastering Regular Expressions suddenly relevant. I hear it’s possible to recompile 1.8 to use Oniguruma instead, but I’m not quite ready for that.

I am ready for lookbehind and named captures, though. Thankfully, the Oniguruma gem is available to save me from trying to mess up my Ruby install.
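
For a taste of both features, here is a small sketch. It is written against the built-in Regexp of Ruby 1.9 or later, where this syntax is supported natively; on 1.8 the same pattern string would have to go through the gem’s ORegexp instead. The method name and the price format are invented for the example.

```ruby
# Hypothetical parser: pulls a price out of a string.
#   (?<=\$)         lookbehind: a "$" must come right before, but is not consumed
#   (?<dollars>...) named capture, read back with m[:dollars]
def parse_price(text)
  m = /(?<=\$)(?<dollars>\d+)\.(?<cents>\d{2})/.match(text)
  m && { :dollars => m[:dollars].to_i, :cents => m[:cents].to_i }
end

p parse_price("Total: $19.99")[:dollars]   # prints 19
p parse_price("Total: 19.99")              # prints nil, no "$" to look behind at
```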

The one unsettling thing about using the Oniguruma gem, though, is how they left String’s match method alone. It’s regexp.match(string) only for these things. Thankfully, that’s easily fixed:

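require 'oniguruma'

class String

  # Hand Oniguruma patterns to ORegexp#match;
  # everything else goes to the original String#match.
  def o_match(regexp)
    case regexp
    when Oniguruma::ORegexp
      regexp.match(self)
    else
      old_match(regexp)
    end
  end

  alias_method :old_match, :match   # keep the built-in under a new name
  alias_method :match, :o_match     # route String#match through o_match

end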

Modifying regexes

I recently had a very long regular expression that I needed two versions of, one for anywhere and one for just the end of a line. Handled!

RE = /long with a (capture) or (two)/
RE_AT_END = Regexp.new(RE.source + '$', RE.options)
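
Here is the same trick as a runnable sketch, along with one caveat: if the source has a top-level alternation, a bare `$` on the end only applies to the last alternative. Wrapping the source in a non-capturing group first avoids that. Both method names are hypothetical.

```ruby
# The approach above: tack "$" onto the source and carry the options over.
def with_dollar(regexp)
  Regexp.new(regexp.source + '$', regexp.options)
end

# Variation: group the source first so "$" applies to the whole pattern.
def anchored_at_end(regexp)
  Regexp.new("(?:#{regexp.source})$", regexp.options)
end

p(with_dollar(/(ruby) or (perl)/i) =~ "I like Ruby or Perl")        # prints 7
p(with_dollar(/(ruby) or (perl)/i) =~ "I like Ruby or Perl best")   # prints nil
p(with_dollar(/cat|dog/) =~ "cat food")       # prints 0, "$" only binds to "dog"
p(anchored_at_end(/cat|dog/) =~ "cat food")   # prints nil
```

The non-capturing group leaves capture numbering alone, so $1, $2 and friends still line up with the original expression.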

This even works if the regular expression you’re modifying was created via something like Regexp.compile("(?i-mx:test)"), although it turns out a little ugly. In that case, I might (might) recommend:

Regexp.compile(RE.to_s.sub(/\)$/, '$)'))
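
A quick sketch to check that the to_s route keeps the embedded flags working. The method name is made up; the pattern is the one from the paragraph above.

```ruby
# Hypothetical helper: slip "$" inside the final closing parenthesis
# of the regexp's to_s form, then compile the result.
def end_anchored_via_to_s(regexp)
  Regexp.compile(regexp.to_s.sub(/\)$/, '$)'))
end

nested = Regexp.compile("(?i-mx:test)")
at_end = end_anchored_via_to_s(nested)

p(at_end =~ "a TEST")       # prints 2, still case-insensitive
p(at_end =~ "TEST again")   # prints nil, no longer at the end of the line
```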
