Regular expressions and $~, $', $1, etc.
When I wanted to play around with Oniguruma (the forthcoming regular expression engine for Ruby 2.0) I used the Oniguruma gem, which saved me from recompiling Ruby or installing 1.9. One catch with the gem, though, is that you can’t do this:
001:0> re = Oniguruma::ORegexp.new('e')
002:0> "test".match(re)
TypeError: wrong argument type Oniguruma::ORegexp (expected Regexp)
from (irb):2:in `match'
from (irb):2
Now, I’m really used to the String#match way of doing things. That’s been my idiom. I’ve never ever even thought of using Regexp#match. For some reason (dumb luck, probably) String#match makes more sense to me. So I thought I was pretty smart by adding this to my .irbrc:
require 'oniguruma'
include Oniguruma

# allow String#match to take Oniguruma regexps
class String
  alias_method :old_match, :match

  def match(regexp)
    case regexp
    when Oniguruma::ORegexp
      regexp.match(self)
    else
      old_match(regexp)
    end
  end
end
But what I found out (after Hpricot stopped working, looking at its source code, googling for $' [fun], and playing around in irb) was that the global variables $~, $', $1, etc. were no longer being set. Oh wait, did I say global variables?
Surprise! They aren’t. They’re local. They have the scope of wherever the match happens. So when I wrapped the match inside a method, those variables were set inside that method and weren’t visible outside it. But then, they aren’t really variables anyway. They’re “parser-level macros”, says taw. The confusion is that a lot of people call them global variables (because they certainly look that way).
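Here is a minimal sketch of that scoping, using Ruby's built-in Regexp so it runs without the Oniguruma gem. The method names wrapped_match and demo are made up for illustration; they stand in for my redefined String#match and for whatever code calls it.

```ruby
# Hypothetical stand-in for the redefined String#match: the actual
# match happens inside this method's frame.
def wrapped_match(string, regexp)
  regexp.match(string)
  # Inside this frame, the special variables reflect the match.
  { last_match: $~, first_group: $1 }
end

# Hypothetical caller: no match is ever performed directly in this frame.
def demo
  inner = wrapped_match("test", /(e)/)
  {
    inner_set: !inner[:last_match].nil?,   # $~ was set inside wrapped_match
    inner_group: inner[:first_group],      # $1 was set inside wrapped_match
    outer_last_match: $~                   # still nil in the caller's frame
  }
end

p demo
```

The caller only sees what wrapped_match chooses to return. Its own $~ stays nil, which is why code that relied on $' or $1 after calling my String#match quietly stopped working.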
It tripped me up.
I’ve stopped redefining String#match, and I’m getting used to ORegexp#match when I need an Oniguruma expression.
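One way to avoid depending on the special variables at all is to work with the object that match returns. The sketch below uses Ruby's built-in Regexp and MatchData rather than ORegexp (I'm not assuming anything here about what the gem's match object offers), and match_parts is a made-up helper name.

```ruby
# Hypothetical helper: pull the pieces out of the returned MatchData
# instead of reading $`, $', and $1.
def match_parts(string, regexp)
  match = regexp.match(string)
  return nil if match.nil?

  {
    before: match.pre_match,   # what $` would hold
    after: match.post_match,   # what $' would hold
    first_group: match[1]      # what $1 would hold
  }
end

p match_parts("test", /(e)/)
```

Because everything travels in the return value, it doesn't matter how many method calls sit between the match and the code that uses it.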
I hope this information will prove useful to someone someday.