May 07, 2004

Regular Expression Fun

Note to self:

When adding ‘#’ to the character class [-. ], what you do not want to do is:

[#-. ]

but rather:

[-#. ]

For those who don’t spend a lot of time with regular expressions: the stuff between [ and ] represents a character class, which means that any one of the characters in there can match a single character in your target string (e.g. if you have [abc], it can match the ‘a’ in at, the ‘b’ in bog, and the ‘c’ in cut). You can also specify ranges, so [a-z] will match all lower-case letters from a through z. If you want to match a literal ‘-’, it needs to go where it can’t be read as part of a range: the first character in the character class is the usual spot (the last position or a backslash escape also work in most regex flavors). Therefore, when I added ‘#’ to the front of the character class, the ‘-’ was no longer first, and I was saying I wanted to match all characters from ‘#’ through ‘.’. In case you’re curious, that is the equivalent of [#$%&'()*+,-.]. This, needless to say, caused some unexpected errors.
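To see the difference for yourself, here is a minimal sketch in Python (my choice for illustration; the post doesn't say which regex engine was involved, though most flavors treat this the same way). It lists every printable ASCII character each version of the class matches. The variable names are made up for the example.

```python
import re

# '#' placed first: the engine reads '#-.' as a range from '#' to '.'.
accidental = re.compile(r"[#-. ]")
# '-' kept first: it is a literal hyphen, so the class is just - # . and space.
intended = re.compile(r"[-#. ]")

# Check each printable ASCII character (space through '~') against both classes.
printable = [chr(code) for code in range(32, 127)]
accidental_chars = "".join(ch for ch in printable if accidental.fullmatch(ch))
intended_chars = "".join(ch for ch in printable if intended.fullmatch(ch))

print(repr(accidental_chars))  # " #$%&'()*+,-."
print(repr(intended_chars))    # ' #-.'
```

The accidental version quietly accepts nine extra characters, such as ‘*’ and ‘(’, which is the kind of thing that only shows up once odd input arrives.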

On the plus side, I don’t think I’ll make that mistake again…

Posted by Bill Stilwell at May 7, 2004 08:15 PM
Comments

There are a lot of pitfalls in regular expressions. 'Mastering Regular Expressions' by Jeffrey Friedl helps me a lot in avoiding them.

Posted by: Serge Moralez at May 14, 2004 02:40 AM