04 November 2007

Regular expressions and text editing

Regular expressions



When you edit glossaries or translation memories, a few regular expressions always come in handy.

Regular expressions are pattern-matching expressions. You type a pattern in the search field of a text editor, and the editor looks for any text that matches the pattern. Once the pattern has matched, you can replace the matched text with a second pattern. That way, you end up with very powerful search-and-replace routines that can save you hours of stress on thorny texts...
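To make the search-then-replace idea concrete, here is a minimal sketch in Python rather than in a text editor. The glossary content and the tab-separated layout are assumptions made up for the illustration; the same two patterns should work, with small syntax differences, in the search and replace fields of most regexp-aware editors.

```python
import re

# Hypothetical glossary: one "source term<TAB>target term" pair per line.
glossary = "cat\tchat\ndog\tchien\n"

# Search pattern: two tab-separated fields on a line, captured as groups 1 and 2.
pattern = re.compile(r"^([^\t\n]+)\t([^\t\n]+)$", re.MULTILINE)

# Replacement pattern: the same two fields, in reverse order.
swapped = pattern.sub(r"\2\t\1", glossary)

print(swapped)
```

The parentheses in the search pattern remember what they matched, and `\1` and `\2` in the replacement pattern put those pieces back wherever you want them. Here that swaps the two columns of every line in one pass.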

Here are two articles that you can use to get up to speed on the topic:

Regular Expressions from Princeton University
Regular Expressions Unfettered from Apple Developer Connection

As you can see, writing regular expressions is not always easy, and a helper application can sometimes save you a lot of time.
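What such a helper does can be sketched in a few lines of Python: try a pattern against a sample and show what matched and what each group captured. The function name `try_pattern` and the sample sentence are hypothetical, chosen only for this illustration; real testers add highlighting and live feedback on top of the same idea.

```python
import re

def try_pattern(pattern, sample):
    """Return (matched text, captured groups) for every match of pattern in sample."""
    return [(m.group(0), m.groups()) for m in re.finditer(pattern, sample)]

# Hypothetical sample: a date in "day month year" form inside a sentence.
results = try_pattern(r"(\d{2}) (\w+) (\d{4})", "Posted on 04 November 2007.")

for matched, groups in results:
    print("match:", matched, "groups:", groups)
```

Seeing the groups next to the match is usually the quickest way to find out why a replacement pattern is not doing what you expected.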

There are a few free regexp testers for OSX but they basically all work from outside your text editor.

The one that seems to be the easiest to use is reggy, a nice piece of software licensed under the GNU General Public License v2.

On his blog, Bill Clementson talks about regex-tool.el. As the name indicates, regex-tool.el is an elisp library for Emacs.

Besides text editors, regular expressions are supported by all kinds of software, including, of course, the major Office applications on the market. Look at their user manuals to find more information about regexp handling there.

Text editing



A number of very good text editors are available for the Mac. TextEdit, OSX's default text editing application, is fine but lacks even basic regular expression support. It does plenty of other things though, and can be considered a simple word processor with most of what is needed in that field (reading and writing Word files, etc.).

Other major text editors on the Mac include:
Smultron (free as in "speech"),
TextWrangler (free as in "beer"),
SubEthaEdit (not free, except for the old version),
BBEdit (not free at all) and
TextMate (not free either, and with very bad multibyte character support).

There are also the "ancestors", VI (VIM) and Emacs. Both are available from the Terminal application but require some time to get used to. Still, they are definitely some of the most powerful applications OSX hosts...

Emacs is not exactly the text editor that I'd recommend to my wife. But Aquamacs, a "Mac" version in terms of interface, is much friendlier and can be used almost right away as a replacement for TextEdit as far as, well, text editing is concerned. Being an adaptation of Emacs, it is just as free and also distributed under the GNU General Public License...

Emacs is mostly written in Elisp (Emacs Lisp), on top of a small core written in C. So anything you write in Elisp within Emacs can de facto extend the functionality of Emacs. In other words, Emacs is just a huge macro editing environment...

Whichever text editor you decide to use, don't forget to read the user manual and especially the "searches" and "regular expressions" chapters.