emacs regex with emacs lisp

June 28 update: using \#1 instead of (string-to-number \1)

A reader on reddit mentionned that the manual also had the "\#d" construct to replace the often used (string-to-number \d) function.

That regex-replace improvement was mentionned in Stevve Yegge's emacs 22 introduction, back in 2006.

Last but not least, I noticed that the whole post was originally written with (string-as-number ...) when the correct function name is (string-to-number ...)


Not strictly related to translation but here is what's happening...

I've resumed studies last year, trying to finish an MA in Japan Studies I started 25 years ago.

For the first year, I only have to write a 30ish pages dissertation on my subject (representation of women in kendo magazines in Japan) and I decided to go the emacs + org-mode way, with the easy export to ODF function that's packaged with the thing.

So I decided to write each chapter in a different org file, and send them one by one to my director. But then, for the final delivery I needed to put all that in one big file and was faced with the fact that all my footnotes would need to be re-indexed manually because each file had notes starting at 1...

I usually use BBedit for any serious regex work. Mostly because the interface is clearer than emacs, and the regexp feels more modern (\d vs [:digit:])

But one thing you can't do in BBEdit is to send commands to the replace string. For ex, in my case, add 21 to the matching number, which seems pretty trivial, when you think of it, but doing that will involve other technologies, like using perl or some other command line thing.

In emacs, however, everything can be interpreted as an expression, hence you can insert code wherever you want and get the result from that code right in the document.

The org-mode footnotes all look like [fn:12], where "12" is the note number that I need to replace with an incremented number. Since there are no instances of fn:\d+ without the brackets that are not footnotes, I figured I could just be searching for that string:

fn:\([:digit:]+\)

Notice that in emacs, "(" and ")" need to be escaped, also I could have used the [0-9] class.

In BBedit I'd just need:

fn:(\d+)

And now I need to replace that with the expression that will add 21 to the number.

In BBEdit, I'd be stuck here. I just can't add anything to a match. In emacs, I can replace the match with that:

fn:\,(+ 21 (string-to-number \1))

The emacs lisp expression is "(+ 21 (string-to-number \1))", which means "convert the \1 match that is a string into its numerical value and add 21 to it".

But, wasn't \1 supposed to match [0-9]+, which is a number? Well, yes, but really it's just digits, hence strings, that have no numerical value whatsoever, so first, we need the expression to convert them to a numerical value before adding 21 to them.

Now, the trick is to have the expression be handled as an operation and not as an arbitrary string, and that's where the "\," prefix comes into play.

"\," tells the replace engine that the string that follows must be interpreted as an emacs lisp expression and not as a mere string. With it, the regexp replaces properly adds 21 to my note numbers, and I get two dozen footnotes updated in one fell swoop...

I love BBEdit and its people, but emacs is really a gift that keeps on giving.


Here are some handy references:

the emacs regexp-replace function
Regexp Replacement
the emacs regexp syntax
Syntax of Regular Expressions
the emacs-lisp string-to-number function
Conversion of Characters and Strings

And here is a really super short introduction to lisp syntax

There is no real need to have a very deep understanding of emacs lisp to use this regexp-replace function. Just remember that a lisp expression generally looks like this:

(operator operands)

Where the operator is a generally a function, like + or string-to-number above, and the operands can be any expression that is accepted by the operator. So, here:

(+ 21 (string-to-number \1))

means:

add 21 to the result of the expression (string-to-number \1)

with (string-to-number \1) meaning:

convert the string matched by \1 to its numeric value

Obviously, if \1 is not a string, the conversion will fail and the addition won't work. And without that conversion, if we had just added \1 as a string, the addition that expects numbers as operands would have failed.

I just realized that this is my first emacs lisp related post ever ! I'd like to thank that person I met in Tokyo about 15 years ago who showed me the way. It's an egg that definitely took some time to hatch...

Maxprograms is back, take 2.

Free software by a world class XLIFF expert

I first wrote about Maxprograms in 2007. That was after Rodolfo M. Raya left Heartsome to pursue his activity as the Maxprograms software company. I’ve been meaning to write about his work for a long time now, especially since he decided to put all his products on Github, thus allowing translators to get the code, build the things, get the updates and run them on their side, for free.

Maxprograms also offers a lot of technical articles on the subjects of DITA, XLIFF and XML in localization. Check the page here if you are interested.

Rodolfo M. Raya is an editor of the XLIFF 2.0 specification and is a member of the OASIS XLIFF technical committee. He is also co-author of A Practical Guide to XLIFF 2.0.

July 4, 2021 update: all of Maxprograms products are now available under an OSI approved Open Source license

Most of Maxprograms' products come with an easy-to-use installer and only the packages listed in italics below do not, because in their case an installer is not relevant.

If you check the repository list at https://github.com/rmraya?tab=repositories you will see the following software packages. I added the product page on Maxprograms' site (the "..." links) to make it easier for you to find extra information:

  • Eclipse Public License 1.0 (OSI approved)
    • Fluenta (DITA translation manager) ...
    • Swordfish (standards based XLIFF CAT, supports Trados Studio packages) ...
    • TMXEditor (TMX editor, as the name suggests) ...
    • Stingray (document aligner) ...
    • XLIFFManager (GUI for OpenXLIFF) ...
    • OpenXLIFF (command line utility to create, merge, validate XLIFF files) ...
    • TMXValidator (validator/cleaner for TMX files) ...
    • TMEngine (translation memory engine) ...
    • SRXEditor (SRX editor) ...
    • MVDServer (simple web server) ...
    • XLIFFValidation (web-based XLIFF validation tool) ...
    • RemoteTM (remote TM server) ...
  • MIT (FSF and OSI approved)
    • Conversa (DITA publisher) ...
    • KeysAnalyzer (DITA keys analyzer) ...

A "normal" freelance translator would use Swordfish as an XLIFF translation editor/project manager. With eventually TMXEditor to work on client TMX, Stingray to align file sets, and TMXValidator to check and clean TMX files.

More advanced users, or PMs would use XLIFFManager or OpenXLIFF, along with XLIFFValidation and SRXEditor.

DITA specialists would use Fluenta, Conversa and KeyzAnalyzers.

Users who want to self-host services would use XLIFFValidation, MVDServer, RemoteTM.

Maxprograms' packages offer pretty much anything a translator would need, in very cleanly packaged solutions, with excellent user support, either community-based (on groups.io), or directly from Maxprograms if you are a subscriber.

And if your main translation editor in not Swordfish, you can still benefit from all the other packages depending on your needs.

DIY software

Fluenta, Swordfish, TMXEditor and Stingray are subscription-based, but allow translators to run the software for free from the source code (RemoteTM is proposed as a "software as a service" hosted by Maxprograms, but you can host it on your own server if you want).

What that means is that the packaged installer comes as a subscription, but if you don’t mind using Terminal to run your software, which is, let’s be honest, not much of a hurdle, you can use the software, access its updates, eventually make modifications for your own use, etc. for free.

The requirements to download, build and run the software are all given on each product’s Github repository, but here is the gist:

Ant and Node.js are also available from Homebrew (https://brew.sh).

To build the software you’ll need to checkout (clone) the repositories. You also need to make sure that Ant uses Java 11 to build the package.

Rodolfo gives all the steps for building (for ex. Stingray):

    $ git clone https://github.com/rmraya/Stingray.git
    $ cd Stingray
    $ ant
    $ npm install
    $ npm start

Java ?

Maxprograms' applications used to be multiplatform Java only applications. Now they use a combination of Java and Javascript (even if the names are close, the two languages are totally unrelated) see below for Rodolfo's explanations.

Apple used to ship Java when macOS was young. It was in fact one of its selling points. Apple has stopped distributing Java a long time ago and now suggests that users install Oracle Java. Changes in the Oracle license and the availability of free Java distributions have allowed users to still benefit from Java without having to pay for a commercial user license.

Even after installing AdoptOpenJDK, your machine will still prefer to use an older Java version if you have one installed on your machine.

Which means that if you run:

$ echo $JAVA_HOME

in Terminal it won’t point to Java 11, and you won’t be able to build the software.

On my machine, the above command gives

/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/

A short investigation shows that the newly installed Java 11 is here:

/Library/Java/JavaVirtualMachines/adoptopenjdk-11-openj9.jdk/Contents/Home/

The way to have ant use Java 11 instead of Java 8 for the build process is simply to give ant the right pointer:

$ export JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-11-openj9.jdk/Contents/Home/; ant

Once you’re done with the process, you can export your Java_Home back to its original value, just to be on the safe side. You’ll need to point to Java 11 only to build the software after a code update ($ git pull).

Back to the command line

Once you’ve run

$ npm install

you’re free to launch the application any time you want with

$ npm start

That’s it. This procedure works for Swordfish, Stingray, TMXEditor.

You can also create an Applescript that packages the whole process into a nice application that you can use to launch the various packages as you'd launch standard macOS applications.

A simple Applescript to do that could be:

use AppleScript version "2.4" -- Yosemite (10.10) or later use scripting additions set SwordfishPath to "/path/to/Swordfish/local/repository" set myCommand to "cd " & SwordfishPath & "; npm start" tell application "Terminal"'s front window delay 0.1 do script myCommand in its last tab end tell

Electron ?

Maxprograms' applications used to be Java only applications. But Oracle (Java's main caretaker) neglect of GUI libraries led to a not very satisfactory situation for Java GUI application developers. Here is Rodolfo's rationale for his move away from Java-based GUI libraries:

I use HTML + JavaScript + CSS for the UI and Java for core libraries. In essence, my apps are web pages running on a Java server.

NodeJS provides access to Chrome's V8 JavaScript Engine. Electron is a bridge that provides native windowing environment for JavaScript pages displayed on top of Chrome.

JavaScript is a powerful language, but it is dangerously sloppy. Instead of coding directly on JavaScript, I use TypeScript (which is a safer language) and "transpile" my code to JavaScript for deployment.

I have not switched completely to TypeScript because there aren't good XML libraries for JavaScript. All translation standards are defined using XML and Java is still the best option for handling them. The good news is that JLIFF (a JSON-based version of XLIFF) is currently in development.

Why switch to HTML + CSS for the UI if Java is a cross-platform tool?, you may ask. The answer is simple: Java dropped its major UI libraries.

Initially, Java shipped with AWT, a horrible toolkit with poor design and feature lacking. AWT was so bad that two options replaced it: Swing (originally developed by Sun and currently used by OmegaT) and SWT (the one I used in Swordfish, developed by Eclipse Foundation).

Oracle deprecated Swing in Java 8 and replaced it with JavaFX. When Java 9 was released, JavaFX was abandoned and development moved to its own open-source project, away from Oracle.

The Eclipse Foundation, which developed and maintained SWT for years, stopped improving SWT when JavaFX appeared. It did not fully embrace JavaFX and SWT is languishing since then.

With SWT dying, I looked at JavaFX. It tries to mimic HTML + CSS without success. Not good enough. The real alternative: adopt HTML and CSS.

Et voilà!

I'm not even scratching the surface of what Maxprograms has offered to translators since Heartsome disappeared. Each software package deserves a whole article.

Maxprograms' packages are yet another proof that working on macOS as translators, project managers or localization engineers is possible. And we're lucky that most of the multiplatform packages that allow us to work on macOS are backed by very talented people and communities who do their best to keep us from having to change of platform.

Disclaimer: I am not affiliated in any way to Maxprograms. Just so that you know...

Popular posts (last 30 days):