29 July 2009

Excel files with colored non translatables... (2)

Just to show you that I am not lying:

Here is one of the Excel cells that I have to translate:

<h1>This is ugly<br />ugly ugly ugly</h1>Ugly ugly ugly ugly ugly ugly:<br /><br /><a href='%%LINK_A%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly.<br /><a href='%%LINK_B%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly ugly.<br /><a href='%%LINK_C%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly ugly. <br /><a href='%%LINK_D%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly ugly.!

Putting that straight into OmegaT as an ODS file

  1. keeps the various styles that exist in the original Excel file (fonts/colors/etc)
  2. keeps the HTML codes as plain text.

So, in OmegaT, you end up with segments like this (I colored in green the formatting inherited from the Excel file, and in red the original HTML tags):

First segment:

<h1><f0>This is ugly</f0><f1><br /></f1><f2>ugly ugly ugly</f2><f3></h1></f3><f4>Ugly ugly ugly ugly ugly </f4><f5>ugly</f5><f6>:</f6><f7><br /><br /><a href='%%LINK_A%%' class='yellow'></f7><f8>Ugly </f8><f9>ugly:</f9><f10></a></f10><f11> </f11><f12><s13/>Ugly ugly ugly ugly</f12><f14>.</f14><f15><br /><a href='%%LINK_B%%' </f15><f16>class='yellow'></f16><f17>Ugly ugly:</f17><f18></a> </f18><f19><s20/>Ugly ugly ugly ugly ugly</f19><f21>.</f21><f22><br /><a </f22><f23>href='%%LINK_C%%' class='yellow'></f23><f24>Ugly ugly:</f24><f25></a> </f25><f26>Ugly </f26><f27><s28/>ugly ugly </f27><f29>ugly ugly</f29><f30>.

Second segment:

</f30><f31><br /><a href='%%LINK_D%%' class='yellow'></f31><f32>Ugly </f32><f33>ugly</f33><f34>:</f34><f35></a> </f35><f36>U</f36><f37>gly ugly ugly ugly ugly</f37><f38>.</f38>

You see?

All the original HTML is here, and all the original formatting is applied to each cell part separately...

This is obviously not the solution. For one thing, the inherited formatting (all the green tags) is totally irrelevant as far as the translation is concerned. For another, the HTML tags should be handled as HTML and not as plain text, both to make the translatable text easier to see and to reduce the risk of accidentally modifying a tag when you translate the segment.
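To get a feel for how much of that first segment is inherited formatting, here is a minimal Python sketch. It assumes the formatting placeholders always look like `<f0>`, `</f0>` or `<s13/>`, as they do in the segments above; the function name is mine and has nothing to do with OmegaT itself.

```python
import re

# Placeholders shown for formatting inherited from the spreadsheet.
# Assumption: they always have the form <fN>, </fN> or <sN/>, as in the segments above.
FORMAT_TAG = re.compile(r"</?f\d+>|<s\d+/>")

def strip_format_tags(segment):
    """Return the segment without formatting placeholders, and how many were removed."""
    cleaned, removed = FORMAT_TAG.subn("", segment)
    return cleaned, removed

# The beginning of the first segment shown above.
sample = "<h1><f0>This is ugly</f0><f1><br /></f1><f2>ugly ugly ugly</f2><f3></h1></f3>"
cleaned, removed = strip_format_tags(sample)
print(cleaned)   # what is left is the plain pseudo HTML of the cell
print(removed)   # number of formatting placeholders that carried no useful information
```

What remains after stripping is exactly the pseudo HTML from the Excel cell, which is the part that really matters for the translation.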

Now, if you follow my earlier advice, the above segment would look like this in OmegaT (with blue color for emphasis), after being handled as a "normal" HTML file:

First segment:

This is ugly<br0/>ugly ugly ugly

(notice that the tags <h1> and </h1> are not shown since they are block level tags and the whole block is the segment! You just reduced the total number of tags in the segment!)

Second segment:

Ugly ugly ugly ugly ugly ugly:<br0/><br1/><a2>Ugly ugly:</a2> Ugly ugly ugly ugly.<br3/><a4>Ugly ugly:</a4> Ugly ugly ugly ugly ugly.<br5/><a6>Ugly ugly:</a6> Ugly ugly ugly ugly ugly.

Third segment:

<br7/><a8>Ugly ugly:</a8> Ugly ugly ugly ugly ugly.!

Much better, isn't it? Now, how much time and energy do you think this trick will save you next time you have such a (not so uncommon) file to translate?
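For the curious, the tag shortening seen in those segments can be imitated in a few lines of Python. This is only a rough sketch of the visible result, not OmegaT's actual filter code: it assumes well-nested tags, empty elements written XHTML style (`<br />`), and a counter that runs over the whole block. The name `shorten_tags` is hypothetical.

```python
import re

# One HTML tag: optional leading "/", tag name, attributes, optional trailing "/".
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")

def shorten_tags(block):
    """Replace each inline tag with a short numbered placeholder such as <a2> or <br0/>."""
    counter = 0
    open_numbers = []  # numbers of the tags that are still open

    def replace(match):
        nonlocal counter
        closing, name, self_closing = match.groups()
        if closing:
            # A closing tag reuses the number of the tag it closes
            # (assumption: the tags are well nested).
            return "</%s%d>" % (name, open_numbers.pop())
        number = counter
        counter += 1
        if self_closing:
            return "<%s%d/>" % (name, number)
        open_numbers.append(number)
        return "<%s%d>" % (name, number)

    return TAG.sub(replace, block)

cell = "Ugly ugly:<br /><br /><a href='%%LINK_A%%' class='yellow'>Ugly ugly:</a> Ugly ugly."
print(shorten_tags(cell))
```

The attributes disappear from view, so there is much less to break by accident while typing the translation.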

ps: I did try to do all the heavy editing in Emacs, but I fear I am not yet familiar enough with its regex syntax. I eventually had to revert to TextWrangler, but I am not giving up...

ps2: If you know a CAT tool that does not require any manipulation to reach a manageable number of tags, let me know!

OmegaT small (?) update

OmegaT 2.0.3 update 2 has been released.

From the release file:

OmegaT beta version 2.0.3 update 2 has been released. For the first time, OmegaT is available as a Java Web Start application (http://en.wikipedia.org/wiki/Java_Web_Start).

Launching OmegaT from http://omegat.sourceforge.net/webstart.html requires no installation, and future updates of this version will happen automatically.

Two bugs have been corrected. One concerns segment detection in the Editor, and the other StarDict dictionaries.

Belarusian, Dutch and Catalan localisations have been updated.

The OmegaT Project always welcomes developers, localizers and users to contribute their experience, knowledge and insights to the software we release.

Regarding Webstart and privacy (quoting Tony):

OmegaT Java Webstart does NOT save any of your information to our web servers. The application runs on YOUR machine. Your documents and translation memories remain on your computer, and the OmegaT project will have no access to your work or information.

(end quote)

The fact that OmegaT is now available directly from the network without any specific install procedure does not mean that OmegaT better supports network interaction between translators (data sharing etc). It only means that it is easier to distribute/use/update.

There are still issues to iron out though: how to use the Lucene tokenizer with that WebStart version... We're investigating this right now... :)

Excel files with colored non translatables...

Here comes an Excel file, with pseudo HTML in the cells.

The HTML tags are red and must not be modified. If you want to see how the segments look, check the follow-up post.

Translating that in OmegaT is relatively straightforward:

  • save the file as ODS in OpenOffice.org (or NeoOffice)
  • put that source file in the /source/ folder of your OmegaT project
  • load the project and translate

The problem is that not only are you going to have all the HTML tags displayed for what they are within the translatable text, but you're also going to have to deal with the red color tags that will surround all the HTML...

Not user friendly at all...

Another solution is to do like this:

  • copy-paste the column into a text file -> no more red color, will deal with that later
  • insert a visible marker like @@@ at each end of line
  • save the file as .html -> no more full HTML tags in the segments
  • put in /source/
  • go to Options > Segmentation and add 2 rules. One where you segment before @@@ and one where you segment after @@@, that way you'll nicely isolate the marker and it will be translated only once
  • load, translate
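If you prefer a script to a text editor, the marker step can be sketched in Python as below. It assumes one cell per line and that "@@@" never occurs in the real text; the function names are hypothetical. The second function only imitates the effect of the two segmentation rules (a break before the marker and a break after it); in OmegaT itself the rules are entered in the Segmentation dialog.

```python
import re

MARKER = "@@@"  # any string that cannot occur in the real text will do

def add_markers(cells, marker=MARKER):
    """Write one cell per line and append the marker to the end of each line."""
    return "".join(cell + marker + "\n" for cell in cells)

def segment_around_marker(text, marker=MARKER):
    """Imitate the two rules: break before the marker and break after it."""
    pieces = re.split("(" + re.escape(marker) + ")", text)
    return [piece.strip() for piece in pieces if piece.strip()]

cells = ["<h1>Title</h1>First line.", "Second line.<br />"]
page = add_markers(cells)   # save this text as a .html file in /source/
print(page)
print(segment_around_marker(page))
```

Because the marker always ends up alone in its own segment, it is the same segment every time, which is why it only needs to be "translated" (left as it is) once.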

The resulting file should contain all the original tags, without modifications, but some characters in the original may have been converted to HTML references. Replace those with the original character if you think it is better.
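If there are many such references, Python's standard `html.unescape` can convert them back in one go. A minimal sketch, with one caveat: it converts every character reference it finds, including any that were already present in the original cells on purpose, so compare with the source before using it blindly.

```python
import html

def restore_characters(text):
    """Convert HTML character references (named or numeric) back to plain characters."""
    return html.unescape(text)

# The tags themselves are left untouched; only the references change.
print(restore_characters("Caf&eacute; &amp; th&#233;<br />"))
```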

Now, open your file in a text editor, remove the @@@ markers and paste the contents into a Writer document in OpenOffice.

There, do a "Regular Expression" search for the string: (<[^>]*>)

The string means: a "<", followed by any number of characters other than ">", followed by a ">" (basically any HTML tag). The surrounding parentheses put the matching string into memory for later retrieval!

Replace with "&", with the style set to "font color=red". "&" stands for the string that was just matched, which here is the same as the group that was put into memory.

All your HTML tags should be colored in red now.
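If you want to check what that expression will catch before running the replacement, the same pattern works unchanged in Python. This is only a sketch to test the regex: the `[RED]...[/RED]` wrapper is a made-up visible stand-in for the red font style, and `\1` inserts the parenthesized group.

```python
import re

# Same expression as in the search box: "<", anything but ">", then ">".
TAG_PATTERN = re.compile(r"(<[^>]*>)")

def find_tags(text):
    """List every string the expression matches, in order of appearance."""
    return TAG_PATTERN.findall(text)

def mark_tags(text):
    """Wrap every match in a visible stand-in for the red font style."""
    return TAG_PATTERN.sub(r"[RED]\1[/RED]", text)

cell = "<h1>This is ugly<br />ugly ugly ugly</h1>"
print(find_tags(cell))
print(mark_tags(cell))
```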

Copy-paste the contents into the original file where it needs to be, and deliver !!! Also make sure that one line corresponds to one cell (manipulating the @@@ marker should not change the overall structure but one never knows!)
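That last check can also be scripted. The sketch below assumes you still have the translated text with its @@@ markers and the original cells as a list; the function names are hypothetical. It rebuilds one line per cell from the markers, then compares the number of cells and the set of tags in each cell against the source.

```python
import re

TAG = re.compile(r"<[^>]*>")

def strip_markers(text, marker="@@@"):
    """Split the translated text back into one string per cell."""
    parts = text.split(marker)
    if parts and not parts[-1].strip():
        parts.pop()  # drop whatever follows the last marker if it is blank
    return [part.strip("\r\n") for part in parts]

def check_against_source(source_cells, target_cells):
    """Return a list of problems; an empty list means the structure looks intact."""
    problems = []
    if len(source_cells) != len(target_cells):
        problems.append("cell count differs: %d source, %d target"
                        % (len(source_cells), len(target_cells)))
    for row, (src, tgt) in enumerate(zip(source_cells, target_cells), start=1):
        # Tags may move inside a sentence, so compare them regardless of order.
        if sorted(TAG.findall(src)) != sorted(TAG.findall(tgt)):
            problems.append("row %d: tags differ" % row)
    return problems

source = ["<b>Hello</b>", "Bye<br />"]
translated = "<b>Bonjour</b>@@@\nAu revoir<br />@@@\n"
print(check_against_source(source, strip_markers(translated)))
```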

(There are probably easier ways to deal with such files. Let me know!)

21 July 2009

Yes ! Okapi snapshots available !

From Yves Savourel @ Okapi Framework:

If you are interested in testing the latest build of the Java-based Okapi tools and libraries you can download them from here:

Those are the latest development builds (more recent than the normal releases). They have all the latest features, BUT they are not tested and may be unstable. They are the versions the developers are using. Use them at your own risk.

The times listed in the directory are the date/time when the files were generated (in US Pacific Time). We will update those snapshots relatively frequently.

09 July 2009

OmegaT small update

OmegaT 2.0.3_1 was just released. It now comes with a complete Japanese/German/Slovenian UI/tutorial.


07 July 2009

Okapi webinar

A message from Okapi's Yves Savourel:

Just a note to let you know that I'll be giving a little introduction on using the Okapi Tools. Note that it will be focused on the new Java-based tools.

ENLASO is hosting this free webinar on July 16th at noon Mountain Time.
It's free and you can register here:

The latest version of the Java Okapi tools is here:

02 July 2009

Bored Trados users ?

For bored Trados users who want to work in a free software environment, here is the latest news:

1) OmegaT is able to translate TTX files that have source=target (an output option, if I am not mistaken). That requires a small utility called Toxic, to be found here:

2) It is possible to deliver "cleaned/uncleaned" file sets by translating the file in OmegaT and processing the resulting TMX in OpenOffice.org with the Anaphraseus macro (a Wordfast equivalent for OOo). The discussion about the process is here:

And Anaphraseus is here:

(updated 7/8)

3) Ask your Trados clients to send you TMX memories instead of Trados memories, you'll be able to use them in OmegaT without problems. Trados seems to have problems creating conforming TMX files. If OmegaT complains, use TMXValidator from Maxprograms.

TMXValidator is here: http://www.maxprograms.com/products/tmxvalidator.html

4) Some file formats are not directly supported by OmegaT. Use Rainbow to create OmegaT projects from unsupported files. Not all files supported by Trados are supported by Rainbow but Rainbow still covers a very reasonable range of "exotic" formats.

OmegaT directly supports the following formats:

  • OpenDocument/OpenOffice.org
  • Microsoft Open XML
  • DocBook XML
  • TeX
  • Plain text
  • SRT subtitles
  • PO (monolingual)
  • XLIFF (Okapi generated)
  • ResX resources
  • INI ('key=value' format)
  • Java bundle.properties
  • HTML Help Compiler
  • QuarkXPress CopyFlow Gold

Rainbow has filters for the following formats: Okapi Framework - Filters. Rainbow is here: http://okapi.opentag.com/downloads.html


That is the title of an article I wrote in Japanese about OmegaT.

The article was published in the 45th issue of the AAMT Journal. I've put a copy of the article online, here: 自由に翻訳!

The article is not an introduction to using OmegaT. If you want the introduction, download OmegaT and start it. You'll have a tutorial in the language of your OS.

Download link: OmegaT_2.0.3_Beta.dmg

OmegaT in Kyoto on July 11th

I'll be making a presentation about OmegaT at the "Open Source Conference 2009 Kansai" on the 11th of July, from 11.15 to 12.00.


People who can't make it for the seminar are welcome to the OmegaT "booth" where I'll be from Friday morning.