04 August 2009

Rainbow, XLIFF and OmegaT

(I've updated the contents of the file to make things clearer for people who are not familiar with XLIFF - 08/05)

OmegaT's support for complex XLIFF is not really optimal. I'll spare you the details, but basically, unless you tweak the file significantly, OmegaT is unlikely to be able to leverage all the available data.


Let's say that we have an XLIFF file with the following data:

First segment, not yet translated:
ba bi bu
In OmegaT, this part should be in a supported source file located in /source/

Reference translation from a first reference file:
ba bi, translated to バ ビ, matches the segment to translate at 66%
In OmegaT, this part should be in a TMX file located in /tm/

Reference translation from a second reference file:
ba bi bu be, translated to バ ビ ブ ベ, matches the segment to translate at 75%
This part too should be in a TMX file located in /tm/

Second segment, already translated in the same document:
ba bi, translated to バ ビ
Since this data comes from the same document and is a validated translation, it should be in the project_save.tmx file located in /omegat/
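The 66% and 75% figures are consistent with a simple word count: "ba bi" shares two of the three words of "ba bi bu", and "ba bi bu be" shares three of its four words with it. The following Python sketch reproduces those numbers, assuming a word-overlap score measured against the longer segment and rounded down. match_quality is a hypothetical helper; real tools compute match-quality with their own, more elaborate algorithms.

```python
from difflib import SequenceMatcher


def match_quality(segment: str, reference: str) -> int:
    """Hypothetical word-overlap score, in whole percent (rounded down)."""
    # Compare sequences of words rather than characters.
    a, b = segment.split(), reference.split()
    matcher = SequenceMatcher(None, a, b)
    # Number of words that appear, in the same order, in both segments.
    shared = sum(block.size for block in matcher.get_matching_blocks())
    # Score against the longer segment; the 1 avoids dividing by zero on empty input.
    longest = max(len(a), len(b), 1)
    return 100 * shared // longest


print(match_quality("ba bi bu", "ba bi"))        # first reference
print(match_quality("ba bi bu", "ba bi bu be"))  # second reference
```

Scoring against the longer segment is what penalizes the second reference for its extra word ("be"), which is why it does not reach 100%.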

Such an XLIFF file would look like this:

First segment, not yet translated (<target> is empty):

<trans-unit approved="no" id="source_file">
<source xml:lang="en">ba bi bu</source>
<target state="new" xml:lang="ja"/>

First reference translation for that segment:

<alt-trans match-quality="66" origin="first_reference_translation">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ ビ</target>
</alt-trans>

Second reference translation for that segment:

<alt-trans match-quality="75" origin="second_reference_translation">
<source xml:lang="en">ba bi bu be</source>
<target xml:lang="ja">バ ビ ブ ベ</target>
</alt-trans>
</trans-unit>

(↑ end of the first segment)

Second segment, already translated in that document:

<trans-unit approved="yes" id="source_file">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ ビ</target>
</trans-unit>



If you want to know more about the XLIFF specification, look at this page: XLIFF Version 1.2


Now, let's see how Rainbow can help us leverage all the data directly from within OmegaT.


Rainbow


Here comes Rainbow. I've mentioned Rainbow, or the Okapi Framework, a number of times already.

Rainbow is a Java application that acts as a Swiss Army knife for localization processes: it filters to and from complex formats, it works as a batch search-and-replace tool, etc.

Don't forget to watch the Okapi webinar that Yves Savourel gave a few weeks ago. It is about an hour long and goes into great detail to show you what Rainbow/Okapi is about. OmegaT is used as an example too, and the following text can be considered a detailed explanation of what you see in the webinar.

Webinar: http://www.translate.com/Language_Tech_Center/Webinar_Portal.aspx?id=132

The Okapi group has decided to frequently release snapshots of the code so that users don't have to wait for milestones to get a taste of the most recent features.
The snapshots are available here: http://okapi.opentag.com/snapshots/. Get the rainbow_carbon-macosx_... file.

When you unzip the package, you will find a rainbow.sh script file. Open Terminal in the folder where the file is located and make the file executable by typing the following command at the prompt:
chmod +x rainbow.sh

Then, launch Rainbow by calling the script from Terminal:
./rainbow.sh


When you launch Rainbow, you are presented with an empty file list.

  1. Drag and drop your XLIFF file onto that list. It should be the list under the "Input List 1" tab.
  2. Click the "languages and encoding" tab and set the appropriate information. For example, for an English manual translated into Japanese:
    • Source=EN (you can use a variant, like EN-US)
    • Encoding=UTF-8
    • Target=JA
    • Encoding=UTF-8
  3. Open the "Utilities > Translation Package Creation" menu.
  4. Select OmegaT.
  5. Click the Package Location tab.
  6. Select a directory where Rainbow will create the OmegaT project.
  7. Click OK.


You're back in the main window, where the status bar indicates that the XLIFF file has been processed. What has Rainbow done to your XLIFF file?

XLIFF



An XLIFF file is a mix of several things. First, it contains the strings to translate, in <trans-unit> sections:

<trans-unit approved="no" id="source_file">
<source xml:lang="en">ba bi bu</source>
<target state="new" xml:lang="ja"/>
</trans-unit>


Then, it may contain reference data. The first type is, obviously, what has already been translated, in similar <trans-unit> sections:

<trans-unit approved="yes" id="source_file">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ ビ</target>
</trans-unit>


The other type is data embedded to serve as reference; it comes as <alt-trans> sections within a <trans-unit>:

<trans-unit approved="no" id="source_file">
<source xml:lang="en">ba bi bu</source>
<target state="new" xml:lang="ja"/>
<alt-trans match-quality="66" origin="first_reference_translation">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ ビ</target>
</alt-trans>
<alt-trans match-quality="75" origin="second_reference_translation">
<source xml:lang="en">ba bi bu be</source>
<target xml:lang="ja">バ ビ ブ ベ</target>
</alt-trans>
</trans-unit>
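To make these three kinds of data concrete, here is a small Python sketch (standard library only) that sorts the article's sample into them. It assumes that a <trans-unit> with approved="yes" and a non-empty <target> counts as "already translated", and that every <alt-trans> is reference data; Rainbow's actual rules may differ. The sample gives the two units distinct ids ("1" and "2"), and classify is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace; the sample below declares it as the default namespace.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
 <file original="sample.txt" source-language="en" target-language="ja" datatype="plaintext">
  <body>
   <trans-unit id="1" approved="no">
    <source xml:lang="en">ba bi bu</source>
    <target state="new" xml:lang="ja"/>
    <alt-trans match-quality="66" origin="first_reference_translation">
     <source xml:lang="en">ba bi</source>
     <target xml:lang="ja">バ ビ</target>
    </alt-trans>
    <alt-trans match-quality="75" origin="second_reference_translation">
     <source xml:lang="en">ba bi bu be</source>
     <target xml:lang="ja">バ ビ ブ ベ</target>
    </alt-trans>
   </trans-unit>
   <trans-unit id="2" approved="yes">
    <source xml:lang="en">ba bi</source>
    <target xml:lang="ja">バ ビ</target>
   </trans-unit>
  </body>
 </file>
</xliff>"""


def classify(xliff_text):
    """Split an XLIFF document into segments to translate, translated segments, and references."""
    root = ET.fromstring(xliff_text)
    to_translate, translated, references = [], [], []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS).text or ""
        target = unit.find("x:target", NS)  # direct child only, not alt-trans/target
        target_text = target.text if target is not None else None
        # Assumption: approved="yes" plus a non-empty <target> means "already translated".
        if unit.get("approved") == "yes" and target_text:
            translated.append((unit.get("id"), source, target_text))
        else:
            to_translate.append((unit.get("id"), source))
        # Every <alt-trans> becomes a reference pair, labelled by its origin.
        for alt in unit.iterfind("x:alt-trans", NS):
            references.append((alt.get("origin"),
                               alt.find("x:source", NS).text,
                               alt.find("x:target", NS).text))
    return to_translate, translated, references


to_translate, translated, references = classify(SAMPLE)
print(to_translate)
print(translated)
print(references)
```

Each of the three lists corresponds to one destination in the OmegaT project: the source file in /source/, project_save.tmx in /omegat/, and a TMX file in /tm/.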




When Rainbow creates an OmegaT project from an XLIFF file, it looks for these three types of information and separates them in a way that OmegaT can manage.


  1. OmegaT cannot manage "empty" <target> elements in <trans-unit> sections. So, Rainbow recreates the source file with a copy of the <source> data in each <target> element. That way, OmegaT only has to overwrite that data:

    <trans-unit id="source_file">
    <source xml:lang="en">ba bi bu</source>
    <target xml:lang="ja">ba bi bu</target>
    </trans-unit>


  2. OmegaT cannot manage already translated data in the XLIFF file. So, Rainbow writes the already translated segments to the file where OmegaT normally keeps its own translations: the project_save.tmx located in the /omegat/ folder of the project:

    <tu tuid="source_file">
    <tuv xml:lang="en"><seg>ba bi</seg></tuv>
    <tuv xml:lang="ja"><seg>バ ビ</seg></tuv>
    </tu>


  3. OmegaT cannot manage reference data in the XLIFF file. So, Rainbow creates a standard OmegaT reference translation file that contains all the embedded <alt-trans> segments: the alternate.tmx located in the /tm/ folder of the project:

    <tu tuid="first_reference_translation">
    <tuv xml:lang="en"><seg>ba bi</seg></tuv>
    <tuv xml:lang="ja"><seg>バ ビ</seg></tuv>
    </tu>

    <tu tuid="second_reference_translation">
    <tuv xml:lang="en"><seg>ba bi bu be</seg></tuv>
    <tuv xml:lang="ja"><seg>バ ビ ブ ベ</seg></tuv>
    </tu>
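Rainbow itself is written in Java; the following Python sketch only mirrors the three steps above on the article's sample data, using the standard library's xml.etree.ElementTree. prefill_unit and make_tmx are hypothetical helpers, and the TMX header values are placeholders, not what Rainbow actually writes.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes attributes in this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def prefill_unit(unit_id, source, src_lang="en", trg_lang="ja"):
    """Step 1: a <trans-unit> whose <target> starts as a copy of <source>."""
    unit = ET.Element("trans-unit", id=unit_id)
    ET.SubElement(unit, "source", {XML_LANG: src_lang}).text = source
    ET.SubElement(unit, "target", {XML_LANG: trg_lang}).text = source
    return unit


def make_tmx(entries, src_lang="en", trg_lang="ja"):
    """Steps 2 and 3: a minimal TMX 1.4 document from (tuid, source, target) tuples."""
    tmx = ET.Element("tmx", version="1.4")
    # Placeholder header values (hypothetical, not what Rainbow writes).
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for tuid, source, target in entries:
        tu = ET.SubElement(body, "tu", tuid=tuid)
        # One <tuv> per language, each wrapping its text in a <seg>.
        for lang, text in ((src_lang, source), (trg_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


# The article's data: one segment to translate, one already translated, two references.
unit = prefill_unit("source_file", "ba bi bu")
project_save = make_tmx([("source_file", "ba bi", "バ ビ")])
alternate = make_tmx([
    ("first_reference_translation", "ba bi", "バ ビ"),
    ("second_reference_translation", "ba bi bu be", "バ ビ ブ ベ"),
])

print(ET.tostring(unit, encoding="unicode"))
# In a real project these would be saved as omegat/project_save.tmx and tm/alternate.tmx, e.g.:
# ET.ElementTree(project_save).write("omegat/project_save.tmx", encoding="UTF-8", xml_declaration=True)
```

Keeping the already translated segments and the reference segments in two separate TMX files matters: OmegaT treats project_save.tmx as its own finished work and the files in /tm/ only as suggestions.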




That way, OmegaT can seamlessly access all the data available in that complex XLIFF file. Since Rainbow creates a whole OmegaT project, the only thing necessary to proceed is to open the project with OmegaT and translate.

Delivery



Nothing could be simpler. When you are done with your translation, create the translated files from within OmegaT.

Then, drag and drop the manifest.xml file, located at the root of the OmegaT project created by Rainbow, onto the Input List 1 tab of Rainbow's main window. When manifest.xml is visible on the list, use the "Utilities > Translation Package Post-Processing..." menu.

Rainbow will use all the data it saved in the project to recreate a "proper" XLIFF file for delivery. That file contains all the data of the original XLIFF and has an identical structure. The only difference is that it now includes the translations you made in OmegaT.
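The idea behind post-processing (put the new translations back into the original structure and leave everything else alone) can be sketched in a few lines of Python. This is not Rainbow's algorithm: it assumes the translations are already available as a hypothetical mapping from trans-unit id to translated text, whereas Rainbow recovers them through manifest.xml and the data it saved in the project. Marking the filled targets with state="translated" is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}
# Write the XLIFF namespace back as the default namespace (no "ns0:" prefixes).
ET.register_namespace("", XLIFF_NS)

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
 <file original="sample.txt" source-language="en" target-language="ja" datatype="plaintext">
  <body>
   <trans-unit id="1" approved="no">
    <source xml:lang="en">ba bi bu</source>
    <target state="new" xml:lang="ja"/>
    <alt-trans match-quality="75" origin="second_reference_translation">
     <source xml:lang="en">ba bi bu be</source>
     <target xml:lang="ja">バ ビ ブ ベ</target>
    </alt-trans>
   </trans-unit>
  </body>
 </file>
</xliff>"""


def merge_translations(xliff_text, translations):
    """Fill each <target> from a {trans-unit id: translation} mapping, keeping everything else."""
    root = ET.fromstring(xliff_text)
    for unit in root.iterfind(".//x:trans-unit", NS):
        new_text = translations.get(unit.get("id"))
        if new_text is None:
            continue  # units without a new translation are left exactly as they were
        target = unit.find("x:target", NS)  # direct child only, not alt-trans/target
        if target is None:
            # Keep <target> directly after <source>, ahead of any <alt-trans>.
            source = unit.find("x:source", NS)
            target = ET.Element(f"{{{XLIFF_NS}}}target")
            unit.insert(list(unit).index(source) + 1, target)
        target.text = new_text
        target.set("state", "translated")
    return ET.tostring(root, encoding="unicode")


# Hypothetical OmegaT output for unit "1".
print(merge_translations(SAMPLE, {"1": "バ ビ ブ"}))
```

Because the sketch edits the parsed original instead of rebuilding the file, the <alt-trans> sections and all other attributes survive untouched, which is the property the delivery step relies on.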