30 August 2009

Okapi new version

In case you missed it:

<quote>

Hello everyone,

We are happy to announce the milestone 3 of the Okapi Framework for Java.

To try to make things easier there are now two main distributions:

=== okapi-lib (for developers):


--> This distribution includes the libraries, documentation and examples you would need to use the Okapi Framework with your programs. It also include Tikal, a command-line tool to do basic task like simple extract and merge. This distribution is not UI dependent and therefore the same for all platforms.

=== okapi-apps (for users and developers): <-- The one you are likely to want


--> This distribution is the same as okapi-lib (minus the developers documentation). And in addition, it includes the UI-dependent components of the framework, as well as Rainbow (Localization Toolbox) and Ratel (SRX Editor). Because the UI depends on each platform, there are different flavors of this distribution: Windows, Linux (32-bit and 64-bit), and Macintosh (Carbon and now Cocoa).


All this can be downloaded from Okapi's main Web site:
http://okapi.opentag.com/downloads.html,

as well as from the Google-Code project's site:
http://code.google.com/p/okapi/downloads/list.


A large part of the work done for this milestone is not visible: We moved the build structure to Maven, so we can develop using continuous integration more efficiently. Ultimately it should make the framework more robust and better tested.

We did managed to work on a few visible changes too :) Here are some of them:

=== The first version of Tikal in Java has been implemented. This tool allows you to perform simple tasks from the command-line. For example:

C:/Tmp>tikal -x *.docx *.html -sl EN -tl FR

will extract to XLIFF all the DOCX and HTML files in the Tmp directory.

C:/Tmp>tikal -m *.xlf -sl EN -tl FR

will merge them back into their respective formats.

For more information on Tikal, see http://okapi.opentag.com/help/applications/tikal/index.html.

=== Various bugs have been fixed in several filters (thanks for the bugs reports) and some improvements have been implemented in others.

=== There is a new filter for Qt TS files.

=== Cocoa support has been added for the Macintosh.

=== There is a new TM connector to query Translate Toolkit TM servers. You can try it using Tikal. For example, if you have a local server running on port 8080, the command:

C:/>tikal -q "open file" -tt localhost:8080 -sl EN -tl FR

will get the matches for "open file" in that TM.

=== As usual, a more complete list of the main changes can be seen in the changes.html document that comes with the distribution.


Bug reports and enhancement or features requests can be posted here:
http://code.google.com/p/okapi/issues/list


Cheers,
-the Okapi Team

<end quote>

29 August 2009

Snow Leopard

I've only used it for a few hours but here are the first things I found:

1) The default Java version is now Java 1.6, and OmegaT seems to have issues with it. OmegaT works, but the selected font is not applied in the Editor window.

2) The input method can be set to stick to a document. For example, you can have one TextEdit document that uses Kotoeri and another that uses US, and switching documents will not change the input method of either. That's an improvement over the situation in Tiger where, if I remember correctly, the input method was "attached" to the application but not to the document.

3) The old option "multilingual spellchecking" can use a subset of the available languages. That way, you can more efficiently check the spelling without changing the language in the spellchecker interface every time a paragraph uses a different language.

4) The spellchecker can also automatically correct your mistakes.

5) Services are indeed much better integrated into the workflow. A right-click on an element (text, image, etc.) brings up a contextual menu that shows the services available for that element.

28 August 2009

Calculator bug?

This bug seems to have been fixed in Snow Leopard.

My bug report to Apple:

Summary:
Calculator does not accept "." input after conversion from the "Convert" menu.

Steps to reproduce:
1- enter a value with a decimal point; the decimal point can be entered either from the keyboard or from the Calculator UI.
2- select any conversion from the Convert menu
3- the conversion result is displayed
4- enter a value with a decimal point again

Expected result:
The new value should have a decimal point in the location where it has been input

Actual result:
The decimal point cannot be input, neither from the keyboard nor from the Calculator UI. It is necessary to hit "C" on the Calculator UI to reinitialize Calculator to be able again to input a decimal point.

05 August 2009

OmegaT Java Web Start

In a previous article, I mentioned the release of OmegaT as a Java Web Start application.

Once OmegaT.jnlp is downloaded, a double-click should open it in Java Web Start.app.

If that is not the case (if it opens as a text file, for example), the association between .jnlp files and Java Web Start.app is broken.

To fix that, right-click on the .jnlp file, choose "Open With", select "Other..." at the bottom of the list and browse your disk to find the application.

Java Web Start.app is located in /Applications/Utilities/Java/ for older versions of Java and in /System/Library/CoreServices/ for the most recent version of Java (Java update 4, released on June 15 2009).

Once you have selected the application, check the "Always open with" box at the bottom of the dialog.
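
If you are not sure which of the two locations applies to your machine, a few lines of Python can check for you. This is only a convenience sketch: the two directories are the ones mentioned above, while the function name and its parameters are mine.

```python
from pathlib import Path

# The two locations mentioned in this post, most recent Java first.
CANDIDATE_DIRS = (
    "/System/Library/CoreServices",
    "/Applications/Utilities/Java",
)


def find_java_web_start(candidate_dirs=CANDIDATE_DIRS,
                        app_name="Java Web Start.app"):
    """Return the first existing path to the application, or None."""
    for directory in candidate_dirs:
        app = Path(directory) / app_name
        if app.exists():
            return app
    return None


if __name__ == "__main__":
    location = find_java_web_start()
    print(location if location else "Java Web Start.app not found")
```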

Thanks to Yves, who sent me the comment that triggered this "investigation", and to this blog article that clarified everything.

04 August 2009

Rainbow, XLIFF and OmegaT

(I've updated the contents of the file to make things clearer for people who are not familiar with XLIFF - 08/05)

OmegaT's support for complex XLIFF is not really optimal. I'm going to spare you the details but basically, unless you tweak the file significantly, OmegaT will probably not be able to leverage all the available data.


Let's say that we have an XLIFF file with the following data:

First segment, not yet translated:
ba bi bu
In OmegaT, this part should be in a supported source file located in /source/

Reference translation from a first reference file:
ba bi, translated to バ ビ, matches the segment to translate at 66%
In OmegaT, this part should be in a TMX file located in /tm/

Reference translation from a second reference file:
ba bi bu be, translated to バ ビ ブ ベ, matches the segment to translate at 75%
This part too should be in a TMX file located in /tm/

Second segment, already translated in the same document:
ba bi, translated to バ ビ
Since this data comes from the same document and is a validated translation, it should be in the project_save.tmx file located in /omegat/
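
The 66% and 75% figures can be reproduced with simple arithmetic. The sketch below assumes a deliberately naive score (shared words divided by the word count of the longer segment, rounded down). It is not OmegaT's actual matching algorithm, and the function name is mine; it just shows where numbers like these can come from.

```python
def token_match_percent(segment: str, reference: str) -> int:
    """Naive score: shared words / word count of the longer segment.

    Assumption for illustration only, not OmegaT's real algorithm.
    """
    seg_tokens = segment.split()
    ref_tokens = reference.split()
    remaining = list(ref_tokens)
    shared = 0
    for token in seg_tokens:
        if token in remaining:
            # Consume the match so a repeated word is not counted twice.
            remaining.remove(token)
            shared += 1
    longest = max(len(seg_tokens), len(ref_tokens))
    return 0 if longest == 0 else shared * 100 // longest


print(token_match_percent("ba bi bu", "ba bi"))        # 2 of 3 words -> 66
print(token_match_percent("ba bi bu", "ba bi bu be"))  # 3 of 4 words -> 75
```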

Such an XLIFF file would look like this:

First segment, not yet translated (<target> is empty):

<trans-unit approved="no" id="source_file">
<source xml:lang="en">ba bi bu</source>
<target state="new" xml:lang="ja"/>

First reference translation for that segment:

<alt-trans match-quality="66" origin="first_reference_translation">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ  ビ</target>
</alt-trans>

Second reference translation for that segment:

<alt-trans match-quality="75" origin="second_reference_translation">
<source xml:lang="en">ba bi bu be</source>
<target xml:lang="ja">バ ビ ブ ベ</target>
</alt-trans>
</trans-unit>

(↑ end of the first segment)

Second segment, already translated in that document:

<trans-unit approved="yes" id="source_file">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ  ビ</target>
</trans-unit>
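
To make the three kinds of data concrete, here is a minimal Python sketch that reads the sample above and sorts it into "to translate", "already translated" and "reference" entries. The <body> wrapper, the SAMPLE constant and the classify function are mine, added only so that the fragment parses on its own; a real XLIFF file has more structure around the <trans-unit> elements.

```python
import xml.etree.ElementTree as ET

# The sample from this post, wrapped in a root element so it parses.
SAMPLE = """<body>
<trans-unit approved="no" id="source_file">
<source xml:lang="en">ba bi bu</source>
<target state="new" xml:lang="ja"/>
<alt-trans match-quality="66" origin="first_reference_translation">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ ビ</target>
</alt-trans>
<alt-trans match-quality="75" origin="second_reference_translation">
<source xml:lang="en">ba bi bu be</source>
<target xml:lang="ja">バ ビ ブ ベ</target>
</alt-trans>
</trans-unit>
<trans-unit approved="yes" id="source_file">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ ビ</target>
</trans-unit>
</body>"""


def classify(xliff_text):
    """Return (to_translate, translated, references) found in the text."""
    root = ET.fromstring(xliff_text)
    to_translate, translated, references = [], [], []
    for unit in root.iter("trans-unit"):
        # findtext() only looks at direct children, so the <source> and
        # <target> nested inside <alt-trans> are not picked up here.
        source = unit.findtext("source", default="")
        target = unit.findtext("target", default="") or ""
        if target.strip():
            translated.append((source, target))
        else:
            to_translate.append(source)
        for alt in unit.findall("alt-trans"):
            references.append((
                alt.get("origin"),
                int(alt.get("match-quality")),
                alt.findtext("source"),
                alt.findtext("target"),
            ))
    return to_translate, translated, references


to_translate, translated, references = classify(SAMPLE)
print(to_translate)
print(translated)
print(references)
```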



If you want to know more about the XLIFF specification, look at this page: XLIFF Version 1.2


Now, let's see how Rainbow can help us leverage all that data directly from within OmegaT.


Rainbow


Here comes Rainbow. I've already mentioned Rainbow, or the Okapi Framework, a number of times.

Rainbow is a Java application that acts as a Swiss Army knife for localization processes: it is a filter to/from complex formats, a batch search/replace tool, etc.

Don't forget to watch the Okapi Webinar that was given a few weeks ago by Yves Savourel. It is about 1h long and goes into great detail to show you what Rainbow/Okapi is about. OmegaT is used as an example too, and the following text can be considered a detailed explanation of what you see in the Webinar.

Webinar: http://www.translate.com/Language_Tech_Center/Webinar_Portal.aspx?id=132

The Okapi group has decided to frequently release snapshots of the code so that users don't have to wait for milestones to get a taste of the most recent features.
The snapshots are available here: http://okapi.opentag.com/snapshots/. Get the rainbow_carbon-macosx_... file.

When you unzip the package, you will find a rainbow.sh script file. Open Terminal in the directory where the file is located and make the file executable by typing the following command at the prompt:
chmod +x rainbow.sh

Then, call the script from Terminal to launch Rainbow.


When you launch Rainbow, you are presented with an empty file list.

  1. Drag and drop your XLIFF file onto that list. It should be the list corresponding to the "Input List 1" tab.
  2. Click on the "languages and encoding" tab and set the appropriate information. For example, an English manual translated to Japanese should be:
    • Source=EN (you can use a variant, like EN-US)
    • Encoding=UTF-8
    • Target=JA
    • Encoding=UTF-8
  3. Then go to "Utilities > Translation Package Creation".
  4. Select OmegaT.
  5. Click on the Package Location tab.
  6. Select a directory where Rainbow will create the OmegaT project.
  7. Click OK.


You're back at the main window, where the status bar indicates that the XLIFF file has been processed. What has Rainbow done to your XLIFF file?

XLIFF



An XLIFF file is a mix of a number of things. First, it contains strings to translate in <trans-unit> sections:

<trans-unit approved="no" id="source_file">
<source xml:lang="en">ba bi bu</source>
<target state="new" xml:lang="ja"/>
</trans-unit>


Then, it may contain reference data. The first type is, obviously, what has already been translated, in similar <trans-unit> sections:

<trans-unit approved="yes" id="source_file">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ  ビ</target>
</trans-unit>


The other type is what is embedded to serve as reference. It comes as <alt-trans> sections within a <trans-unit>:

<trans-unit approved="no" id="source_file">
<source xml:lang="en">ba bi bu</source>
<target state="new" xml:lang="ja"/>
<alt-trans match-quality="66" origin="first_reference_translation">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ  ビ</target>
</alt-trans>
<alt-trans match-quality="75" origin="second_reference_translation">
<source xml:lang="en">ba bi bu be</source>
<target xml:lang="ja">バ ビ ブ ベ</target>
</alt-trans>
</trans-unit>




When Rainbow is used to create OmegaT projects from XLIFF files, it looks for these 3 types of information and separates them in a way that OmegaT can manage.


  1. OmegaT cannot manage "empty" <target> elements in <trans-unit> sections. So, Rainbow recreates the source file so that it contains the <source> data in the <target> section. That way, OmegaT only has to overwrite that data:

    <trans-unit id="source_file">
    <source xml:lang="en">ba bi bu</source>
    <target xml:lang="ja">ba bi bu</target>
    </trans-unit>


  2. OmegaT cannot manage already translated data in the XLIFF file. So, Rainbow creates a typical "already translated" OmegaT data file that contains the already translated segments: the project_save.tmx located in the /omegat/ folder of the project:

    <tu tuid="source_file">
    <tuv xml:lang="en"><seg>ba bi</seg></tuv>
    <tuv xml:lang="ja"><seg>バ ビ</seg></tuv>
    </tu>


  3. OmegaT cannot manage reference data in the XLIFF file. So, Rainbow creates a typical "reference translation" OmegaT data file that contains all the embedded translation segments: the alternate.tmx located in the /tm/ folder of the project:

    <tu tuid="first_reference_translation">
    <tuv xml:lang="en"><seg>ba bi</seg></tuv>
    <tuv xml:lang="ja"><seg>バ ビ</seg></tuv>
    </tu>

    <tu tuid="second_reference_translation">
    <tuv xml:lang="en"><seg>ba bi bu be</seg></tuv>
    <tuv xml:lang="ja"><seg>バ ビ ブ ベ</seg></tuv>
    </tu>




That way, OmegaT can seamlessly access all the data available in that complex XLIFF file. Since Rainbow creates a whole OmegaT project, the only thing necessary to proceed is to open the project with OmegaT and translate.
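
For the curious, the three-way split described above can be sketched in a few lines of Python. To be clear, this is not Rainbow's code (Rainbow is written in Java and I have not looked at how it does this internally). It is an illustration under my own assumptions: the function names, the <body> wrapper around the sample and the default languages are all mine. The post does not say what the regenerated source file does with units that are already translated, so the sketch leaves them out of the source list.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<body>
<trans-unit approved="no" id="source_file">
<source xml:lang="en">ba bi bu</source>
<target state="new" xml:lang="ja"/>
<alt-trans match-quality="66" origin="first_reference_translation">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ ビ</target>
</alt-trans>
<alt-trans match-quality="75" origin="second_reference_translation">
<source xml:lang="en">ba bi bu be</source>
<target xml:lang="ja">バ ビ ブ ベ</target>
</alt-trans>
</trans-unit>
<trans-unit approved="yes" id="source_file">
<source xml:lang="en">ba bi</source>
<target xml:lang="ja">バ ビ</target>
</trans-unit>
</body>"""


def make_tu(tuid, src_lang, src_text, tgt_lang, tgt_text):
    """Build one TMX-style <tu> with a source and a target <tuv>."""
    tu = ET.Element("tu", {"tuid": tuid})
    for lang, text in ((src_lang, src_text), (tgt_lang, tgt_text)):
        tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
        ET.SubElement(tuv, "seg").text = text
    return tu


def split_for_omegat(xliff_text, src_lang="en", tgt_lang="ja"):
    """Return (source_units, project_save, alternate) as element lists."""
    root = ET.fromstring(xliff_text)
    source_units, project_save, alternate = [], [], []
    for unit in root.iter("trans-unit"):
        source = unit.findtext("source", default="")
        target = (unit.findtext("target", default="") or "").strip()
        if target:
            # 2. Already translated: goes to project_save.tmx in /omegat/.
            project_save.append(
                make_tu(unit.get("id"), src_lang, source, tgt_lang, target))
        else:
            # 1. Empty target: copy the source text into the target.
            filled = ET.Element("trans-unit", {"id": unit.get("id")})
            ET.SubElement(filled, "source", {XML_LANG: src_lang}).text = source
            ET.SubElement(filled, "target", {XML_LANG: tgt_lang}).text = source
            source_units.append(filled)
        # 3. Embedded references: go to alternate.tmx in /tm/.
        for alt in unit.findall("alt-trans"):
            alternate.append(make_tu(
                alt.get("origin"), src_lang, alt.findtext("source"),
                tgt_lang, alt.findtext("target")))
    return source_units, project_save, alternate


source_units, project_save, alternate = split_for_omegat(SAMPLE)
for element in source_units + project_save + alternate:
    print(ET.tostring(element, encoding="unicode"))
```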

Delivery



Nothing could be simpler. When you are done with your translation, create the translated files from within OmegaT.

Then, drag and drop the manifest.xml file located at the root of the OmegaT project created by Rainbow on Rainbow's main window where the Input List 1 tab is displayed. When manifest.xml is visible on the list, use the "Utilities > Translation Package Post-Processing..." menu.

Rainbow will use all the data it saved in the project to recreate a "proper" XLIFF file for delivery. That file contains all the data that was in the original XLIFF, and its structure is identical. The only difference is that now it includes the data you have translated in OmegaT.