07 May 2008

OpenOffice.org 3.0 Beta available!

It is official, OpenOffice.org 3.0 Beta version is available for download.

The feature list is here, and you'll be glad to know that support for the Microsoft Office 2007 file formats (OOXML) is now a reality!

Also, OpenOffice.org for Mac is now an Aqua application that does not require the X11 windowing environment. Those of you who don't know what that means are blessed!

The stable version is planned for release in September. Although this beta of the free office suite is not yet considered stable, it is stable enough for most of your non-mission-critical work. I've been using test versions of 3.0 for a while now and have been very pleased with them. I've also noticed that it launches significantly faster than NeoOffice.

Feel free to download it from here.

27 April 2008

Text alignment on Mac

When you start a translation, it is important to prepare your reference materials so that you can use them in the most efficient way possible.

Computer-Aided Translation (CAT) tools have a special way of doing that. They use translation memories (TMs) that contain source- and target-language text, which is matched against the source text to provide translation suggestions.

Using translation memories has two major benefits. First, when a segment in the file to translate also exists in the memory, in identical or similar form, its TM translation can be recycled in your work. Second, the translation memory, if properly used, will increase the stylistic consistency of your final work.

Creating a TM from existing source and target texts is also called aligning bilingual texts. The end format will depend on your CAT flavor, but the standard today is TMX (Translation Memory eXchange), an XML dialect maintained by LISA.



There are three ways to align text:

  1. by hand
  2. with non-free software
  3. with free software (including free of charge)

I used to do it by hand: copy the texts into two text editor windows with line numbers displayed, hack the contents so that strings on the same line number correspond, then paste TMX code all over this.

In a good text editor, you can do all the work from the keyboard, using only shortcuts. The "paste TMX code all over this" part is a little tricky, but some smart people have created simple scripts in Perl or Python to ease the pain.
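To give an idea of what such a script does, here is a minimal Python sketch. It is not one of the scripts mentioned above: the function name `build_tmx` and the header values (`align-sketch`, `plaintext`, `none`) are made up for illustration. It assumes you already have two lists of lines where line N of the source corresponds to line N of the target.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute out as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(source_lines, target_lines, srclang, tgtlang):
    """Return a TMX 1.4 document (as a string) built from two aligned lists of lines."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")

    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "align-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",     # hypothetical version
        "datatype": "plaintext",
        "segtype": "sentence",
        "adminlang": "en",
        "srclang": srclang,
        "o-tmf": "none",                  # placeholder value
    })
    body = ET.SubElement(tmx, "body")

    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # skip pairs where one side is empty
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, src), (tgtlang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text

    return ET.tostring(tmx, encoding="unicode")
```

Calling `build_tmx(["sentence"], ["phrase"], "en", "fr")` returns a string that you can write to a .tmx file. The XML library takes care of escaping characters such as & and <, which is the part that is easy to get wrong when pasting TMX code by hand.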

Then I bought Heartsome's Translation Suite, a set of Java applications for translators. The set includes TMXEditor, a TMX file editor as its name says, and I did most of my alignment there for a while. I've always had mixed feelings about TMXEditor. There are display glitches, it does not seem easy to work only with the keyboard, it uses a lot of memory... TMXEditor does a few things very well (TMX merging and various checks), but on Mac, it is not the best tool for aligning texts.

The best tool (for now) on Mac is a native application called Appletrans, previously known as Alair. Appletrans had been on my hard disk for so long that I had almost forgotten about it, always promising myself that I'd test it to write a blog entry about it.

Appletrans is a text editor for translators. It is available free of charge directly from Apple, on their localization page. Besides being a very nice aligner, it is also a full-fledged CAT tool that a number of people have adopted as their tool of choice.

The following is an introduction to text alignment in Appletrans. I'd like to thank Steven DeWitt for helping me when I was lost in the shortcuts and for confirming that what follows is not merely the product of my feverish imagination.

Aligning text in Appletrans


  1. Prepare the files

    (This part is very well explained in the Appletrans manual. Don't hesitate to refer to it.)


    1. Appletrans does not open .doc files.
      → save the files to align to the RTF format in TextEdit

      Appletrans can also open a number of other file formats by default and plugins are available to add even more file formats.



    2. Open the source file and the target file from the Finder or in Appletrans.
      → in the Finder, right-click the file and choose Open With; Appletrans should appear in the list.

      The files should be displayed with most of their styling, but without any images that were present in the original files. Also, the file names now carry an .alair extension in place of the .rtf extension (see the title bar).



    3. Segment the two files (repeat the procedure for both files)
      → Do not select any contents in the opened files
      Tool menu, Segment submenu, Segment

      A dialog appears. Select the segmentation type you want in the drop-down menu and press "segment all". You'll see small orange markers at the beginning and end of each segment Appletrans has created for you.



    4. Tell Appletrans that the two files are to be synchronized. Do this for both files.
      Tool menu, Synchronize

      A dialog appears. Enter the language of the file.


      The synchronization changes the display a little. Use Cmd+1 (or Cmd+2) on the frontmost text and you'll see that the segments defined in that window are now linked to segments in the other window.

      By doing that, you can already see that some source segments are not associated with the correct target segment. The alignment process is about correcting such association mistakes.


  2. Correct the default segment associations

    (This part is not as clear in the user manual and required a bit of guessing.)

    You now have two windows open:

    1. The segmented source file
    2. The segmented target file


    Here are the Appletrans specific shortcuts that you will need to modify the alignment:

    Cmd+1 (Tool menu, Segment submenu)

    → selects the next segment and shows the associated segment in the other window

    Cmd+2 (Tool menu, Segment submenu)

    → selects the previous segment and shows the associated segment in the other window

    It is also possible to select any segment in the text by clicking on one of its orange segment markers.

    Opt+Cmd+R (Tool menu, Segment submenu)

    → Restore: removes the segmentation of the selected segments. At least one full segment must be selected for the action to work.

    Opt+Cmd+S (Tool menu, Segment submenu)

    → Segment Selection: makes a segment from the selection, with no need to go through the Segment dialog again!



    Now, here are some practical standard shortcuts that will make your life easier.

    Arrows

    → moves the cursor around the window

    Shift+arrows

    → selects while the cursor is moving

    Delete

    → deletes the selected part (segment or text)

    Cmd+X, Cmd+V

    → standard cut, paste that you can use to move segments or text around



    Merge segments


    • Select the segments to merge.
    • Press Opt+Cmd+R (Restore) to remove their original segmentation.
    • Press Opt+Cmd+S (Segment Selection) to make a segment from the selection.



    Split a segment


    • Select the segment to split.
    • Press Opt+Cmd+R (Restore) to remove its original segmentation.
    • Select the part you want to make a segment out of.
    • Press Opt+Cmd+S (Segment Selection) to make a segment from the selection.
    • Proceed similarly with the remainder of the original segment until every part is a segment.


    It is also possible to cut and paste segment contents around to achieve the same result. You may end up with empty segments that will have to be deleted. Do whatever best fits your workflow.

    In the system shortcuts (see System Preferences, Keyboard Shortcuts), you should find an entry called "Move focus to next window in active application".

    I have set this shortcut to Cmd+Esc, so that I can use Cmd+Tab to navigate between the running applications and Cmd+Esc to navigate between the open windows of the frontmost one.

    Imagine the following scenario:

    Cmd+1, you select the next coming segment, you notice that it is not associated with the right segment in the other window.

    Cmd+Esc, you go to that window, you do what you have to do there, and when the segments are properly aligned, you don't need to go back to the first window, just proceed with Cmd+1.

    Anyway, with the above indications, you should be able to correct all the segment associations in the files using only the keyboard, and thus save a huge amount of time.

  3. Create the alignment file

    The purpose of all this is of course to create an aligned file that you will later use for reference in your favorite CAT tool.

    Appletrans allows you to save such a corpus in the familiar TMX format that most CAT tools support.

    First, you need to create a new corpus that will contain the data you just aligned.

    File menu, New Corpus


    A new dialog should be displayed, but you don't have to worry about it. Click on either of the two text windows that you have just aligned.

    Now, to save your data:

    Tools menu, Build Corpus


    Appletrans will be busy for a few seconds and then will release the focus.

    If you go back to the Corpus dialog, you will notice that the upper left red light now has a black dot in it, which indicates that the corpus has been modified.

    To create the final TMX:

    File menu, Save As


    Put a relevant file name and select TMX Format from the File Format drop down menu. Then save.

    The TMX that you have just created is a TMX 1.4 file that contains only textual information. All the styling that was present in the RTF files has been removed. It is thus a TMX 1.4 level 1 file.

    A typical Appletrans created TMX file will look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <tmx version="1.4">
    <header creationtool="AppleTrans" creationtoolversion="38" datatype="unknown" segtype="sentence" adminlang="en" srclang="en" o-tmf="AlairCorpus">
    </header>
    <body>
    <tu>
    <tuv xml:lang="en"><seg>sentence</seg></tuv>
    <tuv xml:lang="fr"><seg>phrase</seg></tuv>
    </tu>
    </body>
    </tmx>


    You may want to change the srclang attribute, since Appletrans defaults to "en". If you use this TMX in OmegaT, the change won't be necessary as long as the xml:lang attributes of the two tuvs correspond to language variants of the languages you set when you created the project.
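If you prefer not to edit the header by hand, the change can be scripted. The following is only a sketch using Python's standard library, with helper names (`set_srclang`, `tuv_languages`) that I made up. It assumes the TMX is small enough to load into memory, and note that the XML declaration is not reproduced in the returned string.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under this namespaced attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_languages(tmx_text):
    """Return the sorted list of xml:lang values found on the <tuv> elements."""
    root = ET.fromstring(tmx_text)
    return sorted({tuv.get(XML_LANG) for tuv in root.iter("tuv") if tuv.get(XML_LANG)})


def set_srclang(tmx_text, srclang):
    """Return the TMX text with the header's srclang attribute replaced."""
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    if header is None:
        raise ValueError("no <header> element found")
    header.set("srclang", srclang)
    # The returned string has no XML declaration; add one when writing the file.
    return ET.tostring(root, encoding="unicode")
```

Running `tuv_languages` first is a quick way to see which language codes the file really contains before deciding what srclang should be.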

  4. Validate the TMX contents

    Appletrans (and a number of other tools) does not ensure that the TMX contents perfectly follow the TMX standard. In some cases, the textual contents that you have just aligned and converted will contain characters that should not be included in a TMX file. To ensure that the TMX you have just created does not contain such characters, you are going to need another utility.

    Maxprograms, the creator of a number of translation-related tools, has released a free TMX validation utility that will be put to use here.

    Launch TMXValidator, and instead of using Validate File in the File menu, use Clean Invalid Characters from the same menu.

    TMXValidator will ask you to select your TMX file. After a very short time, the main window should display "File cleaned". No need to save, the file has already been modified.

    You can now use the TMX file in any CAT tool that supports TMX files.
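For the curious, here is roughly what such a cleaning step could look like in Python. This is a sketch, not a description of what TMXValidator does internally: I am assuming that "invalid characters" means characters outside the range allowed by the XML 1.0 specification (mostly stray control characters), and the function name is made up.

```python
import re

# Characters allowed by XML 1.0: tab, line feed, carriage return,
# and the ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
# Everything else matches this pattern and gets removed.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_invalid_chars(text):
    """Return text with every character not allowed in XML 1.0 removed."""
    return _INVALID_XML_CHARS.sub("", text)
```

You would read the TMX file as text, pass its contents through `clean_invalid_chars`, and write the result back. Keep a copy of the original file first.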






Links



About LISA:
TMX, XLIFF, etc...

Perl and Python scripts to create TMX files are available from the OmegaT official page.

Heartsome's page is here. You can download the software set and use it without limitations for 30 days.

Appletrans can be found from Apple's Localization Tools page, here.
There is a very active support group hosted by Yahoo Groups.

Maxprograms has been around for a while, but it used to limit itself to delivering free utilities, some of them distributed with the Heartsome tool set. It now has a full-fledged XLIFF editor, Swordfish, along with all the smaller utilities, which are all very useful.

20 April 2008

Office 2008 and OOXML on the Mac (update)

I have edited and updated the Office 2008 related posts:

Office 2007 files (.docx, .xlsx, .pptx) on Mac
and
Office 2008 review

10 April 2008

OSX in Arabic!

I was wondering how much news I'd get from reading the Mac-related French sites, and until now I've been disappointed to find only translations of the English news.

This morning, something that was not reported on the English sites made its way to my RSS page... The Arabic localization of OSX! The site is Mac Génération, which reported on the release of an Arabic kit for OSX.

The release is available for OSX 10.5.2 as a .dmg package. Looking at the release page, one can see an Intel 10.4.10 localization package is also available.

Very good news for the Arabic OSX users!

03 March 2008

Spellchecking in OmegaT 1.8

This (or something similar) will eventually make its way into the user manual. Meanwhile...


  1. Click on Options > Spell checking...

  2. Indicate where you want OmegaT to look for dictionaries. This can of course be the directory where OpenOffice.org keeps its own.

  3. If there are valid dictionaries in that location, OmegaT will recognize them and display them. If the dictionary you want to use is already there and visible to OmegaT, you're done. If that is not the case, proceed with the following:

  4. Click on "Install". This takes a while because OmegaT gets a list of dictionaries from the internet.

  5. OmegaT will display a list of dictionaries. Click on the dictionaries you want to install (Cmd+click will do multiple selections on Mac, maybe Ctrl+click will do on other platforms).

  6. After you have clicked "Install", the button will change color and OmegaT will get the files from the internet. Nothing noticeable will happen for a while. Just wait until the button reverts to its "normal" state.

  7. Close.

  8. The new dictionaries will be displayed in the dictionary list.

  9. To use the dictionaries, make sure the language code of the target files corresponds to the dictionary's language code: an FR-FR dictionary will not work with an FR target setting. You need to change the setting to FR-FR to have the spellchecker recognize the correct dictionary for your target.
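The language code matching in the last step can be illustrated with a few lines of Python. This is only a sketch of the idea, not OmegaT's actual code: I am assuming that a dictionary consists of a pair of files such as fr_FR.aff and fr_FR.dic (the usual layout for hunspell dictionaries), and that codes are compared as whole strings once "-" and "_" and letter case are ignored. The function names are made up.

```python
from pathlib import Path


def installed_dictionaries(directory):
    """Return the codes (e.g. 'fr_FR') that have both a .aff and a .dic file."""
    folder = Path(directory)
    aff = {p.stem for p in folder.glob("*.aff")}
    dic = {p.stem for p in folder.glob("*.dic")}
    return sorted(aff & dic)


def normalise(code):
    """Make 'FR-FR' and 'fr_FR' comparable (assumed matching rule)."""
    return code.replace("-", "_").lower()


def matching_dictionary(target_code, directory):
    """Return the installed dictionary code matching the target code, or None."""
    wanted = normalise(target_code)
    for code in installed_dictionaries(directory):
        if normalise(code) == wanted:
            return code
    return None
```

With only fr_FR.aff and fr_FR.dic in the directory, `matching_dictionary("FR-FR", ...)` finds the dictionary while `matching_dictionary("FR", ...)` returns None, which mirrors the behavior described above.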



You don't have to use that interface to install new dictionaries.

Go to OpenOffice.org's dictionary download page and get the files you want.

Uncompress them in the directory specified in step 2) above.

If OmegaT does not notice them after that install, you can try reloading the project or restarting OmegaT.


Once you have started translating, OmegaT will produce a familiar red wavy underlining for words that are not included in the applied dictionary. A right-click on the word should produce a contextual menu that will display a number of candidates as well as a few options.

People who can't "right-click" because they only have one mouse button can use Control+Click to display the contextual menu. Those of you who have a recent Mighty Mouse from Apple should know that it is quite configurable. Check the System Preferences.

It is also possible to configure some touchpads to simulate a right-click when hitting them with 2 fingers at once. Check your preferences...

02 March 2008

OmegaT 1.7.3, 1.8, 1.9...

February was a good month for OmegaT.

Its rank was the second best so far, at 187 out of 100,000+, after October 2007 (see this post), when it was at 137.

Downloads were also the second best so far, with 3,883 packages (everything included), after November 2006, when they were at 4,127.

A few days ago, the latest stable version was released (OmegaT 1.7.3_01). It had existed as a test version for a while, and since there were no major issues with it, it was eventually considered stable.

Making 1.7.3 stable also meant creating a whole new test version. While the developers were busy fixing the most important glitches and adding localizations, work was also done on the last version of OmegaT that will work with Java 1.4: OmegaT 1.8.

OmegaT 1.8 test was released a few minutes ago! In fact, Didier had been waiting all this week for the OSX bundle that I had totally forgotten about. Apologies everybody!!!

OmegaT 1.8



Download OmegaT 1.8 test for OSX!

Java 1.4 is a thing of the past for most Windows and Linux users. For them, Java 1.6 has been available for a while already. But for Mac, Java 1.5 is still the default in Leopard (10.5) and Tiger (10.4), and Panther (10.3) users are still limited to Java 1.4.

OmegaT 1.8 is bringing quite a few major new features to OmegaT.

First of all, a spellchecker. OmegaT uses the same spellchecker as OpenOffice.org: hunspell. Which also means that it can use all the dictionaries available for OpenOffice.org, and that means quite a lot. Since the manual has not been updated yet to cover this aspect of the setup, you'll have to proceed by trial and error to install your dictionaries, but it is relatively trivial so you should be alright. Don't forget to make sure that the dictionary language code and the project target language code match, otherwise the spellchecker will not realize it is called...

Update! It looks like some OmegaT users have a hard time with the spellchecking setup, so I just wrote a page about that: Spellchecking in OmegaT 1.8.

After the spellchecking, there are quite a few other features that will surely ease your work. Here is a list from the changes.txt file:


  • Letter case change in editing field

  • Display (all) source segments, so that you don't have to navigate to a segment to see its source, you can have all the source segments displayed at once

  • Mark translated segments with a distinguishable background color

  • Mark untranslated segments with a distinguishable background color

  • Navigation history, so that you can change segments and come back to the one you left

  • HTML, skip extraction of messages matching regexp

  • Select elements to translate in office documents

  • Clickable match window, so that you can navigate to a match

  • Compare source segment and translated segment lengths

  • Indicate translation progress in status bar, mostly the data in the project file window, but available without having to change windows




OmegaT 1.8 does have a few glitches though, some of them, I gather, due to the spellchecker's interaction with the editing interface. So it should really be considered a test version. But I have been working with it since the very first days of the spellchecker implementation and I have yet to lose data with it (not that I am particularly anxious to prove that a test version should not be used for real jobs though...)



OmegaT 1.7.3_02



If you check the release notes available in OmegaT 1.8 you'll find that a 1.7.3 release 2 is in the making.

Currently, 1.7.3_02 includes the following:
Enhancements:

  • Command line parameters for OmegaT.exe

  • Windows installer

  • New Arabic localisation (readme, instant start)


Bug fixes:

  • PO: Bad handling of plural messages on multiple lines



OmegaT 1.7.3 release 2 is not yet ready. It is still waiting for more localizations and possible bug fixes. I'll update this page when information comes in.

Regarding OmegaT's first Arabic localization, I would like to express my sincere thanks to Mr. Faycal Alami who considered my request for help and gladly contributed his work to the OmegaT Project. I hope we will be able to have a fully localized version of OmegaT in Arabic thanks to his work.

The Arabic localization will only be available in OmegaT 1.8 test for a while, until OmegaT 1.7.3_02 is released.


OmegaT 1.9



Now that OmegaT 1.8 is in testing as the last version of OmegaT that will work with Java 1.4, a lot of work is going into the next version of OmegaT, which targets Java 1.5 and will probably modify quite a few important things that we've been used to... The OmegaT 1.9 code is now mostly OmegaT 1.8 with a lot of refactoring, to prepare the code for core changes.

I am currently using the new code and am updating it as soon as something seemingly big comes in so I'll let you know what goes on. For people who like to be on the bleeding edge, check the OmegaT Project new SVN repository by typing the following at your command line:

svn co https://omegat.svn.sourceforge.net/svnroot/omegat omegat

You'll need ant to build the code. There won't be an OSX bundle for 1.9 for a while, so you'll have to do as we used to before the bundle existed: either double-click on the OmegaT.jar file, or start it from the command line.

29 February 2008

Mac for Translators, mailing list?

I am starting to think that this blog could benefit from a proper mailing list...

So here it is: http://groups.google.com/group/mac-for-translators...

Feel free to join. The archives are set to be publicly available, and the group will be multilingual.

To subscribe, send a message to mac-for-translators-subscribe@googlegroups.com. Google will send you a confirmation mail.