How to support this blog?

To support this blog, you can hire me as an OmegaT consultant/trainer, or you can send translation and project management jobs my way.

Quick Look for zip files, folders

Currently, Quick Look does not display anything interesting when you select a folder or a zip file.

With two free utilities, you can hit the space bar and see a listing of the contents of a folder or of a zip file...

- Folder Quick Look Plugin
- Zip Quick Look Plugin

Keep the pages in your bookmarks to see updates when they are released.

OmegaT 1.7.3 (with Mac bundle) released !

First of all I'd like to thank all the people who have tested my Mac bundles for OmegaT.

I used your comments to create the official version that is released today.

As you probably know, the OmegaT project puts the "test" label on versions that do not have up-to-date manuals but that are at least as stable as the stable version... Basically, test versions have already been thoroughly tested by a number of power users and all the problems are supposed to have been ironed out.

So go ahead, it won't bite you !

The file, once unzipped, becomes OmegaT.dmg and opens with 2 files:

- documentation.html
- OmegaT.app

The documentation.html file is (for now) exclusive to the Mac package and groups all the information by available language. I put the up-to-date manuals in bold so that you can see them right away. As of today, only the English manual is up to date.

The manual has been fully updated, thanks to the work of the current documentation manager, to reflect all the new features (check the changes.txt file) and extra information such as the handling of bidi languages and command line arguments.

It's been a long time since OmegaT was released with a Mac bundle (1.4.4 was the last one, if I remember correctly) and the current release manager and I are working on automating this process (instead of having to do everything by hand here).

People who still want the rough edges of the "pure Java" version can still use the "OmegaT_1.7.3_Without_JRE.zip" file. It will behave exactly like previous OmegaT versions (as far as integration with OSX is concerned) and will allow all sorts of command line arguments to be passed. The bundle is a little bit trickier to modify: you need to edit the .plist file inside the package to obtain the same results.
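
For readers who would rather script the .plist edit than do it by hand, here is a minimal Python sketch. It assumes the bundle follows Apple's usual Java bundle layout, with a "Java" dictionary holding a "VMOptions" entry in the Info.plist file; the key values, the main class name and the memory settings are illustrative assumptions, not taken from the actual OmegaT package, so check your own copy first.

```python
import plistlib

# Stand-in for the Info.plist found inside the application bundle.
# ASSUMPTION: the "Java" dictionary and its values are illustrative only.
original = plistlib.dumps({
    "CFBundleName": "OmegaT",
    "Java": {"MainClass": "org.omegat.Main", "VMOptions": "-Xmx256M"},
})


def set_vm_options(plist_bytes, vm_options):
    """Return new plist data with the Java VM options replaced."""
    info = plistlib.loads(plist_bytes)
    # Create the "Java" dictionary if the plist does not have one yet.
    info.setdefault("Java", {})["VMOptions"] = vm_options
    return plistlib.dumps(info)


updated = set_vm_options(original, "-Xmx512M")
print(plistlib.loads(updated)["Java"]["VMOptions"])
```

On a real bundle you would read and write the file on disk instead of the in-memory stand-in, after making a backup copy of the application.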

Java 1.6 for Mac !

Apple has been criticized for not including Java 1.6 in Leopard. The current default is Java 1.5, while all the other platforms (Windows, Linux, Solaris etc) have access to Java 1.6...

Since Sun opened the code of Java, it has been freely available for porting and modification. What was bound to happen eventually happened: somebody took the source code and ported it to Mac.

The result is "SoyLatte: Java 6 Port for Mac OS X 10.4 and 10.5 (Intel)" by Landon Fuller. The performance seems to be pretty good too. See the benchmark comparing Ruby on OSX and JRuby on Java 1.6/OSX by Charles Nutter.

With this release, OSX users are getting closer and closer to a stable release of Java 1.6 for their machine (only Tiger or Leopard).

There is a developer preview from Apple, but it only works under Leopard and you need to be registered as a developer to access it (registration is free).

OpenOffice.org 2.4 localization

Almost two weeks since the last post. Amazing how 3 kids can suck your energy into nether...

Today's first post is an announcement.

OpenOffice.org is a free office suite that a lot of translators already use for its compatibility with MS Office and the fact that, well, it is free to download and free to use. OpenOffice.org is developed in part by Sun Microsystems, contributions come from IBM and other major players in the software industry, and there is a very strong community of users and volunteers who communicate in a variety of languages. The "Native Language Confederation" is where all the non-English things take place.

OpenOffice.org is thus localized by this community of communities under a separate project called, obviously, the "Localization Project".

The currently available version of OpenOffice.org is version 2.3. Version 2.4 is expected to be released sometime at the beginning of March and the localization efforts will thus start very soon.

Translators on Mac who do not use OpenOffice.org but prefer NeoOffice should be aware that all the localization work that goes into OpenOffice.org is automatically "recycled" into NeoOffice.

So, the deal is: you're enjoying a wonderful free office suite, and somehow you feel guilty for not having had to pay for it, or you feel that you'd love to "pay something back" but, not being a programmer, you are not sure where to start...

Well, you are a translator by trade, aren't you ? Localization is where your skills can best be used. Here is where you'll be able to find all the necessary information for this version's translation.

There are TMX files available for some language communities and since the source files are in the PO format you can translate them in your favorite CAT tool.

First, get in touch with the translation group within your language community: from the Native Project page, click on your language community, go to the relevant page from there ("contributions", "participation", "projects", etc.) and propose your help !

OSX 10.5.1, Safari 3.0, TextEdit...

It took Apple three weeks to address the most problematic bugs in Leopard. Good job ! The security contents of the update are detailed here. See also the relevant TidBits link in the feed box below.

Update: Heise Security's opinion on the fixes.

Safari 3.0, shipped in 10.5, includes a WebKit update that will certainly improve our Web experience. The Ars Technica article gives an interesting perspective on the WebKit market and WebKit.org gives a technical review that highlights the main new features. This new version of the WebKit is available in Leopard and in the latest Tiger update (10.4.11) released yesterday.

If you run a search in TextEdit now, you'll see that the search results are highlighted as they are in Safari. Much easier to see the searched string !

Bento, NeoOffice, OSX 10.4.11

Bento is "the new personal database from FileMaker that's as easy to use as a Mac".

Daring Fireball's analysis is quite interesting and convinced me to take a new look at Numbers for my simple database needs...

NeoOffice has just released the second patch for NeoOffice 2.2.

And Apple has also released a point update: OSX 10.4.11, that contains Safari 3.0 and ships a number of other items and fixes.

XLIFF 1.2, TMX 2.0, etc.

For some reason, my Google Alert for XLIFF decided to inform me today of the existence of the XLIFF 1.2 specification on the OASIS site...

Since we're at it, the other relevant standards body, LISA, is also working very hard to produce the next version of TMX: TMX 2.0. With Heartsome's Rodolfo Raya as the standard's editor, we can be sure that HTS will be one of the first application suites to support the standard.

Meanwhile, the Internationalization Activity of the World Wide Web Consortium has not ceased to produce very interesting documents, like an Updated Working Draft about the Best Practices for XML Internationalization.

The W3C's i18n group is also working on the Internationalization Tag Set. Yves Savourel of Okapi Framework fame being the chair of that Working Group, you've already guessed that the framework already supports a part of this tag set...

OpenWordFast

Christmas in November !!!

After the Okapi for Mono package 2 days ago, another package usable on the Mac has just been released: OpenWordFast, a macro for OpenOffice.org that accepts WordFast translation memories.

The project was registered on October 8th, which means that it is still a little early to expect feature parity with WordFast; currently it only accepts 100% matches from the TM... But since the project is free software (GPL), I have no doubt that it will find a lot of contributors.

Update:

I received a mail from Oleg, OpenWordFast's developer, after congratulating him on his work:


Hi, Jean-Christophe.

Thanks for your post. But OpenWordFast in the raw Beta stage. I'm not tested it on Mac yet.
Its lacks of vital functionality - Glossary, Terminology Recognition, search of not full match TU.

But I plan to work on this list in the future releases.

Best regards, Oleg Tsygany.


Here we go !

Transmug !

Yves, apologies !

Jost Zetzsche of The Tool Kit, the newsletter to read even if Mac news is scarce, just reminded me of the existence of Transmug, your group of Mac-using translators...

I found the Preaching to a Choir post and liked the PDF (10 MB) presentation attached to it.

ps: Amazing the number of Yves related to Mac and translation. Yves Averous of Transmug, Yves Savourel of the Okapi Framework, Yves Champollion of WordFast...

Okapi tools for Mono !

The Okapi framework is a set of applications designed to ease the work of the translator:

"The Okapi Framework is a set of interface specifications, format definitions, components and applications that provides an environment to build interoperable tools for the different steps of the translation and localization process.

The goal of the Okapi Framework is to allow tools developers and localizers to build new localization processes or enhance existing ones to best meet their needs, while preserving a level of compatibility and interoperability. It also provides them with a way to share (and re-use) components across different solutions. The project uses and promotes open standards, where they exist. For the aspects where open standards are not defined yet, the framework offers its own. The ultimate goal is to adopt the industry standards when they are defined and useable.

In short, the Okapi Framework aims at being a crucible where we forge common components that can be used in any localization and translation application, providing faster development time and better interoperability, but still allowing for the diversity of solutions."


(quote from http://okapi.sourceforge.net/)

Problem is, Okapi is developed on the .NET platform, basically a Windows-only platform.

A few years ago, people on the Linux side decided that .NET was a valuable platform and set out to create an implementation of .NET for Linux that could run .NET applications out of the box. Mono was born.

Mono was also made to run on OSX... The problem was that until recently Mono's support for .NET was not sufficient to run the Okapi tools and that the Okapi tools had not been written with the lowest common denominator in mind to run on Mono.

Yesterday, Yves Savourel, lead developer of the Okapi Framework Project, released a first Okapi for Mono package for testing on existing Mono environments (including OSX and Linux). Not all of the tools are available yet, but the command line tools are said to work.

As far as OSX workflows are concerned, Okapi can produce XLIFF files (or OmegaT projects, or XLIFF files for OmegaT) from a number of localization/translation formats. It is now relatively trivial for OSX translators to deal with InDesign files, for example, as long as they are saved in the InDesign XML format (.inx). Okapi will convert the .inx files to XLIFF for translation in OmegaT and will convert the target files back to .inx for delivery...

Very good news for translators on OSX and warm thanks to the Okapi team !

ps: I'll post a detailed description of how to install Mono and Okapi on OSX in a few days for the readers who don't feel too adventurous. Meanwhile, here are the respective download pages:

Dictionary.app development kit

Dictionary.app comes with a few dictionaries already, but limited to 2 languages: English and Japanese.

What if you want to use other data sets there ?

Apple has released a Dictionary Development Kit that you will find in /Developer/Extras/Dictionary Development Kit/, if you have installed the Xcode tools.

At first glance, a dictionary file is basically a twisted XHTML file that gets massaged with Perl scripts and a few command line applications, for use in the dictionary application.

Here is the first part of the abstract from the file Dictionary Format.rtf located in ./documents

"This document explains an XML schema that enables developing dictionaries / references that are compatible with Dictionary.app and other Dictionary Services. The schema defines the source code format for the dictionary. The dictionary source needs to be validated to make sure it is in the correct format. It is then processed by the Dictionary Build tool together with css and other auxiliary files, and packaged into a dictionary bundle. The dictionary bundle can be installed into one of the Library/Dictionaries folders to make it accessible from Dictionary.app."

And here is the first entry in one of the samples provided (./samples/JapaneseDictionarySample.xml):


<?xml version="1.0" encoding="UTF-8"?>
<!--
This is a sample dictionary source file.
It can be built using Dictionary Development Kit.

Entry examples for Japanese dictionary, English-Japanese dictionary, and Japanese-English dictionary.
-->
<d:dictionary xmlns="http://www.w3.org/1999/xhtml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">

<d:entry id="annojou_j" d:title="案の定">
<d:index d:value="アンノジョウ" d:title="案の定" d:yomi="あんのじょう" />
<d:index d:value="あんのじょう" d:title="案の定" d:yomi="あんのじょう" />
<d:index d:value="案の定" d:title="案の定" d:yomi="あんのじょう" />
<h1>
<span class="headword">あん‐の‐じょう</span>
<span class="hyouki">【案の定】</span>
</h1>
<span class="meaning">予想通りに事が運ぶさま。</span>
</d:entry>



(Side note: interesting to see that .xml files open in Dashcode by default.)

Update: Apple has a full "Dictionary Services Programming Guide" available here. It also comes as a downloadable 1.3 MB PDF file.

I have been trying to build a sample of EDICT provided for this purpose by Prof. Breen but even though the build process seems to be successful, Dictionary.app only indicates that it finds an entry without actually displaying it... More later...

Update 2: I have spent a good part of the weekend testing Prof. Breen's sample. The documentation provided by Apple is really minimal so we had to try a lot of different things.
Hints:
  • Entry IDs, which are meant to be unique, do not pass validation if they are entirely numerical, but the build process seems to handle numerical values without problems.

  • The build process seems to have difficulty building the dictionary properly if the project is not in the /Developer/Extras/Dictionary Development Kit/ directory.
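
Since the ID hint is easy to trip over with a large data set like EDICT, a quick pre-check of the source file can save a build cycle. The Python sketch below is only an illustration: it reads the hint as "entirely numerical IDs are rejected by the validator", the function name check_entry_ids is made up, and the sample entries are invented on the model of Apple's sample file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used for the d: prefix in Apple's sample dictionary source.
D_NS = "http://www.apple.com/DTDs/DictionaryService-1.0.rng"


def check_entry_ids(xml_source):
    """Return (duplicate IDs, entirely numerical IDs) found on d:entry elements."""
    root = ET.fromstring(xml_source)
    ids = [entry.get("id") for entry in root.iter(f"{{{D_NS}}}entry")]
    duplicates = sorted(i for i, n in Counter(ids).items() if i and n > 1)
    numeric = sorted({i for i in ids if i and i.isdigit()})
    return duplicates, numeric


# Invented sample: one repeated ID and one entirely numerical ID.
sample = f"""<d:dictionary xmlns="http://www.w3.org/1999/xhtml" xmlns:d="{D_NS}">
<d:entry id="annojou_j" d:title="案の定"><h1>案の定</h1></d:entry>
<d:entry id="1001" d:title="one"><h1>one</h1></d:entry>
<d:entry id="annojou_j" d:title="again"><h1>again</h1></d:entry>
</d:dictionary>"""

duplicates, numeric = check_entry_ids(sample)
print("duplicate IDs:", duplicates)
print("numerical IDs:", numeric)
```

A simple workaround for numerical IDs, if the validator does reject them, would be to prefix them with a letter before building.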

Regular expressions and text editing

Regular expressions



When you edit glossaries or translation memories, a few regular expressions always come in handy.

Regular expressions are pattern matching expressions. You create a pattern in the search field of a text editor and the text editor will look for anything that matches the pattern. Similarly, once you have found the pattern, you can replace it with a second pattern. That way, you end up with super powerful search-replace routines that can save you hours of stress on thorny texts...
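
To make the search pattern / replacement pattern idea concrete, here is a small sketch in Python (most text editors accept very similar syntax in their search and replace fields). The glossary lines are made up for the example: each one holds a source term and a target term separated by a tab, and the replacement swaps the two to reverse the direction of the glossary.

```python
import re

# Made-up tab-separated glossary: source term, tab, target term.
glossary = (
    "translation memory\tmémoire de traduction\n"
    "fuzzy match\tcorrespondance partielle"
)

# Search pattern: two fields separated by a tab, captured as groups 1 and 2.
# [^\t\n]+ means "one or more characters that are neither a tab nor a newline".
pattern = re.compile(r"^([^\t\n]+)\t([^\t\n]+)$", re.MULTILINE)

# Replacement pattern: write group 2 first, then a tab, then group 1.
swapped = pattern.sub(r"\2\t\1", glossary)
print(swapped)
```

In an editor, the same thing would typically be typed as the search string ^([^\t\n]+)\t([^\t\n]+)$ and the replacement string \2\t\1, although the exact notation for groups varies from one application to another.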

Here are two articles that you can use to get up to speed on the topic:

Regular Expressions from Princeton University
Regular Expressions Unfettered from Apple Developer Connection

As you can see, regular expression creation is not always an easy task and a helper application can sometimes save you a lot of time.

There are a few free regexp testers for OSX but they basically all work from outside your text editor.

The one that seems to be the easiest to use is reggy, a nice piece of software licensed under the GNU General Public License v2.

On his blog, Bill Clementson talks about regex-tool.el. As the name indicates, regex-tool.el is an elisp library for Emacs.

Besides text editors, regular expressions are supported by all kinds of software, including, of course, the major Office applications on the market. Look at their user manuals to find more information about regexp handling there.

Text editing



A number of very good text editors are available for the Mac. TextEdit, OSX's default text editing application, is fine but lacks even basic regular expression support. It does plenty of other things though, and can be considered a simple word processor with most of what is needed in that field (reading and writing Word files etc).

Other major text editors on Mac include:
Smultron (free as in "speech"),
TextWrangler (free as in "beer"),
SubEthaEdit (not free, except for the old version),
BBEdit (not free at all) and
TextMate (not free either and with very bad multibyte character support).

There are also the "ancestors" that are VI (VIM) and Emacs. Both are available from the Terminal application but require some time to get used to. Still, they are definitely some of the most powerful applications OSX hosts...

Emacs is not exactly the text editor that I'd recommend to my wife. But Aquamacs, a "Mac" version in terms of interface is much friendlier and can almost right away be used as a replacement for TextEdit as far as, well, text editing is concerned. Being an adaptation of Emacs, it is just as free and also distributed under the GNU General Public License...

Emacs is written in Elisp. So anything you write in Elisp within Emacs can de facto extend the functionality of Emacs. In other words, Emacs is just a huge macro editing environment...

Whichever text editor you decide to use, don't forget to read the user manual and especially the "searches" and "regular expressions" chapters.

Automation on the Mac !

AppleScript



Apple just updated the AppleScript section of its site.

I have never found a practical way to use AppleScript in my workflows... Hopefully the new version and better integration between applications and OS will make that a no-brainer now...

Automator



I have more hopes for Automator, especially the new version that has full access to the user interface (see here).

Here is the state of the documentation about Automator on Apple's developer's site. It is not updated yet for Leopard but is a very good start (after the Help files themselves) for people who want to take a serious look at this technology.

It is also possible to create actions with AppleScript or other easy to learn languages.

OmegaT development status for October

It is nice to see on OmegaT's development site that October has been quite an active month.

For those who don't know what OmegaT looks like, take a look at the screenshots, and read the online documentation.

Back to the stats.

Project rank was 137, which is the highest ever in the project's history (out of more than 100,000 projects). The Downloads figure was 3,466, which is second best, after November 2006 (4,127), when the first build in the 1.6 series was released. Of course, the downloads figure includes the current stable and test versions as well as a few other packages.

This 3,466 figure comes right after the latest test version (OmegaT 1.7.2) was released, at the end of September.

The Mac OSX zip packages that are available for the stable version (1.6.1_04) and for the test version (1.7.2) are soon going to be replaced with a real MacOSX application bundle that will make OmegaT even easier to use on the Mac.

The development code is available from CVS, through the Terminal application or any CVS interface.

Here is a rudimentary shell script that I use to update the sources and build them (you will need the OSX developer's tools installed, available from the install DVD):


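#!/bin/sh
# Update the OmegaT sources from CVS, build them and launch the result.
# Stop right away if the working directory does not exist.
cd ~/Documents/OmegaT/application/current_cvs/ || exit 1
# Anonymous checkout of the omegat module (-P prunes empty directories).
cvs -z3 -d:pserver:anonymous@omegat.cvs.sourceforge.net:/cvsroot/omegat co -P omegat || exit 1
cd omegat || exit 1
# Build with Ant; do not launch anything if the build fails.
ant || exit 1
cd dist || exit 1
# Open the readme and the change log in TextEdit (-e), then start OmegaT.
open -e readme.txt &
open -e changes.txt &
java -jar OmegaT.jar &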


I am sure there are things to improve here, but it works for me :).

The current code in CVS is labelled OmegaT 1.8 and includes the OpenOffice.org spellchecking engine etc.

The changes.txt file indicates the features that have been implemented, the bugs that have been fixed and the localizations that have been added.

Feel free to test and look for bugs !

Office 2007 files (.docx, .xlsx, .pptx) on Mac

(updated to reflect the release of StarOffice 9 for Mac and the OOXML conversion software for Office 2004)

Microsoft Office 2007 for Windows (and its Mac counterpart: Microsoft Office 2008) uses a new file format that has been available for a while now as .docx, .xlsx or .pptx ("x" to distinguish them from the standard MS formats).

The file format is commonly known as OOXML or OpenXML, or more simply as Microsoft Office 2007 format.

Even if the new files don't seem to be very widely used, they sometimes end up on a Mac user's desktop, especially since they are the default file format of the two suites (i.e., you need to jump through a number of hoops to save to a different format)...

What to do when you encounter such files ?

Since I do not own Office 2008 and did not have the OOXML update for Office 2004 at the time of writing, I had to test access with OOXML files created with NeoOffice from "real" Microsoft .doc, .ppt and .xls files. All of the test files were pretty complex and quite heavy, and all had originally been created on various versions of Microsoft Office for Windows.

Access through proprietary applications



The iWork '08 way



iWork: $79 from Apple

As far as I can tell, the iWork '08 applications Pages and Keynote opened the .docx and .pptx files I had created without any problems.

And the result was as good looking as the original files. Very impressive.

When I tried to open the .xlsx file, Numbers was considered as the default application (even the converter was not listed) but it was unable to open it correctly. I'll need to have a "genuine" .xlsx file to test Numbers' capacities.

The problem with iWork is that it cannot save a file to the new format. It can save it to the iWork default format or to the old Microsoft format, along with a few other more classical formats.


The Microsoft way


Microsoft Office 2008: $399.95 retail, $284.99 online, $239.95 retail upgrade version, $194.99 online upgrade version from Microsoft.
(The prices given correspond to the cheapest available version for professionals, the "Home & Student" package is not available for commercial activity.)

The Mac equivalent of Office 2007, Office 2008, has been available for a few months already. Office 2008 is the quickest way to access the new file format in a relatively smooth and painless way.

If you don't want to acquire Office 2008, you can download Microsoft's "Open XML File Format Converter for Mac". The application is available from here. It is at the bottom of the page, if the URL has not changed...

The converter requires OSX 10.4.8 or later. Microsoft also says that to view the files, you need either Microsoft Office 2004 11.3.4 or later, or Microsoft Office v.X 10.1.9 or later.

If you also install "Microsoft Office 2004 for Mac 11.5.0 Update" (description available here: http://support.microsoft.com/kb/953824) you'll also enable "Office 2004 for Mac to read and to write Office documents that are in Open XML Format".

The StarOffice 9 (beta) way


StarOffice: $69.95 (StarOffice 8 price, 9 is still beta), from Sun Microsystems.

StarOffice 9 beta is available from here:
http://www.sun.com/software/staroffice/get_beta.jsp

It should work pretty much like the OpenOffice.org 3.0 beta. See below.

System wide support on Leopard (OSX 10.5)


Leopard: $129, from Apple.

If you don't (plan to) own any recent version of Office for Mac what can you do ?

Leopard users have the free option of using the new TextEdit. It can open and save the new file format.

OOXML support is system wide, which means that the Finder and other applications will also give you a Quick Look preview of such files. Not all files are equal under Quick Look, though: some are displayed properly, while others are displayed as a white icon with no content shown... The test .pptx worked; the .docx and .xlsx did not.

So, support is not extremely good and I would not rely on it to check the translatable contents of a client file...

Access through free applications



OpenOffice.org and NeoOffice anyone?



Users on Panther (10.3) and above can use NeoOffice 2.2. NeoOffice is a sister application of OpenOffice.org.

The current available version of the standard OpenOffice.org (2.4) does not include OOXML support but NeoOffice includes special goodies, like OOXML support, that are found in Novell's version of OpenOffice.org, which is, sadly, not available for the Mac...

As of May 7th, the beta version of OpenOffice.org 3.0 is available. This version does include support for OOXML.

As written above, I used NeoOffice to create Office 2007/OOXML files with various degrees of success in terms of interoperability. I am pretty sure NeoOffice could open relatively complex files since the files I fed it for OOXML output were fairly complex, although I'd need to test that.

As text ?



An extreme way to access the contents of such files is to handle them as if they were zipped, unzip them and find the document.xml located somewhere in the folder hierarchy that appears (it would be under /word/ for a Word document). This file is standard XML and can be opened in any text editor.
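
As a rough illustration of the "unzip and read document.xml" approach, here is a short Python sketch that uses only the standard library. The function name docx_paragraphs is made up, and the archive is a tiny stand-in built in memory so that the example runs without a client file; a real .docx contains many more parts, and this only pulls out plain text, not formatting.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the WordprocessingML elements (the w: prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_file):
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is split into runs; the text lives in w:t elements.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


# Stand-in archive holding only the part that matters for this sketch.
sample_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", sample_xml)
buffer.seek(0)

print(docx_paragraphs(buffer))  # ['Hello world', 'Second paragraph']
```

With a real file, you would pass its path to docx_paragraphs instead of the in-memory buffer.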

To properly access the contents of the file, you'd need to use Okapi's Tikal utility, available for the Mono (free) running environment. Tikal should be able to extract the contents of the XML into an XLIFF file that you can later load into a translation tool...


Translation



Once you have access to the file, you can translate it by overwriting it in the application of your choice. Saving the resulting file to .docx will produce results that vary with the application you used. A best bet would be to save the result to .rtf for delivery.

OmegaT and other Java based applications



If you want to use a translation memory tool, the few I know that directly handle OOXML are OmegaT, Swordfish, the newborn from Maxprograms, and Heartsome's Translation Suite.

AppleTrans



If you have converted the file to .rtf or HTML before translation, AppleTrans should be able to handle it directly.

Okapi's Tikal for conversion to XLIFF



Or, as written above, you can use Okapi's Tikal command line utility to convert its contents to XLIFF and translate it in any of the above mentioned applications.

Wordfast



The Microsoft converter opens the file in Word in the RTF format and you can then use WordFast to translate it directly (from within Word 2004 / Word v.X).

OpenLanguageTools



With hacks, you can also translate the document.xml file in OpenLanguageTools.

Have I forgotten your favorite tool ?
