29 July 2009

Excel files with colored non-translatables... (2)

Just to show you that I am not lying:

Here is one of the Excel cells that I have to translate:

<h1>This is ugly<br />ugly ugly ugly</h1>Ugly ugly ugly ugly ugly ugly:<br /><br /><a href='%%LINK_A%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly.<br /><a href='%%LINK_B%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly ugly.<br /><a href='%%LINK_C%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly ugly. <br /><a href='%%LINK_D%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly ugly.!

Putting that straight into OmegaT as an ODS file:

  1. keeps the various styles that exist in the original Excel file (fonts, colors, etc.)
  2. keeps the HTML code as plain text.

So, in OmegaT, you end up with segments like this (I colored in green the formatting inherited from the Excel file, and in red the original HTML tags):

First segment:

<h1><f0>This is ugly</f0><f1><br /></f1><f2>ugly ugly ugly</f2><f3></h1></f3><f4>Ugly ugly ugly ugly ugly </f4><f5>ugly</f5><f6>:</f6><f7><br /><br /><a href='%%LINK_A%%' class='yellow'></f7><f8>Ugly </f8><f9>ugly:</f9><f10></a></f10><f11> </f11><f12><s13/>Ugly ugly ugly ugly</f12><f14>.</f14><f15><br /><a href='%%LINK_B%%' </f15><f16>class='yellow'></f16><f17>Ugly ugly:</f17><f18></a> </f18><f19><s20/>Ugly ugly ugly ugly ugly</f19><f21>.</f21><f22><br /><a </f22><f23>href='%%LINK_C%%' class='yellow'></f23><f24>Ugly ugly:</f24><f25></a> </f25><f26>Ugly </f26><f27><s28/>ugly ugly </f27><f29>ugly ugly</f29><f30>.

Second segment:

</f30><f31><br /><a href='%%LINK_D%%' class='yellow'></f31><f32>Ugly </f32><f33>ugly</f33><f34>:</f34><f35></a> </f35><f36>U</f36><f37>gly ugly ugly ugly ugly</f37><f38>.</f38>

You see?

All the original HTML is here, and all the original formatting is applied to each cell part separately...

This is obviously not the solution. For one thing, the inherited formatting (all the green tags) is totally irrelevant as far as the translation is concerned. Second, the HTML tags should be handled as HTML rather than as plain text, both to make the translatable text easier to see and to reduce the risk of accidentally modifying the markup while you translate the segment.
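One way to have the HTML handled as HTML is to move the cell contents out of the spreadsheet and into a file that the CAT tool opens with its HTML filter. The earlier advice is not repeated here, so what follows is only one possible way to do that step, sketched in Python under stated assumptions: the cells have already been exported to a UTF-8 text file with one cell per line, and the script name, file names and the `cell-N` ids are hypothetical.

```python
import sys
from pathlib import Path


def cells_to_html(cells):
    """Wrap each cell's raw HTML in its own <div>, so that an HTML filter
    parses the markup as tags instead of showing it as plain text."""
    body = "\n".join(
        # The numbered id (hypothetical) helps map each div back to its cell.
        f'<div id="cell-{number}">{cell}</div>'
        for number, cell in enumerate(cells, start=1)
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"></head>\n"
        "<body>\n" + body + "\n</body>\n</html>\n"
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python cells_to_html.py cells.txt cells.html  (hypothetical names)
    # The input is assumed to hold one exported cell per line.
    cells = Path(sys.argv[1]).read_text(encoding="utf-8").splitlines()
    Path(sys.argv[2]).write_text(cells_to_html(cells), encoding="utf-8")
```

The one-cell-per-line assumption breaks for cells that contain line breaks, and putting the translated text back into the spreadsheet is the reverse step, which this sketch does not cover.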

Now, if you follow my earlier advice, the same cell would look like this in OmegaT (with blue color for emphasis), once it is handled as a "normal" HTML file:

First segment:

This is ugly<br0/>ugly ugly ugly

(Notice that the <h1> and </h1> tags are not shown: they are block-level tags, and the whole block is the segment. That alone reduces the total number of tags in the segment!)

Second segment:

Ugly ugly ugly ugly ugly ugly:<br0/><br1/><a2>Ugly ugly:</a2> Ugly ugly ugly ugly.<br3/><a4>Ugly ugly:</a4> Ugly ugly ugly ugly ugly.<br5/><a6>Ugly ugly:</a6> Ugly ugly ugly ugly ugly.

Third segment:

<br7/><a8>Ugly ugly:</a8> Ugly ugly ugly ugly ugly.!

Much better, isn't it? Now, how much time and energy do you think this trick will save you the next time you have such a (not so uncommon) file to translate?
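To see why the HTML view is so much cleaner, here is a rough Python imitation of the behavior described above: block-level tags such as <h1> end a paragraph and disappear, inline tags become short numbered placeholders whose closing tag reuses the opening number, and paragraphs are then cut into sentences. It is a toy written for this example, not OmegaT's actual filter. The set of block tags, the placeholder naming and the "period followed by whitespace" sentence rule are simplifying assumptions that happen to reproduce the three segments shown above for this particular cell.

```python
import re
from html.parser import HTMLParser

# Simplifying assumptions: which tags count as block-level (they end a
# paragraph and are hidden) and which are void (they have no closing tag).
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "div", "li", "td"}
VOID_TAGS = {"br", "img", "hr"}

CELL = (
    "<h1>This is ugly<br />ugly ugly ugly</h1>Ugly ugly ugly ugly ugly ugly:"
    "<br /><br /><a href='%%LINK_A%%' class='yellow'>Ugly ugly:</a>"
    " Ugly ugly ugly ugly.<br /><a href='%%LINK_B%%' class='yellow'>Ugly ugly:</a>"
    " Ugly ugly ugly ugly ugly.<br /><a href='%%LINK_C%%' class='yellow'>Ugly ugly:</a>"
    " Ugly ugly ugly ugly ugly. <br /><a href='%%LINK_D%%' class='yellow'>Ugly ugly:</a>"
    " Ugly ugly ugly ugly ugly.!"
)


class ShortcutTagger(HTMLParser):
    """Toy imitation of CAT-tool tag handling, not OmegaT's real filter."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs with placeholder tags
        self.current = []     # pieces of the paragraph being built
        self.counter = 0      # next placeholder number in this paragraph
        self.open_tags = []   # (name, number) of inline tags still open

    def _flush(self):
        text = "".join(self.current).strip()
        if text:
            self.paragraphs.append(text)
        self.current, self.counter, self.open_tags = [], 0, []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag in VOID_TAGS:  # e.g. <br> written without the slash
            self.handle_startendtag(tag, attrs)
        else:  # attributes such as href are hidden behind the placeholder
            self.current.append(f"<{tag}{self.counter}>")
            self.open_tags.append((tag, self.counter))
            self.counter += 1

    def handle_startendtag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        else:
            self.current.append(f"<{tag}{self.counter}/>")
            self.counter += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
            return
        # A closing tag reuses the number of its most recent opening tag.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, number = self.open_tags.pop(i)
                self.current.append(f"</{tag}{number}>")
                return
        # Unmatched closing tags are simply dropped in this sketch.

    def handle_data(self, data):
        self.current.append(data)

    def close(self):
        super().close()
        self._flush()  # text after the last block tag is a paragraph too


def segments(html_text):
    parser = ShortcutTagger()
    parser.feed(html_text)
    parser.close()
    result = []
    for paragraph in parser.paragraphs:
        # Crude sentence rule: break after a period followed by whitespace.
        result.extend(s for s in re.split(r"(?<=\.)\s+", paragraph) if s)
    return result


if __name__ == "__main__":
    for segment in segments(CELL):
        print(segment)
```

Real segmentation rules are much richer than a single regex, but even this toy shows where the savings come from: the <h1> pair vanishes entirely, and every href/class attribute hides behind a short numbered tag.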

ps: I did try to do all the heavy editing in Emacs, but I am not yet familiar enough with its regex syntax, so I eventually had to fall back on TextWrangler. I am not giving up, though...

ps2: If you know a CAT tool that does not require any manipulation to reach a manageable number of tags, let me know!