29 July 2009

Excel files with colored non translatables...

Here comes an excel file, with pseudo HTML in the cells.

The HTML tags are red and must not be modified. If you want how the segments look check the follow up post.

Translating that in OmegaT is relatively straightforward:

  • save the file as ODS in Openffice.org (or NeoOffice)
  • put that source file in the /source/ folder of your OmegaT project
  • load the project and translate

The problem is that not only you are going to have all the HTML tags displayed for what they are within the translatable text, but you're going to have to deal with the red color tags that will surround all the HTML...

Not user friendly at all...

Another solution is to do like this:

  • copy-paste the column into a text file -> no more red color, will deal with that later
  • insert a visible marker like @@@ at each end of line
  • save the file as .html -> no more full HTML tags in the segments
  • put in /source/
  • go to Options > Segmentation and add 2 rules. One where you segment before @@@ and one where you segment after @@@, that way you'll nicely isolate the marker and it will be translated only once
  • load, translate

The resulting file should contain all the original tags, without modifications, but some characters in the original may have been converted to HTML references. Replace those with the original character if you think it is better.

Now, open your file in a text editor, remove the @@@ markers and paste the contents into a Write page in OpenOffice.

There, do a "Regular Expression" search for the string: (<[^>]*>)

The string means: a "<" followed by a number of anything but ">" followed by a ">": (basically any HTML tag). The surrounding parenthesis put the matching string into a memory for later retrieval!

and replace by "&" with the style "font color=red". "&" means "the group that was just put into memory".

All your HTML tags should be colored in red now.

Copy-paste the contents into the original file where it needs to be, and deliver !!! Also make sure that one line corresponds to one cell (manipulating the @@@ marker should not change the overall structure but one never knows!)

(There are probably easier ways to deal with such files. Let me know!)