Excel files with colored non translatables... (2)

Just to show you that I am not lying:

Here is one of the Excel cells that I have to translate:

<h1>This is ugly<br />ugly ugly ugly</h1>Ugly ugly ugly ugly ugly ugly:<br /><br /><a href='%%LINK_A%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly.<br /><a href='%%LINK_B%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly ugly.<br /><a href='%%LINK_C%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly ugly. <br /><a href='%%LINK_D%%' class='yellow'>Ugly ugly:</a> Ugly ugly ugly ugly ugly.!

Putting that straight into OmegaT as an ODS file:

  1. keeps the various styles that exist in the original Excel file (fonts/colors/etc)
  2. keeps the HTML codes as plain text.

So, in OmegaT, you end up with segments like this (I colored the formatting inherited from the Excel file in green, and the original HTML tags in red):

First segment:

<h1><f0>This is ugly</f0><f1><br /></f1><f2>ugly ugly ugly</f2><f3></h1></f3><f4>Ugly ugly ugly ugly ugly </f4><f5>ugly</f5><f6>:</f6><f7><br /><br /><a href='%%LINK_A%%' class='yellow'></f7><f8>Ugly </f8><f9>ugly:</f9><f10></a></f10><f11> </f11><f12><s13/>Ugly ugly ugly ugly</f12><f14>.</f14><f15><br /><a href='%%LINK_B%%' </f15><f16>class='yellow'></f16><f17>Ugly ugly:</f17><f18></a> </f18><f19><s20/>Ugly ugly ugly ugly ugly</f19><f21>.</f21><f22><br /><a </f22><f23>href='%%LINK_C%%' class='yellow'></f23><f24>Ugly ugly:</f24><f25></a> </f25><f26>Ugly </f26><f27><s28/>ugly ugly </f27><f29>ugly ugly</f29><f30>.

Second segment:

</f30><f31><br /><a href='%%LINK_D%%' class='yellow'></f31><f32>Ugly </f32><f33>ugly</f33><f34>:</f34><f35></a> </f35><f36>U</f36><f37>gly ugly ugly ugly ugly</f37><f38>.</f38>

You see?

All the original HTML is here, and all the original formatting is applied to each cell part separately...

This is obviously not the solution. For one thing, the inherited formatting (all the green tags) is totally irrelevant as far as the translation is concerned. For another, the HTML tags should be handled as HTML rather than as plain text, both to make the translatable text easier to see and to reduce the risk of accidentally modifying a tag while you translate the segment.
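One way to get the cell content treated as HTML is to write it out as a small HTML page before opening it in OmegaT. The following is only a minimal sketch, and not necessarily the exact procedure from my earlier post: the function name `cells_to_html` is hypothetical, and it assumes the cells have already been exported from the spreadsheet as (reference, text) pairs. Each cell is wrapped in its own `<div>`, so every cell starts its own block.

```python
from html import escape


def cells_to_html(cells):
    """Wrap each cell's raw HTML in its own <div> inside a minimal HTML page.

    `cells` is an iterable of (reference, text) pairs, e.g. ("A1", "<h1>...</h1>").
    How the pairs are exported from the spreadsheet is left out (assumption).
    """
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8"><title>cells</title></head>',
        "<body>",
    ]
    for ref, text in cells:
        # The cell reference is escaped because it goes into an attribute.
        # The cell text is inserted verbatim: it already is HTML markup.
        parts.append('<div id="{}">{}</div>'.format(escape(ref, quote=True), text))
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = [("A1", "<h1>This is ugly<br />ugly ugly ugly</h1>Ugly ugly ugly:")]
    print(cells_to_html(sample))
```

The cell reference kept in the `id` attribute is there so that the translated text can be matched back to its cell afterwards; putting the translation back into the spreadsheet is not covered by this sketch.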

Now, if you follow my earlier advice, the above cell would look like this in OmegaT (with blue color for emphasis), after being handled as a "normal" HTML file:

First segment:

This is ugly<br0/>ugly ugly ugly

(Notice that the <h1> and </h1> tags are not shown: they are block-level tags, and the whole block is the segment. You have just reduced the total number of tags in the segment!)

Second segment:

Ugly ugly ugly ugly ugly ugly:<br0/><br1/><a2>Ugly ugly:</a2> Ugly ugly ugly ugly.<br3/><a4>Ugly ugly:</a4> Ugly ugly ugly ugly ugly.<br5/><a6>Ugly ugly:</a6> Ugly ugly ugly ugly ugly.

Third segment:

<br7/><a8>Ugly ugly:</a8> Ugly ugly ugly ugly ugly.!

Much better, isn't it? Now, how much time and energy do you think this trick will save you the next time you have such a (not so uncommon) file to translate?
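To put a rough number on the difference, here is a minimal sketch that counts the numbered OmegaT tags in the last segment of each version, copied from the examples above. The helper name `count_tags` and the regular expression are my own assumptions about what counts as a tag, not something OmegaT provides.

```python
import re

# Matches numbered placeholders such as <f31>, </f31>, <s13/> or <br7/>.
# Caveat: a literal HTML tag whose name ends in a digit (e.g. <h1>) would
# match too; the sample strings below contain none.
OMEGAT_TAG = re.compile(r"</?[a-z]+\d+/?>")


def count_tags(segment):
    """Return the number of numbered OmegaT tags found in a segment."""
    return len(OMEGAT_TAG.findall(segment))


# Last segment when the file is opened directly as ODS.
ods_segment = (
    "</f30><f31><br /><a href='%%LINK_D%%' class='yellow'></f31>"
    "<f32>Ugly </f32><f33>ugly</f33><f34>:</f34><f35></a> </f35>"
    "<f36>U</f36><f37>gly ugly ugly ugly ugly</f37><f38>.</f38>"
)

# The same content when the cell is handled as HTML.
html_segment = "<br7/><a8>Ugly ugly:</a8> Ugly ugly ugly ugly ugly.!"

print(count_tags(ods_segment))   # 17
print(count_tags(html_segment))  # 3
```

On top of its 17 formatting tags, the ODS segment also carries three HTML tags (`<br />`, `<a ...>` and `</a>`) as editable plain text, which the count above does not include.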

ps: I did try to do all the heavy editing in Emacs, but I fear I am not yet familiar enough with its regex syntax. I eventually had to fall back on TextWrangler, but I am not giving up...

ps2: If you know a CAT tool that does not require any manipulation to reach a manageable number of tags, let me know!
