Localized Applications

The world doesn’t just speak English.

Writing localized applications with 8th

A persistent problem in writing applications for the global marketplace is “localization” (also known by the shorter moniker “L10N”). There are a number of issues relating to it, among them:

  • Proper representation of text (Chinese vs. Russian vs. English, for example)
  • Correct display of the text (font issues)
  • Choosing the text to display (“hello” vs “hola”)
  • Ordering the text correctly (“big house” vs “casa grande”)
  • Date formatting (“Sunday” vs “domingo”)
  • Numeric separators (“,” vs “.”)

These are the principal issues that programmers of multilingual applications need to deal with, and many solutions of varying complexity are in use today.

The 8th solution

More properly: “solutions”, since there is more than one issue.

Proper text representation

Text in 8th is stored in the UTF-8 encoding, which can represent any Unicode character. The text itself is therefore always represented correctly, even when other issues (e.g. font problems) make it appear incorrect on screen.

To make the programmer’s task easier, 8th not only lets you enter any UTF-8 text directly, it also lets you use special “escapes” in your text to enter characters that are hard to type. Thus, for example, this string: "qu\u00e9" results in this: qué.

The words (i.e. “functions”) 8th provides for string manipulation are also UTF-8 aware, which means you don’t have to worry about creating an invalid bit of UTF-8 encoded text (unless you deliberately do so).
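
To illustrate what “UTF-8 aware” means in practice, here is a minimal sketch in Python (not 8th; the helper names are hypothetical and chosen only for this example). It shows that an escape produces the same character as typing it directly, that one character may occupy more than one byte, and that working in whole characters never splits a multi-byte sequence.

```python
# Python illustration only; this is not 8th code.

# The escape \u00e9 denotes the single character U+00E9 (e with acute accent).
escaped = "qu\u00e9"

# Encoded as UTF-8, that character occupies two bytes, the others one each.
encoded = escaped.encode("utf-8")


def char_count(text: str) -> int:
    """Count characters (code points), not bytes."""
    return len(text)


def first_chars(text: str, n: int) -> str:
    """Return the first n characters.

    Slicing by character cannot cut a multi-byte sequence in half,
    so the result always encodes to valid UTF-8.
    """
    return text[:n]
```

The byte length and the character length differ for this string, which is exactly the distinction a UTF-8-aware string library hides from the programmer.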

Correct text display

In general this can only be solved by ensuring the font used to display the text actually contains glyphs for the characters being displayed. There are very few fonts which contain all glyphs, as that is wasteful of system resources as well as difficult to make esthetically pleasing.

The 8th solution is to permit embedded fonts. In this manner you can have complete control over which fonts are used with which text, and ensure you can properly display the text.

Choosing the text

Many and varied solutions exist for the problem of choosing which text to display based on the choice of language. In 8th we’ve adopted a very simple approach: user-defined “assets” which are language-specific.

The programmer needs to define a separate asset for each supported language (e.g. “lang/de” for German, “lang/es” for Spanish and so forth). Among the contents of each asset is a JSON object whose keys are the strings to translate and whose values are the corresponding localized strings. For example:

{
  "hi" : "hola!",
  "bye" : "hasta luego"
} s:intl!

The word s:intl! tells 8th that the object just defined is to be used for translating strings in the current language. The 8th distribution contains samples showing how this fits together, but essentially translation is as simple as:

"hi" s:intl	

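The lookup idea itself is independent of 8th. As a rough sketch in Python (not 8th; the Translator class and its methods are hypothetical, and returning the key unchanged when no translation exists is an assumption of this sketch rather than documented 8th behavior):

```python
# Python illustration only; this is not 8th code.

# Table taken from the Spanish example above.
SPANISH = {
    "hi": "hola!",
    "bye": "hasta luego",
}


class Translator:
    """Holds the translation table for the current language."""

    def __init__(self) -> None:
        self._table: dict[str, str] = {}

    def load(self, table: dict[str, str]) -> None:
        """Install a table for the current language (the role s:intl! plays)."""
        self._table = dict(table)

    def translate(self, key: str) -> str:
        """Look up a string (the role s:intl plays).

        Assumption of this sketch: unknown keys are returned unchanged.
        """
        return self._table.get(key, key)


translator = Translator()
translator.load(SPANISH)
greeting = translator.translate("hi")
```

Keeping the original string as the key means the program remains readable in its source language, and a missing translation degrades to untranslated text rather than an error.
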
Correct text ordering

Because different languages order words differently (for example, adjective-noun in English versus noun-adjective in Spanish), 8th provides an easy way to produce the proper text order: templated substitution. A simple example shows what we mean by that:

"%adj% %noun%"
{
  "adj" : "Big",
  "noun" : "house"
}
s:tsub

The string at the beginning is a “template”, and the object following it contains the replacements for the template parameters. When the “s:tsub” word is invoked, it replaces the parameters in the template string with the values from the object, which in this case yields “Big house”. Combining this with the language-specific text selection means you can have a different template per language (if necessary) and ensure the text is presented in the correct order for that language.
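
The per-language template idea can be sketched in Python as follows (not 8th; the tsub function here is a hypothetical stand-in written for this example, and leaving unknown parameters untouched is an assumption of the sketch). The same substitution routine produces the right word order simply because each language supplies its own template.

```python
# Python illustration only; this is not 8th code.
import re


def tsub(template: str, values: dict[str, str]) -> str:
    """Replace each %name% in the template with values[name].

    Assumption of this sketch: parameters with no matching key
    are left in the output unchanged.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"%(\w+)%", replace, template)


# English puts the adjective first; Spanish puts the noun first.
english = tsub("%adj% %noun%", {"adj": "big", "noun": "house"})
spanish = tsub("%noun% %adj%", {"adj": "grande", "noun": "casa"})
```

Only the template and the replacement values change between languages; the calling code stays the same.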

Date formatting

8th includes the ability to format dates in a localized manner. You provide the correct strings for the short and long names of the days of the week and the months, typically in your language-specific asset. You then supply the date-formatting string that presents the date in the order you want (Y/M/D, D/M/Y or whatever is desired).
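
As a rough illustration of the approach (localized names supplied by the application, plus a format string chosen per language), here is a Python sketch. It is not 8th code: the format_date function, the dictionary layout, and the pattern syntax are all hypothetical and chosen only for this example.

```python
# Python illustration only; this is not 8th code.
from datetime import date

# Names the application would keep in its Spanish language asset.
# Python's date.weekday() numbers Monday as 0, so the list starts on Monday.
SPANISH_NAMES = {
    "days_long": ["lunes", "martes", "miércoles", "jueves",
                  "viernes", "sábado", "domingo"],
    "months_long": ["enero", "febrero", "marzo", "abril", "mayo", "junio",
                    "julio", "agosto", "septiembre", "octubre",
                    "noviembre", "diciembre"],
}


def format_date(d: date, names: dict[str, list[str]], pattern: str) -> str:
    """Fill a per-language pattern with numeric fields and localized names."""
    fields = {
        "day_name": names["days_long"][d.weekday()],
        "month_name": names["months_long"][d.month - 1],
        "day": d.day,
        "month": d.month,
        "year": d.year,
    }
    return pattern.format(**fields)


sample = date(2024, 3, 10)
long_form = format_date(sample, SPANISH_NAMES,
                        "{day_name}, {day} de {month_name} de {year}")
short_form = format_date(sample, SPANISH_NAMES, "{day:02d}/{month:02d}/{year}")
```

Because both the names and the pattern come from the application's own data, the result does not depend on which locales the operating system happens to have installed.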

Numeric separator

In addition to all the above, 8th provides an easy way to specify that the thousands separator should be something other than “,” and that the decimal separator should be something other than “.”, so that a number can be shown as “1.234,56” rather than “1,234.56”.
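
The effect of configurable separators can be sketched in Python like this (not 8th; the format_number function and its parameters are hypothetical and exist only for this example):

```python
# Python illustration only; this is not 8th code.

def format_number(value: float, thousands: str = ",",
                  decimal: str = ".", places: int = 2) -> str:
    """Format a number with caller-chosen thousands and decimal separators."""
    # Start from the conventional form, e.g. "1,234,567.89".
    text = f"{value:,.{places}f}"
    # Swap through a placeholder so the two replacements cannot clobber
    # each other when the separators are simply exchanged.
    placeholder = "\x00"
    return (text.replace(",", placeholder)
                .replace(".", decimal)
                .replace(placeholder, thousands))


english_style = format_number(1234567.891)
german_style = format_number(1234567.891, thousands=".", decimal=",")
```

The number itself is unchanged; only its textual presentation follows the conventions of the target language.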

Conclusion

8th lets you produce localized applications with a minimum of fuss, and without having to rely on the underlying operating system to supply the localization information. This helps ensure your application looks correct and behaves correctly across multiple platforms and multiple languages.