Website localization in PHP episode 1: Translate strings
Presenting a website in multiple languages is a serious problem, even when using dynamic server-side technologies. In this article series, we'll try to work out how to get past the most common troubles. We'll start with string translation, using a PEAR package called Translation2.
Before doing that, allow me a short premise.
Back in the old days, we would copy our site structure once for every language we had to support, dig into the pages and make the necessary changes.
This worked well enough for static content, but had its downsides:
- Updating content was a problem, since every change had to be repeated in every copy of the site
- If you wanted to send the text to a translation service, you had to pull it from the pages yourself
- When the text came back from the translator, you had to work again to insert it into the pages
- There was no standard way to format the text, and many translation services don't understand HTML, so you had to exchange Word documents and manually convert the text back into HTML to keep the code clean
- Foreign character sets caused trouble: you had to translate the HTML entities or convert between encodings
- The first page of your site was nothing more than a list of languages and flags, which meant the first and most important page of the whole site had no content
- If you instead guessed which language to use for the first page, users of every other language had to switch manually.
- If you had placed certain elements inside the text, you had to insert them again by hand whenever you put in new content.
- Translators without basic design skills had no way to make changes themselves without asking the webmaster.
- If they did claim such skills, you had to trust them enough to give them access to the site's source files and hope they would not break anything.
The list could go on forever. And guess what? Dynamic websites are even worse! Most programmers build websites by pulling data from an archive (text files, a database connection, a service on the Intertubes) and putting this material into template pages. When preparing or upgrading a site to be multilingual, you need a way to pull different data from the database for each language, and then you have to figure out how to translate the text inside the templates.
There are many ways to solve these problems.
You could manually change all the tables to hold the data for every language, then prepare a very big array holding the strings you put into the templates, keep one copy of this array per language in separate files, and include the right file when the user switches language.
And this would only be the beginning, because you would then have to keep all these arrays in sync. If you wanted to edit this data from a control panel, you would have to move all the strings into a database. You would also need to manage language fallbacks (what if a string is not translated into a given language?) and caching, and write the procedures for querying, inserting and modifying all this data.
Meet Translation2, the PEAR package that does all of this for us, and much more. After installing it from the PEAR repository, we initialize it with this code:
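As a point of comparison, here is a minimal sketch of that array-per-language approach. It assumes a hypothetical layout where each language file simply returns an array. The files are written to a temporary directory only so that the example runs on its own, and `load_strings()` is an illustrative helper rather than part of any library:

```php
<?php
// Sketch of the "one big array per language" approach. The file layout and
// load_strings() are hypothetical; the files are generated in a temporary
// directory only so the example is self-contained.
$dir = sys_get_temp_dir() . '/lang_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/en.php",
    "<?php return ['PLEASE_LOGIN' => 'Please insert your user name and password in the fields'];");
file_put_contents("$dir/it.php",
    "<?php return ['PLEASE_LOGIN' => 'Prego inserire il tuo nome utente e password'];");

// Include the array for the language the user selected.
function load_strings(string $dir, string $lang): array
{
    // basename() keeps a user-supplied language ID from escaping the directory.
    $file = $dir . '/' . basename($lang) . '.php';
    if (!is_file($file)) {
        return [];
    }
    return include $file; // each language file returns its array
}

$strings = load_strings($dir, 'it');
echo $strings['PLEASE_LOGIN'], "\n";
```

Every new string has to be added by hand to each of these files, and that is where the trouble described next begins.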
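Even the first chore, keeping the arrays in sync, needs its own tooling. One possible form is a small hypothetical helper (not part of Translation2 or any other library) that reports which string IDs each language is missing compared with a reference language:

```php
<?php
// Hypothetical sync checker: for every language other than the reference one,
// list the string IDs present in the reference array but missing here.
function missing_keys(array $byLang, string $reference): array
{
    $missing = [];
    foreach ($byLang as $lang => $strings) {
        if ($lang === $reference) {
            continue;
        }
        // array_diff_key() keeps the reference entries whose keys $strings lacks.
        $diff = array_keys(array_diff_key($byLang[$reference], $strings));
        if ($diff) {
            $missing[$lang] = $diff;
        }
    }
    return $missing;
}

$byLang = [
    'en' => ['PLEASE_LOGIN' => 'Please insert your user name and password in the fields',
             'LOGOUT'       => 'Log out'],
    'it' => ['PLEASE_LOGIN' => 'Prego inserire il tuo nome utente e password'],
];

print_r(missing_keys($byLang, 'en')); // 'it' is missing LOGOUT
```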
require_once 'Translation2.php';
// $db is an existing MDB2 connection object (see below)
$tr =& Translation2::factory('mdb2', $db, array(
    // table listing the available languages
    'langs_avail_table'     => 'translation_langs',
    'lang_id_col'           => 'id',
    'lang_name_col'         => 'name',
    'lang_meta_col'         => 'meta',
    'lang_errmsg_col'       => 'error_text',
    'lang_encoding_col'     => 'encoding',
    // table holding the strings, one text column per language
    'strings_default_table' => 'translation_strings',
    'string_id_col'         => 'ID',
    'string_page_id_col'    => 'page_id',
    'string_text_col'       => '%s', // '%s' is replaced by the language ID
    'autoCleanCache'        => true
));
In this example Translation2 uses MDB2 as its storage backend, so you either create an MDB2 connection or reuse an existing one by passing in the database object. Since I presume you already have a database connection, I'll skip this part. Let's assume the database object is called $db.
Next we pass an array of parameters specifying the gory details of the tables that hold our translation data. The example uses two tables: translation_langs (which lists all languages) and translation_strings (which contains all the text).
The table that holds the languages needs a column for the language ID (for example 'it', 'en' or 'fr'), the name ('English', 'Español', 'Italiano', 'Français'), the meta information (an extended name such as 'American English', usually matching the name), the error text (shown when a string is not translated, for example 'not available', 'non disponibile', 'non traduit') and the encoding ('iso-8859-1', 'utf-8' and so on). You can name these columns and tables however you want; you just need to pass the names in the parameters array.
The strings table, instead, has an ID column (I store an MD5 hash of the string to be translated there, but any text will do), a page ID column (so you can separate and query the strings of different pages), a text column for each language ('%s' in the example means the columns are named after the language ID) and any other columns you wish to use.
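To make the table layout concrete, the sketch below models the strings table as a plain PHP array and shows how the `'%s'` pattern turns a language ID into a column name. The `lookup()` function is a hypothetical illustration of the layout only. It is not how Translation2 queries the database internally:

```php
<?php
// Hypothetical in-memory model of the translation_strings table; the column
// names follow the parameters array passed to Translation2::factory().
$params = ['string_text_col' => '%s'];

// The ID can be any text; here it is an MD5 hash of the English source string.
$source = 'Please insert your user name and password in the fields';
$id = md5($source);

$rows = [
    [
        'ID'      => $id,
        'page_id' => 'LOGIN_PAGE',
        'en'      => $source,
        'it'      => 'Prego inserire il tuo nome utente e password',
    ],
];

// '%s' is replaced by the language ID to obtain the text column name.
function lookup(array $rows, array $params, string $id, string $pageId, string $lang): ?string
{
    $col = sprintf($params['string_text_col'], $lang);
    foreach ($rows as $row) {
        if ($row['ID'] === $id && $row['page_id'] === $pageId) {
            return $row[$col] ?? null; // null if there is no column for that language
        }
    }
    return null;
}

echo lookup($rows, $params, $id, 'LOGIN_PAGE', 'it'), "\n";
```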
To set the language ID and page ID, simply issue these two commands:
$tr->setLang("it");
$tr->setPageID("LOGIN_PAGE");
Next you can initialize the so-called decorators, which add extra features to the translation system. The most useful ones are the cache decorator (which uses the Cache_Lite PEAR package) and the language fallback decorator (which provides fallback strings in other languages).
Here is how to initialize the cache decorator:
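The decorator idea itself is a general design pattern. Each decorator exposes the same methods as the object it wraps, forwards calls to it, and adds one feature on top. The generic sketch below uses hypothetical names (`Translator`, `ArrayTranslator`, `DefaultTextDecorator`) that do not mirror Translation2's actual class hierarchy:

```php
<?php
// Generic decorator sketch: every layer has the same get() method, and each
// decorator wraps another translator to add one feature. All names here are
// hypothetical, not Translation2's classes.
interface Translator
{
    public function get(string $id): ?string;
}

// Innermost layer: looks strings up in an array (standing in for the database).
final class ArrayTranslator implements Translator
{
    private $strings;

    public function __construct(array $strings)
    {
        $this->strings = $strings;
    }

    public function get(string $id): ?string
    {
        return $this->strings[$id] ?? null;
    }
}

// Decorator: delegates to the wrapped translator, then adds a default text.
final class DefaultTextDecorator implements Translator
{
    private $inner;
    private $default;

    public function __construct(Translator $inner, string $default)
    {
        $this->inner = $inner;
        $this->default = $default;
    }

    public function get(string $id): ?string
    {
        return $this->inner->get($id) ?? $this->default;
    }
}

// Wrapping and reassigning $tr mirrors the $tr =& $tr->getDecorator(...) idiom.
$tr = new ArrayTranslator(['PLEASE_LOGIN' => 'Prego inserire il tuo nome utente e password']);
$tr = new DefaultTextDecorator($tr, 'non disponibile');

echo $tr->get('PLEASE_LOGIN'), "\n";
echo $tr->get('LOGOUT'), "\n"; // non disponibile
```

Because every layer keeps the same interface, the calling code does not need to know how many decorators sit between it and the data.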
require_once "Cache/Lite.php";
$tr =& $tr->getDecorator('CacheLiteFunction');
$tr->setOption('cacheDir', '/cache/lang/'); // directory for the cache files
$tr->setOption('lifeTime', 3600);           // keep strings cached for 3600 seconds
getDecorator() wraps the translation object in the requested decorator. As you can see, we had to require the Cache_Lite file first. Then we set the options: the directory to use for caching and how long (in seconds) strings stay cached. There are other options, so feel free to explore them.
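To see what those two options control, consider this minimal sketch of a file-based function-result cache driven by a cache directory and a lifetime in seconds. `cached_call()` is a hypothetical helper written for this illustration. It is not Cache_Lite's implementation:

```php
<?php
// Minimal function-result cache: results are stored in files under $cacheDir
// and reused while younger than $lifeTime seconds. Hypothetical helper, not
// Cache_Lite's code.
function cached_call(string $cacheDir, int $lifeTime, string $name, callable $fn, array $args)
{
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0777, true);
    }
    // One cache file per function name and argument list.
    $file = $cacheDir . '/' . md5($name . serialize($args));
    if (is_file($file) && filemtime($file) + $lifeTime > time()) {
        return unserialize(file_get_contents($file)); // still fresh: skip the call
    }
    $result = $fn(...$args);
    file_put_contents($file, serialize($result));
    return $result;
}

// Usage: the second call within the lifetime is served from the cache file.
$calls = 0;
$lookup = function (string $id, string $lang) use (&$calls) {
    $calls++; // counts real, uncached lookups
    return "$id@$lang";
};
$dir = sys_get_temp_dir() . '/tr_cache_' . uniqid();
$first = cached_call($dir, 3600, 'lookup', $lookup, ['PLEASE_LOGIN', 'it']);
$second = cached_call($dir, 3600, 'lookup', $lookup, ['PLEASE_LOGIN', 'it']);
echo "$first $second $calls\n"; // PLEASE_LOGIN@it PLEASE_LOGIN@it 1
```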
The fallback system is even easier:
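$tr =& $tr->getDecorator('Lang');
$tr->setOption('fallbackLang', 'en'); // first fallback language
$tr =& $tr->getDecorator('Lang');     // one more Lang decorator for the next fallback
$tr->setOption('fallbackLang', 'it');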
This means: "if you can't find the string in the chosen language, fall back to English; if there isn't an English string either, fall back to Italian". You can cascade as many languages as you wish. The fallback works at the string level, so you can have a fully translated website in which only a couple of strings default to another language.
To query the translations you use methods like:
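The string-level cascade can be sketched in plain PHP as follows. The function tries the chosen language first and then each fallback in order. `get_with_fallback()` and the sample strings are hypothetical, and treating an empty string as untranslated is an assumption of this sketch, not documented Translation2 behaviour:

```php
<?php
// String-level fallback: try each language in the chain, in order, and return
// the first non-empty translation. Hypothetical helper, not the Lang decorator.
function get_with_fallback(array $strings, string $id, array $langChain): ?string
{
    foreach ($langChain as $lang) {
        // Assumption: an empty string counts as "not translated".
        if (isset($strings[$lang][$id]) && $strings[$lang][$id] !== '') {
            return $strings[$lang][$id];
        }
    }
    return null; // untranslated in every language of the chain
}

$strings = [
    'fr' => ['WELCOME' => 'Bienvenue'],
    'en' => ['WELCOME' => 'Welcome',
             'PLEASE_LOGIN' => 'Please insert your user name and password in the fields'],
    'it' => ['WELCOME' => 'Benvenuto',
             'PLEASE_LOGIN' => 'Prego inserire il tuo nome utente e password',
             'LOGOUT' => 'Esci'],
];

$chain = ['fr', 'en', 'it']; // chosen language first, then the fallbacks
echo get_with_fallback($strings, 'WELCOME', $chain), "\n"; // Bienvenue
echo get_with_fallback($strings, 'LOGOUT', $chain), "\n";  // Esci
```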
$tr->get('PLEASE_LOGIN');
to get a translated string in the current language and page,
$tr->getRaw('PLEASE_LOGIN','LOGIN_PAGE');
to get the string for a specific page ID exactly as stored, bypassing the decorators,
$tr->getLangs();
to enumerate all languages, and so on.
At this point you can insert data into the tables yourself, or use the handy administration API that Translation2 exposes. Simply initialize it (the parameters array is the same):
require_once 'Translation2/Admin.php';
$tr_admin =& Translation2_Admin::factory([...continue the same way as the Translation2 constructor...]);
Then you can issue commands like:
$tr_admin->cleanCache();
to clean the cache
$tr_admin->add('PLEASE_LOGIN', 'LOGIN_PAGE', array('en' => 'Please insert your user name and password in the fields', 'it' => 'Prego inserire il tuo nome utente e password'));
to insert a new string into the database.
The API lets you easily add, modify and remove both languages and strings.
If you prefer, Translation2 can even use an XML file or GNU gettext files (.po sources or compiled .mo files) as its data source. Decorators can also be used to convert between encodings.
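Encoding conversion boils down to re-encoding each string before output. For example, text stored as ISO-8859-1 (a value that could come from the encoding column of the languages table) can be served on a UTF-8 page. The sketch calls PHP's iconv extension directly and does not use Translation2's decorator:

```php
<?php
// Convert a string stored in ISO-8859-1 to UTF-8 with PHP's iconv extension.
// This only illustrates the conversion step; it is not a Translation2 API.
$latin1 = "Fran\xE7ais"; // "Français" in ISO-8859-1: ç is the single byte 0xE7
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo bin2hex($utf8), "\n"; // 4672616ec3a7616973 (ç became the two bytes c3 a7)
```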
That's all, folks!
I hope you liked the article. Subscribe to the RSS feed and stay tuned for the next episode of Website localization in PHP: language detection.
I would also be happy to answer any questions you may have about the Translation2 package. In particular, I've tightly integrated it with a Markdown parser (a markup language similar to phpBB's BBCode) and with some plugins that provide real-time translations in Smarty templates.