Gettext

Gettext is the de facto standard for internationalization (I18N) and localization (L10N). It provides a set of tools and a framework that help other packages produce multilingual messages. It prescribes how programs should be written to support translated message strings, along with a directory and file naming convention for the messages that need to be translated.

As for directory conventions, we need a place to put the translations for each locale. For example, let’s say we need to support two languages, English and Greek. Their language codes are en and el respectively.

We can create a directory named locales containing one directory per language code. Each of those contains a directory named LC_MESSAGES, which holds one or more .po files.

So, the file structure should look like this:

locales/
├── el
│   └── LC_MESSAGES
│       └── base.po
└── en
    └── LC_MESSAGES
        └── base.po
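
As a rough sketch of the convention, the path of a catalog can be derived from three pieces: the locale directory, the language code and the domain. The helper name catalog_path is hypothetical; the locales, el, en and base names are simply the ones used in this example.

```python
from pathlib import Path


def catalog_path(localedir, language, domain, suffix=".po"):
    """Return the conventional location of a catalog file.

    The layout is <localedir>/<language>/LC_MESSAGES/<domain><suffix>.
    """
    return Path(localedir) / language / "LC_MESSAGES" / f"{domain}{suffix}"


# The two catalogs from the tree above.
paths = [catalog_path("locales", language, "base") for language in ("el", "en")]
```

Passing suffix=".mo" gives the location of the compiled catalog for the same language and domain.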

A PO file contains a number of messages, partly independent text segments to be translated, grouped into one file according to some logical division of what is being translated. These groups are called domains. In the example above we have only one domain, named base. PO files are also called message catalogs. The PO format is plain text.

Apart from PO files, you will also encounter .mo files. MO, or Machine Object, is the compiled binary form of a PO file. It is the file that gettext actually loads at runtime to look up translations, and it is generated from the PO file rather than edited by hand.

In addition, there are .pot files. These are the templates for PO files: they contain only the original strings, with all the translations left empty. In practice the .pot files are generated by tools and should not be modified directly.

Usage

The gettext module ships with Python. It exposes two APIs. The first is the basic API, which mirrors the GNU gettext catalog API. The second is a higher-level, class-based API that may be more appropriate for Python files. The class-based API offers more flexibility and greater convenience than the GNU gettext API, and it is the recommended way of localizing your Python applications and modules.

To provide multilingual messages for your Python programs, you need to take the following steps:

Mark all translatable strings in your program with a wrapper function.
Run a suite of tools over your marked files to generate raw messages catalogs or POT files.
Duplicate the POT files into specific locale folders and write the translations.
Import and use the gettext module so that message strings are properly translated.

Let’s start with a function that prints some strings.

# main.py
def print_some_strings():
    print("Hello world")
    print("This is a translatable string")

if __name__ == '__main__':
    print_some_strings()

As it stands, this program cannot be localized with gettext.

The first step is to specially mark all translatable strings in the program. To do that we need to wrap all the translatable strings inside _().

# main.py
import gettext
_ = gettext.gettext
def print_some_strings():
    print(_("Hello world"))
    print(_("This is a translatable string"))
if __name__=='__main__':
    print_some_strings()

Notice that we imported gettext and assigned _ to gettext.gettext. This defines _ so that the program still runs; as long as no catalog is loaded, it returns each string unchanged.

If you run the program, you will see that nothing has changed:

$: python main.py
Hello world
This is a translatable string

However, we can now proceed to the next step, which is extracting the translatable messages into a POT file.

Create the POT files

To automate the process of generating raw translatable messages from wrapped strings throughout an application, the gettext authors provide a set of tools that parse the source files and extract the messages into a general message catalog.

The Python distribution includes two related scripts, pygettext.py and msgfmt.py (they live in Tools/i18n of the source distribution, and some installations put them on the PATH). pygettext.py extracts messages and understands only Python source code, not other languages. msgfmt.py compiles PO files into MO files.

Call pygettext, specifying the file you want to extract the strings from:

$: pygettext -d base -o locales/base.pot src/main.py

If you want to extract strings wrapped in a function other than _, use the -k flag, for example -k gettext.

That will generate a base.pot file in the locales directory taken from our main.py program. Remember that POT files are just templates and we should not touch them. Let us inspect the contents of the base.pot file:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2018-01-28 16:47+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"
#: src/main.py:5
msgid "Hello world"
msgstr ""
#: src/main.py:6
msgid "This is a translatable string"
msgstr ""

In a bigger program, many more translatable strings would follow. Here we specified a single domain called base because the application is only one file. In bigger ones, I would use multiple domains to logically separate the messages by application scope.

Notice that we have a simple convention for our translatable strings. msgid is the original string wrapped in _(). msgstr is the translation we need to provide.
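
The msgid/msgstr pairing is simple enough to read with a few lines of Python. The following is only a sketch for inspecting a catalog, for example to list strings that still lack a translation. The names LINE, read_entries and SAMPLE are hypothetical, and the parser deliberately ignores escapes, plural forms and message contexts, so it is not a substitute for the real gettext tools.

```python
import re

# A keyword line (msgid/msgstr) or a bare quoted continuation line.
LINE = re.compile(r'^(msgid|msgstr)?\s*"(.*)"$')


def read_entries(po_text):
    """Return {msgid: msgstr} for the plain entries of a PO file."""
    pairs = []    # each item is [msgid, msgstr]
    field = None  # 0 or 1: which part a continuation line extends
    for line in po_text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            field = None  # comments and unsupported keywords end the field
            continue
        keyword, text = match.groups()
        if keyword == "msgid":
            pairs.append([text, ""])
            field = 0
        elif keyword == "msgstr" and pairs:
            pairs[-1][1] = text
            field = 1
        elif keyword is None and field is not None and pairs:
            pairs[-1][field] += text
    return dict(pairs)


SAMPLE = r'''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
#: src/main.py:5
msgid "Hello world"
msgstr "Χαίρε Κόσμε"
#: src/main.py:6
msgid "This is a translatable string"
msgstr ""
'''

entries = read_entries(SAMPLE)
# The empty msgid is the header entry, so it is skipped here.
untranslated = [msgid for msgid, msgstr in entries.items() if msgid and not msgstr]
```

Here untranslated ends up holding the one msgid whose msgstr is still empty.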

Create the PO files

Now we are ready to create our translations. With the template generated for us, the next step is to create the required directory structure and copy the template into the right spot. We’ve seen the recommended file structure before: we are going to create two directories inside locales, following the structure locales/$language/LC_MESSAGES/$domain.po

Where:

$language is the language identifier such as en or el
$domain is base.

Copy the base.pot template to locales/en/LC_MESSAGES/base.po and locales/el/LC_MESSAGES/base.po. Then modify their headers to include more information about the locale. For example, this is the Greek translation:
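
If you have more than a couple of languages, the copy step can be scripted. This is a minimal sketch under the layout described above. The function name start_catalogs is hypothetical, and it refuses to overwrite an existing PO file so that translations already written are never lost.

```python
import shutil
from pathlib import Path


def start_catalogs(template, localedir, languages):
    """Copy a POT template to <localedir>/<language>/LC_MESSAGES/<domain>.po.

    The domain is taken from the template's file name (base.pot -> base.po).
    Returns the list of PO files that were actually created.
    """
    template = Path(template)
    created = []
    for language in languages:
        target = Path(localedir) / language / "LC_MESSAGES" / f"{template.stem}.po"
        if target.exists():
            continue  # keep existing translations untouched
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(template, target)
        created.append(target)
    return created
```

For this example the call would be start_catalogs("locales/base.pot", "locales", ["en", "el"]).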

# My App.
# Copyright (C) 2018
#
msgid ""
msgstr ""
"Project-Id-Version: 1.0\n"
"POT-Creation-Date: 2018-01-28 16:47+0000\n"
"PO-Revision-Date: 2018-01-28 16:48+0000\n"
"Last-Translator: me <johndoe@example.com>\n"
"Language-Team: Greek <yourteam@example.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"
#: src/main.py:5
msgid "Hello world"
msgstr "Χαίρε Κόσμε"
#: src/main.py:6
msgid "This is a translatable string"
msgstr "Αυτό είναι ένα μεταφραζόμενο κείμενο"

Updating POT and PO files

Once you add or change strings in your program, run pygettext again to regenerate the template file:

pygettext -d base -o locales/base.pot src/main.py

Then update each translation file to match the new template with msgmerge (this also reorders the strings to follow the template):

msgmerge --previous --update locales/el/LC_MESSAGES/base.po locales/base.pot

Create the MO files

The catalog is built from the .po file using a tool called msgfmt.py (GNU gettext ships an equivalent msgfmt command). This tool parses the .po file and generates an equivalent .mo file. Run it inside each LC_MESSAGES directory:

$: msgfmt -o base.mo base.po

This command generates a base.mo file in the same folder as the base.po file.

So, the final file structure should look like this:

locales
├── el
│   └── LC_MESSAGES
│       ├── base.mo
│       └── base.po
├── en
│   └── LC_MESSAGES
│       ├── base.mo
│       └── base.po
└── base.pot
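
To give an idea of what the compiled file contains, the sketch below writes a tiny MO catalog by hand and loads it back with gettext.GNUTranslations. It is an illustration of the binary layout only, not a replacement for msgfmt: it takes a ready-made dictionary instead of parsing a PO file, and it ignores plural forms and contexts. The function name build_mo is hypothetical.

```python
import gettext
import io
import struct


def build_mo(messages, charset="UTF-8"):
    """Serialize {msgid: msgstr} into the bytes of a little-endian MO file."""
    # The entry with the empty msgid is the header; its charset tells the
    # loader how to decode every other string.
    catalog = {"": f"Content-Type: text/plain; charset={charset}\n"}
    catalog.update(messages)
    entries = sorted(
        (key.encode(charset), value.encode(charset))
        for key, value in catalog.items()
    )

    count = len(entries)
    header_size = 7 * 4                      # seven 32-bit header fields
    ids_start = header_size + 2 * 8 * count  # strings follow the two indexes
    ids = b"".join(key + b"\0" for key, value in entries)
    strs = b"".join(value + b"\0" for key, value in entries)

    # Each index row is (length, offset) of one NUL-terminated string.
    id_index, str_index = [], []
    id_offset, str_offset = ids_start, ids_start + len(ids)
    for key, value in entries:
        id_index += [len(key), id_offset]
        str_index += [len(value), str_offset]
        id_offset += len(key) + 1
        str_offset += len(value) + 1

    header = struct.pack(
        "<7I",
        0x950412DE,               # magic number
        0,                        # format revision
        count,                    # number of entries
        header_size,              # offset of the msgid index
        header_size + 8 * count,  # offset of the msgstr index
        0, 0,                     # no hash table
    )
    index = struct.pack(f"<{4 * count}I", *id_index, *str_index)
    return header + index + ids + strs


mo_bytes = build_mo({"Hello world": "Χαίρε Κόσμε"})
catalog = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(catalog.gettext("Hello world"))
```

Strings that are not in the catalog come back unchanged, which is the same behaviour you get from a catalog produced by msgfmt.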

Switching Locale

To be able to switch locales in our program we need to use the class-based gettext API. One of its functions is gettext.translation, which accepts parameters used to load the .mo file of a particular language. If no .mo file is found, it raises an OSError unless you pass fallback=True, in which case it returns a NullTranslations instance that leaves the strings untranslated.

Add the following code to the program:

import gettext
el = gettext.translation('base', localedir='locales', languages=['el'])
el.install()
_ = el.gettext # Greek

The first argument, base, is the domain; the function looks for a .mo file with that name in our locale directory. The domain is required here; the default messages domain applies only to the GNU-style API. The localedir parameter is the location of the locales directory you created. The languages parameter is the list of language codes to try, in order; if it is omitted, the languages are taken from the environment variables described below.

If you run the program again you will see the translations happening:

$ python main.py
Χαίρε Κόσμε
Αυτό είναι ένα μεταφραζόμενο κείμενο

The install method binds _ in the built-in namespace, so every module in the program can call _() and get the Greek strings without importing anything. The explicit assignment of _ to el.gettext does the same for the current module only. To go back to English, assign _ to the original gettext function. Note that this rebinds _ in the current module only; modules that rely on the installed built-in keep using Greek.

_ = gettext.gettext
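
One way to package the switch is a small helper that loads a catalog and installs it. This is a sketch rather than a prescribed pattern: the name set_language is hypothetical, and the base domain and locales directory are the ones from this example.

```python
import gettext


def set_language(language, domain="base", localedir="locales"):
    """Install the catalog for language so the built-in _() uses it.

    With fallback=True a missing .mo file yields a NullTranslations object,
    so _() returns the original string instead of raising OSError.
    """
    translation = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    translation.install()  # binds _ in the builtins namespace
    return translation
```

Calling set_language("el") and later set_language("en") switches every module that uses the built-in _. A module that assigns its own module-level _ shadows the built-in and is not affected.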

Finding Message Catalogs

When you need to locate translation files at runtime, you can use the find function from the class-based API. It takes a few parameters and returns the matching .mo files found on disk.

You can pass a domain, a localedir and a list of languages. If you omit localedir or languages, the module uses the respective defaults, which is usually not what you want. For example, if you don’t specify localedir, it falls back to the share/locale directory under the Python installation prefix, a global locale directory that can contain many unrelated files.

The language portion of the path is taken from one of several environment variables that can be used to configure localization features (LANGUAGE, LC_ALL, LC_MESSAGES, and LANG). The first variable found to be set is used. Multiple languages can be selected by separating the values with a colon :.

>>> import gettext, os
>>> os.environ['LANGUAGE'] = 'el:en'
>>> gettext.find('base', 'locales')
'locales/el/LC_MESSAGES/base.mo'
>>> gettext.find('base', 'locales', all=True)
['locales/el/LC_MESSAGES/base.mo', 'locales/en/LC_MESSAGES/base.mo']
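
If you prefer not to depend on environment variables, you can pass the languages explicitly. The sketch below uses that to check which of a list of candidate languages have a compiled catalog. The function name available_languages is hypothetical.

```python
import gettext


def available_languages(domain, localedir, candidates):
    """Return the candidate language codes that have a .mo file on disk."""
    return [
        language
        for language in candidates
        if gettext.find(domain, localedir, languages=[language]) is not None
    ]
```

With the file structure from this page, available_languages("base", "locales", ["el", "en", "fr"]) would report el and en but not fr.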

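Using f-strings

You can't use f-strings inside _(). pygettext fails with a Seen unexpected token "f" error, and at runtime the f-string would be formatted before the lookup, so the result would never match a msgid. Use the format method instead: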

_('Hey {},').format(username)
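
A slightly more translator-friendly variant uses a named placeholder, so the translated string can move it around freely. This is a sketch: NullTranslations stands in for a real catalog here, and the username value is made up.

```python
import gettext

# Stand-in translator that returns every msgid unchanged; in the real program
# _ would come from gettext.translation(...).
_ = gettext.NullTranslations().gettext

username = "Ada"  # hypothetical value

# The literal "Hey {name}," is what the extractor sees and what translators
# receive. Formatting happens after the lookup, on the translated template.
greeting = _("Hey {name},").format(name=username)
```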

Integrations

You can use it with Weblate.