9.1. Internationalization Coding Guidelines

Preparing code to be presented in many languages can be complex and difficult. This section presents best practices for marking English strings in source so that they can be extracted, translated, and displayed to the user in the language of their choice.

9.1.1. General Internationalization Rules

For source files to be successfully localized, you need to prepare them so that any human-readable strings can be extracted by a pre-processing step, and then have localized strings used at runtime. This preparation requires attention to detail, and unfortunately limits what you can do with strings in the code.

Follow these general rules for internationalizing your code.

  1. Always mark complete sentences for translation. If you combine fragments at runtime, there is no way for the translator to construct a proper sentence in their language.
  2. Do not combine strings together at runtime.
  3. Limit the amount of text in strings that is not presented to the user. HTML markup is better applied after translation. If you include HTML in strings there is a chance that translators will translate your tags or attributes.
  4. Use placeholders with descriptive names: "Welcome {student_name}" is much better than "Welcome {0}".

For details, see Style Guidelines.

9.1.2. Editing Source Files

When you edit source files (including Python, JavaScript, or HTML template files), you should be aware of the following conventions.

  1. Know what has to be at the top of the file (if anything) to prepare it for i18n.
  2. Know how strings are marked for internationalization. This markup typically takes the form of a function call with the string as an argument.
  3. Know how translator comments are indicated. Such comments in the file will travel with the strings to the translators, giving them context to produce the best translation. Translator comments have a Translators: marker and must appear on the line preceding the text they describe. In Python, multi-line comments are supported for translator comments that need to be wrapped.

The code samples below show how to do each of these things for a variety of programming languages.

Note

Take into account not just the programming language involved, but the type of file that you are preparing for internationalization. For example, JavaScript embedded in an HTML Mako template is treated differently than JavaScript in a pure .js file. There are many different escaping methods that you can use. For more details, see Preventing Cross Site Scripting Vulnerabilities.

9.1.2.1. Python Source Code

In most Python source code, indicate strings for translation and add translator comments as shown. For more details, refer to the Django documentation.

from django.utils.translation import ugettext as _

# Translators: This will help the translator
message = _("Welcome!")

# Translators: This is a very long comment that needs to wrap
# over multiple lines because it would be too long otherwise.
message = _("Hello world")

Some edX code cannot use Django imports. To maintain portability, XBlocks, XModules, Inputtypes and Responsetypes forbid importing Django. Each of these has its own way of accessing translations, as shown in the following examples.

# for XBlock and XModule:
_ = self.runtime.service(self, "i18n").ugettext
# Translators: a greeting to newly-registered students.
message = _("Welcome!")

# for InputType and ResponseType:
_ = self.capa_system.i18n.ugettext
# Translators: a greeting to newly-registered students.
message = _("Welcome!")

Translator comments will work in these places too, so wherever possible, provide clarifying comments for translators. However, be aware of a quirk in the Python parser. When you write translator comments, make sure the message string is on the very next line after the comment.

The following example is not correct.

# Translators: this comment won't be properly harvested!
message = _(
    "Long message "
    "on a few lines."
)

This example is correct.

message = _(
    # Translators: this comment will be properly harvested!
    "Long message "
    "on a few lines."
)
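
The adjacency rule can also be checked mechanically. The following is a minimal lint-style sketch, not the actual extraction tool: it uses Python's standard tokenize module to flag Translators: comments whose next line does not start a string literal. The function name orphaned_translator_comments and the two sample snippets are hypothetical.

```python
import io
import tokenize


def orphaned_translator_comments(source):
    """Return the line numbers of Translators: comments that are not
    immediately followed by a line on which a string literal starts."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    comment_lines = {t.start[0] for t in tokens if t.type == tokenize.COMMENT}
    string_lines = {t.start[0] for t in tokens if t.type == tokenize.STRING}

    orphans = []
    for tok in tokens:
        if tok.type != tokenize.COMMENT:
            continue
        if not tok.string.lstrip("# ").startswith("Translators:"):
            continue
        last_line = tok.start[0]
        # A wrapped comment continues on the following comment lines.
        while last_line + 1 in comment_lines:
            last_line += 1
        if last_line + 1 not in string_lines:
            orphans.append(tok.start[0])
    return orphans


# Hypothetical sample sources; they are only tokenized, never executed.
NOT_HARVESTED = '''# Translators: this comment is lost
message = _(
    "Long message "
    "on a few lines."
)
'''

HARVESTED = '''message = _(
    # Translators: this comment is kept
    "Long message "
    "on a few lines."
)
'''

print(orphaned_translator_comments(NOT_HARVESTED))  # [1]
print(orphaned_translator_comments(HARVESTED))      # []
```

A check like this could run before extraction to catch comments that would otherwise be dropped silently.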

9.1.2.2. Django Template Files

In Django template files (templates/*.html), indicate strings for translation and add translator comments as shown.

{% load i18n %}

{# Translators: this will help the translator. #}
{% trans "Welcome!" %}

9.1.2.3. Mako Template Files

In Mako template files (templates/*.html), you can use all of the tools available to Python programmers. Just make sure to import the relevant functions first. Here is a Mako template example.

<%page expression_filter="h"/>
<%! from django.utils.translation import ugettext as _ %>
...
## Translators: message to the translator. This comment may
## wrap on to multiple lines if needed, as long as they are
## lines directly above the marked up string.
${_("Welcome!")}

Make sure that all Mako comments, including translator comments, begin with two pound signs (##).

All translated strings should be text, not HTML. This means that for display in an HTML page, the strings must be HTML-escaped. In the example above, HTML-escaping is handled through the <%page> directive with the h filter. For more information, see Preventing Cross Site Scripting Vulnerabilities.

To mix plain text and HTML using format(), you must use the HTML() and Text() functions. Use the HTML() function when you have a replacement string that contains HTML tags. For the HTML() function to work, you must use the Text() function to wrap the plain text translated string. Both the HTML() and Text() functions must be closed before any calls to format().

<%page expression_filter="h"/>
<%!
from django.utils.translation import ugettext as _

from openedx.core.djangolib.markup import HTML, Text
%>
...
${Text(_("Click over to {link_start}the home page{link_end}.")).format(
    link_start=HTML('<a href="/home">'),
    link_end=HTML('</a>'),
)}

You can nest the formatting further. The rule is, any string which is HTML should be wrapped in the HTML() function, and any string which is not wrapped in HTML() should be escaped as needed to be displayed as regular text. Again, you must close the HTML() and Text() calls before making any call to format().

<%page expression_filter="h"/>
<%!
from django.utils.translation import ugettext as _

from openedx.core.djangolib.markup import HTML, Text
%>
...
${Text(_("Click over to {link_start}the home page{link_end}.")).format(
    link_start=HTML('<a href="{}">').format(home_page_link),
    link_end=HTML('</a>'),
)}

For more information on proper escaping, see Preventing Cross Site Scripting Vulnerabilities.
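
To see why the Text() and HTML() calls must be closed before format() is called, it can help to look at a simplified model. The sketch below is an assumption-laden stand-in built on Python's standard html module; it is not the implementation in openedx.core.djangolib.markup, which may behave differently. The class name SafeMarkup is hypothetical.

```python
import html


class SafeMarkup(str):
    """A string that is already safe to emit into an HTML page."""

    def format(self, **kwargs):
        # Escape every replacement unless it was explicitly marked as HTML.
        safe = {
            key: value if isinstance(value, SafeMarkup) else html.escape(str(value))
            for key, value in kwargs.items()
        }
        return SafeMarkup(super().format(**safe))


def HTML(markup):
    """Stand-in: mark a string as trusted HTML, passed through unescaped."""
    return SafeMarkup(markup)


def Text(plain):
    """Stand-in: escape plain text; its format() then escapes replacements."""
    return SafeMarkup(html.escape(plain))


translated = "Click {link_start}here{link_end}, {name}."  # stands in for _()

# Correct: Text() is closed first, so format() runs on the wrapper.
rendered = Text(translated).format(
    link_start=HTML('<a href="/home">'),
    link_end=HTML("</a>"),
    name="<Ann & Bob>",
)

# Incorrect: format() runs on the raw string, so Text() escapes the tags too.
wrong = Text(translated.format(
    link_start='<a href="/home">', link_end="</a>", name="Ann"
))

print(rendered)
print(wrong)
```

In this model, only values wrapped in HTML() survive unescaped, which mirrors the rule stated above.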

9.1.2.4. JavaScript Files

To internationalize JavaScript, the HTML template (base.html) must first load a special JavaScript library, and Django must be configured to serve it.

<script type="text/javascript" src="jsi18n/"></script>

Then, in JavaScript files (*.js):

// Translators: this will help the translator.
var message = gettext('Welcome!');

For interpolation with translated strings, you must use StringUtils.interpolate or HtmlUtils.interpolateHtml, as shown in the following example.

var message = StringUtils.interpolate(
    gettext('You are enrolling in {courseName}'),
    {
        courseName: 'Rock & Roll 101'
    }
)

For more details on how to use StringUtils and HtmlUtils, see Safe JavaScript Files.

Note that JavaScript embedded in HTML in a Mako template file is handled differently. There, you must use the Mako syntax even within the JavaScript.

9.1.2.5. CoffeeScript Files

CoffeeScript files are compiled to JavaScript files, so you indicate strings for translation and add translator comments mostly as you would in JavaScript.

`// Translators: this will help the translator.`
message = gettext('Hey there!')

# Interpolation must use StringUtils.interpolate, not CoffeeScript string interpolation
message = StringUtils.interpolate(
    gettext('You are enrolling in {courseName}'),
    {
        courseName: 'Rock & Roll 101'
    }
)

However, because strings are extracted from the compiled .js files, some native CoffeeScript features break the extraction from the .js files. Be aware of the following rules.

  1. Do not use CoffeeScript string interpolation. Doing so results in string concatenation in the .js file, preventing string extraction. Instead, use StringUtils.interpolate and HtmlUtils.interpolateHtml as documented in Safe JavaScript Files.
  2. Do not use CoffeeScript comments for translator comments. They are not passed through to the JavaScript files.

# DO NOT do this:
# Translators: this won't get to the translators!
message = gettext("This won't work")

# YES do this:
`// Translators: this will get to the translators.`
message = gettext("This works")

###
Translators: This will work, but takes three lines :(
###
message = gettext("Hey there")

9.1.2.6. Underscore Template Files

Underscore template files are used in conjunction with JavaScript, so the same techniques that are used for localization in JavaScript are used for Underscore template files.

Make sure that the i18n JavaScript library has already been loaded, and then use the i18n function gettext and the StringUtils function StringUtils.interpolate in your template, as shown in this example.

<%-
    StringUtils.interpolate(
        gettext('You are enrolling in {courseName}'),
        {
            courseName: 'Rock & Roll 101'
        }
    )
%>

Important

Due to a bug in the underlying Underscore extraction library, extraction does not work properly when StringUtils.interpolate and gettext are on the same line. In such cases, the library extracts the word gettext rather than the actual string that needs to be extracted. Make sure to put StringUtils.interpolate and gettext on two different lines, as shown in the example above.

Note

You must use <%- for all translated strings that do not include HTML tags, as this will HTML-escape the strings before including them in the page.

If you have a translated string that includes a mix of HTML and plain text, you must use HtmlUtils.interpolateHtml along with <%=. Using <%= is only acceptable when you use an HtmlUtils function.

<%=
    HtmlUtils.interpolateHtml(
        gettext('You are enrolling in {spanStart}{courseName}{spanEnd}'),
        {
            courseName: 'Rock & Roll 101',
            spanStart: HtmlUtils.HTML('<span class="course-title">'),
            spanEnd: HtmlUtils.HTML('</span>')
        }
    )
%>

You can access HtmlUtils and StringUtils from inside a template that is processed using HtmlUtils.template(). For more details regarding the use of StringUtils and HtmlUtils, see Safe JavaScript Files.

Currently, translator comments are not supported in Underscore template files, because the underlying library does not parse them out. You should still add translator comments using standard comment syntax, so that when work is done to support translator comments, the comments are already defined in your code. Additionally, translator comments in the code will enable us to answer questions from translators.

9.1.2.7. Other Types of Content

We have not yet established guidelines for internationalizing the following types of content.

  • Course content (such as subtitles for videos)
  • Documentation (written for Sphinx as .rst files)

9.1.3. Coverage Testing

These instructions assume that you are a developer working on internationalizing new or existing user-facing features. To test that your code is properly internationalized, you generate a set of ‘dummy’ translations, then view those translations on your app’s page to make sure everything (scraping and serving) is working properly.

First, use the coverage tool to generate dummy files.

$ paver i18n_dummy

This step creates new dummy translations in the Esperanto directory (edx-platform/conf/locale/eo/LC_MESSAGES) and the RTL directory (edx-platform/conf/locale/rtl/LC_MESSAGES). DO NOT CHECK THESE FILES IN. You should discard these files once you have finished testing.

Next, restart the LMS and Studio to load in the new translation files.

$ paver devstack lms
$ paver devstack studio

Append /update_lang/ to the root LMS or Studio URL and use the form to set the preview language. The language code eo can be used to specify the test language.

Instead of plain English strings, you should see specially-accented English strings that look like this example.

Thé Fütüré øf Ønlïné Édüçätïøn Ⱡσяєм ι# Før änýøné, änýwhéré, änýtïmé Ⱡσяєм #

This dummy text is distinguished by extra accent characters. If you see plain English without these accents, it most likely means that the strings have not yet been marked for translation, or you have broken a rule. To fix this issue, follow these steps.

  1. Find the strings in the source tree (in Python, JavaScript, or HTML template code).
  2. Refer to the coding guidelines above to make sure the strings have been properly externalized.
  3. Rerun the scripts and confirm that the strings are now properly converted into dummy text.

This dummy text is also distinguished by “Lorem ipsum” text at the end of each string, and is always terminated with “#”. The original English string is padded with about 30% more characters, to simulate languages (such as German) which tend to have longer strings than English. If you see problems with your page layout, such as columns that do not fit, or text that is truncated (the # character should always be displayed on every string), then you will probably need to fix the page layouts accordingly to accommodate the longer strings.
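
The padding and terminator described above can be pictured with a rough sketch. This is not the code behind paver i18n_dummy; the accent mapping, the padding text, and the names make_dummy and looks_truncated are all assumptions chosen to resemble the sample output.

```python
# Hypothetical accent mapping and padding text, for illustration only.
ACCENTS = str.maketrans("aeioucAEIOU", "äéïøüçÄÉÏØÜ")
PADDING = "Ⱡσяєм ιρѕυм ∂σłσя ѕιт αмєт"


def make_dummy(text):
    """Accent the string, pad it by about 30%, and terminate it with '#'."""
    pad_length = (len(text) * 3 + 9) // 10  # 30% of the length, rounded up
    # Slicing simply stops at the end of PADDING for very long strings.
    return text.translate(ACCENTS) + " " + PADDING[:pad_length] + "#"


def looks_truncated(rendered):
    """A dummy string that does not end with '#' was cut off by the layout."""
    return not rendered.endswith("#")


dummy = make_dummy("Welcome!")
print(dummy)
print(looks_truncated(dummy))       # False
print(looks_truncated(dummy[:-3]))  # True
```

The terminator check is the useful part when you review a page: any string whose # is missing is being clipped by the layout.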

Finally, append /update_lang/ to the root LMS or Studio URL and set the language code to rtl to view your feature in the dummy right-to-left (RTL) language. Test to make sure that the user interface is properly “flipped” to right-to-left mode. Note that certain page elements might not look correct because they are controlled by the browser. For more effective testing, switch your browser’s language to Arabic or another RTL language (Hebrew, Persian, or Urdu) as well. See our RTL UI Guidelines for information about fixing any issues that you find.

When you have finished reviewing, append /update_lang/ to the LMS or Studio URL and reset your session to your base language.

9.1.3.1. Set Preview Language

Before you set the preview language, sign in to either LMS or Studio. Then append /update_lang/ to the root LMS or Studio URL. A form appears for you to set or clear the preview language. Set the Language code (for example use eo for the test language Esperanto), and then select Submit to set the preview language. Use the Reset button to reset the preview language to your default setting. Refresh your browser page to display the page in the selected language. The language persists for the duration of your session.

9.1.4. Style Guidelines

9.1.4.1. Do Not Append Strings or Interpolate Values

It can be difficult for translators to provide reasonable translations of small sentence fragments. If your code appends sentence fragments, even if it seems fine in English, the same concatenation is very unlikely to work properly for other languages.

Bad:

message = _("Welcome to the ") + settings.PLATFORM_NAME + _(" dashboard.")

In this scenario, the translator has to figure out how to translate these two separate strings. It is very difficult to translate a fragment such as “Welcome to the.” In some languages, the fragments will be in a different order. For example, in Spanish this phrase would be ordered as “Welcome to the dashboard of edX”.

It is much easier for a translator to figure out how to translate the entire sentence, using the pattern “Welcome to the {platform_name} dashboard.”

Good:

message = _("Welcome to the {platform_name} dashboard.").format(platform_name=settings.PLATFORM_NAME)
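
The difference shows up as soon as a translation needs a different word order. The following sketch replaces the real _() with a lookup in a small dictionary; the catalog and its Spanish entries are hypothetical and only serve to illustrate the ordering problem.

```python
# Hypothetical Spanish catalog, for illustration only.
SPANISH = {
    "Welcome to the {platform_name} dashboard.":
        "Bienvenido al panel de {platform_name}.",
    "Welcome to the ": "Bienvenido al ",
    " dashboard.": " panel.",
}


def _(text):
    """Stand-in for the real translation function."""
    return SPANISH.get(text, text)


platform_name = "edX"

# Bad: the platform name is pinned between two fragments, English-style.
fragments = _("Welcome to the ") + platform_name + _(" dashboard.")

# Good: the translator controls where the placeholder goes.
whole = _("Welcome to the {platform_name} dashboard.").format(
    platform_name=platform_name
)

print(fragments)  # Bienvenido al edX panel.
print(whole)      # Bienvenido al panel de edX.
```

With fragments, no choice of translations for the two pieces can move the platform name to the end of the sentence.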

Note that you cannot concatenate (+) within the gettext call at all. The following example does not work.

Bad:

message = _(
    "Welcome to {platform_name}, the online learning platform " +
    "that hosts courses from world-class universities around the world!"
).format(platform_name=settings.PLATFORM_NAME)

In Python, adjacent string literals inside the parentheses of the _() call are joined into a single literal before the call is made, so the following example works.

Good (Python only!):

message = _(
    "Welcome to {platform_name}, the online learning platform "
    "that hosts courses from world-class universities around the world!"
).format(platform_name=settings.PLATFORM_NAME)

However, in JavaScript and other languages, the gettext call cannot be broken up over multiple lines. You will have to live with long lines on gettext calls, and we make a style exception for this.

Bad:

message = gettext('Here is a really really long message that is' +
    'incorrectly broken over two lines.')

Good (JavaScript):

message = gettext('Here is a really really long message that is correctly left on a single line.')

9.1.4.2. Use Named Placeholders

Python string formatting provides both positional and named placeholders. Always use named placeholders, never positional ones. Positional placeholders such as %s cannot be reordered by translators, whose languages might need a different order to make syntactically correct sentences. Even with a single placeholder, a named placeholder provides more context to the translator.

Bad:

message = _('Today is %s %d.') % (m, d)

OK:

message = _('Today is %(month)s %(day)s.') % {'month': m, 'day': d}

Best:

message = _('Today is {month} {day}.').format(month=m, day=d)

Notice that in English, the month comes first, but in Spanish the day comes first. This is reflected in the .po file in the following way.

# fragment from edx-platform/conf/locale/es/LC_MESSAGES/django.po
msgid "Today is {month} {day}."
msgstr "Hoy es {day} de {month}."

The resulting output is correct in each language.

English output: "Today is November 26."
Spanish output: "Hoy es 26 de Noviembre."
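
The reordering can be tried directly in plain Python, without any translation machinery. In this small sketch the Spanish string is taken from the .po fragment above, while the positional variant is a hypothetical translation that a translator might attempt if the source string used %s and %d.

```python
month, day = "November", 26

english = "Today is {month} {day}."
spanish = "Hoy es {day} de {month}."  # msgstr from the .po fragment

# Named placeholders: the same keyword arguments work for both orders.
print(english.format(month=month, day=day))
print(spanish.format(month=month, day=day))

# Positional placeholders: swapping them in the translation breaks the
# call, because the arguments are still passed in English order.
positional_spanish = "Hoy es %d de %s."  # hypothetical translation
try:
    positional_spanish % (month, day)
    positional_failed = False
except TypeError:
    positional_failed = True

print(positional_failed)  # True
```

The month name stays in English here because only the sentence template is translated in this sketch.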

9.1.4.3. Only Translate Literal Strings

As programmers, we are used to using functions in flexible ways. But translation functions such as _() and gettext() cannot be used in the same ways as other functions. At runtime, they are real functions like any other, but they also serve as markers for the string extraction process.

For string extraction to work properly, the translation functions must be called only with literal strings. If you use them with a computed value, the string extractor will not have a string to extract.

The difference between the right way and the wrong way can be very subtle, as shown in these examples.

# BAD: This tries to translate the result of .format()
_("Welcome, {name}".format(name=student_name))

# GOOD: Translate the literal string, then use it with .format()
_("Welcome, {name}").format(name=student_name)

# BAD: The dedent always makes the same string, but the extractor can't find it.
_(dedent("""
.. very long message ..
"""))

# GOOD: Dedent the translated string.
dedent(_("""
.. very long message ..
"""))

# BAD: The string is separated from _(), the extractor won't find it.
if hello:
    msg = "Welcome!"
else:
    msg = "Goodbye."
message = _(msg)

# GOOD: Each string is wrapped in _()
if hello:
    message = _("Welcome!")
else:
    message = _("Goodbye.")
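
One way to build intuition for these rules is to write a toy extractor. The sketch below uses Python's standard ast module to collect only those _() calls whose first argument is a string literal. Real extraction tools work differently in detail, so treat this as an illustration of the principle; the function name extract_literals and the sample source are hypothetical.

```python
import ast


def extract_literals(source):
    """Collect string literals passed directly as the first argument of _()."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append(node.args[0].value)
    return found


# Hypothetical sample; it is parsed, never executed.
SAMPLE = '''
good = _("Welcome, {name}").format(name=student_name)
bad = _("Goodbye, {name}".format(name=student_name))
msg = "Hidden"
also_bad = _(msg)
joined = _("Adjacent " "literals")
plus = _("Plus " + "operator")
'''

print(sorted(extract_literals(SAMPLE)))
# ['Adjacent literals', 'Welcome, {name}']
```

Only two of the five calls yield a string: the computed value, the variable, and the + expression are all invisible to this extractor, while adjacent literals arrive as a single string.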

9.1.4.4. Be Aware of Nested Context

When you provide strings in templated files for translation, you have to be careful of nested context. For example, consider this JavaScript fragment in a Mako template.

<script>
var feeling = '${_("I love you.")}';
</script>

When the string is rendered in French, it will produce the following invalid JavaScript.

<script>
var feeling = 'Je t'aime.';
</script>

Avoid this issue by following the best practices detailed in Preventing Cross Site Scripting Vulnerabilities. Here is the same example with proper escaping.

<%!
from django.utils.translation import ugettext as _

from openedx.core.djangolib.js_utils import js_escaped_string
%>
...
<script>
var feeling = '${_("I love you.") | n, js_escaped_string}';
</script>

The code with proper escaping produces the following JavaScript-escaped code.

<script>
var feeling = 'Je t\u0027aime.';
</script>
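
The effect of the escaping step can be approximated in a few lines of Python. This is a simplified stand-in, not the js_escaped_string implementation from openedx.core.djangolib.js_utils; the set of escaped characters and the name js_escape are assumptions.

```python
# Characters that could end a JavaScript string or an HTML script block.
# The exact set is an assumption made for this sketch.
RISKY = set("'\"\\<>&")


def js_escape(text):
    """Replace risky characters with \\uXXXX escapes."""
    return "".join(
        "\\u{:04X}".format(ord(ch)) if ch in RISKY else ch for ch in text
    )


print(js_escape("Je t'aime."))  # Je t\u0027aime.
```

Because the apostrophe is replaced by an escape sequence, it can no longer terminate the surrounding JavaScript string literal.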

9.1.4.5. Singular vs Plural

It can be tempting to improve a message by selecting singular or plural based on a count, as shown in the following example.

if count == 1:
    msg = _("There is 1 file.")
else:
    msg = _("There are {file_count} files.").format(file_count=count)

The example above is not the correct way to choose a string, because other languages have different rules for when to use singular and when plural, and there might be more than two choices.

One option is not to use different text for different counts.

msg = _("Number of files: {file_count}").format(file_count=count)

If you want to choose based on number, you need to use another gettext variant to do so.

from django.utils.translation import ungettext
msg = ungettext("There is {file_count} file", "There are {file_count} files", count)
msg = msg.format(file_count=count)

The example above will properly use count to find a correct string in the translation file; you can then use that string to format in the count.
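
The same two-step pattern (select the template by count, then format it) can be tried with Python's standard gettext module, which does not require Django. This sketch uses NullTranslations, which has no catalog and therefore falls back to the English singular/plural rule; with a real catalog, the plural rule would come from the translation file instead. The function name files_message is hypothetical.

```python
import gettext

# No catalog is loaded, so the English fallback rule applies.
translations = gettext.NullTranslations()


def files_message(count):
    """Pick the template by count, then interpolate the count into it."""
    template = translations.ngettext(
        "There is {file_count} file",
        "There are {file_count} files",
        count,
    )
    return template.format(file_count=count)


print(files_message(1))  # There is 1 file
print(files_message(3))  # There are 3 files
```

Note that both templates contain the placeholder, so translators for languages with more than two plural forms can still use the number in every form.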

9.1.4.6. Translating Too Early

When the _() function is called, it will fetch a translated string using the current user’s language to decide which string to fetch.

If you invoke the _() function before we know the user, then the wrong language might be used.

from django.utils.translation import ugettext as _

HELLO = _("Hello")
GOODBYE = _("Goodbye")

def get_greeting(hello):
    if hello:
        return HELLO
    else:
        return GOODBYE

Here the HELLO and GOODBYE constants are assigned when the module is first imported, at server startup. There is no current user at that time, so ugettext will use the server’s default language. When we eventually use those constants to show a message to the user, they are not looked up again, and the user will get the wrong language.

There are a few ways to deal with this situation. The first is to avoid calling _() until we have the user.

def get_greeting(hello):
    if hello:
        return _("Hello")
    else:
        return _("Goodbye")

Another way is to use Django’s ugettext_lazy function. Instead of returning a string, it returns a lazy object that will wait to do the lookup until it is actually used as a string.

from django.utils.translation import ugettext_lazy as _

Using this method can be tricky, because the lazy object does not act like a string in all cases.
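
To see both why a lazy object helps and why it is not a true string, consider this minimal sketch. It does not use Django; the catalog, the activate and translate functions, and the LazyText class are hypothetical stand-ins for the real machinery.

```python
# Hypothetical catalog and language state, for illustration only.
CATALOG = {"fr": {"Hello": "Bonjour"}}
_active_language = "en"


def activate(language):
    """Stand-in for selecting the current user's language."""
    global _active_language
    _active_language = language


def translate(text):
    """Stand-in for an eager lookup in the active language."""
    return CATALOG.get(_active_language, {}).get(text, text)


class LazyText:
    """Defers the lookup until the object is converted to a string."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return translate(self._text)


# Both constants are created at import time, before any user is known.
EAGER_HELLO = translate("Hello")
LAZY_HELLO = LazyText("Hello")

activate("fr")  # later, a request arrives from a French-speaking user

print(EAGER_HELLO)               # Hello
print(str(LAZY_HELLO))           # Bonjour
print(isinstance(LAZY_HELLO, str))  # False
```

The last line hints at the trickiness: code that expects a real string, for example string concatenation, needs an explicit conversion first.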

The last way to solve the problem is to mark the string so that it will be extracted properly, but not actually do the lookup when the constant is defined.

from django.utils.translation import ugettext

_ = lambda text: text

HELLO = _("Hello")
GOODBYE = _("Goodbye")

def get_greeting(hello):
    if hello:
        return ugettext(HELLO)
    else:
        return ugettext(GOODBYE)

Here we define _() as a pass-through function, so the string will be found during extraction, but will not be translated too early. At runtime, we then use the real translation function to get the localized string.

9.1.4.7. Multi-line Strings

Translator notes must directly precede the string literals to which they refer. For example, the translator note here will not be passed along to translators.

# Translators: you will not be able to see this note because
# it does not directly precede the line containing the translated string literal.
# See how the line directly below this one does not contain part of the string?
long_translated_string = _(
    "I am a long string, with many, many words. So many words that it is "
    "advisable that I be split over this line."
)

In such a case, make sure you format your code so that the string begins on a line directly below the translator note.

# Translators: you will be able to see this note.
# See how the line directly below this one contains the start of the string?
long_translated_string = _("I am a long string, with many, many words. "
                           "So many words that it is advisable that I "
                           "be split over this line.")