Formatting Messages

Overview

Messages are user-visible strings, often with variable elements like names, numbers and dates. Message strings are typically translated into the different languages of a UI, and translators move around the variable elements according to the grammar of the target language.

For this to work in many languages, a message has to be written and translated as a single unit, typically a string with placeholder syntax for the variable elements. If the user-visible string were concatenated directly from fragments and formatted elements, then translators would not be able to rearrange the pieces, and they would have a hard time translating each of the string fragments.

MessageFormat

The ICU MessageFormat class uses message "pattern" strings with variable-element placeholders (called "arguments" in the API docs) enclosed in {curly braces}. The argument syntax can include formatting details; otherwise a default format is used. For details about the pattern syntax and the formatting behavior see the MessageFormat API docs (Java, C++, C).

Complex Argument Types

Certain types of arguments select among several choices which are nested MessageFormat pattern strings. Keeping these choices together in one message pattern string facilitates translation in context, by one single translator. (Commercial translation systems often distribute different messages to different translators.)

  • Use a "plural" argument to select sub-messages based on a numeric value, together with the plural rules for the specified language.
  • Use a "select" argument to select sub-messages via a fixed set of keywords.
  • Use of the old "choice" argument type is discouraged. It cannot handle plural rules for many languages, and is clumsy for simple selection.

It is tempting to cover only a minimal part of a message string with a complex argument (e.g., plural). However, this is difficult for translators for two reasons: (1) they might have trouble understanding how the sentence fragments in the argument sub-messages interact with the rest of the sentence, and (2) they will not know whether and how they can shrink or grow the part of the sentence that is inside the argument to make the whole message work for their language.

Recommendation: If possible, use complex arguments as the outermost structure of a message, and write full sentences in their sub-messages. If you have nested select and plural arguments, place the select arguments (with their fixed sets of choices) on the outside and nest the plural arguments (hopefully at most one) inside.

For example:

"{gender_of_host, select, "
  "female {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to her party.}"
      "=2 {{host} invites {guest} and one other person to her party.}"
      "other {{host} invites {guest} and # other people to her party.}}}"
  "male {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to his party.}"
      "=2 {{host} invites {guest} and one other person to his party.}"
      "other {{host} invites {guest} and # other people to his party.}}}"
  "other {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to their party.}"
      "=2 {{host} invites {guest} and one other person to their party.}"
      "other {{host} invites {guest} and # other people to their party.}}}}"

Note: In a plural argument like in the example above, if the English message has both =0 and =1 (up to =offset+1) then it does not need a "one" variant because that would never be selected. It does always need an "other" variant.
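The reasoning behind this note can be sketched in a few lines. The function below is a toy model, not ICU code: selectEnglishPluralVariant is a hypothetical name, it hard-codes the English cardinal rule for integers (exactly 1 is "one"), and it assumes n >= offset whenever no explicit "=n" variant matches.

```cpp
#include <set>
#include <string>

// Toy model (not ICU code) of how an English plural argument picks a variant.
// `variants` holds the selectors present in the message, e.g. "=0", "one", "other".
// Assumption: n >= offset whenever no explicit "=n" variant matches.
std::string selectEnglishPluralVariant(int n, int offset,
                                       const std::set<std::string>& variants) {
    // 1. An explicit-value variant wins; it is compared with the unadjusted number.
    const std::string explicitSelector = "=" + std::to_string(n);
    if (variants.count(explicitSelector) > 0) {
        return explicitSelector;
    }
    // 2. Otherwise the plural rules see n - offset. English integers: 1 is "one".
    const std::string keyword = (n - offset == 1) ? "one" : "other";
    // 3. A keyword that has no variant in the message falls back to "other".
    return variants.count(keyword) > 0 ? keyword : "other";
}
```

With offset:1 and explicit variants =0, =1 and =2, the only number whose adjusted value is 1 is 2, and =2 catches it first. A "one" variant in that message would therefore be dead text, while "other" is still reached for 3 and above.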

Note: The translation system and the translator together need to add "one", "few" etc. if and as necessary per target language.

Quoting/Escaping

If syntax characters occur in the text portions, they need to be quoted by enclosing them in a pair of ASCII apostrophes. Two adjacent ASCII apostrophes always represent one ASCII apostrophe, similar to %% in printf representing one %; this rule applies inside quoted text as well. For example: "This '{isn''t}' obvious" → "This {isn't} obvious"

  • Before ICU 4.8, ASCII apostrophes always started quoted text and had inconsistent behavior in nested sub-messages, which was a source of problems with authoring and translating message pattern strings.
  • Starting with ICU 4.8, an ASCII apostrophe only starts quoted text if it immediately precedes a character that requires quoting (that is, "only where needed"), and works the same in nested messages as on the top level of the pattern. The new behavior is otherwise compatible; for details see the MessageFormat and MessagePattern (new in ICU 4.8) API docs.
  • Recommendation: Use the real apostrophe (single quote) character ’ (U+2019) for human-readable text, and use the ASCII apostrophe ' (U+0027) only in program syntax, like quoting in MessageFormat. See the annotations for U+0027 Apostrophe in The Unicode Standard.
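The ICU 4.8 apostrophe rule described above can be illustrated with a small stand-alone function. This is a toy model rather than ICU's parser: unquoteMessageText is a hypothetical name, and the sketch assumes the input is a top-level text portion with no real arguments in it, so only { and } count as characters that require quoting (inside plural or choice sub-messages, # or | would count as well).

```cpp
#include <cstddef>
#include <string>

// Toy model (not ICU code) of the "only where needed" apostrophe rule for a
// top-level text portion. Assumption: the text contains no real arguments.
std::string unquoteMessageText(const std::string& text) {
    std::string out;
    bool inQuote = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c != '\'') {
            out += c;  // ordinary character, or literal text inside quotes
            continue;
        }
        const bool hasNext = i + 1 < text.size();
        if (hasNext && text[i + 1] == '\'') {
            out += '\'';  // doubled apostrophe: always one literal apostrophe
            ++i;
        } else if (inQuote) {
            inQuote = false;  // single apostrophe ends the quoted text
        } else if (hasNext && (text[i + 1] == '{' || text[i + 1] == '}')) {
            inQuote = true;  // starts quoted text only before a syntax character
        } else {
            out += '\'';  // anywhere else a single apostrophe is literal
        }
    }
    return out;
}
```

Under this model "I don't know" passes through unchanged, which is the practical benefit of the newer behavior for translators who type plain apostrophes.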

Custom Format Objects

The MessageFormat class allows setting custom Format objects to format arguments, overriding the arguments' pattern specification. This is discouraged: For custom formatting of some values it should normally suffice to format them externally and to provide the formatted strings to the MessageFormat.format() methods.

Only the top-level arguments are accessible and settable via setFormat(), getFormat() etc. Arguments inside nested sub-messages, inside choice/plural/select arguments, are "invisible" via these API methods.

Some of these methods (the ones corresponding to the original JDK MessageFormat API) address the top-level arguments in their order of appearance in the pattern string, which is usually not useful because it varies with translations. Newer methods address arguments by argument number ("index") or name.

Examples

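The following code fragment produces output such as "At 4:34:20 PM on 23-Mar-98, there was a disturbance in the Force on planet 7." The date and time text depends on the current time, the default locale, and the ICU version.

    #include <unicode/calendar.h>
    #include <unicode/fmtable.h>
    #include <unicode/msgfmt.h>
    #include <unicode/unistr.h>
    using namespace icu;

    UErrorCode err = U_ZERO_ERROR;
    // Arguments are addressed by index: {0} is the planet number,
    // {1} is the date/time, {2} is the event description.
    Formattable arguments[] = {
       (int32_t)7,
       Formattable(Calendar::getNow(), Formattable::kIsDate),
       UnicodeString("a disturbance in the Force")
    };

    UnicodeString result;
    MessageFormat::format(
       "At {1,time} on {1,date}, there was {2} on planet {0,number,integer}.",
       arguments,
       3,
       result,
       err);
    // Check U_FAILURE(err) before using result.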

There are several more usage examples for the MessageFormat and ChoiceFormat classes in C, C++ and Java.
