Formatting and Parsing

Overview

Formatters translate between binary data and human-readable textual representations of these values. For example, you cannot display the computer representation of the number 103. You can only display the numeral 103 as a textual representation (using three text characters). The result from a formatter is a string that contains text that the user will recognize as representing the internal value. A formatter can also parse a string by converting a textual representation of some value back into its internal representation. For example, it reads the characters 1, 0 and 3 followed by something other than a digit, and produces the value 103 as an internal binary representation.

These classes encapsulate information about the display of localized times, days, numbers, currencies, and messages. Formatting classes do both formatting and parsing and allow the separation of the data that the end-user sees from the code. Separating the program code from the data allows a program to be more easily localized. Formatting is converting a date, time, number, message or other object from its internal representation into a string. Parsing is the reverse operation. It is the process of converting a string to an internal representation of the date, time, number, message or other object.

Using the formatting classes is an important step in internationalizing your software because the format() and parse() methods in each of the classes make your software language neutral, by replacing implicit conversions with explicit formatting calls.

Internationalization Formatting Tips

This section discusses some of the ways you can format and parse numbers, currencies, dates, times and text messages in your program so that the data is separate from the code and can be easily localized. This is the information your users see on their computer screens, so it needs to be in a language and format that conforms to their local conventions.

Some things you need to keep in mind while you are creating your code are the following:

  • Keep your code and your data separate

  • Format the data in a locale-sensitive manner

  • Keep your code locale-independent

  • Avoid writing special routines to handle specific locales

  • String objects formatted by format() are parseable by the parse() method

Numbers and Currencies

Programs store and operate on numbers using a locale-independent binary representation. When displaying or printing a number it is converted to a locale-specific string. For example, the number 12345.67 is "12,345.67" in the US, "12 345,67" in France and "12.345,67" in Germany.

By invoking the methods provided by the NumberFormat class, you can format numbers, currencies, and percentages according to the specified or default locale. NumberFormat is locale-sensitive so you need to create a new NumberFormat for each locale. NumberFormat methods format primitive-type numbers, such as double and output the number as a locale-specific string.

For currencies you call getCurrencyInstance to create a formatter that returns a string with the formatted number and the appropriate currency sign. Of course, the NumberFormat class is unaware of exchange rates so, the number output is the same regardless of the specified currency. This means that the same number has different monetary values depending on the currency locale. If the number is 9988776.65 the results will be:

  • 9 988 776,65 € in France

  • 9.988.776,65 € in Germany

  • $9,988,776.65 in the United States

In order to format percentages, create a locale-specific formatter and call the getPercentInstance method. With this formatter, a decimal fraction such as 0.75 is displayed as 75%.

Customizing Number Formats

If you need to customize a number format you can use the DecimalFormat (§) and the DecimalFormatSymbols (§) classes in the Formatting Numbers chapter. This not usually necessary and it makes your code much more complex, but it is available for those rare instances where you need it. In general, you would do this by explicitly specifying the number format pattern.

If you need to format or parse spelled-out numbers, you can use the RuleBasedNumberFormat class (§) (see the Formatting Numbers chapter). You can instantiate a default formatter for a locale, or by using the RuleBasedNumberFormat rule syntax, specify your own.

Using NumberFormat (§) class methods (see the Formatting Numbers chapter) with a predefined locale is the easiest and the most accurate way to format numbers, and currencies.

Note See Properties and ICU Rule Syntax for information regarding syntax characters.

Date and Times

You display or print a Date by first converting it to a locale-specific string that conforms to the conventions of the end user's Locale. For example, Germans recognize 20.4.98 as a valid date, and Americans recognize 4/20/98.

NoteThe appropriate Calendar support is required for different locales. For example, the Buddhist calendar is the official calendar in Thailand so the typical assumption of Gregorian Calendar usage should not be used. ICU will pick the appropriate Calendar based on the locale you supply when opening a Calendar or DateFormat.

Messages

Message format helps make the order of display elements localizable. It helps address problems of grammatical differences in languages. For example, consider the sentence, "I go to work by car everyday." In Japanese, the grammar equivalent can be "Everyday, I to work by car go." Another example will be the plurals in text, for example, "no space for rent, one room for rent and many rooms for rent," where "for rent" is the only constant text among the three.

Formatting and Parsing Classes

ICU provides four major areas and twelve classes for formatting numbers, dates and messages:

General Formatting

  • Format
    The abstract superclass of all format classes. It provides the basic methods for formatting and parsing numbers, dates, strings and other objects.

  • FieldPosition
    A concrete class for holding the field constant and the begin and end indices for number and date fields.

  • ParsePosition
    A concrete class for holding the parse position in a string during parsing.

  • Formattable
    Formattable objects can be passed to the Format class or its subclasses for formatting. It encapsulates a polymorphic piece of data to be formatted and is used with MessageFormat. Formattable is used by some formatting operations to provide a single "type" that encompasses all formattable values (e.g., it can hold a number, a date, or a string, and so on).

  • UParseError
    UParseError is used to returned detailed information about parsing errors. It is used by the ICU parsing engines that parse long rules, patterns, or programs. This is helpful when the text being parsed is long enough that more information than a UErrorCode is needed to localize the error.

Formatting Numbers

  • NumberFormat (§)
    The abstract superclass that provides the basic fields and methods for formatting Number objects and number primitives to localized strings and parsing localized strings to Number objects.

  • DecimalFormat (§)
    A concrete class for formatting Number objects and number primitives to localized strings and parsing localized strings to Number objects, in base 10.

  • RuleBasedNumberFormat (§)
    A concrete class for formatting Number objects and number primitives to localized text, especially spelled-out format such as found in check writing (e.g. "two hundred and thirty-four"), and parsing text into Number objects.

  • DecimalFormatSymbols (§)
    A concrete class for accessing localized number strings, such as the grouping separators, decimal separator, and percent sign. Used by DecimalFormat.

Formatting Dates and Times

  • DateFormat (§)
    The abstract superclass that provides the basic fields and methods for formatting Date objects to localized strings and parsing date and time strings to Date objects.

  • SimpleDateFormat (§)
    A concrete class for formatting Date objects to localized strings and parsing date and time strings to Date objects, using a GregorianCalendar.

  • DateFormatSymbols (§)
    A concrete class for accessing localized date-time formatting strings, such as names of the months, days of the week and the time zone.

Formatting Messages

  • MessageFormat (§)
    A concrete class for producing a language-specific user message that contains numbers, currency, percentages, date, time and string variables.

  • ChoiceFormat (§)
    A concrete class for mapping strings to ranges of numbers and for handling plurals and names series in user messages.