Friday, May 8, 2009

Unicode Releases Common Locale Data Repository, Version 1.7

Mountain View, CA, May 8, 2009 - The Unicode® Consortium announced today the release of the new version of the Unicode Common Locale Data Repository (Unicode CLDR 1.7), providing key building blocks for software to support the world's languages. Unicode CLDR is by far the largest and most extensive standard repository of locale data. This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; choosing languages or countries by name; transliterating different alphabets; and many others.

CLDR 1.7 contains data for 146 languages and 159 territories: 468 locales in all. Version 1.7 of the repository contains over 21% more locale data than the previous release, with over 40,000 new or modified data items from over 140 different contributors. Major contributors to CLDR 1.7 include Adobe, Apple, Google, IBM, and Sun, plus official representatives from a number of countries. Many other organizations and volunteers around the globe, including Gnome, Kotoistus, LISA, OpenOffice, and Utilika, have also made important contributions. The data for CLDR is gathered through the CLDR Survey Tool, which allows organizations and volunteers to contribute, compare, and vet locale data. In the development of this release, the process of gathering data was sped up, and the voting process was simplified.

The new features of Unicode CLDR 1.7 include:

  • New and improved data, including Indic data
  • Enhanced number system support, including many non-decimal formats as well as spelled-out forms ("twenty-three")
  • Postal code format validity
  • New IETF BCP 47 (RFC 4646) support
  • Calendar preference data
  • Improved language population data, and language-script mapping data
  • Local DTD access
  • Improved currency symbols
  • Clarified specification of time zone parsing

Unicode CLDR 1.7 is part of the Unicode locale data project, together with the Unicode Locale Data Markup Language (LDML: http://unicode.org/reports/tr35/). LDML is an XML format used for general interchange of locale data, such as in Microsoft's .NET. For web pages with different views of CLDR data, see http://unicode.org/cldr/charts.html.


For more information about the Unicode CLDR project (including charts), see http://cldr.unicode.org. The latest features of CLDR will also be showcased at the 33rd Internationalization and Unicode Conference (IUC) on October 14-16, 2009 in San Jose, CA; see http://unicodeconference.org/.