Monday, March 31, 2014

Proposed Updates for Unicode Security-Related Publications

Proposed updates are now available for UTR #36, Unicode Security Considerations, and UTS #39, Unicode Security Mechanisms. These are both being updated to correspond with Unicode 7.0.

PRI #272, Proposed Update UTR #36, Unicode Security Considerations:
This UTR is being updated to correspond with Unicode 7.0. In this draft, a description of the downside of displaying URLs as Punycode has been added, along with a note on the use of Catalan in identifiers.

PRI #273, Proposed Update UTS #39, Unicode Security Mechanisms:
This UTS is being updated to correspond with Unicode 7.0. Text has been added on the use of NFC and on the use of Catalan in identifiers. A note has also been added on the collection of confusable data for characters outside of Status=allowed, such as non-NFKC characters.
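For readers unfamiliar with the normalization forms mentioned here, the following is a minimal sketch using Python's standard unicodedata module. It is not taken from UTS #39; the helper name is_nfkc_stable and the sample strings are illustrative assumptions, and the results reflect whichever Unicode version the Python interpreter ships with.

```python
import unicodedata

# NFC composes a base letter and a combining mark into one code point.
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)


def is_nfkc_stable(text: str) -> bool:
    """Hypothetical helper: True if NFKC normalization leaves text unchanged."""
    return unicodedata.normalize("NFKC", text) == text


# U+FB01 LATIN SMALL LIGATURE FI is unchanged by NFC but not by NFKC,
# so it is an example of a "non-NFKC" character.
ligature = "\ufb01"

# Catalan writes a geminated l with U+00B7 MIDDLE DOT, as in "col·lecció".
# U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT is a compatibility character
# that NFKC maps to 'l' followed by U+00B7.
catalan_word = "col\u00b7lecci\u00f3"
l_with_dot = "\u0140"

print(composed == "\u00e9")
print(is_nfkc_stable(ligature))
print(is_nfkc_stable(catalan_word))
print(unicodedata.normalize("NFKC", l_with_dot) == "l\u00b7")
```

The sketch only illustrates the normalization forms themselves; the rules for which characters are allowed in identifiers are defined in the UTS #39 data files, not by this check.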

Review notes solicit feedback on whether to (a) add multi-character sequences to the data file, (b) change some of the Type values, (c) base the data more on CLDR exemplars, and/or (d) change the format of the data files.

The closing date for both of these issues is April 28, 2014. For information about how to discuss these Public Review Issues and how to supply formal feedback, please see the feedback and discussion instructions on the PRI pages.

The Public Review Issues page is:

Wednesday, March 19, 2014

CLDR Version 25 Released

Unicode CLDR 25 has been released, providing an update to the key building blocks for software supporting the world's languages. This data is used by a wide spectrum of companies for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Unicode CLDR 25 focused primarily on improvements to the LDML structure and tools, and on consistency of data. There are many smaller data fixes, but there was no general data submission. Changes include the following:
  • New rules for plural ranges (1-2 liters) for 72 locales, plurals for 2 locales, and ordinals for 18 locales.
  • Better locale matching with fallbacks for languages, default languages for continents and subcontinents, and default scripts for more languages.
  • Two new locales: West Frisian (fy) and Uyghur (ug).
  • Two new metazones: Mexico_Pacific and Mexico_Northwest.
  • Updated zh pinyin & zhuyin collations and transliterators for Unicode 6.3 kMandarin data.
  • Updated keyboard layout data for OS X, Windows and others.
This version contains data for 238 languages and 259 territories—740 locales in all.

Details are provided in, along with a detailed Migration section.

Thursday, February 20, 2014

Unicode 7.0.0 Beta

The next version of the Unicode Standard will be Version 7.0.0. The beta information page for Unicode 7.0.0 is located at:
This version is planned for release in July 2014. A beta version of the 7.0.0 Unicode Character Database files is also available for public comment. We strongly encourage implementers to download these files and test them with their programs, well before the end of the beta period, April 28, 2014. These files are located in:
For detailed information and guidance on how to focus your review, see the section Notable Issues for Beta Testers on the beta page.

The Unicode Collation Algorithm (UCA) will be released in parallel with Unicode 7.0.0, and a beta version of the UCA is also available. See also PRI #260.

Thursday, February 13, 2014

Feedback requested for Unicode 7.0

Unicode 7.0 is slated to be released early in 2014Q3. Now is your opportunity to comment on the contents of this release.

The text of the Unicode Standard Annexes (segmentation, normalization, line breaking, identifiers, etc.) is open for comments and feedback, with proposed update versions for many of the documents posted at UAX Proposed Updates. Changes to the text of the annexes are relatively minor for Unicode 7.0, but the documents could still benefit from careful review. For some of the annexes, no proposed update version is posted, because there is no planned change to the content other than a nominal change to the version numbering. In such cases, feedback on the existing, published Unicode Standard Annex is still welcome, and will be taken into consideration by the Unicode Technical Committee before the release of Unicode 7.0.

Feedback on the Unicode Standard Annexes for Unicode 7.0 should be submitted by April 28, 2014, for review at the Unicode Technical Committee meeting in May.

Other planned changes for Unicode 7.0 include a large number of additions to the character repertoire and corresponding updates to the Unicode character properties. An announcement will be sent soon, when the beta version of the Unicode Character Database for Unicode 7.0 is available for comment.

Tuesday, January 28, 2014

PRI #265, UAX #29, Unicode Text Segmentation

This Unicode Standard Annex will be updated for Unicode 7.0. The proposed update is now available for general public review and comment.

The exception list for SpacingMark has been updated for Unicode 7.0. That list excludes specific characters from being assigned the Grapheme_Cluster_Break property value SpacingMark by default. 
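As background for reviewers, UAX #29 derives SpacingMark largely from the General_Category value Mc (spacing combining mark), and the exception list removes specific characters from that set. The sketch below, using Python's standard unicodedata module, only shows the starting point of that derivation. The function name is an illustrative assumption, the module does not expose Grapheme_Cluster_Break itself, and it does not apply the exception list.

```python
import unicodedata


def is_spacing_combining_mark(ch: str) -> bool:
    """Hypothetical helper: True if ch has General_Category Mc.

    This is only the starting point for SpacingMark; the exceptions
    listed in UAX #29 are not applied here.
    """
    return unicodedata.category(ch) == "Mc"


print(is_spacing_combining_mark("\u093e"))  # DEVANAGARI VOWEL SIGN AA
print(is_spacing_combining_mark("\u0301"))  # COMBINING ACUTE ACCENT (category Mn)
```

The authoritative property values are those in the Unicode Character Database files for the release under review.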

Please see the PRI page for details and instructions on how to review this issue and provide comments: