Wednesday, October 8, 2014

Unicode Version 7.0 - Complete Text of the Core Specification Published

Mountain View, CA, October 8, 2014 - The Unicode® Consortium announces the publication of the core specification for Unicode 7.0. The Version 7.0 core specification contains significant changes:
  • Major reorganization of the chapters and overall layout
  • New page size tailored for easy viewing on e-readers and other mobile devices
  • Addition of twenty-two new scripts and a shorthand writing system
  • Alignment with updates to the Unicode Bidirectional Algorithm
In Version 7.0, the standard grew by 2,834 characters. This version continues the Unicode Consortium's long-term commitment to support the full diversity of languages around the world with its newly encoded scripts and other additional characters. The text of the latest version documents two newly adopted currency symbols: the manat, used in Azerbaijan, and the ruble, used in Russia and other countries. It also includes information about newly added pictographic symbols, geometric symbols, arrows and ornaments.
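
As a rough illustration of what the new repertoire means for implementers, the sketch below checks whether a Python runtime's character database is at least Unicode 7.0 and looks up the two new currency symbols. The code points U+20BC and U+20BD are not stated in the announcement itself, and the helper name `supports_unicode_7` is purely illustrative.

```python
import unicodedata


def supports_unicode_7() -> bool:
    """Return True if this runtime's character database is Unicode 7.0 or later."""
    major, minor = (int(part) for part in unicodedata.unidata_version.split(".")[:2])
    return (major, minor) >= (7, 0)


print("Character database version:", unicodedata.unidata_version)
print("Unicode 7.0 or later:", supports_unicode_7())

# U+20BC and U+20BD are the manat and ruble signs added in Unicode 7.0.
# On an older database the lookup falls back to the placeholder text.
for ch in ("\u20BC", "\u20BD"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<not assigned in this runtime>"))
```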

This version of the Standard brings technical improvements to support implementers, including further clarification of the case pair stability policy, and a new stability policy for Numeric_Type.

All other components of Unicode 7.0 were released on June 16, 2014: the Unicode Standard Annexes, code charts, and the Unicode Character Database, to allow vendors to update their implementations of Unicode 7.0 as early as possible. The release of the core specification completes the definitive documentation of the Unicode Standard, Version 7.0.

For more information on all of The Unicode Standard, Version 7.0, see

Thursday, September 25, 2014

Updated Unicode Security Specifications and Guidelines

The major Unicode security-related specifications and guidelines have been updated for Unicode 7.0. The security-related data files have undergone a major revision to improve their algorithmic consistency, as well as to take into account new information about confusable character data. We strongly advise that implementations be updated to make use of this new data. Pay particular attention to persistent data stores, such as database indexes, that use strings folded with the previous version of the data files. Mixing strings folded with new and old data files in the same persistent store will likely cause failures. It may be necessary to provide APIs for both old and new folding during a migration.
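
One way to picture the migration advice above is an index that records which version of the folding data produced each stored key. The following is a minimal sketch under stated assumptions, not the Consortium's method: `fold_old` and `fold_new` are toy stand-ins for folding with the previous and revised data files (they do not read the real confusables data), and `VersionedIndex` is a hypothetical in-memory substitute for a persistent store.

```python
from typing import Callable, Dict, List, Optional, Tuple

Folder = Callable[[str], str]


def fold_old(text: str) -> str:
    # Toy stand-in for folding with the previous data files.
    return text.casefold()


def fold_new(text: str) -> str:
    # Toy stand-in for folding with the revised data files: it additionally
    # maps CYRILLIC SMALL LETTER A (U+0430) to LATIN SMALL LETTER A.
    return text.casefold().replace("\u0430", "a")


class VersionedIndex:
    """Hypothetical store whose keys are tagged with the folding version used."""

    def __init__(self, folders: Dict[str, Folder], current: str) -> None:
        self.folders = folders
        self.current = current
        # (folding version, folded key) -> original string
        self.rows: Dict[Tuple[str, str], str] = {}

    def add(self, text: str, version: Optional[str] = None) -> None:
        version = version or self.current
        self.rows[(version, self.folders[version](text))] = text

    def lookup(self, text: str) -> Optional[str]:
        # Fold the query once per known version so that old rows are only
        # ever compared against keys produced by the same folding.
        versions: List[str] = [self.current]
        versions += [v for v in self.folders if v != self.current]
        for version in versions:
            hit = self.rows.get((version, self.folders[version](text)))
            if hit is not None:
                return hit
        return None

    def migrate(self) -> int:
        """Re-fold rows written with an older version; return how many moved."""
        stale = [(key, value) for key, value in self.rows.items()
                 if key[0] != self.current]
        for key, original in stale:
            del self.rows[key]
            self.add(original)
        return len(stale)


index = VersionedIndex({"old": fold_old, "new": fold_new}, current="new")
index.add("p\u0430ypal", version="old")  # row written before the data update
print(index.lookup("paypal"))            # the old row is not matched by the new folding
print(index.migrate())                   # number of rows re-folded
print(index.lookup("paypal"))            # matched once every row uses one folding
```

Tagging each row with its version is what allows both foldings to stay available during the transition. A real migration would also have to decide what to do when two stored strings collide under the new folding, which this sketch simply overwrites.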

The guidelines have also been updated with descriptions of additional security issues. In particular, it is now clear that display of Punycode URLs as a security measure can, in some circumstances, actually make the spoofing problem worse.

[Image: Punycode spoofing]

For details, see:

Unicode Security Considerations:
Unicode Security Mechanisms:

Wednesday, September 24, 2014

Proposed Update UAXes for Unicode 8.0

Proposed updates for several of the Unicode Standard Annexes for Version 8.0 of the Unicode Standard have been posted for public review. See for details and links to the various documents.

UTS #10, Unicode Collation Algorithm, has also been posted for public review. In this update, Cyrillic contractions have been removed. See the Modifications section of the draft document for further information.

The review period for feedback on these proposed updates closes on October 20, 2014, in time for the November UTC meeting, but there will be further opportunities for feedback on the annexes after that meeting.

To supply feedback on these issues, please see

Monday, September 22, 2014

New version of UTR #50, Unicode Vertical Text Layout, released

A new revision of UTR #50, Unicode Vertical Text Layout, has been released. The data tables have been updated to bring them into line with Unicode 7.0. A few additional changes to the property values were made, mainly for consistency across similar characters.

Thursday, September 18, 2014

CLDR Version 26 Released

Unicode CLDR 26 has been released, providing an update to the key building blocks for software supporting the world's languages. This data is used by a wide spectrum of companies for software internationalization and localization, adapting software to the conventions of different languages for common software tasks. This release focused primarily on Unicode 7.0 compatibility, Survey Tool improvements, increased coverage, new units, and improvements to collation and RBNF. Changes include the following:
  • Data Growth: Major increase in the number of translations, with 77 locales now reaching the 100% modern coverage level, and an overall growth of about 20% in data.
  • Units: Added 72 new units, display names for all units, and a new perUnitPattern (e.g., liters per second).
  • Collation: Updated collation (sorting) to Unicode 7.0, moved Unihan radical-stroke collation into root to avoid duplication, used import to reduce source size by 23% and ease maintenance. Major changes to Arabic collation.
  • Spell-out numbers: improvements for round-trip fidelity; new syntax for use of plural categories.
  • Specification: documented new structure, \x{h...h} syntax for Unicode code points; construction of “unit per unit” formats; clarified BCP47 and Unicode identifiers, and different kinds of locale lookup, matching, and inheritance.
  • Survey Tool: Major improvements to the UI to make it easier and faster to enter and check data.
Details are provided in, along with a detailed Migration section.
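
To make the \x{h...h} notation mentioned under Specification concrete, here is a small sketch that expands such escapes into the characters they name. It assumes one to six hexadecimal digits and rejects surrogates and values above U+10FFFF; the CLDR specification is the authority on the exact grammar, and the function name `expand_code_point_escapes` is hypothetical.

```python
import re

# Assumption: one to six hex digits between the braces.
_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]{1,6})\}")


def expand_code_point_escapes(text: str) -> str:
    """Replace each \\x{h...h} escape in text with the code point it names."""

    def _replace(match: "re.Match[str]") -> str:
        value = int(match.group(1), 16)
        # Reject values outside the Unicode code space and surrogate code points.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"not a valid Unicode scalar value: {match.group(0)}")
        return chr(value)

    return _ESCAPE.sub(_replace, text)


print(expand_code_point_escapes(r"\x{41}\x{42}\x{43}"))
```

Text that does not match the escape pattern is passed through unchanged, so the function can be applied to whole strings that mix literal characters and escapes.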