Thursday, February 2, 2012

UTS #10, Unicode Collation Algorithm, Version 6.1 Released

Mountain View, CA, USA – February 2, 2012 – The new version of Unicode Technical Standard #10, Unicode Collation Algorithm has been released, updating to Unicode Version 6.1.
This new version adds a number of features:
  • The collation ordering for the 732 new Unicode characters.
  • A major revision to the ordering of "variable" characters into groups, separating punctuation and symbols. This change may present migration issues for some implementations.
  • Options added for ignoring spaces and punctuation (but not symbols), and for reordering groupings of characters, such as putting Latin characters before Greek (for Greek users), or digits after letters.
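The two new options can be illustrated with a toy sort key. This is a minimal sketch, not the UCA: it uses Unicode general categories from Python's `unicodedata` as a stand-in for collation weights, drops spaces (Zs) and punctuation (P*) while keeping symbols (S*), and applies a hypothetical `GROUP_ORDER` that sorts letters before digits.

```python
import unicodedata

# Hypothetical reordering for illustration only: letters, then digits
# ("digits after letters"), then symbols; any other category goes last.
GROUP_ORDER = {"L": 0, "N": 1, "S": 2}

def sort_key(text: str, ignore_punct: bool = True):
    """Toy sort key illustrating two UCA options; NOT the real algorithm."""
    key = []
    for ch in text:
        cat = unicodedata.category(ch)  # e.g. "Lu", "Nd", "Pd", "Zs", "Sc"
        if ignore_punct and (cat[0] == "P" or cat == "Zs"):
            continue  # spaces and punctuation are ignored...
        # ...but symbols (category S*) still take part in the comparison.
        group = GROUP_ORDER.get(cat[0], 3)
        # casefold() is a crude stand-in for ignoring case differences.
        key.append((group, ch.casefold()))
    return key

print(sorted(["b-2", "a 1", "$5", "10"], key=sort_key))
```

With these assumptions, "co-op" and "coop" get identical keys, while "a+b" and "ab" do not, because "+" is a symbol rather than punctuation.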
There are also important improvements in documentation:
  • A new section on asymmetric search (where a query of the base character 'e' matches é, è,…, but a query of the more specific é doesn't match other accented versions or the base character).
  • Important restructuring and clarifications of other sections.
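The asymmetric-search behavior described above can be approximated character by character. This sketch assumes that NFD decomposition (base letter plus combining marks) is a good enough proxy for the primary/secondary distinction; UTS #10 defines it in terms of collation weights instead, and the helper names here are hypothetical.

```python
import unicodedata

def base_and_marks(ch: str):
    # NFD splits a precomposed letter such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0], decomposed[1:]

def char_matches(query_ch: str, text_ch: str) -> bool:
    q_base, q_marks = base_and_marks(query_ch)
    t_base, t_marks = base_and_marks(text_ch)
    if q_base.casefold() != t_base.casefold():
        return False
    # A bare query character matches any accented form of its base;
    # an accented query character must match its accents exactly.
    return not q_marks or q_marks == t_marks

def asymmetric_find(query: str, text: str) -> int:
    """Index of the first match in the NFC form of text, or -1 (toy version)."""
    query = unicodedata.normalize("NFC", query)
    text = unicodedata.normalize("NFC", text)
    n = len(query)
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        if all(char_matches(q, t) for q, t in zip(query, window)):
            return start
    return -1
```

Under this sketch, searching "my résumé" for "resume" succeeds at index 3, while searching "resume" for "résumé" fails, mirroring the 'e' versus 'é' example.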

Wednesday, February 1, 2012

UTS #46, Unicode IDNA Compatibility Processing, Version 6.1 Released

Mountain View, CA, USA – February 1, 2012 – The new version of Unicode Technical Standard #46, Unicode IDNA Compatibility Processing has been released, updating to Unicode Version 6.1. It adds support for 528 additional characters in internationalized domain names (IDN).
The specification provides two main features for use with the internationalized domain names specification released in August 2010 (IDNA2008):
  1. A comprehensive mapping to reflect user expectations for casing and other variants of domain names. This mapping is allowed by IDNA2008, and follows the same principles as in the previous version of that specification (IDNA2003). It thus provides users consistency between old and new versions.
  2. A compatibility mechanism that supports internationalized domain names valid under either the IDNA2003 or the IDNA2008 specification. This second feature allows browsers, search engines, and other clients to handle both old and new domain names during the transitional period until registries update their rules to follow IDNA2008.
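To give a feel for the mapping step, here is a rough sketch that approximates it with case folding plus NFKC and then converts non-ASCII labels with Python's built-in `punycode` codec. This is an assumption-laden simplification: real UTS #46 processing takes its mapping from the normative data table and also performs validity checks that are omitted here. Note that `casefold()` turns "ß" into "ss", which resembles transitional processing of that deviation character.

```python
import unicodedata

def map_label(label: str) -> str:
    # Approximation of the UTS #46 mapping step (NOT the normative table):
    # fold case, then apply compatibility normalization.
    return unicodedata.normalize("NFKC", label.casefold())

def to_ascii(domain: str) -> str:
    labels = []
    for label in domain.split("."):
        mapped = map_label(label)
        if mapped.isascii():
            labels.append(mapped)
        else:
            # Non-ASCII labels become "xn--" plus their Punycode form.
            labels.append("xn--" + mapped.encode("punycode").decode("ascii"))
    return ".".join(labels)

print(to_ascii("Bücher.Example"))
```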
UTS #46 supplies normative data tables that are synchronized with the latest version of the Unicode Standard, allowing implementations to update without recalculation. This new version also provides an "NV8" flag in the data files, making it easier for implementations to disable the compatibility mechanism.
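A sketch of how an implementation might read the data table and honor the NV8 flag follows. It assumes a semicolon-delimited layout of code point or range, status, optional mapping, and optional IDNA2008 flag, with "#" comments; the sample line in the usage note is made up to mirror that layout and is not quoted from the real file. Only the plain "valid" status is handled.

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    first: int
    last: int
    status: str
    mapping: str
    nv8: bool

def parse_line(line: str) -> Optional[Entry]:
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None                       # blank or comment-only line
    fields = [f.strip() for f in data.split(";")]
    first_s, _, last_s = fields[0].partition("..")
    first = int(first_s, 16)
    last = int(last_s, 16) if last_s else first
    mapping = ""
    if len(fields) > 2:
        mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())
    nv8 = len(fields) > 3 and fields[3] == "NV8"
    return Entry(first, last, fields[1], mapping, nv8)

def allowed(entry: Entry, strict_idna2008: bool) -> bool:
    # This sketch only handles the plain "valid" status; mapped, deviation,
    # and disallowed entries would need their own treatment.
    if entry.status != "valid":
        return False
    # NV8 marks characters that are not valid in IDNA2008, so turning the
    # compatibility mechanism off means rejecting them.
    return not (strict_idna2008 and entry.nv8)
```

For example, an illustrative line such as `2665 ; valid ; ; NV8 # BLACK HEART SUIT` parses to an entry that `allowed` accepts in compatibility mode and rejects when `strict_idna2008` is true.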

Tuesday, January 31, 2012

Announcing the Unicode Standard, Version 6.1

Mountain View, January 31, 2012. The Unicode Consortium announces the release of Version 6.1 of the Unicode Standard, continuing Unicode's long-term commitment to support the full diversity of languages around the world. This latest version adds characters to support additional languages of China, other Asian countries, and Africa. It also addresses educational needs in the Arabic-speaking world. A total of 732 new characters have been added. For full details, see http://www.unicode.org/versions/Unicode6.1.0/.

This version of the Standard also brings technical improvements to support implementers. Changes to property values and their aliases mean that properties now have easy-to-specify labels. The new labels, combined with a new script extensions property, mean that regular expressions can be more straightforward and easier to validate.
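The practical effect of script extensions can be sketched by hand, since Python's standard `re` module does not support `\p{...}` property syntax. The two tables below are tiny, hypothetical excerpts for illustration (real values come from the UCD's Scripts.txt and ScriptExtensions.txt); the point is that ARABIC TATWEEL, whose Script value is Common, still counts as Arabic when extensions are consulted.

```python
# Hypothetical excerpts, not authoritative data.
SCRIPT = {0x0041: "Latn", 0x0627: "Arab", 0x0640: "Zyyy"}
SCRIPT_EXTENSIONS = {0x0640: {"Arab", "Syrc"}}  # possibly incomplete

def has_script(ch: str, script: str, use_extensions: bool = True) -> bool:
    cp = ord(ch)
    if use_extensions and cp in SCRIPT_EXTENSIONS:
        return script in SCRIPT_EXTENSIONS[cp]
    # Without an explicit extension entry, fall back to the Script value.
    return SCRIPT.get(cp) == script

def all_in_script(text: str, script: str, use_extensions: bool = True) -> bool:
    return all(has_script(ch, script, use_extensions) for ch in text)
```

With these tables, an alef-tatweel-alef string passes the extension-aware check but fails a plain Script check.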

Over 200 new Standardized Variants have been added for emoji characters, allowing implementations to distinguish between a preferred text-style and emoji-style display. For example:

U+26FA U+FE0E   TENT, text style
U+26FA U+FE0F   TENT, emoji style
U+26FD U+FE0E   FUEL PUMP, text style
U+26FD U+FE0F   FUEL PUMP, emoji style
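These sequences are just the base character followed by VARIATION SELECTOR-15 (U+FE0E, text) or VARIATION SELECTOR-16 (U+FE0F, emoji). The small sketch below builds and describes them; the helper names are hypothetical, and it does not check whether a given base character actually has a standardized variant (that list lives in the UCD's StandardizedVariants.txt).

```python
import unicodedata

TEXT_STYLE = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_STYLE = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation

def styled(base: str, emoji: bool) -> str:
    return base + (EMOJI_STYLE if emoji else TEXT_STYLE)

def describe(seq: str) -> str:
    base, selector = seq[0], seq[1:]
    styles = {TEXT_STYLE: "text style", EMOJI_STYLE: "emoji style"}
    codes = " ".join(f"U+{ord(c):04X}" for c in seq)
    return f"{codes} {unicodedata.name(base)} {styles.get(selector, 'default')}"

print(describe(styled("\u26FA", emoji=False)))
```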

Among the notable property changes and additions in Unicode 6.1 are two new line break property values, which improve the line-breaking behavior of Hebrew and Japanese text. Segmentation behavior was also improved for Thai, Lao, and similar languages.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.1. These will be finalized in February:
  • UTS #10, Unicode Collation Algorithm
  • UTS #46, Unicode IDNA Compatibility Processing

Friday, January 6, 2012

Release candidate for Unicode 6.1 character data

Because Unicode is at the foundation of all modern software using text, it is important to verify that problems are not introduced with new versions. If your implementation uses Unicode data, please download and test the final release candidate of the Unicode 6.1 data (UCD) with your implementation now. Please note that the Unicode Collation Algorithm (UCA) and the Unicode IDNA Compatibility Processing have also been updated for version 6.1; if you have an implementation of them, please check the data below as well.

That data can be found in:
  1. Unicode
    1. http://unicode.org/Public/6.1.0/ucd/ (data, semicolon-delimited)
    2. http://unicode.org/Public/6.1.0/ucdxml/ (data, xml)
    3. http://www.unicode.org/reports/tr44/proposed.html (documentation)
  2. UCA
    1. http://unicode.org/Public/UCA/6.1.0/ (data)
    2. http://www.unicode.org/reports/tr10/proposed.html (documentation)
  3. IDNA compatibility
    1. http://unicode.org/Public/idna/6.1.0/ (data)
    2. http://www.unicode.org/reports/tr46/proposed.html (documentation)
For more information, see http://unicode.org/versions/beta.html.
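One way to look for regressions is to diff the old and new semicolon-delimited data. This minimal sketch assumes UnicodeData.txt-style lines (code point; name; general category; ...) and compares only the name and general category; it does not expand the "First"/"Last" range pairs that file uses for large blocks.

```python
def parse_unicode_data(lines):
    """Map code point -> (name, general category) from UnicodeData.txt-style lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        # Range pairs such as "<CJK Ideograph, First>" are kept as single entries.
        table[int(fields[0], 16)] = (fields[1], fields[2])
    return table

def diff_versions(old, new):
    """Return (added, removed, changed) code points, each sorted."""
    added = sorted(cp for cp in new if cp not in old)
    removed = sorted(cp for cp in old if cp not in new)
    changed = sorted(cp for cp in old if cp in new and old[cp] != new[cp])
    return added, removed, changed
```

In practice `old` and `new` would be parsed from the 6.0 and 6.1 release files, and every entry in `removed` or `changed` deserves a close look.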
Note that at this point in the process, no substantive changes can be made unless:
  1. a problem is found in carrying out the actions directed by the Unicode Technical Committee for the release, or
  2. an editorial problem is found in the data comments or documentation.
The Unicode Consortium is planning to move up the release date of Unicode 6.1 (UCD and UAXes) to January instead of February, so any final comments should be made by January 6th. You can send your comments using the Contact Form (http://www.unicode.org/reporting.html).

The draft code charts for Unicode 6.1 have also been updated. We encourage users to check the code charts carefully to verify correctness of the new characters added to Unicode 6.1 and to ensure that there are no regressions in glyph shapes for previously encoded characters. For links to the charts, see http://unicode.org/versions/beta.html.

Tuesday, December 13, 2011

Two New Public Review Issues: UTR #36, UTS #39

The Unicode Technical Committee has posted two new issues for public
review and comment. Details are on the following web page:

http://www.unicode.org/review/

Review periods for the new items close on January 30, 2012.

Please see the page for links to discussion and relevant documents.
Briefly, the new issues are:

Issue #208 Proposed Update UTR #36: Unicode Security Considerations
http://www.unicode.org/review/pri208/

This UTR is being prepared for an update to bring the IDNA2008
references up to date. Public review and comment is invited on this draft.

Issue #209 Proposed Update UTS #39: Unicode Security Mechanisms
http://www.unicode.org/review/pri209/

This UTS is being prepared for an update to align with Unicode 6.1.
Public review and comment is invited on this draft.


To supply feedback on these issues, see
http://www.unicode.org/review/#feedback .
