Tuesday, May 6, 2025

Unicode Technology Workshop 2025 — Call for Submissions Now Open!

📣 Call for Submissions Now Open!

Unicode is pleased to announce that session proposals for UTW 2025 are now being accepted!

We are seeking proposals for workshops, seminars, case studies, and tutorials that center around:

Unicode i18n libraries
Locale data frameworks
Globalization tooling
Localization pipelines
Input methods
Character encoding
Text rendering
…and more!

Tutorial topics might include font design and Unicode properties, an introduction to software internationalization (i18n), and how best to support bidirectional text.

Come connect with other Unicode users, share your knowledge and experience, and help us envision the future of Unicode technology. You will come away with deeper knowledge on how to solve tough problems in the i18n and l10n space and how to engineer products that work better for global users. Program and product managers who work with engineering teams are also strongly encouraged to join and propose sessions.

The deadline for submissions is June 30, 2025, at 5:00 PM PT. Proposals will be reviewed in July, and session hosts will be notified in late July.

‼️Note: To encourage maximum collaboration amongst the attendees, this is an in-person-only event.

🗓️ Mark Your Calendars for Key Dates!

By May 16 - Early Bird Registration for Tutorials and UTW 2025 Opens

June 30 - Call for Submissions Closes - All Proposals, including Tutorials, Due

July 21 - Program Committee Notifications Go Out

August 11 - Early Bird Registration for Tutorials and UTW 2025 Closes

August 12 - Regular Registration for Tutorials and UTW 2025 Opens

See you there!

🫶 Sponsorship Opportunities

Sponsorship opportunities are available at various levels. Sponsorship benefits include complimentary registrations, opportunities to lead a session or workshop, recognition on the event website, program and event materials, visibility on social media, and much more. Specific offerings vary by sponsorship level.

If you want to demonstrate your industry leadership, enhance your brand, share your knowledge, promote your products and services, and foster community building, contact events@unicode.org today to learn more. Sponsorship discounts are available to Unicode Full and Supporting Members.

If you have any questions, please contact us at UTW2025@unicode.org.

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Monday, May 5, 2025

Unicode CLDR Version 48: Submission Open

The Unicode CLDR Survey Tool is open for submission for version 48. CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). All major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

Version 48 is focusing on:

Unicode 17 additions: new emoji, script names, …
Changes to the root and/or English names of many exemplar cities and some metazones
Additional number and date formats:
- New “relative” variant for date-time combining pattern
- Two new currency formats
- Rational number formats
- New ‘Year-First’ calendar formatting for year-month-day order (Gregorian).
Units:
- New units for languages in modern coverage
- Reworking certain concentration units
New Languages available for submission in Survey Tool:
- Buryat (bua)
- Coptic (cop)
- Haitian Creole (ht)
- Kazakh (Latin) (kk-Latn)
- Laz (lzz)
- Luri Bakhtiari (bqi)
- Nselxcin (Okanagan) (oka)
- Pāli (pi)
- Piedmontese (pms)
- Q’eqchi’ (kek)
- Samogitian (sgs)
- Sunuwar (suz)
- Chinese (Latin) (zh-Latn)

Submission of new data opened recently and is slated to finish on June 11. The new data then enters a vetting phase, where contributors work out which of the supplied data for each field is best. That vetting phase is slated to finish on June 30. A public alpha makes the draft data available in early August, and the final release targets mid-October.

Each new locale starts with a small set of Core Data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the next submission cycle.

Once a language reaches Basic coverage, it has the minimum support for use in language selection, such as on mobile devices. In the next submission cycle, the name for that language is also added for translation for all languages at Modern coverage.

If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.

Highlights from UTC #183

By Peter Constable, Chair of UTC

Unicode Technical Committee (UTC) meeting #183 was held April 22 – 24. Thanks to member company Microsoft for hosting at its Mountain View, CA campus. Here are some highlights.

Unicode 17.0 Beta

Unicode 17.0 is scheduled for release in September of this year. At UTC #183, technical decisions were taken for updates to be reflected in the Beta release, which will be available for public review later this month.

The most significant changes affecting Unicode 17.0 are the encoding of 14 additional characters and the removal of three characters that had been planned:

A new currency symbol, SAUDI RIYAL SIGN, was proposed by the Saudi Central Bank and will be added to Unicode 17.0. This has been assigned to code point U+20C1.

Note: We know that many vendors will want to implement support for this quickly. Keep in mind that, while it's unlikely that the code point will change, this isn't completely guaranteed until Unicode 17.0 is finalized at the next UTC meeting, in July.
For more background, see a recent Unicode Blog article, Support for the New Saudi Riyal Currency Symbol.
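For implementers who want to check whether their platform already recognizes the new code point, a minimal Python sketch follows. It assumes only the standard library. Whether a character name is returned depends on the Unicode version bundled with the Python build, so the lookup uses a fallback rather than assuming Unicode 17.0 data is present.

```python
import unicodedata

# Code point assigned to SAUDI RIYAL SIGN for Unicode 17.0.
SAUDI_RIYAL_SIGN = chr(0x20C1)

# The UTF-8 encoding depends only on the code point value, not on the
# Unicode version the runtime knows about.
utf8_bytes = SAUDI_RIYAL_SIGN.encode("utf-8")

# The name lookup falls back to a placeholder when the bundled Unicode
# Character Database predates the assignment.
name = unicodedata.name(SAUDI_RIYAL_SIGN, "<unassigned in this Unicode version>")

print(f"U+{ord(SAUDI_RIYAL_SIGN):04X}", utf8_bytes.hex(" "), name)
print("Bundled Unicode version:", unicodedata.unidata_version)
```

Because the code point is not guaranteed until Unicode 17.0 is finalized, a check like this is best treated as a diagnostic rather than something to hard-code into shipping logic.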

Thirteen new CJK unified ideographs will be added, twelve of which are needed for use in China. These were reviewed by experts in the Ideographic Research Group (IRG—a working group within ISO/IEC JTC 1/SC2), who recommended immediate encoding. For more information, see Sections 25 and 27 of the CJK & Unihan Working Group recommendations (L2/25-090).

Three characters that were to be newly added have been removed. The Unicode 17.0 Alpha included the addition of Sidetic script, with 29 characters. (Sidetic is a historic script used in ancient Anatolia.) Based on expert feedback during the Alpha review, three of the characters were deemed not ready for encoding, and so will be removed from Unicode 17.0. Hence, the Beta will include only 26 Sidetic characters.

With these repertoire changes, Unicode 17.0 Beta will include 4,847 new characters.

There were other notable changes related to CJK Unified Ideographs. Thanks to ongoing research by IRG experts, a number of corrections will be made affecting already-encoded ideographs, including changes to the region-specific glyphs shown in the code charts and to source references (the details that map CJK Unified Ideographs to the specific ideograph forms used in different regions). One significant change being made is the horizontal extension of 2,145 existing CJK Unified Ideographs with the addition of glyphs and source data for those characters reflecting use in China. For details, see section 28 of L2/25-090.

Operational criteria for security-related classification of characters

One Unicode specification, UTS #39, Unicode Security Mechanisms, provides guidance on Unicode characters that should or should not be used in identifier systems where security is an issue, such as Internet domain names. It defines a General Security Profile for identifiers, which gives all Unicode characters a status of allowed or restricted. This is based on a classification of characters by a character property, Identifier_Type.
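The relationship between Identifier_Type and the allowed or restricted status can be sketched in Python. The sample assignments below are illustrative placeholders rather than values extracted from the published data files, and the helper names are hypothetical. The rule that only the Recommended and Inclusion types yield an allowed status is an assumption based on UTS #39 and should be verified against the current text of the specification.

```python
# Identifier_Type values that give a character the "allowed" status.
# Assumption: only these two types are allowed; every other type
# (for example Obsolete, Technical, Not_XID) is restricted.
ALLOWED_TYPES = frozenset({"Recommended", "Inclusion"})

# Hypothetical sample assignments for illustration only. Real values come
# from the Identifier_Type data published with UTS #39.
SAMPLE_IDENTIFIER_TYPES = {
    "a": {"Recommended"},
    "\u00B7": {"Inclusion"},   # MIDDLE DOT
    "\u2160": {"Not_NFKC"},    # ROMAN NUMERAL ONE
}


def identifier_status(char: str) -> str:
    """Return 'allowed' or 'restricted' for a character in the sample table."""
    types = SAMPLE_IDENTIFIER_TYPES.get(char)
    if types is None:
        raise KeyError(f"no sample Identifier_Type for U+{ord(char):04X}")
    # Allowed only if every type assigned to the character is an allowed type.
    return "allowed" if types <= ALLOWED_TYPES else "restricted"


def is_allowed_identifier(text: str) -> bool:
    """True when every character of the identifier has the allowed status."""
    return all(identifier_status(ch) == "allowed" for ch in text)


print(identifier_status("a"))
print(identifier_status("\u2160"))
print(is_allowed_identifier("a\u00B7a"))
```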

Up to now, there has been a basic description of the different Identifier_Type values, but not detailed operational criteria for assigning characters to the various types. UTC reviewed a proposal for such operational criteria—see L2/25-069, Factors used in determining the Identifier_Type of characters. These criteria were informed by work done in ICANN in defining rules used for determining permitted DNS and second-level domain name labels. UTC approved these criteria to be incorporated into UTS #39 and used for this purpose going forward.

Related to this, the Identifier_Type classifications of over 1,000 characters will be revised in Unicode 17.0, in line with these criteria. (Similar changes were made during UTC #182 for a large number of CJK Unified Ideographs.)

New Unicode Technical Standards in development

In my earlier email on highlights from UTC #182, I mentioned two technical documents in early stages of development that were available for public review:

PRI #509, Proposed Draft UTS #58, Unicode Link Detection and Serialization
PRI #510, Proposed Draft UTR #59, East Asian Spacing

UTC #183 advanced both of these from Proposed Draft to Draft status.

Also, the specification for East Asian spacing will be changed from a Unicode Technical Report (UTR) to a Unicode Technical Standard (UTS). Technical reports are used to provide technical information, which could include potential algorithms that could be useful for implementations. But they are not used as a basis for specifying data or algorithms where interoperability between implementations is required. As pointed out in document L2/25-138, this new Unicode technical document will be referenced by CSS specifications for the text-autospace property, which is in development and being implemented in browsers. Hence, it is appropriate for this Unicode document to be designated as a UTS.

In addition, UTC reviewed a proposal for another UTS and authorized its development: Proposed Draft UTS #61, Unicode Set Notation. Unicode specs for properties and algorithms often need to refer to sets of code points or strings using property assignments. Certain conventions have been used in UTC specs as well as in certain Unicode-provided tools and implementations, including the Unicode Utilities and ICU, and in the Unicode CLDR LDML spec. However, the conventions used in these various contexts have not been mutually consistent and interoperable. The proposed new UTS is a first step toward convergence of the conventions across these contexts. The proposed draft UTS has been posted for public review, and UTC invites feedback on it:

PRI #523, Proposed Draft UTS #61, Unicode Set Notation
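To make the idea of a set defined by a property assignment concrete, here is a small Python sketch that collects the code points in a range whose General_Category is Sc (Currency_Symbol). It uses only the standard unicodedata module and does not implement the notation proposed in UTS #61. The helper name code_points_with_category is hypothetical.

```python
import unicodedata


def code_points_with_category(category: str, start: int, end: int) -> set[int]:
    """Collect code points in [start, end] whose General_Category matches."""
    return {
        cp for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) == category
    }


# Currency symbols in the Basic Multilingual Plane, according to the
# Unicode version bundled with this Python build.
currency_symbols = code_points_with_category("Sc", 0x0000, 0xFFFF)

print(len(currency_symbols), "currency symbols found")
print("Includes U+0024 DOLLAR SIGN:", 0x24 in currency_symbols)
```

The size of the resulting set varies with the Unicode version in use, which is one reason a shared, well-defined notation for such sets matters across tools.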

Note: some working group reports are referred to for background details, but be sure to check the minutes for definitive outcomes, which sometimes differ from what working groups recommended. For complete details, see the draft UTC #183 minutes.

Internationalization & Unicode Technologies: Learn the Concepts, Apply the Tools. @LocWorld Malmö

We are pleased to announce that the Unicode Consortium will be onsite at LocWorld53 from June 3-5, 2025 in Malmö, Sweden. LocWorld is a premier conference for localization professionals, networking, and industry innovation. We hope to see you there!

On June 3rd, Unicode will offer two training sessions designed specifically for localization specialists. These sessions provide a comprehensive introduction to software internationalization (i18n) and Unicode technologies.

While each session stands independently, they are also complementary. Session 1 offers a beginner-friendly overview, while Session 2 dives deeper into practical implementation. Whether you're new to i18n or looking to refine your skills, these sessions will equip you with the knowledge and tools to collaborate effectively with developers and create globally accessible software.

June 3rd - Global Toolbox Sessions Highlights

Discount available for attendees from Unicode Organizational Members. More details below.

Registration for LocWorld Malmö is not required to attend a Global Toolbox session.

Session 1

A Friendly Introduction to Software Internationalization (i18n) and Unicode Technologies

Ideal for: Localization Program & Project Managers (15-40 attendees)

Join us for an engaging session that simplifies the complex world of software internationalization (i18n) and Unicode technologies. This session is tailored for non-technical localization specialists who want to bridge the gap between their expertise and the needs of software development teams. By the end, you’ll feel empowered to contribute to the creation of globally accessible, easily localizable applications and services.

Why You Should Attend

Whether you’re new to i18n or looking to deepen your understanding, this session will demystify the topic and leave you equipped to champion internationalization in your organization. Plus, you'll get practical tips and insights straight from a Spotify expert!

Ready to expand your horizons? Reserve your spot today and take the first step toward mastering software i18n!

Session 2

Practical Software Internationalization (i18n) with Unicode Technologies

Ideal for: Localization Program & Project Managers (15-40 attendees)

Take your internationalization (i18n) knowledge to the next level with this practical, hands-on session focused on applying Unicode i18n technologies. Whether you’re continuing from Session 1 or joining as a standalone attendee, this session is perfect for localization specialists looking to guide developers in implementing effective i18n solutions.

Why You Should Attend

Localization is about more than just translating words—it’s about creating seamless, culturally relevant experiences. This session breaks down the practical side of i18n and gives you the tools to turn concepts into action. Whether you're tackling a global project or simply want to enhance your skill set, this session equips you with the knowledge to make an impact.

Ready to get practical? Sign up today and learn how to bring software i18n to life with confidence!

Pricing and Member Discounts

Attendees from Unicode Organizational Members are offered a discount: €250 for each session, or €450 for both (for a single attendee). The cost for non-members is €300 for each session, or €550 for both (for a single attendee). The discount is also available to Unicode individual members and to regular contributors and volunteers of the Unicode Technical Committees and Working Groups. If you are eligible, please contact jill@localizationinstitute.com from your company email address for your discount code. Registration for LocWorld is not required.

About Our Session Host: Joel Sahleen

Joel Sahleen most recently led the Internationalization Engineering team at Spotify, and is the volunteer lead for the Unicode Education Initiative. Trained as a classical Chinese linguist and a cross-cultural ethical theorist, he has worked for the last decade and a half in the field of software internationalization and localization as an engineer, an architect, a team lead, and a manager. A regular speaker at both internationalization and localization conferences, he is passionate about the technical, practical, and ethical aspects of developing software for a global audience.

If you have any questions, please contact us at events@unicode.org.

Sunday, April 27, 2025

Call for Submissions Open! -- So You Have an Idea for an Emoji? That’s Amazing!

First of all—congratulations on even thinking about contributing a new emoji to the world! The fact that you're curious about the Unicode emoji proposal process means you're already part of a fun, creative, and meaningful journey.

The Unicode Consortium is the organization that makes sure emojis work across all devices and platforms. Every year, the Emoji Standards & Research Working Group (ESR) reviews proposals from people just like you—yes, anyone can submit an emoji idea! Whether it’s a symbol that represents your culture, a concept you feel is missing, or something universally relatable, the process is open and inclusive.

A strong proposal includes data to support the emoji’s relevance (like search trends and usage patterns), a compelling explanation of its significance, and mockups showing how it might look. It’s not just about a cool idea—it’s about making a case for why this emoji will resonate with people around the world.

Jennifer Daniel, Chair of the ESR, highlights both the dos and don’ts of this process with wit and insight, reminding us that while emoji are fun, their selection is thoughtful and rigorous. The emoji you propose today could be something people everywhere use in a few years to laugh, cry, or express love, one tiny symbol at a time.

Emoji proposals are being accepted until 2025-07-31. Please review the Guidelines for submitting Unicode® Emoji Proposals and submit your proposals here.

Thursday, March 13, 2025

ICU 77 Now Available!

Unicode® ICU 77 has just been released. ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

ICU 77 updates to CLDR 47 (beta blog) locale data with new locales, and various additions and corrections.

ICU 77 is mostly focused on bug fixes, segmentation conformance, and other refinements.

The Java technology preview implementation of the CLDR MessageFormat 2.0 specification has been updated to incorporate the CLDR 46.1 spec plus most but not all of the CLDR 47 changes.

The C++ technology preview implementation of MessageFormat 2.0 is not yet quite up to date with CLDR 46.1.

Please note that for ICU 78 (October 2025) we are planning to (a) upgrade from Java 8 to Java 11, and (b) remove the ICU4J Locale Service Provider. See the ICU 77 page for details.

Unicode CLDR 47 Release: MessageFormat 2.0 Stable

CLDR 47 is now available and has been integrated into version 77 of ICU. The CLDR 47 release page has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues including upcoming changes planned in CLDR 48.

The Unicode CLDR project provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Key changes in CLDR 47

CLDR 47 did not have a Survey Tool submission phase, and focused on tooling and just a few functional areas. The biggest change is that the MessageFormat 2.0 specification has advanced from Final Candidate to Stable. This means that the stability guarantees are in place and implementations can finalize their APIs.

MessageFormat 2.0 Stable

Software needs to construct messages that incorporate various pieces of information. The complexities of the world's languages make this challenging. MessageFormat 2.0 enables developers and translators to create natural-sounding user interfaces that can appear in any language and support the needs of various cultures.

The new MessageFormat defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages, software libraries, and software localization tooling. It enables the integration of internationalization APIs (such as date or number formats) and grammatical matching (such as plurals or genders). It is extensible, allowing software developers to create formatting or message selection logic that adds on to the core capabilities. Its data model provides the means of representing existing syntaxes, thus enabling gradual adoption by users of older formatting systems.
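The idea of grammatical matching with a fallback variant can be illustrated with a short Python sketch. This is not an implementation of MessageFormat 2.0 and does not use its syntax. The function and variable names are hypothetical, and the plural rule is a simplified English-only stand-in for the CLDR plural rules.

```python
def plural_category_en(count: int) -> str:
    """Simplified English cardinal rule: 'one' for exactly 1, else 'other'."""
    return "one" if count == 1 else "other"


# Message variants keyed by plural category; "*" is the catch-all fallback
# used when no more specific variant matches.
VARIANTS = {
    "one": "You have {count} notification.",
    "*": "You have {count} notifications.",
}


def format_message(count: int) -> str:
    """Select the variant for the count's plural category, then fill it in."""
    key = plural_category_en(count)
    pattern = VARIANTS.get(key, VARIANTS["*"])
    return pattern.format(count=count)


print(format_message(1))
print(format_message(5))
```

Other languages have more plural categories than English, so a translator would supply a different set of variants for the same message while the calling code stays unchanged.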

Tech Preview implementations are available in C++, Java, and JavaScript:

ICU4J, Java: com.ibm.icu.message2, part of ICU 76, is a tech preview implementation of MessageFormat 2.0, together with a formatting API. See the ICU User Guide for examples and a quickstart guide, and Trying MF 2.0 Final Candidate to try a “Hello World”.
ICU4C, C++: icu::message2::MessageFormatter, part of ICU 76, is a tech preview implementation of MessageFormat 2.0, together with a formatting API. See the ICU User Guide for examples and a quickstart guide, and Trying MF 2.0 Final Candidate to try a “Hello World”.
JavaScript: messageformat 4.0 provides a formatter and conversion tools for the MessageFormat 2 syntax, together with a polyfill of the runtime API proposed for ECMA-402.

(Because of the timing, these implement a slightly earlier version of the spec, but can be used for initial evaluation, testing, and experimentation.)

Tooling changes

Many tooling changes are difficult to accommodate in a data-submission release, including performance work and UI improvements. The changes in CLDR 47 provide faster turn-around for linguists and higher data quality. They are targeted at the CLDR 48 submission period, starting in April 2025.

For more information

See the CLDR 47 release page, which has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.
