BabelStone Blog

Saturday, 26 May 2012

What's new in Unicode 6.2 ?

The answer to the question "What's new in Unicode 6.2 ?" is rather short : U+20BA ₺ TURKISH LIRA SIGN.


Yep, that's it, just a single new character. The Unicode Technical Committee (UTC) decided earlier this month to fast track the encoding of the recently announced Turkish Lira currency symbol, as it had previously done with the newly invented Indian Rupee Sign ₹ (U+20B9, added to Unicode 6.0 in 2010) and the Euro Sign € (U+20AC, one of only two characters added to Unicode 2.1 in 1998 [kudos to anyone who knows what the other character was, and a special prize to anyone who has ever had cause to use it]). However, whereas the Indian Rupee Sign was fast tracked into an already scheduled release, the Turkish Lira Sign has the dubious honour of being the first ever character to be given an entirely new version of Unicode all to itself, Unicode 6.2, which will probably be released in late September or early October 2012. This also means that 2012 will be the first ever year during which more than one major or minor version of Unicode will have been released.

Unicode releases are normally coordinated with publications of new editions or amendments to the corresponding international standard, ISO/IEC 10646 (see Unicode and ISO/IEC 10646 for details of the relationship between these two standards), but the next amendment to ISO/IEC 10646:2012 (i.e. Amendment 1, covering Linear A, Palmyrene, Manichaean, Khojki, Khudawadi, Bassa Vah, Duployan, and additional Wingdings symbols) isn't scheduled to start its final ballot until the end of this year, so a version of Unicode corresponding to Amendment 1 could not be released until spring 2013. In order to meet expected demand to use the newly devised currency sign as soon as possible, the UTC therefore decided not to wait until the next anticipated version of Unicode next year, but instead to release a new version especially for the Turkish Lira Sign, on the assumption that the character is uncontroversial and will be accepted into ISO/IEC 10646 anyway. Of course this puts the ISO committee (WG2) in a slightly awkward position, as the ISO/IEC 10646 and Unicode repertoires need to be identical (and preferably synchronised), but Unicode 6.2 will probably be published before the committee even has a chance to discuss the proposal for the first time at its next meeting in October, and so, faced with a fait accompli by the UTC, it will have to accept the Turkish Lira Sign into ISO/IEC 10646 at the earliest opportunity regardless of what individual national body members of the committee may think of the new currency symbol.
And as the UTC is looking into ways of making quicker releases of Unicode in response to industry demand to encode urgent-use characters, perhaps we will see more intercalary releases of Unicode with only one or two character additions in the future (there are probably some people who are looking forward to an accelerated release of Unicode 6.3 to meet the demand for the New Greek Drachma Sign, but that might be more controversial given the existence of the unused and unloved Drachma Sign ₯ at U+20AF [not to be confused with the ancient Greek Drachma Sign 𐅻 at U+1017B]).

The broader Unicode community did not all agree with the assessment that this was an uncontroversial addition, and a tsunami of emails has engulfed the Unicode mailing list since the initial announcement on 15 May. I don't want to be drawn into this futile argument, but if you want to start using the Turkish Lira Sign today, you can, as it is already included in Michael Everson's free Rupakara font. And if you are eager to take a closer look at Unicode 6.2, then I have just released beta versions of BabelPad and BabelMap that support Unicode 6.2 (NB the Unicode 6.2 data incorporated into BabelMap and BabelPad is provisional and subject to change before Unicode 6.2 is officially released, and so should not yet be relied on).
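If you would rather check from code whether a given environment already knows about the new character, something like the following rough Python sketch may help. It relies only on the standard `unicodedata` module; the helper name `knows_turkish_lira_sign` is my own invention for illustration, and the result depends entirely on which version of the Unicode database your Python was built with.

```python
import unicodedata


def knows_turkish_lira_sign() -> bool:
    """Return True if this Python's Unicode database has a name for U+20BA.

    Hypothetical helper for illustration: unassigned code points have no
    name, so unicodedata.name() falls back to the default (None here).
    """
    return unicodedata.name("\u20ba", None) is not None


# Version of the Unicode Character Database bundled with this interpreter.
print("Unicode database version:", unicodedata.unidata_version)
print("U+20BA is known:", knows_turkish_lira_sign())

if knows_turkish_lira_sign():
    # Currency signs carry the general category Sc (Symbol, currency).
    print("General category:", unicodedata.category("\u20ba"))
```

On a database older than 6.2 the helper should simply report `False`, which is a reasonable cue to fall back to spelling out the currency name instead.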

What Else ?

What else can we say about Unicode 6.2 ? Well, U+0709 ܉ SYRIAC SUBLINEAR COLON SKEWED RIGHT is getting a new formal alias: SYRIAC SUBLINEAR COLON SKEWED LEFT; U+1240F 𒐏 CUNEIFORM NUMERIC SIGN FOUR U through U+12414 𒐔 CUNEIFORM NUMERIC SIGN NINE U are having their numeric values changed from '4' through '9' to '40' through '90'; and U+065F ARABIC WAVY HAMZA BELOW is moving from inherited script to Arabic script. On a more practical point, the Unicode 6.2 code charts will for the first time show variation sequences, which are now growing in number at a startling rate.

On Beyond 6.2

The main side effect of this special release of Unicode 6.2 will be to push back the release date of the version of Unicode synchronised with ISO/IEC 10646:2012 Amendment 1, which was originally anticipated for release next spring. It is now probable that the next version of Unicode (shall we call it Unicode 7.0?) will be synchronised with ISO/IEC 10646:2012 Amendments 1 and 2, and will not be released until early 2014. I will blog about the contents of Unicode 7.0 in October this year.

In the meantime, it is probable that an "update version" of Unicode (i.e. Unicode 6.2.1), which includes any required changes to character properties and updates to the standard annexes, but which does not include any changes to character repertoire, will be released in spring 2013. Unicode 6.2.1 will include the addition of 1,002 standardized variants for CJK Unified Ideographs, corresponding to CJK Compatibility Ideographs, as an alternative, round-trippable mechanism for representing compatibility ideographs. I suspect that this will confuse the hell out of implementations that assumed that variation sequences for CJK Unified Ideographs only ever used Variation Selectors 17 through 256, and that VS1 through VS16 were only used for variation sequences that did not feature Han ideographs.
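To make the round-tripping problem concrete, here is a rough Python sketch. VS1 through VS16 live at U+FE00..U+FE0F and VS17 through VS256 at U+E0100..U+E01EF; the two helper functions are my own invention for illustration, and the pairing of U+F900 with the sequence <U+8C48, U+FE00> is an assumption about what the new standardized variants will look like, not something to rely on until the data is final.

```python
import unicodedata


def variation_selector_number(ch: str):
    """Return 1..256 if ch is a variation selector, otherwise None."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:        # VS1 .. VS16
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:      # VS17 .. VS256
        return cp - 0xE0100 + 17
    return None


def find_variation_sequences(text: str) -> list:
    """List (base, selector number) for each base directly followed by a VS.

    Deliberately makes no assumption that Han bases only take VS17 and up.
    """
    found = []
    for base, following in zip(text, text[1:]):
        number = variation_selector_number(following)
        if number is not None and variation_selector_number(base) is None:
            found.append((base, number))
    return found


# A compatibility ideograph does not survive normalization:
# U+F900 becomes the unified ideograph U+8C48 under NFC.
compat = "\uf900"
print(f"U+{ord(unicodedata.normalize('NFC', compat)):04X}")

# A variation sequence does survive (assumed sequence for U+F900).
sequence = "\u8c48\ufe00"
print(unicodedata.normalize("NFC", sequence) == sequence)
print(find_variation_sequences(sequence))
```

The point of the second helper is that it treats both selector ranges identically, so a Han base followed by VS1 is reported as a variation sequence rather than being silently ignored.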


