BabelStone : How many Unicode characters are there ?



The short answer is 136,690.

The long answer is that it all depends on what you mean by a "Unicode character". The Unicode Standard version 10.0 (released June 2017) defines 136,690 encoded characters with unique identifying names mapped to immutable code points. However, these characters do not always correspond to user-perceived characters: many user-perceived characters are represented in Unicode as a sequence of two or more encoded characters (and, conversely, some single encoded characters may look like two or more distinct characters). For example, lowercase j with caron (ǰ) is represented as a single encoded character (U+01F0 LATIN SMALL LETTER J WITH CARON), but the corresponding uppercase character (J̌) is represented in Unicode as a sequence of two encoded characters (U+004A LATIN CAPITAL LETTER J + U+030C COMBINING CARON).

Likewise, some emoji are encoded as single characters (e.g. U+2603 ☃ SNOWMAN), but many emoji are composed from sequences of encoded characters. In some cases these are arbitrarily long sequences of simple encoded emoji characters joined together with the Zero Width Joiner format character (U+200D) and modified by a set of five skin tone modifier characters. Thus, a single emoji glyph representing a mixed-race family of four might be represented by the eleven-character sequence <1F468 1F3FC 200D 1F469 1F3FD 200D 1F467 1F3FE 200D 1F466 1F3FF> (👨🏼‍👩🏽‍👧🏾‍👦🏿).

Because the creation of characters using combining marks or as sequences of encoded characters is open-ended, it is not possible to say how many characters can be represented by Unicode. Nevertheless, this page attempts to plot the growth of the Unicode Standard since its initial release in 1991 in the tables and charts below.
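The gap between encoded characters and user-perceived characters is easy to see in Python, whose str type counts code points rather than glyphs. The following is a minimal sketch using only the standard library's unicodedata module; the variable names are illustrative.

```python
import unicodedata

# Lowercase j with caron is a single encoded character...
j_caron = "\u01F0"
# ...but there is no precomposed uppercase form, so J with caron is J + COMBINING CARON.
J_caron = "J\u030C"

print(len(j_caron), unicodedata.name(j_caron))   # 1 LATIN SMALL LETTER J WITH CARON
print(len(J_caron), [unicodedata.name(c) for c in J_caron])

# str.upper() applies the full case mapping, turning one code point into two.
print(j_caron.upper() == J_caron)  # True

# NFC recomposes j + caron into U+01F0, but has nothing to compose J + caron into.
print(unicodedata.normalize("NFC", "j\u030C") == j_caron)  # True
print(len(unicodedata.normalize("NFC", J_caron)))          # 2

# The family emoji described above: one glyph, eleven code points.
family_cps = [0x1F468, 0x1F3FC, 0x200D, 0x1F469, 0x1F3FD, 0x200D,
              0x1F467, 0x1F3FE, 0x200D, 0x1F466, 0x1F3FF]
family = "".join(chr(cp) for cp in family_cps)
print(len(family))                                         # 11
print(family.count("\u200D"))                              # 3 zero width joiners
print(sum(0x1F3FB <= ord(c) <= 0x1F3FF for c in family))   # 4 skin tone modifiers
```

Counting user-perceived characters instead would require grapheme cluster segmentation, which this sketch does not attempt.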


Table of Unicode Data over Time


Unicode Version History
Version  Date            Scripts  Blocks      Total  Assigned  Unassigned    Named  Control  Private Use  Noncharacters  Surrogates  Graphic  Format  Added/Removed
1.0.0    October 1991         24      57     65,536    12,795      52,741    7,129       32        5,632              2           0    7,085      44  +7,129
1.0.1    June 1992            25      59     65,536    34,505      31,031   28,327       32        6,144              2           0   28,283      44  +21,204 / -6
1.1      June 1993            24      63     65,536    40,635      24,901   34,168       65        6,400              2           0   34,151      17  +5,963 / -89 (-33)
2.0      July 1996            25      67  1,114,112   178,500     935,612   38,885       65      137,468             34       2,048   38,867      18  +11,373 / -6,656
2.1      May 1998             25      67  1,114,112   178,502     935,610   38,887       65      137,468             34       2,048   38,869      18  +2
3.0      September 1999       38      86  1,114,112   188,809     925,303   49,194       65      137,468             34       2,048   49,168      26  +10,307
3.1      March 2001           41      95  1,114,112   233,787     880,325   94,140       65      137,468             66       2,048   94,009     131  +44,946
3.2      March 2002           45     107  1,114,112   234,803     879,309   95,156       65      137,468             66       2,048   95,023     133  +1,016
4.0      April 2003           52     122  1,114,112   236,029     878,083   96,382       65      137,468             66       2,048   96,243     139  +1,226
4.1      March 2005           59     142  1,114,112   237,302     876,810   97,655       65      137,468             66       2,048   97,515     140  +1,273
5.0      July 2006            64     151  1,114,112   238,671     875,441   99,024       65      137,468             66       2,048   98,884     140  +1,369
5.1      April 2008           75     168  1,114,112   240,295     873,817  100,648       65      137,468             66       2,048  100,507     141  +1,624
5.2      October 2009         90     194  1,114,112   246,943     867,169  107,296       65      137,468             66       2,048  107,154     142  +6,648
6.0      October 2010         93     206  1,114,112   249,031     865,081  109,384       65      137,468             66       2,048  109,242     142  +2,088
6.1      January 2012        100     217  1,114,112   249,763     864,349  110,116       65      137,468             66       2,048  109,975     141  +732
6.2      September 2012      100     217  1,114,112   249,764     864,348  110,117       65      137,468             66       2,048  109,976     141  +1
6.3      September 2013      100     217  1,114,112   249,769     864,343  110,122       65      137,468             66       2,048  109,975     147  +5
7.0      June 2014           123     249  1,114,112   252,603     861,509  112,956       65      137,468             66       2,048  112,804     152  +2,834
8.0      June 2015           129     259  1,114,112   260,319     853,793  120,672       65      137,468             66       2,048  120,520     152  +7,716
9.0      June 2016           135     270  1,114,112   267,819     846,293  128,172       65      137,468             66       2,048  128,019     153  +7,500
10.0     June 2017           139     277  1,114,112   276,337     837,775  136,690       65      137,468             66       2,048  136,537     153  +8,518

Column groups: Assigned + Unassigned = Total code points; Named + Control + Private Use + Noncharacters + Surrogates = Assigned code points; Graphic + Format = Named characters. Added/Removed gives the number of named characters added (+) or removed (-) in each version.
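Because the columns of the version table are linked by simple sums, each row can be sanity-checked mechanically. The following Python sketch uses a handful of rows transcribed from the table; the `ROWS` dictionary and `check_row` function are illustrative names, not part of any published tool.

```python
# A few rows transcribed from the version table:
# version: (total, assigned, unassigned, named, control, private_use,
#           noncharacters, surrogates, graphic, format)
ROWS = {
    "1.0.0": (65_536, 12_795, 52_741, 7_129, 32, 5_632, 2, 0, 7_085, 44),
    "2.0":   (1_114_112, 178_500, 935_612, 38_885, 65, 137_468, 34, 2_048, 38_867, 18),
    "9.0":   (1_114_112, 267_819, 846_293, 128_172, 65, 137_468, 66, 2_048, 128_019, 153),
    "10.0":  (1_114_112, 276_337, 837_775, 136_690, 65, 137_468, 66, 2_048, 136_537, 153),
}


def check_row(row):
    """Return True if a row's columns add up the way the column groups say."""
    total, assigned, unassigned, named, control, pua, nonchars, surrogates, graphic, fmt = row
    return (
        total == assigned + unassigned
        and assigned == named + control + pua + nonchars + surrogates
        and named == graphic + fmt
    )


for version, row in ROWS.items():
    print(version, check_row(row))  # True for every row above

# The Added/Removed column is the change in named characters between versions.
print(ROWS["10.0"][3] - ROWS["9.0"][3])  # 8518
```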

Characters removed in Unicode 1.0.1:


Characters removed in Unicode 1.1:


Characters removed in Unicode 2.0:


Notes

For historic versions of Unicode, the statistics for each version are based on the General Category values given in that version's character data, and do not take into account any subsequent changes in General Category. Thus the fact that 4.0 has 139 format characters and 4.1 has 140 format characters is not due to a new format character having been added in 4.1, but rather due to the General Category of U+200B ZERO WIDTH SPACE having been changed from Zs to Cf in Unicode 4.0.1. Note that the statistics for 1.0.0 and 1.0.1 are based upon Ken Whistler's reconstructed Unicode Character Data (named characters with a reconstructed General Category of Cc, including 0000..001F and 007F, are counted as format characters).
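The page does not say how the per-version figures were tallied. Assuming they come from each version's UnicodeData.txt, a tally by General Category has to account for the large blocks (such as CJK ideographs) that the file lists only as a pair of "First"/"Last" lines. The sketch below uses a few illustrative lines in that format; `UNICODEDATA_SAMPLE` and `count_general_categories` are hypothetical names, and only the first three fields (code point, name, General Category) are read.

```python
from collections import Counter

# Illustrative lines in UnicodeData.txt format (fields separated by semicolons).
UNICODEDATA_SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
"""


def count_general_categories(text: str) -> Counter:
    """Count code points per General Category, expanding First/Last ranges."""
    counts = Counter()
    range_start = None
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        cp, name, gc = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = cp  # the matching ", Last>" line closes this range
        elif name.endswith(", Last>"):
            counts[gc] += cp - range_start + 1
            range_start = None
        else:
            counts[gc] += 1
    return counts


print(count_general_categories(UNICODEDATA_SAMPLE))
```

Run against a 4.0.1 or later file, such a tally would count U+200B under Cf rather than Zs, which is the kind of shift described in the note above.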

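To help explain what we are talking about, here are definitions of the terms used in the table (see Section 2.4 of the Unicode Standard for further information).

Total Code Points: the size of the Unicode codespace, 65,536 code points (0000..FFFF) in Unicode 1.x and 1,114,112 code points (0000..10FFFF) from Unicode 2.0 onwards.
Assigned Code Points: code points assigned to a graphic, format, control or private-use character, reserved for surrogates, or designated as noncharacters.
Unassigned Code Points: reserved code points (General Category Cn) that are not noncharacters.
Named Characters: graphic and format characters, which have unique character names.
Control Characters: characters with General Category Cc.
Private Use Characters: code points with General Category Co, whose interpretation is determined by private agreement.
Noncharacters: code points permanently reserved for internal use (U+FDD0..U+FDEF and the last two code points of each plane).
Surrogate Code Points: code points with General Category Cs (U+D800..U+DFFF), reserved for use in UTF-16.
Graphic Characters: characters with General Category L, M, N, P, S or Zs.
Format Characters: characters with General Category Cf, Zl or Zp.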
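The column categories of the version table can be derived from a code point's General Category, following the types of code points described in Section 2.4 of the Unicode Standard. The Python sketch below is an illustration under that assumption (the function `code_point_type` is a hypothetical name). Note that Python's unicodedata module reflects whatever Unicode version the interpreter ships with (reported by `unicodedata.unidata_version`), so its graphic, format and unassigned totals will generally differ from the 10.0 figures; the control, private use, noncharacter and surrogate totals are fixed and should match the table.

```python
import sys
import unicodedata
from collections import Counter


def code_point_type(cp: int) -> str:
    """Map a code point to the version-table column it is counted under."""
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    gc = unicodedata.category(chr(cp))
    if gc == "Cc":
        return "control"
    if gc in ("Cf", "Zl", "Zp"):
        return "format"
    if gc == "Co":
        return "private use"
    if gc == "Cs":
        return "surrogate"
    if gc == "Cn":
        return "unassigned"
    return "graphic"  # L*, M*, N*, P*, S* and Zs


# Tally the whole codespace (0..10FFFF); this takes a second or so.
counts = Counter(code_point_type(cp) for cp in range(sys.maxunicode + 1))
print("Unicode version:", unicodedata.unidata_version)
for kind, n in sorted(counts.items()):
    print(f"{kind:>13}: {n:,}")
```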



Scripts

Allocation of Characters by Script for Unicode 10.0
Script Name  ISO 15924 Code  Number of Characters  Version Introduced
Common Zyyy 7,279 1.0
Inherited Zinh 564 1.0
Arabic Arab 1,279 1.0
Armenian Armn 93 1.0
Bengali Beng 93 1.0
Bopomofo Bopo 70 1.0
Coptic Copt 137 1.0
Cyrillic Cyrl 443 1.0
Devanagari Deva 154 1.0
Georgian Geor 127 1.0
Greek Grek 518 1.0
Gujarati Gujr 85 1.0
Gurmukhi Guru 79 1.0
Han Hani 81,734 1.0
Hangul Hang 11,739 1.0
Hebrew Hebr 133 1.0
Hiragana Hira 91 1.0
Kannada Knda 88 1.0
Katakana Kana 300 1.0
Lao Laoo 67 1.0
Latin Latn 1,350 1.0
Malayalam Mlym 114 1.0
Oriya Orya 90 1.0
Tamil Taml 72 1.0
Telugu Telu 96 1.0
Thai Thai 86 1.0
Tibetan Tibt 207 1.0
Unknown Zzzz 985,875 1.0
Braille Brai 256 3.0
Canadian Aboriginal Cans 710 3.0
Cherokee Cher 172 3.0
Ethiopic Ethi 495 3.0
Khmer Khmr 146 3.0
Mongolian Mong 166 3.0
Myanmar Mymr 223 3.0
Ogham Ogam 29 3.0
Runic Runr 86 3.0
Sinhala Sinh 110 3.0
Syriac Syrc 77 3.0
Thaana Thaa 50 3.0
Yi Yiii 1,220 3.0
Deseret Dsrt 80 3.1
Gothic Goth 27 3.1
Old Italic Ital 36 3.1
Buhid Buhd 20 3.2
Hanunoo Hano 21 3.2
Tagalog Tglg 20 3.2
Tagbanwa Tagb 18 3.2
Cypriot Cprt 55 4.0
Limbu Limb 68 4.0
Linear B Linb 211 4.0
Osmanya Osma 40 4.0
Shavian Shaw 48 4.0
Tai Le Tale 35 4.0
Ugaritic Ugar 31 4.0
Buginese Bugi 30 4.1
Glagolitic Glag 132 4.1
Kharoshthi Khar 65 4.1
New Tai Lue Talu 83 4.1
Old Persian Xpeo 50 4.1
Syloti Nagri Sylo 44 4.1
Tifinagh Tfng 59 4.1
Balinese Bali 121 5.0
Cuneiform Xsux 1,234 5.0
N'Ko Nkoo 59 5.0
Phags-pa Phag 56 5.0
Phoenician Phnx 29 5.0
Carian Cari 49 5.1
Cham Cham 83 5.1
Kayah Li Kali 47 5.1
Lepcha Lepc 74 5.1
Lycian Lyci 29 5.1
Lydian Lydi 27 5.1
Ol Chiki Olck 48 5.1
Rejang Rjng 37 5.1
Saurashtra Saur 82 5.1
Sundanese Sund 72 5.1
Vai Vaii 300 5.1
Avestan Avst 61 5.2
Bamum Bamu 657 5.2
Egyptian Hieroglyphs Egyp 1,071 5.2
Imperial Aramaic Armi 31 5.2
Inscriptional Pahlavi Phli 27 5.2
Inscriptional Parthian Prti 30 5.2
Javanese Java 90 5.2
Kaithi Kthi 66 5.2
Lisu Lisu 48 5.2
Meetei Mayek Mtei 79 5.2
Old South Arabian Sarb 32 5.2
Old Turkic Orkh 73 5.2
Samaritan Samr 61 5.2
Tai Tham Lana 127 5.2
Tai Viet Tavt 72 5.2
Batak Batk 56 6.0
Brahmi Brah 109 6.0
Mandaic Mand 29 6.0
Chakma Cakm 67 6.1
Meroitic Cursive Merc 90 6.1
Meroitic Hieroglyphs Mero 32 6.1
Miao Plrd 133 6.1
Sharada Shrd 94 6.1
Sora Sompeng Sora 35 6.1
Takri Takr 66 6.1
Bassa Vah Bass 36 7.0
Caucasian Albanian Aghb 53 7.0
Duployan Dupl 143 7.0
Elbasan Elba 40 7.0
Grantha Gran 85 7.0
Khojki Khoj 62 7.0
Khudawadi Sind 69 7.0
Linear A Lina 341 7.0
Mahajani Mahj 39 7.0
Manichaean Mani 51 7.0
Mende Kikakui Mend 213 7.0
Modi Modi 79 7.0
Mro Mroo 43 7.0
Nabataean Nbat 40 7.0
Old North Arabian Narb 32 7.0
Old Permic Perm 43 7.0
Pahawh Hmong Hmng 127 7.0
Palmyrene Palm 32 7.0
Pau Cin Hau Pauc 57 7.0
Psalter Pahlavi Phlp 29 7.0
Siddham Sidd 92 7.0
Tirhuta Tirh 82 7.0
Warang Citi Wara 84 7.0
Ahom Ahom 57 8.0
Anatolian Hieroglyphs Hluw 583 8.0
Hatran Hatr 26 8.0
Multani Mult 38 8.0
Old Hungarian Hung 108 8.0
SignWriting Sgnw 672 8.0
Adlam Adlm 87 9.0
Bhaiksuki Bhks 97 9.0
Marchen Marc 68 9.0
Newa Newa 92 9.0
Osage Osge 72 9.0
Tangut Tang 6,881 9.0
Masaram Gondi Gonm 75 10.0
Nushu Nshu 397 10.0
Soyombo Soyo 80 10.0
Zanabazar Square Zanb 72 10.0
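The page does not state how the per-script counts were produced. One plausible approach, assumed here, is to sum the code point ranges listed for each script in the Unicode Character Database file Scripts.txt, which assigns every code point not explicitly listed to the Unknown (Zzzz) script. The sketch below parses a few illustrative lines in that file's format; `SCRIPTS_SAMPLE` and `count_scripts` are hypothetical names.

```python
from collections import Counter

CODESPACE_SIZE = 0x110000  # 1,114,112 code points

# Illustrative lines in Scripts.txt format: code point or range ; script name # comment
SCRIPTS_SAMPLE = """\
0000..001F    ; Common # Cc  [32] <control-0000>..<control-001F>
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
"""


def count_scripts(text: str) -> Counter:
    """Sum code points per script; everything not listed counts as Unknown."""
    counts = Counter()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cps, script = (part.strip() for part in data.split(";"))
        first, _, last = cps.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        counts[script] += end - start + 1
    # Code points not explicitly listed default to the Unknown script.
    counts["Unknown"] = CODESPACE_SIZE - sum(counts.values())
    return counts


print(count_scripts(SCRIPTS_SAMPLE))
```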


Charts








[Last updated : 2017-06-21]


