BabelStone : How many Unicode characters are there ?



The short answer is 143,859.

The long answer is it all depends on what you mean by a "Unicode character". The Unicode Standard version 13.0 (released 10 March 2020) defines 143,859 encoded characters with unique identifying names mapped to immutable code points. However, these characters do not always correspond to user-perceived characters, as many user-perceived characters are represented in Unicode as a sequence of two or more encoded characters (and conversely, some single encoded characters may look like two or more distinct characters). For example, lowercase j with caron (ǰ) is represented as a single encoded character (U+01F0 LATIN SMALL LETTER J WITH CARON), but the corresponding uppercase character (J̌) is represented in Unicode as a sequence of two encoded characters (U+004A LATIN CAPITAL LETTER J + U+030C COMBINING CARON). Likewise, some emoji are encoded as single characters (e.g. U+2603 ☃ SNOWMAN), but many emoji are composed from sequences of encoded characters, in some cases arbitrarily long sequences of simple encoded emoji characters joined together with a Zero Width Joiner format character, and modified by a set of five skin tone modifier characters. Thus, a single emoji glyph representing a mixed race family of four might be represented by the eleven-character sequence <1F468 1F3FC 200D 1F469 1F3FD 200D 1F467 1F3FE 200D 1F466 1F3FF> (👨🏼‍👩🏽‍👧🏾‍👦🏿). Because the creation of characters using combining marks or as sequences of encoded characters is open-ended, it is not possible to say how many characters can be represented by Unicode. Nevertheless, this page attempts to plot the growth of the Unicode Standard since its initial release in 1991 in the tables and charts below.
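
The gap between encoded characters and user-perceived characters can be checked directly. The following is a minimal Python sketch that assumes only the standard `unicodedata` module; the helper `code_points` and the variable names are illustrative and not part of any Unicode API. It uses the examples given above: the two forms of j with caron, and the eleven-character family emoji sequence.

```python
import unicodedata

# Lowercase j with caron exists as one precomposed encoded character.
small_j_caron = "\u01F0"
# Uppercase J with caron has no precomposed form: base letter + combining mark.
capital_j_caron = "\u004A\u030C"


def code_points(text):
    """Return the U+XXXX notation for every encoded character in text."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(small_j_caron), unicodedata.name(small_j_caron))
print(code_points(capital_j_caron),
      [unicodedata.name(ch) for ch in capital_j_caron])

# NFC normalization composes j + caron into U+01F0, but leaves J + caron
# as two encoded characters because no single code point exists for it.
nfc_small = unicodedata.normalize("NFC", "\u006A\u030C")
nfc_capital = unicodedata.normalize("NFC", capital_j_caron)
print(code_points(nfc_small), code_points(nfc_capital))

# The family emoji from the text: four people, four skin tone modifiers,
# and three ZERO WIDTH JOINER characters (U+200D).
family = "".join(
    chr(int(cp, 16))
    for cp in "1F468 1F3FC 200D 1F469 1F3FD 200D 1F467 1F3FE 200D 1F466 1F3FF".split()
)
# len() counts encoded characters (code points), not user-perceived ones.
print(len(family))
```

Python's `len` counts code points, so the single family glyph reports its full sequence length. Counting user-perceived characters instead would need a grapheme cluster segmenter, which the standard library does not provide.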


Table of Unicode Data over Time


Unicode Version History

                                                      Total Code Points     Assigned Code Points                                             Named Characters        Number of Characters
Version  Date            Scripts  Blocks  Total Code  Assigned  Unassigned  Named       Control     Private Use  Noncharacters  Surrogate    Graphic     Format      Added or Removed
                                          Points                            Characters  Characters  Characters                  Code Points  Characters  Characters
1.0.0    October 1991    24       57      65,536      12,795    52,741      7,129       32          5,632        2              0            7,085       44          +7,129
1.0.1    June 1992       25       59      65,536      34,505    31,031      28,327      32          6,144        2              0            28,283      44          +21,204 -6
1.1      June 1993       24       63      65,536      40,635    24,901      34,168      65          6,400        2              0            34,151      17          +5,963 -89 (-33)
2.0      July 1996       25       67      1,114,112   178,500   935,612     38,885      65          137,468      34             2,048        38,867      18          +11,373 -6,656
2.1      May 1998        25       67      1,114,112   178,502   935,610     38,887      65          137,468      34             2,048        38,869      18          +2
3.0      September 1999  38       86      1,114,112   188,809   925,303     49,194      65          137,468      34             2,048        49,168      26          +10,307
3.1      March 2001      41       95      1,114,112   233,787   880,325     94,140      65          137,468      66             2,048        94,009      131         +44,946
3.2      March 2002      45       107     1,114,112   234,803   879,309     95,156      65          137,468      66             2,048        95,023      133         +1,016
4.0      April 2003      52       122     1,114,112   236,029   878,083     96,382      65          137,468      66             2,048        96,243      139         +1,226
4.1      March 2005      59       142     1,114,112   237,302   876,810     97,655      65          137,468      66             2,048        97,515      140         +1,273
5.0      July 2006       64       151     1,114,112   238,671   875,441     99,024      65          137,468      66             2,048        98,884      140         +1,369
5.1      April 2008      75       168     1,114,112   240,295   873,817     100,648     65          137,468      66             2,048        100,507     141         +1,624
5.2      October 2009    90       194     1,114,112   246,943   867,169     107,296     65          137,468      66             2,048        107,154     142         +6,648
6.0      October 2010    93       206     1,114,112   249,031   865,081     109,384     65          137,468      66             2,048        109,242     142         +2,088
6.1      January 2012    100      217     1,114,112   249,763   864,349     110,116     65          137,468      66             2,048        109,975     141         +732
6.2      September 2012  100      217     1,114,112   249,764   864,348     110,117     65          137,468      66             2,048        109,976     141         +1
6.3      September 2013  100      217     1,114,112   249,769   864,343     110,122     65          137,468      66             2,048        109,975     147         +5
7.0      June 2014       123      249     1,114,112   252,603   861,509     112,956     65          137,468      66             2,048        112,804     152         +2,834
8.0      June 2015       129      259     1,114,112   260,319   853,793     120,672     65          137,468      66             2,048        120,520     152         +7,716
9.0      June 2016       135      270     1,114,112   267,819   846,293     128,172     65          137,468      66             2,048        128,019     153         +7,500
10.0     June 2017       139      277     1,114,112   276,337   837,775     136,690     65          137,468      66             2,048        136,537     153         +8,518
11.0     June 2018       146      288     1,114,112   277,021   837,091     137,374     65          137,468      66             2,048        137,220     154         +684
12.0     March 2019      150      297     1,114,112   277,575   836,537     137,928     65          137,468      66             2,048        137,765     163         +554
12.1     May 2019        150      297     1,114,112   277,576   836,536     137,929     65          137,468      66             2,048        137,766     163         +1
13.0     March 2020      154      305     1,114,112   283,506   830,606     143,859     65          137,468      66             2,048        143,696     163         +5,930
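
The columns of the table are related by simple arithmetic, which can be checked for the Unicode 13.0 row. The sketch below is only an illustration: the figures are copied from the table, while the code point ranges used to rederive the fixed-size categories (controls, private use, noncharacters, surrogates) are not given on this page and are taken here from the ranges defined in the Unicode Standard. All variable names are made up for the example.

```python
# Figures for Unicode 13.0, copied from the last row of the table.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

named = 143_859
control = 65
private_use = 137_468
noncharacters = 66
surrogates = 2_048
graphic = 143_696
format_chars = 163

# Total code points: 17 planes of 65,536 code points each.
total = PLANES * CODE_POINTS_PER_PLANE
# Assigned code points: the sum of the five assigned categories.
assigned = named + control + private_use + noncharacters + surrogates
unassigned = total - assigned

# Fixed-size categories rederived from code point ranges
# (assumption: ranges as defined in the Unicode Standard, not on this page).
# C0 controls 0000..001F plus DEL and C1 controls 007F..009F.
controls_by_range = len(range(0x00, 0x20)) + len(range(0x7F, 0xA0))
# BMP private use area E000..F8FF plus planes 15 and 16 minus their
# last two code points, which are noncharacters.
private_by_range = len(range(0xE000, 0xF900)) + 2 * (CODE_POINTS_PER_PLANE - 2)
# FDD0..FDEF plus the last two code points of every plane.
nonchars_by_range = len(range(0xFDD0, 0xFDF0)) + 2 * PLANES
# High and low surrogates D800..DFFF.
surrogates_by_range = len(range(0xD800, 0xE000))

print(total, assigned, unassigned)
print(controls_by_range, private_by_range, nonchars_by_range, surrogates_by_range)
```

The same identities hold for every row from 2.0 onwards; only the named character count changes between versions, apart from the noncharacter count, which rises from 34 to 66 in version 3.1.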

Characters removed in Unicode 1.0.1:


Characters removed in Unicode 1.1:


Characters removed in Unicode 2.0:


Notes

For historic versions of Unicode the statistics are based on the General Category of the characters at the time of encoding, and do not take into account any subsequent changes in General Category. Thus the fact that 4.0 has 139 format characters and 4.1 has 140 format characters is not due to a new format character having been added in 4.1, but rather due to the General Category of U+200B ZERO WIDTH SPACE having been changed from Zs to Cf in Unicode 4.0.1. Note that the statistics for 1.0.0 and 1.0.1 are based upon Ken Whistler's reconstructed Unicode Character Data (named characters with reconstructed general category of Cc, including 0000..001F and 007F, are counted as format characters).
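
As a rough cross-check of how such statistics can be derived from General Category values, the following Python sketch tallies every code point using the Unicode database bundled with the interpreter. It is not the method used to produce the table: it uses the current General Category of each character rather than the category at the time of encoding, its results depend on the Unicode version that ships with the Python build, and the grouping of Cf, Zl and Zp as "format characters" is an assumption based on the Unicode Standard's description of code point types.

```python
import unicodedata
from collections import Counter

# Tally the General Category of every code point in the Unicode database
# bundled with this Python build (see unicodedata.unidata_version).
tally = Counter(unicodedata.category(chr(cp)) for cp in range(0x110000))

control = tally["Cc"]
private_use = tally["Co"]
surrogates = tally["Cs"]

# Cn covers noncharacters as well as unassigned code points, so named
# characters are whatever remains after removing Cc, Co, Cs and Cn.
named = sum(tally.values()) - control - private_use - surrogates - tally["Cn"]

# Assumption: format characters are Cf plus the line separator (Zl)
# and paragraph separator (Zp); everything else named is graphic.
format_chars = tally["Cf"] + tally["Zl"] + tally["Zp"]
graphic = named - format_chars

print("Unicode", unicodedata.unidata_version)
print("named:", named, "graphic:", graphic, "format:", format_chars)

# ZERO WIDTH SPACE: reported as Cf in any database from 4.0.1 onwards.
print(unicodedata.category("\u200B"))
```

The control, private use and surrogate counts do not depend on the database version, whereas the named, graphic and format counts will only match a row of the table if the interpreter bundles that version of Unicode.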

To help understand what we're talking about, here are some definitions of some of the terms used in the table (see Section 2.4 of the Unicode Standard for further information).



Scripts

Allocation of Characters by Script for Unicode 13.0
Script Name  ISO 15924 Code  Number of Characters  Version Introduced
Common Zyyy 8,088 1.0
Inherited Zinh 575 1.0
Adlam Adlm 88 9.0
Ahom Ahom 58 8.0
Anatolian Hieroglyphs Hluw 583 8.0
Arabic Arab 1,291 1.0
Armenian Armn 95 1.0
Avestan Avst 61 5.2
Balinese Bali 121 5.0
Bamum Bamu 657 5.2
Bassa Vah Bass 36 7.0
Batak Batk 56 6.0
Bengali Beng 96 1.0
Bhaiksuki Bhks 97 9.0
Bopomofo Bopo 77 1.0
Brahmi Brah 109 6.0
Braille Brai 256 3.0
Buginese Bugi 30 4.1
Buhid Buhd 20 3.2
Canadian Aboriginal Cans 710 3.0
Carian Cari 49 5.1
Caucasian Albanian Aghb 53 7.0
Chakma Cakm 71 6.1
Cham Cham 83 5.1
Cherokee Cher 172 3.0
Chorasmian Chrs 28 13.0
Coptic Copt 137 1.0
Cuneiform Xsux 1,234 5.0
Cypriot Cprt 55 4.0
Cyrillic Cyrl 443 1.0
Deseret Dsrt 80 3.1
Devanagari Deva 154 1.0
Dives Akuru Diak 72 13.0
Dogra Dogr 60 11.0
Duployan Dupl 143 7.0
Egyptian Hieroglyphs Egyp 1,080 5.2
Elbasan Elba 40 7.0
Elymaic Elym 23 12.0
Ethiopic Ethi 495 3.0
Georgian Geor 173 1.0
Glagolitic Glag 132 4.1
Gothic Goth 27 3.1
Grantha Gran 85 7.0
Greek Grek 518 1.0
Gujarati Gujr 91 1.0
Gunjala Gondi Gong 63 11.0
Gurmukhi Guru 80 1.0
Han Hani 94,202 1.0
Hangul Hang 11,739 1.0
Hanifi Rohingya Rohg 50 11.0
Hanunoo Hano 21 3.2
Hatran Hatr 26 8.0
Hebrew Hebr 134 1.0
Hiragana Hira 379 1.0
Imperial Aramaic Armi 31 5.2
Inscriptional Pahlavi Phli 27 5.2
Inscriptional Parthian Prti 30 5.2
Javanese Java 90 5.2
Kaithi Kthi 67 5.2
Kannada Knda 89 1.0
Katakana Kana 304 1.0
Kayah Li Kali 47 5.1
Kharoshthi Khar 68 4.1
Khitan Small Script Kits 471 13.0
Khmer Khmr 146 3.0
Khojki Khoj 62 7.0
Khudawadi Sind 69 7.0
Lao Laoo 82 1.0
Latin Latn 1,374 1.0
Lepcha Lepc 74 5.1
Limbu Limb 68 4.0
Linear A Lina 341 7.0
Linear B Linb 211 4.0
Lisu Lisu 49 5.2
Lycian Lyci 29 5.1
Lydian Lydi 27 5.1
Mahajani Mahj 39 7.0
Makasar Maka 25 11.0
Malayalam Mlym 118 1.0
Mandaic Mand 29 6.0
Manichaean Mani 51 7.0
Marchen Marc 68 9.0
Masaram Gondi Gonm 75 10.0
Medefaidrin Medf 91 11.0
Meetei Mayek Mtei 79 5.2
Mende Kikakui Mend 213 7.0
Meroitic Cursive Merc 90 6.1
Meroitic Hieroglyphs Mero 32 6.1
Miao Plrd 149 6.1
Modi Modi 79 7.0
Mongolian Mong 167 3.0
Mro Mroo 43 7.0
Multani Mult 38 8.0
Myanmar Mymr 223 3.0
Nabataean Nbat 40 7.0
Nandinagari Nand 65 12.0
New Tai Lue Talu 83 4.1
Newa Newa 97 9.0
N'Ko Nkoo 62 5.0
Nushu Nshu 397 10.0
Nyiakeng Puachue Hmong Hmnp 71 12.0
Ogham Ogam 29 3.0
Ol Chiki Olck 48 5.1
Old Hungarian Hung 108 8.0
Old Italic Ital 39 3.1
Old North Arabian Narb 32 7.0
Old Permic Perm 43 7.0
Old Persian Xpeo 50 4.1
Old Sogdian Sogo 40 11.0
Old South Arabian Sarb 32 5.2
Old Turkic Orkh 73 5.2
Oriya Orya 91 1.0
Osage Osge 72 9.0
Osmanya Osma 40 4.0
Pahawh Hmong Hmng 127 7.0
Palmyrene Palm 32 7.0
Pau Cin Hau Pauc 57 7.0
Phags-pa Phag 56 5.0
Phoenician Phnx 29 5.0
Psalter Pahlavi Phlp 29 7.0
Rejang Rjng 37 5.1
Runic Runr 86 3.0
Samaritan Samr 61 5.2
Saurashtra Saur 82 5.1
Sharada Shrd 96 6.1
Shavian Shaw 48 4.0
Siddham Sidd 92 7.0
SignWriting Sgnw 672 8.0
Sinhala Sinh 111 3.0
Sogdian Sogd 42 11.0
Sora Sompeng Sora 35 6.1
Soyombo Soyo 83 10.0
Sundanese Sund 72 5.1
Syloti Nagri Sylo 45 4.1
Syriac Syrc 88 3.0
Tagalog Tglg 20 3.2
Tagbanwa Tagb 18 3.2
Tai Le Tale 35 4.0
Tai Tham Lana 127 5.2
Tai Viet Tavt 72 5.2
Takri Takr 67 6.1
Tamil Taml 123 1.0
Tangut Tang 6,914 9.0
Telugu Telu 98 1.0
Thaana Thaa 50 3.0
Thai Thai 86 1.0
Tibetan Tibt 207 1.0
Tifinagh Tfng 59 4.1
Tirhuta Tirh 82 7.0
Ugaritic Ugar 31 4.0
Vai Vaii 300 5.1
Wancho Wcho 59 12.0
Warang Citi Wara 84 7.0
Yezidi Yezi 47 13.0
Yi Yiii 1,220 3.0
Zanabazar Square Zanb 72 10.0


Charts










[Last updated : 2020-03-10]


