BabelStone : How many Unicode characters are there ?



The short answer is 149,813.

The long answer is that it all depends on what you mean by a "Unicode character". The Unicode Standard version 15.1 (released 12 September 2023) defines 149,813 encoded characters with unique identifying names mapped to immutable code points. However, these characters do not always correspond to user-perceived characters, as many user-perceived characters are represented in Unicode as a sequence of two or more encoded characters (and conversely, some single encoded characters may look like two or more distinct characters). For example, lowercase j with caron (ǰ) is represented as a single encoded character (U+01F0 LATIN SMALL LETTER J WITH CARON), but the corresponding uppercase character (J̌) is represented in Unicode as a sequence of two encoded characters (U+004A LATIN CAPITAL LETTER J + U+030C COMBINING CARON).

Likewise, some emoji are encoded as single characters (e.g. U+2603 ☃ SNOWMAN), but hundreds of emoji are composed from sequences of encoded characters, in some cases arbitrarily long sequences of simple encoded emoji characters joined together with a Zero Width Joiner format character, and modified by a set of five skin tone modifier characters. Thus, the emoji for a man and a woman of particular skin tones kissing 👩🏼‍❤️‍💋‍👨🏾, which should render as a single glyph (it does on Windows with the Segoe UI Emoji font), comprises a sequence of ten Unicode characters (1F469 1F3FC 200D 2764 FE0F 200D 1F48B 200D 1F468 1F3FE), and the emoji for the flag of Scotland 🏴󠁧󠁢󠁳󠁣󠁴󠁿 comprises a sequence of seven Unicode characters (1F3F4 E0067 E0062 E0073 E0063 E0074 E007F).

As of Unicode 15.1, there are 1,386 single-character emoji and emoji components, but 2,396 emoji sequences that are "recommended for general interchange" (RGI), comprising 1,468 ZWJ sequences, 258 two-letter country/region flag sequences, 3 tag sequences (the flags of England, Scotland, and Wales), 655 skin tone modifier sequences, and 12 keycap sequences. In addition, many thousands of emoji tag sequences representing sub-national flags are possible, but they are not recommended for general interchange and so are not generally supported by fonts.
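
The gap between encoded characters and user-perceived characters can be checked directly. The following is a minimal Python sketch, not part of the original page, that spells out the sequences quoted above with escapes and counts their code points. The helper name `code_points` and the variable names are made up for illustration.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as uppercase hex strings."""
    return [f"{ord(ch):04X}" for ch in text]

# Lowercase j with caron is a single encoded character ...
small_j_caron = "\u01F0"
# ... but the uppercase form is a base letter plus a combining mark.
capital_j_caron = "J\u030C"

# Kiss emoji: woman + skin tone, ZWJ, heart + variation selector,
# ZWJ, kiss mark, ZWJ, man + skin tone.
kiss = ("\U0001F469\U0001F3FC\u200D\u2764\uFE0F\u200D"
        "\U0001F48B\u200D\U0001F468\U0001F3FE")

# Flag of Scotland: black flag, tag letters g b s c t, cancel tag.
scotland = ("\U0001F3F4\U000E0067\U000E0062\U000E0073"
            "\U000E0063\U000E0074\U000E007F")

samples = [
    ("small j with caron", small_j_caron),
    ("capital J with caron", capital_j_caron),
    ("kiss emoji", kiss),
    ("flag of Scotland", scotland),
]
for label, text in samples:
    # len() on a Python str counts code points, not user-perceived characters.
    print(f"{label}: {len(text)} code point(s): {' '.join(code_points(text))}")

# Normalization cannot merge J + caron, because no precomposed
# capital J with caron is encoded.
print(len(unicodedata.normalize("NFC", capital_j_caron)))
print(unicodedata.name(small_j_caron))
```

Each of the four samples is meant to be read as one user-perceived character, yet `len()` reports 1, 2, 10 and 7 respectively, because a Python string is indexed by code point.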

Because the creation of characters using combining marks or as sequences of encoded characters is open-ended, it is not possible to say how many user-perceived characters can be represented by Unicode. Nevertheless, this page attempts to plot the growth of the Unicode Standard since its initial release in 1991 in the tables and charts below.
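
To illustrate why the set of possible sequences is open-ended, here is a small Python sketch. It assumes that a flag tag sequence is built as U+1F3F4 BLACK FLAG, followed by one tag character per ASCII letter or digit of a region code (U+E0000 plus the ASCII value), followed by U+E007F CANCEL TAG, which matches the Scotland sequence given earlier. The function names `subdivision_flag` and `stack_marks` are hypothetical, and whether any given result renders as a single glyph depends entirely on the font.

```python
BLACK_FLAG = "\U0001F3F4"
CANCEL_TAG = "\U000E007F"

def subdivision_flag(code):
    """Build an emoji tag sequence for a region code such as 'gbsct'.

    Assumption of this sketch: each ASCII letter or digit maps to the
    tag character at U+E0000 plus its ASCII value.
    """
    code = code.lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError("region code must be ASCII letters or digits")
    tags = "".join(chr(0xE0000 + ord(ch)) for ch in code)
    return BLACK_FLAG + tags + CANCEL_TAG

def stack_marks(base, mark, count):
    """Append the same combining mark to a base character count times."""
    return base + mark * count

# Reproduces the seven code points listed for the flag of Scotland.
scotland = subdivision_flag("gbsct")
print(" ".join(f"{ord(ch):X}" for ch in scotland))

# Any other well-formed code yields a valid sequence, supported or not.
print(len(subdivision_flag("usca")))

# Nothing limits how many combining marks may follow a base letter.
print(len(stack_marks("J", "\u030C", 5)))
```

Because both functions accept arbitrary input, the number of distinct sequences they can produce has no fixed upper bound, which is the point made in the paragraph above.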


Table of Unicode Data over Time


Unicode Version History
| Version | Date | Scripts | Blocks | Total Code Points | Assigned Code Points | Unassigned Code Points | Named Characters | Control Characters | Private Use Characters | Noncharacters | Surrogate Code Points | Graphic Characters | Format Characters | Characters Added or Removed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0.0 | October 1991 | 24 | 57 | 65,536 | 12,795 | 52,741 | 7,129 | 32 | 5,632 | 2 | 0 | 7,085 | 44 | +7,129 |
| 1.0.1 | June 1992 | 25 | 59 | 65,536 | 34,505 | 31,031 | 28,327 | 32 | 6,144 | 2 | 0 | 28,283 | 44 | +21,204, -6 |
| 1.1 | June 1993 | 24 | 63 | 65,536 | 40,635 | 24,901 | 34,168 | 65 | 6,400 | 2 | 0 | 34,151 | 17 | +5,963, -89 (-33) |
| 2.0 | July 1996 | 25 | 67 | 1,114,112 | 178,500 | 935,612 | 38,885 | 65 | 137,468 | 34 | 2,048 | 38,867 | 18 | +11,373, -6,656 |
| 2.1 | May 1998 | 25 | 67 | 1,114,112 | 178,502 | 935,610 | 38,887 | 65 | 137,468 | 34 | 2,048 | 38,869 | 18 | +2 |
| 3.0 | September 1999 | 38 | 86 | 1,114,112 | 188,809 | 925,303 | 49,194 | 65 | 137,468 | 34 | 2,048 | 49,168 | 26 | +10,307 |
| 3.1 | March 2001 | 41 | 95 | 1,114,112 | 233,787 | 880,325 | 94,140 | 65 | 137,468 | 66 | 2,048 | 94,009 | 131 | +44,946 |
| 3.2 | March 2002 | 45 | 107 | 1,114,112 | 234,803 | 879,309 | 95,156 | 65 | 137,468 | 66 | 2,048 | 95,023 | 133 | +1,016 |
| 4.0 | April 2003 | 52 | 122 | 1,114,112 | 236,029 | 878,083 | 96,382 | 65 | 137,468 | 66 | 2,048 | 96,243 | 139 | +1,226 |
| 4.1 | 31 March 2005 | 59 | 142 | 1,114,112 | 237,302 | 876,810 | 97,655 | 65 | 137,468 | 66 | 2,048 | 97,515 | 140 | +1,273 |
| 5.0 | 14 July 2006 | 64 | 151 | 1,114,112 | 238,671 | 875,441 | 99,024 | 65 | 137,468 | 66 | 2,048 | 98,884 | 140 | +1,369 |
| 5.1 | 4 April 2008 | 75 | 168 | 1,114,112 | 240,295 | 873,817 | 100,648 | 65 | 137,468 | 66 | 2,048 | 100,507 | 141 | +1,624 |
| 5.2 | 1 October 2009 | 90 | 194 | 1,114,112 | 246,943 | 867,169 | 107,296 | 65 | 137,468 | 66 | 2,048 | 107,154 | 142 | +6,648 |
| 6.0 | 11 October 2010 | 93 | 206 | 1,114,112 | 249,031 | 865,081 | 109,384 | 65 | 137,468 | 66 | 2,048 | 109,242 | 142 | +2,088 |
| 6.1 | 31 January 2012 | 100 | 217 | 1,114,112 | 249,763 | 864,349 | 110,116 | 65 | 137,468 | 66 | 2,048 | 109,975 | 141 | +732 |
| 6.2 | 26 September 2012 | 100 | 217 | 1,114,112 | 249,764 | 864,348 | 110,117 | 65 | 137,468 | 66 | 2,048 | 109,976 | 141 | +1 |
| 6.3 | 30 September 2013 | 100 | 217 | 1,114,112 | 249,769 | 864,343 | 110,122 | 65 | 137,468 | 66 | 2,048 | 109,975 | 147 | +5 |
| 7.0 | 16 June 2014 | 123 | 249 | 1,114,112 | 252,603 | 861,509 | 112,956 | 65 | 137,468 | 66 | 2,048 | 112,804 | 152 | +2,834 |
| 8.0 | 17 June 2015 | 129 | 259 | 1,114,112 | 260,319 | 853,793 | 120,672 | 65 | 137,468 | 66 | 2,048 | 120,520 | 152 | +7,716 |
| 9.0 | 21 June 2016 | 135 | 270 | 1,114,112 | 267,819 | 846,293 | 128,172 | 65 | 137,468 | 66 | 2,048 | 128,019 | 153 | +7,500 |
| 10.0 | 20 June 2017 | 139 | 277 | 1,114,112 | 276,337 | 837,775 | 136,690 | 65 | 137,468 | 66 | 2,048 | 136,537 | 153 | +8,518 |
| 11.0 | 5 June 2018 | 146 | 288 | 1,114,112 | 277,021 | 837,091 | 137,374 | 65 | 137,468 | 66 | 2,048 | 137,220 | 154 | +684 |
| 12.0 | 5 March 2019 | 150 | 297 | 1,114,112 | 277,575 | 836,537 | 137,928 | 65 | 137,468 | 66 | 2,048 | 137,765 | 163 | +554 |
| 12.1 | 7 May 2019 | 150 | 297 | 1,114,112 | 277,576 | 836,536 | 137,929 | 65 | 137,468 | 66 | 2,048 | 137,766 | 163 | +1 |
| 13.0 | 10 March 2020 | 154 | 305 | 1,114,112 | 283,506 | 830,606 | 143,859 | 65 | 137,468 | 66 | 2,048 | 143,696 | 163 | +5,930 |
| 14.0 | 14 September 2021 | 159 | 317 | 1,114,112 | 284,344 | 829,768 | 144,697 | 65 | 137,468 | 66 | 2,048 | 144,532 | 165 | +838 |
| 15.0 | 13 September 2022 | 161 | 324 | 1,114,112 | 288,833 | 825,279 | 149,186 | 65 | 137,468 | 66 | 2,048 | 149,014 | 172 | +4,489 |
| 15.1 | 12 September 2023 | 161 | 325 | 1,114,112 | 289,460 | 824,652 | 149,813 | 65 | 137,468 | 66 | 2,048 | 149,641 | 172 | +627 |
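
The columns of the table are related by simple sums, which can be verified for any row. The Python sketch below does so for the Unicode 15.1 row. The dictionary keys and the `check_row` helper are names invented for this illustration, and the second half (deriving the fixed columns from code point ranges) is arithmetic added here rather than something stated on the page.

```python
# Figures copied from the Unicode 15.1 row of the version table.
row_15_1 = {
    "total": 1_114_112,
    "assigned": 289_460,
    "unassigned": 824_652,
    "named": 149_813,
    "control": 65,
    "private_use": 137_468,
    "noncharacters": 66,
    "surrogates": 2_048,
    "graphic": 149_641,
    "format": 172,
}

def check_row(row):
    """Return named consistency checks for one table row."""
    return {
        "total = assigned + unassigned":
            row["total"] == row["assigned"] + row["unassigned"],
        "assigned = named + control + private use + nonchars + surrogates":
            row["assigned"] == (row["named"] + row["control"]
                                + row["private_use"] + row["noncharacters"]
                                + row["surrogates"]),
        "named = graphic + format":
            row["named"] == row["graphic"] + row["format"],
    }

# Fixed quantities derived from the layout of the code space
# (an addition of this sketch, not taken from the page).
PLANES = 17
PLANE_SIZE = 0x10000
structural = {
    "total": PLANES * PLANE_SIZE,
    # U+0000..U+001F plus U+007F..U+009F
    "control": 32 + 33,
    # U+D800..U+DFFF
    "surrogates": 0xDFFF - 0xD800 + 1,
    # U+FDD0..U+FDEF plus the last two code points of every plane
    "noncharacters": 32 + 2 * PLANES,
    # U+E000..U+F8FF plus planes 15 and 16 minus their noncharacters
    "private_use": (0xF8FF - 0xE000 + 1) + 2 * (PLANE_SIZE - 2),
}

for name, ok in check_row(row_15_1).items():
    print(f"{name}: {ok}")
for key, value in structural.items():
    print(f"{key}: derived {value:,}, table {row_15_1[key]:,}")
```

The same `check_row` function can be applied to earlier rows. The structural values only match rows from version 3.1 onwards, since the table shows smaller noncharacter counts before that.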

Characters removed in Unicode 1.0.1:


Characters removed in Unicode 1.1:


Characters removed in Unicode 2.0:


Notes

For historic versions of Unicode the statistics are based on the General Category of the characters at the time of encoding, and do not take into account any subsequent changes in General Category. Thus the fact that 4.0 has 139 format characters and 4.1 has 140 format characters is not due to a new format character having been added in 4.1, but rather due to the General Category of U+200B ZERO WIDTH SPACE having been changed from Zs to Cf in Unicode 4.0.1. Note that the statistics for 1.0.0 and 1.0.1 are based upon Ken Whistler's reconstructed Unicode Character Data (named characters with reconstructed general category of Cc, including 0000..001F and 007F, are counted as format characters).
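
As a rough cross-check of how such counts are derived from the General Category, the following Python sketch tallies every code point using the standard `unicodedata` module. It is only an approximation of the method described above: Python reports the current General Category for whichever Unicode version it ships with (`unicodedata.unidata_version`), not the category at the time of encoding, so the named, graphic and format totals will differ between Python releases. The function name `count_categories` is made up for this example.

```python
import sys
import unicodedata
from collections import Counter

def count_categories():
    """Tally the General Category of every code point U+0000..U+10FFFF."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        counts[unicodedata.category(chr(cp))] += 1
    return counts

counts = count_categories()

total = sum(counts.values())
control = counts["Cc"]
private_use = counts["Co"]
surrogate = counts["Cs"]
format_chars = counts["Cf"]
# Cn covers unassigned code points and noncharacters alike.
unassigned_or_nonchar = counts["Cn"]

# Named characters are what remains after removing the categories above.
named = total - control - private_use - surrogate - unassigned_or_nonchar
graphic = named - format_chars

print("Unicode data version:", unicodedata.unidata_version)
print("control:", control)
print("private use:", private_use)
print("surrogate:", surrogate)
print("named:", named, "= graphic", graphic, "+ format", format_chars)

# U+200B ZERO WIDTH SPACE is reported with its current category, Cf.
print(unicodedata.category("\u200B"))
```

The control, private use and surrogate counts should agree with the table regardless of the Python version, since those ranges are fixed; the remaining figures depend on the bundled Unicode data.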

To help understand what we're talking about, here are some definitions of some of the terms used in the table (see Section 2.4 of the Unicode Standard for further information).



Scripts

Allocation of Characters by Script for Unicode 15.0
| Script Name | ISO 15924 Code | Number of Characters | Version Introduced | Notes |
|---|---|---|---|---|
| Common | Zyyy | 8,301 | 1.0 | Characters that are common to two or more scripts |
| Inherited | Zinh | 657 | 1.0 | Combining characters that inherit the script of the character they are applied to |
| Adlam | Adlm | 88 | 9.0 | |
| Ahom | Ahom | 65 | 8.0 | |
| Anatolian Hieroglyphs | Hluw | 583 | 8.0 | |
| Arabic | Arab | 1,368 | 1.0 | |
| Armenian | Armn | 96 | 1.0 | |
| Avestan | Avst | 61 | 5.2 | |
| Balinese | Bali | 124 | 5.0 | |
| Bamum | Bamu | 657 | 5.2 | |
| Bassa Vah | Bass | 36 | 7.0 | |
| Batak | Batk | 56 | 6.0 | |
| Bengali | Beng | 96 | 1.0 | |
| Bhaiksuki | Bhks | 97 | 9.0 | |
| Bopomofo | Bopo | 77 | 1.0 | |
| Brahmi | Brah | 115 | 6.0 | |
| Braille | Brai | 256 | 3.0 | First defined as a script in 4.0 |
| Buginese | Bugi | 30 | 4.1 | |
| Buhid | Buhd | 20 | 3.2 | |
| Canadian Aboriginal | Cans | 726 | 3.0 | |
| Carian | Cari | 49 | 5.1 | |
| Caucasian Albanian | Aghb | 53 | 7.0 | |
| Chakma | Cakm | 71 | 6.1 | |
| Cham | Cham | 83 | 5.1 | |
| Cherokee | Cher | 172 | 3.0 | |
| Chorasmian | Chrs | 28 | 13.0 | |
| Coptic | Copt | 137 | 1.0 | Disunified from Greek in 4.1 |
| Cuneiform | Xsux | 1,234 | 5.0 | |
| Cypriot | Cprt | 55 | 4.0 | |
| Cypro Minoan | Cpmn | 99 | 14.0 | |
| Cyrillic | Cyrl | 506 | 1.0 | |
| Deseret | Dsrt | 80 | 3.1 | |
| Devanagari | Deva | 164 | 1.0 | |
| Dives Akuru | Diak | 72 | 13.0 | |
| Dogra | Dogr | 60 | 11.0 | |
| Duployan | Dupl | 143 | 7.0 | |
| Egyptian Hieroglyphs | Egyp | 1,110 | 5.2 | |
| Elbasan | Elba | 40 | 7.0 | |
| Elymaic | Elym | 23 | 12.0 | |
| Ethiopic | Ethi | 523 | 3.0 | |
| Georgian | Geor | 173 | 1.0 | |
| Glagolitic | Glag | 134 | 4.1 | |
| Gothic | Goth | 27 | 3.1 | |
| Grantha | Gran | 85 | 7.0 | |
| Greek | Grek | 518 | 1.0 | |
| Gujarati | Gujr | 91 | 1.0 | |
| Gunjala Gondi | Gong | 63 | 11.0 | |
| Gurmukhi | Guru | 80 | 1.0 | |
| Han | Hani | 98,408 | 1.0 | |
| Hangul | Hang | 11,739 | 1.0 | |
| Hanifi Rohingya | Rohg | 50 | 11.0 | |
| Hanunoo | Hano | 21 | 3.2 | |
| Hatran | Hatr | 26 | 8.0 | |
| Hebrew | Hebr | 134 | 1.0 | |
| Hiragana | Hira | 381 | 1.0 | |
| Imperial Aramaic | Armi | 31 | 5.2 | |
| Inscriptional Pahlavi | Phli | 27 | 5.2 | |
| Inscriptional Parthian | Prti | 30 | 5.2 | |
| Javanese | Java | 90 | 5.2 | |
| Kaithi | Kthi | 68 | 5.2 | |
| Kannada | Knda | 91 | 1.0 | |
| Katakana | Kana | 321 | 1.0 | |
| Kawi | Kawi | 86 | 15.0 | |
| Kayah Li | Kali | 47 | 5.1 | |
| Kharoshthi | Khar | 68 | 4.1 | |
| Khitan Small Script | Kits | 471 | 13.0 | |
| Khmer | Khmr | 146 | 3.0 | |
| Khojki | Khoj | 65 | 7.0 | |
| Khudawadi | Sind | 69 | 7.0 | |
| Lao | Laoo | 83 | 1.0 | |
| Latin | Latn | 1,481 | 1.0 | |
| Lepcha | Lepc | 74 | 5.1 | |
| Limbu | Limb | 68 | 4.0 | |
| Linear A | Lina | 341 | 7.0 | |
| Linear B | Linb | 211 | 4.0 | |
| Lisu | Lisu | 49 | 5.2 | |
| Lycian | Lyci | 29 | 5.1 | |
| Lydian | Lydi | 27 | 5.1 | |
| Mahajani | Mahj | 39 | 7.0 | |
| Makasar | Maka | 25 | 11.0 | |
| Malayalam | Mlym | 118 | 1.0 | |
| Mandaic | Mand | 29 | 6.0 | |
| Manichaean | Mani | 51 | 7.0 | |
| Marchen | Marc | 68 | 9.0 | |
| Masaram Gondi | Gonm | 75 | 10.0 | |
| Medefaidrin | Medf | 91 | 11.0 | |
| Meetei Mayek | Mtei | 79 | 5.2 | |
| Mende Kikakui | Mend | 213 | 7.0 | |
| Meroitic Cursive | Merc | 90 | 6.1 | |
| Meroitic Hieroglyphs | Mero | 32 | 6.1 | |
| Miao | Plrd | 149 | 6.1 | |
| Modi | Modi | 79 | 7.0 | |
| Mongolian | Mong | 168 | 3.0 | Unifies Mongolian, Todo, Manchu, and Sibe alphabets |
| Mro | Mroo | 43 | 7.0 | |
| Multani | Mult | 38 | 8.0 | |
| Myanmar | Mymr | 223 | 3.0 | |
| Nabataean | Nbat | 40 | 7.0 | |
| Nag Mundari | Nagm | 42 | 15.0 | |
| Nandinagari | Nand | 65 | 12.0 | |
| New Tai Lue | Talu | 83 | 4.1 | |
| Newa | Newa | 97 | 9.0 | |
| N'Ko | Nkoo | 62 | 5.0 | |
| Nushu | Nshu | 397 | 10.0 | |
| Nyiakeng Puachue Hmong | Hmnp | 71 | 12.0 | |
| Ogham | Ogam | 29 | 3.0 | |
| Ol Chiki | Olck | 48 | 5.1 | |
| Old Hungarian | Hung | 108 | 8.0 | |
| Old Italic | Ital | 39 | 3.1 | |
| Old North Arabian | Narb | 32 | 7.0 | |
| Old Permic | Perm | 43 | 7.0 | |
| Old Persian | Xpeo | 50 | 4.1 | |
| Old Sogdian | Sogo | 40 | 11.0 | |
| Old South Arabian | Sarb | 32 | 5.2 | |
| Old Turkic | Orkh | 73 | 5.2 | |
| Old Uyghur | Ougr | 26 | 14.0 | |
| Oriya | Orya | 91 | 1.0 | |
| Osage | Osge | 72 | 9.0 | |
| Osmanya | Osma | 40 | 4.0 | |
| Pahawh Hmong | Hmng | 127 | 7.0 | |
| Palmyrene | Palm | 32 | 7.0 | |
| Pau Cin Hau | Pauc | 57 | 7.0 | |
| Phags-pa | Phag | 56 | 5.0 | |
| Phoenician | Phnx | 29 | 5.0 | |
| Psalter Pahlavi | Phlp | 29 | 7.0 | |
| Rejang | Rjng | 37 | 5.1 | |
| Runic | Runr | 86 | 3.0 | Unifies various runic alphabets |
| Samaritan | Samr | 61 | 5.2 | |
| Saurashtra | Saur | 82 | 5.1 | |
| Sharada | Shrd | 96 | 6.1 | |
| Shavian | Shaw | 48 | 4.0 | |
| Siddham | Sidd | 92 | 7.0 | |
| SignWriting | Sgnw | 672 | 8.0 | |
| Sinhala | Sinh | 111 | 3.0 | |
| Sogdian | Sogd | 42 | 11.0 | |
| Sora Sompeng | Sora | 35 | 6.1 | |
| Soyombo | Soyo | 83 | 10.0 | |
| Sundanese | Sund | 72 | 5.1 | |
| Syloti Nagri | Sylo | 45 | 4.1 | |
| Syriac | Syrc | 88 | 3.0 | |
| Tagalog | Tglg | 23 | 3.2 | |
| Tagbanwa | Tagb | 18 | 3.2 | |
| Tai Le | Tale | 35 | 4.0 | |
| Tai Tham | Lana | 127 | 5.2 | |
| Tai Viet | Tavt | 72 | 5.2 | |
| Takri | Takr | 68 | 6.1 | |
| Tamil | Taml | 123 | 1.0 | |
| Tangsa | Tnsa | 89 | 14.0 | |
| Tangut | Tang | 6,914 | 9.0 | |
| Telugu | Telu | 100 | 1.0 | |
| Thaana | Thaa | 50 | 3.0 | |
| Thai | Thai | 86 | 1.0 | |
| Tibetan | Tibt | 207 | 1.0 | Removed in 1.1 and reintroduced in 2.0 |
| Tifinagh | Tfng | 59 | 4.1 | |
| Tirhuta | Tirh | 82 | 7.0 | |
| Toto | Toto | 31 | 14.0 | |
| Ugaritic | Ugar | 31 | 4.0 | |
| Vai | Vaii | 300 | 5.1 | |
| Vithkuqi | Vith | 70 | 14.0 | |
| Wancho | Wcho | 59 | 12.0 | |
| Warang Citi | Wara | 84 | 7.0 | |
| Yezidi | Yezi | 47 | 13.0 | |
| Yi | Yiii | 1,220 | 3.0 | Liangshan Yi (Nuosu) syllabary |
| Zanabazar Square | Zanb | 72 | 10.0 | |


Charts



Unicode Blocks and Scripts


Unicode Characters per Version


Unicode Characters per Plane


Total Unicode Characters in each Version


Unicode Characters added in each Version


Scripts with most Characters in Unicode 15.0

This chart shows all scripts with more than 500 characters



[Last updated : 2023-09-12]


