BabelStone : How many Unicode characters are there ?

The short answer is 154,998.

The long answer is that it all depends on what you mean by a "Unicode character". The Unicode Standard version 16.0 (released 10 September 2024) defines 154,998 encoded characters with unique identifying names mapped to immutable code points. However, these characters do not always correspond to user-perceived characters, as many user-perceived characters are represented in Unicode as a sequence of two or more encoded characters (and conversely, some single encoded characters may look like two or more distinct characters). For example, lowercase j with caron (ǰ) is represented as a single encoded character (U+01F0 LATIN SMALL LETTER J WITH CARON), but the corresponding uppercase character (J̌) is represented in Unicode as a sequence of two encoded characters (U+004A LATIN CAPITAL LETTER J + U+030C COMBINING CARON). Likewise, some emoji are encoded as single characters (e.g. U+2603 ☃ SNOWMAN), but hundreds of emoji are composed from sequences of encoded characters, in some cases arbitrarily long sequences of simple encoded emoji characters joined together with a Zero Width Joiner format character, and modified by a set of five skin tone modifier characters. Thus, the emoji for a man and a woman of particular skin tones kissing (👩🏼‍❤️‍💋‍👨🏾), which should render as a single glyph (it does on Windows with the Segoe UI Emoji font), comprises a sequence of ten Unicode characters (1F469 1F3FC 200D 2764 FE0F 200D 1F48B 200D 1F468 1F3FE), and the emoji for the flag of Scotland (🏴󠁧󠁢󠁳󠁣󠁴󠁿) comprises a sequence of seven Unicode characters (1F3F4 E0067 E0062 E0073 E0063 E0074 E007F). As of Unicode 16.0, there are 1,393 single-character emoji and emoji components, but 2,397 emoji sequences that are "recommended for general interchange" (RGI), comprising 1,468 ZWJ sequences, 259 two-letter country/region flag sequences, 3 tag sequences (flags of England, Scotland, and Wales), 655 skin tone modifier sequences, and 12 keycap sequences.
In addition, many thousands of emoji tag sequences representing sub-national flags are possible but are not recommended for general interchange so are not generally supported by fonts.
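
The gap between encoded characters and user-perceived characters can be seen in a minimal Python sketch, assuming Python 3, where `len()` on a string counts code points. The variable names and the `code_points` helper are illustrative only; the sequences are the ones quoted above.

```python
import unicodedata

# Build each example from the code points quoted in the text.
j_caron_lower = "\u01F0"   # U+01F0 LATIN SMALL LETTER J WITH CARON (one code point)
j_caron_upper = "J\u030C"  # U+004A + U+030C COMBINING CARON (two code points)

kiss = "".join(chr(cp) for cp in (
    0x1F469, 0x1F3FC, 0x200D, 0x2764, 0xFE0F,
    0x200D, 0x1F48B, 0x200D, 0x1F468, 0x1F3FE,
))
scotland = "".join(chr(cp) for cp in (
    0x1F3F4, 0xE0067, 0xE0062, 0xE0073, 0xE0063, 0xE0074, 0xE007F,
))


def code_points(text):
    """List the code points of a string in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


for label, text in [
    ("j with caron (lowercase)", j_caron_lower),
    ("J with caron (uppercase)", j_caron_upper),
    ("kiss emoji sequence", kiss),
    ("flag of Scotland", scotland),
]:
    # Each of these is a single user-perceived character,
    # but the code point counts are 1, 2, 10 and 7 respectively.
    print(label, len(text), code_points(text))

# Normalization cannot shrink the uppercase form, because no precomposed
# capital J with caron is encoded; the lowercase form decomposes to j + caron.
print(unicodedata.normalize("NFC", j_caron_upper) == j_caron_upper)
print(code_points(unicodedata.normalize("NFD", j_caron_lower)))
```

Counting user-perceived characters properly requires grapheme cluster segmentation, which the Python standard library does not provide, so this sketch only counts code points.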

Because the creation of characters using combining marks or as sequences of encoded characters is open-ended, it is not possible to say how many user-perceived characters can be represented by Unicode. Nevertheless, this page attempts to plot the growth of the Unicode Standard since its initial release in 1991 in the tables and charts below.

Table of Unicode Data over Time

Unicode Version History
Version	Date	Scripts	Blocks	Total Code Points	Assigned Code Points	Unassigned Code Points	Named Characters	Control Characters	Private Use Characters	Noncharacters	Surrogate Code Points	Graphic Characters	Format Characters	Characters Added or Removed
1.0.0	October 1991	24	57	65,536	12,795	52,741	7,129	32	5,632	2	0	7,085	44	+7,129
1.0.1	June 1992	25	59	65,536	34,505	31,031	28,327	32	6,144	2	0	28,283	44	+21,204 -6
1.1	June 1993	24	63	65,536	40,635	24,901	34,168	65	6,400	2	0	34,151	17	+5,963 -89 (-33)
2.0	July 1996	25	67	1,114,112	178,500	935,612	38,885	65	137,468	34	2,048	38,867	18	+11,373 -6,656
2.1	May 1998	25	67	1,114,112	178,502	935,610	38,887	65	137,468	34	2,048	38,869	18	+2
3.0	September 1999	38	86	1,114,112	188,809	925,303	49,194	65	137,468	34	2,048	49,168	26	+10,307
3.1	March 2001	41	95	1,114,112	233,787	880,325	94,140	65	137,468	66	2,048	94,009	131	+44,946
3.2	March 2002	45	107	1,114,112	234,803	879,309	95,156	65	137,468	66	2,048	95,023	133	+1,016
4.0	April 2003	52	122	1,114,112	236,029	878,083	96,382	65	137,468	66	2,048	96,243	139	+1,226
4.1	31 March 2005	59	142	1,114,112	237,302	876,810	97,655	65	137,468	66	2,048	97,515	140	+1,273
5.0	14 July 2006	64	151	1,114,112	238,671	875,441	99,024	65	137,468	66	2,048	98,884	140	+1,369
5.1	4 April 2008	75	168	1,114,112	240,295	873,817	100,648	65	137,468	66	2,048	100,507	141	+1,624
5.2	1 October 2009	90	194	1,114,112	246,943	867,169	107,296	65	137,468	66	2,048	107,154	142	+6,648
6.0	11 October 2010	93	206	1,114,112	249,031	865,081	109,384	65	137,468	66	2,048	109,242	142	+2,088
6.1	31 January 2012	100	217	1,114,112	249,763	864,349	110,116	65	137,468	66	2,048	109,975	141	+732
6.2	26 September 2012	100	217	1,114,112	249,764	864,348	110,117	65	137,468	66	2,048	109,976	141	+1
6.3	30 September 2013	100	217	1,114,112	249,769	864,343	110,122	65	137,468	66	2,048	109,975	147	+5
7.0	16 June 2014	123	249	1,114,112	252,603	861,509	112,956	65	137,468	66	2,048	112,804	152	+2,834
8.0	17 June 2015	129	259	1,114,112	260,319	853,793	120,672	65	137,468	66	2,048	120,520	152	+7,716
9.0	21 June 2016	135	270	1,114,112	267,819	846,293	128,172	65	137,468	66	2,048	128,019	153	+7,500
10.0	20 June 2017	139	277	1,114,112	276,337	837,775	136,690	65	137,468	66	2,048	136,537	153	+8,518
11.0	5 June 2018	146	288	1,114,112	277,021	837,091	137,374	65	137,468	66	2,048	137,220	154	+684
12.0	5 March 2019	150	297	1,114,112	277,575	836,537	137,928	65	137,468	66	2,048	137,765	163	+554
12.1	7 May 2019	150	297	1,114,112	277,576	836,536	137,929	65	137,468	66	2,048	137,766	163	+1
13.0	10 March 2020	154	305	1,114,112	283,506	830,606	143,859	65	137,468	66	2,048	143,696	163	+5,930
14.0	14 September 2021	159	317	1,114,112	284,344	829,768	144,697	65	137,468	66	2,048	144,532	165	+838
15.0	13 September 2022	161	324	1,114,112	288,833	825,279	149,186	65	137,468	66	2,048	149,014	172	+4,489
15.1	12 September 2023	161	325	1,114,112	289,460	824,652	149,813	65	137,468	66	2,048	149,641	172	+627
16.0	10 September 2024	168	335	1,114,112	294,645	819,467	154,998	65	137,468	66	2,048	154,826	172	+5,185

Characters removed in Unicode 1.0.1:

4 Cyrillic
2 Miscellaneous Technical

Characters removed in Unicode 1.1:

7 Greek and Coptic
1 Hebrew
5 Thai
5 Lao
71 Tibetan
33 named control characters (0000..001F and 007F) reclassified as unnamed control characters

Characters removed in Unicode 2.0:

6,656 Hangul Syllables, replaced with 11,172 new Hangul Syllables at different code points
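
The removals listed above can be reconciled with the Named Characters column of the version table. The following is a small arithmetic sketch in Python; the dictionary layout and the `reconcile` function are illustrative, and all figures are taken from the table and the lists above.

```python
# Named character totals before and after each release that removed characters.
named_before = {"1.0.1": 7_129, "1.1": 28_327, "2.0": 34_168}
named_after = {"1.0.1": 28_327, "1.1": 34_168, "2.0": 38_885}

# (added, removed, reclassified as unnamed control characters)
changes = {
    "1.0.1": (21_204, 4 + 2, 0),               # Cyrillic + Miscellaneous Technical
    "1.1": (5_963, 7 + 1 + 5 + 5 + 71, 33),    # Greek/Coptic, Hebrew, Thai, Lao, Tibetan
    "2.0": (11_373, 6_656, 0),                 # old Hangul Syllables
}


def reconcile(before, added, removed, reclassified):
    """Apply one release's additions and removals to the previous total."""
    return before + added - removed - reclassified


results = {v: reconcile(named_before[v], *changes[v]) for v in changes}
for version, total in results.items():
    print(version, total, total == named_after[version])

# Net gain in Hangul syllables from the 2.0 re-encoding.
hangul_net = 11_172 - 6_656
print(hangul_net)
```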

Notes

For historic versions of Unicode the statistics are based on the General Category of the characters at the time of encoding, and do not take into account any subsequent changes in General Category. Thus the fact that 4.0 has 139 format characters and 4.1 has 140 format characters is not due to a new format character having been added in 4.1, but rather due to the General Category of U+200B ZERO WIDTH SPACE having been changed from Zs to Cf in Unicode 4.0.1. Note that the statistics for 1.0.0 and 1.0.1 are based upon Ken Whistler's reconstructed Unicode Character Data (named characters with reconstructed general category of Cc, including 0000..001F and 007F, are counted as format characters).
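
The General Category values mentioned here can be inspected with Python's standard `unicodedata` module, as in the sketch below. Note the assumption: the module reports the categories of whichever Unicode version ships with the Python interpreter, not the historic values used for the table. The `describe` helper is illustrative.

```python
import unicodedata


def describe(ch):
    """Return (code point, General Category, name or None) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.name(ch, None),  # control characters have no name
    )


samples = [
    "\u200B",  # ZERO WIDTH SPACE: Cf since Unicode 4.0.1 (previously Zs)
    "\u200D",  # ZERO WIDTH JOINER: Cf
    "\u2028",  # LINE SEPARATOR: Zl
    "\u0009",  # Tab: Cc, unnamed
    "\u0020",  # SPACE: Zs
]

for ch in samples:
    print(describe(ch))

# The Unicode version of the data bundled with this interpreter.
print(unicodedata.unidata_version)
```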

To clarify the terminology, here are definitions of some of the terms used in the table (see Section 2.4 of the Unicode Standard for further information).

Block is a named range of Unicode characters. A block normally includes a set of related characters, either characters from a particular script or a set of symbols. Many scripts and sets of symbols are distributed over two or more blocks. The block count in the above table excludes the three blocks for surrogate code points: High Surrogates (D800..DB7F), High Private Use Surrogates (DB80..DBFF), and Low Surrogates (DC00..DFFF).
Graphic characters are those characters with a General Category other than Cc, Cn, Co, Cs, Cf, Zl and Zp, that is to say ordinary visible characters (including spaces with a non-zero width).
Format characters are those characters with a General Category of 'Cf', 'Zl' or 'Zp'. These are invisible characters defined by Unicode for a particular function. These include things like U+200D ZERO WIDTH JOINER, U+202D LEFT-TO-RIGHT OVERRIDE, interlinear annotation characters (FFF9..FFFB) and the set of Tag characters (E0001 and E0020..E007F). They work behind the scenes to do useful things like bidirectional control and character shaping.
Control characters are those characters with a General Category of 'Cc'. These are invisible characters that perform a certain function that is defined by a protocol or standard other than Unicode (they are inherited from pre-existing 8-bit standards). They include familiar characters such as Tab, Carriage Return and Line Feed that are essential to writing Unicode text (U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR never took off as Unicode replacements for Carriage Return and/or Line Feed), as well as many characters that you should never see in plain text (the full range of control characters is 0000..001F and 007F..009F). Control characters are not assigned character names, although they do have character name aliases that reflect their original function.
Private use characters are code points that are assigned as characters for private interchange. The Unicode Standard does not assign any semantics to these characters, and they are not assigned character names. There are three Private Use Areas: E000..F8FF (6,400 characters); F0000..FFFFD (65,534 characters); and 100000..10FFFD (65,534 characters).
Noncharacters are code points that are permanently reserved, and are guaranteed never to be assigned as characters. They are the thirty-four code points ending in (X)XFFFE and (X)XFFFF (the last two code points of each of the seventeen planes), as well as the thirty-two code points in the range FDD0..FDEF.
Surrogate code points are a set of 2,048 code points that are used in the UTF-16 encoding form to extend the Unicode code space beyond 16 bits. All Unicode characters outside the Basic Multilingual Plane (BMP) are encoded in UTF-16 as a pair of surrogate code points (a high surrogate code point in the range D800..DBFF, followed by a low surrogate code point in the range DC00..DFFF). Unpaired surrogate code points are not valid in UTF-16, and any surrogate code point is invalid in other encoding forms such as UTF-8 and UTF-32.
Named characters = Graphic characters + Format characters. These characters are each assigned a unique and unchangeable character name.
Assigned characters = Named characters + Control characters + Private use characters.
Assigned code points = Assigned characters + Noncharacters + Surrogate code points.
Total code points = Assigned code points + Unassigned (or Reserved) code points.
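
The definitions and sums above can be sketched as a tally over the whole code space using Python's standard `unicodedata` module. This is an illustration under stated assumptions, not the method used to produce the table: the graphic, format and unassigned counts depend on the Unicode version bundled with the Python interpreter, and the `bucket` and `is_noncharacter` names are made up for this example. The control, private use, noncharacter and surrogate counts are fixed and should match the table.

```python
import unicodedata
from collections import Counter


def is_noncharacter(cp):
    """True for the last two code points of each plane and for FDD0..FDEF."""
    return (cp & 0xFFFE) == 0xFFFE or 0xFDD0 <= cp <= 0xFDEF


def bucket(cp):
    """Assign a code point to one of the categories defined above."""
    cat = unicodedata.category(chr(cp))
    if cat == "Cc":
        return "control"
    if cat == "Co":
        return "private_use"
    if cat == "Cs":
        return "surrogate"
    if cat == "Cn":
        # Cn covers both noncharacters and unassigned (reserved) code points.
        return "noncharacter" if is_noncharacter(cp) else "unassigned"
    if cat in ("Cf", "Zl", "Zp"):
        return "format"
    return "graphic"


# Walk all 1,114,112 code points (0000..10FFFF); this takes a second or so.
counts = Counter(bucket(cp) for cp in range(0x110000))

named = counts["graphic"] + counts["format"]
assigned_characters = named + counts["control"] + counts["private_use"]
assigned_code_points = (
    assigned_characters + counts["noncharacter"] + counts["surrogate"]
)
total = assigned_code_points + counts["unassigned"]

print("Unicode data version:", unicodedata.unidata_version)
for key in ("graphic", "format", "control", "private_use",
            "noncharacter", "surrogate", "unassigned"):
    print(key, counts[key])
print("named", named)
print("assigned code points", assigned_code_points)
print("total", total)
```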

Scripts

Allocation of Characters by Script for Unicode 15.1
Script Name	ISO 15924 Code	Number of Characters	Version Introduced	Notes
Common	Zyyy	8,306	1.0	Characters that are common to two or more scripts
Inherited	Zinh	657	1.0	Combining characters that inherit the script of the character they are applied to
Adlam	Adlm	88	9.0
Ahom	Ahom	65	8.0
Anatolian Hieroglyphs	Hluw	583	8.0
Arabic	Arab	1,368	1.0
Armenian	Armn	96	1.0
Avestan	Avst	61	5.2
Balinese	Bali	124	5.0
Bamum	Bamu	657	5.2
Bassa Vah	Bass	36	7.0
Batak	Batk	56	6.0
Bengali	Beng	96	1.0
Bhaiksuki	Bhks	97	9.0
Bopomofo	Bopo	77	1.0
Brahmi	Brah	115	6.0
Braille	Brai	256	3.0	First defined as a script in 4.0
Buginese	Bugi	30	4.1
Buhid	Buhd	20	3.2
Canadian Aboriginal	Cans	726	3.0
Carian	Cari	49	5.1
Caucasian Albanian	Aghb	53	7.0
Chakma	Cakm	71	6.1
Cham	Cham	83	5.1
Cherokee	Cher	172	3.0
Chorasmian	Chrs	28	13.0
Coptic	Copt	137	1.0	Disunified from Greek in 4.1
Cuneiform	Xsux	1,234	5.0
Cypriot	Cprt	55	4.0
Cypro-Minoan	Cpmn	99	14.0
Cyrillic	Cyrl	506	1.0
Deseret	Dsrt	80	3.1
Devanagari	Deva	164	1.0
Dives Akuru	Diak	72	13.0
Dogra	Dogr	60	11.0
Duployan	Dupl	143	7.0
Egyptian Hieroglyphs	Egyp	1,110	5.2
Elbasan	Elba	40	7.0
Elymaic	Elym	23	12.0
Ethiopic	Ethi	523	3.0
Georgian	Geor	173	1.0
Glagolitic	Glag	134	4.1
Gothic	Goth	27	3.1
Grantha	Gran	85	7.0
Greek	Grek	518	1.0
Gujarati	Gujr	91	1.0
Gunjala Gondi	Gong	63	11.0
Gurmukhi	Guru	80	1.0
Han	Hani	99,030	1.0
Hangul	Hang	11,739	1.0
Hanifi Rohingya	Rohg	50	11.0
Hanunoo	Hano	21	3.2
Hatran	Hatr	26	8.0
Hebrew	Hebr	134	1.0
Hiragana	Hira	381	1.0
Imperial Aramaic	Armi	31	5.2
Inscriptional Pahlavi	Phli	27	5.2
Inscriptional Parthian	Prti	30	5.2
Javanese	Java	90	5.2
Kaithi	Kthi	68	5.2
Kannada	Knda	91	1.0
Katakana	Kana	321	1.0
Kawi	Kawi	86	15.0
Kayah Li	Kali	47	5.1
Kharoshthi	Khar	68	4.1
Khitan Small Script	Kits	471	13.0
Khmer	Khmr	146	3.0
Khojki	Khoj	65	7.0
Khudawadi	Sind	69	7.0
Lao	Laoo	83	1.0
Latin	Latn	1,481	1.0
Lepcha	Lepc	74	5.1
Limbu	Limb	68	4.0
Linear A	Lina	341	7.0
Linear B	Linb	211	4.0
Lisu	Lisu	49	5.2
Lycian	Lyci	29	5.1
Lydian	Lydi	27	5.1
Mahajani	Mahj	39	7.0
Makasar	Maka	25	11.0
Malayalam	Mlym	118	1.0
Mandaic	Mand	29	6.0
Manichaean	Mani	51	7.0
Marchen	Marc	68	9.0
Masaram Gondi	Gonm	75	10.0
Medefaidrin	Medf	91	11.0
Meetei Mayek	Mtei	79	5.2
Mende Kikakui	Mend	213	7.0
Meroitic Cursive	Merc	90	6.1
Meroitic Hieroglyphs	Mero	32	6.1
Miao	Plrd	149	6.1
Modi	Modi	79	7.0
Mongolian	Mong	168	3.0	Unifies Mongolian, Todo, Manchu, and Sibe alphabets
Mro	Mroo	43	7.0
Multani	Mult	38	8.0
Myanmar	Mymr	223	3.0
Nabataean	Nbat	40	7.0
Nag Mundari	Nagm	42	15.0
Nandinagari	Nand	65	12.0
New Tai Lue	Talu	83	4.1
Newa	Newa	97	9.0
N'Ko	Nkoo	62	5.0
Nushu	Nshu	397	10.0
Nyiakeng Puachue Hmong	Hmnp	71	12.0
Ogham	Ogam	29	3.0
Ol Chiki	Olck	48	5.1
Old Hungarian	Hung	108	8.0
Old Italic	Ital	39	3.1
Old North Arabian	Narb	32	7.0
Old Permic	Perm	43	7.0
Old Persian	Xpeo	50	4.1
Old Sogdian	Sogo	40	11.0
Old South Arabian	Sarb	32	5.2
Old Turkic	Orkh	73	5.2
Old Uyghur	Ougr	26	14.0
Oriya	Orya	91	1.0
Osage	Osge	72	9.0
Osmanya	Osma	40	4.0
Pahawh Hmong	Hmng	127	7.0
Palmyrene	Palm	32	7.0
Pau Cin Hau	Pauc	57	7.0
Phags-pa	Phag	56	5.0
Phoenician	Phnx	29	5.0
Psalter Pahlavi	Phlp	29	7.0
Rejang	Rjng	37	5.1
Runic	Runr	86	3.0	Unifies various runic alphabets
Samaritan	Samr	61	5.2
Saurashtra	Saur	82	5.1
Sharada	Shrd	96	6.1
Shavian	Shaw	48	4.0
Siddham	Sidd	92	7.0
SignWriting	Sgnw	672	8.0
Sinhala	Sinh	111	3.0
Sogdian	Sogd	42	11.0
Sora Sompeng	Sora	35	6.1
Soyombo	Soyo	83	10.0
Sundanese	Sund	72	5.1
Syloti Nagri	Sylo	45	4.1
Syriac	Syrc	88	3.0
Tagalog	Tglg	23	3.2
Tagbanwa	Tagb	18	3.2
Tai Le	Tale	35	4.0
Tai Tham	Lana	127	5.2
Tai Viet	Tavt	72	5.2
Takri	Takr	68	6.1
Tamil	Taml	123	1.0
Tangsa	Tnsa	89	14.0
Tangut	Tang	6,914	9.0
Telugu	Telu	100	1.0
Thaana	Thaa	50	3.0
Thai	Thai	86	1.0
Tibetan	Tibt	207	1.0	Removed in 1.1 and reintroduced in 2.0
Tifinagh	Tfng	59	4.1
Tirhuta	Tirh	82	7.0
Toto	Toto	31	14.0
Ugaritic	Ugar	31	4.0
Vai	Vaii	300	5.1
Vithkuqi	Vith	70	14.0
Wancho	Wcho	59	12.0
Warang Citi	Wara	84	7.0
Yezidi	Yezi	47	13.0
Yi	Yiii	1,220	3.0	Liangshan Yi (Nuosu) syllabary
Zanabazar Square	Zanb	72	10.0

Charts

Unicode Blocks and Scripts

Unicode Characters per Version

Unicode Characters per Plane

Total Unicode Characters in each Version

Unicode Characters added in each Version

Scripts with most Characters in Unicode 15.1

This chart shows all scripts with more than 500 characters

[Last updated : 2023-09-12]
