BabelPad (Unicode Text Editor for Windows)
All screenshots of BabelPad on babelstone.co.uk are licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA-3.0) by Andrew West.
BabelPad is a free Unicode text editor for Windows that supports the proper rendering of most complex scripts, and allows you to assign different fonts to different scripts in order to facilitate multi-script text editing. It also provides many useful features and special utilities, as described below. BabelPad supports the most recent version of the Unicode Standard, currently Unicode 8.0 (released June 2015).
Summary of Features
- Swap between Edit Mode and Browser Mode :
- Edit Mode allows documents of any size to be edited in plain text format.
- Browser Mode allows the current document to be viewed in an Internet Explorer browser window.
- The user interface menus and other text elements may be displayed in any of the following languages :
- Chinese (simplified)
- Chinese (traditional)
- Multiple instances of BabelPad may be tiled (horizontally, vertically or patchwork), cascaded, minimized, maximized, restored or closed from the "Window" menu of any open BabelPad window.
- Open files encoded as :
- Unicode : UTF-8
- Unicode : UTF-16 (Big Endian or Little Endian)
- Unicode : UTF-32 (Big Endian or Little Endian)
- Unicode : UTF-7
- Unicode : SCSU
- Unicode : CESU-8
- Unicode 1.0 : UCS-2
- Unicode 1.1 : UCS-2
- Unicode 1.1 : UTF-7
- ISO-8859-1 (Latin1) : Western European
- ISO-8859-2 (Latin2) : Non-Cyrillic Central European
- ISO-8859-3 (Latin3) : Esperanto, Galician, Maltese, Turkish
- ISO-8859-4 (Latin4) : Baltic Rim
- ISO-8859-5 (Cyrillic)
- ISO-8859-6 (Arabic)
- ISO-8859-7 (Greek)
- ISO-8859-8 (Hebrew)
- ISO-8859-9 (Latin5) : Improved Turkish
- ISO-8859-10 (Latin6) : Inuit, Lappish
- ISO-8859-11 (Thai)
- ISO-8859-13 (Latin7) : Improved Baltic Rim
- ISO-8859-14 (Latin8) : Celtic
- ISO-8859-15 (Latin9, a.k.a. Latin0) : Improved Western European
- ISO-8859-16 (Latin10) : South-Eastern European
- Windows CP 874 (Thai)
- Windows CP 932 (extension of Shift-JIS) : Japanese
- Windows CP 936 (extension of GB2312) : Simplified Chinese
- Windows CP 949 (Unified Hangul Code) : Korean
- Windows CP 950 (extension of Big5) : Traditional Chinese
- Windows CP 1133 (Lao)
- Windows CP 1250 (East European)
- Windows CP 1251 (Cyrillic)
- Windows CP 1252 (West European)
- Windows CP 1253 (Greek)
- Windows CP 1254 (Turkish)
- Windows CP 1255 (Hebrew)
- Windows CP 1256 (Arabic)
- Windows CP 1257 (Baltic)
- Windows CP 1258 (Vietnamese)
- EUC-JP (Japanese)
- EUC-KR (Korean)
- GB18030 (Extended Chinese) : Unicode-mapped superset of GB2312
- GB2312 (Simplified Chinese)
- Big5 (Traditional Chinese)
- Big5-HKSCS (Big5 plus Hong Kong Supplementary Character Set)
- Shift-JIS (Japanese) (optionally converting DoCoMo/KDDI/SoftBank emoji extensions)
- JIS X 0201 (Latin plus Katakana)
- JIS X 0208 (Japanese)
- KPS 9566-2003 (Korean [DPRK])
- KSC 5601 (KS X 1001) (Korean)
- Wansung (Korean)
- Johab (Korean)
- KOI8-R (Russian)
- KOI8-U (Ukrainian)
- ARMSCII-8 (Armenian)
- VISCII (Vietnamese)
- VIQR (Vietnamese Quoted Readable)
- TIS-620 (Thai)
- Mulelao-1 (Lao)
- TSCII (Tamil)
- TAM (Tamil Monolingual)
- TAB (Tamil Bilingual)
- I.S. 434 (Ogham)
- Autodetects Unicode encoding forms and character sets declared in HTML or XML documents.
- Automatically convert CR/LF, CR, LF, Line Separator and Paragraph Separator characters.
- Option to convert Numeric Character References (NCR) and/or Universal Character Names (UCN) to Unicode characters on Open.
- Save the current document as :
- Unicode : UTF-8 (with or without a Byte Order Mark)
- Unicode : UTF-16 Big Endian or Little Endian (with or without a Byte Order Mark)
- Unicode : UTF-32 Big Endian or Little Endian (with or without a Byte Order Mark)
- GB18030 (with or without a Byte Order Mark)
- ASCII with Hexadecimal Numeric Character Reference (NCR) substitution of non Basic Latin characters
- ASCII with Decimal Numeric Character Reference (NCR) substitution of non Basic Latin characters
- ASCII with Universal Character Name (UCN) substitution of non Basic Latin characters
- ASCII with HTML Entity substitution of non Basic Latin characters
- SCSU (Standard Compression Scheme for Unicode) [encoder/decoder code kindly supplied by Doug Ewell]
- Save line breaks as CR/LF, LF, CR, or as Unicode Line Separator [U+2028] or Paragraph Separator characters [U+2029].
- Left-To-Right (LTR) or Right-To-Left (RTL) page layout.
- Line Wrap mode or No Line Wrap mode.
- Drag and Drop editing.
- Multiple Undo/Redo.
- Indent and Unindent selected lines of text using TAB and Shift-TAB.
- Option to Auto-Indent text as you type (useful for writing code).
- Select a "word" by double-clicking and navigate by "word" by means of the left/right arrows (works for most Unicode scripts).
- Select a line of text by left-clicking in the margin (select a paragraph by double-clicking in the margin).
- Find and Replace functions.
- Supports preferred font family and subfamily for font families with more than four font styles.
- Load uninstalled font files from file for use in the current instance of BabelPad only.
- Quick highlight all occurrences of a character, word or phrase (or any arbitrary text) by right-clicking on selected text and selecting 'Highlight'.
- Highlight an arbitrary number of characters, words or phrases (or any arbitrary text) in user-specified colours by loading a highlighting definition file.
- Sorting using the Unicode Collation Algorithm (UCA) or the CLDR Collation Algorithm, with custom tailorings for some languages, including Tibetan.
- Manipulate delimited columns of text (reorder, cut, copy, paste and sort columns delimited by tabs or any user-specified character or string).
- Transcode from one list of characters or code points to another list of characters or code points.
- Batch replace one list of text strings with another list of text strings.
- Select default font and font size from dropdown list on the toolbar.
- Configure individual Unicode blocks to always use a particular font regardless of which font is currently selected for default display.
- Status Bar displays code point and Unicode name of the character at the current caret position.
- For CJK ideographs the status bar also displays the Mandarin, Korean or Vietnamese reading for the character at the current caret position (choice of reading is user-selectable).
- Able to open and edit very large (multi-megabyte) files with little degradation in performance.
- Standard printing functionality enabled.
- Case Conversion (covering all scripts that have upper/lower case distinctions, including Latin, Greek, Cyrillic, Armenian and Deseret) :
- Convert the selected alphabetic text to upper case.
- Convert the selected alphabetic text to lower case.
- Convert the selected alphabetic text to title case.
- Normalization (conforms to the Unicode 8.0 normalization algorithm) :
- Convert the selected text to Normalization Form NFD (canonical decomposition).
- Convert the selected text to Normalization Form NFC (canonical composition).
- Convert the selected text to Normalization Form NFKD (canonical decomposition with compatibility characters replaced).
- Convert the selected text to Normalization Form NFKC (canonical composition with compatibility characters replaced).
- Options to customize normalization for Hebrew and Tibetan, to avoid suboptimal reordering of characters.
- CJK Conversion :
- Convert the selected Simplified Chinese text to Traditional Chinese.
- Convert the selected Traditional Chinese text to Simplified Chinese.
- Entity Conversion :
- Convert all HTML Entities (e.g. &uuml;) in the selected text to Unicode characters.
- Convert all non-Basic Latin characters in the selected text to HTML Entities or hexadecimal Numeric Character References (NCRs).
- Convert all Numeric Character References (e.g. &#xFC; or &#252;) in the selected text to Unicode characters.
- Convert all non-Basic Latin characters in the selected text to hexadecimal Numeric Character References (NCRs).
- Convert all non-Basic Latin characters in the selected text to decimal Numeric Character References (NCRs).
- Convert all Universal Character Names (e.g. \u00FC) in the selected text to Unicode characters.
- Convert all non-Basic Latin characters in the selected text to Universal Character Names (UCNs).
- Convert all characters in the selected text to their Unicode Names (e.g. LATIN SMALL LETTER U WITH DIAERESIS).
- Convert the selected Unicode character name to its corresponding character.
- Convert all characters in the selected text to U+XXXX notation (e.g. U+00FC).
- Convert the hexadecimal scalar value in front of the caret to a Unicode character or vice versa by hitting Alt-X (emulates the Alt-X functionality in Microsoft Word).
- Transliteration Conversion :
- Convert the selected CJK text to Mandarin pinyin readings.
- Convert the selected CJK text to Cantonese jyutping readings.
- Convert the selected Extended Wylie Tibetan transliteration to Unicode Tibetan.
- Convert the selected Mongolian transliteration to Unicode Mongolian.
- Convert the selected Manchu transliteration to Unicode Manchu.
- Convert the selected Yi romanisation to Unicode Yi.
- Convert the selected Yi romanisation to International Phonetic Alphabet (IPA).
- Convert the selected Unicode Yi text to Yi romanisation.
- Convert the selected Unicode Yi text to International Phonetic Alphabet (IPA).
- Convert the selected Vietnamese Unicode text to VIQR transliteration.
- Convert the selected VIQR transliteration to Vietnamese Unicode.
- PUA Conversion :
- Convert precomposed Tibetan (SetA) to standard Unicode Tibetan.
- Convert standard Unicode Tibetan to precomposed Tibetan (SetA).
- Convert Hong Kong Supplementary Character Set (HKSCS) PUA characters to CJK Unified Ideograph characters.
- Reordering :
- Reverse the order of all selected characters in a line.
- Utilises Microsoft's Uniscribe rendering engine to correctly render complex text.
- Option to render all Unicode characters as individual spacing glyphs (i.e. with no shaping or ligation of complex text, and combining characters not combined).
- Option to display text in different colours for different Unicode-defined scripts.
- Select any installed Windows Keyboard Layout or IME from a dropdown list on the toolbar.
- Romanised input methods for the following scripts :
- Unicode Input Mode :
- Enter Unicode characters in the range U+0001 through U+10FFFF as scalar hexadecimal values (with or without leading zeros), demarcated by pressing the Space or Return key.
- Select One-off Unicode Input Mode by pressing Ctrl+Q (this allows you to enter a single Unicode character as described above, but on pressing Space, Enter or Escape you are returned to the original keyboard/IME).
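The Unicode Input Mode described above (hexadecimal scalar values, with or without leading zeros, separated by Space) can be illustrated with a minimal Python sketch. This is not BabelPad's code: the function name `parse_scalar_values` is hypothetical, and rejecting the surrogate range U+D800 through U+DFFF is an assumption based on the phrase "scalar values".

```python
import string

MAX_CODE_POINT = 0x10FFFF


def parse_scalar_values(typed):
    """Turn space-separated hexadecimal values into a string of characters.

    Hypothetical helper, for illustration only.
    """
    chars = []
    for token in typed.split():
        # Accept only plain hexadecimal digits, with or without leading zeros.
        if not all(ch in string.hexdigits for ch in token):
            raise ValueError(f"not a hexadecimal value: {token!r}")
        cp = int(token, 16)
        # The feature list gives the accepted range as U+0001 through U+10FFFF.
        if not 0x0001 <= cp <= MAX_CODE_POINT:
            raise ValueError(f"out of range: {token!r}")
        # Assumption: surrogate code points are not scalar values, so reject them.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point: {token!r}")
        chars.append(chr(cp))
    return "".join(chars)


# "5A", "FC" and "0072" are Z, ü and r; ascii() keeps the output console-safe.
print(ascii(parse_scalar_values("5A FC 0072")))
```

Leading zeros make no difference here: "72" and "0072" both yield the letter r.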
Tools and Utilities
- Fonts Overview Utility : lists essential details for all enumerated TrueType and OpenType fonts.
- Font Analysis Utility : lists all Unicode blocks covered by a particular font or lists all fonts that cover a particular Unicode block.
- Font Information Utility : provides information about the currently selected font.
- Font Glyph Export Utility : export any or all glyphs from any font to file in BMP, GIF, JPG or PNG format (optionally specify which characters to export the glyphs for by loading a glyph export definition file).
- Font Coverage Utility : List all fonts that cover a particular character or all the characters in a piece of text or all the characters in the BabelMap edit buffer.
- Advanced Character Search Utility : lists all characters that meet specified criteria.
- UCD Data Utility : generates UCD-format data for a given range of characters for any version of Unicode.
- Character History Utility : enumerates the UCD properties for a given character for all versions of Unicode, including mappings to Unicode 1.0.0 and 1.0.1 where appropriate.
- Han Radical Lookup Utility : lists all Han ideographs with a given radical and number of strokes (covers all 74,616 characters in the CJK, CJK-A, CJK-B, CJK-C and CJK-D blocks).
- Mandarin Pinyin Lookup Utility : lists all Han ideographs with a given Mandarin pinyin pronunciation.
- Cantonese Jyutping Lookup Utility : lists all Han ideographs with a given Cantonese jyutping pronunciation.
- Yi Radical Lookup Utility : lists all Yi syllables with a given radical and number of strokes.
- Unicode Version History Utility : provides a summary of the repertoire of each version of Unicode from 1.0 onwards.
- Document Analysis Utility : provides statistical information about the current document, and highlights any invalid characters.
- Character Frequency Utility : lists all the characters in the document by frequency.
- Undefined Glyphs Utility : lists all characters in the document which are rendered with an undefined glyph using the currently selected font (not available when the composite font is selected).
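As a rough illustration of what a character frequency listing like the one described above involves, here is a minimal Python sketch. It is not BabelPad's implementation: the function name `character_frequencies` and the ordering of ties (by code point) are assumptions, and the character names come from whichever Unicode version the Python interpreter bundles.

```python
from collections import Counter
import unicodedata


def character_frequencies(text):
    """Return (code point, name, count) rows, most frequent first.

    Hypothetical helper, for illustration only.
    """
    counts = Counter(text)
    # Sort by descending count; break ties by ascending code point (an assumption).
    rows = sorted(counts.items(), key=lambda item: (-item[1], ord(item[0])))
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"), n)
        for ch, n in rows
    ]


for code_point, name, count in character_frequencies("abba\u00FC"):
    print(code_point, name, count)
```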
BabelPad Version 18.104.22.168 (supports Unicode 8.0) [2016-01-14]
See the BabelStone forum for details of bug fixes, enhancements and new features in the latest version.
BabelPad is distributed as a single executable file (no installer). Simply download the zipped file, and then unzip the file BabelPad.exe to the desired location on your computer. A help file is available, but is currently out of date.
- BabelPad.zip (32-bit version for Windows 2000, XP, Vista, 7, 8, 8.1 and 10) [3.41 MB]
BabelPad is free and fully functional for personal or commercial use, but you are welcome to make a small donation via PayPal to help support its continued development if you wish ($5 or equivalent suggested).
A portable version of BabelPad is also available for free download from PortableApps.com.
The latest version of BabelPad runs on Windows 2000* or later, but an unsupported old version of BabelPad that runs on earlier versions of Windows is available here:
* BabelPad requires GDI+ (gdiplus.dll), which may not be installed on systems running Windows 2000; if this is the case you may download it directly from Microsoft (here), and copy the file gdiplus.dll to the same location that BabelPad.exe is run from.
Feature requests, bug reports and general questions or comments about BabelPad or BabelMap may be made at the BabelStone forum or directly to me by email.
BabelPad Limitations and Bugs
- Horizontal scroll is fixed width, and so some extremely long lines may be truncated when not in Line Wrap mode.
- When in Line Wrap mode, it is not possible to scroll into view the trailing part of a line that is so long that it does not completely fit onto the screen.
- The Unicode Bidirectional algorithm has not yet been implemented, and so complex bidirectional text may not be displayed as expected. However, simple bidirectional text (e.g. a Hebrew phrase embedded in English text) should display correctly.
- Line breaking behaviour does not conform to the Line Breaking Properties specified by Unicode.
- There is an annoying bug that I have been unable to track down yet which means that occasionally you cannot click correctly on the required position in the text (you try to click on a particular character but the caret ends up somewhere else). You can get BabelPad back to normal by simply hitting the F5 key, which will redraw the screen and reset the caret position.
- When making global changes to a huge (multi-megabyte) document, first disable Undo/Redo (Options : Edit Options from the menu). This will greatly improve the speed of Replace operations.
- If you want to convert a large file with a high proportion of characters above U+007F to UCN (\uABCD) format, NCR (&#xABCD; or &#1234;) format or HTML entity (&entity;) format, then Save As and select "ASCII plus ..." from the encoding dropdown list (and then reopen the file if necessary). This takes a fraction of the time compared with selecting the entire document and using the appropriate function from the Convert menu.
- If you want to convert a large file with a high proportion of UCN (\uABCD) or NCR (&#xABCD; or &#1234;) entities to Unicode characters, then check the Convert NCRs and/or the Convert UCNs checkbox when opening the file. This takes a fraction of the time compared with selecting the entire document and using the appropriate function from the Convert menu after the file has been opened.
- To enter a single Unicode character by hexadecimal code point value, press Ctrl-Q and enter the code point value followed by Enter or Space.
- To convert a Unicode hexadecimal code point value to its corresponding Unicode character, place the caret after the last hexadecimal digit and press Alt-X.
- To convert a single Unicode character to its hexadecimal code point value, place the caret after the character and press Alt-X.
- To enter a sequence of Unicode characters by hexadecimal code point values, select the Unicode Input Mode (Input : Unicode from the menu or "U+" from the Input toolbar), and type in the code point values separated by spaces. Press Ctrl-D to return to the default input mode.
- When entering Tibetan, Mongolian, Manchu or Yi text using BabelPad's custom input methods for these scripts or entering Unicode text as scalar values using BabelPad's Unicode input method, you may access the keyboard normally by using the AltGr key (or Ctrl + Alt if your keyboard does not have an AltGr key). For example, when using BabelPad's Tibetan Input Method, pressing the numeral keys will enter Tibetan digits, but holding down the AltGr key at the same time as pressing the numeral keys will allow Arabic digits to be entered instead.
- If you are working with GB2312-encoded documents, open and save your files as GB18030 (GB18030 is a superset of GB2312 that has a one-to-one mapping to Unicode).
- If you are using a composite font, some characters may be clipped at their top and/or bottom (this is because the line height for the composite font is based on the font mapped to the Basic Latin block, but fonts mapped to other blocks may have significantly higher line heights, especially for scripts such as Tibetan that have superscript and subscript components). To avoid this clipping simply increase the line-spacing for the document (drop down list on the main toolbar, or Ctrl+Shift+MouseWheel).
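The NCR and UCN substitution mentioned in the tips above (replacing every character above U+007F with an ASCII escape, and the reverse on opening) can be sketched in Python as follows. This is an illustration, not BabelPad's code: all function names are hypothetical, and writing characters above U+FFFF as \UXXXXXXXX is an assumption borrowed from the C convention, since the text here only shows the \uXXXX form.

```python
import re


def to_decimal_ncr(text):
    """Replace non-ASCII characters with decimal NCRs such as &#252;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_hex_ncr(text):
    """Replace non-ASCII characters with hexadecimal NCRs such as &#xFC;."""
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text)


def to_ucn(text):
    """Replace non-ASCII characters with \\uXXXX (or, by assumption, \\UXXXXXXXX)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)


_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")
_UCN = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def from_ncr(text):
    """Convert decimal and hexadecimal NCRs back to characters."""
    def repl(match):
        cp = int(match.group(1), 16) if match.group(1) else int(match.group(2))
        # Leave out-of-range references untouched rather than failing.
        return chr(cp) if cp <= 0x10FFFF else match.group(0)
    return _NCR.sub(repl, text)


def from_ucn(text):
    """Convert \\uXXXX and \\UXXXXXXXX escapes back to characters."""
    def repl(match):
        cp = int(match.group(1) or match.group(2), 16)
        return chr(cp) if cp <= 0x10FFFF else match.group(0)
    return _UCN.sub(repl, text)


sample = "Z\u00FCrich"
print(to_decimal_ncr(sample), to_hex_ncr(sample), to_ucn(sample))
```

Note that a literal "&#252;" already present in the source text cannot be told apart from a substituted character after such a conversion, so the round trip is only lossless for text that contains no such sequences.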
BabelPad uses Microsoft's Unicode Script processor, Uniscribe (filename usp10.dll), to format and render Unicode text. The more recent a version of Uniscribe you have installed on the computer the better support you will have for complex scripts such as Indic and south-east Asian scripts, Tibetan and Mongolian. The version of Uniscribe that BabelPad is using is indicated in the About BabelPad... dialogue box.
Uniscribe is constantly being updated to support new scripts and to add new functionality to existing script support, so it is important that you have the latest possible version of Uniscribe installed on your PC. Even if you do not use complex scripts, you will only get advanced features for Latin script such as ligatures with a recent version of Uniscribe (to see this try entering <s ZWJ t> with Code2000). You may run BabelPad with a particular version of Uniscribe by simply placing a copy of the Uniscribe file (usp10.dll) in the same directory as BabelPad.exe.
Uniscribe comes pre-installed on Windows 2000 and later, and should also have been installed if you are running Internet Explorer Version 5 or above on other Windows operating systems (i.e. Windows 95, 98, ME). If, when you attempt to run BabelPad, a dialog box entitled "Unable to Locate DLL" appears with the message "The dynamic link library USP10.dll could not be found in the specified path", this means that Uniscribe is not installed on your PC.
Some versions of Uniscribe may have bugs that may produce unexpected rendering behaviour, or even cause BabelPad to crash. Those that I know of are outlined below :
- Versions of Uniscribe greater than Version 1.405.2416.1 may only work correctly when running under the Windows XP operating system. The following unexpected display behaviour may be observed in BabelPad's character map utility when running under Windows 95/98/ME or Windows 2000 with a version of Uniscribe that is greater than Version 1.405.2416.1 :
- Strange characters displayed in the range U+0000 through U+001F of the Basic Latin block
- Strange characters displayed in the range U+0080 through U+009F of the Latin-1 Supplement block
- Only digits displayed for Indic scripts (including Tibetan and Mongolian) if the selected font does not have OpenType tables defined for the particular script
- Version 1.453.3665.0 has a bug that may cause any application that relies on it to crash if an attempt is made to display any character in the Lao script range.
- Version 1.460.3707.0 has a bug that may cause any application that relies on it to crash if an attempt is made to display a sequence of 16 or more consecutive Tibetan letters without a break (i.e. a space, tsheg or shad).
- Some early versions of Uniscribe may crash when an attempt is made to display Arabic text with the "Arial Unicode MS" font.
- Early versions of Uniscribe have a bug that causes it to return to BabelPad the wrong character position and screen point of Unicode characters outside of the Basic Multilingual Plane when in Right-To-Left (RTL) mode. This means that you may be unable to click on or select text composed of characters from Unicode Planes 1-16 when in RTL layout mode.
- Version 1.626.7600.16385 that ships with Windows 7 causes any characters in the Supplementary Multilingual Plane (Plane 1) that are not defined in Unicode 5.1 to be rendered as two square boxes (this affects Unicode 5.2 additions such as Avestan, Egyptian Hieroglyphs, Imperial Aramaic, Inscriptional Pahlavi, Inscriptional Parthian, Kaithi, Old South Arabian, Old Turkic, Enclosed Alphanumeric Supplement, Enclosed Ideographic Supplement and Rumi Numeral Symbols). This is fixed in Windows 7 SP1.
- Version 1.0626.7601.17514 that ships with Windows 7 SP1 displays characters in the Variation Selectors Supplement block (U+E0100..U+E01EF) as two undefined glyphs, and does not display characters in the Tags block (U+E0000..U+E007F) at all. These characters may be displayed correctly in BabelPad by disabling Uniscribe (Ctrl+0 [zero]); however complex scripts will not render correctly in this mode.
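To check whether a piece of text contains any characters from the two blocks named in the last item (Tags, U+E0000 through U+E007F, and Variation Selectors Supplement, U+E0100 through U+E01EF), a small Python sketch along the following lines may help. The function name `find_affected` and the table `AFFECTED_BLOCKS` are hypothetical, and the sketch only locates such characters; it says nothing about how a given Uniscribe version will render them.

```python
# Block ranges as given in the list above.
AFFECTED_BLOCKS = {
    "Tags": (0xE0000, 0xE007F),
    "Variation Selectors Supplement": (0xE0100, 0xE01EF),
}


def find_affected(text):
    """Return (index, code point, block name) for each character in the listed blocks.

    Hypothetical helper, for illustration only.
    """
    hits = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        for block, (first, last) in AFFECTED_BLOCKS.items():
            if first <= cp <= last:
                hits.append((index, f"U+{cp:04X}", block))
    return hits


# A base letter followed by the first supplementary variation selector.
for hit in find_affected("a\U000E0100b"):
    print(hit)
```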