Tangut Character Database
The "Tangut Character Database" is a database of Tangut character data that I have been compiling as part of the project to encode the Tangut script in Unicode (see Documents relating to the encoding of the Tangut, Jurchen and Khitan scripts). The database is very much a work in progress, and is subject to frequent updates and corrections (I will usually announce major updates on Twitter). The format is also designed for my personal convenience, and will evolve over time. At present the database comprises three separate Excel 2007 format spreadsheets: TangutMappingData (character catalogue numbers, mnemonic codes and radical sorting data from various secondary sources), TangutPhoneticData (phonetic data and phonetic reconstructions), and TangutDefinitionData (character and word definitions from primary and secondary sources). The latest versions of the database files are available for download in Excel 2007 format or plain text format:
All data that I have created from scratch or extracted from primary Tangut sources is in the public domain, and may be used without permission or attribution, but character reference numbers and mnemonic codes from modern secondary sources may be under copyright of the person or persons who created them.
Fonts
This database utilises the following fonts, which are available for download unless restricted by licence:
- N3297 : covers the characters at 17000 through 18715 proposed in N3297, remapped to the PUA at F7000 through F8715 (font is not redistributable)
- N3797 : covers the characters at 17000 through 187A6 proposed in N3797, remapped to the PUA at F7000 through F87A6 (font is not redistributable)
- N4033 : covers the characters at 17000 through 187BF proposed in N4033, as well as PUA-mapped Tangut components used in the IVS sequences (font is not redistributable)
- Mojikyo M202 and Mojikyo M203 fonts : cover the Mojikyo Tangut character set, mapped to CJK characters according to Shift-JIS encoding (available for download from the Mojikyo website)
- Mojikyo Kychanov : font derived from the Mojikyo M202 and M203 fonts that covers the 5,803 characters in Kychanov and Arakawa's 2006 Tangut-Russian-English-Chinese Dictionary, remapped in Kychanov order to the PUA at E000 through F6AA (font is not redistributable)
- LFW1986 : scan font covering the 5,812 characters given in the calligraphic facsimile reproduction of the Tong Yin text in Li Fanwen's 1986 Study of the Homophones, mapped to the PUA at E000 through F6B3 (font is in the public domain)
- LFW1986X : scan font covering the 5,817 characters in the radical/stroke index to Li Fanwen's 1986 Study of the Homophones, mapped to the PUA at E000 through F6B8 (font is in the public domain)
- LFW1997 : scan font covering the 6,001 characters in Li Fanwen's 1997 Tangut-Chinese Dictionary, mapped to the PUA at E000 through F770 (font is in the public domain)
- HXM2004 : scan font covering the 6,066 characters in Han Xiaomang's 2004 dissertation, mapped to the PUA at E000 through F7B1 (font is in the public domain)
- KYC2006 : scan font covering the 5,803 characters in Kychanov and Arakawa's 2006 Tangut-Russian-English-Chinese Dictionary, mapped to the PUA at E000 through F6AA (font is in the public domain)
- Nishida1966 : scan font covering the 3,548 characters in Nishida Tatsuo's 1966 Little Dictionary of Tangut, mapped to the PUA at E000 through EDDB (font is in the public domain)
- WHYJ : scan font covering the 3,064 head characters in the calligraphic facsimile reproduction of the Wen Hai text in Shi Jinbo et al.'s 1983 Study of the Sea of Characters, mapped to the PUA at E000 through EBF7 (font is in the public domain)
- WHYJIndex : scan font covering the 4,935 characters in the radical index in Shi Jinbo et al.'s 1983 Study of the Sea of Characters, mapped to the PUA at E000 through F346 (font is in the public domain)
- TangutRadicals : scan font covering radicals from various sources, mapped to the PUA (font is in the public domain):
  - E100..E296 = 407 radicals from Li Fanwen's 1997 Tangut-Chinese Dictionary
  - E300..E478 = 377 radicals from Kychanov and Arakawa's 2006 Tangut-Russian-English-Chinese Dictionary
  - E480..E4EA = 107 radicals from Grinstead's 1972 Analysis of the Tangut Script
  - E500..E6BB = 444 radicals from Shi Jinbo et al.'s 1983 Study of the Sea of Characters
  - E700..E892 = 403 radicals from Sofronov's 1968 Grammar of the Tangut Language
  - E900..EA49 = 330 radicals from Nishida's 1966 Little Dictionary of Tangut
  - EB00..EF87 = 1,160 radicals from Keping et al.'s 1969 Sea of Characters
  - F000..F0B4 = 181 radicals from Kolokolov and Kychanov's 1966 Chinese Classics in Tangut Translation
  - F100..F297 = 408 radicals from Nevskij's 1960 Tangut Philology
  - F300..F4C6 = 455 radicals from Nakajima et al.'s 2000 Research into the Computer Processing of the 'Precious Rhymes of the Sea of Characters'
  - F500..F6D9 = 474 radicals from Han Xiaomang's 2004 Research into the Correct Forms of Tangut Characters
  - F700..F89C = 413 radicals from Li Fanwen's 1986 Study of the Homophones
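The PUA ranges in the font list can be sanity-checked with a little arithmetic: an inclusive range holds last - first + 1 code points, and the N3297 and N3797 fonts appear to shift each proposed code point by a constant 0xE0000 (17000 becomes F7000). The following is a minimal Python sketch; the function names are illustrative and not part of the database, and the constant offset is an assumption inferred from the range endpoints given above.

```python
def range_size(first_hex: str, last_hex: str) -> int:
    """Number of code points in an inclusive hexadecimal range."""
    return int(last_hex, 16) - int(first_hex, 16) + 1


def to_plane15_pua(code_point: int) -> int:
    """Shift a proposed Tangut code point (17000..) into the PUA (F7000..).

    Assumption: the remapping is a constant offset of 0xE0000, as implied
    by the start and end points listed for the N3297 and N3797 fonts.
    """
    return code_point + 0xE0000


# Ranges and counts as stated in the font list.
print(range_size("E000", "F6AA"))        # KYC2006 font: 5803
print(range_size("E100", "E296"))        # LFW1997 radicals: 407
print(f"{to_plane15_pua(0x18715):X}")    # last N3297 character: F8715
```

The same check can be run against any other range in the list to confirm that the stated character count matches the stated end point.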
Description of TangutMappingData
Unicode Data (cols. A..M)
- A. Row : the sequential number of the row in the spreadsheet (for use in resorting the table back to its original order after it has been sorted)
- B. UCode : the Unicode code point proposed in N4033 (these code points will change and should not be relied upon)
- C. Line : "0" for the primary entry for each proposed Unicode character, and "1" or greater for secondary rows (sort by Line and Row to get a list of the proposed Unicode characters)
- D. UGlyph : Glyph given in N4033 {N4033.ttf}
- E. IDS : Primary IDS sequence for the character (i.e. the IDS sequence of the UGlyph) {N4033.ttf}
- F. IDS for Unified Glyph Variants : Secondary IDS sequences representing the glyph form of unified glyph variants (multiple IDS sequences separated by a pipe character) {N4033.ttf}
- G. Radical : Radical number according to the proposed ordering principles (see N3797)
- H. RGlyph : Radical glyph {N4033.ttf}
- I. Strokes : Total number of strokes (based on the UGlyph)
- J. Sort Key : Alphabetic sort key (based on the UGlyph), where each letter represents a particular stroke type (see N3797)
- K. Variant of : The preferred character if this is a disunified character variant
- L. Disunified Variants : Disunified character variants of this character (if any)
- M. Changes from N4033 : Significant changes or corrections compared with N4033
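The Row, Line and IDS variant columns described above lend themselves to a short illustration. The sketch below assumes the rows have already been loaded as Python dictionaries keyed by column name; the key "IDSVariants" and all sample values are hypothetical placeholders, not data from the database.

```python
# Hypothetical rows: only Row and Line matter here, and the IDS values
# are placeholders rather than real sequences from the database.
rows = [
    {"Row": 3, "Line": 0, "IDSVariants": ""},
    {"Row": 1, "Line": 0, "IDSVariants": "ids-a|ids-b"},
    {"Row": 2, "Line": 1, "IDSVariants": ""},
]


def primary_entries(rows):
    """Sort by Line, then Row, and keep the primary (Line == 0) entries."""
    by_line_then_row = sorted(rows, key=lambda r: (r["Line"], r["Row"]))
    return [r for r in by_line_then_row if r["Line"] == 0]


def restore_original_order(rows):
    """Put the table back into its original order using the Row column."""
    return sorted(rows, key=lambda r: r["Row"])


def split_ids_variants(cell):
    """Split a pipe-separated cell into individual IDS sequences."""
    return [part for part in cell.split("|") if part]


print([r["Row"] for r in primary_entries(rows)])         # [1, 3]
print([r["Row"] for r in restore_original_order(rows)])  # [1, 2, 3]
print(split_ids_variants("ids-a|ids-b"))                 # ['ids-a', 'ids-b']
```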
N3297 Mappings (cols. N..O)
- N. N3297 : Proposed code point in N3297
- O. Glyph : Proposed glyph in N3297 {N3297.ttf}
N3797 Mappings (cols. P..Q)
- P. N3797 : Proposed code point in N3797
- Q. Glyph : Proposed glyph in N3797 {N3797.ttf}
Mojikyo Mappings (cols. R..U)
- R. Mojikyo : Mojikyo number (one-to-one mapping to LFW1997 numbers)
- S. Font : Name of font that contains this character
- T. Code : Code point of character in the Mojikyo font
- U. Glyph : Mojikyo glyph {Mojikyo M202/M203.ttf}
LFW1997 Data (cols. V..AA)
- V. LFW1997 : LFW1997 character number
- W. Glyph : Scan glyph (from the 4-corner index) {LFW1997.ttf}
- X. Radical : Radical number (sequential number based on the order of the radicals in the radical index)
- Y. RGlyph : Radical glyph {TangutRadicals.ttf}
- Z. RSort : Sort key based on the radical stroke order (where a character occurs twice in the radical index, the sort key is based on the most appropriate entry, and the alternative sort key is given in the Notes column)
- AA. Notes : Explanatory notes on individual characters
LFW2008 Data (cols. AB..AL)
- AB. LFW2008 : LFW2008 character number
- AC. Glyph : Glyph {N4033.ttf}
- AD. XRef : Cross reference to the main entry if this is a character variant
- AE. 4 Corner : 4-corner index number for the character
- AF. Page : Page number that the entry starts on
- AG. Radical : Radical number (sequential number based on the order of the radicals in the radical index)
- AH. RGlyph : Radical glyph [TBD]
- AI. AddStrokes : Additional stroke count
- AJ. RSort : Sort key based on the radical stroke order (where a character occurs twice in the radical index, the sort key is based on the most appropriate entry, and the alternative sort key is given in the Notes column)
- AK. Notes : Explanatory notes on individual characters
- AL. Definition : English definition for the character, as given in the English Index
KYC2006 Data (cols. AM..AT)
- AM. KYC2006 : KYC2006 character number
- AN. MGlyph : Glyph derived from the Mojikyo M202 and M203 fonts {MojikyoKychanov.ttf}
- AO. Glyph : Scan glyph (from the radical index) {KYC2006.ttf}
- AP. XRef : Cross reference to the main entry if this is a character variant
- AQ. Radical : Radical number
- AR. RGlyph : Radical glyph {TangutRadicals.ttf}
- AS. RSort : Sort key based on the position of the character in the radical index
- AT. Notes : Explanatory notes on individual characters
HXM2004 Data (cols. AU..BA)
- AU. HXM2004-1 : Character type number (last column of HXM2004)
- AV. HXM2004-2 : Character variant number (first column of HXM2004)
- AW. Variant : An asterisk indicates that this is considered a character variant by HXM
- AX. Glyph : Scan glyph (from the first column of HXM2004) {HXM2004.ttf}
- AY. Radical : Radical number (sequential number based on the order of the radicals in the radical index)
- AZ. RGlyph : Radical glyph {TangutRadicals.ttf}
- BA. RSort : Sort key based on the position of the character in the radical index
NIS1966 Data (cols. BB..BI)
- BB. NIS1966 : NIS1966 character number
- BC. Glyph : Scan glyph {Nishida1966.ttf}
- BD. Code : Alphanumeric code indicating the layout of character elements (added where missing, and silently corrected if obviously wrong)
- BE. Sequence : Sequence of elements that make up the character (exactly as given, with no corrections, even where obviously wrong)
- BF. Radical : Radical number (sequential number based on the order of the radicals in the radical index)
- BG. RGlyph : Radical glyph {TangutRadicals.ttf}
- BH. RSort : Sort key based on the position of the character in the radical index
- BI. Notes : Explanatory notes on individual characters
WHYJ1983 Data (cols. BJ..BP)
- BJ. WHYJ1983 : WHYJ1983 character number
- BK. Radical : Radical number (sequential number based on the order of the radicals in the radical index)
- BL. RGlyph : Radical glyph {TangutRadicals.ttf}
- BM. RSort : Sort key based on the position of the character in the radical index
- BN. IndexGlyph : Scan glyph (from the radical index) {WHYJIndex.ttf}
- BO. HeadGlyph : Scan glyph (from the calligraphic facsimile of the main text) {WHYJ.ttf}
- BP. Page/Pos : Page and position number of the head character in the main text (prefixed "P" for the Level Tone section, and "Z" for the Mixed section)
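The section prefix on WHYJ1983 Page/Pos values can be separated from the rest of the value with a small helper. This is only a sketch: the exact layout of the page and position part is not specified here, so it is returned unparsed, and the sample value below is made up for illustration.

```python
# Prefix letters as described for the Page/Pos column.
SECTIONS = {"P": "Level Tone", "Z": "Mixed"}


def split_page_pos(value: str):
    """Return (section name, remainder) for a WHYJ1983 Page/Pos value.

    The remainder (page and position) is left as a string because its
    exact format is an unknown here.
    """
    prefix, rest = value[:1], value[1:]
    if prefix not in SECTIONS:
        raise ValueError(f"unexpected section prefix in {value!r}")
    return SECTIONS[prefix], rest


# "P12.3" is a hypothetical value, not taken from the database.
print(split_page_pos("P12.3"))  # ('Level Tone', '12.3')
```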
TYYJ1986 Data (cols. BQ..BX)
- BQ. TYYJ1986 : TYYJ1986 character number
- BR. TextGlyph : Scan glyph from the calligraphic facsimile of the text (NB there are many glyph errors in the calligraphic facsimile) {LFW1986.ttf}
- BS. IndexGlyph : Scan glyph from the radical index {LFW1986X.ttf}
- BT. Page/Pos : Page and position number of the head character in the main text
- BU. Radical : Radical number (sequential number based on the order of the radicals in the radical index)
- BV. RGlyph : Radical glyph {TangutRadicals.ttf}
- BW. RSort : Sort key based on the radical stroke order (where a character occurs twice in the radical index, the sort key is based on the most appropriate entry, and the alternative sort key is given in the Notes column)
- BX. Notes : Explanatory notes on individual characters
Mnemonic Codes (cols. BY..BZ)
- BY. Boxenhorn : The "alphacode" mnemonics created by David Boxenhorn (version 5 dated 2010-05-27)
- BZ. Downes : The main transliteration codes (pages 239–440) created by Alan Downes for his BA thesis (Macquarie University, 2010-06-17)
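Because the columns of TangutMappingData are identified by spreadsheet letters (A through BZ), a helper that converts a column letter to a zero-based index may be convenient when working with the plain text version. The sketch below assumes that the plain text files keep the same column order as the spreadsheet, which is not stated here; the function name is illustrative.

```python
def column_index(letters: str) -> int:
    """Convert a spreadsheet column label (A, Z, AA, BZ...) to a 0-based index."""
    if not letters:
        raise ValueError("empty column label")
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Column labels are a bijective base-26 numbering (A=1 ... Z=26).
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


print(column_index("A"))    # 0
print(column_index("AA"))   # 26
print(column_index("BZ"))   # 77, so the sheet has 78 columns in total
# Width of the LFW1997 block (cols. V..AA):
print(column_index("AA") - column_index("V") + 1)  # 6
```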
Description of TangutPhoneticData
[TBD]
Description of TangutDefinitionData
[TBD]
Sources
- DOW2010. Downes, Alan, Xixia Dictionaries, Long Transliteration Edition, Supplement to The Xixia Writing System (Bachelor of Arts Honours Thesis). Macquarie University, June 17, 2010.
- GRI1972. Grinstead, Eric, Analysis of the Tangut Script (Scandinavian Institute of Asian Studies Monograph Series No.10). 1972.
- HXM2004. Hán Xiǎománg (韓小忙), 西夏文正字研究 (Xīxiàwén Zhèngzì Yánjiū) [Research into the Correct Forms of Tangut Characters]. 2004.
- JYS2008. Jǐng Yǒngshí (景永时), 西夏文字处理系统使用手册 (Xīxià Wénzì Chǔlǐ Xìtǒng Shǐyòng Shǒucè) [Handbook for the Tangut Document Processing System]. Yinchuan, 2008.
- KEP1969. Keping, K. B. (К. Б. Кепинг) et al., Море письмен (More pisʹmen) [The Sea of Characters]. Moscow, 1969.
- KK1966. Kolokolov, V. S. (В. С. Колоколов) and E. I. Kyčanov (Е. И. Кычанов), Китайская классика в тангутском переводе (Kitajskaja klassika v tangutskom perevode) [Chinese Classics in Tangut Translation]. Moscow, 1966.
- KYC2006. Kyčanov, E. I. (Е. И. Кычанов), Словарь тангутского (Си Ся) языка (Slovarʹ tangutskogo (Si Sja) jazyka) [Tangut-Russian-English-Chinese Dictionary]. St. Petersburg and Kyoto, 2006.
- TYYJ1986. Lǐ Fànwén (李範文), 同音研究 (Tóngyīn Yánjiū) [Study of the Homophones]. Yinchuan, 1986.
- LFW1997. Lǐ Fànwén (李範文), 夏漢字典 (Xià-Hàn Zìdiàn) [Tangut-Chinese Dictionary]. Beijing, 1997.
- LFW2008. Lǐ Fànwén (李範文), 夏漢字典 (Xià-Hàn Zìdiàn) [Tangut-Chinese Dictionary]. Beijing, 2008.
- NAK2000. Nakajima Motoki (中嶋幹起) et al., 電脳処理《文海宝韻》研究 (Dennō shori ‘Bunkai Hōin’ kenkyū) [Research into the Computer Processing of the 'Precious Rhymes of the Sea of Characters']. Tokyo, 2000.
- NEV1960. Nevskij, N. A. (Н. А. Невский), Тангутская филология: Исследования и словарь (Tangutskaja filologija: Issledovanija i slovarʹ) [Tangut Philology: Researches and Dictionary]. Moscow, 1960.
- NIS1966. Nishida Tatsuo (西田龍雄), 西夏文小字典 (Seikabun Shōjiten) [Little Dictionary of Tangut]. In 西夏語の研究 (Seikago no kenkyū) [A Study of the Hsi-Hsia Language] (1964-1966) vol.2. Tokyo, 1966.
- WHYJ1983. Shǐ Jīnbō (史金波) et al., 文海研究 (Wénhǎi Yánjiū) [Study of the Sea of Characters]. Beijing, 1983.
- SOF1968. Sofronov, M. V. (М. В. Софронов), Грамматика тангутского языка (Grammatika tangutskogo jazyka) [Grammar of the Tangut Language]. Moscow, 1968.
Acknowledgements
My work on the Tangut database is indebted to many friends and experts, and I would especially like to express my thanks to the following people: