CJKV Information Processing 2nd Edn.

Reviewed by Major Keary

The initialism CJKV stands for Chinese, Japanese, Korean, and Vietnamese; it has become a standard term in the fields of information processing and character encoding.

In the Foreword of CJKV Information Processing, Robert Bringhurst notes that "very few books [written by outsiders about the Orient] have been translated into Asian languages, because so few of them tell Asians anything about themselves. Once in a while, though, outsiders really know their stuff, and insiders see that this is so. The first edition of this book … was recognised at once as the definitive work in the field and was promptly translated into both Chinese and Japanese".

CJKV Information Processing is the definitive text on processing CJKV information with modern computers. It is a valuable resource for anyone with an interest in the development of multilingual software, and it contains much of interest to teachers and students of any of the CJKV languages, especially in respect of writing systems.

Some may wonder why Vietnamese is included when it uses a Latin-based alphabet. There are two reasons. First, the language was originally written in Chinese characters, a form of writing that, given Vietnam's large ethnic Chinese population, is still widely used. Second, apart from some additional characters (horned 'O' and 'U', and crossed 'D'), numerous diacritics and tone marks are employed in the official Vietnamese Latin alphabet, which represents a repertoire of, on my count, 233 characters when numerals, punctuation marks, and the other usual symbols are included. One of the advantages of modern electronic typesetting is that 'accented' letters are treated as stand-alone glyphs.
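The point about accented letters can be illustrated with Unicode normalization. What follows is a minimal sketch in Python, not something taken from the book: it uses the standard unicodedata module to show that a Vietnamese letter such as 'ệ' can be stored either as a single precomposed code point or as a base letter followed by combining marks. Strictly speaking this concerns code points rather than glyphs in a font, and the sample word and the helper name code_points are chosen only for illustration.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return the code points of text as U+XXXX strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


word = "Việt"

# NFC prefers precomposed characters; NFD splits them into base + combining marks.
composed = unicodedata.normalize("NFC", word)
decomposed = unicodedata.normalize("NFD", word)

print(code_points(composed))    # four code points, 'ệ' stored as one
print(code_points(decomposed))  # six code points, 'ệ' stored as e + two marks

# The two forms look identical on screen but are different strings,
# so text should be normalised before it is compared or searched.
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Normalising to one form before comparing or indexing text is the usual way to keep such visually identical strings from being treated as different.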

Chinese characters found their way to Japan, Korea, and Vietnam at different periods; in each locale some of the glyphs have since undergone variation and new characters have been created. While that has been going on there have been changes and additions to the han mother-lode. Han is a term used for Chinese characters (hanzi in China, kanji in Japan, hanja in Korea, and chu Han in Vietnam).

In response to the proliferation of computers, the Japanese were the first to look at ways of codifying the characters used in their writing system. JIS C 6226-1978 was established on 1 January 1978, and there have been several revisions, largely related to the development of ISO 10646 and Unicode. Apart from national standards there have been a number of corporate encodings.

The second edition of CJKV Information Processing is the most authoritative and detailed text on the current state of unification of han characters, which occupy a large part of the ISO 10646/Unicode encoding space. There are ongoing problems with presenting a unified set of characters that satisfies the needs of countries, and particular locales, where the writing system uses Chinese characters. Ken Lunde supports Unicode, but points out that legacy encodings will continue to be used. He discusses legacy-to-Unicode conversion issues in depth and shows how various programming languages can be used to cope with that situation. The book is an essential reference for software developers working on applications that require multilingual interfaces.
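To give a flavour of what legacy-to-Unicode conversion involves, here is a minimal sketch in Python. It is not drawn from the book; it relies only on the codecs shipped with Python's standard library, and the function name legacy_to_utf8 and the sample string are assumptions made for illustration.

```python
def legacy_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    Hypothetical helper: strict error handling is used so that bytes with
    no mapping raise UnicodeDecodeError instead of being silently replaced.
    """
    text = data.decode(source_encoding, errors="strict")
    return text.encode("utf-8")


sample = "日本語"

# The same three characters have different byte sequences in each legacy encoding.
sjis = sample.encode("shift_jis")
eucjp = sample.encode("euc_jp")
print(sjis.hex(), eucjp.hex())

# After conversion, both byte sequences arrive at the same UTF-8 bytes.
print(legacy_to_utf8(sjis, "shift_jis") == legacy_to_utf8(eucjp, "euc_jp"))
```

Real conversions are rarely this tidy: vendor variants of an encoding (for example Python's separate shift_jis and cp932 codecs) map some characters differently, which is the kind of issue the book treats in depth.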

For those who need to work with CJKV languages there is information on operating systems and software in a CJKV environment; there is no other current resource that provides such depth of coverage as CJKV Information Processing.

The breadth of coverage in CJKV Information Processing is most remarkable, reaching into every nook and cranny of the languages, their respective idiosyncrasies, resources (such as dictionaries), programming issues, typesetting issues, input methods, and standards. Apart from information processing, which is the prime focus of this text, it is an unparalleled resource for anyone studying (or teaching) the respective writing systems of the CJKV languages.

This is a book that deserves a place in any library with holdings on linguistics, computer programming, typesetting, type design, Unicode, information processing, and—of course—any of the CJKV languages.

The first edition of CJKV Information Processing ran to 1101 pages; the second edition contains more information, yet the page count is 864, which may seem strange. The reduction has been achieved by dropping some 420 pages of appendices covering various character sets and encodings. That data is now available as PDF files that can be downloaded (URL details for each of the new appendices are provided in the book). If you encounter a blank screen, simply do a save-page-as and the PDF file will show in the pop-up download dialogue box. Placing the appendices on a web site also enables the information to be updated as necessary.

A tour de force and a delight to read. Ken Lunde is a master technical communicator.

Ken Lunde: CJKV Information Processing 2nd Edn.
ISBN 978-0-596-51447-1
Published by O'Reilly, 864 pp., RRP AU$ 125.00

O'Reilly titles are distributed in Australia by Woodslane (www.woodslane.com.au)

