Separating similar complex Chinese characters by Walsh transform


Abstract

Typed (machine-printed) Chinese character recognition is practically feasible, and the recognition rate can be as high as 99.9%. The first step of the recognition procedure uses a 4C code and a 4P code to partition the 5401 commonly used characters. The 4C code encodes each of the four corner zones of a character into two levels. The 4P code encodes each of four peripheral rectangular zones into 32 levels, according to the number of points having particular runs of black and white. This yields approximately 4096 classes, most containing 1–6 characters, so characters with similar peripheries are grouped together. Here we use the Walsh transform to separate the similar characters within each class. The Walsh transform is characterized by “sequency” rather than the “periodicity” of the Fourier transform, and it is also easy to compute. In our experiment we use three groups of complex Chinese characters (Ming font), each containing 4–6 characters. Each character is imaged 8 times by varying its size, position and threshold value. We find that most Walsh coefficients are stable under these changes. We therefore pick the 2–5 coefficients with the greatest separating power, and with these coefficients we are able to recognize every character in each group. This shows that the Walsh transform is a simple, fast and reliable method for separating complex Chinese characters with similar peripheries.
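The abstract does not specify how the coefficients are computed or scored. The following is a minimal sketch under stated assumptions. Character images are taken to be nested lists of 0/1 pixels with power-of-two side lengths. A 2-D Walsh transform is computed in sequency order via the fast Walsh-Hadamard transform. Coefficients are ranked by a Fisher-style ratio of between-class to within-class variance, used here only as a stand-in for "separating power". Classification is by nearest class mean on the top k coefficients. All function names, the scoring ratio, the classifier and the toy 4x4 images are hypothetical. They are not the authors' method or data.

```python
# Hypothetical sketch: 2-D Walsh coefficients for separating similar characters.
# Assumed, not from the paper: 0/1 images with power-of-two sides, a Fisher-style
# score for "separating power", and a nearest-class-mean classifier.

def fwht(vec):
    """Unnormalised fast Walsh-Hadamard transform in natural (Hadamard) order."""
    a = list(vec)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                # Butterfly: only additions and subtractions are needed.
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def sequency_to_hadamard(n):
    """perm[k] is the Hadamard row whose Walsh function has k sign changes."""
    bits = n.bit_length() - 1

    def bit_reverse(x):
        r = 0
        for _ in range(bits):
            r, x = (r << 1) | (x & 1), x >> 1
        return r

    # Gray-code the sequency index, then reverse its bits.
    return [bit_reverse(k ^ (k >> 1)) for k in range(n)]

def walsh_2d(image):
    """out[u][v]: Walsh coefficient with row sequency u and column sequency v."""
    rows, cols = len(image), len(image[0])
    tmp = [fwht(r) for r in image]  # transform every row, then every column
    by_col = [fwht([tmp[x][c] for x in range(rows)]) for c in range(cols)]
    pr, pc = sequency_to_hadamard(rows), sequency_to_hadamard(cols)
    return [[by_col[pc[v]][pr[u]] for v in range(cols)] for u in range(rows)]

def features(image):
    """Flatten coefficients row by row, so index d = u * cols + v."""
    return [c for row in walsh_2d(image) for c in row]

def fisher_scores(training):
    """Between-class variance over mean within-class variance, per coefficient."""
    per_class = [[features(img) for img in imgs] for imgs in training.values()]
    scores = []
    for d in range(len(per_class[0][0])):
        means, within = [], 0.0
        for vecs in per_class:
            vals = [v[d] for v in vecs]
            m = sum(vals) / len(vals)
            means.append(m)
            within += sum((x - m) ** 2 for x in vals) / len(vals)
        grand = sum(means) / len(means)
        between = sum((m - grand) ** 2 for m in means) / len(means)
        scores.append(between / (within / len(means) + 1e-9))
    return scores

def train(training, k):
    """Keep the k best-scoring coefficients and each class's mean on them."""
    scores = fisher_scores(training)
    chosen = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)[:k]
    means = {}
    for label, imgs in training.items():
        vecs = [features(img) for img in imgs]
        means[label] = [sum(v[d] for v in vecs) / len(vecs) for d in chosen]
    return chosen, means

def classify(image, chosen, means):
    """Nearest class mean (squared Euclidean distance) on the chosen coefficients."""
    f = features(image)
    x = [f[d] for d in chosen]
    return min(means, key=lambda lab: sum((p - q) ** 2
                                          for p, q in zip(x, means[lab])))

def with_pixel(image, x, y):
    """Copy of image with pixel (x, y) set to 1, loosely mimicking a threshold change."""
    out = [list(r) for r in image]
    out[x][y] = 1
    return out

if __name__ == "__main__":
    # Two toy 4x4 "characters" (hypothetical, not the paper's data).
    a = [[1, 1, 0, 0] for _ in range(4)]
    b = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
    training = {"A": [a, with_pixel(a, 3, 3)], "B": [b, with_pixel(b, 3, 3)]}
    chosen, means = train(training, k=2)
    print(chosen, classify(with_pixel(a, 2, 3), chosen, means))  # prints [1, 4] A
```

Every Walsh basis value is +1 or -1, so the transform needs only additions and subtractions, which is one reason it is cheap to compute. In the toy run, the two selected coefficients are the first horizontal and first vertical sequency terms. These are exactly where a "left half black" pattern and a "top half black" pattern differ, while the single-pixel perturbations barely move them.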

Keywords: Chinese document processing, Similar characters, Walsh transform, Character recognition

Article history: Received 30 May 1986, Revised 12 September 1986, Available online 19 May 2003.

Official paper URL: https://doi.org/10.1016/0031-3203(87)90068-9