Re: Petition to UN on Abolishment of Traditional Chinese in 2008



Dylan Sung wrote:
> "Lee Sau Dan" <danlee@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:87sloxc1of.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
>> "yky" == yky <yky@xxxxxxx> writes:
>>
>> yky> Dylan Sung wrote:
>> >> So why did the traditional character set Big5 merge zhe5
>> >> (aspect particle of continuing action) and zhuo (to wear)?
>> >> There was no justification for that either, so don't just blame
>> >> the simplification side.
>>
>> yky> Huh? Those two are different characters?
>>
>> Maybe he meant zhu4, as in zhu4ming2 (famous), pian1zhu4 (to edit and
>> write [a book]), etc.
>
> Big5 has long since been extended to include zhe5/zhuo.



http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=8457&useutf8=true
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=7740&useutf8=true

More info:
u+8457 = Morohashi #31302; "grass" radical + 8 strokes.
u+7740 = Morohashi #23339; "eye" radical + 6 strokes.

However, the "correct" form for both is actually Morohashi
#31410; "grass" radical + 9 strokes. It differs from
Morohashi #31302 (u+8457) by having an extra dot in the
middle of the character.
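For anyone who cannot reach the Unihan CGI pages, the two code points above (u+8457 and u+7740) can be inspected locally. A minimal sketch using only Python's standard library; the helper name `describe` is hypothetical and not from the post. Note that `unicodedata` exposes only the Unicode character name, not Unihan fields such as the Morohashi number or Kangxi index, so those still have to be looked up in the Unihan data files.

```python
import unicodedata

def describe(cp: int) -> str:
    """Return the code point, the character, its Unicode name, and its UTF-8 bytes."""
    ch = chr(cp)
    # bytes.hex(sep) requires Python 3.8+
    utf8 = ch.encode("utf-8").hex(" ").upper()
    return f"U+{cp:04X} {ch} {unicodedata.name(ch)} [UTF-8: {utf8}]"

for cp in (0x8457, 0x7740):
    print(describe(cp))
```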

Neither Morohashi #31302 (u+8457) nor Morohashi #23339
(u+7740) is in my copy of the Kangxi dictionary (ISBN
7-101-00518-7; Zhonghua Shuju, Beijing; 1st ed. 1958;
11th printing 2002). OTOH Morohashi #31410 is in it, as
expected.

The Unihan DB appears to have errors on these two code points.
Specifically,

(1) The entry for u+8457 refers to Kangxi Index
"1440 260"[*]. However, on page 1440 one finds only
Morohashi #31410 (with extra dot), not Morohashi #31302
(u+8457).

[*] I have never figured out what the second number is for.

(2) The entry for u+7740 refers to Kangxi Index
"0808 051". However, on page 808 there is no such character.
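On the footnote's question about the second number: UAX #38 (the Unihan documentation, at least in recent versions; the database consulted here may have differed) describes kKangXi values as a four-digit page number, a period, a two-digit position on the page, and a final digit that is 0 if the character actually appears in the dictionary and 1 if it does not but would be placed at that position. A minimal parsing sketch under that assumption; `parse_kangxi` and `KangXiRef` are hypothetical names, and the space-separated form quoted in the post is accepted alongside the dotted form.

```python
import re
from dataclasses import dataclass

# Assumed format (per UAX #38's description of kKangXi): PPPP.NNF
#   PPPP = page, NN = position on the page,
#   F = 0 (actual entry) or 1 (virtual: where it would appear)
# The values quoted in the post use a space instead of a period, so accept both.
_KANGXI = re.compile(r"^(\d{4})[. ](\d{2})([01])$")

@dataclass(frozen=True)
class KangXiRef:
    page: int
    position: int
    virtual: bool  # True if the character is not actually printed in the dictionary

def parse_kangxi(value: str) -> KangXiRef:
    """Split a kKangXi value into page, position, and the virtual-entry flag."""
    m = _KANGXI.match(value.strip())
    if m is None:
        raise ValueError(f"not a kKangXi value: {value!r}")
    page, pos, flag = m.groups()
    return KangXiRef(int(page), int(pos), flag == "1")

for raw in ("1440 260", "0808 051"):
    print(raw, "->", parse_kangxi(raw))
```

Under this reading, "1440 260" decodes to page 1440, position 26, actual entry, and "0808 051" to page 808, position 5, with the virtual flag set.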

This shows, once again, that neither Big5 nor Unicode is
a (good) pedagogic standard.

Tak
--
----------------------------------------------------------------+-----
Tak To takto@xxxxxxxxxxxxxx
--------------------------------------------------------------------^^
[taode takto 陶德] NB: trim the xx to get my real email addr


