Re: KANJD212
- From: jwb@xxxxxxxxxxxxxxxxxx
- Date: Tue, 06 Dec 2005 03:25:42 GMT
dareka <dareka@xxxxxxxxxxx> dixit:
>jwb@xxxxxxxxxxxxxxxxxx wrote:
>> dareka <dareka@xxxxxxxxxxx> dixit:
>>>I don't understand what you mean by "modern standardization".
>>
>> Well, the current approach is to establish a code-point according
>> to a number of factors, but not to be prescriptive about the finer
>> points of the glyphs. For example in 籾 the length of the last stroke
>> was reduced. Nothing else changed; it was still the same kanji. That
>> wouldn't happen now.
>Who decides the factors and what are their criteria, Unicode?
I was actually referring to the JSC committee. I daresay
they have an eye on the Unicode Han-unification principles,
but they pretty much do their own thing. Bear in mind that
national standards override any Unicode unification processes:
one of the Unicode ground rules (the source separation rule) is
that if an (abstract) character is coded more than once in a
national standard, then it will be coded more than once in
Unicode. Otherwise characters like 刃/刄 and 劔/劒 might have
been unified.
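A minimal sketch of that rule in practice, assuming Python 3 and taking its built-in euc_jp codec as a stand-in for JIS X 0208 (the helper name jis_and_unicode is made up for illustration). Each member of a variant pair has its own JIS code, so each keeps its own Unicode code point:

```python
# Variant pairs mentioned in the post.
PAIRS = [("刃", "刄"), ("劔", "劒")]

def jis_and_unicode(ch):
    """Return (EUC-JP bytes as hex, Unicode code point) for one character."""
    return ch.encode("euc_jp").hex(), f"U+{ord(ch):04X}"

for a, b in PAIRS:
    # Separate codes in the national standard, separate code points in Unicode.
    print(a, jis_and_unicode(a), "|", b, jis_and_unicode(b))
```

If either character were missing from the codec's table, encode() would raise UnicodeEncodeError rather than silently merging the pair.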
>How do you decide whether a glyph is a different kanji or
>character from another, or simply a different handwriting,
>font or something?
The principles used in the Han unification are pretty clear. See:
http://www.unicode.org/versions/Unicode4.0.0/ch11.pdf from
about the 10th page.
>Are the criteria which decide a glyph is
>different character from another or not consistent anytime
>anywhere?
Certainly. Read the document I mentioned above.
>You can say these criteria are actually boiled down
>to depending on language, jurisdiction and culture.
I can, but I haven't said that.
>Is Unicode
>trying to influence languages, jurisdictions and cultures of
>the world?
I doubt it, except that by developing a codeset that can be
used to represent all the world's languages, it will undoubtedly
influence them. In the long run it will have less influence than
radio, television, films, printing, etc. etc., all of which
"influence(s) languages, jurisdictions and cultures of the world".
>> Well, they were still caught up in the idea that the standard
>> dictated the glyphs. This is still a common attitude in Japan,
>I don't think so. JIS people seem to refrain from dictating the
>actual fonts or glyph implementations as much as possible. I
>think it's a simply view of frustrated and prejudiced Unicode
>believers.
I agree. But they are still vocal.
>But as long as you use Unicode you can never be sure in what
>language a character is written and the character may be
>displayed in fonts it is not supposed to use: as I said before
>what fonts countries, cultures, and/or languages tend to use
>are different.
This is true. I can use a code like JIS X 0208 along with a font
which looks very weird to most Japanese. You couldn't blame
JIS X 0208 for this.
>At least you can say Unicode characters have a
>tendency to be displayed in strange fonts. It's Unicode's
>fault and not users'.
I wouldn't say that at all, and there's no way I could say
it's "Unicode's fault". I'd blame the provider of the font. Or
the user who installed an unfamiliar font.
It's not the fault of the digit "7" that people from some parts of
Europe put a horizontal stroke through it.
>> No, it was because the very original standard, JIS C 6220-1976, had
>> done some ham-fisted unification of similar kanji, which had upset
>> people. The fiddling around in the 1983 version of JIS X 0208
>> (successor to JIS C 6220) was an attempt to backpedal a little.
>> JIS X 0213 effectively opened the door to de-unifying those kanji.
>I don't know but isn't it simply that JIS 1976 people were as
>naive as Unicode people?
Had they stopped beating their wives too? I don't think the
Unicode people (which BTW included plenty of Japanese, with the
Han-unification subcommittee chaired by a Japanese person) were
naive at all. The early JSC committee may have been naive; they
were certainly unsystematic. But to be fair, they had to develop
the rules and techniques as they went along.
>No, Unicode people are more naive
>because I don't think what JIS people meant in 1976 was not a
>"unification" at all but a selection from a larger set of
>kanjis and characters and what they did in 1983 was changing
>the selection and some example glyphs or 字体? I guess there
>were especially no definitive definition on 字体 or 字形 in
>1983 and before.
>BTW, JIS X 9051-1984 seems to be a standard for 16 dot font
>and not a computer character code standard.
Correct.
>I don't think it's
>not a bad idea in principle that whatever printer or display
>you use there is this font that gives you a exactly same font,
>though I think it has little actual demand.
I think that's a principle that most people in the printing/
display industries would now reject. I'd hate to have a
writing thought-police unit telling me exactly how to form
characters. Sounds a bit Singaporean to me.
>> ISO 2022 is a structure for carrying various national codes, and
>> JSC is the body which locks down which codesets are specified by
>> which escape sequence for the Japanese sets.
>>
>> It's a bit of a hack in that 十 can be coded at least 3 different ways:
>> Japanese, Chinese and Korean. Hence you have the problems you get when
>> you try and compare n with ｎ.
>I think, usually for the most people, having only one 十 or
>other more obscure and having less common meaning character
>for multiple languages does more harm than good.
In what way?
>At least
>that's my anecdotal impression.
Well, share some anecdotes with us. Is it a problem that 世界 in
Japanese and 世界 in Chinese end up using the same code-points in
Unicode? Both English and French share an alphabet (with added
diacritics for French) and we have a squillion words written the
same: initiative, commune, manifestation, etc. etc. In what way is
it a problem if they are coded the same way, e.g. in ISO-8859-1 or
Unicode?
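To make the contrast concrete, here is a rough sketch assuming Python 3's built-in codecs. It uses the EUC forms of the Japanese, Chinese and Korean national sets (euc_jp, gb2312, euc_kr) as stand-ins for the separately designated ISO 2022 sets; the helper name national_encodings is hypothetical. The same text gets a different byte form per national code, yet all of them decode to the same Unicode string:

```python
# EUC forms of three national sets: JIS X 0208, GB 2312, KS X 1001.
NATIONAL_CODECS = ("euc_jp", "gb2312", "euc_kr")

def national_encodings(text):
    """Map codec name -> bytes for `text` in each national encoding."""
    return {codec: text.encode(codec) for codec in NATIONAL_CODECS}

for text in ("十", "世界"):
    encoded = national_encodings(text)
    for codec, raw in encoded.items():
        # Decoding goes through Unicode, so every codec yields the same string.
        print(text, codec, raw.hex(), raw.decode(codec) == text)
    print("distinct byte forms:", len(set(encoded.values())))
```

Comparing the raw bytes across codes fails even though the text is the same; comparing after decoding to Unicode does not.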
>As for n and 全角 n, I have a
>bit different opinion because n is basically used as "English"
>or other alphabet language character even if it is 全角 n. At
>least 全角 n is not as important as 半角カナ.
半角カナ is a hack which should have been put down years ago.
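For anyone stuck comparing width variants in Unicode text, one common approach is NFKC normalization, which folds 全角 Latin letters and 半角カナ onto their ordinary forms. A small sketch using Python's standard unicodedata module (fold_width is an illustrative name, not an established API):

```python
import unicodedata

FULLWIDTH_N = "\uff4e"           # 全角 n
HALFWIDTH_KANA = "\uff76\uff85"  # 半角 ka + na

def fold_width(text):
    """Fold full-width and half-width compatibility forms via NFKC."""
    return unicodedata.normalize("NFKC", text)

print(FULLWIDTH_N == "n")                      # raw comparison fails
print(fold_width(FULLWIDTH_N) == "n")          # equal after folding
print(fold_width(HALFWIDTH_KANA) == "カナ")    # kana widened the same way
```

NFKC also folds other compatibility characters, so it suits matching and searching better than storing text that must round-trip unchanged.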
>> In fact the vast majority of kanji that are ever likely to be
>> used are in JIS X 0208. If I were to try and push beyond that,
>> I'd go straight to Unicode and not futz around with JIS213. All
>> of JIS213 is in Unicode.
>It's only your opinion. I think if Windows would be able to
>handle the JIS X 0213 encodings, I think JIS X 0213 would
>catch on, unlike Unicode.
Unlike Unicode? Every user of WinXP is using Unicode whether they
like it or not. Most are blissfully unaware, which is the way it should
be.
>Perhaps JIS has a lot of its own
>problems but at least JIS knows better than Unicode as to what
>Japanese people really need.
You are falling back on this mythical concept that Unicode is in
some way dictating to the Japanese. It ignores some very basic
facts about Unicode, the main one being that EVERY kanji in a JIS
standard was taken into Unicode. NOTHING was left out. The
only way kanji, hanzi, hanja, etc. get into Unicode is via
national standards like JIS. Unicode is basically an aggregation
of national character standards, and if a particular kanji is
missing from it (e.g. some of the IBM-kanji) it's because JSC has
declined to include them.
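The "nothing was left out" claim can be spot-checked for JIS X 0208, at least against the mapping table shipped with Python 3's euc_jp codec (which is an implementation's table, not the standard itself). This sketch walks the kanji rows and checks that each assigned cell decodes to a Unicode character and encodes back to the same bytes; the function name is made up for illustration:

```python
def jis_x_0208_kanji():
    """Yield (euc_bytes, char) for each kanji cell the euc_jp codec defines."""
    for row in range(16, 85):            # kanji occupy rows 16..84
        for cell in range(1, 95):        # cells 1..94 in each row
            raw = bytes([0xA0 + row, 0xA0 + cell])
            try:
                yield raw, raw.decode("euc_jp")
            except UnicodeDecodeError:
                continue                 # unassigned cell, skip it

kanji = list(jis_x_0208_kanji())
lost = [raw for raw, ch in kanji if ch.encode("euc_jp") != raw]
print(len(kanji), "kanji checked,", len(lost), "failed to round-trip")
```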
>It seems kanjis in JIS X 0213 are
>really needed by people like 青空文庫 and I don't think they
>are the hardest 文系 people at all.
There are many people critical of the criteria used by the
early JIS committees for selecting the kanji to include. I am
one of them. The situation would have been better if JIS X 0212 had
been more widely implemented, but the invention and widespread
use of Shift_JIS screwed that (all for the sake of 半角カナ).
Most of JIS X 0213 came from JIS X 0212, and has been in Unicode
since V1.0 (about 12 years ago).
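The trade-off can be seen with Python 3's built-in codecs (a sketch under that assumption; the helper names are hypothetical). Shift_JIS gives 半角カナ a single byte each in the 0xA1-0xDF range, while EUC-JP reaches the supplementary kanji through a three-byte form introduced by 0x8F, which the plain shift_jis codec cannot encode at all:

```python
def first_jis_x_0212_kanji():
    """Return the first character the euc_jp codec maps from the
    JIS X 0212 kanji rows (three-byte form, introduced by 0x8F)."""
    for row in range(16, 78):
        for cell in range(1, 95):
            raw = bytes([0x8F, 0xA0 + row, 0xA0 + cell])
            try:
                return raw.decode("euc_jp")
            except UnicodeDecodeError:
                continue                 # unassigned cell, keep looking
    raise LookupError("no JIS X 0212 kanji found")

def can_encode(ch, codec):
    """True if `ch` is representable in `codec`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

supplementary = first_jis_x_0212_kanji()
print("EUC-JP:", can_encode(supplementary, "euc_jp"))
print("Shift_JIS:", can_encode(supplementary, "shift_jis"))
print("half-width ka in Shift_JIS:", "\uff76".encode("shift_jis").hex())
```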
--
Jim Breen http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
ジム・ブリーン@モナシュ大学