Re: KANJD212
- From: dareka <dareka@xxxxxxxxxxx>
- Date: Thu, 08 Dec 2005 01:02:10 +0900
jwb@xxxxxxxxxxxxxxxxxx wrote:
> dareka <dareka@xxxxxxxxxxx> dixit:
>
>>jwb@xxxxxxxxxxxxxxxxxx wrote:
>>
>>>dareka <dareka@xxxxxxxxxxx> dixit:
>>>
>>>>I don't understand what you mean by "modern standardization".
>>>
>>>Well, the current approach is to establish a code-point according
>>>to a number of factors, but not to be prescriptive about the finer
>>>points of the glyphs. For example in 籾 the length of the last stroke
>>>was reduced. Nothing else changed; it was still the same kanji. That
>>>wouldn't happen now.
>
>
>>Who decides the factors and what are their criteria, Unicode?
>
>
> I was actually referring to the JSC committee. I daresay
> they have an eye on the Unicode Han-unification principles,
> but they pretty much do their own thing. Bear in mind that
> national standards override any Unicode unification processes
> as one of the Unicode ground-rules is that if an (abstract)
> character is coded more than once in a national standard, then
> it will be coded more than once in Unicode. Otherwise
> characters like 刃/刄 and 劔/劒 might have been unified.
But once a character is defined and given a code point in
Unicode, it in principle can't be deleted, modified or
de-unified. So what will happen to the original code point of
刃/刄 if a new national standard reassigns that kanji's code
point to a totally different kanji, or splits it into 刃 and 刄
as JIS X 0203 did? The result is simply confusion, because what
the code point means will differ from one national standard to
another. I can confidently say that Unicode's kanji unification
is a failure, or an evil deed: either they were simply naive
about the definitions of kanji, or they carried out the
unification at the demand of their companies, which wanted all
kanji crammed into a 16-bit code. If you really want to unify
characters of a similar nature in a modern character set (by
"modern character set" I mean one that has as many characters
and glyphs as needed), perhaps the only possible way is to
clearly and definitively define a single actual glyph for each
code point.
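For what it's worth, the separate coding of 刃/刄 and 劔/劒 discussed above (what the Unicode documentation calls the source separation rule) can be checked with Python's standard `unicodedata` module. A minimal sketch, with helper names that are only illustrative: it prints each character's code point and name, and confirms that even compatibility normalization (NFKC) keeps each pair apart, since ordinary unified ideographs carry no decomposition mappings. It says nothing about how the committees actually decide such cases.

```python
import unicodedata

# Pairs coded separately in JIS, and therefore kept separate in Unicode.
PAIRS = [("刃", "刄"), ("劔", "劒")]

def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def folded_by_nfkc(a: str, b: str) -> bool:
    """True if compatibility normalization maps both characters to the same string."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

if __name__ == "__main__":
    for a, b in PAIRS:
        print(describe(a), "|", describe(b), "| folded by NFKC:", folded_by_nfkc(a, b))
```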
>
>
>>How do you decide whether a glyph is a different kanji or
>>character from another, or simply a different handwriting
>>style, font or the like?
>
>
> The principles used in the Han unification are pretty clear. See:
> http://www.unicode.org/versions/Unicode4.0.0/ch11.pdf from
> about the 10th page.
>
>
>>Are the criteria that decide whether a glyph is a
>>different character from another consistent at all
>>times and in all places?
>
>
> Certainly. Read the document I mentioned above.
>
>
>>You can say these criteria actually boil down
>>to language, jurisdiction and culture.
>
>
> I can, but I haven't said that.
>
>
>>Is Unicode
>>trying to influence languages, jurisdictions and cultures of
>>the world?
>
>
> I doubt it, except that by developing a codeset that can be
> used to represent all the world's languages, it will undoubtedly
> influence them. In the long run it will have less influence than
> radio, television, films, printing, etc. etc. all of which
> "influence(s) languages, jurisdictions and cultures of the world"
OK, if Unicode doesn't define characters except passively, e.g.
by unifying them, I don't have much to say for the moment. I
simply think Unicode was stupidly planned and coded, and is
getting more and more complicated, or stupider, in order to hide
its earlier stupidities.
>
>
>>>Well, they were still caught up in the idea that the standard
>>>dictated the glyphs. This is still a common attitude in Japan,
>
>
>>I don't think so. JIS people seem to refrain from dictating the
>>actual fonts or glyph implementations as much as possible. I
>>think it's simply the view of frustrated and prejudiced Unicode
>>believers.
>
>
> I agree. But they are still vocal.
>
>
>>But as long as you use Unicode, you can never be sure what
>>language a character is written in, and the character may be
>>displayed in a font it is not supposed to use: as I said
>>before, the fonts that countries, cultures and/or languages
>>tend to use are different.
>
>
> This is true. I can use a code like JIS X 0208 along with a font
> which looks very weird to most Japanese. You couldn't blame
> JIS X 0208 for this.
But as long as you use JIS X 0208, both the software and you
know it is a Japanese code set. If you use Unicode, you can only
be sure it is meant to be read by someone somewhere on Earth.
>
>
>>At least you can say Unicode characters have a
>>tendency to be displayed in strange fonts. It's Unicode's
>>fault and not the users'.
>
>
> I wouldn't say that at all, and there's no way I could say
> it's "Unicode's fault". I'd blame the provider of the font. Or
> the user who installed an unfamiliar font.
>
> It's not the fault of the digit "7" that people from some parts of
> Europe put a horizontal stroke through it.
Again, if the code is not unified, the characters remain
distinguishable, as in ISO-2022 code sets. It's Unicode's fault.
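A rough illustration of why non-unified ISO-2022 data stays distinguishable: in an ISO-2022 stream, designation escape sequences announce which national set the following bytes belong to. A minimal Python sketch that lists those designations; the table covers only sequences used by the ISO-2022-JP family (RFC 1468, RFC 1554), and the function name is made up for this sketch. It is not a full ISO 2022 parser.

```python
from typing import List

# Designation escape sequences from the ISO-2022-JP family (RFC 1468, RFC 1554).
# Only these sequences are recognized.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208-1983",
    b"\x1b$A": "GB 2312-80",
    b"\x1b$(C": "KS C 5601-1987",
    b"\x1b$(D": "JIS X 0212-1990",
}

def designations(data: bytes) -> List[str]:
    """Return, in order, the character sets an ISO-2022 byte stream switches to."""
    found = []
    i = 0
    while i < len(data):
        if data[i] == 0x1B:  # ESC starts a designation
            for seq, name in DESIGNATIONS.items():
                if data.startswith(seq, i):
                    found.append(name)
                    i += len(seq)
                    break
            else:
                i += 1  # unrecognized escape: skip the ESC byte only
        else:
            i += 1
    return found

if __name__ == "__main__":
    # Python's iso2022_jp codec designates JIS X 0208 before 十.
    print(designations("十".encode("iso2022_jp")))
```

The sketch only reads the escape sequences and never decodes the characters, which is all it takes to tell which national set a run of text came from.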
>
>
>>>No, it was because the very original standard, JIS C 6220-1976, had
>>>done some ham-fisted unification of similar kanji, which had upset
>>>people. The fiddling around in the 1983 version of JIS X 0208
>>>(successor to JIS C 6220) was an attempt to backpedal a little.
>>>JIS X 0213 effectively opened the door to de-unifying those kanji.
>
>
>>I don't know, but isn't it simply that the JIS people of 1976
>>were as naive as the Unicode people?
>
>
> Had they stopped beating their wives too? I don't think the
> Unicode people (which BTW included plenty of Japanese, with the
> Han-unification subcommittee chaired by a Japanese person) were
> naive at all.
It seems that by the time these people gathered for the
committee, unifying the kanji and cramming them into a 16-bit
code had already been decided. It seems to me that what they did
was what they were ordered to do: unify the kanji. And having
Japanese or other local members doesn't mean much; you can find
plenty of questionable people, such as those who advocate that
the Japanese should drop kana and kanji and adopt the alphabet
instead.
> The early JSC committee may have been naive; they
> were certainly unsystematic. But to be fair, they had to develop
> the rules and techniques as they went along.
I won't argue with that. But by the time of Unicode, they should
have known better, unless they were simply biased.
>
>
>>No, Unicode people are more naive
>>because I don't think what JIS people meant in 1976 was not a
Sorry for my usual stupid English. I meant
"because I think what JIS people meant in 1976 was not a".
>>"unification" at all but a selection from a larger set of
>>kanji and characters, and what they did in 1983 was to change
>>the selection and some example glyphs or 字体 (character
>>forms)? I guess there was no definitive definition of 字体 or
>>字形 (glyph shapes) in 1983 or before.
>
>
>>BTW, JIS X 9051-1984 seems to be a standard for 16 dot font
>>and not a computer character code standard.
>
>
> Correct.
>
>
>>I think it's not a bad idea in principle that, whatever
>>printer or display you use, there is this font that gives you
>>exactly the same glyphs, though I think there is little actual
>>demand for it.
>
>
> I think that's a principle that most people in the printing/
> display industries would now reject. I'd hate to have a
> writing thought-police unit telling me exactly how to form
> characters. Sounds a bit Singaporean to me.
I don't think JIS X 9051-1984 is the kind of thing you have to
use for everything, or that it restricts the use of other fonts.
I think its purpose in principle is like that of the designs on
traffic signs, where exactly the same design is preferred or
required.
>
>
>>>ISO 2022 is a structure for carrying various national codes, and
>>>JSC is the body which locks down which codesets are specified by
>>>which escape sequence for the Japanese sets.
>>>
>>>It's a bit of a hack in that 十 can be coded at least 3 different ways:
>>>Japanese, Chinese and Korean. Hence you have the problems you get when
>>>you try and compare full-width ｎ with n.
>
>
>>I think that, for most people, having only one 十, or only one
>>code for other, more obscure characters whose meanings are less
>>shared, across multiple languages does more harm than good.
>
>
> In what way?
People usually search only one language's data, or some other
limited set of data. Data from sets you weren't interested in to
begin with are simply a nuisance.
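To make the 十 case concrete, here is a small Python sketch using only standard codecs. Assumption: the EUC forms (EUC-JP, GB2312, EUC-KR) stand in for the Japanese, Chinese and Korean sets that an ISO 2022 stream would switch between with escape sequences. Each national set gives 十 its own byte value, while Unicode has the single code point U+5341. The last helper shows one common way to make the full-width ｎ/n comparison succeed, NFKC normalization; that choice belongs to this sketch, not to anyone in the thread.

```python
import unicodedata

# Assumption: EUC forms of the three national sets stand in for the
# Japanese, Chinese and Korean sets an ISO 2022 stream would designate.
NATIONAL_CODECS = {"Japanese": "euc_jp", "Chinese": "gb2312", "Korean": "euc_kr"}

def national_bytes(ch: str) -> dict[str, bytes]:
    """Encode one character in each national code set."""
    return {lang: ch.encode(codec) for lang, codec in NATIONAL_CODECS.items()}

def decodes_to_same(ch: str) -> bool:
    """True if every national encoding decodes back to the same Unicode string."""
    return all(data.decode(NATIONAL_CODECS[lang]) == ch
               for lang, data in national_bytes(ch).items())

def width_insensitive_equal(a: str, b: str) -> bool:
    """Compare after NFKC, which folds full-width forms such as U+FF4E to 'n'."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

if __name__ == "__main__":
    for lang, data in national_bytes("十").items():
        print(f"{lang:8} {data.hex()}")
    print(f"Unicode: U+{ord('十'):04X}, round-trips:", decodes_to_same("十"))
    fullwidth_n = "\uff4e"  # FULLWIDTH LATIN SMALL LETTER N
    print("raw equal:", fullwidth_n == "n",
          "| NFKC equal:", width_insensitive_equal(fullwidth_n, "n"))
```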
>
>
>>At least
>>that's my anecdotal impression.
>
>
> Well, share some anecdotes with us. Is it a problem that 世界 in
> Japanese and 世界 in Chinese end up using the same code-points in
> Unicode? Both English and French
> share an alphabet (with added diacritics for French) and we have a
> squillion words written the same: initiative, commune,
> manifestation, etc. etc. In what way is it a problem if they
> are coded the same way, e.g. in ISO-8859-1 or Unicode?
Yes. Usually when I search by kanji, the only results I want are
Japanese ones; I'm not interested in search results I can't read
or that are of little or no relevance. And what is worse with
unified codes is that the information about which codes a unified
code point originally came from is lost. If the codes were not
unified, selecting by an emulated unified code would always be
easy.
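The last point, that unification can be emulated on top of non-unified data but not undone afterwards, can be sketched in Python. Everything here is hypothetical: the `Record` type, the `matches` helper and the sample strings are illustrations only. Decoding to Unicode serves as the emulated unified key, while the retained code-set tag lets a search stay limited to Japanese data.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass(frozen=True)
class Record:
    """Hypothetical stored text that keeps the code set it was written in."""
    charset: str  # Python codec name, e.g. "euc_jp"
    raw: bytes    # the text as encoded in that code set

def matches(record: Record, query: str, charsets: Optional[Set[str]] = None) -> bool:
    """Search by an emulated unified key, optionally limited to some code sets.

    Decoding to Unicode acts as the unified key for comparison, but the
    charset tag survives, so results can still be restricted (e.g. to Japanese).
    """
    if charsets is not None and record.charset not in charsets:
        return False
    return query in record.raw.decode(record.charset)

if __name__ == "__main__":
    records = [
        Record("euc_jp", "十月".encode("euc_jp")),
        Record("gb2312", "十分".encode("gb2312")),
        Record("euc_kr", "十".encode("euc_kr")),
    ]
    print(len([r for r in records if matches(r, "十")]))  # all code sets
    print([r.charset for r in records if matches(r, "十", {"euc_jp"})])  # Japanese only
```

The reverse direction is the hard one: a plain Unicode string such as "十" carries no record of which national code set it came from, so once the tag is discarded it cannot be recovered from the text.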
>
>
>>As for n and 全角 (full-width) n, I have a somewhat different
>>opinion, because n is basically used as a character of English
>>or another alphabetic language even when it is 全角 n. In any
>>case, 全角 n is not as important as 半角カナ (half-width katakana).
>
>
> 半角カナ is a hack which should have been put down years ago.
半角カナ is still a very important character set in Japan. It may
be the single most widely used character set in Japan even now.
Perhaps people who disregard 半角カナ simply don't (or didn't)
know how important that code set is (or was) in the *real usage*
of code sets in Japan.
>
>
>>>In fact the vast majority of kanji that are ever likely to be
>>>used are in JIS X 0208. If I were to try and push beyond that,
>>>I'd go straight to Unicode and not futz around with JIS213. All
>>>of JIS213 is in Unicode.
>
>
>>That's only your opinion. I think that if Windows were able to
>>handle the JIS X 0213 encodings, JIS X 0213 would catch on,
>>unlike Unicode.
>
>
> Unlike Unicode? Every user of WinXP is using Unicode whether they
> like it or not. Most are blissfully unaware, which is the way it should
> be.
I don't think Japanese users chose to use Unicode; I think they
were forced to use it by companies like Microsoft, IBM, Sun, etc.
And they are being blissfully and unknowingly cheated by a code
set that is like the Tower of Babel.
>
>
>>Perhaps JIS has a lot of its own problems, but at least JIS
>>knows better than Unicode what Japanese people really need.
>
>
> You are falling back on this mythical concept that Unicode is in
> some way dictating to the Japanese. It ignores some very basic
> facts about Unicode, the main one being that EVERY kanji in a JIS
> standard was taken into Unicode. NOTHING was left out.
Having all the kanji doesn't mean that it's compatible with the
national standards, or that it's easy to use or efficient at all.
I think I have already explained why in this newsgroup, so I
won't repeat myself. BTW, aren't the 25 characters you first gave
in this thread characters that can't be obtained in Unicode
without using character combination or something similar?
> The
> only way kanji, hanzi, hanja, etc. get into Unicode is via
> national standards like JIS. Unicode is basically an aggregation
> of national character standards, and if a particular kanji is
> missing from it (e.g. some of the IBM-kanji) it's because JSC has
> declined to include them.
>
>
>>It seems the kanji in JIS X 0213 are really needed by people
>>like 青空文庫 (Aozora Bunko), and I don't think they are the
>>most hardcore 文系 (humanities) people at all.
>
>
> There are many people critical of the criteria used by the
> early JIS committees for selecting the kanji to include. I am
> one of them. The situation would have been better if JIS212 had
> been more widely implemented, but the invention and widespread
> use of Shift_JIS screwed that (all for the sake of 半角カナ.)
I think Shift_JIS was a clever encoding; it is the only encoding
I have ever felt that way about. And I can hardly imagine UTF-8
becoming a standard Japanese encoding even if it had been
invented before Shift_JIS. And I think that even if Unicode had
been the first code set invented, people would sooner or later
have invented code sets containing only their national
characters, and those would have become popular without
Microsoft's help.
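Part of what makes Shift_JIS clever is that it needs no lookup table for JIS X 0208: the single-byte JIS X 0201 characters keep their values (Roman in 0x00-0x7F, 半角カナ in 0xA1-0xDF), and each JIS X 0208 row/cell (kuten) position is mapped arithmetically onto lead bytes 0x81-0x9F and 0xE0-0xEF, around the half-width kana range. A minimal Python sketch of that arithmetic, cross-checked against Python's own `shift_jis` and `euc_jp` codecs; the function names are made up for this sketch, and vendor extensions (such as the NEC and IBM rows) and user-defined areas are ignored.

```python
def kuten_to_sjis(ku: int, ten: int) -> bytes:
    """Map a JIS X 0208 row/cell (kuten, each 1-94) to its two Shift_JIS bytes."""
    if not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError("ku and ten must both be in 1..94")
    # Two consecutive rows share one lead byte: rows 1-62 -> 0x81-0x9F,
    # rows 63-94 -> 0xE0-0xEF (skipping 0xA0-0xDF, where the half-width kana live).
    lead = (ku + 0x101) >> 1 if ku <= 62 else (ku + 0x181) >> 1
    if ku % 2 == 1:
        # Odd rows use trail bytes 0x40-0x9E, skipping 0x7F (DEL).
        trail = ten + 0x3F if ten <= 63 else ten + 0x40
    else:
        # Even rows use trail bytes 0x9F-0xFC.
        trail = ten + 0x9E
    return bytes([lead, trail])

def is_halfwidth_kana_byte(b: int) -> bool:
    """Single-byte JIS X 0201 katakana occupy 0xA1-0xDF in Shift_JIS."""
    return 0xA1 <= b <= 0xDF

if __name__ == "__main__":
    for ku, ten in [(4, 2), (16, 1), (29, 29), (48, 1), (63, 1)]:
        # EUC-JP stores the same position as the bytes (ku + 0xA0, ten + 0xA0).
        via_codecs = bytes([ku + 0xA0, ten + 0xA0]).decode("euc_jp").encode("shift_jis")
        print(ku, ten, kuten_to_sjis(ku, ten).hex(), kuten_to_sjis(ku, ten) == via_codecs)
    kana = "\uff76"  # HALFWIDTH KATAKANA LETTER KA: stays a single byte
    encoded = kana.encode("shift_jis")
    print(encoded.hex(), is_halfwidth_kana_byte(encoded[0]))
```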
>
> Most of JIS X 0213 came from JIS212, and has been in Unicode
> since V1.0 (about 12 years ago).
>
--
dareka dareka@xxxxxxxxxxx