Re: Inverted 2 and inverted 3 in Unicode?

From: Harlan Messinger (h.messinger_at_comcast.net)
Date: 09/21/04


Date: Tue, 21 Sep 2004 11:41:11 -0400


"Tak To" <takto@alum.mit.edu.-> wrote in message
news:S8ednYWrFMOH7NLcRVn-vQ@comcast.com...
> Harlan Messinger wrote:
> HM.2> Then I'm not sure what you're looking for. I thought you were
> HM.2> trying to justify Unicode just listing radicals instead of
> HM.2> characters, on the grounds that the characters can then just
> HM.2> be defined as combinations of radicals.
>
> I did not suggest Unicode doing anything but I have said that
> (1) a Chinese character (<zi4>) can be defined as a combination of
> radicals just as an English word is defined as a series of
> letters;
> (2) that modern computer technology may very well be able to
> lay out a Chinese character on the fly based on the series
> of strokes/radicals using a "font" of radicals.
>
> HM.2> I've demonstrated that it's not as simple as you assert.
>
> I don't know what "simplicity" you think I have implied by
> the above; therefore I don't understand your demonstration at
> all. To wit, you merely said, "it is not that X [X=simple]" then
> "prove me wrong"; and when I showed a general scheme, you said
> "See? It is not that X". Perhaps it would help if you would
> describe in more detail what you consider to be my assertion
> of simplicity.
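Point (1) has a descriptive counterpart in Unicode itself: Ideographic Description Sequences (IDS), which write an ideograph's layout in prefix notation using the Ideographic Description Characters U+2FF0..U+2FFB (for example U+2FF1, above-to-below). The sketch below parses an IDS into a tree. It is a minimal illustration under stated assumptions: the `DECOMP` table is a hypothetical, hand-written sample rather than data from any real decomposition database, `parse_ids` and `expand_tree` are made-up helper names, and the IDCs added in newer Unicode versions are ignored.

```python
import unicodedata

# U+2FF2 and U+2FF3 take three operands; the other IDCs in U+2FF0..U+2FFB take two.
# (Newer Unicode versions add further IDCs that this sketch does not handle.)
TERNARY_IDCS = {"\u2ff2", "\u2ff3"}


def arity(ch):
    """Number of operands an IDC takes, or 0 for an ordinary component."""
    if "\u2ff0" <= ch <= "\u2ffb":
        return 3 if ch in TERNARY_IDCS else 2
    return 0


def parse_ids(seq):
    """Parse a prefix-notation IDS into a leaf string or an (IDC name, children) tuple."""
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(seq):
            raise ValueError("truncated IDS")
        ch = seq[pos]
        pos += 1
        n = arity(ch)
        if n == 0:
            return ch  # leaf: a component character
        return (unicodedata.name(ch), [parse() for _ in range(n)])

    tree = parse()
    if pos != len(seq):
        raise ValueError("trailing characters after IDS")
    return tree


# Hypothetical, hand-written sample decompositions (illustration only).
DECOMP = {
    "照": "\u2ff1昭灬",  # above-to-below: 昭 over the four-dots component 灬
    "昭": "\u2ff0日召",  # left-to-right: 日 beside 召
}


def expand_tree(node):
    """Recursively replace any leaf that has a DECOMP entry with its parsed IDS."""
    if isinstance(node, str):
        return expand_tree(parse_ids(DECOMP[node])) if node in DECOMP else node
    name, children = node
    return (name, [expand_tree(c) for c in children])


print(expand_tree("照"))
```

An IDS only records which parts sit where; it says nothing about how large each part is or how its strokes are shaped, which is exactly the gap the rest of this exchange argues about.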

The quick summary: You're proposing that something be done a certain way on
computers. I explained why it can't be done that way.

The exhibit to which I referred you makes it clear that no matter how
complex an algorithm you develop to build up characters from their parts,
someone can design a font for which that algorithm won't work. And when
someone designs such a font, there isn't suddenly going to be an update
available for the rendering application that will accommodate this font.
This is why the rules for putting together a character are built into the
font, not into the rendering application or its underlying data structure or
encoding system.
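The design described here, where layout knowledge lives in the font and the renderer only delegates, can be sketched roughly as follows. All names (`Box`, `Font`, `lower_share`, `PlainFont`, `SquatDotsFont`, `render_above_below`) and all ratios are hypothetical; real font technologies such as OpenType handle this through their own glyph and table structures, and CJK fonts commonly ship a finished glyph per character rather than assembling one at render time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in em units; y is the top edge and grows downward."""
    x: float
    y: float
    w: float
    h: float


class Font:
    """Hypothetical font interface: each font supplies its own composition rule."""

    def lower_share(self, upper: str, lower: str) -> float:
        """Fraction of the em height this font gives to the lower component."""
        raise NotImplementedError

    def split_above_below(self, em: Box, upper: str, lower: str) -> tuple[Box, Box]:
        low_h = em.h * self.lower_share(upper, lower)
        top = Box(em.x, em.y, em.w, em.h - low_h)
        bottom = Box(em.x, em.y + em.h - low_h, em.w, low_h)
        return top, bottom


class PlainFont(Font):
    # Illustrative rule: every lower component gets one third of the height.
    def lower_share(self, upper, lower):
        return 1 / 3


class SquatDotsFont(Font):
    # A different illustrative rule: the four-dots component 灬 is squashed
    # to one fifth of the height; other lower components get one third.
    def lower_share(self, upper, lower):
        return 1 / 5 if lower == "灬" else 1 / 3


def render_above_below(font: Font, upper: str, lower: str) -> tuple[Box, Box]:
    """The renderer holds no layout knowledge; it just asks the font."""
    return font.split_above_below(Box(0.0, 0.0, 1.0, 1.0), upper, lower)


for font in (PlainFont(), SquatDotsFont()):
    print(type(font).__name__, render_above_below(font, "昭", "灬"))
```

Adding a third font with yet another rule means writing a new `Font` subclass; `render_above_below` stays unchanged, which is the point of keeping the rules out of the renderer.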

>
> HM.2> I also don't understand *why*
> HM.2> you would want to do it that way.
>
> My point was that the "abstract character" in Unicode is based on
> practical considerations rather than consistent logic -- e.g., some
> "abstract characters" are atomic while others are composites.
>
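The atomic/composite inconsistency mentioned here is easy to observe with Unicode normalization. A short sketch using Python's standard `unicodedata` module (`nfd_code_points` is just a helper name chosen here): under canonical decomposition (NFD) a precomposed Latin letter splits into a base letter plus a combining mark, and a precomposed Hangul syllable splits into its conjoining jamo, while a CJK unified ideograph such as 黑 (U+9ED1, hei) has no canonical decomposition at all, even though it is visibly built from components.

```python
import unicodedata


def nfd_code_points(s):
    """Code points of s after canonical decomposition (Unicode NFD)."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", s)]


# é (precomposed Latin letter), 한 (precomposed Hangul syllable), 黑 (CJK ideograph)
for ch in ("\u00e9", "\ud55c", "\u9ed1"):
    print(ch, nfd_code_points(ch))
```

This prints U+0065 U+0301 for é, U+1112 U+1161 U+11AB for 한, and only U+9ED1 for 黑.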
> ----- -----
>
> TT.1> Laying out a Chinese character
> TT.1> is similar to laying out a web page, so I imagine a reasonable
> TT.1> setup would be that there will be a description of the character
> TT.1> representable by some sort of markup language, a default
> TT.1> "style***" defining a set of reasonably aesthetic proportions
> TT.1> (e.g., that the "four_dots" radical should be half as tall as
> TT.1> the component above it), and a collection ("font") of (scalable)
> TT.1> radicals and strokes. These three components are more or less
> TT.1> independent of each other. In addition, there should probably
> TT.1> be a provision to override some of the parameters at the generic
> TT.1> radical, generic character, or individual (occurrence of)
> TT.1> character level (e.g., via additional "stylesheets").
> TT.1>
> TT.1> A layout of the <hei> character will be something like:
> TT.1>
> TT.1> <obj name=#upper >
> TT.1> <obj name=#box1 >
> TT.1> <obj name=#box >
> TT.1> <name=#b1 stroke=down>
> TT.1> <name=#b2 stroke=ne_corner beg=top(#b1) ratio=.5 >
> TT.1> < stroke=cross beg=bot(#b1) end=bot(#b2) >
> TT.1> </obj>
> TT.1> <stroke=\_dot loc=left( in(#box)) >
> TT.1> <stroke=/_dot loc=right(in(#box)) >
> TT.1> </obj>
> TT.1> <name=#v0 stroke=down beg=mid(top(#box1)) len=2*ht(#box) >
> TT.1> <name=#h1 stroke=cross mid=3/4*ht(#v0) wd=narrower(#h2) >
> TT.1> <name=#h2 stroke=cross mid=bot(#v0) wd=wider(#h1) >
> TT.1> </obj>
> TT.1> <stroke=four_dots loc=below(#upper) >
>
> HM.2> As the font variety demonstrates, this is vastly insufficient:
> HM.2> You can't assume some fixed way in which characters will be
> HM.2> built from their parts.
>
> I am not sure what you meant by "fixed" in my description above
> and hence why it is "insufficient".
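TT.1's three independent inputs (a description, a default style, a font of parts) can be sketched in a few lines, reduced here to the single top-level split of the <hei> description: #upper above four_dots. This is only an illustration under heavy assumptions: the part names, the placeholder outline strings, and the `layout_stack`/`render` helpers are hypothetical, only the one quoted style rule (four dots half as tall as the part above) is modeled, and exact fractions are used so the proportions are easy to check.

```python
from fractions import Fraction


def layout_stack(parts, style):
    """Split a unit-height em box among parts stacked top to bottom.

    Each part's relative height comes from the style sheet (default 1);
    the heights are normalized so that together they fill the box exactly.
    """
    weights = [Fraction(style.get(p, 1)) for p in parts]
    total = sum(weights)
    boxes, top = [], Fraction(0)
    for part, w in zip(parts, weights):
        h = w / total
        boxes.append((part, top, h))
        top += h
    return boxes


def render(parts, style, font, override=None):
    """Combine the three independent inputs: description, style sheet, font.

    `override` plays the role of an additional style sheet layered on top.
    """
    effective = {**style, **(override or {})}
    return [(font[part], top, h) for part, top, h in layout_stack(parts, effective)]


# Hypothetical inputs, reduced to the top-level split of <hei>:
HEI = ["upper", "four_dots"]                   # description: #upper above four_dots
DEFAULT_STYLE = {"four_dots": Fraction(1, 2)}  # weight 1/2 vs. default 1: half as tall as #upper
FONT = {"upper": "upper-outline", "four_dots": "four-dots-outline"}  # placeholder glyphs

print(render(HEI, DEFAULT_STYLE, FONT))
print(render(HEI, DEFAULT_STYLE, FONT, override={"four_dots": Fraction(1, 4)}))
```

Swapping `FONT` or passing an `override` changes the result without touching `HEI`, which is the independence TT.1 describes. The objection in reply is aimed at what this leaves out: fonts that reshape or rearrange parts in ways no shared style vocabulary anticipates.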

Your representation explains how a *given* font might build hei4. The
exhibit shows all kinds of variants of hei4 that your representation doesn't
describe. Again, that's why the representation of the character belongs in
the font, not in the encoding scheme or the associated character attributes
database. You are confusing abstract characteristics of a character with the
physical characteristics of an arbitrary rendering of that character.
Everything you are assuming about the way a Chinese character is built from
its parts is a *generalization* that many fonts will break.

