Re: Inverted 2 and inverted 3 in Unicode?
From: Tak To (takto_at_alum.mit.edu.-)
Date: 09/27/04
- In reply to: Harlan Messinger: "Re: Inverted 2 and inverted 3 in Unicode?"
Date: Mon, 27 Sep 2004 03:49:29 -0400
Harlan Messinger wrote:
HM.4> If you designed your system with an awareness only of the kinds of
HM.4> fonts you say are typically used in books, then your scheme would be
HM.4> incapable of anticipating the rules necessary to construct the fonts
HM.4> in the exhibit I showed you.
Tak To <takto@alum.mit.edu.-> wrote:
TT.5> I have never said that my system is _limited_ to _only_ what I consider
TT.5> as practical styles -- I just said that my system is capable of doing
TT.5> what I consider as practical styles. It can do a lot of impractical
TT.5> styles as well, I believe.
TT.5>
TT.5> And pray tell how font styles "typically used in books" are different
TT.5> from "other" kinds?
HM.4> *It would be impossible to generate those fonts under your scheme.*
TT.5> So you have asserted again and again, without explanation.
HM.6> On the contrary--*you* explain to *me* how a system that knew only
HM.6> the standard ways to build characters for typical book-type fonts (as
HM.6> exemplified by the version in the upper right-hand corner) would have
HM.6> the slightest ability to generate most of the rest of the versions in
HM.6> the exhibit. Psychic ability?
TT.7> No, by the data in the font, as well as the intelligence in the layout
TT.7> engine.
HM.8> The whole gist of your scheme is that you're taking details that exist
HM.8> in the font *now* (i.e., the details of how each character is built),
HM.8> and encapsulating them into the encoding scheme *instead*, with the
HM.8> font only telling you how to draw the individual pieces, which your
HM.8> encoding scheme then supposedly has the information necessary to
HM.8> recombine into characters. So, no, the data *isn't* in the font.
HM.8> That's my whole point.
So?
HM.8> It's no longer in the font because you've
HM.8> removed it, and it's impossible for the rules in the encoding system
HM.8> to anticipate all variations in font one might later wish to design..
It is impossible for _all_ variations, but possible for a large
class of _practical_ styles. For these styles, the variations in the
character-fonts can be captured by the stroke/radical-fonts,
while the commonality can be abstracted by the encoding scheme cum
the intelligence of the layout engine.
Let me try to explain in another way.
I assume you are familiar with the difference between bitmap and
outline fonts. A glyph in an outline font is defined by a
combination of straight lines and spline curves, whereas a glyph
in a bitmap font is defined by a bitmap. Thus, in your framework,
the straight and curved line segments would be the "individual
pieces", and the "encoding scheme" (namely the coordinates and
parameters of each of the line segments) would be "combination
information". And you would be right, the encoding scheme is not
feasible for _all_ bitmap images. Yet most fonts these days are
outline fonts. Why? It is because the glyphs for a _practical_
font style are not random images: there are well defined boundaries
(instead of fuzzy edges); the curves are smooth (instead of jagged)
and can be abstracted into a few parameters; there are a few large
contiguous areas (instead of lots of small disjointed areas); etc.
And this is so because the "abstract character" is ultimately
defined by the trace left by a writing (or carving, etc) instrument.
The non-randomness makes it possible to parameterize the glyph
as a combination of straight lines and curves.
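
To make the "few parameters" point concrete, here is a minimal sketch,
assuming a quadratic Bezier segment of the kind commonly used in outline
fonts. The control points P0, P1, P2 and the function names are made up
for illustration; they are not taken from any real font file.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quad_bezier(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """Return the point at parameter t (0..1) on a quadratic Bezier curve."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def sample_curve(p0: Point, p1: Point, p2: Point, n: int) -> List[Point]:
    """Sample n + 1 evenly spaced parameter values along the curve."""
    return [quad_bezier(p0, p1, p2, i / n) for i in range(n + 1)]


# Hypothetical control points: three (x, y) pairs are the whole "encoding".
P0, P1, P2 = (0.0, 0.0), (50.0, 100.0), (100.0, 0.0)

coarse = sample_curve(P0, P1, P2, 4)    # enough for a small bitmap
fine = sample_curve(P0, P1, P2, 400)    # enough for a large bitmap
```

The same six numbers yield as many points as the rendering size calls
for, which is why a smooth curve does not need to be stored pixel by
pixel. A jagged or random image has no such compact description.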
Likewise, a Chinese character (<zi4>) is also defined by the
trace of a writing instrument, and at a higher level, composition
of strokes and radicals. Just as the traces are not random
images, the ways strokes and radicals combine are not arbitrarily
limitless. In other words, an encoding system is feasible for
a practical font style.
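
As a rough illustration of the idea (not a specification of the scheme
discussed in this thread), a character can be described as a small tree
of layout operators over components, and a layout engine can turn that
tree into boxes. The operator names "LR" and "TB", the Compose class,
the layout function and the split parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass(frozen=True)
class Compose:
    """A hypothetical composition node: two parts joined by an operator."""
    op: str            # "LR" = left-right, "TB" = top-bottom
    first: "Node"
    second: "Node"


Node = Union[str, Compose]  # a leaf is just the name of a component


def layout(node: Node, box: Box, split: float = 0.5) -> List[Tuple[str, Box]]:
    """Assign a box to every leaf component of the composition tree.

    `split` stands in for style data that a stroke/radical font could
    supply; here it is one number applied at every level.
    """
    if isinstance(node, str):
        return [(node, box)]
    x, y, w, h = box
    if node.op == "LR":
        a = (x, y, w * split, h)
        b = (x + w * split, y, w * (1 - split), h)
    elif node.op == "TB":
        a = (x, y, w, h * split)
        b = (x, y + h * split, w, h * (1 - split))
    else:
        raise ValueError(f"unknown operator: {node.op}")
    return layout(node.first, a, split) + layout(node.second, b, split)


# U+660E (bright) = sun radical beside moon radical
ming = Compose("LR", "sun", "moon")
# U+840C (sprout) = grass radical above the character just defined
meng = Compose("TB", "grass", ming)

boxes = layout(meng, (0, 0, 1000, 1000))
```

The encoding (the tree) stays the same across styles; only the per-font
data (here reduced to `split`, plus however each component is drawn)
changes. A real engine would need far richer rules than a single ratio.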
Your argument is equivalent to saying that outline fonts are not
feasible, because not all images can be decomposed into lines
and curves. Sure, you can't have an outline font in which the
glyphs are photographic images of, say, US Presidents. However,
you wouldn't call that a practical font, would you?
----- -----
TT.7> Now if you can explain what "_only_ the standard way" means I
TT.7> might be able to answer your question better. Specifically, what
TT.7> "non-standard" features are illustrated by your samples?
TT.7>
TT.7> Also you have not explained what you mean by "book type".
HM.8> It's you who brought up the subject of the fonts that are used in
HM.8> books, and you want *me* to explain it to *you*?
No, my term has always been "practical". Besides, it was your claim
that most of your samples were not "book type". Perhaps I have been
guilty of being vague, but you don't seem to have any trouble grasping
the idea to the extent that you can conclude that my scheme is
infeasible, and give examples to boot! All I am asking is for you to
explain your own examples (or counter example to my scheme), according
to _your_ understanding.
My question is not an exercise in sophistry, but one in the Socratic
method. As indicated in our previous exchange, it seems to me that you
are unaware that an infinite number of varieties can be abstracted
by a finite number of parameters. So, again, which of your
samples do you think combine things so differently from the others
that they prove your point?
----- -----
TT.7> As I have said, there are trade offs.
TT.7>
TT.7> My scheme, for example, would involve a more complicated layout
TT.7> engine, and thus probably slower, but perhaps not noticeably so
TT.7> using more advanced hardware. On the other hand, it can generate
TT.7> a lot more characters than what is currently defined in Unicode,
TT.7> and with font files that are much smaller.
HM.8> True. But what are all these characters that aren't in Unicode?
Unicode 3.0 has 27484 CJK characters. The dictionary <Kang1xi1
Zi4dian3>, compiled in the eighteenth century, has 47035. The
contemporary <Han4yu3 da4 Zi4dian3> has 54678. The more specialized
<Zhong1hua2 Zi4hai3> in 1994 has 87019 and the <Yi4ti3 Zi4dian3> has
106230. The last one is specifically a dictionary of "variant"
characters.
In addition, new characters are being created, albeit at a glacial
rate. A new chemical element would most likely need a new character,
and as Minnan literature is becoming popular again in Taiwan,
new characters are very likely to be introduced for transcribing
it (and more importantly, to legitimize it).
On the brighter side, Unicode 3.1 has a total of 70195 CJK
characters. However, I don't think there is any font out there yet
that supports them all. Besides, now that a Unicode character
may not fit into 16 bits, fairly extensive reprogramming (remember
the Y2K problem?) is needed in order to upgrade all the software
that has "thoughtlessly" allocated only 16 bits for each character.
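
The 16-bit problem can be seen in a small sketch, using U+20000 (the
first code point of CJK Extension B, which was added in Unicode 3.1) as
the example. The helper name utf16_units is made up for illustration.

```python
from typing import List


def utf16_units(s: str) -> List[int]:
    """Return the 16-bit code units of s when encoded as UTF-16."""
    data = s.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


ch_bmp = "\u4e2d"         # U+4E2D: fits in a single 16-bit unit
ch_ext_b = "\U00020000"   # U+20000: lies beyond the 16-bit range

units_bmp = utf16_units(ch_bmp)      # one code unit
units_ext_b = utf16_units(ch_ext_b)  # two code units (a surrogate pair)
```

Software that assumes one 16-bit unit per character would treat the
second character as two, so counting, indexing and truncation can all
go wrong.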
Tak
--
----------------------------------------------------------------+-----
Tak To takto@alum.mit.eduxx
--------------------------------------------------------------------^^
[taode takto ~{LU5B~}] NB: trim the xx to get my real email addr