Re: Arabic cursive in Unicode




Mike Wright wrote:
Peter T. Daniels wrote:
Mike Wright wrote:
Peter T. Daniels wrote:
Andreas Prilop wrote:
On 21 Nov 2006, Peter T. Daniels wrote:

I'm sorry, but I have no idea what you mean. "Presentation form" is not
a term used in the study of writing systems, or of Arabic, or in
typography.
"I'm sorry, officer, but I have no idea what you mean. 'Speed limit' is not
a term used in the study of writing systems, or of Arabic, or in
typography."
And if Netscape 3.04 would run on this Windows XP machine, I'd still be
using it.

Did you fail to notice that the original question was about the
typography of Arabic writing?

It turned out to be some Unicode nonsense. Unicode clearly got off to
an unfortunate start, and a lot of the difficulties with Unicode result
from trying to work around the messes it got stuck with at the start.
In the case of Arabic, the solution seems entirely sensible.

What I'm gathering from this thread is that there are two completely
separate groups of characters: the ones that you below call "logical
characters" and the ones I would call "allographs."

But one set of the allographs -- what Arabic grammars call "independent
forms" (i.e. unconnected on either side) -- should be identical to the
"logical characters."

In a way, the "logical characters" have no need of any concrete form. A
"logical character" can be thought of as an abstraction that represents
the set of all forms of that "character". Isn't that what a grapheme is?
I guess that "glyph" is really too general a term for positional
variants, and allograph would be the correct term.

Yet you tell me they've given them a concrete form ...

If you tell someone to write the character /ba:?/, which allograph
should they write? It depends on the context, doesn't it? Likewise, the
code 0628 does not represent an allograph, any more than does the sound
/ba:?/.

They will, of course, write the independent (isolated) form, a bowl
balanced on a dot.

Of course, for reference purposes, a listing of the names of the
graphemes might show just one of the allographs. And, since every
grapheme has at least an "independent form" (which Unicode refers to as
"isolated"), that is the allograph that makes sense. The same goes for a
listing of Unicode codes. It would not be wrong, however, to show *all*
of the allographs for each character--depending on the purpose of the
listing.

It would be impossible to show _all_ the allographs (for the same
reason you can't list all the allophones -- they involve individual
variation).

Different glyphs may be required for initial, medial, final, and
isolated forms of each character. A font must store all those glyphs,
and software must be able to specify the appropriate glyphs based on
context.

However, it doesn't make sense for the user to have to learn to type up
to four different glyphs for each logical character. It's simpler--and
faster--for the user to just type the logical characters and for the
software to figure out the appropriate glyphs based on context.
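As a rough illustration of "the software figures out the glyphs", here is a minimal Python sketch that picks a positional form for each typed letter. It is not how any real shaping engine works: the helper names (build_form_table, shape) are made up for this example, the form table is read from the compatibility decompositions of the Presentation Forms-B block via the standard unicodedata module, and diacritics, ligatures, and the zero-width joiner and non-joiner are all ignored.

```python
import unicodedata

FORMS = ("isolated", "final", "initial", "medial")


def build_form_table():
    """Map each base letter to {form name: presentation character}.

    The data comes from the compatibility decompositions of the
    Presentation Forms-B block (U+FE70..U+FEFF), e.g. "<initial> 0628".
    """
    table = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-letter entries only; ligatures and spacing
        # diacritics decompose to more than one code point.
        if len(parts) == 2 and parts[0].strip("<>") in FORMS:
            base = chr(int(parts[1], 16))
            table.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)
    return table


TABLE = build_form_table()


def joins_forward(ch):
    # A letter that has an initial form can connect to the letter after it.
    return "initial" in TABLE.get(ch, {})


def joins_backward(ch):
    # A letter that has a final form can connect to the letter before it.
    return "final" in TABLE.get(ch, {})


def shape(text):
    """Replace each logical letter with a positional presentation form."""
    out = []
    for i, ch in enumerate(text):
        forms = TABLE.get(ch)
        if forms is None:          # not a letter this sketch knows about
            out.append(ch)
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        to_prev = joins_forward(prev_ch) and joins_backward(ch)
        to_next = joins_forward(ch) and joins_backward(next_ch)
        if to_prev and to_next:
            form = "medial"
        elif to_prev:
            form = "final"
        elif to_next:
            form = "initial"
        else:
            form = "isolated"
        out.append(forms[form])
    return "".join(out)


# beh + yeh + teh, typed as three logical characters
word = "\u0628\u064A\u062A"
print([f"{ord(c):04X}" for c in shape(word)])  # ['FE91', 'FEF4', 'FE96']
```

The user types only the three codes from the 0600 range; the initial, medial, and final forms are chosen from context. In practice this selection happens inside the font and rendering system, and the stored text stays in the 0600 range.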

Also, from the standpoint of software, it's probably best to store the
characters in a glyph-independent format. This makes it possible for
software to quickly switch between a typewriter style, stringing one
glyph after another on a line, and a style that's a bit like
handwriting, with a more diagonal stacking of glyphs within words
(assuming a font that supports the difference).

Anything that requires parsing of Arabic strings should operate on the
logical characters, rather than on glyphs that might represent two or
more characters, so it makes more sense to store the logical characters
from this standpoint, as well.
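The parsing point can be tried out in a few lines of Python. This is only a sketch, and it assumes that Unicode compatibility normalization (NFKC) is an acceptable way to get back from presentation forms to the logical characters; the function name to_logical is invented for the example.

```python
import unicodedata


def to_logical(text):
    """Fold presentation forms back to the logical characters (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# Text stored as presentation forms: beh initial, yeh medial, teh final
shaped = "\uFE91\uFEF4\uFE96"
beh = "\u0628"

print(beh in shaped)               # False: the search misses the letter
print(beh in to_logical(shaped))   # True

# One lam-alef ligature glyph stands for two logical characters
lam_alef = "\uFEFB"
print([f"{ord(c):04X}" for c in to_logical(lam_alef)])  # ['0644', '0627']
```

A search for beh fails on the glyph-level string and succeeds on the logical one, and the single ligature code turns out to represent two letters.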

You haven't defined which of your levels is "presentation characters,"
and "logical characters" is a very strange term.

Can you do it with "grapheme" and "allograph"?

Is it correct to say that we speak allophones, not phonemes, and that we
write allographs, not graphemes?

Yes.

If so, then "grapheme" works for "logical character", and this is what
is covered by the 0600 set. (I was just following the original poster in
using that term.) "Allograph" works for the actual forms, which are
covered by the two "Presentation" sets.

So (I'm looking at Unicode Version 1 Manual vol. 1, pp. 218 and 552)
0600 0628 and 0600 FE8F are identical in form, different in function.
Was that an efficient way to do it?
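For anyone who wants to check how the character database relates the two codes, here is a small Python sketch using the standard unicodedata module. The helper name describe is made up for the example, and the names printed are those in the Unicode version that ships with the Python in use, which may differ from the Version 1 manual.

```python
import unicodedata


def describe(ch):
    """Return (code point, character name, compatibility decomposition)."""
    return (f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            unicodedata.decomposition(ch))


for ch in ("\u0628", "\uFE8F"):
    print(describe(ch))
# ('U+0628', 'ARABIC LETTER BEH', '')
# ('U+FE8F', 'ARABIC LETTER BEH ISOLATED FORM', '<isolated> 0628')
```

The database treats FE8F as a compatibility variant of 0628, tagged as the isolated form, while 0628 itself carries no form information at all.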

Do <A> and <a> have some platonic representation of "first roman
letter" with the appropriate variant chosen contextually?

The "Presentation Forms-B" set seems to be mostly the standard
positional allographs required for the basic Arabic alphabet. There are
a few exceptions, such as variations on alif and laam-alif. I tend to
think of laam-alif as a ligature.

The "Presentation Forms-A" set seems to cover the allographs of the
graphemes of non-Arabic languages, as well as a number of ligatures. For
example:

FBA0 ARABIC LETTER RNOON ISOLATED FORM
FBD4 ARABIC LETTER NG FINAL FORM
FBB1 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
FC0B ARABIC LIGATURE TEH WITH JEEM ISOLATED FORM
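A listing like this can be generated from the names instead of typed by hand. The following Python sketch uses unicodedata.lookup from the standard library; the helper name code_point_for is invented for the example, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

NAMES = [
    "ARABIC LETTER RNOON ISOLATED FORM",
    "ARABIC LETTER NG FINAL FORM",
    "ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM",
    "ARABIC LIGATURE TEH WITH JEEM ISOLATED FORM",
]


def code_point_for(name):
    """Look a character up by its Unicode name; return its code as hex."""
    return f"{ord(unicodedata.lookup(name)):04X}"


for name in NAMES:
    print(code_point_for(name), name)
```

Looking the characters up by name avoids transcription slips in the hexadecimal codes.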

I hadn't realized how many non-Arabic characters there are. I'd love to
see a single listing of what sounds they represent in various languages.
The Unicode names do tend to suck. I have no idea what "RNOON" or
"PEHEH" might refer to.

See The World's Writing Systems for all the languages that use Arabic
script nowadays, with a couple extra.
