Re: Arabic cursive in Unicode
- From: "Peter T. Daniels" <grammatim@xxxxxxxxxxx>
- Date: 24 Nov 2006 06:05:16 -0800
Mike Wright wrote:
Peter T. Daniels wrote:
Mike Wright wrote:
Peter T. Daniels wrote:
Andreas Prilop wrote:
On 21 Nov 2006, Peter T. Daniels wrote:
I'm sorry, but I have no idea what you mean. "Presentation form" is not
a term used in the study of writing systems, or of Arabic, or in
typography.
"I'm sorry, officer, but I have no idea what you mean. 'Speed limit' is not
a term used in the study of writing systems, or of Arabic, or in
typography."
And if Netscape 3.04 would run on this Windows XP machine, I'd still be
using it.
Did you fail to notice that the original question was about the
typography of Arabic writing?
It turned out to be some Unicode nonsense. Unicode clearly got off to
an unfortunate start, and a lot of the difficulties with Unicode result
from trying to work around the messes it got stuck with at the start.
In the case of Arabic, the solution seems entirely sensible.
What I'm gathering from this thread is that there are two completely
separate groups of characters: the ones that you below call "logical
characters" and the ones I would call "allographs."
But one set of the allographs -- what Arabic grammars call "independent
forms" (i.e. unconnected on either side) -- should be identical to the
"logical characters."
In a way, the "logical characters" have no need of any concrete form. A
"logical character" can be thought of as an abstraction that represents
the set of all forms of that "character". Isn't that what a grapheme is?
I guess that "glyph" is really too general a term for positional
variants, and allograph would be the correct term.
Yet you tell me they've given them a concrete form ...
If you tell someone to write the character /ba:?/, which allograph
should they write? It depends on the context, doesn't it? Likewise, the
code 0628 does not represent an allograph, any more than does the sound
/ba:?/.
They will, of course, write the independent (isolated) form, a bowl
balanced on a dot.
Of course, for reference purposes, a listing of the names of the
graphemes might show just one of the allographs. And, since every
grapheme has at least an "independent form" (which Unicode refers to as
"isolated"), that is the allograph that makes sense. The same goes for a
listing of Unicode codes. It would not be wrong, however, to show *all*
of the allographs for each character--depending on the purpose of the
listing.
It would be impossible to show _all_ the allographs (for the same
reason you can't list all the allophones -- they involve individual
variation).
Different glyphs may be required for initial, medial, final, and
isolated forms of each character. A font must store all those glyphs,
and software must be able to specify the appropriate glyphs based on
context.
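
To make the "based on context" step concrete, here is a rough sketch in Python of how software might pick a positional form for each logical character. The FORMS table, joins_forward(), and shape() are made up for illustration and cover only four letters; a real shaping engine takes its joining data from the Unicode character database and the font, not from a hand-written table like this one.

```python
# Hypothetical mini-table (an assumption for illustration only):
# logical character -> (isolated, final, initial, medial) presentation forms.
# None marks forms that a right-joining letter such as ALEF does not have.
FORMS = {
    "\u0627": ("\uFE8D", "\uFE8E", None, None),          # ALEF
    "\u0628": ("\uFE8F", "\uFE90", "\uFE91", "\uFE92"),  # BEH
    "\u062A": ("\uFE95", "\uFE96", "\uFE97", "\uFE98"),  # TEH
    "\u0643": ("\uFED9", "\uFEDA", "\uFEDB", "\uFEDC"),  # KAF
}


def joins_forward(ch):
    """True if ch can connect to the letter that follows it."""
    forms = FORMS.get(ch)
    return forms is not None and forms[2] is not None


def shape(text):
    """Replace each logical character with its contextual form."""
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if forms is None:
            out.append(ch)  # not in the table: leave untouched
            continue
        isolated, final, initial, medial = forms
        # Connected on the preceding side only if the previous letter
        # joins forward; on the following side only if this letter
        # joins forward and the next one is a joining letter.
        prev_joins = i > 0 and joins_forward(text[i - 1])
        next_joins = (joins_forward(ch)
                      and i + 1 < len(text)
                      and text[i + 1] in FORMS)
        if prev_joins and next_joins:
            out.append(medial)
        elif prev_joins:
            out.append(final)
        elif next_joins:
            out.append(initial)
        else:
            out.append(isolated)
    return "".join(out)


# KAF TEH ALEF BEH (kitaab) as typed, in logical characters
word = "\u0643\u062A\u0627\u0628"
print([f"U+{ord(c):04X}" for c in shape(word)])
```

The user types four logical characters; the sketch picks initial KAF, medial TEH, final ALEF, and isolated BEH, because ALEF does not connect to the letter after it.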
However, it doesn't make sense for the user to have to learn to type up
to four different glyphs for each logical character. It's simpler--and
faster--for the user to just type the logical characters and for the
software to figure out the appropriate glyphs based on context.
Also, from the standpoint of software, it's probably best to store the
characters in a glyph-independent format. This makes it possible for
software to quickly switch between a typewriter style, stringing one
glyph after another on a line, and a style that's a bit like
handwriting, with a more diagonal stacking of glyphs within words
(assuming a font that supports the difference).
Anything that requires parsing of Arabic strings should operate on the
logical characters, rather than on glyphs that might represent two or
more characters, so it makes more sense to store the logical characters
from this standpoint, as well.
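
As a small illustration of why stored text is easier to handle as logical characters: Python's standard unicodedata module can fold presentation-form code points back to the logical characters through compatibility normalization (NFKC). The helper name to_logical() is invented for this sketch.

```python
import unicodedata


def to_logical(text):
    """Fold Arabic presentation forms back to logical characters (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# Initial KAF, medial TEH, final ALEF, isolated BEH, as presentation forms
shaped = "\uFEDB\uFE98\uFE8E\uFE8F"
logical = to_logical(shaped)
print([f"U+{ord(c):04X}" for c in logical])

# A laam-alif ligature is one presentation code point but two logical characters
print([f"U+{ord(c):04X}" for c in to_logical("\uFEFB")])
```

Searching or sorting on the normalized string then works on one code per letter, however the word happens to be rendered.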
You haven't defined which of your levels is "presentation characters,"
and "logical characters" is a very strange term.
Can you do it with "grapheme" and "allograph"?
Is it correct to say that we speak allophones, not phonemes, and that we
write allographs, not graphemes?
Yes.
If so, then "grapheme" works for "logical character", and this is what
is covered by the 0600 set. (I was just following the original poster in
using that term.) "Allograph" works for the actual forms, which are
covered by the two "Presentation" sets.
So (I'm looking at Unicode Version 1 Manual vol. 1, pp. 218 and 552)
0600 0628 and 0600 FE8F are identical in form, different in function.
Was that an efficient way to do it?
Do <A> and <a> have some platonic representation of "first roman
letter" with the appropriate variant chosen contextually?
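
For what it's worth, the character database records the link between the two codes explicitly. A minimal sketch using Python's standard unicodedata module (describe() is just a helper name made up here) prints the name and compatibility decomposition of 0628 and its four positional forms:

```python
import unicodedata


def describe(cp):
    """Return (name, compatibility decomposition) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.decomposition(ch)


for cp in (0x0628, 0xFE8F, 0xFE90, 0xFE91, 0xFE92):
    name, decomp = describe(cp)
    print(f"U+{cp:04X}  {name}  ->  {decomp or '(no decomposition)'}")
```

FE8F decomposes to "<isolated> 0628", so the database treats it as a compatibility variant of 0628 rather than as an unrelated character, while 0628 itself has no decomposition.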
The "Presentation Forms-B" set seems to be mostly the standard
positional allographs required for the basic Arabic alphabet. There are
a few exceptions, such as variations on alif and laam-alif. I tend to
think of laam-alif as a ligature.
The "Presentation Forms-A" set seems to cover the allographs of the
graphemes of non-Arabic languages, as well as a number of ligatures. For
example:
FBA0 ARABIC LETTER RNOON ISOLATED FORM
FBD4 ARABIC LETTER NG FINAL FORM
FBB1 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM
FC0B ARABIC LIGATURE TEH WITH JEEM ISOLATED FORM
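
Names like these can be checked against the character database that ships with Python. This is only a quick sketch: codepoint_of() is a made-up helper around the standard unicodedata.lookup() call, which raises KeyError for a name it does not know.

```python
import unicodedata


def codepoint_of(name):
    """Return the code point for a Unicode character name as four hex digits."""
    return f"{ord(unicodedata.lookup(name)):04X}"


names = [
    "ARABIC LETTER RNOON ISOLATED FORM",
    "ARABIC LETTER NG FINAL FORM",
    "ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM",
    "ARABIC LIGATURE TEH WITH JEEM ISOLATED FORM",
]
for name in names:
    print(codepoint_of(name), name)
```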
I hadn't realized how many non-Arabic characters there are. I'd love to
see a single listing of what sounds they represent in various languages.
The Unicode names do tend to suck. I have no idea what "RNOON" or
"PEHEH" might refer to.
See The World's Writing Systems for all the languages that use Arabic
script nowadays, with a couple extra.