Re: Chinese character & pinyin frequency analysis



"LEE Sau Dan" <danlee@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
"Richard" == Richard Wordingham <jrw0602@xxxxxxxxxxx> writes:

Richard> How does Emacs help with my telling where ZWJ and ZWNJ
Richard> have been placed? I did an experiment with 'Ca‍esar'
Richard> (with ZWJ between 'a' and 'e') and found 'a ZWNJ e'
Richard> displayed as 'a e'.

Correction: 'a ZWJ e' displayed as 'a e'.

What are ZWJ and ZWNJ?

Basically characters respectively mandating (ZWJ = Zero Width Joiner) and prohibiting (ZWNJ = Zero Width Non-Joiner) ligation of the characters either side. There are some complications in their semantics for cursive scripts (prototypically Arabic) and Indic scripts (prototypically Devanagari). I understand they'd be rather important for a proper fraktur font.

Richard> In Code2000 it displays the same as 'æ'. I see I've
Richard> still got some work to do to get Emacs to use the
Richard> Code2000 font, assuming that it's possible.

What's Code2000?

The usual example of a font that attempts to cover the entire Basic Multilingual Plan (less the Private Usage Area).

Richard> Using a legacy encoding for compacting will also result
Richard> in one's having a pair of source and derived files, and
Richard> possible problems if the editor is not clever enough to
Richard> convert the character encoding declaration.

It's a file or worflow management issue. We always of lots of files
and copies (real ones or virtual) around. Having them isn't a
problem. Not managing them properly is.

Nobody has ever complained that after writing a C program, one has to
compile it and that generates a second file---the executable.
(Intermediate-level programs may need to use a third, intermediate
file---object code, too.) Why?

Because there is little temptation for maintenance to be done on the object or executable files. On the other hand, when one uses automatic code generators, (e.g Statemate or Simulink to C) there is a severe risk of maintenance being done on the derived files rather than, in these cases, the 'pictorial' sources. In the case I have in mind, the temptation is extremely high.

If you're using the native Win32 port of Emacs, then Unix font specs.
are irrelevant. The Win32 port should be using native Windows fonts.
You may need to consult the manual, though, to find out the details.

Does one exists for the Win32 port? I'm looking through a FAQ, and it chiefly refers to the Unix-style names. I've tried some of the example commands.
They tell me that Arial (for example) is available, but it doesn't appear on the GUI font menu.

Richard> - but they do get saved when I save the file in UTF-8 or
Richard> UTF-16. It looks as though there is more work to do on
Richard> Emacs for Unicode support.

Yes. Unicode is not native for Emacs. But given that many people use
Emacs to edit UTF-8 and UTF-16 files, and they have been doing that
for years, I don't think there would be a very big problem. You may
want to search the web for some additional Emacs packages that can
facilitate your work.

Possibly. That is getting distinctly obscure.

There also definitely seems to be a problem with proportional fonts. It would seem that one once had to use fixed width Thai fonts (i.e. 0 or 1 cells per character) for MULE, and I suspect that may still be the case for serious editing. A freshly pasted line is legible, but moving a cursor through it or typing it afresh causes serious problems. The word น้ำ is unrecognisable after typing it, though repasting the line does cause it to display properly.

.



Relevant Pages

  • Re: Chinese character & pinyin frequency analysis
    ... Chinese characters, you should be using one that does it ... Richard> in the font you are using, ... How does Emacs help with my telling where ZWJ and ZWNJ have been placed? ...
    (sci.lang)
  • Re: Chinese character & pinyin frequency analysis
    ... Font and characters are 2 separate things. ... Richard> display the text one is editing. ... Emacs has been supporting fontsets for a decade. ...
    (sci.lang)
  • Re: Chinese character & pinyin frequency analysis
    ... Richard> done on the object or executable files. ... Richard> proportional fonts. ... That's quite different to the Windows port (of Emacs 22.1.1), which emulates a cell-based display device when one is editing a line. ... Suddenly Thai input worked tolerably - combining characters don't appear until I use Ctrl-L, but thereafter I can move the cursor through them without trashing the display. ...
    (sci.lang)
  • Re: Forms and Tables
    ... Well, Richard, I'm afraid I can't tell you. ... a site that teaches you everything you need to know about CSS: ... The font you're seeing is the default font that is ... >>formatting tags. ...
    (microsoft.public.frontpage.client)
  • Re: Chinese character & pinyin frequency analysis
    ... Richard> done on the object or executable files. ... You may try the Emacs wiki when looking for answers. ... Richard> proportional fonts. ... but moving a cursor through it or typing it ...
    (sci.lang)