Re: Chinese character & pinyin frequency analysis



"Richard" == Richard Wordingham <jrw0602@xxxxxxxxxxx> writes:

>> Then, upgrade your editor. If you're serious enough to use
>> Chinese characters, you should be using one that does it
>> properly.

Richard> Microsoft recently upgraded Notepad, at least for Windows
Richard> XP users. It now searches one's fonts for characters not
Richard> in the font you are using, so that problem has *now*
Richard> largely gone away.

Bad design. Font and characters are 2 separate things. A text-editor
should only be concerned with text characters, not fonts. Mixing the
element of "font" into a text-search is absurd.



Richard> So which editor do you suggest?

I use Emacs.


Richard> And can I be sure an editor will not normalise my input?

A text editor should NOT fiddle with your input unless you explicitly
instruct it to. That's What You Get Is What You Want.
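
If you want to check whether some tool has silently normalised your text, comparing it against its own normalised form is enough. A minimal sketch in Python, assuming NFC is the form in question; the function name `normalisation_changes` is made up for illustration:

```python
import unicodedata


def normalisation_changes(text: str, form: str = "NFC") -> bool:
    """Return True if normalising `text` to `form` would alter it."""
    return unicodedata.normalize(form, text) != text


# "e" followed by COMBINING ACUTE ACCENT: NFC would compose this pair.
decomposed = "e\u0301"
# Precomposed LATIN SMALL LETTER E WITH ACUTE: already in NFC.
composed = "\u00e9"

print(normalisation_changes(decomposed))
print(normalisation_changes(composed))
```

Running this over the text before and after a round trip through the editor shows whether anything was recomposed behind your back.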


Richard> I have, very occasionally, resorted to fixing Word files
Richard> by editing the RTF files as plain text.

Why do you need to do that? Because Word sucks?


Richard> And, yes, I do resort to editing binary files when the
Richard> need arises - the worst case was having to edit a VAX
Richard> object file to initialise an additional register.

You should be using something that's better than Word in that respect.
Would you choose to buy a badly designed car and then be obliged to
fix a few screws every day, or buy a decent car that has all its
screws working well straight out of the factory?



Richard> 3. It can be tempting to compact text by using a legacy
Richard> encoding. There are also message boards where characters
Richard> will get misinterpreted - I have had to enter accented
Richard> letters as character entities to avoid them being
Richard> misinterpreted according to a legacy code.

>> Use an editor that can do that automatically. :)

Richard> If you have one to hand.

Emacs. And if your editor can't convert between encodings, use a tool
to do that (e.g. GNU iconv). It isn't that difficult to write a
simple Perl script to do custom conversions, either.
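
As a rough illustration of how little code such a conversion needs, here is a sketch in Python rather than Perl. The function name `transcode` and the sample string are assumptions for the example, not part of any existing tool:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode `data` from the source encoding and re-encode it in the target."""
    text = data.decode(source)   # raises UnicodeDecodeError on invalid input
    return text.encode(target)   # raises UnicodeEncodeError if unrepresentable


# "café" stored in a legacy single-byte encoding (Latin-1).
legacy = "caf\u00e9".encode("latin-1")
utf8 = transcode(legacy, "latin-1", "utf-8")
print(utf8)
```

On the command line, GNU iconv does the same job with something like `iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt`.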


Richard> Using a legacy encoding for compacting will also result
Richard> in one's having a pair of source and derived files, and
Richard> possible problems if the editor is not clever enough to
Richard> convert the character encoding declaration.

Write a Perl script and the process can be automated. Write it once
and use it millions of times.
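
A sketch of what such a script might do, again in Python instead of Perl: it rewrites the first `charset=` declaration it finds and falls back to numeric character references for anything the legacy encoding cannot hold. The function name `convert_html` and the regular expression are assumptions for illustration; a real page may declare its encoding in other ways that this does not handle.

```python
import re

# Matches charset=NAME, with or without a quote before the name.
CHARSET_RE = re.compile(r'(charset\s*=\s*["\']?)([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def convert_html(data: bytes, source: str, target: str) -> bytes:
    """Re-encode an HTML page and update its first charset declaration."""
    text = data.decode(source)
    # Point the declaration at the new encoding (first occurrence only).
    text = CHARSET_RE.sub(lambda m: m.group(1) + target, text, count=1)
    # Characters missing from the target become entities such as &#331;.
    return text.encode(target, errors="xmlcharrefreplace")


page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    "<p>caf\u00e9 \u014b</p>"
).encode("utf-8")
derived = convert_html(page, "utf-8", "iso-8859-1")
print(derived)
```

The source file stays in Unicode; the derived file is regenerated from it whenever needed, so the pair never has to be kept in step by hand.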


>> Again, these ought to be delegated to a decent editor.

Richard> So which cheap editor do you suggest for HTML
Richard> incorporating ECMA-script ('javascript')?

I use Emacs.


Richard> 5. It's a lot quicker to type '&#331;' for eng than to
Richard> fiddle about with keyboard selections.

>> What are "keyboard selections"?

Richard> Selecting keyboard layouts. For small scripts (or
Richard> language systems using small subsets), one normally
Richard> selects a script- or language- specific keyboard layout.
Richard> Thus, if I want to mix Thai, Lao, Khmer and Latin-1, I
Richard> would normally switch between four different keyboard
Richard> layouts. (I'm seriously considering knocking one up for
Richard> IPA.) However, if a lot of keyboard layouts are enabled,
Richard> switching keyboards is as tedious as switching fonts.

You mean different Input Methods?




--
Lee Sau Dan 李守敦 ~{@nJX6X~}

E-mail: danlee@xxxxxxxxxxxxxxxxxxxxxxxxxx
Home page: http://www.informatik.uni-freiburg.de/~danlee