Re: Chinese character & pinyin frequency analysis



"Richard" == Richard Wordingham <jrw0602@xxxxxxxxxxx> writes:

>> The real question is: why bother doing it manually? And why
>> bother doing it at all?

Richard> 1. Older text editors aren't much good at mixing fonts.

Then upgrade your editor. If you're serious enough to use Chinese
characters, you should be using one that handles mixed fonts properly.


Richard> 2. Some of the time I use these codes (jargon: character
Richard> entities) to check the interpretation of control-like
Richard> characters, such as ligature controls, or to effect a
Richard> choice of normalisation. These would not readily show up
Richard> in most editors.

Such things should be done by programs (and by programmers debugging
their programs). You don't use a hex editor to create or check your
files, do you? Then why check the Unicode by hand?
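
To illustrate what I mean by leaving it to a program, here is a rough
Python sketch. The function names are made up for this post, and it
only looks at general category Cf (which covers ZWJ, ZWNJ and similar
format controls), so treat it as an assumption-laden example rather
than a complete checker. It spells out the invisible characters and
reports whether a string is already in NFC.

```python
import unicodedata

def show_format_chars(text):
    """Return text with invisible format characters (category Cf) spelled out."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            # Hypothetical display convention: <U+XXXX CHARACTER NAME>
            name = unicodedata.name(ch, "UNKNOWN")
            out.append("<U+%04X %s>" % (ord(ch), name))
        else:
            out.append(ch)
    return "".join(out)

def is_nfc(text):
    """True if normalising to NFC leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text

print(show_format_chars("a\u200db"))   # ZWJ between 'a' and 'b'
print(is_nfc("e\u0301"))               # 'e' followed by COMBINING ACUTE ACCENT
```

The first call makes the joiner visible; the second reports that the
decomposed sequence is not in NFC.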


Richard> 3. It can be tempting to compact text by using a legacy
Richard> encoding. There are also message boards where characters
Richard> will get misinterpreted - I have had to enter accented
Richard> letters as character entities to avoid them being
Richard> misinterpreted according to a legacy code.

Use an editor that can do that automatically. :)



Richard> 4. There are a few characters that are best entered in
Richard> HTML text as character entities ('<', '&' and
Richard> multi-character white space immediately spring to mind),
Richard> though there are symbolic names for these.

Again, these ought to be delegated to a decent editor.
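
For what it's worth, when the text is produced by a program rather
than typed by hand, the escaping is a one-liner. A small sketch using
html.escape from the Python standard library; the wrapper name is my
own, and it does nothing about runs of white space.

```python
import html

def to_html_text(s):
    # html.escape replaces '&', '<' and '>' with named entities
    # (and, by default, both kinds of quote characters as well).
    return html.escape(s)

print(to_html_text("if a < b && c > d"))
```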


Richard> 5. It's a lot quicker to type '&#331;' for eng than to
Richard> fiddle about with keyboard selections.

What are "keyboard selections"?


Richard> 6. Finally, on one bulletin board I have to type '(b)' as
Richard> '(&#98;)' to stop it being converted to a smiley.

It's strange that this board's software does a stupid conversion
('(b)' -> smiley), but not nested conversions ('(&#98;)' --> '(b)' -->
smiley).
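
I don't know how that board is implemented, so the following is only a
guess at the two possible orders of processing, written as a Python
sketch with invented function names and a placeholder standing in for
the smiley image. If the smiley pass runs on the raw text and the
entity is decoded afterwards, '(&#98;)' survives; if the entity is
decoded first, it does not.

```python
import html

SMILEY = "[smiley]"   # placeholder for whatever image the board inserts

def smileys_then_entities(raw):
    # Smiley substitution sees the raw text; entities are decoded afterwards.
    return html.unescape(raw.replace("(b)", SMILEY))

def entities_then_smileys(raw):
    # Entities are decoded first, so the smiley pass sees the decoded '(b)'.
    return html.unescape(raw).replace("(b)", SMILEY)

print(smileys_then_entities("(&#98;)"))
print(entities_then_smileys("(&#98;)"))
```

The first order leaves "(b)" alone, which matches the behaviour you
describe; the second would turn it into the smiley.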

You should have filed a bug report, rather than relying on stupid
work-arounds.


--
Lee Sau Dan 李守敦 ~{@nJX6X~}

E-mail: danlee@xxxxxxxxxxxxxxxxxxxxxxxxxx
Home page: http://www.informatik.uni-freiburg.de/~danlee