Re: Chinese character & pinyin frequency analysis



"LEE Sau Dan" <danlee@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

(On the use of numeric entities.)
Richard> It's still tedious if you are entering the codes by hand.

The real question is: why bother doing it manually? And why bother doing it at all?

1. Older text editors aren't much good at mixing fonts.

2. Some of the time I use these codes (jargon: character entities) to check the interpretation of control-like characters, such as ligature controls, or to effect a choice of normalisation. These would not readily show up in most editors.

3. It can be tempting to compact text by using a legacy encoding. There are also message boards that misread characters: I have had to enter accented letters as character entities to stop them being interpreted according to a legacy encoding.

4. There are a few characters that are best entered in HTML text as character entities ('<', '&' and multi-character white space immediately spring to mind), though there are symbolic names for these ('&lt;', '&amp;' and '&nbsp;').

5. It's a lot quicker to type '&#331;' for eng than to fiddle about with keyboard selections.

6. Finally, on one bulletin board I have to type '(b)' as '(&#98;)' to stop it being converted to a smiley.
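
For anyone who would rather not work the numbers out by hand, here is a minimal sketch in Python of how the codes above can be generated and checked. The function name to_numeric_entities is my own invention for illustration; the 'xmlcharrefreplace' error handler, html.unescape and unicodedata.normalize are from the standard library. It covers only decimal entities for non-ASCII characters, so an ASCII case like '(b)' in point 6 still has to be written by hand.

```python
import html
import unicodedata


def to_numeric_entities(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric entity.

    Hypothetical helper: ASCII characters pass through unchanged.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Point 5: eng (U+014B) becomes a decimal entity.
eng_word = to_numeric_entities("si\u014b")

# Point 2: the entities make the choice of normalisation visible.
composed = to_numeric_entities(unicodedata.normalize("NFC", "e\u0301"))
decomposed = to_numeric_entities(unicodedata.normalize("NFD", "\u00e9"))

# Point 6: decoding the hand-written entity gives back the plain text.
smiley_safe = html.unescape("(&#98;)")

print(eng_word, composed, decomposed, smiley_safe)
```

The composed and decomposed forms of e-acute look identical in most editors, but as entities one is a single code and the other is a base letter followed by a combining mark.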

Richard.
