Re: Chinese character & pinyin frequency analysis



Ruud Harmsen wrote:
Wed, 10 Oct 2007 09:13:37 -0700: hxy <huangxinyu714@xxxxxxxxx>: in
sci.lang:


Nice page. I have a question. The code for Chinese characters in
your page does not look like Unicode. Your code is &# plus 5 digits.
I remember that Unicode for Chinese characters is &#x plus 4 digits; at
least I did it that way with the Unicode chart, and it works. What type
of code is it in your page?


I suppose &# plus 5 digits is decimal, and &#x plus 4 digits is
hexadecimal. The latter is easier because the Unicode code charts are
also in hexadecimal. Otherwise it makes no difference and it is just a
matter of personal preference.
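
To illustrate the decimal/hexadecimal point, here is a small Python sketch that builds both reference forms for one character. The helper name `char_refs` is made up for this example. For 中 (U+4E2D) the decimal form comes out with five digits and the hexadecimal form with four, which matches the two styles described in the question.

```python
def char_refs(ch):
    """Return (decimal, hexadecimal) numeric character references for one character."""
    cp = ord(ch)  # the character's Unicode code point as an integer
    return f"&#{cp};", f"&#x{cp:X};"

dec_ref, hex_ref = char_refs("中")
print(dec_ref)  # &#20013;
print(hex_ref)  # &#x4E2D;
```

Both strings name the same code point; only the number base differs.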

Actually it is
&#<digits>;
or
&#x<digits>;

The trailing semicolon is mandatory, and the number of digits
(decimal or hexadecimal) does not matter.
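
As a quick check of the "number of digits does not matter" part, here is a sketch using Python's standard `html.unescape` (chosen only for illustration; it has nothing to do with the page under discussion). Decimal and hexadecimal references, with or without leading zeros and in either letter case, should all decode to the same character.

```python
import html

# Several spellings of the same code point, U+4E2D (decimal 20013).
refs = ["&#20013;", "&#x4E2D;", "&#x4e2d;", "&#0020013;", "&#x004E2D;"]

decoded = [html.unescape(r) for r in refs]
for ref, ch in zip(refs, decoded):
    print(ref, "->", ch)
```

Note that `html.unescape` follows the lenient HTML5 parsing rules, so it also accepts some references without the semicolon; it is not a test of the semicolon requirement.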

Some say decimal is supported by more browsers than hexadecimal, but
that's now probably a thing of the past.

The hexadecimal representation is actually not defined in the
original SGML standard, ISO 8879 (HTML is an application of SGML),
although the HTML 4 specification itself does allow it. In any case,
most browsers handle it.

Tak
--
----------------------------------------------------------------+-----
Tak To takto@xxxxxxxxxxxxxx
--------------------------------------------------------------------^^
[taode takto ~{LU5B~}] NB: trim the xx to get my real email addr


