Re: Chinese character & pinyin frequency analysis
- From: Harlan Messinger <hmessinger.removethis@xxxxxxxxxxx>
- Date: Wed, 10 Oct 2007 12:24:15 -0400
hxy wrote:
On Oct 2, 9:33 pm, d...@xxxxxxxxxxxx wrote:
I've put together another Chinese character frequency list, but this
one has a bit of a twist.
The usual 100,000 Chinese web pages were analyzed, and unique strings
at least 50 characters long were analyzed.
The results were fed into NJStar to attempt translation into pinyin.
The characters (with their pinyin) were then analyzed.
Results here:
http://readmandarin.com/research.htm
Nice page. I have a question. The code for Chinese characters in
your page does not look like Unicode. Your code is &# plus 5 digits.
I remember that Unicode for Chinese characters is &#x plus 4 digits, at
least I did it that way with the Unicode chart, and it works. What type
of code is it in your page?
The Unicode charts use hexadecimal representation. The Web page is using the corresponding decimal representation, which, for whatever reason, is what's usually found on the Web. The unified CJK character subset starts at 4E00 (hex), which is 19968 (dec).
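As a quick check of the point above, here is a minimal Python sketch showing that the decimal and hexadecimal forms refer to the same code point. The helper names to_decimal_ref and to_hex_ref are made up for illustration; html.unescape is from the Python standard library.

```python
import html


def to_decimal_ref(ch: str) -> str:
    """Hypothetical helper: build the decimal reference, e.g. &#19968;"""
    return f"&#{ord(ch)};"


def to_hex_ref(ch: str) -> str:
    """Hypothetical helper: build the hexadecimal reference, e.g. &#x4E00;"""
    return f"&#x{ord(ch):X};"


# First character of the unified CJK block, U+4E00.
first_cjk = chr(0x4E00)

decimal_ref = to_decimal_ref(first_cjk)
hex_ref = to_hex_ref(first_cjk)

# Both references decode to the same character.
same_character = (
    html.unescape(decimal_ref) == html.unescape(hex_ref) == first_cjk
)

print(decimal_ref, hex_ref, same_character)
```

Either form should display identically in a browser; the only difference is the number base used to write the code point.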