Re: Chinese character & pinyin frequency analysis



On 3 Oct, 02:33, d...@xxxxxxxxxxxx wrote:
I've put together another Chinese character frequency list, but this
one has a bit of a twist.

From the usual 100,000 Chinese web pages, unique strings at least 50
characters long were extracted and analyzed.

The results were fed into NJStar to attempt translation into pinyin.

The characters (with their pinyin) were then analyzed.

Results here:

http://readmandarin.com/research.htm

Any feedback is most appreciated.

Thanks!

Daniel

A number of years ago this was discussed on sci.lang, and I cited
DeFrancis' Chinese readers, which drew on Chen Heqin's study from
1928:

http://groups.google.co.uk/group/soc.culture.china/msg/2b701de71b169222?dmode=source

The frequency table shows some similar results.

All Chinese Reader 1200 characters:
accounting for 93.7% of the text surveyed in journals
and newspapers (4719 different characters, about
900 000 characters of text in total).


Based on frequency of use. Chen Heqin,
graduate of Teachers College, New York.
Commissioner of Education, Shanghai.


Yu3ti3wen2 Ying4yong4 zi4hui4
Characters used in
Vernacular Literature Shanghai 1928,
400 most frequent characters = 73.1% of total text
least used half = 2.5%


Number of     Order of        % of          Cumulative %
different     frequency       total text    of total text
characters
   400           1 -  400        73.1           73.1
   400         401 -  800        12.4           85.5
   400         801 - 1200         5.8           91.3
   400        1201 - 1600         3.3           94.6
   400        1601 - 2000         1.9           96.5
   400        2001 - 2400         1.0           97.5
  2319        2401 - 4719         2.5          100.0
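
For anyone who wants to check the cumulative column, here is a small Python sketch that rebuilds it from the per-band percentages in the table above. The function name `cumulative_coverage` is just something made up for this illustration; the numbers are the ones quoted from Chen's table.

```python
# (number of different characters, % of total text) for each frequency band,
# copied from the table above.
bands = [
    (400, 73.1),
    (400, 12.4),
    (400, 5.8),
    (400, 3.3),
    (400, 1.9),
    (400, 1.0),
    (2319, 2.5),
]

def cumulative_coverage(bands):
    """Return (characters so far, cumulative % of text) after each band."""
    rows = []
    chars = 0
    tenths = 0  # accumulate in tenths of a percent to avoid float drift
    for count, share in bands:
        chars += count
        tenths += round(share * 10)
        rows.append((chars, tenths / 10))
    return rows

for chars, cum in cumulative_coverage(bands):
    print(f"top {chars:>4} characters: {cum:5.1f}% of text")
```

The running totals agree with the cumulative column as printed, ending at 4719 characters and 100.0%.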

It would be interesting if you could compare character usage 80 years
ago with usage today, if you can get a list of Chen's characters.
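
One possible way to make such a comparison, sketched in Python: rank the characters of each corpus by frequency, then see which characters fall in the top N of both lists and which in only one. This is only a rough illustration, not anyone's actual method. The helper names `ranked_chars` and `compare_top_n` are invented here, and the two sample texts are Latin-letter placeholders, not real data from either study.

```python
from collections import Counter

def ranked_chars(text):
    """Characters of `text` from most to least frequent (whitespace ignored)."""
    counts = Counter(ch for ch in text if not ch.isspace())
    # Sort by descending count, then by character so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ch for ch, _ in ordered]

def compare_top_n(old_ranked, new_ranked, n):
    """Compare the n most frequent characters of two ranked lists."""
    old_top, new_top = set(old_ranked[:n]), set(new_ranked[:n])
    return {
        "shared": sorted(old_top & new_top),
        "only_old": sorted(old_top - new_top),
        "only_new": sorted(new_top - old_top),
    }

# Placeholder texts only, standing in for an older and a newer corpus.
old_text = "aaaa bbb cc d"
new_text = "aaaa ccc ee b"

result = compare_top_n(ranked_chars(old_text), ranked_chars(new_text), 3)
print(result)
```

With real lists, running this for N = 400, 800, 1200 and so on would show how much of each frequency band has stayed the same.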

Dyl.

