Re: Chinese character & pinyin frequency analysis
- From: "dylanwhsung@xxxxxxxxxxxxxx" <dylanwhsung@xxxxxxxxxxxxxx>
- Date: Wed, 10 Oct 2007 11:30:51 -0700
On 3 Oct, 02:33, d...@xxxxxxxxxxxx wrote:
> I've put together another Chinese character frequency list, but this
> one has a bit of a twist.
> The usual 100,000 Chinese web pages were analyzed, and unique strings
> at least 50 characters long were extracted.
> The results were fed into NJStar to attempt conversion into pinyin.
> The characters (with their pinyin) were then analyzed.
> Results here:
> http://readmandarin.com/research.htm
> Any feedback is most appreciated.
> Thanks!
> Daniel
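For anyone wanting to reproduce the counting step Daniel describes, here is a minimal Python sketch. It assumes the page text has already been extracted into a list of strings, treats "unique" as simple removal of exact duplicate strings, and counts only characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF). Those choices, and the function names char_frequencies and coverage, are my own illustrative assumptions, not Daniel's actual method. The NJStar pinyin step is not reproduced.

```python
from collections import Counter

MIN_LEN = 50  # minimum string length mentioned in Daniel's post


def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block (assumed filter)."""
    return "\u4e00" <= ch <= "\u9fff"


def char_frequencies(strings, min_len=MIN_LEN):
    """Count Han characters over the distinct strings at least min_len long."""
    counts = Counter()
    for s in set(strings):  # keep each distinct string only once
        if len(s) >= min_len:
            counts.update(ch for ch in s if is_cjk(ch))
    return counts


def coverage(counts, top_n):
    """Fraction of all counted characters accounted for by the top_n most frequent."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(top_n)) / total
```

coverage(counts, 400) then gives the share of running text covered by the 400 most frequent characters, the same kind of measure as in Chen's table.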
A number of years ago this was discussed on sci.lang, and I cited
DeFrancis' Chinese readers, which drew on Chen Heqin's 1928 study:
http://groups.google.co.uk/group/soc.culture.china/msg/2b701de71b169222?dmode=source
The frequency table there shows results similar to yours.
All Chinese Reader 1200 Characters
Accounting for 93.7% of all occurrences of the 4719 distinct
characters surveyed in journals and newspapers (about
900,000 characters of text).
Based on frequency of use. Chen Heqin,
graduate of Teachers College, New York;
Commissioner of Education, Shanghai.
Yu3ti3wen2 Ying4yong4 Zi4hui4
(Characters Used in Vernacular Literature),
Shanghai, 1928
400 most frequent characters = 73.1% of total text
least-used half (2319 characters) = 2.5% of total text
No. of      Frequency       % of      Cumulative %
distinct    rank            total     of total
chars                       text      text
  400          1 - 400      73.1       73.1
  400        401 - 800      12.4       85.5
  400        801 - 1200      5.8       91.3
  400       1201 - 1600      3.3       94.6
  400       1601 - 2000      1.9       96.5
  400       2001 - 2400      1.0       97.5
 2319       2401 - 4719      2.5      100.0
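The cumulative column is simply a running total of the per-band percentages. A small Python check using the figures from the table above (the names BANDS, cumulative and band_sizes are illustrative only):

```python
from itertools import accumulate

# Bands from Chen Heqin's 1928 table as quoted above:
# (first rank, last rank, % of total text)
BANDS = [
    (1, 400, 73.1),
    (401, 800, 12.4),
    (801, 1200, 5.8),
    (1201, 1600, 3.3),
    (1601, 2000, 1.9),
    (2001, 2400, 1.0),
    (2401, 4719, 2.5),
]


def cumulative(bands):
    """Running total of the per-band percentages, rounded to one decimal place."""
    return [round(x, 1) for x in accumulate(pct for _, _, pct in bands)]


def band_sizes(bands):
    """Number of distinct characters in each frequency-rank band."""
    return [last - first + 1 for first, last, _ in bands]
```

The last band holds 2319 of the 4719 distinct characters, roughly the least-used half, which is where the 2.5% figure comes from.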
It would be interesting to compare character usage 80 years ago with
today's, if you can get hold of a list of Chen's characters.
Dyl.