Statistics

This page will contain details on my learning statistics (flashcards primarily and perhaps number of hours devoted to each area of study if I get my shit together and begin recording it) as well as other statistics relevant to the study of Chinese.

DeFrancis’ Chinese Readers

Thanks to ablindwatchmaker’s wordlists and imron’s (of chinese-forum.com fame) Chinese Text Analyzer, I was able to determine the actual number of characters and words in Beginning Chinese Reader (BCR).

In BCR Part 1 and 2 there are:

  • 1430 unique words (or 1363 by wibr’s wordlist)
  • 411 characters (or 400 exactly by wibr’s wordlist)

This is concordant with DeFrancis’ statement that book 1 teaches 400 characters and 1250 character combinations (aka disyllabic and trisyllabic words).
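As a rough illustration of how a wordlist yields both counts, here is a minimal sketch. It is not how Chinese Text Analyzer works internally; the sample words and the `is_cjk` helper are made up for the example, and the range check only covers the main CJK Unified Ideographs block.

```python
# Hypothetical sample; a real BCR wordlist has well over a thousand entries.
wordlist = ["中国", "中文", "人", "中国人", "人", "文学"]


def is_cjk(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block (assumption:
    rarer extension blocks are ignored for simplicity)."""
    return "\u4e00" <= ch <= "\u9fff"


# Duplicate entries collapse when the list is turned into a set.
unique_words = set(wordlist)

# Every distinct hanzi that occurs in at least one word.
unique_chars = {ch for word in unique_words for ch in word if is_cjk(ch)}

print(len(unique_words), "unique words")
print(len(unique_chars), "unique characters")
```

Because single characters such as 人 count as words here, a wordlist that segments text aggressively will report more words than one that keeps longer combinations together, which may be one reason the two wordlists disagree.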

Intermediate Chinese Reader

  • 2822 unique words (thus, 3840 cumulative; about 1020 BCR words do not reappear in ICR)
  • 764 characters (800 cumulative; nearly all of the characters in BCR make an appearance in ICR!)

Advanced Chinese Reader

  • 3540 unique words (6858 cumulative). Is this right? If these words are high yield, then this is an impressive figure. I think a lot of single characters are getting interpreted as words and this has inflated the word count. By these numbers, 3318 of the 3840 words in BCR and ICR never reappear in ACR; only about 520 do.
  • 1094 characters (1201 cumulative)

| Book(s) | Unique Characters | Unique Words |
|---|---|---|
| Beginning Chinese Reader | 400 | 1363 |
| Intermediate Chinese Reader | 765 | 2822 |
| BCR + ICR | 801 | 3837 |
| Shared BCR + ICR | 364 | 348 |
| Advanced Chinese Reader | 1094 | 3542 |
| BCR + ACR | 1117 | 4595 |
| Shared BCR + ACR | 377 | 310 |
| ICR + ACR | 1194 | 5961 |
| Shared ICR + ACR | 665 | 403 |
| Total (BCR + ICR + ACR) | 1201 | 6858 |
| Shared (BCR + ICR + ACR) | 1058 | 869 |
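
The "Shared" rows in the table above can be cross-checked with inclusion-exclusion. The sketch below assumes every row is a plain set size computed from the same wordlists; the variable and function names are mine, not from any tool.

```python
# Unique (characters, words) per book, copied from the table above.
single = {"BCR": (400, 1363), "ICR": (765, 2822), "ACR": (1094, 3542)}

# Combined (union) counts for each pair of books, and for all three.
union = {
    ("BCR", "ICR"): (801, 3837),
    ("BCR", "ACR"): (1117, 4595),
    ("ICR", "ACR"): (1194, 5961),
}
total = (1201, 6858)


def pair_overlap(a, b):
    """|A and B| = |A| + |B| - |A or B|, for characters and words."""
    return tuple(x + y - u for x, y, u in zip(single[a], single[b], union[(a, b)]))


pairs = {p: pair_overlap(*p) for p in union}

# Sum of the three books minus the combined total.
summed_overlap = tuple(
    sum(s[i] for s in single.values()) - total[i] for i in range(2)
)

# Three-set inclusion-exclusion, solved for the items found in all three books.
in_all_three = tuple(
    total[i] - sum(s[i] for s in single.values()) + sum(p[i] for p in pairs.values())
    for i in range(2)
)

print(pairs)
print(summed_overlap)
print(in_all_three)
```

The pairwise results match the table (364/348, 377/310, 665/403). The last table row (1058/869) matches `summed_overlap`, which counts an item twice if it appears in all three books. If the assumption holds, the number of items common to all three books works out to 348 characters and 192 words.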

Mandarin Companion

alanmd’s blog is indispensable for wordlists for the Mandarin Companion series: http://hskhsk.pythonanywhere.com/mandcomp

Integrated Chinese

Based on several wordlists I have found, the entire series appears to have approximately 2500 words and 1500 characters.

I quickly scanned the dictionary of the first text. It appears to have around 400 entries (each text promises to “introduce 350 essential vocabulary words”). At roughly 400 entries per text across four texts, that gives about 1600 words as a lower bound. Thus, 1600 – 2500 words seems to be a good estimate.

Glossika

I recently did a quick analysis of Glossika’s Fluency 1 – 3. For details on how I arrived at these numbers as well as a possible explanation for the discrepancies between these numbers, wibr’s below, and Glossika’s own, visit this page.

| Level | Unique Simplified Words | Unique Simplified Characters | Unique Traditional Words | Unique Traditional Characters |
|---|---|---|---|---|
| Fluency 1 | 1076 | 807 | 1048 | 822 |
| Fluency 2 | 1457 | 1010 | 1339 | 978 |
| Fluency 3 | 2241 | 1371 | 2189 | 1379 |
| Total | 2984 | 1579 | 2872 | 1593 |

From wibr’s analysis of all Glossika Chinese material, it appears Basic 1 – 3 has around 2000 words and 1650 characters. The entire collection has a bit over 3000 words and around 2100 characters. These figures may underestimate the number of words because wibr compared the words in Glossika against a word list from the Test of Proficiency (TOP, now TOCFL), Taiwan’s proficiency test for learners of Chinese as a second language. Any word not found in the TOP list was therefore left out of the statistics. Note that the “%” values in the output below are fractions of the TOP lists (0.29 means 29%).

TOP words:  7339
TOP characters:  2552

Basic 1
Characters per sentence:  11.384
English words per sentence:  7.268
Unique characters (book):  850
Unique characters (cumulative):  850
TOP characters % (book):  0.29310344827586204
TOP characters % (cumulative):  0.29310344827586204
TOP words % (book):  0.12563019484943452
TOP words % (cumulative):  0.12563019484943452

Basic 2
Characters per sentence:  15.786
English words per sentence:  9.847
Unique characters (book):  1035
Unique characters (cumulative):  1194
TOP characters % (book):  0.359717868338558
TOP characters % (cumulative):  0.40752351097178685
TOP words % (book):  0.16323749829677067
TOP words % (cumulative):  0.19103420084480174

Basic 3
Characters per sentence:  17.207
English words per sentence:  11.763
Unique characters (book):  1403
Unique characters (cumulative):  1649
TOP characters % (book):  0.5035266457680251
TOP characters % (cumulative):  0.5681818181818182
TOP words % (book):  0.2651587409728846
TOP words % (cumulative):  0.30862515329063905

Daily Life
Characters per sentence:  8.157
English words per sentence:  5.785
Unique characters (book):  1088
Unique characters (cumulative):  1859
TOP characters % (book):  0.3957680250783699
TOP characters % (cumulative):  0.6281347962382445
TOP words % (book):  0.18122359994549667
TOP words % (cumulative):  0.35522550756233817

Travel
Characters per sentence:  9.556
English words per sentence:  6.545
Unique characters (book):  865
Unique characters (cumulative):  1922
TOP characters % (book):  0.3103448275862069
TOP characters % (cumulative):  0.6442006269592476
TOP words % (book):  0.14770404687287098
TOP words % (cumulative):  0.37511922605259573

Business Intro
Characters per sentence:  9.341
English words per sentence:  6.061
Unique characters (book):  681
Unique characters (cumulative):  1961
TOP characters % (book):  0.2445141065830721
TOP characters % (cumulative):  0.6543887147335423
TOP words % (book):  0.11173184357541899
TOP words % (cumulative):  0.38465731026025346

Business 1
Characters per sentence:  14.396
English words per sentence:  8.913
Unique characters (book):  1032
Unique characters (cumulative):  2090
TOP characters % (book):  0.3832288401253918
TOP characters % (cumulative):  0.695141065830721
TOP words % (book):  0.2106554026434119
TOP words % (cumulative):  0.44106826543125766

Average Hanzi per English word:  1.522665502058842
Average Hanzi per English character:  0.2969744652211724

Harry Potter

When I came to lessons 9 and 10 of BCR1, I finally had to read several pages of Chinese continuously. It was a very cool feeling, even if the passages were a bit unnatural. I asked myself, “How long will it take for me to be able to read a novel my 9- or 10-year-old self would enjoy?” The answer: a long time.

Harry Potter and the Philosopher’s Stone has 2600 unique characters and 7700 unique words. By the time you get to Goblet of Fire, there are nearly 3200 unique characters and over 12000 unique words. In contrast, the English version has a mere 26 unique characters and 5700 unique words.

Clearly, you need to be at the high intermediate or low advanced level to read these. I am happy to see that the books appear to start off at a lower level and then gradually become more advanced. Anyway, it doesn’t look like I will be reading any young adult or children’s fiction in the near future.