How Many of the Top 5,000 Do You Know?
Take our free adaptive test — it samples across the full frequency range — and see where in the distribution your vocabulary sits.
Contents
- Why a handful of words covers most of English
- The top 100 most common English words
- What the top 1,000 cover
- The 2,000-word threshold
- The top 5,000 and the 95% threshold
- Words, lemmas, and word families
- How to use frequency lists effectively
- When frequency lists stop helping
- Frequently asked questions
Why a Handful of Words Covers Most of English
English word frequency follows a strikingly uneven distribution called Zipf's law: a small number of words appear extremely often, while the vast majority of words appear rarely. The most common word, the, accounts for roughly 5–7% of all words in typical English text by itself. The top 10 words together cover around 25%. The first 100 words cover around 50% of everyday English.
This is why frequency lists are so powerful for learners. The first thousand words you learn pay off enormously: each new word in the high-frequency band gives you many more text exposures than a word selected at random. The thousandth most common word is encountered roughly 50 times more often than the ten-thousandth.
Coverage gains slow rapidly after the first few thousand words. Going from 1,000 to 2,000 words adds roughly 7 percentage points of speech coverage. Going from 5,000 to 6,000 adds less than 1 point. By 10,000, additional words barely move overall coverage at all, though they make a large difference for specialised text.
What this means practically. For beginners, deliberately working through the top 1,000–2,000 words is one of the highest-return investments in language study. For intermediate learners and above, contextual learning through reading and listening becomes more efficient than working frequency lists.
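The uneven distribution described in this section can be explored with a small sketch. It assumes an idealised Zipf model in which the frequency of a word is proportional to 1/rank, over a hypothetical vocabulary of 100,000 items; the function name and both defaults are illustrative choices, not values taken from any corpus. Real corpora deviate from the pure model, so the output will not match the figures quoted above exactly, but the shape is the same: a steep early rise followed by a long flat tail.

```python
def zipf_coverage(top_k: int, vocab_size: int = 100_000, exponent: float = 1.0) -> float:
    """Share of all running words covered by the top_k ranks.

    Idealised model: frequency(rank) is proportional to 1 / rank**exponent.
    vocab_size and exponent are assumptions, not measured values.
    """
    if not 0 <= top_k <= vocab_size:
        raise ValueError("top_k must be between 0 and vocab_size")
    weights = [1.0 / rank ** exponent for rank in range(1, vocab_size + 1)]
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    for k in (1, 10, 100, 1_000, 5_000, 10_000):
        print(f"top {k:>6}: {zipf_coverage(k):.1%}")
```

Changing `exponent` or `vocab_size` shifts the numbers but not the pattern of diminishing returns.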
The Top 100 Most Common English Words
The following are the 100 most common English words by frequency in contemporary corpora, drawn from large-scale resources such as the Corpus of Contemporary American English (COCA). The exact ranking varies slightly between corpora, but the items themselves and their approximate order are remarkably stable.
A few observations on this list. Almost all of these are function words — articles, pronouns, prepositions, conjunctions, auxiliary verbs — that hold sentences together rather than carrying specific meaning. Only a handful of true content words appear in the first 100: say, time, people, year, work, way, day. The content words you learn beyond the first 100 are where vocabulary really starts to expand what you can express.
What the Top 1,000 Cover
The first 1,000 most common English word families cover approximately:
- 80% of spoken English — everyday conversation
- 75% of written English — general non-fiction and journalism
- 70% of academic text — denser register, more specialised vocabulary
The first 1,000 word families correspond approximately to CEFR levels A1 through A2. A learner who has reliably mastered these 1,000 words can understand the gist of most everyday speech and recognise the structure of most written sentences, even when many content words are unknown.
The cost-benefit ratio of learning these words is unmatched. A learner can realistically work through the top 1,000 in 6–12 months of consistent study, and the result is access to roughly 80% of all speech encountered.
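Coverage figures like these are the share of running words in a text that the learner already knows. A minimal sketch of that calculation follows. The function name, the tokenising regular expression, and the sample sentence are illustrative assumptions, and the sketch matches exact word forms rather than word families, so it is a simplification of how published figures are computed.

```python
import re


def text_coverage(text: str, known_words: set[str]) -> float:
    """Fraction of running words in `text` that appear in `known_words`.

    Simplification: tokens are lowercased letter runs (apostrophes allowed)
    and are matched as exact forms, not grouped into word families.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    vocabulary = {"the", "cat", "sat", "on"}
    print(f"coverage: {text_coverage(sample, vocabulary):.0%}")
```

In the sample, five of the six running words are known, because "the" counts each time it occurs. This is why a few very frequent words contribute so much coverage.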
The 2,000-Word Threshold
The first 2,000 word families mark the threshold at which most researchers say a learner can begin to function in everyday English. Coverage at this level reaches:
- ~87% of spoken English
- ~82% of general written text
- ~78% of academic text
The 2,000-word list corresponds roughly to CEFR A2 to lower B1. It is also the scale of the classic General Service List (West 1953), on which modern adaptations such as the New General Service List build. At this level a learner can hold simple conversations, follow straightforward news articles with effort, and write short texts for everyday purposes.
However, 87% coverage is still not enough for comfortable reading. At 87% the learner meets an unknown word roughly once every 8 words, and at the ~82% coverage of general written text roughly once every 5 or 6 words. That is frequent enough that comprehension and enjoyment of authentic text remain difficult.
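The arithmetic behind "one unknown word every N words" is simply the reciprocal of the unknown share. A short sketch, with a hypothetical function name, makes it easy to check other coverage levels:

```python
def words_per_unknown(coverage: float) -> float:
    """Average number of running words per unknown word at a given coverage.

    coverage is a fraction in [0, 1); 0.87 means 87% of running words are known.
    """
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in the range [0, 1)")
    return 1.0 / (1.0 - coverage)


if __name__ == "__main__":
    for level in (0.82, 0.87, 0.95, 0.98):
        print(f"{level:.0%} coverage: one unknown word every {words_per_unknown(level):.1f} words")
```

At 87% this gives about 7.7 words per unknown word, at 95% about 20, and at 98% about 50.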
The Top 5,000 and the 95% Threshold
Nation's research established that comfortable reading of authentic text requires roughly 95–98% word coverage. Below 95%, unknown words appear too frequently to maintain comprehension. Reaching the 95% mark takes approximately the first 5,000 word families for speech; for general written text, 5,000 families give about 93%, and 95% arrives at around 8,000.
| Word families known | Speech coverage | Written-text coverage | CEFR equivalent |
|---|---|---|---|
| 1,000 | ~80% | ~75% | A1–A2 |
| 2,000 | ~87% | ~82% | A2–B1 |
| 3,000 | ~90% | ~88% | B1 |
| 5,000 | ~95% | ~93% | B2 |
| 8,000 | ~97% | ~95% | B2–C1 |
| 9,000–10,000 | ~98% | ~97% | C1–C2 |
Sources: Nation (2006); coverage statistics from contemporary English corpora.
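The diminishing returns in the table can be made explicit by computing the average gain per additional 1,000 word families between rows. The sketch below uses only the speech-coverage column as quoted above; the variable and function names are illustrative, and the 9,000–10,000 row is left out because it is a range rather than a single size.

```python
# Speech coverage as quoted in the table above
# (word families known -> approximate share of speech covered).
SPEECH_COVERAGE = {1_000: 0.80, 2_000: 0.87, 3_000: 0.90, 5_000: 0.95, 8_000: 0.97}


def gain_per_thousand(coverage: dict[int, float]) -> list[tuple[int, int, float]]:
    """Average gain in percentage points per 1,000 families between adjacent rows."""
    sizes = sorted(coverage)
    gains = []
    for low, high in zip(sizes, sizes[1:]):
        points = (coverage[high] - coverage[low]) * 100
        gains.append((low, high, points / ((high - low) / 1_000)))
    return gains


if __name__ == "__main__":
    for low, high, gain in gain_per_thousand(SPEECH_COVERAGE):
        print(f"{low:>5} -> {high:>5}: {gain:.1f} points per 1,000 families")
```

On these figures the gain falls from about 7 points per thousand families at the start to under 1 point per thousand between 5,000 and 8,000.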
The 5,000-word level is sometimes called the threshold of functional literacy in English. At this level the learner can read most non-specialised journalism and contemporary fiction with occasional dictionary lookup, hold full conversations, and write effectively for general purposes.
Words, Lemmas, and Word Families
Frequency lists differ in what unit they count, and the differences are significant.
- Word forms (types): every distinct form is a separate item. run, runs, ran, running, runner, runners are six items.
- Lemmas: inflections collapse, but different parts of speech stay separate. run (verb, including runs, ran, running) is one item; runner (noun, including runners) is a separate item.
- Word families: a base word plus all its regularly inflected and derived forms. The entire family run, runs, ran, running, runner, runners, rerun is one item.
A learner who knows the verb run does not need to learn runs, ran and running as separate items — they fall out of basic morphology. Word families therefore match learner reality most closely, which is why most contemporary lists (Nation's BNC/COCA list, the New General Service List, etc.) use word families as their counting unit.
When you see claims that "the first 3,000 English words cover 95% of speech", check what counts as a word. If the figure counts word families, the number of individual forms behind it is much larger than the headline suggests; if it counts individual word forms, it cannot be compared directly with family-based figures.
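To see how much the counting unit changes the headline number, the sketch below counts the same six forms of run as distinct word forms, as lemmas, and as word families. The two mapping tables are hand-written for this example and are purely hypothetical; real lists rely on much larger lemma and family tables, and they must also handle forms such as runs that can be either a noun or a verb.

```python
FORMS = ["run", "runs", "ran", "running", "runner", "runners"]

# Hypothetical hand-written mappings, for illustration only.
LEMMA_OF = {
    "run": "run (verb)",
    "runs": "run (verb)",
    "ran": "run (verb)",
    "running": "run (verb)",
    "runner": "runner (noun)",
    "runners": "runner (noun)",
}
FAMILY_OF = {form: "RUN" for form in LEMMA_OF}


def count_units(forms: list[str]) -> dict[str, int]:
    """Count the same forms under three different counting units."""
    return {
        "word_forms": len(set(forms)),
        "lemmas": len({LEMMA_OF[form] for form in forms}),
        "word_families": len({FAMILY_OF[form] for form in forms}),
    }


if __name__ == "__main__":
    print(count_units(FORMS))
```

The same six forms count as 6 word forms, 2 lemmas, or 1 word family, which is why a "3,000-word" list can describe very different amounts of learning.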
How to Use Frequency Lists Effectively
Beginners — A1 to A2
Direct memorisation of the top 1,000–2,000 words is highly efficient. The coverage payoff per hour of study is enormous in this range. Spaced repetition systems (Anki and similar) paired with example sentences work well. Aim for recognition first; productive use will come with practice.
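For readers curious how spaced repetition schedules a word, here is a minimal sketch of a Leitner-style box system. It is not Anki's actual algorithm: the interval table, the `Card` class, and the `review` function are all assumptions made for illustration. The idea is only that a remembered word moves to a box with a longer gap before its next review, while a forgotten word returns to the first box.

```python
from dataclasses import dataclass

# Assumed review gaps in days for boxes 0..4; real systems tune these differently.
INTERVALS_DAYS = (1, 2, 4, 8, 16)


@dataclass(frozen=True)
class Card:
    word: str
    box: int = 0      # index into INTERVALS_DAYS
    due_day: int = 0  # day number on which the card should next be reviewed


def review(card: Card, today: int, remembered: bool) -> Card:
    """Return the card's new state after a review on day `today`."""
    if remembered:
        box = min(card.box + 1, len(INTERVALS_DAYS) - 1)
    else:
        box = 0  # forgotten words start again with the shortest gap
    return Card(card.word, box, today + INTERVALS_DAYS[box])


if __name__ == "__main__":
    card = Card("time")
    card = review(card, today=0, remembered=True)
    print(card)
```

Pairing each card with an example sentence, as suggested above, would only require one more field on `Card`.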
Lower-intermediate — B1
Continue with the top 3,000 list, but increasingly supplement with graded readers and simple authentic text. By the end of this stage, contextual exposure should be producing roughly as much vocabulary growth as deliberate study.
Upper-intermediate and advanced — B2 and above
Frequency-list study becomes a supporting rather than primary tool. The rate of return drops sharply: many words in the 5,000–10,000 frequency band are encountered rarely enough that contextual learning through reading and listening is more efficient than rote memorisation. Specialised lists (academic, technical, exam-specific) become more useful than general frequency lists at this stage.
Exam preparation
For exam-focused vocabulary, use the corresponding targeted list rather than a general frequency list. Our guides cover the relevant ranges: SAT, GRE, IELTS, TOEFL, and the Academic Word List.
When Frequency Lists Stop Helping
Two cautions matter especially for intermediate and advanced learners.
First, beyond 5,000 words, frequency is a much weaker predictor of usefulness for any individual learner. A learner working in medicine needs medical vocabulary that is rare in general corpora; a literature student needs literary vocabulary similarly under-represented. Frequency lists describe the average — but no learner is average.
Second, frequency lists do not teach collocations, register, or use. Knowing the word make means little until you know make a decision, make sense, make do, and the dozens of other expressions in which make is the central verb. This deeper word knowledge develops through encountering words in context, not through frequency-list study.
For these reasons, most vocabulary research recommends frequency lists as a beginner's tool and an intermediate's supplement, but not as the primary method beyond B1.
Frequently Asked Questions
What are the most common English words?
The 10 most common English words in modern corpora are: the, be, to, of, and, a, in, that, have, I. These ten function words alone account for roughly 25% of everyday text. The exact ranking varies slightly between corpora but the top 10 are remarkably stable across studies.
How many of the most common words do I need to know?
The first 1,000 word families cover roughly 80% of speech and 75% of written English. The first 2,000 cover about 87% of speech. The first 5,000 reach about 95% of speech and 93% of written text, close to the 95% threshold required for comfortable reading. Beyond 9,000, coverage gains become marginal for general text.
What is the difference between a word and a word family?
A word family is a base word together with its regularly inflected and derived forms. Run, runs, ran, running, runner, runners count as one word family. Frequency lists typically count word families because this matches what a learner actually needs to know — once you know run, you do not need to learn the inflections separately.
Should I memorise frequency lists?
For the top 1,000–2,000 words, yes — the coverage payoff is enormous and there is no faster way to reach the level at which reading and listening become useful for further learning. Beyond 2,000, the rate of return drops sharply, and contextual learning becomes more efficient than rote memorisation.
Are the most common words the same in British and American English?
Yes — the top several thousand words are nearly identical between British and American English. Differences exist in specific items (petrol/gas, lift/elevator, biscuit/cookie) but they are small relative to the shared core. A learner using either corpus is well-served for both varieties.
How does this compare to other languages?
The Zipfian frequency distribution is universal across natural languages — a small handful of words always cover a disproportionate share of all text. The specific coverage thresholds (80% from top 1,000, 95% from top 5,000) are also broadly similar across major languages.
Related Reading
- How to Improve Your Vocabulary — methods backed by research
- CEFR Vocabulary Levels — what word counts correspond to A1 through C2
- Academic Word List — the 570 word families covering academic register
- Average Vocabulary Size — adult benchmarks by age and education
- Native Speaker Vocabulary — how the top 5,000 fits into the native lexicon