Methodology

How the Test Works

A two-phase adaptive methodology designed to measure vocabulary size accurately — without letting overclaiming inflate your score.

The Two-Phase Approach

Most vocabulary tests ask you to answer definition questions directly. This approach has a fundamental flaw: test-takers can guess, and many people tend to overclaim — marking words as known when they only have a vague familiarity. Our test addresses this with a two-phase design.

i. Recognize
You see a list of words and mark the ones you know. Decoy words (plausible-sounding but non-existent terms) are mixed in, and the share of decoys you mark as known is used to adjust your final score downward.
ii. Define
For words you marked as known, you choose the correct definition from four options. Questions get harder when you answer correctly and easier when you do not, converging on your true vocabulary ceiling.

Overclaim Detection

The recognition phase includes a controlled proportion of fake words — terms that look and sound like real English words but have no meaning (e.g., crepulent, nargitate). If you mark these as words you know, the test records an overclaim.

Your final vocabulary estimate is adjusted based on your overclaim rate. If you marked 10% of fake words as known, the model assumes you also overclaimed on approximately 10% of real words — and reduces your estimate accordingly. This keeps scores honest even for test-takers who are inclined to be generous with themselves.
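As a rough illustration, the adjustment can be expressed in terms of a hit rate (real words marked as known) and a false-alarm rate (fake words marked as known). This is a minimal Python sketch under that assumption, not the test's actual scoring code; the function name `corrected_known_rate` and its `formula` parameter are hypothetical. The `"subtract"` option follows the description above literally, removing the false-alarm rate from the hit rate. The `"guessing"` option is the classic correction-for-guessing formula used with yes/no vocabulary checklists, included only for comparison.

```python
def corrected_known_rate(real_marked: int, real_shown: int,
                         fake_marked: int, fake_shown: int,
                         formula: str = "subtract") -> float:
    """Estimate the proportion of real words genuinely known.

    h = hit rate (real words marked / real words shown)
    f = false-alarm rate (fake words marked / fake words shown)
    """
    h = real_marked / real_shown
    f = fake_marked / fake_shown
    if formula == "subtract":
        # Literal reading: assume the same share of real words was overclaimed.
        adjusted = h - f
    elif formula == "guessing":
        # Correction for guessing: P(known) = (h - f) / (1 - f).
        if f >= 1.0:
            return 0.0
        adjusted = (h - f) / (1 - f)
    else:
        raise ValueError(f"unknown formula: {formula!r}")
    # Clamp to [0, 1] so heavy overclaimers never get a negative estimate.
    return min(max(adjusted, 0.0), 1.0)
```

With 30 of 40 real words and 1 of 10 fake words marked, `"subtract"` gives about 0.65 and `"guessing"` about 0.72. Multiplying the corrected proportion by the number of words it represents turns it into a vocabulary count.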

Adaptive Difficulty

The definition phase uses a simple adaptive algorithm. Each question is drawn from a difficulty band (1–5 for native speakers; 1–6 for English learners). A correct answer raises the difficulty of the next question; an incorrect answer lowers it. After a warm-up phase, the algorithm locks in on the band where you are answering correctly roughly 50–70% of the time — your vocabulary frontier.
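The up/down rule described above can be sketched as a simple staircase. In this Python sketch the names `next_band` and `run_staircase`, the starting band, the warm-up length, and the use of the mean post-warm-up band as the frontier estimate are all assumptions for illustration.

```python
def next_band(current: int, correct: bool, lowest: int = 1, highest: int = 5) -> int:
    """One staircase step: one band harder after a correct answer,
    one band easier after an incorrect one, clamped to the available bands."""
    step = 1 if correct else -1
    return min(max(current + step, lowest), highest)


def run_staircase(answers: list[bool], start: int = 3, lowest: int = 1,
                  highest: int = 5, warmup: int = 5) -> tuple[list[int], float]:
    """Replay a sequence of answers (True = correct).

    Returns the band each question was drawn from and the mean band after
    the warm-up questions, used here as a rough frontier estimate.
    """
    if not answers:
        raise ValueError("need at least one answer")
    band = start
    visited: list[int] = []
    for correct in answers:
        visited.append(band)  # band this question came from
        band = next_band(band, correct, lowest, highest)
    settled = visited[warmup:] or visited  # fall back if the run is short
    return visited, sum(settled) / len(settled)
```

For the learner track, pass `highest=6`. A plain one-up/one-down rule like this tends to settle where about half of the answers are correct. To target the upper end of the 50–70% range, a common variant (Levitt's transformed up-down method) moves to a harder band only after two consecutive correct answers, which targets roughly 71% correct.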

This means the test is efficient: it does not spend time on words far below or far above your level. Most users reach a stable estimate within 20–30 definition questions.

Calibration — Native Speaker Track

The native speaker track is calibrated against the Brysbaert et al. (2016) lexical decision database, which contains word frequency and familiarity ratings for over 60,000 English words collected from more than 220,000 participants. Word difficulty levels in our test are assigned based on frequency rank and familiarity ratings from this corpus.
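One way to turn two measures into five difficulty levels is to rank words on each, average the ranks, and cut the combined ordering into equal-sized slices. The Python sketch below shows that idea; the `WordNorms` record, the equal weighting of the two measures, and the equal-sized levels are assumptions, not a description of the test's actual calibration.

```python
from dataclasses import dataclass


@dataclass
class WordNorms:
    word: str
    freq_rank: int      # 1 = most frequent word in the corpus
    familiarity: float  # hypothetical 0..1 score, e.g. share of people who know the word


def assign_levels(norms: list[WordNorms], n_levels: int = 5) -> dict[str, int]:
    """Assign each word a difficulty level from 1 (easiest) to n_levels (hardest)."""
    n = len(norms)
    # Position when ordered from most to least frequent.
    by_freq = sorted(norms, key=lambda w: w.freq_rank)
    freq_pos = {w.word: i for i, w in enumerate(by_freq)}
    # Position when ordered from most to least familiar.
    by_fam = sorted(norms, key=lambda w: -w.familiarity)
    fam_pos = {w.word: i for i, w in enumerate(by_fam)}
    # Combined difficulty: mean of the two positions (word as a tie-breaker).
    ordered = sorted(
        norms, key=lambda w: ((freq_pos[w.word] + fam_pos[w.word]) / 2, w.word)
    )
    # Cut the combined ordering into n_levels equal-sized slices.
    return {w.word: 1 + (i * n_levels) // n for i, w in enumerate(ordered)}
```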

Vocabulary estimates are expressed in word families, the standard unit in vocabulary research. A word family includes a base word together with its inflected forms (regular and irregular) and its closely related derivations (e.g., run / runs / ran / running / runner = one family).
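To make the unit concrete, counting word families means collapsing every known form onto its family headword before counting. In this minimal Python sketch the `FAMILY_OF` lookup and the `count_families` function are hypothetical toy stand-ins for a full word-family list.

```python
# Hypothetical toy lookup from word forms to their family headword.
FAMILY_OF = {
    "run": "run", "runs": "run", "ran": "run", "running": "run", "runner": "run",
    "happy": "happy", "happier": "happy", "happiness": "happy",
}


def count_families(known_forms: list[str], family_of: dict[str, str] = FAMILY_OF) -> int:
    """Count distinct word families among known forms.

    Forms missing from the lookup are treated as single-member families.
    """
    return len({family_of.get(f.lower(), f.lower()) for f in known_forms})
```

Knowing "run", "ran", "runner" and "happiness" counts as two families, not four forms.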

Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). How many words do we know? Practical estimates of vocabulary size dependent on word definition, the degree of language input and the participant's age. Frontiers in Psychology, 7, 1116.

CEFR Alignment — English Learner Track

The English learner track is aligned with the Common European Framework of Reference for Languages (CEFR). Word difficulty levels correspond to CEFR band vocabulary lists:

  Level 1: A1
  Level 2: A2
  Level 3: B1
  Level 4: B2
  Level 5: C1
  Level 6: C2

CEFR placement is determined by the difficulty band where your performance stabilises. A user who consistently answers Level 4 words correctly but struggles with Level 5 words would receive a B2 placement.
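The placement rule can be sketched as picking the highest level the user has mastered and mapping it to its CEFR band. The six-level mapping follows the A1 to C2 range and the Level 4 = B2 example; the 60% mastery cut-off, the function name `cefr_placement`, and the "highest mastered level" rule are assumptions for illustration.

```python
from typing import Optional

# Learner-track difficulty level -> CEFR band (Level 4 = B2).
CEFR_BY_LEVEL = {1: "A1", 2: "A2", 3: "B1", 4: "B2", 5: "C1", 6: "C2"}


def cefr_placement(results: dict[int, tuple[int, int]],
                   threshold: float = 0.6) -> Optional[str]:
    """Map per-level results to a CEFR band.

    results maps level -> (correct, attempted). A level counts as mastered
    when its accuracy reaches `threshold` (a hypothetical cut-off). The
    placement is the CEFR band of the highest mastered level, or None.
    """
    mastered = [
        level
        for level, (correct, attempted) in results.items()
        if attempted > 0 and correct / attempted >= threshold
    ]
    return CEFR_BY_LEVEL[max(mastered)] if mastered else None
```

A user scoring 9/10 at Level 3, 7/10 at Level 4 and 3/10 at Level 5 would be placed at B2 under these assumptions.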

Accuracy and Limitations

The test has an estimated margin of error of ±3% under honest test conditions. Accuracy depends on:

  1. Honest answering — the overclaim correction assumes a consistent response pattern. Deliberate random answering will produce unreliable results.
  2. Sample size — the test uses a random sample from each difficulty band. Results are statistical estimates, not exact counts.
  3. Word family definition — vocabulary estimates vary depending on how broadly "knowing a word" is defined. Our estimates use a conservative receptive-vocabulary definition.
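Because each band is sampled rather than tested exhaustively, the overall figure is an extrapolation. Below is a minimal stratified-sampling sketch in Python, assuming the number of words in each band is known: each band's observed known-proportion is scaled to the band's size and summed, and a binomial standard error indicates how much the estimate could move from sampling alone. The function `estimate_vocabulary` and its input format are hypothetical.

```python
import math


def estimate_vocabulary(bands: dict[int, tuple[int, int, int]]) -> tuple[float, float]:
    """Stratified estimate of vocabulary size.

    bands maps band -> (words_in_band, sampled, known_in_sample).
    Returns (estimate, standard_error). The error uses the binomial variance
    per band and ignores the finite-population correction for simplicity.
    """
    total = 0.0
    variance = 0.0
    for size, sampled, known in bands.values():
        if sampled == 0:
            raise ValueError("every band needs at least one sampled word")
        p = known / sampled          # proportion known in this band's sample
        total += size * p            # scale up to the whole band
        variance += size ** 2 * p * (1 - p) / sampled
    return total, math.sqrt(variance)
```

For example, a 5,000-word band with 10 of 10 sampled words known plus a second 5,000-word band with 8 of 10 known gives an estimate of 9,000 with a standard error of about 632 words, which shows why small per-band samples widen the margin of error.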

For most purposes, the ±3% margin means that a score of 27,000 words should be interpreted as "somewhere between roughly 26,200 and 27,800 words" (27,000 ± 810): a useful estimate, though not a precise inventory.

Word Bank

The native speaker word bank contains words across five difficulty levels, from high-frequency everyday vocabulary through SAT- and GRE-level academic words to rare literary and archaic terms. The learner word bank covers A1 through C2 CEFR vocabulary with a focus on receptive knowledge of word meaning.

Both banks are reviewed periodically to ensure difficulty calibration remains accurate. Decoy words are constructed to match the phonological and morphological patterns of real English words to maximise their plausibility.
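As an illustration of decoy construction, the Python sketch below generates pseudowords from a character-bigram (Markov chain) model of a real word list and rejects any output that is itself in that list. This is a deliberately simple, swapped-in technique: it mimics spelling patterns only, not the phonological and morphological matching described above, and the function names, length limits, and seed are hypothetical.

```python
import random


def build_bigrams(words: list[str]) -> dict[str, list[str]]:
    """Map each character (and the start marker '^') to the characters that
    follow it in the word list; '$' marks the end of a word."""
    table: dict[str, list[str]] = {}
    for word in words:
        padded = "^" + word + "$"
        for a, b in zip(padded, padded[1:]):
            table.setdefault(a, []).append(b)
    return table


def make_decoys(words: list[str], n: int, min_len: int = 6, max_len: int = 10,
                seed: int = 0, max_tries: int = 10_000) -> list[str]:
    """Generate up to n pseudowords that follow the letter-to-letter patterns
    of `words` but do not appear in `words`."""
    rng = random.Random(seed)  # seeded for reproducible output
    table = build_bigrams(words)
    real = set(words)
    decoys: list[str] = []
    for _ in range(max_tries):
        if len(decoys) >= n:
            break
        chars: list[str] = []
        current = "^"
        while len(chars) <= max_len:
            current = rng.choice(table[current])
            if current == "$":
                break
            chars.append(current)
        candidate = "".join(chars)
        # Keep only new, non-real strings within the length limits.
        if (min_len <= len(candidate) <= max_len
                and candidate not in real and candidate not in decoys):
            decoys.append(candidate)
    return decoys
```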

Ready to find out your score?

Free · 8 minutes · No sign-up required

Take the vocabulary test →