The Two-Phase Approach
Most vocabulary tests ask you to answer definition questions directly. This approach has a fundamental flaw: test-takers can guess, and many people overclaim, marking words as known when they have only a vague familiarity with them. Our test addresses this with a two-phase design: a recognition phase that measures overclaiming and an adaptive definition phase that estimates ability.
Overclaim Detection
The recognition phase includes a controlled proportion of fake words — terms that look and sound like real English words but have no meaning (e.g., crepulent, nargitate). If you mark these as words you know, the test records an overclaim.
Your final vocabulary estimate is adjusted based on your overclaim rate. If you marked 10% of fake words as known, the model assumes you also overclaimed on approximately 10% of real words — and reduces your estimate accordingly. This keeps scores honest even for test-takers who are inclined to be generous with themselves.
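The adjustment described above can be sketched in a few lines. The text does not give the exact formula the test uses, so this example assumes the classic correction for guessing used in yes/no vocabulary tests: a test-taker says "known" to every word they truly know and to a fraction of unknown words equal to their fake-word acceptance rate. The function name and parameters are hypothetical.

```python
def corrected_known_rate(hit_rate: float, false_alarm_rate: float) -> float:
    """Share of real words truly known, corrected for overclaiming.

    Assumed model (not confirmed by the test's documentation):
        hit_rate = known + (1 - known) * false_alarm_rate
    Solving for `known` gives the expression below.
    """
    if not 0.0 <= hit_rate <= 1.0 or not 0.0 <= false_alarm_rate <= 1.0:
        raise ValueError("rates must lie in [0, 1]")
    if false_alarm_rate >= 1.0:
        return 0.0  # every fake word was accepted, so there is no usable signal
    corrected = (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)
    return max(0.0, corrected)  # never report a negative share


# 80% of real words marked known, 10% of fake words marked known
print(round(corrected_known_rate(0.80, 0.10), 3))
```

With no overclaiming the rate is returned unchanged, and the correction grows as the fake-word acceptance rate rises.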
Bayesian Item Response Theory (IRT) & Adaptive Selection
The definition phase uses a 2-Parameter Logistic (2PL) Item Response Theory (IRT) model. Every word in the test bank is calibrated with two statistical parameters:
- Difficulty ($b_i$): The latent ability level ($\theta$) at which a test-taker has a 50% probability of answering the item correctly.
- Discrimination ($a_i$): The sensitivity of the item in distinguishing between test-takers whose abilities lie just above or below the word's difficulty level.
The probability $P_i(\theta)$ that a user with latent ability $\theta$ will answer a given word $i$ correctly is computed using the 2PL logistic formula:
$$P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$
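As a minimal sketch, the 2PL formula above translates directly into code. The function name and the example parameter values are illustrative assumptions, not values from the test bank.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability that a test-taker with ability theta answers correctly.

    a is the item's discrimination, b is its difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# When ability equals difficulty, the probability is exactly one half.
print(p_correct(theta=0.5, a=1.2, b=0.5))
# A higher ability on the same item gives a higher probability.
print(p_correct(theta=1.5, a=1.2, b=0.5))
```

A larger discrimination value makes the curve steeper around the difficulty point, so the item separates nearby ability levels more sharply.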
Instead of stepping through fixed difficulty levels, the test selects each question by maximizing Fisher Information ($I_i(\theta)$). It picks the remaining word in the bank that yields the most information at your current estimated ability level $\theta$, calculated as:
$$I_i(\theta) = a_i^2 P_i(\theta)(1 - P_i(\theta))$$
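The selection rule can be sketched as follows. The `Item` type, the function names, and the three sample words with their parameters are hypothetical and only serve to show the mechanics.

```python
import math
from typing import NamedTuple, Sequence


class Item(NamedTuple):
    word: str
    a: float  # discrimination
    b: float  # difficulty


def p_correct(theta: float, item: Item) -> float:
    """2PL probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))


def information(theta: float, item: Item) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, item)
    return item.a ** 2 * p * (1.0 - p)


def select_next(theta: float, remaining: Sequence[Item]) -> Item:
    """Return the unused item that is most informative at theta."""
    return max(remaining, key=lambda item: information(theta, item))


bank = [Item("cat", 1.0, -2.0), Item("lucid", 1.2, 0.1), Item("abstruse", 1.0, 2.5)]
print(select_next(0.0, bank).word)  # item whose difficulty is closest to theta
```

Because the information peaks where the success probability is one half, the rule favors words whose difficulty sits near the current ability estimate, weighted by how discriminating they are.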
After each answer, the system updates your latent ability estimate ($\theta$) using a **Bayesian Expected A Posteriori (EAP)** estimator, which integrates over your response history on an 81-point ability scale. The test stops as soon as the Standard Error of Measurement (SEM) reaches the threshold ($SEM \le 0.22$, a 95% interval of about $\pm0.43$ on the $\theta$ scale, which the test reports as a margin of approximately $\pm3\%$), or when the maximum of 20 questions is reached.
This keeps the assessment short, because each question is matched to your current ability estimate in real time.
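A minimal sketch of the update and stopping rule follows. Two details are assumptions, since the text does not specify them: the 81-point ability scale is taken to be an evenly spaced grid from -4 to 4, and the prior is taken to be standard normal. All names are hypothetical.

```python
import math

# Assumed grid: 81 evenly spaced ability values from -4.0 to 4.0.
GRID = [-4.0 + 0.1 * k for k in range(81)]


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_estimate(responses):
    """Return (theta_hat, sem) from a list of (a, b, correct) tuples.

    theta_hat is the posterior mean over GRID and sem is the posterior
    standard deviation.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior (assumed)
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def should_stop(sem: float, n_answered: int,
                sem_threshold: float = 0.22, max_items: int = 20) -> bool:
    """Stop when the SEM threshold is met or the question cap is reached."""
    return sem <= sem_threshold or n_answered >= max_items


theta_hat, sem = eap_estimate([(1.2, 0.0, True), (1.0, 0.8, False)])
print(round(theta_hat, 2), round(sem, 2), should_stop(sem, n_answered=2))
```

Multiplying raw likelihoods is fine for 20 items; a longer test would usually accumulate log-likelihoods instead to avoid underflow.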
Thematic Lexical Profile Breakdown
When you complete the test, your vocabulary is reported not just as a single number but also as a Lexical Profile across five core educational domains:
- Academic: Vocabulary key to scientific, analytical, and scholarly environments (SAT, GRE, and academic publishing).
- Business: Lexicon of professional relations, negotiation, economics, and corporate communication.
- Conversational: Standard daily vocabulary, idioms, and core conversational speech.
- Literary: Narrative richness, historical phrasing, descriptive aesthetics, and rare creative prose.
- Collocations: Common phrasal verbs, idiomatic pairings, and natural native-like collocations.
Your mastery in each domain is shown on an interactive **SVG Radar Chart** that plots your performance in each category against the global calibration benchmarks, highlighting your personal vocabulary strengths and weaknesses.
Calibration — Native Speaker Track
The native speaker track is calibrated against the Brysbaert & Keuleers (2016) lexical decision database, which contains word frequency and familiarity ratings for over 60,000 English words collected from more than 220,000 participants. Word difficulty levels in our test are assigned based on frequency rank and familiarity ratings from this corpus.
Vocabulary estimates are expressed in word families, the standard unit in vocabulary research. A word family includes a base word together with its inflections and common derivations (e.g., run / runs / ran / running / runner = one family).
CEFR Alignment — English Learner Track
The English learner track is aligned with the Common European Framework of Reference for Languages (CEFR). Word difficulty levels correspond to CEFR band vocabulary lists:
- Level 1 (A1) — ~500 most frequent word families
- Level 2 (A2) — 500–1,500 word families
- Level 3 (B1) — 1,500–3,000 word families
- Level 4 (B2) — 3,000–5,500 word families
- Level 5 (C1) — 5,500–8,500 word families
- Level 6 (C2) — 8,500–12,000 word families
CEFR placement is determined by the latent ability parameter ($\theta$) estimated during the test, representing the level at which your performance stabilizes under the 2PL IRT model. A user whose estimated ability $\theta$ corresponds to Level 4 difficulty (B2) will receive a B2 placement.
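The band table above can be expressed as a simple lookup. This is only an illustration of the word-family ranges: the test itself places you from $\theta$, and the mapping from $\theta$ to a level is not given in the text. Treating each upper bound as inclusive is an assumption, and the names are hypothetical.

```python
from bisect import bisect_left

# Upper bound of each band in word families, taken from the list above.
UPPER_BOUNDS = [500, 1_500, 3_000, 5_500, 8_500, 12_000]
BANDS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def cefr_band(word_families: int) -> str:
    """Map a word-family count to a CEFR band (upper bounds inclusive)."""
    if word_families < 0:
        raise ValueError("word_families must be non-negative")
    idx = bisect_left(UPPER_BOUNDS, word_families)
    return BANDS[min(idx, len(BANDS) - 1)]  # counts above 12,000 stay in C2


print(cefr_band(4_000))  # falls in the 3,000-5,500 range
```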
Accuracy and Limitations
The test has an estimated margin of error of ±3% under honest test conditions. Accuracy depends on:
- Honest answering: the overclaim correction assumes a consistent response pattern. Deliberate random answering will produce unreliable results.
- Bayesian measurement precision: the test stops once the Standard Error of Measurement (SEM) reaches $0.22$ or less. If the 20-question cap is reached first, the SEM may still be above that threshold, and the latent ability score ($\theta$) is correspondingly less precise.
- Word family definition: vocabulary estimates vary depending on how broadly "knowing a word" is defined. Our estimates use a conservative receptive-vocabulary definition.
For most purposes, the ±3% margin means that a score of 27,000 words should be read as "roughly 26,200 to 27,800 words" (27,000 ± 810). It is a useful estimate, not a precise inventory.
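The arithmetic behind that interval is shown below as a small sketch. It assumes the ±3% margin is applied relative to the reported score; the function name is hypothetical.

```python
def margin_interval(estimate: int, margin: float = 0.03) -> tuple[int, int]:
    """Return the (low, high) range implied by a relative margin of error."""
    delta = estimate * margin
    return round(estimate - delta), round(estimate + delta)


low, high = margin_interval(27_000)
print(low, high)
```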
Word Bank
The native speaker word bank contains words across five difficulty levels, from high-frequency everyday vocabulary through SAT and GRE-level academic words to rare literary and archaic terms. The learner word bank covers A1 through C2 CEFR vocabulary with a focus on receptive knowledge of word meaning.
Both banks are reviewed periodically to ensure difficulty calibration remains accurate. Decoy words are constructed to match the phonological and morphological patterns of real English words to maximize their plausibility.
Frequently Asked Questions
How accurate is the vocabulary test?
The test has an estimated margin of error of ±3% under honest test conditions. It uses a 2-Parameter Logistic (2PL) Item Response Theory (IRT) model with Bayesian EAP estimation and continues until the Standard Error of Measurement (SEM) falls to $\le 0.22$ or the 20-question cap is reached, so precision is tailored to each individual's ability level.
What is the Brysbaert corpus and why is it used?
The Brysbaert & Keuleers (2016) lexical decision database contains word frequency and familiarity ratings for over 60,000 English words collected from more than 220,000 participants. It is one of the largest validated references for English vocabulary size research, which is why our native speaker track is calibrated against it.
How does overclaim detection work?
The recognition phase includes decoy words — terms that look and sound like real English but have no meaning (e.g., crepulent, nargitate). If you mark these as known, the test records an overclaim.
How are CEFR levels assigned in the learner track?
CEFR placement is determined by the latent ability parameter ($\theta$) estimated during the test, representing the level at which your performance stabilizes under the 2PL IRT model. Level 1 corresponds to A1 (~500 word families), Level 2 to A2, Level 3 to B1, Level 4 to B2, Level 5 to C1, and Level 6 to C2 (~8,500–12,000 word families).