The British National Corpus, developed by researchers at the ESRC CASS centre, contributed to the Oxford Advanced Learner’s Dictionary, which has sold 35 million copies worldwide, and to language tests taken by more than 600,000 students annually.


  • The British National Corpus (BNC), developed by researchers at the ESRC Centre for Corpus Approaches to Social Sciences in collaboration with industrial and academic partners, has been used to create the Oxford Advanced Learner's Dictionary, which has sold over 35 million copies worldwide.
  • The BNC was used by Cambridge University Press to update its English language learning materials. These prepare students for exams that attract up to 3.5 million entries annually.
  • The research team collaborated with Trinity College London to create more effective English language tests, which are taken by over 600,000 students in more than 60 countries each year.
  • Other work with Trinity College London contributed directly to key tests used by the government, such as the Home Office Secure Language Test for immigrants.
  • A semantic coding tool developed to detect attitudes in texts has been used by (among others) the Home Office to study the language of violent extremists, and by Canadian police to detect adults involved in online grooming of young people.

"We're using this [the British National Corpus 2014 project] to gain evidence-based insights into how spoken English is really used in naturalistic situations, allowing us to ensure language courses provide authentic spoken English to learners." (Ben Knight, Director of Language Research and Consultancy, Cambridge University Press)

About the research

English language teaching contributes over £2.5 billion to the UK economy annually, according to estimates by the Department for Business, Innovation and Skills. Since 1970, researchers at the ESRC Centre for Corpus Approaches to Social Sciences (CASS) have pioneered major advances in this field.

CASS, based at Lancaster University, provides insights into the use and manipulation of language in society across a variety of areas, and gathers large collections of everyday written or spoken language, known as corpora. The researchers have developed the methodology of corpus linguistics – the computer-assisted study of language – along with many of its widely adopted techniques.

ESRC funding of CASS researchers has, in particular, been important in the creation of the British National Corpus (BNC), a 100-million-word corpus of modern British English, and its successor, the up-to-date BNC 2014. It is shared with educational and commercial users, along with software developed through CASS research. This software automatically labels each word with its part of speech, giving users grammatical information when studying linguistic data.