Tools for analysing vocabulary within texts

Various online tools can tell you a thing or two about the vocabulary used in a particular text.  To provide a concrete example of how these work, I’m using a piece from The Economist on the legacy of the Bretton Woods Agreements.

Wordle clouds seem to be everywhere these days, and perhaps it’s not surprising when you consider how neat yet simple a tool Wordle is.  Copy and paste a text and Wordle processes it, removing common function words and counting the frequency of the lexical items that remain.  It then produces a visual “cloud” of words, each sized according to its frequency within the text.  Designs can be randomised or tweaked until they meet the user’s approval, after which they can be printed to PDF, saved and linked to, or embedded in a website, like this:

[Wordle cloud: bretton woods]

As well as being very pretty, Wordle clouds have a nice application in class: students can be shown a cloud in order to predict content before reading, and if any prominent words are completely unknown, they can be checked and discussed prior to any sight of the original text.
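
As a rough illustration of the counting that sits behind a word cloud, here is a minimal Python sketch. It is not Wordle’s own code: the stop list, the function names and the linear font-size rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Illustrative stop list only (an assumption); Wordle's real list of
# common function words is not given in this post.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on",
              "is", "was", "it", "that", "for"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each remaining word occurs, ignoring case."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

def font_sizes(freqs, min_size=10, max_size=60):
    """Scale each count linearly so the most frequent word gets max_size.

    The linear rule is a hypothetical choice, not Wordle's documented one.
    """
    if not freqs:
        return {}
    top = max(freqs.values())
    return {w: min_size + (max_size - min_size) * c / top
            for w, c in freqs.items()}

sample = ("The dollar was pegged to gold, and the other currencies "
          "were pegged to the dollar.")
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
print(freqs.most_common(2))
print(sizes["dollar"], sizes["gold"])
```

In this sample, “dollar” and “pegged” each occur twice, so they would be drawn at the largest size, while function words such as “the” never reach the cloud at all.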

Wordsift is another tool that “visualises” text.  It’s perhaps less aesthetically pleasing than Wordle but has slightly more “meat” behind it.  As well as creating frequency-based representations of any given text, Wordsift will also highlight items from various word lists, including broad subject areas.

[Figure: Wordsift cloud, frequency only]

[Figure: Wordsift cloud, Social Science related terms highlighted]


The AWL Highlighter searches texts for items from Averil Coxhead’s Academic Word List and presents them in either a highlighted or a gapped format.  Nothing spectacular, and there’s plenty of healthy skepticism out there over the extent to which Coxhead’s list is helpful (or at least whether it is worthy of the attention it receives), but if students use the highlighter regularly they will see the same general academic vocabulary again and again in their reading.

[Figure: AWL Highlighter entry form]
Paste text and select how many AWL sublists you want to cover…

[Figure: the Bretton Woods text with AWL items highlighted]
…and in a few seconds you’ll get something like this. Note that the “highlighting” in bold isn’t always especially easy to see.
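
To make the highlighted and gapped formats concrete, here is a small Python sketch of the general idea. It is only an assumption about how such a tool might work, not the AWL Highlighter’s actual method: the five-word sample set, the function name and the `**` markers standing in for bold are all invented for illustration, and it matches exact word forms only, whereas a real tool would presumably also need to catch related forms.

```python
import re

# A handful of hand-picked items for illustration only (an assumption);
# the real Academic Word List is far longer and is split into sublists.
AWL_SAMPLE = {"analyse", "concept", "economy", "finance", "policy"}

def mark_awl(text, word_list=AWL_SAMPLE, gapped=False):
    """Return text with listed words either wrapped in ** or blanked out."""
    def replace(match):
        word = match.group(0)
        if word.lower() not in word_list:
            return word                      # leave other words untouched
        if gapped:
            return "_" * len(word)           # one underscore per letter
        return f"**{word}**"                 # stand-in for bold
    return re.sub(r"[A-Za-z]+", replace, text)

sentence = "Policy makers feared for the post-war economy."
highlighted = mark_awl(sentence)
gapped = mark_awl(sentence, gapped=True)
print(highlighted)
print(gapped)
```

The gapped version keeps each blank the same length as the missing word, which gives students a small clue when they try to fill the gaps from memory.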

Visually, Tom Cobb’s Compleat Lexical Tutor is a nice reminder of what the Internet used to look like when everything was shit and ugly, but the tools within are clever and simple to use (even if the interface isn’t always the smoothest experience).  You can process texts for frequency-based word lists and generate concordances for target terms, and with the Vocabulary Profiler you can access stats on any given text, including the range of vocabulary within it, what percentage of this vocabulary occurs frequently in English, and what percentage appears in Coxhead’s AWL.  EAP teachers and students need to take such stats with a pinch of salt, but the profiler does provide an interesting snapshot of the way learners are using lexis, and should yield signs of vocabulary development over time.

[Figure: Vocabulary Profiler statistics]
Text by numbers

[Figure: Vocabulary Profiler coloured text]
Lexis as colour
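
For a sense of where numbers like these come from, the following Python sketch sorts every word of a text into a band and reports each band’s share. It is a simplified guess at the principle, not the Vocabulary Profiler’s own procedure: the two miniature word lists, the band names and the function name are hypothetical.

```python
import re
from collections import Counter

# Hypothetical miniature lists for illustration; a real profiler would
# rely on full frequency lists and the complete AWL.
HIGH_FREQUENCY = {"the", "a", "and", "was", "of", "to", "in"}
ACADEMIC = {"policy", "economy"}

def profile(text):
    """Report token and type counts plus the percentage of tokens per band."""
    tokens = re.findall(r"[a-z]+", text.lower())
    bands = Counter()
    for token in tokens:
        if token in HIGH_FREQUENCY:
            bands["high_frequency"] += 1
        elif token in ACADEMIC:
            bands["academic"] += 1
        else:
            bands["off_list"] += 1
    total = len(tokens)
    percent = {band: 100 * n / total for band, n in bands.items()} if total else {}
    return {"tokens": total, "types": len(set(tokens)), "percent": percent}

result = profile("The policy was a success and the economy grew fast")
print(result)
```

Run on a learner’s writing at different points in a course, the same function would show whether the academic share is creeping upwards, which is the kind of development over time mentioned above.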


For more directly from the horse’s mouth (i.e. Cobb) you might like to read Cobb (2010) or Morris and Cobb (2004).