
Cultural Study Takes a Grand Leap with Google Books Ngram Viewer

Have you seen Google’s Ngram Viewer? It is an amazing application that graphs how often any number of terms occur in a “corpus of books” over selected years. One of the available corpora is described here:

The “Google Million”. All are in English with dates ranging from 1500 to 2008. No more than about 6000 books were chosen from any one year, which means that all of the scanned books from early years are present, and books from later years are randomly sampled. The random samplings reflect the subject distributions for the year (so there are more computer books in 2000 than 1980). Books with low OCR quality were removed, and serials were removed. ~ About Google Books

It is important to remember, when interpreting an Ngram chart, that the data is derived only from the number of times a word is used in the indexed books, with no indication of its context (unless your search is well refined). It is also significant that other cultural phenomena are not factored in. Increasingly, the relatively recent explosion of web content, which is replacing some types of print publishing, would make any broad conclusions erroneous. It might be interesting to see web content factored in as well. However, that would give no truer a result, since early cultural discussions of this kind would have been verbal or in un-indexed letters.

Google has ‘normalized’ the results by the number of books published each year, but as with any statistical analysis, questions must be carefully formulated with regard to the data source(s) used, and conclusions drawn with discretion.
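To see why normalizing matters, here is a minimal Python sketch of the idea. It is not Google's actual code: the function name `normalize_counts` and all of the figures are hypothetical, invented purely for illustration, and the exact unit Google uses for the yearly total is an assumption here. The sketch simply divides each year's raw count by that year's total.

```python
def normalize_counts(raw_counts, yearly_totals):
    """Return each year's raw count divided by that year's total."""
    frequencies = {}
    for year, count in raw_counts.items():
        total = yearly_totals.get(year, 0)
        # Skip years with no recorded total to avoid dividing by zero.
        if total > 0:
            frequencies[year] = count / total
    return frequencies


# Hypothetical figures, invented for illustration only.
raw_counts = {1900: 50, 2000: 200}           # times a word appeared
yearly_totals = {1900: 1_000, 2000: 10_000}  # size of each year's sample

frequencies = normalize_counts(raw_counts, yearly_totals)
for year in sorted(frequencies):
    print(year, frequencies[year])
```

With these made-up numbers, the raw count is four times higher in 2000 than in 1900, yet the normalized value is lower, because the 2000 sample is ten times larger. That is the kind of distortion normalizing is meant to remove.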

That said, the viewer can be a lot of fun. It can be useful, for example, in determining the usage of particular words over time. Consider this comparison between the occurrences of the word diminutive (which I love) and tiny. (I guess I’m a little out of date.)

The links in the table below the graph will take you to a Google Books search for the books included in the data set.

To use the tool successfully, you will want to read the viewer’s About page. There you will also see a link to Culturomics by researchers at Harvard University’s Cultural Observatory, which is a must if you want to use the Ngrams for scholarly research.

Whether or not you have yet seen and used the Ngram Viewer, I recommend setting aside 15 minutes to watch the TED Talk called “What We Learned From 5 Million Books”. (Embedded below.) Erez Lieberman Aiden and Jean-Baptiste Michel entertainingly discuss some of the many insightful and curious applications of this amazing tool. They explain how it came about and what some of the implications of its data are, with interesting examples and good discussion along the way.
