Food for thought
Since microfilm was invented, scholars have dreamed of accessing the entire world of print on demand. Although microfilm itself did not quite live up to that expectation, online material and interlibrary loan have made resources much more readily available.
Technology is now up to the task. Google has been working on its online book collection for 10 years, but the project is mired in legal issues. Harvard University’s Digital Public Library of America (DPLA) is trying to work through the same challenges.
As Nicholas Carr describes in The Library of Utopia,
“…the major problem with constructing a universal library nowadays has little to do with technology. It’s the thorny tangle of legal, commercial, and political issues that surrounds the publishing business. Internet or not, the world may still not be ready for the library of utopia.”
Who knows where this will all lead? Can we figure out a system that supports creators, producers and readers alike? The extensive article linked above discusses the issues involved at length.
To be honest, I did not read it in its entirety. For now, I have to continue to put books in the hands of readers and take advantage of online resources as they exist now. But it is interesting to ponder the challenges of change.
Maybe one day I will bore my grandchildren with a lecture on the controversies that once surrounded resources that they take for granted.
Or will I be commiserating with them about the eventual challenges of finding a book to take on a long camping trip where there is no electricity to charge their reader?
Have you seen Google’s Ngram Viewer? It is an amazing application that graphs how often any number of terms occur in a “corpus of books” over selected years. One such corpus is described here:
The “Google Million”. All are in English with dates ranging from 1500 to 2008. No more than about 6000 books were chosen from any one year, which means that all of the scanned books from early years are present, and books from later years are randomly sampled. The random samplings reflect the subject distributions for the year (so there are more computer books in 2000 than 1980). Books with low OCR quality were removed, and serials were removed. ~ About Google Books
It is important to remember when interpreting an Ngram chart that the data is derived only from the number of times a word is used in the indexed books, with no indication of its context (unless your search is well refined). It is also significant that other cultural phenomena are not factored in. The relatively recent explosion of web content, which is increasingly replacing some types of print publishing, would make any broad conclusions erroneous. It might be interesting to see web content factored in as well. However, that would give no truer a result, since early cultural discussions of this kind would have taken place verbally and in un-indexed letters.
Google has ‘normalized’ by the number of books published each year, but as with any statistical analysis, questions must be carefully formulated with regard to the data source(s) used, and conclusions drawn with discretion.
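To see why normalizing matters, here is a minimal Python sketch. It assumes that normalizing simply means dividing a word's count for a year by a total for that same year. The function name, the word and all of the numbers are made up for illustration. They are not Google's data, and this is not necessarily how Google does the calculation.

```python
def relative_frequency(word_counts, yearly_totals):
    """Divide each year's count for a word by that year's total.

    Both arguments are hypothetical dictionaries keyed by year.
    Years with a zero total are skipped to avoid dividing by zero.
    """
    return {
        year: word_counts.get(year, 0) / total
        for year, total in yearly_totals.items()
        if total > 0
    }


# Made-up numbers, for illustration only.
yearly_totals = {1900: 1_000_000, 2000: 10_000_000}  # all words printed that year
tiny_counts = {1900: 50, 2000: 400}                  # times "tiny" appeared

freq = relative_frequency(tiny_counts, yearly_totals)
for year in sorted(freq):
    # Show the share as occurrences per million words.
    print(year, round(freq[year] * 1_000_000, 2), "per million")
```

In this invented example the raw count rises from 50 to 400, yet the word's share falls from 50 per million to 40 per million, because so much more was printed in the later year. A chart of raw counts and a chart of shares would tell opposite stories.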
That said, the viewer can be a lot of fun. It can be useful, for example, in determining the usage of particular words over time. Consider this comparison between the occurrences of the word diminutive (which I love) and tiny. (I guess I’m a little out of date.)
The table below the graph links to Google Books searches for the books included in the data set.
For success in using the tool, you will want to read the viewer’s About page. There you will also see a link to Culturomics by researchers at Harvard University’s Cultural Observatory, which is a must if you want to use the Ngrams for scholarly research.
Whether or not you have yet seen and used the Ngram Viewer, I recommend setting aside 15 minutes to watch the TED Talk called “What We Learned From 5 Million Books”. (Embedded below.) Erez Lieberman Aiden and Jean-Baptiste Michel entertainingly discuss some of the many insightful and curious applications of this amazing tool. They explain how it came about and what some of the implications of its data are. There are some interesting examples and good discussion here.