Google Ngrams:  What Words are Most Often Found in Books?

- By Michael Stillman

Ngrams graph popularity of terms "flapper" and "hippie."

Leave it to the folks at Google to come up with another amazing new tool for us to use. I'm not yet sure of the practical uses for it, but it is something that will fascinate lovers of books and history for hours. It's called "Ngrams," and its existence relies upon the massive book-scanning project on which Google embarked in 2004.

 

At this point, Google Books' database contains scanned copies of some 15 million books. From these, Google has selected 5.2 million books, containing 500 billion words, for its Ngram word search. Ngrams does not simply match words, however. It determines how many books employ those words and, rather than just providing a total, places the matches on a timeline. That way you can see how frequently a word has been used at various times, and track the rise or obsolescence of words and phrases by how often they appear in books.
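To make the idea of a chronological count concrete, here is a minimal sketch in Python. It is not Google's method or data: the tiny corpus is invented for illustration, and the function name `share_of_books_by_year` is hypothetical. It simply reports, for each year, the fraction of that year's books that contain a given word.

```python
from collections import defaultdict

# Invented toy corpus of (publication year, text). Not real Google Books data.
TOY_BOOKS = [
    (1925, "the flapper danced all night"),
    (1925, "a quiet year on the farm"),
    (1968, "the hippie movement reached the coast"),
    (1968, "a flapper dress in a hippie shop"),
]


def share_of_books_by_year(books, term):
    """Return {year: fraction of that year's books containing `term`}."""
    totals = defaultdict(int)  # books published in each year
    hits = defaultdict(int)    # books in each year that contain the term
    for year, text in books:
        totals[year] += 1
        if term.lower() in text.lower().split():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


flapper = share_of_books_by_year(TOY_BOOKS, "flapper")
hippie = share_of_books_by_year(TOY_BOOKS, "hippie")
```

Dividing by the number of books in each year matters because far more books survive from recent decades than from early ones; a raw count would make almost every word appear to grow in popularity.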

 

You can plot these graphs for single words or for phrases of up to five words. You can plot just one word or phrase, or put several on the same graph for comparison. For example, the graph on this page compares the popularity of the words "flapper" and "hippie." "Flapper" reaches its peak in popularity in the 1920s, then tumbles, along with everything else, in the years of the Great Depression. Its use has been fairly constant over the past 70 years.
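The name "Ngrams" refers to runs of n consecutive words. As a rough illustration of what a one-word to five-word phrase looks like in practice, the sketch below splits a sentence into every such run. The helper names `ngrams` and `all_phrases` are made up for this example, and the simple whitespace split is an assumption; Google's own tokenizing rules are not described in this article.

```python
MAX_WORDS = 5  # the article says phrases may be up to five words long


def ngrams(text, n):
    """Return every run of `n` consecutive words in `text`, in order."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def all_phrases(text, max_words=MAX_WORDS):
    """Collect the 1-word through `max_words`-word phrases found in `text`."""
    return {n: ngrams(text, n) for n in range(1, max_words + 1)}


phrases = all_phrases("the Sandwich Islands were renamed")
# phrases[2] holds the two-word runs, including "Sandwich Islands".
```

A text shorter than the requested phrase length simply yields an empty list, so no special handling is needed for short titles or captions.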

 

"Hippie," on the other hand, is a nonexistent word until the early 1960s. By the middle of that decade, it becomes more common in books than "flapper," a position it never relinquishes. It peaks in use around 1970, before settling down to regular, but less frequent usage.

 

Then there are the name changes. Compare "Hawaii" with the "Sandwich Islands." In the early days, the British gave the island chain the name "Sandwich Islands" in honor of the Earl of Peanut Butter and Jelly. That name starts showing up in the late 18th century, but "Hawaii" does not appear in the graph until the 1820s. It then slowly closes the gap, finally surpassing "Sandwich Islands" around 1890. Since then, the Sandwiches have slowly disappeared, while "Hawaii" has become overwhelmingly more common.

 

This applies only to books in English, which is perhaps unsurprising, since other countries may not have recognized the claims of the Earl's homeland. Books in French and German (you can search these and several other languages separately) never showed much fondness for the "Sandwich" name, with "Hawaii" becoming more common as early as the 1820s.

 

Another such example is Turkey's largest city, the former Ottoman capital. For centuries it was known as "Constantinople." In 1930, the Turks officially changed its name to "Istanbul." There is no "Istanbul" in the books prior to this date, but within a couple of years, it surpasses the centuries-old "Constantinople."

 

Sometimes names get reused. "Engelbert Humperdinck" first appears near the turn of the 20th century as the music of the German composer gained popularity. His name peaked in the 1920s and then began to decline. However, in the 1960s, the name starts bouncing back up after the English crooner adopted the old composer's funny-sounding name as his own.

 

A similar pattern can be found for "Benjamin Harrison." Harrison was the name of a signer of the Declaration of Independence, while his great-grandson of the same name became President in 1889. Harrison has two peaks on the graph, a century apart.

The rise and fall of Trujillo City.

Some names are sufficiently unique not to suffer reappearance issues. "Ku Klux Klan" first appears in the 1860s, and has since been in regular use. "Elvis Presley" does not appear until the 1950s, but it has been up, up and away ever since. Occasionally, a word may have a temporary vogue. It appears, then disappears. In 1936, a few years after Istanbul got its new name, Dominican Republic dictator Rafael Trujillo renamed a city of his own. He modestly changed the name of the Dominican capital, Santo Domingo, to "Trujillo City." After Trujillo was assassinated in 1961, his countrymen quickly restored the capital's historic name. The graph on this page shows the rise and fall of "Trujillo City," at least as it appeared in books. It mirrors the career of Trujillo himself. It should be noted that even in its heyday, "Trujillo City" was never as common in books as "Santo Domingo."

 

Then there is the case of old typography. Back in the day, the letter "s" was commonly printed as the "long s" (ſ), which looks much like an "f." Since Google Books is based on scans, many of those old "s's" show up as "f's." So, the word "stuff" frequently appears as "ftuff." Ngrams shows regular appearances of "ftuff" from the early days to a peak in the late 1700s. Then it drops precipitously, virtually disappearing by 1820.
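For readers curious how a misread like "ftuff" might be traced back to the intended word, here is a small Python sketch. It is only an illustration under stated assumptions, not anything Google is known to do: the word list `VOCAB` is invented, and the helpers `long_s_candidates` and `repair` are hypothetical. The idea is to try reading each "f" as an "s" and keep only the spellings found in a list of known words.

```python
def long_s_candidates(token):
    """Yield spellings of `token` with at least one 'f' re-read as 's'."""
    positions = [i for i, ch in enumerate(token) if ch == "f"]
    # Each bit of `mask` decides whether one particular 'f' becomes 's'.
    for mask in range(1, 2 ** len(positions)):
        chars = list(token)
        for bit, pos in enumerate(positions):
            if mask & (1 << bit):
                chars[pos] = "s"
        yield "".join(chars)


def repair(token, vocabulary):
    """Return [token] if it is a known word, else the known candidates
    in sorted order, else [token] unchanged when nothing matches."""
    if token in vocabulary:
        return [token]
    found = sorted({c for c in long_s_candidates(token) if c in vocabulary})
    return found or [token]


VOCAB = {"stuff", "said", "first"}  # hypothetical word list for the demo

fixed = repair("ftuff", VOCAB)
```

Because the number of candidates doubles with every "f" in the word, this brute-force approach only suits individual words, and a real word list would sometimes return more than one plausible reading.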

 

We should note both the similarity and the difference between Ngrams and the "Get Keywords" feature available to subscribers of this site's AE Bibliographic Database. Rather than looking at all records for the frequency of appearance of selected keywords, "Get Keywords" allows you to select certain bibliographic records to determine which words appear most frequently within them. Its specific purpose is to show collectors what related words are logical to search for online when looking for material related to their collection, and to inform booksellers as to what keywords should appear in their online listings for a particular title. We are not sure what specific purpose Ngrams can serve, and it was not designed with a particular one in mind. Nevertheless, we expect specific practical uses will evolve, and until then, Ngrams is both informative and fun to use. Try it out!

 

Google's Ngrams can be found at the following link: http://ngrams.googlelabs.com.