All Books and All Knowledge: Coming Soon to Your Computer
- By Michael Stillman
A Google search for "books about" Gerrit Smith reveals three books (next to books image) where his name appears.
It had to happen. It may have suddenly dawned on you one day, like a light bulb above a cartoon character's head, when you were searching the web. Someday, all knowledge, all information, every book will be available to you right there in your home via the internet. That day may yet be off on the horizon, but it now seems to be just a matter of time.
This may be a slight exaggeration, and we are certainly not there yet, but for those whose specialty is books and the knowledge they possess, an announcement last month took us one giant step closer to that day.
From my perspective, it looks like there are two 800-pound gorillas battling to determine who will provide (and control?) the world's knowledge. Those combatants go by the names of Microsoft and Google. Score one for Google. If there is one entity standing in the way of Microsoft controlling the world (again a slight exaggeration), it is Google. Today, Google has thrown open the door to the vast world of information locked behind the covers of books, and none of us, you, me, libraries, educational institutions, will ever be the same. However, as we will see, this flood of information may be too great for even the likes of Google and Microsoft to control. It's time to take a look inside.
Google threw open the door with its announcement of the expansion of "Google Print." Google Print is the process by which Google is making the vast amount of knowledge contained within books available to the public. "Google's mission," they state, "is to organize the world's information and make it universally accessible and useful." You don't get much more ambitious than that. Here is what they're doing specifically with regard to books. They are creating digital copies of the full text of vast collections of books and making them available to you online. If they could, they would probably offer every book in the world to you free of charge. However, copyright laws prevent that, so we need to break their offerings down between those which are copyrighted and those which are not (primarily old material whose copyrights have expired).
A targeted Google search finds an OCLC (worldcatlibraries.com) listing.
For those of us deeply involved with old books, the expired copyright material is the most interesting. This material can be freely copied and made available in full to anyone. That is just what Google, with the assistance of some prestigious libraries, is doing. The old books are currently coming from the university libraries of Oxford, Stanford, Harvard and Michigan, and the New York Public Library. Each has vast collections, parts of which are being made available for scanning. Google is underwriting much of the cost of this process. It will then make these listings available through its search engine. In fact, some already are and the number will be regularly increasing, though that number is as yet very small. A Google search for an old book may now not only find information about that book, or copies for sale, but the entire text of that book available to be read online. For more common books, where reprints are available for sale or copies may be found in numerous libraries, this is still a great convenience. For obscure books, perhaps only held at a handful of major libraries, it means these texts, previously only available to a few privileged scholars in the right location, will now be available to everyone.
For Google, this is an interesting expansion. Previously, they were strictly in the business of helping you find information. As long as someone else posted the information online, they would help you find it. Now Google has moved into the business of also providing information. This is a major step for them, and a plus for us searchers, provided, of course, that it is all incremental and that they don't turn around and limit the amount of material posted by others.
Just how many books will be scanned and digitized at this point has not been stated. According to Harvard University's website, they will initially make 40,000 out of 15 million volumes available to Google. The University of Michigan does not appear to have put such limitations on the project, and Google is already busy scanning their books. However, scanning is apparently a hand process, so, at least using their current technology, we will not see millions of books posted online for a while yet. It is relatively slow, but it is a start.
The second group consists of books still under copyright. Google cannot post them in full, as the literary rights are still privately held. This should ultimately keep some material from ever becoming freely available online. However, Google is making some of these books at least partially available. For those books where the publisher offers permission, Google will search their text for matches to your search terms. However, Google will not make the full text available. It will make the page on which your search terms appear, and two pages forward and two pages back, available. If you want more, you will have to buy the book. Google will offer ads to booksellers on these pages, and will split the revenue they earn when someone clicks on one of these ads with the publishers.
This concept is not exactly new. In fact, it looks remarkably like Amazon's "Search Inside the Book." Amazon offers this word search, displaying the page on which the term appears plus two pages forward and two pages back, on its site and through its A9 search engine. They were first, but as Microsoft has proven, being first doesn't always matter.
To see the earliest stages of what Google is doing, go to their website at www.google.com. Type in a subject of interest preceded by "books about" (without the quotation marks). At the top of the page, you will get up to three results from more recent (copyrighted) books. For example, I typed in "books about gerrit smith" and received three matches for books which mentioned this obscure abolitionist who knew Frederick Douglass and John Brown. When you click these links, it takes you directly to the page on which Smith's name appears, and lets you see the two preceding and succeeding pages. Since these are copyrighted works, that's all you get to see, not the whole text.
While the addition of full texts to its searchable database has gathered the most attention, there is another book-related project going on at Google. They have been adding OCLC listings to the material they search as well. OCLC (Online Computer Library Center) is a cooperative database of card catalogue listings from thousands of libraries. It's a way to find out in which libraries a book you seek is located.
This is also a work in progress, but some books are now showing up. However, finding these listings points to a weakness in the process. Google searches frequently return thousands if not millions of matches. Unless a match shows up in the first ten or twenty, it is unlikely to be seen. OCLC matches don't seem to show up very high. With a few exceptions, I found I had to search specifically for them. OCLC matches show up with the URL (web address) www.worldcatlibraries.com, so if you type "worldcatlibraries" into your search, along with the book or author, you have a better chance of finding the listing. To see an example of what you can find, go to Google and search the following: "innocents abroad worldcatlibraries" (without the quotation marks). This will provide a link to the OCLC site, and if you click it, it will take you to a form where you can enter your zip code to find the nearest library with a copy of Mark Twain's "Innocents Abroad." In my case, it located three libraries, and told me how far away they are (3, 5, and 30 miles).
At the beginning we mentioned that the tide of information may be too great for the large companies to control. Google is clearly going to be a major force in this latest information revolution, and I cannot help but believe Microsoft will find a way to join in, but there are others. Project Gutenberg has been in the process of posting complete books on the internet for quite a while now. If you have never checked them out, here's the link: http://promo.net/pg/. It went virtually unnoticed next to Google's announcement, but the Internet Archive also recently announced a digitization program. The Internet Archive is a nonprofit organization that provides, among other items, the "Wayback Machine," which allows you to look at websites the way they were at various times in the past. If you think that correcting egregious errors on your website will once and for all do away with the embarrassment, take a look at the "Wayback Machine" for a sobering dose of reality.
The Internet Archive will be hosting texts as part of a cooperative effort with ten libraries and organizations from five countries. They state that 27,000 volumes are currently available, 50,000 more will be posted in the first quarter of 2005, and they have commitments for one million books. Among the libraries and organizations taking part are Carnegie Mellon University in the U.S., the Library of Congress American Memory Project, the Universities of Toronto, Ottawa, and McMaster in Canada, plus organizations in Egypt, India, China, and the Netherlands. As with Google and Gutenberg, the texts are free to the public.
So where is all of this leading us? The answer is "somewhere." No one really knows what changes this will make to our institutions or in our lives. Who could have imagined the impact of the internet on our lives, our schools, our businesses, just a dozen years ago? Who would have imagined the impact television would have on our reading habits half a century ago, or the impact of radio a century ago? We do not yet know what the impact will be of this latest flood of information coming from millions of books, old and new, in the years ahead. All we can say is that the effect will be substantial.
What happens to libraries as more of the information hidden deep within their vaults becomes available to the public from the comfort of their homes? What will be the effect on those who sell old books when those books can be read from your home computer screen whenever you like for free? Obviously, these institutions and businesses will need to evolve to meet the needs of a changing reality. Some will thrive, despite the seemingly negative implications for printed material. The internet itself was filled with negative implications for those who deal in books, and yet out of it emerged Amazon.com and the cooperative bookselling sites. The world is again changing, and from the realms of booksellers, collectors, libraries and other institutions, there will be winners and losers. The winners will be those who watch the changes closely and find ways to better serve the public in the emerging new world.