Laura Miller at Salon has written a fascinating and troubling article on Google Books. Miller interviews UC Berkeley professor Geoffrey Nunberg, who wrote earlier about finding "endemic" errors in Google Books in an article for the Chronicle of Higher Education. Miller describes a few of the errors that Nunberg found:
A search for books published before 1950 and containing the word "Internet" turned up the unlikely bounty of 527 results.
Other errors include misattributed authors -- Sigmund Freud is listed as a co-author of a book on the Mosaic Web browser and Henry James is credited with writing "Madame Bovary." Even more puzzling are the many subject misclassifications: an edition of "Moby Dick" categorized under "Computers," and "Jane Eyre" as "Antiques and Collectibles" ("Madame Bovary" got that label, too).
Metadata is the crucial information about a book that is included in a bibliographic record (title, author, publication date, etc.). We librarians know that errors in bibliographic information in any database can keep users from ever learning that a document or item is available. For example, if an online catalog record misspells a word in a book's title, a user who searches with the correct spelling may not retrieve that record. Google Books is often hailed by academics as the best thing since sliced bread, but librarians know that if the metadata is faulty, retrieving information from a database can be a frustrating endeavor. It's great to see this discussion about Google Books happening, and I hope that Google continues to work to improve its metadata. (For a detailed response to Nunberg from Google's Jon Orwant describing how Google is trying to address these issues, see Nunberg's original blog post on the metadata problem.)
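To make the misspelled-title problem concrete, here is a minimal Python sketch. It does not show how Google Books or any real library catalog works. The records, the deliberate misspelling "Madame Bovery," and the function names are all hypothetical, invented for illustration. It shows an exact title search missing a record whose title is misspelled, while a more forgiving similarity-based search still finds it.

```python
from difflib import get_close_matches

# Hypothetical catalog records; the second title is misspelled on purpose.
CATALOG = [
    {"title": "Jane Eyre", "author": "Charlotte Brontë"},
    {"title": "Madame Bovery", "author": "Gustave Flaubert"},
]


def exact_title_search(catalog, query):
    """Return records whose title equals the query, ignoring case."""
    wanted = query.strip().lower()
    return [record for record in catalog if record["title"].lower() == wanted]


def fuzzy_title_search(catalog, query, cutoff=0.8):
    """Return records whose title is similar to the query.

    Similarity is difflib's ratio (0.0 to 1.0); titles scoring below
    the cutoff are ignored.
    """
    by_title = {record["title"].lower(): record for record in catalog}
    close = get_close_matches(
        query.strip().lower(), list(by_title), n=3, cutoff=cutoff
    )
    return [by_title[title] for title in close]


if __name__ == "__main__":
    # The correctly spelled query misses the misspelled record entirely.
    print(exact_title_search(CATALOG, "Madame Bovary"))
    # A similarity-based search still surfaces it.
    print(fuzzy_title_search(CATALOG, "Madame Bovary"))
```

With the correct spelling, the exact search returns nothing, so the user never learns the book is in the catalog. Fuzzy matching can paper over a one-letter typo like this one, but it cannot rescue a record with the wrong author or subject heading, which is why clean metadata matters in the first place.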