Wednesday, March 2, 2011

Using the Internet Archive Website: Books - Post 1

I have struggled over the years to use the FREE Internet Archive website (http://www.archive.org/) effectively.  It seems like almost every time I use it, my Internet Explorer browser locks up.  I have no idea why!

I have also been frustrated by the search limitations on the site, and have finally figured out "my way" of doing it.  If others have better methods, I would love to hear them.

Here is the home page of the site:


I was looking for a specific book about Samuel Sewall of Massachusetts, so I put "samuel sewall" (in quotes) in the small search box at the top of the page.  After about one minute, I got this screen:



It told me that "The search engine encountered the following error:  search engine returned invalid information or was unresponsive. We are working to resolve this issue.  Thanks for your patience."

I was still impatient, so I turned to my trusty work-around.  I searched the website in Google, using the string: ["samuel sewall" site:archive.org] (without the brackets, and with no space after "site:").  Here were my results:
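For readers who would rather script this work-around than type it each time, here is a minimal sketch in Python that builds the same kind of Google query as a URL.  The function name google_site_search_url and the default site value are my own choices for illustration, not anything provided by Google or the Internet Archive, and the sketch only builds the link; it does not fetch any results.

```python
from urllib.parse import urlencode


def google_site_search_url(phrase, site="archive.org"):
    """Build a Google search URL for an exact phrase restricted to one site.

    The function name and the default site are illustrative assumptions.
    """
    # Double quotes ask for the exact phrase; the site: operator takes a
    # bare domain with no space after the colon.
    query = f'"{phrase}" site:{site}'
    # urlencode percent-encodes the quotes and colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_site_search_url("samuel sewall"))
```

Pasting the printed link into the browser's address bar should give the same kind of results page as typing the string into Google by hand.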


That's better - there are plenty of mentions of Samuel Sewall on the Archive site - and look what's number 1:  "Diary of Samuel Sewall, 1674-1729."  I clicked on it and saw:



To someone inexperienced with this site, the result looks like gibberish.  The user can scroll down and find real text, but it can be confusing to someone trying the site for the first time.  Why the gibberish?

The Internet Archive seems to always open with the full-text version produced by OCR (Optical Character Recognition).  The graphics on a scanned page often come out as unintelligible text, as shown above.  There also seem to be many misspelled words in the text, probably because the OCR process struggles with difficult-to-read fonts.

This is usually where my system locks up.  Perhaps it is because it takes a long time (sometimes a minute or more) to load this OCRed text page.  If I click on something before the page tells me that it's "Done," my system locks up.  I thought this problem would go away once I bought a new computer with more RAM, but it hasn't gone away so far.

There is no effective search box for the site (that I can see) on this screen.  My work-around here is to go to the Edit menu on my browser, and click on "Find on this Page" and enter the text I'm searching for in the search box (see screen above).  The screen instantly goes to the first use of the search text:


If I want the next instance of the requested text, I can click on the "Next" button and the next instance will be highlighted. 
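
The same "find, then find next" idea can be tried on a saved copy of the OCRed text.  This is only a rough sketch: the find_all helper and the sample sentence are made up for illustration, and it matches the phrase exactly (ignoring capitalization), so it will miss any spot where the OCR misspelled the name.

```python
import re


def find_all(text, phrase):
    """Return the start offset of every case-insensitive match of phrase.

    A hypothetical helper, similar in spirit to the browser's Find/Next.
    """
    # re.escape treats the phrase as literal text, not as a pattern.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]


# Made-up sample standing in for a page of OCRed text.
sample = "Diary of Samuel Sewall. SAMUEL SEWALL was a judge. Sewall kept a diary."
hits = find_all(sample, "samuel sewall")
print(hits)
```

Each number in the list is a position in the text, so stepping through the list one item at a time plays the role of the "Next" button.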

The Internet Archive has a wealth of resources available.  There are other ways to read these online books, and we'll look at some of them in a future post.

3 comments:

Lisa Wallen Logsdon said...

I get the blank screen using Chrome on that site also. I tend to access the way you did only by searching Google Books. Once I get the results, I click on the red bar to the left and choose "read online". Much better! As I remember it is the OCR hangup and computer freeze exactly like you are getting that made me quit Explorer. I don't have that problem with Chrome.

Daniel Dillman said...

Randy, Randy, Randy. How is it that an otherwise intelligent, technically savvy guy like yourself is still using Internet Explorer? Please, my good man, migrate to another browser, ANY browser (except AOL's abomination) would be a better choice! The Tech Support Guy in me wails in anguish anytime it sees someone using IE...

Howard Swain said...

I start at this URL:
http://www.archive.org/advancedsearch.php

I then entered diary in the title field and samuel sewall in the creator field. Hit Search (extreme lower right)

I clicked on the first link:
http://www.archive.org/details/diarysamuelsewa00sewagoog

In the box at the left, I clicked on Read Online and got a neat display where I could flip through the images of the actual pages (not OCRed).

(I'm using IE8)