Wednesday, August 4, 2010

Working in the Archive.org Text Archives

...
When I Google the names of an ancestral couple to see if there are other researchers, or resources, available for the couple, the http://www.archive.org/ website often appears in the results.

When I clicked on the link, the text version of the document appeared. Because the web page opened at the top of the document, I had to use the Search box to find what I wanted in the text created by Optical Character Recognition (OCR). Often, my Internet browser would lock up and I would have to close down and restart Internet Explorer. This led to much frustration on my part.

I found a better way to read and search documents on the http://www.archive.org/ site.

Here is the web page after I Googled ["daniel spangler" "elizabeth king"] - the text version of the book Genealogical Records of George Small appeared:





Notice the red button in the upper left-hand corner - it says "See other formats." When I click on that button, I see:


Here I have choices to "Read Online," "PDF," "PDF (B/W)," "EPUB," "Kindle," "Daisy," "Full text," and "DjVu." The PDF files are pretty large. There is also information about the book and its digitization.

If I choose to read it online, I see:


Here are the images of the book pages. In the right-hand column, I can enter a search term. I entered "spangler" in the search field on the screen above, and saw:


There are links to each search match in the right-hand column. I clicked on one of the matches and saw:



The search term is highlighted on the page. I could visit each of the search matches or modify my search term.

Turning the page is easy - click on the page and it turns to the next one. The user can zoom in or out, choose to display one page or two, etc., using the icons in the menu bar.

The user can print individual pages using a mouse right-click and Print Picture. Or the user can save the entire book by downloading the PDF file or the Full Text file.
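Once the Full Text file is saved, it can also be searched on your own computer rather than in the browser. Here is a minimal sketch in Python, assuming the file was saved as plain text; the file name george_small_full_text.txt and the function find_matches are my own hypothetical names, not anything provided by the website. It lists every line that contains a search term, ignoring upper and lower case:

```python
from pathlib import Path


def find_matches(text, term):
    """Return (line_number, line) pairs for lines containing term, ignoring case."""
    needle = term.casefold()
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.casefold()
    ]


if __name__ == "__main__":
    # Hypothetical file name: use whatever name you gave the saved Full Text file.
    path = Path("george_small_full_text.txt")
    if path.exists():
        # OCR text can contain odd characters, so replace anything undecodable.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in find_matches(text, "spangler"):
            print(f"{number}: {line}")
    else:
        print(f"{path} not found; download the Full Text file first.")
```

Because the text comes from OCR, a name that was misread during scanning will not match, so it may be worth trying a few spelling variants of a surname.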

For some books, the text on the book pages can be highlighted with the mouse and copied and pasted into another computer file. The result is usually better than the OCR text file. However, for the George Small book above, I could not highlight, copy and paste the text.

I'm very happy that my frustration with the OCR version on http://www.archive.org/ is relieved and that I can use the http://www.archive.org/ site efficiently and effectively.

While Google Books has many online surname and locality books, it doesn't have all of them. The http://www.archive.org/ site and the BYU Family History Archive should also be consulted for published books about our ancestral families.
