Thursday, March 29, 2007

Thinking about FamilySearch Indexing

Many genea-bloggers have discussed FamilySearch Indexing as a wonderful thing, and it is and will be. The indexing, when completed with available digital images, will be a boon to genealogy research, especially for records that are not currently online in any form - such as deeds, probate records, town records, etc.
However, I'm wondering just how much horsepower we are going to need to access the images, and how accurate the index will be. My experience today at the Family History Center is what set me wondering.

I went to the FHC in order to get copies of several probate records from Westerly, Rhode Island for Christopher Champlin (died 1732) and John Kenyon (died 1732). I had ordered the film a month ago. When I loaded the film on the microfilm viewer, it was very difficult to read many of the handwritten pages. There was a lot of bleed-through from the other side of the pages, the handwriting was cramped on some pages, and some of the pages were very dark and had blotches on them.

Frankly, I don't see how anybody could reliably index those pages (there were about 1,000 pages on this film). I figured that if I had the patience, I could probably read a typical page on this film and obtain names for an index in an hour. From my experience, many films of records from the colonial era are like this.

The second problem I see is the file size of the digital images that will be provided by FamilySearch. I don't know if they are using the method I use to capture digital images from microfilm, but I imagine that they are using something similar.

At my FHC (and many others), I can view the microfilm image on a microfilm reader/scanner, then press a button on the adjacent computer to scan it. The image shows up on the computer screen, and I can adjust brightness and contrast (I usually go to 100% on both), and save it on the computer system. Then it's on to the next image I want to capture, and I perform the same process. When I've captured all the images I want to acquire, I save all of the images to the computer hard drive in a directory. Then I insert my 512 MB flash drive and copy the images as separate files from the computer to my flash drive. When I get home, I copy the image files from my flash drive to my computer, rename them appropriately and print them out for abstraction or transcription. This sounds like a long process, but it is actually pretty easy once you find the rhythm to it.

After I had surveyed the film to find the images I wanted to capture, it took me about 45 minutes to find the images on the microfilm, capture and adjust the images, and save them. Today, I copied 24 images to my flash drive. All of that cost me $1.00 at the FHC (plus the $6.20 to rent the film).

To return to the file size problem: each image I captured today, an 8.5" x 11" TIFF, was 12 to 14 MB in file size. Perhaps they will save them as JPEG or another file format that results in smaller file sizes. Files of 12 to 14 MB will likely take some time to load on any computer, especially one with a slow modem rate. Even on a cable modem, loading may take many seconds. Along with the file size problem comes the storage space required to save all of the images. There were about 1,000 pages on the one microfilm I reviewed today - so that means 12 to 14 gigabytes for just this one microfilm. And there are 2.5 million microfilms...if they all look like this one, we're talking about 30 to 35 petabytes here.
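For anyone who wants to check the arithmetic, here is a rough sketch in Python. It is a back-of-the-envelope estimate, not anything FamilySearch has published. It assumes every one of the 2.5 million films holds about 1,000 pages at 12 to 14 MB per image, like the one film I looked at today, and the connection speeds (56 kbps dial-up, 5 Mbps cable) are my own guesses for illustration.

```python
# Back-of-the-envelope storage and load-time estimate.
# Decimal units: 1 MB = 10**6 bytes, 1 GB = 10**9, 1 PB = 10**15.
MB = 10**6
GB = 10**9
PB = 10**15


def film_bytes(pages, mb_per_image):
    """Total bytes for one microfilm, given page count and image size in MB."""
    return pages * mb_per_image * MB


def collection_petabytes(films, pages_per_film, mb_per_image):
    """Total petabytes if every film matched the one sampled (an assumption)."""
    return films * film_bytes(pages_per_film, mb_per_image) / PB


def load_seconds(mb_per_image, megabits_per_second):
    """Ideal transfer time for one image; ignores overhead and latency."""
    return mb_per_image * 8 / megabits_per_second


if __name__ == "__main__":
    for size_mb in (12, 14):
        print(f"{size_mb} MB per image:")
        print(f"  one film (1,000 pages): {film_bytes(1000, size_mb) / GB:.0f} GB")
        print(f"  2.5 million films: {collection_petabytes(2_500_000, 1000, size_mb):.0f} PB")
        # Assumed link speeds, for illustration only.
        print(f"  dial-up at 0.056 Mbps: {load_seconds(size_mb, 0.056) / 60:.0f} minutes")
        print(f"  cable at 5 Mbps: {load_seconds(size_mb, 5):.0f} seconds")
```

The real totals would be smaller if the images are saved as compressed JPEG files rather than uncompressed TIFF, and not every film will have 1,000 pages.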

Maybe I'm over-analyzing this problem, and I'm sure that FamilySearch has thought about this. I guess I'm expressing my concern so that users like you and me don't get our expectations too high too soon concerning access to these primary records from original sources. It may be that they will save these types of images for the end of the project...I wish they would do them first!


Becky said...

Randy - the issues you bring up with the FamilySearch indexing project are definitely valid. However, they not only apply to this project but to virtually any other indexing project - at least the ones I've been involved with. That's why an index should simply be a starting point and shouldn't be used like it is the data, which I'm sure that some people do. I must admit I've been guilty of that myself, but usually because the original records are no longer available and were never microfilmed.

As far as the size of the images, the ones I've worked with on the Indiana Marriages project have been quite reasonable, but I do have a DSL connection. Regarding the quality, some images were quite good but others were very poor. Part of the poorness could have to do with the quality of the originals also. In some cases it's a "best guess" for indexing. There are some checks and balances in place since 2 people index each record and a 3rd person reconciles each entry if there is a difference in indexing.

Also, I've worked with some original marriage records that were nearly impossible to decipher because of the clerk's handwriting.

If you haven't volunteered yet, give it a try. Every little bit helps. Sorry, didn't mean to write a book on the subject...

GarysTurn said...

There have been 2 projects completed within the last month or so from the FamilySearch Indexing project. One is the State of Utah Death Certificates from 1905-1954 and the other project is Nova Scotia Canada Vital Records. Both of these projects are now online and you can see how easily the indexes work and how long it takes to open or save the original image. I downloaded my great-grandfather's death certificate; it was 447k in .jpg format, and it only took a few seconds with DSL. There are links to both projects at: