Monday, July 16, 2012

1940 U.S. Census Index Comparisons - Post 1: Methodology

Ancestry.com and FamilySearch.org are both indexing the 1940 United States Census, and both are on track to complete the indexing by September 2012.  That is a phenomenal achievement, and far sooner than most of us expected.

So the question becomes:  Which index is better? 

The only way to determine that is to test both indexes for consistency and accuracy.  I've started doing it on a small scale, and I hope that other researchers will do it also, since I'm not going to do anything close to 132 million names!  Maybe a few hundred... We can probably get a decent statistical sample by doing several thousand in total.

Here is the methodology I'm using:

1)  Select a state that is indexed on both Ancestry.com and FamilySearch.org.

2)  Select a surname to search exactly.  I recommend a surname that is not near the top of the surname frequency list.  Fortunately, my main surname lines are uncommon.

3)  Search on Ancestry.com for the surname (exact spelling) in the state of choice.  Ancestry.com lists the matches by birth year rather than first name or county or ED.

4)  Make a table with two columns, one for Ancestry and the other for FamilySearch.  List the Ancestry matches, numbered 1 to N, along with each person's birth year and birth state.  You might make the table in your word processor or in a spreadsheet program.

5)  Search on FamilySearch.org for the surname (exact spelling) in the state of choice.  FamilySearch.org seems to list the matches by county and ED rather than by birth year or first name.

6)  Start at the top of the FamilySearch match list, note the name and birth year of each person, and add a check mark or note in the FamilySearch column of the table to show that the person is also in the FamilySearch matches.  I number the matches from both sites.  In one case, #1 on Ancestry was #20 on FamilySearch because the two sites sort their match lists differently.

7)  If an entry on the FamilySearch match list is not on the Ancestry list, add that information to the table.  

8)  At this point, we have a list that shows the names, birth years, and birthplaces of persons on both lists, persons only on the Ancestry list, and persons only on the FamilySearch list.  Group those persons together as families if possible.


9)  Make a second table that includes only the families that were missed on either Ancestry or FamilySearch.  The columns might include an index number, the family names, the ED and page numbers, the Ancestry indexed name, the FamilySearch indexed name, and what you think the correct name is.

10)  For those only on the Ancestry.com list, note the Enumeration District (ED) number and page number for each person.  Use the ED and page number to find the page on FamilySearch, confirm that the page image is present, and see if you can figure out how each person might have been indexed on FamilySearch.  Do the same for those only on the FamilySearch.org list.

11)  The next step is the most difficult - trying to find out how the persons who were indexed on Ancestry but not found on FamilySearch were indexed on FamilySearch.  And vice versa - how were the persons indexed on FamilySearch indexed on Ancestry.com?

12)  Report your findings.  Report how many persons were the same in both indexes, and what differences you found for the names not on both lists.  Which index seemed more accurate, in your opinion?  Write your own blog posts or add comments to this post.  Only by making comparisons like this can we answer the question above.
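For readers who keep their tables in a spreadsheet, the matching in steps 4 through 8 and the counts in step 12 can also be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the hand method described above: the `Entry` fields, the `match_key` rule (same given name ignoring case, and same birth year), and the function names are all hypothetical. It only sorts entries into "both", "Ancestry only" and "FamilySearch only"; working out how a missed person was actually indexed (steps 10 and 11) still requires looking at the page image.

```python
from collections import Counter
from typing import NamedTuple


class Entry(NamedTuple):
    """One indexed person copied from a search results list (hypothetical fields)."""
    given_name: str
    birth_year: int
    birth_state: str


def match_key(entry: Entry) -> tuple:
    # Hypothetical matching rule: same given name (ignoring case) and same birth year.
    return (entry.given_name.strip().lower(), entry.birth_year)


def compare_indexes(ancestry: list, familysearch: list) -> dict:
    """Split entries into 'both', 'ancestry_only' and 'familysearch_only'."""
    # Count FamilySearch entries per key so duplicate names are checked off one at a time.
    unmatched = Counter(match_key(e) for e in familysearch)
    both, ancestry_only = [], []
    for entry in ancestry:
        key = match_key(entry)
        if unmatched[key] > 0:
            unmatched[key] -= 1  # check off one FamilySearch entry (step 6)
            both.append(entry)
        else:
            ancestry_only.append(entry)
    # FamilySearch entries that were never checked off are FamilySearch-only (step 7).
    familysearch_only = []
    for entry in familysearch:
        key = match_key(entry)
        if unmatched[key] > 0:
            unmatched[key] -= 1
            familysearch_only.append(entry)
    return {"both": both, "ancestry_only": ancestry_only,
            "familysearch_only": familysearch_only}


def summarize(result: dict) -> dict:
    """Counts for the step 12 report."""
    return {group: len(entries) for group, entries in result.items()}
```

Because the rule requires an exact birth-year match, a person whose age was indexed differently on the two sites will show up on both "only" lists; those are exactly the cases worth checking by hand in steps 10 and 11. The sketch assumes Python 3.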

We all know that Ancestry.com has a way to add corrections to its index.  It's not clear to me whether FamilySearch.org will have the capability to add changes to its index.

I will show the results of my first effort to do these comparisons in my next post.  

The URL for this post is:  http://www.geneamusings.com/2012/07/1940-us-census-index-comparisons-post-1.html

Copyright (c) 2012, Randall J. Seaver

6 comments:

Sonja Hunter said...

That's a great idea! I'll definitely try that with some of my names. I recently was looking for someone in Michigan on Ancestry.com and saw that the wife's first name was indexed as "Simmons." Looking at the record, it clearly said Suzanne (and I didn't spend more than a few seconds looking at it). I don't know if Ancestry.com uses a double indexing/arbitrator system like FamilySearch does. I suppose time will tell.

Annette Kapple said...

I found a couple names incorrectly indexed at ancestry.com. I have not searched much at Family Search. The one family member I looked for at Family Search was correctly indexed. There may be quite a few problems with the Ancestry index?

Scott Phillips said...

I have found the Ancestry.com index to be super so far! No complaints from me. WAY easier than using enumeration districts, for sure! I am simply thankful for all those volunteers who do the work.

GeneGinny said...

Randy--I ran a similar comparison to yours for QUIGGs in Washington State. You can see my results at http://geneginny.blogspot.com/

Wendy said...

I am a volunteer for the 1940 US Census Community Project with FamilySearch. Whenever I'm stumped, I look for that person in both Ancestry and FamilySearch to see how they might have been indexed in 1930. It's not surprising that quite often the results are different. I'm following your lead to see how my family was indexed. Here is my first one: http://jollettetc.blogspot.com/2012/07/1940-us-census-index-comparison-who.html

EddieB said...

I enjoyed the exercise. I compared the Smilie entries for California.

I found Ancestry had 19 people indexed with the last name of Smilie and Family Search had 21. One of the Ancestry entries did not meet the parameters of the search. He was a Smilie living in Ohio and born in Texas; no one in the household had any California connection.

So Ancestry really only had 18 names in California and Family Search had 21. Family Search had all of the Ancestry entries. Of the 18 they had in common, two had slight variations that Family Search had correct.

There were three names in Family Search that Ancestry missed because they were incorrectly indexed.

My comparison had:
0 errors for Family Search
6 errors for Ancestry including the search engine error for the Smilie in OH born TX