Saturday, July 1, 2006

Census Errors - some examples



Over the past four years, I've completed a one-name study of the Seaver surname in the 1850 to 1930 census records (minus 1890, of course, since that census was almost entirely destroyed). The first part of the project was easy: use the indices for the common surname variations (SEAVER, SEAVERS, SEVER, SEVERS) and collect the entries.

After entering that information into my database, I was able to identify the families with "missing census years." I then searched for those specific families using every search technique I could find in books, articles, mailing lists, and so on. I found about 50% of the missing families. Nearly all of them had enumerator or index misspellings, ranging from "understandable" to "how'd that happen?" Most were due to poor handwriting, poor indexing, entries that were too light or too dark, etc.

The most common misspellings of SEAVER were LEAVER, SCAVER, SEOVER, SEANER, SEARER, and SEAVEN, along with variations of them. Only a few index entries had the name completely wrong compared to the actual name on the image.
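Each of the misreadings listed above differs from SEAVER in exactly one letter position, which fits the handwriting explanation: one letter was misread, not the whole name. The minimal sketch below checks this and reports which letter changed. The helper name `single_substitutions` is hypothetical and is not a feature of any census index or search site.

```python
def single_substitutions(target: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (position, original letter, misread letter) for each differing position.

    Assumes both names have the same length; a name that gained or lost a
    letter is not a pure substitution, so a ValueError is raised instead.
    """
    if len(target) != len(variant):
        raise ValueError("names differ in length; not a pure substitution")
    return [
        (i, t, v)
        for i, (t, v) in enumerate(zip(target.upper(), variant.upper()))
        if t != v
    ]


# The SEAVER misreadings mentioned in the post.
misreadings = ["LEAVER", "SCAVER", "SEOVER", "SEANER", "SEARER", "SEAVEN"]

for name in misreadings:
    # Each listed misreading should show exactly one changed letter.
    print(name, single_substitutions("SEAVER", name))
```

This sketch deliberately handles only same-length substitutions. A misreading that drops or adds a letter would need an edit-distance comparison instead.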

Not all names behave like Seaver, which, as far as I could tell from my study, had an error rate of about 5% (entries misspelled or misread, relative to the total occurrences in the index). It is not a rare name, just an uncommon one, and it has many lowercase letters without ascenders or descenders, which makes it harder to read. Many names are more easily read, and some are much more difficult.

Many names are easily recognizable even with poor handwriting. The HeritageQuest Online (HQO) head-of-household index for the 1900 census has 292,140 hits for SMITH and about a 0.2% error rate; check SIMTH (80), SMTIH (313), SMTH (46), MSITH (34), SMIH (43), SMITHS (87), SMITHE (84), SMIHT (0), etc. These are probably just indexing typos, but the point is that even with a well-known, common surname, a small number of typing errors is unavoidable. An error rate of 0.2% is one of every 500 entries. Using the same process today on the 1920 census on Ancestry, I found a typo rate of about 0.3%.
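The error rate above is simple arithmetic: add up the hits for the misspelled variants and divide by the total occurrences in the index (correct spelling plus variants). The minimal sketch below uses only the 1900 HQO counts quoted above. The names `variant_hits` and `error_rate` are illustrative, and because the list ends with "etc.", the true total of misspellings is likely somewhat higher.

```python
# Hit counts from the 1900 HQO head-of-household index, as quoted above.
SMITH_HITS = 292140
variant_hits = {
    "SIMTH": 80, "SMTIH": 313, "SMTH": 46, "MSITH": 34,
    "SMIH": 43, "SMITHS": 87, "SMITHE": 84, "SMIHT": 0,
}


def error_rate(correct_hits: int, variants: dict[str, int]) -> float:
    """Misspelled hits as a fraction of all occurrences (correct + misspelled)."""
    misspelled = sum(variants.values())
    return misspelled / (correct_hits + misspelled)


rate = error_rate(SMITH_HITS, variant_hits)
print(f"{sum(variant_hits.values())} variant hits, error rate {rate:.2%}")
print(f"roughly 1 in {round(1 / rate)} entries")
```

Dividing by the SMITH hits alone instead of the total changes the result only slightly. Either way, these listed variants come out a little above 0.2%, consistent with the estimate in the text.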

As you can tell, I've been spending way too much time on this...

1 comment:

Eileen said...

This is an interesting study. In my first census experience, I was looking for my great-grandfather, Bonaventura Bianchi. I finally found the family enumerated under the surname Vantura with the given name Biancqi. I knew where they lived, so I read the images page after page until I found them. A more recent experience was finally finding my great-great-grandfather, Thomas Noble, in the 1900 census. I thought this was such a simple name that it would be hard to mess up. Wrong! Using some of the advanced search techniques you described, I found him listed as Thomas Koeble. I heartily agree with you that errors can be introduced at any point in the process and that dumping the blame on just one step is unfair. The other point in the process you left out is the searcher: as I became more experienced at searching, I got better results.