Wednesday, April 23, 2008

Social Security Death Index data

I love to poke around in databases trying to figure things out. I was curious about how many deaths per year are recorded in the Social Security Death Index (SSDI), a database of over 80 million people.

For my study, I chose the SSDI at Rootsweb, now at http://ssdi.rootsweb.ancestry.com/. The version there was last updated on 22 February 2008 and has 81,074,156 records.

I expected that the earliest deaths in this database would begin in the early 1960s, but I was wrong. Here are the numbers of deaths recorded for selected years:

* 1950: 18,540
* 1955: 44,430
* 1960: 90,468
* 1962: 300,606
* 1965: 747,494

* 1970: 1,261,339
* 1975: 1,641,032
* 1980: 1,851,578
* 1985: 1,888,674
* 1990: 1,856,077

* 1995: 2,142,198
* 2000: 2,229,845
* 2005: 2,279,129

Here are the record counts for a few surnames:

* SEAVER: 1,255
* RICHMOND: 11,813
* CARRINGER: 223
* AUBLE: 261
* SMITH: 806,744

What's really interesting is that there are quite a few entries for people listed with a death year before 1937. For instance:

* 1930: 67
* 1920: 103
* 1910: 26
* 1900: 88

I believe that almost all of these are data entry errors. Or is there another explanation for how people who died before Social Security numbers were first issued in late 1936 could have an SSN and have their death reported to the SSA?

The earliest death date I found was 2 October 1899, for Ruth M. Riggs. She received her SS card in Illinois, and her last address was in Las Vegas, NV. Interestingly, her birth date is also listed as 2 October 1899. I'm guessing that Ruth was born in 1899 and died in Las Vegas on some unknown date, and that the death date in the database is incorrect. It might throw off a researcher looking for Ruth, eh?
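A check like the one applied to Ruth's entry could be automated. Below is a minimal Python sketch, assuming a hypothetical `Record` layout with just a name, birth date, and death date (the real SSDI fields and formats may differ); the function `suspicious_reasons` is likewise hypothetical. It flags a death year before 1937 (the cutoff used above), a death date identical to the birth date, and a death date earlier than the birth date.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layout for illustration; real SSDI fields may differ.
@dataclass
class Record:
    name: str
    birth: date
    death: date

# Social Security numbers were first issued in late 1936, so a death year
# earlier than this is probably a data entry error (the cutoff used in this post).
FIRST_PLAUSIBLE_DEATH_YEAR = 1937

def suspicious_reasons(rec: Record) -> list[str]:
    """Return the reasons this record looks wrong (an empty list if none)."""
    reasons = []
    if rec.death.year < FIRST_PLAUSIBLE_DEATH_YEAR:
        reasons.append("death year before Social Security numbers were issued")
    if rec.death == rec.birth:
        reasons.append("death date equals birth date")
    elif rec.death < rec.birth:
        reasons.append("death date precedes birth date")
    return reasons

# Ruth M. Riggs's entry as described above.
ruth = Record("Ruth M. Riggs", date(1899, 10, 2), date(1899, 10, 2))
for reason in suspicious_reasons(ruth):
    print(f"{ruth.name}: {reason}")
```

On Ruth's entry it prints two warnings: the pre-1937 death year and the matching birth and death dates. A flag is only a prompt to look more closely, not proof that the record is wrong.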

My point here is that errors occur in all databases due to data entry misteaks; the question is how many errors are made. One way to estimate the SSDI error rate is to assume that data errors are random typing mistakes and count the obvious misspellings of a common name. I checked SMITH and its obvious spelling variations:

* SMITH: 806,744
* SMIHT: 2
* SIMTH: 6
* SMTIH: 120
* SMTH: 4
* MSITH: 3

Those five misspellings add up to 135 entries, so that's only a 0.017% error rate (1 in 5,976) on a fairly easy name to double-check. That's pretty good for the name field. We have no idea how accurate the birth and death dates are, however.
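The arithmetic behind that error rate can be checked with a short Python sketch. It sums the five misspellings counted above and divides by the number of correctly spelled SMITH entries, as the post does. The helper `typo_variants` is hypothetical, written for this example: it generates every spelling one keystroke slip away (two adjacent letters swapped, or one letter dropped) and confirms that each counted variant is that kind of slip. It is not a claim about how the SSDI was actually transcribed.

```python
# Counts reported in this post (Rootsweb SSDI, updated 22 February 2008).
SMITH_COUNT = 806_744
VARIANT_COUNTS = {"SMIHT": 2, "SIMTH": 6, "SMTIH": 120, "SMTH": 4, "MSITH": 3}

def typo_variants(name: str) -> set[str]:
    """Spellings one slip away: two adjacent letters swapped, or one letter dropped."""
    swaps = {
        name[:i] + name[i + 1] + name[i] + name[i + 2:]
        for i in range(len(name) - 1)
    }
    drops = {name[:i] + name[i + 1:] for i in range(len(name))}
    # Swapping two identical letters reproduces the name itself; drop it.
    return (swaps | drops) - {name}

# Every variant counted above is a single swap or a single dropped letter.
assert set(VARIANT_COUNTS) <= typo_variants("SMITH")

errors = sum(VARIANT_COUNTS.values())
rate = errors / SMITH_COUNT  # misspellings per correctly spelled entry, as in the post
print(f"{errors} misspelled entries")
print(f"error rate: {rate:.3%}")
print(f"about 1 in {SMITH_COUNT / errors:,.0f}")
```

Running it prints 135 misspelled entries, an error rate of 0.017%, and about 1 in 5,976. Dividing by the total (806,744 + 135) instead would change the ratio only in the last digit.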

I have another reason to list these numbers here: to see whether older death records are added to the SSDI on a regular basis. Every so often, I will check the most recent SSDI to see if the numbers have changed much.
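One way to make that comparison is sketched below in Python. It stores the per-year death counts from this post as a baseline and reports the years whose counts moved in a later snapshot. The `changed_years` helper is hypothetical, and where the later counts come from (repeating the same searches, or a downloaded file) is left open.

```python
# Deaths per year in the Rootsweb SSDI as of 22 February 2008 (from this post).
BASELINE_2008 = {
    1900: 88, 1910: 26, 1920: 103, 1930: 67,
    1950: 18_540, 1955: 44_430, 1960: 90_468, 1962: 300_606,
    1965: 747_494, 1970: 1_261_339, 1975: 1_641_032, 1980: 1_851_578,
    1985: 1_888_674, 1990: 1_856_077, 1995: 2_142_198,
    2000: 2_229_845, 2005: 2_279_129,
}

def changed_years(old: dict[int, int], new: dict[int, int],
                  min_change: int = 1) -> dict[int, int]:
    """Return {year: new_count - old_count} for years that moved by at least min_change.

    A year missing from one snapshot is treated as a count of zero there.
    """
    diffs = {}
    for year in sorted(old.keys() | new.keys()):
        delta = new.get(year, 0) - old.get(year, 0)
        if abs(delta) >= min_change:
            diffs[year] = delta
    return diffs
```

Passing the counts from a later SSDI update as `new` would show, for example, whether 1930 or 1965 gained entries. Steady increases in the older years would suggest that older death records are still being added; `min_change` can filter out small fluctuations.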

Dick Eastman recently posted a helpful summary of the SSDI at http://blog.eogn.com/eastmans_online_genealogy/2008/04/using-the-socia.html. Dick also posted some information about the recent newspaper article on fraudulent use of Social Security numbers at http://blog.eogn.com/eastmans_online_genealogy/2008/04/commentary-abou.html. Read the comments on both articles too.

2 comments:

Scrogginsdata said...

I've often wondered what percentage of the entries have the first and last names transposed.

Anonymous said...

Randy,

Good article. A lot of people assume that if their ancestor is not in the SSDI, then they didn't have an SS#. But that's not true... a lot of my ancestors registered right away, around 1937, and if they died in the 30s, 40s, or 50s, they may not be in the index. They might be, but many are not. Hopefully folks can find the SS# on their death certificates and use it to request the SS-5 application.

Donna