Wednesday, May 31, 2006

1990 Census Data - Frequency of Names

Richard Eastman had an entry in his blog about the Frequency of Names in the 1990 US Census here. There are also several interesting comments from other researchers.

The surname and given name data was obtained from the Census Bureau site here. Click on "Enter the Names File" and then select either the surname file (dist.all.last), the female first name file (dist.female.first), or the male first name file (dist.male.first) from the list. Note that the surname file is 2 megabytes...
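For anyone who wants to look up a name without scrolling through the whole file, here is a minimal Python sketch. It assumes each line holds four whitespace-separated columns (name, percent frequency, cumulative percent frequency, rank); check that layout against the documentation file before relying on it. The helper names `NameEntry`, `parse_line` and `load_names`, and the sample line, are made up for illustration and are not from the Census Bureau.

```python
from typing import NamedTuple


class NameEntry(NamedTuple):
    name: str
    percent: float             # frequency, in percent of the sample
    cumulative_percent: float  # running total of the percent column
    rank: int


def parse_line(line: str) -> NameEntry:
    # ASSUMPTION: four whitespace-separated columns per line.
    name, percent, cumulative, rank = line.split()
    return NameEntry(name, float(percent), float(cumulative), int(rank))


def load_names(path: str) -> dict[str, NameEntry]:
    # Read one of the downloaded files (for example dist.all.last)
    # into a dictionary keyed by name, skipping blank lines.
    with open(path) as handle:
        entries = (parse_line(line) for line in handle if line.strip())
        return {entry.name: entry for entry in entries}


# Made-up sample line, not real census data.
sample = parse_line("EXAMPLE        0.010 50.000   1234")
print(sample.name, sample.percent, sample.rank)
```

With a downloaded file in the current folder, something like `load_names("dist.all.last").get("SEAVER")` would return the entry for that surname, or `None` if the name is not on the list.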

One caution: if you read the documentation file back on the first page, you will find out that the data is from the undercount survey that totalled about 6.3 million people who provided a name. The name frequency is based on those 6.3 million entries, not the 290 million persons counted in the census. Consequently, some rare surnames don't show up at all, or show up at the end of the list, depending on how many people with the rare name were included in the survey. For instance, I know that there are living people with the surnames Carringer, Dalseth and Pilgram, but they are not found on the list.

Smith is the most common surname (about 1% of all Americans). The last name on the list is #88,799 - Aalderink. However, ties are listed in reverse alphabetical order - Zysett is #75,677, and all the surnames between Aalderink and Zysett have the same number of occurrences in the database. A 0.001% frequency is about 63 in the surname database (0.001% of 6.3 million), or about 3,000 in the total population. Even so, these 88,799 surnames cover only slightly more than 90% of the total entries in the database. My guess is that there may be another 50,000 surnames in the country, or more, that weren't listed.

According to the documentation page, the survey over-sampled African-Americans and Hispanics, so the results are probably skewed a bit. Read the whole documentation discussion here if you're interested.

My opinion is that the first 2,500 or so rankings are probably pretty accurate (that is about a 0.005% frequency - about 315 of those sampled, or one in 20,000). Amazingly, 0.005% of the whole population is approximately 14,500 people.
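The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: the two totals are the round numbers used in this post, and the function name `people_at_frequency` is made up for illustration.

```python
SAMPLE_SIZE = 6_300_000     # people in the survey sample, per this post
POPULATION = 290_000_000    # total population figure used in this post


def people_at_frequency(percent: float, total: int) -> int:
    """Convert a percent frequency into an approximate head count."""
    return round(total * percent / 100)


# A 0.005% frequency, in the sample and in the whole population.
print(people_at_frequency(0.005, SAMPLE_SIZE))
print(people_at_frequency(0.005, POPULATION))
```

The same function works for any other percentage taken from the files.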

My SEAVER surname is #8,167 with a 0.001% frequency. My given name RANDY is #78 in the male names with a 0.232% frequency (RANDALL is #139, RANDELL is #825) out of 1,219 given names listed.
