Saturday, February 28, 2009

Ancestry.com Indexing Quirks - 1860 Birthplaces

Any researcher who has searched the census records online for any length of time has heard the stories about how "people in India" indexed some of the http://www.ancestry.com/ census records, and when they saw a birthplace of "Ken" they indexed it as "Kenya," and when they saw a birthplace of "Ind" they indexed it as "India."

I was trying to capture 1860 United States census images today for my database and ancestral files, and I just couldn't find my supposed 3rd-great-grandfather, Ranslow Smith (1805-1875) in the 1860 census in Dodge County, Wisconsin. I had tried searching for him with parameters of (Old Search, Exact Matches):

* Given Name = Rans*
* Last Name = Smith
* Residence = Wisconsin, Dodge
* Birthplace = New York
* Birthdate = 1805 +/- 2 years

I got nothing. I took away the given name, and got a few Smiths but nothing like Ranslow. I tried the same with his son, Devier Smith born 1839 in New York, and got nothing. Hmmm. 'Tis a puzzlement! I'm sure that they are there because I have them transcribed from microfilm in my notebook. Let's see, page 745 in Oak Grove township. I finally went to Oak Grove township, Dodge County, Wisconsin and found page 745:


There they are - very readable. But wait, the Birthplace is listed as "St NY" - State of New York. Not just "NY," or "New York."

The indexer transcribed it exactly as "St NY" and it has remained that way for all these years. There are a total of 453 other persons all indexed with a birthplace of "St NY" in Oak Grove township (and none in any other county in Wisconsin), and a total of 709 in the entire 1860 US census. If I had specified "NY" in the birthplace parameter, the search would have found my Smith family in Oak Grove. Who knew? Oh, and I couldn't find Ranslow Smith using just his name because he is indexed as "Rauslow Smith." The indexer misread the "n" as a "u" - an easy letter substitution!
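Here is a minimal Python sketch of why both searches failed. It is only an illustration, not how Ancestry.com's search works: the `rows` list is made-up sample data, `normalize_birthplace` is a hypothetical helper, and the indexed spelling of Devier's name is assumed.

```python
from fnmatch import fnmatchcase

# New York spellings reported in this post for the 1860 index (lowercased).
NEW_YORK_VARIANTS = {"new york", "n york", "st ny", "ny", "state new york"}

def normalize_birthplace(raw: str) -> str:
    """Map the New York spellings above to one canonical value (hypothetical helper)."""
    cleaned = " ".join(raw.replace(".", " ").split()).lower()
    return "New York" if cleaned in NEW_YORK_VARIANTS else raw.strip()

# Made-up index rows: (given name, surname, birthplace as indexed).
rows = [
    ("Rauslow", "Smith", "St NY"),
    ("Devier", "Smith", "St NY"),  # indexed spelling assumed for illustration
]

# An exact birthplace match on "New York" finds nobody.
exact_hits = [r for r in rows if r[2] == "New York"]

# Matching on the normalized birthplace finds both men.
normalized_hits = [r for r in rows if normalize_birthplace(r[2]) == "New York"]

# The wildcard "Rans*" cannot match the misread name "Rauslow".
wildcard_hits = [r for r in rows if fnmatchcase(r[0], "Rans*")]

# A single-character wildcard at the doubtful letter does match.
looser_hits = [r for r in rows if fnmatchcase(r[0], "Ra?slow")]

print(exact_hits, normalized_hits, wildcard_hits, looser_hits, sep="\n")
```

The point is that an exact-match search is only as good as the indexed string, so it pays to leave the birthplace blank or put a wildcard over the letter most likely to be misread.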

The count in the 1860 US census with New York birthplaces is:

* New York = 3,388,119
* N York = 310
* St NY = 709
* NY = 1,319
* State New York = 10

There are small numbers of birthplaces with other state abbreviations also - Mass = 125, and Vt = 114 (that's all I checked).

For the "odd" New York listings, they are only 1,639 out of 3,389,758 total, or about 0.05% of the listings. That is an error rate of one out of about 2,000. My Smith family just happened to be one of the "odd" ones.
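The percentage and the "one out of about 2,000" figure can be checked with a few lines of Python. This simply redoes the arithmetic using the totals quoted above; the variable names are mine.

```python
# Counts as quoted in the post.
standard = 3_388_119    # indexed as "New York"
odd = 1_639             # the post's total for the "odd" spellings
total = standard + odd  # 3,389,758

share = odd / total
print(f"odd listings: {share:.2%} of {total:,}")
print(f"about 1 in {round(1 / share):,}")
```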

What about Kenya and India in the 1860 US Census?

* Kenya = 1,592
* India = 9,023

Ancestry.com said that they have planned to spend significant time "cleaning up" existing databases like the US census records. I hope they get to this one sooner rather than later.

Does anyone else have Ancestry.com Indexing quirks and errors to complain about? Someone should build a web site about them so that users can consult it occasionally.

7 comments:

Amy (We Tree) said...

I have several ancestors whose birthplace is listed as Nova Scotia. The census page says US, but the fancy cursive U looks like an N, so it was indexed as NS.

kbea831 said...

Just this week I had difficulty locating information in the Census using Ancestry.com but I'm not sure why I couldn't find what I was looking for. Here's my post of my experience and how I did have success in finding what I was looking for.

Lynn said...

As a transcriber for Familysearch with about 250,000 transcriptions under my belt, I can tell you that we are to transcribe what we see, so therefore your StNY would still read that way, but Ky would be Ky and Ind would be Ind.

Also, that means that you have to allow for creative spelling of names!! It might be nice to have Elizabeth spelled the same way each time, but if it is clearly Elisebath, then that is what is transcribed. I have seen some very unusual spellings of common names, including in one southern state where I finally figured out by reading it out loud -- the transcriber was writing with a southern accent! Now that page was clearly written, but it will be a creative task for finding someone on it. The difficulty in transcribing is that if you don't transcribe what was written, someone is sure to come back and tell you that the spelling on the census is what their ancestor used and not the usual spelling -- you just can't win! And you're right about the n/u issue, in fact many letters are difficult to decide between in some handwriting --- a/o, e/i, L/S, t/l, W/M, and so on. My advice is to try all the combinations that you can imagine in handwriting.

Familysearch reduces the errors because each page is seen by 3 independent transcribers before it's posted, but even that will not stop the problems as some pages are just plain difficult to read.
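Lynn's advice to try the confusable-letter combinations can be sketched in Python. This is only an illustration under my own assumptions: the pair list is copied from the comment above, the function names are hypothetical, and the pairs are treated as case-sensitive exactly as written (so "L/S" and "W/M" only apply to capitals).

```python
from itertools import product

# Letter pairs listed in the comment as easy to confuse in handwriting.
CONFUSABLE_PAIRS = [("n", "u"), ("a", "o"), ("e", "i"),
                    ("L", "S"), ("t", "l"), ("W", "M")]

def build_swaps(pairs):
    """Return a dict mapping each letter to the letters it may be misread as."""
    swaps = {}
    for a, b in pairs:
        swaps.setdefault(a, set()).add(b)
        swaps.setdefault(b, set()).add(a)
    return swaps

def handwriting_variants(name, max_swaps=1):
    """List spellings of `name` with 1 to `max_swaps` letters misread."""
    swaps = build_swaps(CONFUSABLE_PAIRS)
    # For each position: the original letter plus any look-alikes.
    options = [[ch] + sorted(swaps.get(ch, ())) for ch in name]
    variants = set()
    for combo in product(*options):
        changed = sum(1 for old, new in zip(name, combo) if old != new)
        if 0 < changed <= max_swaps:
            variants.add("".join(combo))
    return sorted(variants)

print(handwriting_variants("Ranslow"))
```

For "Ranslow" the single-swap list includes "Rauslow", the spelling the indexer actually used. Raising `max_swaps` makes the list grow quickly on long names, so it is best kept small.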

Cindy said...

Randy, I am so glad you went here! Not only are there problems with the place names, but in general with the consistency of the data entry. For instance my McCann family is entered as McCann and Mc Cann. This space in the name obviously causes some of them not to show up when searching. Any data analyst knows that if junk goes in, then junk comes out - reporting cannot be relied upon when the data isn't consistently entered. Transcription errors I can live with on occasion, but blatant things like this just frustrate the user and make you wonder if any person ever actually looks at it. This also brings to mind the submission of corrections/variations of names, which everyone should do, but I'm convinced that they aren't viewed with eyes, only processed and added to the data which again causes problems. Data clean up sounds like great fun! When can I start?
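The McCann / Mc Cann problem is the kind of thing a simple comparison key would smooth over. A minimal Python sketch, assuming a hypothetical `surname_key` helper (nothing here reflects how Ancestry.com actually stores or matches names):

```python
import re

def surname_key(surname: str) -> str:
    """Drop spaces, apostrophes, periods and hyphens, and ignore case."""
    return re.sub(r"[\s'.-]+", "", surname).casefold()

# Both index spellings collapse to the same key.
print(surname_key("McCann"), surname_key("Mc Cann"))
```

Matching on such a key instead of the raw string would let both entries come up in one search, while the index still keeps the spelling the transcriber saw.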

Lynn said...

Oops, Randy, I do have to go back on what I said about the transcriptions of place names in Familysearch. Some are "what you see", and others do have the instruction to complete the abbreviation shown if you know what it is - so there is room for error of Ken to Kenya instead of Kentucky. Darn! I can't imagine someone making that mistake on a US census, but it could (and obviously does) happen.

Searchers also have to understand that some census takers did pages of names with first initials only, and it's also common to see consonants doubled where you might not expect it (I just transcribed an Errik Errikson and family, but I bet the double-r Errik is not usually searched for.) Isaac is another one I've seen as Issac or Issak - in the original writing.

Geolover said...

One particular peeve of mine is in the 1850 US Census, where "Ia" is a frequent abbreviation for "Indiana". Yet Ancestry indexers nearly always spell it all the way out as "Iowa". Since Iowa was just beginning to be settled by non-indigenes in 1850, this would provide a very strange stat for someone working only with Ancestry's indexes.

Your idea for a "corrective" web site is a very good one. Ancestry staff refused to create a message board on its site for database corrections.

One of the same kind should exist for FamilySearch databases. FamilySearch does not even have a way for users to submit corrections, as Ancestry does regarding names. There are many baffling errors in the FamilySearch indexes.

TGN has stated that it will develop an interface for user corrections for items other than names, but nothing was said about when.

Anonymous said...

I have a dilly of an example, and have posted a comment to Ancestry. The image is Range 2 E, Johnson, Illinois; Roll: M653_190; Page: 147; Image: 148
My comment was: David Richardson on this page is indexed as born in Kenya; the enumerator clearly wrote Ke. Neighbors born in Ke are transcribed as Kentucky. Quality of transcribers is not improving. Suggest Ancestry review 19th century African history for plausibility of entries.