Monday, April 30, 2007

Ancestry's wild card search has a small problem

Paul Graham, in a post on the APG mailing list, pointed out a problem with wild card searches in Ancestry's databases. Follow-up messages confirmed that other people were seeing it too.

If you do a wild card search on a surname together with a given name, the search works when you use the first three letters of the surname, but it can fail when you use four or more letters.

But it seems to happen only with certain surnames. As an example (1930 census, exact search, all states, all ages):

* If I use Robert Tho*, I get 12,152 hits (all of them correct)
* If I use Robert Thom*, I get 10,623 (all correct)
* If I use Robert Thomp*, I get 5,217 (all correct)

However, when I search for the name that Paul used, Michael Wazoo*, I get:

* If I use Michael Waz*, I get 11 hits (all correct)
* If I use Michael Wazo*, I get 95,645 hits (not all correct! It returned some records on the given name Michael alone, as if no surname had been entered).
* If I use Michael Wazoo*, I get 204,687 hits, even more than before, although the search is more restrictive.
* If I restrict the search to one state, it finds some of the Michaels in that state.
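A correct wild card search should never return more hits when you add a letter to the surname prefix. As a rough sketch of that sanity check (the function name is my own invention, and the numbers are simply the hit counts listed above), in Python:

```python
def counts_are_monotonic(hits_by_prefix):
    """Return True if no longer prefix has more hits than the shorter one before it."""
    # Order the prefixes from shortest to longest.
    ordered = sorted(hits_by_prefix.items(), key=lambda item: len(item[0]))
    counts = [count for _, count in ordered]
    # Each count must be less than or equal to the one before it.
    return all(later <= earlier for earlier, later in zip(counts, counts[1:]))


# Hit counts copied from the searches above (1930 census, exact search).
robert = {"Tho*": 12152, "Thom*": 10623, "Thomp*": 5217}
michael = {"Waz*": 11, "Wazo*": 95645, "Wazoo*": 204687}

print(counts_are_monotonic(robert))   # the Robert searches behave as expected
print(counts_are_monotonic(michael))  # the Michael searches do not
```

The Robert counts shrink with each added letter, while the Michael counts grow, which is the sign that the surname is being ignored.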

I tried a lot of surnames, and could only get it to hiccup on relatively rare or non-existent surnames. It worked fine on Smit*, John*, Turn*, Jone*, Seav*, etc. But for Smyk*, Grze*, Brka*, Pryz* (but not Pryo*), and others with rare spellings, it returns many more hits, apparently matched on the first name alone, than it does for the three-letter surname with a wild card. It did the same thing for non-existent surnames like Abcd* and Zxcv*: with four or more surname letters, it returned lots of hits on the first name.

Paul found that the problem extends to the other databases as well: the Family Trees, Vital Records and Public Records.

Thankfully, you have to work really hard to make it fail, but if the surname you are searching for is one of the affected ones, you will have to limit yourself to three letters in the surname with the wild card option.
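If you can get the results of the three-letter search into a list, you can apply the longer prefix yourself. This is only a sketch of the idea: the record layout, the function name and the sample surnames are all made up for illustration, and Ancestry offers nothing like this that I know of.

```python
def filter_by_longer_prefix(records, surname_pattern):
    """Keep only the records whose surname starts with the given wild card prefix."""
    # Drop the trailing * and compare without regard to case.
    prefix = surname_pattern.rstrip("*").lower()
    return [r for r in records if r["surname"].lower().startswith(prefix)]


# Invented stand-ins for the hits from a "Michael Waz*" search.
three_letter_hits = [
    {"given": "Michael", "surname": "Wazoo"},
    {"given": "Michael", "surname": "Wazny"},
    {"given": "Michael", "surname": "Wazeter"},
]

narrowed = filter_by_longer_prefix(three_letter_hits, "Wazo*")
for record in narrowed:
    print(record["given"], record["surname"])
```

With only 11 hits for Michael Waz*, of course, it is just as easy to read down the list by eye.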

Hopefully, Ancestry is aware of this problem and will fix it. I wonder why it happens?

Thinking about it, the search capabilities of online databases, and of the search engines for the web, news and images, are so wonderful and fast that we have come to expect perfection. Think about 10 years ago: we were scrolling through unindexed microfilm at the FHC or the Archives to find people in the census records. Now we can do it in our pajamas late on a Monday night to our heart's content.
