Thursday, August 7, 2008

Searching for really rare surnames on Ancestry boards

As part of my O research project (I'm using letters for privacy reasons), I went searching on the Rootsweb/Ancestry message boards today. I input the O******* name (well, with the right letters, of course) in the keyword field (basic search) on the Message Board page and found that I got over 19 million matches. Oh no! It will take me a day or two to read all 19 million messages!

Hey - this is a really rare surname. What happened? I tested it again using a more common name, Smith, and got over 500,000 matches, which seemed about right for all message boards. Then I tried a made-up surname, "Zxcvb", to see what would happen. Same thing - there were 19,798,495 messages (as of today!) for that keyword. The screen shot below shows the count:

This looks like a problem with the Search algorithm - if there are no matches, it seems to return all messages in the database. Using the word "the" as a keyword returned 12,761,805 messages. The word "family" appears in 6,662,137 messages. The word "geneology" appears in 57,918 messages, and "genealogy" in 389,866 messages. I guess this means that about 13% of the messages that use the word (57,918 out of 447,784) spell it incorrectly!
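The misspelling rate can be checked with a few lines of Python. This is only a rough sketch using the two counts quoted above; it assumes no message contains both spellings (otherwise that message would be counted twice), and the variable names are just for illustration.

```python
# Message counts reported in the post for each spelling of the keyword.
misspelled = 57_918   # messages containing "geneology"
correct = 389_866     # messages containing "genealogy"

# Share of misspellings among all messages that use either spelling.
share_of_both = misspelled / (misspelled + correct)

# Misspelled messages relative to correctly spelled ones.
ratio_to_correct = misspelled / correct

print(f"Share of messages using either spelling: {share_of_both:.1%}")
print(f"Relative to correctly spelled messages:  {ratio_to_correct:.1%}")
```

The first figure comes out just under 13%, and the second just under 15%, so the answer depends on which total you divide by.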

However, if I choose the Advanced Search option and use the Surname entry box (and not the Keyword box), I get 0 matches for "O*******" and even for "Zxcvb"!
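The difference between the two behaviors can be pictured with a small Python sketch. Everything in it is hypothetical: the function names, the three sample messages, and the fallback logic are guesses made up for illustration, not Ancestry's actual code.

```python
# A tiny stand-in for the message board database (made-up sample data).
MESSAGES = [
    "Looking for Smith family in Ohio",
    "Smith and Jones marriage records",
    "Help with Jones genealogy",
]


def keyword_search_with_fallback(keyword, messages):
    """Hypothetical buggy search: if nothing matches, return every message."""
    hits = [m for m in messages if keyword.lower() in m.lower()]
    return hits if hits else list(messages)


def strict_search(keyword, messages):
    """Search that returns an empty list when nothing matches."""
    return [m for m in messages if keyword.lower() in m.lower()]


print(len(keyword_search_with_fallback("Smith", MESSAGES)))  # real matches only
print(len(keyword_search_with_fallback("Zxcvb", MESSAGES)))  # whole database
print(len(strict_search("Zxcvb", MESSAGES)))                 # no matches
```

With a fallback like this, a made-up name and a truly rare name both report the full size of the database, which is what the basic keyword search appeared to do.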

When using the message board Advanced Search, you can use a wild card in the Surname box or choose a Soundex option, but you have to click on the Search button in order to get results (i.e., you cannot just hit your Return key).

The lesson here is to not take search results at face value, and to use the Advanced Search options on the Rootsweb/Ancestry message boards.

Unknown said...

I too found 19M matches for an uncommon location that day - tho I could not see it mentioned in any of the messages... Thought it just a glitch that day and have not had a chance to go back and check.