Tuesday, November 17, 2009

Checking on Ancestry.com's Name Authority

In my post yesterday about the Name Authority Dictionary on Ancestry.com, I wondered if it worked for "Exact Matches" and/or on "Old Search." The answers seem to be NO and NO. I'm not surprised.

I did find some puzzles though, to wit:

1) Given name = Catherine and the 841 other variations (I didn't test all 842 variations, only six fairly common ones plus a wild card) in the 1900 US Census:

* Catherine:

*** Old Search, Exact Matches = 284,380
*** New Search, Exact Matches = 281,918
*** New Search, Ranked Matches = 2,745,706

* Kathryn:

*** Old Search, Exact Matches = 3,385
*** New Search, Exact Matches = 3,385
*** New Search, Ranked Matches = 1,115,944

* Cathleen:

*** Old Search, Exact Matches = 1,148
*** New Search, Exact Matches = 1,148
*** New Search, Ranked Matches = 1,910,214


* Kathleen:

*** Old Search, Exact Matches = 5,467
*** New Search, Exact Matches = 5,467
*** New Search, Ranked Matches = 270,780


* Kate:

*** Old Search, Exact Matches = 234,798
*** New Search, Exact Matches = 234,798
*** New Search, Ranked Matches = 496,364


* Cathy:

*** Old Search, Exact Matches = 439
*** New Search, Exact Matches = 439
*** New Search, Ranked Matches = 2,745,706


* Cat* (wild card):

*** Old Search, Exact Matches = 442,886
*** New Search, Exact Matches = 442,886
*** New Search, Ranked Matches = 442,886


2) Elizabeth and the other 899 variations in the 1900 US Census:

* Elizabeth:

*** Old Search, Exact Matches = 670,966
*** New Search, Exact Matches = 666,296
*** New Search, Ranked Matches = 4,352,581


* Eliza:

*** Old Search, Exact Matches = 231,961
*** New Search, Exact Matches = 231,961
*** New Search, Ranked Matches = 4,352,581

* Lizzie:

*** Old Search, Exact Matches = 397,770
*** New Search, Exact Matches = 397,770
*** New Search, Ranked Matches = 2,460,809

* Elisabeth:

*** Old Search, Exact Matches = 45,858
*** New Search, Exact Matches = 45,858
*** New Search, Ranked Matches = 3,289,276

* Betty:

*** Old Search, Exact Matches = 11,048
*** New Search, Exact Matches = 11,048
*** New Search, Ranked Matches = 2,679,790

* Eliz* (wild card):

*** Old Search, Exact Matches = 938,331
*** New Search, Exact Matches = 938,331
*** New Search, Ranked Matches = 938,331

Okay, what does all of that mean? There are several interesting and puzzling facts there, including:

1) Old Search and New Search "Exact Matches" results agree on 11 of the 13 trials listed above. I really don't understand why they don't agree on all 13. Why does "Old Search" return more "Exact Matches" for "Catherine" than "New Search" does? And why more for "Elizabeth" also, but not for the others? Isn't there one database, and one search algorithm?
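
For anyone who wants to check the tally, here is a minimal Python sketch that compares the two "Exact Matches" counts for each trial. The numbers are copied from the lists above; the names `exact_counts` and `exact_mismatches` are just illustrative, and nothing here talks to Ancestry.com.

```python
# Exact-match counts copied from the lists above: (Old Search, New Search).
exact_counts = {
    "Catherine": (284_380, 281_918),
    "Kathryn": (3_385, 3_385),
    "Cathleen": (1_148, 1_148),
    "Kathleen": (5_467, 5_467),
    "Kate": (234_798, 234_798),
    "Cathy": (439, 439),
    "Cat*": (442_886, 442_886),
    "Elizabeth": (670_966, 666_296),
    "Eliza": (231_961, 231_961),
    "Lizzie": (397_770, 397_770),
    "Elisabeth": (45_858, 45_858),
    "Betty": (11_048, 11_048),
    "Eliz*": (938_331, 938_331),
}


def exact_mismatches(counts):
    """Return {name: old - new} for every trial where the two counts differ."""
    return {name: old - new for name, (old, new) in counts.items() if old != new}


mismatches = exact_mismatches(exact_counts)
for name, diff in mismatches.items():
    print(f"{name}: Old Search returns {diff:,} more exact matches")
print(f"{len(exact_counts) - len(mismatches)} of {len(exact_counts)} trials agree")
```

The only two names that disagree are "Catherine" (a gap of 2,462) and "Elizabeth" (a gap of 4,670), which happen to be the most common spelling in each group.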

2) Not all variants of Catherine/Kathryn/etc. have the same number of "Ranked Matches," and neither do all variants of Elizabeth/Elisabeth/etc. I fully expected that they would if Ancestry.com is using a single Name Authority Dictionary for those names and variants. Of all of the names tried, only "Catherine" and "Cathy," and "Elizabeth" and "Eliza," return the same number of "Ranked Matches." Does that mean that they are not really using a large number of the names in the Name Authority Dictionary? I didn't have time to test every name in the Name Authority, of course, and I don't know all of them anyway, although I can probably guess quite a few.
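
The same kind of check works for the "Ranked Matches" numbers. This rough sketch groups the names I tried by their ranked count, so any names that share a count show up together. The counts come from the lists above (wild cards left out); `ranked_counts` and `shared_ranked_counts` are made-up names for illustration.

```python
from collections import defaultdict

# "New Search, Ranked Matches" counts copied from the lists above.
ranked_counts = {
    "Catherine": 2_745_706,
    "Kathryn": 1_115_944,
    "Cathleen": 1_910_214,
    "Kathleen": 270_780,
    "Kate": 496_364,
    "Cathy": 2_745_706,
    "Elizabeth": 4_352_581,
    "Eliza": 4_352_581,
    "Lizzie": 2_460_809,
    "Elisabeth": 3_289_276,
    "Betty": 2_679_790,
}


def shared_ranked_counts(counts):
    """Group names by ranked count and keep only counts shared by 2+ names."""
    groups = defaultdict(list)
    for name, count in counts.items():
        groups[count].append(name)
    return {count: names for count, names in groups.items() if len(names) > 1}


for count, names in shared_ranked_counts(ranked_counts).items():
    print(f"{count:,}: {', '.join(names)}")
```

If every variant were expanded through one shared name list, I would expect each group of variants to collapse into a single count here, rather than just two pairs.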

3) Wild cards only provide "Exact Matches" results even if "Exact Matches" is not checked, at least for First Names. I didn't know that! What about Last Names?

* Smi* in the 1900 US Census:

*** "New Search" and "Exact Match" = 556,340
*** "New Search" and "Ranked Match" = 935,405

So it appears that the Wild Card for Last Names does return more matches with "Ranked Matches" than "Exact Matches." Interesting, isn't it? That makes sense, I think, and I'm glad to know that it does.

I'm not sure what I've proved here - but it sure seems that Ancestry.com is not using the full Name Authority Dictionary for First Names. If they were, the "Ranked Matches" for all of the Catherine/Kathryn/etc. and Elizabeth/Elisabeth/etc. variants would have the same number, wouldn't they?

The implication for researchers is that only some name variants return the same number of "Ranked Matches," and some return far fewer than others that should be in the same name dictionary. Researchers still have to search with First Name variants to ensure that they find their search targets. It's probably easier to use a few First Names with Wild Cards than to search some or all of the 800 to 900 variants for these names.
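
To get a feel for how few wild card searches might be needed, here is a small sketch that checks which of the variants I tried are covered by a set of patterns. It uses Python's own `fnmatch` matching as a stand-in, which is an assumption on my part: Ancestry.com's wild card rules may well differ. The helper name `uncovered` is hypothetical.

```python
from fnmatch import fnmatchcase


def uncovered(names, patterns):
    """Return the names that none of the wild card patterns match."""
    return [n for n in names if not any(fnmatchcase(n, p) for p in patterns)]


catherine_variants = ["Catherine", "Kathryn", "Cathleen", "Kathleen", "Kate", "Cathy"]
elizabeth_variants = ["Elizabeth", "Eliza", "Lizzie", "Elisabeth", "Betty"]

# "Cat*" alone misses every K spelling; adding "Kat*" covers all six names.
print(uncovered(catherine_variants, ["Cat*"]))
print(uncovered(catherine_variants, ["Cat*", "Kat*"]))

# "Eliz*" misses the "s" spelling and the nicknames.
print(uncovered(elizabeth_variants, ["Eliz*"]))
```

By this simple measure, two patterns cover the six Catherine names I tried, while the Elizabeth group needs separate searches for the nicknames.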

2 comments:

Heather Wilkinson Rojo said...

I'm new to blogging, but I saw your name and I am looking for some information on Seavers in New Hampshire/Roxbury, Massachusetts. Is this your line of Seavers? Thanks so much, Heather Rojo at www.nutfieldgenealogy.blogspot.com

Geolover said...

Randy, glad you took a closer look at whether the Name Dictionary was actually being utilized in Ancestry.com's two different Search User Interfaces.

Wish they would answer your question: why is the dictionary not actually being used for search purposes?

By the way, Eliza and Elizabeth are not universally considered to be equivalents. I descend from two different families where two different daughters were named Eliza and Elizabeth (which has confused some other genealogists researching these families).