Monday, May 7, 2012

How Accurate are the Ancestry "Shaky Leaf" Hints?

I wrote "New Hint Notifications on Ancestry Member Trees" last Friday and got to wondering about the completeness and accuracy of the Hints that Ancestry.com offers in Ancestry Member Trees.

The pedigree chart for one of my Ancestry Member Trees looked like this:

[Image: five-generation pedigree chart from the Ancestry Member Tree]

Almost all of the persons in the tree (five generations) have Hints indicated. The ones without a Hint indicator have had Hints in the past that I've already accepted or rejected.

I decided to look at the Hints for each person in this tree and see which Hints (offered, rejected, or accepted) were accurate. I judged a Hint accurate if, based on my research, it was for that particular ancestor. Here are the results:

*  Randall Jeffrey Seaver (1943-....): 3 Hints - 3 accurate.
*  Frederick Walton Seaver (1911-1983): 9 Hints - 9 accurate.
*  Frederick Walton Seaver (1876-1942): 9 Hints - 9 accurate.
*  Alma Bessie (Richmond) Seaver (1882-1962): 5 Hints - 5 accurate.
*  Frank Walton Seaver (1852-1922): 7 Hints - 6 accurate.
*  Harriet Louisa (Hildreth) Seaver (1857-1920): 7 Hints - 7 accurate.
*  Thomas Richmond (1848-1917): 4 Hints - 4 accurate.
*  Julia (White) Richmond (1848-1913): 3 Hints - 3 accurate.
*  Isaac Seaver (1823-1901): 7 Hints - 6 accurate.
*  Lucretia Townsend (Smith) Seaver (1828-1884): 4 Hints - 4 accurate.
*  Edward Hildreth (1831-1899): 5 Hints - 5 accurate.
*  Sophia (Newton) Hildreth (1834-1882): 6 Hints - 6 accurate.
*  James Richman/Richmond (1821-1912): 10 Hints - 9 accurate.
*  Hannah (Rich) Richman/Richmond (1824-1911): 9 Hints - 7 accurate.
*  Henry White (1824-1885): 2 Hints - 2 accurate.
*  Amy Frances (Oatley) White (1826-1865): 4 Hints - 3 accurate.
*  Betty Virginia (Carringer) Seaver (1919-2002): 8 Hints - 8 accurate.
*  Lyle Lawrence Carringer (1891-1976): 7 Hints - 7 accurate.
*  Emily Kemp (Auble) Carringer (1899-1977): 8 Hints - 8 accurate.
*  Henry Austin Carringer (1853-1946): 10 Hints - 10 accurate.
*  Abbie Ardell (Smith) Carringer (1862-1944): 11 Hints - 9 accurate.
*  Charles Auble (1849-1916): 8 Hints - 8 accurate.
*  Georgianna (Kemp) Auble (1868-1952): 8 Hints - 8 accurate.
*  David Jackson Carringer (1828-1902): 4 Hints - 4 accurate.
*  Rebecca (Spangler) Carringer: 8 Hints - 5 accurate.
*  Devier J. Smith (1839-1894): 3 Hints - 3 accurate.
*  Abbie A. (Vaux) Smith (1844-1931): 3 Hints - 3 accurate.
*  David Auble (1817-1894): 7 Hints - 7 accurate.
*  Sarah (Knapp) Auble (1818-1900): 3 Hints - 3 accurate.
*  James Abraham Kemp (1831-1902): 8 Hints - 7 accurate.
*  Mary Jane (Sovereen) Kemp (1840-1874): 7 Hints - 5 accurate.

Adding all of that up for 31 persons:

*  Number of Hints Offered:  197 
*  Number of Accurate Hints:  183
*  Accuracy Rate = 183/197 = 0.929.  

So, 93% of the Hints offered by Ancestry for these 31 persons were accurate.

All 31 of these persons had an Ancestry Member Tree match, which is based on trees submitted by Ancestry.com users (including my own "other" tree). If I subtract those 31 tree matches from both totals, the rate for the remaining Hints is 152/166 = 0.916, or about 92%.
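For anyone who wants to check the arithmetic, this minimal Python sketch re-adds the per-person counts from the list and reproduces both rates. It assumes, as the subtraction of 31 from both totals implies, that each person had exactly one Ancestry Member Tree Hint and that every one of those was counted as accurate. The names `TALLIES` and `accuracy_rate` are only illustrative.

```python
# (Hints offered, Hints judged accurate) for each of the 31 persons,
# in the same order as the list above.
TALLIES = [
    (3, 3), (9, 9), (9, 9), (5, 5), (7, 6), (7, 7), (4, 4), (3, 3),
    (7, 6), (4, 4), (5, 5), (6, 6), (10, 9), (9, 7), (2, 2), (4, 3),
    (8, 8), (7, 7), (8, 8), (10, 10), (11, 9), (8, 8), (8, 8), (4, 4),
    (8, 5), (3, 3), (3, 3), (7, 7), (3, 3), (8, 7), (7, 5),
]


def accuracy_rate(tallies, tree_hints_per_person=0):
    """Return (accurate, offered, rate).

    Optionally removes N Ancestry Member Tree Hints per person from BOTH
    totals, which assumes every one of those tree Hints was accurate.
    """
    removed = tree_hints_per_person * len(tallies)
    offered = sum(o for o, _ in tallies) - removed
    accurate = sum(a for _, a in tallies) - removed
    return accurate, offered, accurate / offered


for label, n in (("All Hints", 0), ("Without tree Hints", 1)):
    accurate, offered, rate = accuracy_rate(TALLIES, n)
    print(f"{label}: {accurate}/{offered} = {rate:.3f}")
```

Run as-is, it prints `All Hints: 183/197 = 0.929` and `Without tree Hints: 152/166 = 0.916`, matching the totals above.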

Most of the Hints were for census records, including Canadian and English census records. However, I know that some census records for these persons were not offered as Hints.

It strikes me that it must be more difficult for Ancestry.com to search the records when persons use different first names or surname spellings, or when women who married are listed under their maiden name in some records and their married surname in others.
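To illustrate why spelling variants are the easier part of that problem, here is a minimal Python sketch of American Soundex, the phonetic name code long used to index U.S. census records. It is only an illustration. It is not a description of how Ancestry.com actually matches names, and the function name `soundex` is hypothetical. The sketch shows that a variant like Richman/Richmond collapses to the same code. A maiden name and a married surname (Smith vs. Carringer) share nothing that a phonetic code can match, and different first names are a separate problem entirely.

```python
# American Soundex: keep the first letter, code the remaining consonants as digits.
SOUNDEX_DIGITS = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a surname."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    # Start from the first letter's digit so a same-coded second letter is dropped.
    prev = SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_DIGITS.get(ch, "")
        if digit:
            if digit != prev:
                code += digit
            prev = digit
        elif ch not in "hw":
            # Vowels (and y) separate letters, so a repeated digit after one is kept.
            prev = ""
        # h and w are skipped without resetting prev, so codes on both sides merge.
        if len(code) == 4:
            break
    return code.ljust(4, "0")


for surname in ("Richman", "Richmond", "Smith", "Carringer"):
    print(surname, soundex(surname))
```

It prints `R255` for both Richman and Richmond, but `S530` for Smith and `C652` for Carringer.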

In a later post, I'll take a closer look at the hits and misses for one of my ancestors on this list.

My question is:  Does Ancestry.com actually search for these records based on the known information provided in my Ancestry Member Tree, or does it use the records that are attached to other Ancestry Member Trees?  Or both?  I can't tell from this first look at the results.

The URL for this post is:  http://www.geneamusings.com/2012/05/how-accurate-are-ancestry-shaky-leaf.html

Copyright (c) 2012, Randall J. Seaver

9 comments:

Cousin Russ said...

Randy,

I wonder what the accuracy rate is, if you do not include Ancestry Member Trees.

I don't normally use my AMT, as I use FTM2012, but I do find that the accuracy rate is high from within FTM2012. I have 150+ Hints online; maybe I'll check what mine look like without the AMTs, looking only at the Ancestry records. I have noticed an increase in Hints from websites like Find-A-Grave. They have been good hits.

Russ

Celia Lewis said...

That's higher than my results, Randy. I usually find a lot of undocumented, unsourced details, which I consider "interesting but not impressive," as my mother would describe them. So I get about 70% accuracy or less, even less when I'm looking at 1600s-1700s ancestors, of course. I only use AMTrees if I'm looking for collaterals; otherwise I check them for sources, if they have any. "Other member trees" is not a source, as far as I'm concerned, and that is very common. Interesting bit of research.

David Newton said...

The answer is both.

If the system finds member tree matches, it will also suggest records linked to the entries in those trees. However, I have added entries to my tree from the 20th and even the 21st century that are not visible in any other trees due to privacy settings. Admittedly, I deliberately crafted things like birth or marriage information to match the style of the vital records index from the English GRO. Even so, the Ancestry system very quickly suggests those very records spontaneously after I create a new person.

Connie said...

I agree that the answer is "both," and that your results are similar to my experience as far as reasonably high accuracy for those non-tree hints Ancestry suggests.

HOWEVER, it misses a lot of records that are there for the searching, particularly World War I draft registrations (unless you've put in very specific birthdate info), original marriage records, and some of the less well known or easy to use records like the Civil War era federal tax records or pre-1850 census records.

Sometimes, I wonder if the average Ancestry user doesn't know about the search tab and only relies upon the little green leaf. I usually find and attach multiple records for a person (post-185), when others at best have a handful of mostly census records (and not even all of them) attached.

Carl Fields said...

I'm not sure about the shaky leaves, but in the last 3-4 months, the suggestions that come up on the right-hand side of a search, pointing to other documents that MIGHT be for the same person, also seem to be running at perhaps 90% accuracy (that's a guess, not the result of a careful experiment like yours).

I can't remember exactly when Ancestry started giving these extra "teaser" suggestions along with search results, but my impression is that they were not all that accurate when they first began. I suspect there has been some kind of software improvement in this area.

Geolover said...

The 'hints' system is beset by problems in the search programming, such as a ranking algorithm that has difficulties with date-bracketing. Just today I did a search-from-tree centered on a person who died in 1815, and the top 15 results were from the 1895 NJ State Census.

The 'hints' delivered when one looks at a pedigree or 'family' box-chart view are low-hanging fruit, weighted as you suspected with tree-hints and items saved to others' trees.

The search engine has little trouble finding items from the indexes to WV vital records that Ancestry.com acquired from LDS, if I have already entered the actual data from my own research in the actual records (in the Courthouses as well as in the WV State Archives site).

I have also found that where my research has led to different and/or added data about a person (actual date/place of death; additional marriages and children), compared with data on the same person in other trees, the number of tree-hints drops back appreciably. Thus the number of tree-owners who would find matches to the same person in my tree also drops, if they start out by copying from trees with inaccuracies. This is a built-in flaw in the tree-hint design. It is but one part of the overall fallacy in Ancestry.com's marketing the submitted member trees as significant research tools.

Smadar said...

I'm impressed with your statistics. I've also noticed an increase in accuracy. For a long time I ignored the hints, because apart from the ones that were immediately relevant, the rest were mostly "junk." Clearly they are working hard on the technology. I'm getting more e-mails informing me of hints than I have in a long time, and new databases (Find A Grave, for example) now seem to be included in these hints. Just last week I decided to review all my hints in an attempt to clean up (something I hadn't bothered to do in a long time), and I was surprised to find many helpful hints. I agree it has to do with other trees, and I do find those helpful, especially when I overlap with a tree of someone who has done a good job documenting the profiles I'm interested in.

Anonymous said...

My concern is that, because I sometimes accept a hint before thoroughly verifying it, I might be "polluting" my tree, which could in turn pollute the hints that are sent to others. For example, if I said I was a descendant of Abraham Lincoln, would other people researching Lincoln be shown me as a descendant? I hope not.
... But I guess this is an art as much as a science!
Roger