Friday, February 14, 2014

Ideas for Improvements -- Improve the Ancestry Public Member Trees Match List to Put the "Richest Tree Person" First

I attended the breakfast for some Genea-bloggers last Thursday, 6 February, in Salt Lake City.  Tim Sullivan (President and CEO), Eric Shoup (Executive Vice-President for Product) and Heather Erickson (Senior Director of Corporate Communications) from Ancestry.com discussed the company and the near future, and encouraged questions and feedback.  The Ancestry Insider summarized the meeting well in #RootsTech Blogger Breakfast.

One idea for improvement of the search features that I suggested was:

Improve the list of matches in the Ancestry Public Member Trees by ordering the matches according to some rational "Best Match" criteria - I suggest putting the "Richest Tree Person" first on the list.

I don't know how Ancestry.com orders the matches at present.  The order seems almost random, except for a birth year order for a given spelling of the name.  The match list often has many entries with only a name and no birth or death years, and these are often near the top of the match list.  To me, those are pretty worthless matches.  

In most search result lists on Ancestry.com and other database sites, the provider tries really hard to put "the best matches" to the search criteria right at the top of the results list.  Ancestry.com does that for matches from their databases: they rank them using an algorithm so that the match list has the "best" matches at the top.  Except, it seems, in the Ancestry Member Trees.

When I am fishing for cousins in the Ancestry Member Trees, I want to find the submitted tree with my target person with the "richest" set of sources, attached records, and uploaded photos or stories.  I am willing to look at 5 to 10 profiles, maybe even 20, but not 100, or 1,000, or 20,000 or more. 

Here is an example:

1)  I searched for one of my ancestors, my 5th great-grandfather, Norman Seaver (1734-1787) who married Sarah Read (1736-1809), and resided in Sudbury, Shrewsbury and Westminster, Massachusetts.  My search was:

*  first name = norman
*  last name = se*ver
*  birth year = 1734 plus/minus 2 years
*  exact matches checked

For this search, I searched the way I usually search on Ancestry.com: what I know for the name (usually with a wild card to cover known spelling variations), plus a birth date with a range of years.  If this were a common name, or if I didn't know an approximate birth year, I would have added the spouse's name (if known) to narrow the search.

There were 107 matches on the list.  Here's the top of the first page:

2)  I clicked on the first match, and saw Norman Seaver's profile in this tree:

The tree had one source and no attached or uploaded media.  It did not have the parents of Norman Seaver, and had only one child listed.  Was that the BEST match of the 107?

3)  I scrolled down the match list looking for a tree with many sources and attached records.  I found the one below in position 14 on the list.  It had 7 sources and 8 attached records, but no uploaded media:

That was pretty good, and everything I saw matched what I know about my 5th great-grandfather.

4)  There were several other trees with this person that had about the same number of sources and attached records, but there were few uploaded media items.

5)  Further down on the match list (number 95) was one of my own trees, which had 3 sources and 5 attached records;  I had also uploaded 6 document images from my computer files:

6)  Which tree match would you want to see?  I want to see the one with the "richest" information possible.  I want to review the information, look at the sources, check the applicability of the attached records and the uploaded media items, and perhaps contact the submitter of the tree. 

Of the three tree profiles above, the 2nd and 3rd are pretty "rich" in content, and could probably help me if I needed information on this person.  I could attach the sources to the person in my tree, and I could contact the person to see if they would allow me to attach the uploaded media items (I know, people attach them without asking, but I like to ask).  

So why aren't those two items, #14 and #95 on the list of 107 matches, at the very top of the match list?  I don't know, but I think they should be.

So how could this be done?  One way would be a simple sum of, say, sources plus attached records plus uploaded media items.  On that basis, the number 1 match on the list above would have a score of 1, #14 would have a score of 15, and number 95 would have a score of 14.  
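
As a rough illustration only, the simple sum can be sketched in a few lines of Python.  The class and field names here are made up for this example and say nothing about how Ancestry.com actually stores or ranks tree profiles; the counts are the ones from the three Norman Seaver profiles described above.

```python
from dataclasses import dataclass


@dataclass
class TreeMatch:
    """One profile in the match list (hypothetical structure for illustration)."""
    position: int          # where the profile sits in the current match list
    sources: int
    attached_records: int
    uploaded_media: int

    def richness(self) -> int:
        # Simple unweighted sum of the three counts.
        return self.sources + self.attached_records + self.uploaded_media


# Counts taken from the three profiles described in this post.
matches = [
    TreeMatch(position=1, sources=1, attached_records=0, uploaded_media=0),
    TreeMatch(position=14, sources=7, attached_records=8, uploaded_media=0),
    TreeMatch(position=95, sources=3, attached_records=5, uploaded_media=6),
]

# Put the "richest" profile first.
ranked = sorted(matches, key=lambda m: m.richness(), reverse=True)

for m in ranked:
    print(f"current position #{m.position}: score {m.richness()}")
```

Sorted this way, the profile now at #14 comes first, then #95, then the current #1.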

Would this system be foolproof?  No, of course not.  There are always false positives, and not every researcher has attached sources, or attached records, or uploaded media items.  But the above relatively simple ranking measure would vastly improve the Results list.  

Other considerations for ranking matches might be:

*  the number of events in a person's profile (e.g., birth, baptism, marriages, immigration, military service, census, residences, occupations, death, burial, etc.).  Add the number to the other items.

*  the number of spouses and children in their family list.  Add the number to the other items.

*  do they have parents listed on the profile?  Add the number to the other items.

If we used that sum (sources, attached records, uploaded media, events, parents, spouses and children names), then Match #1 on the Norman Seaver list would have 5, #14 would have 38, and #95 would have 30.

In a real fancy algorithm, all of those things could be weighted somehow - maybe more weight for number of people and sources, and maybe less for number of events.  
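
A weighted version might look something like the sketch below.  Everything in it is an assumption for illustration: the weight values are arbitrary guesses that only show the idea of counting people and sources more heavily than events, and the sample profile is invented rather than taken from any real tree.

```python
from typing import Mapping

# Hypothetical weights: more for people and sources, less for events.
WEIGHTS = {
    "sources": 2.0,
    "attached_records": 1.0,
    "uploaded_media": 1.0,
    "events": 0.5,
    "parents": 2.0,
    "spouses": 2.0,
    "children": 2.0,
}


def weighted_richness(counts: Mapping[str, int],
                      weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted sum of the profile counts; a missing count is treated as 0."""
    return sum(weight * counts.get(name, 0) for name, weight in weights.items())


# An invented profile, not one of the Norman Seaver matches.
sample_profile = {
    "sources": 2,
    "attached_records": 3,
    "uploaded_media": 1,
    "events": 4,
    "parents": 2,
    "spouses": 1,
    "children": 3,
}

score = weighted_richness(sample_profile)
print(score)
```

Setting every weight to 1.0 gives back the plain sum, so the weights can be tuned without changing anything else.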

All of the above assumes, perhaps naively on my part, that each person who submits a family tree tries to do the best they can for each person in the tree, and they don't waste time by adding extraneous or duplicate events or sources or media to persons in their tree.  

A match list with my example ranking system would put the target profiles of trees I want to review right at the top of the match list, and trees with virtually no information beyond a name at the bottom of the match list.  

What do you think?  Would a list that ranked the "richest" tree profile at the top of the match list help you search Ancestry Member Trees?  

What other ideas do you have to improve the search experience?  

The URL for this post is:

Copyright (c) 2014, Randall J. Seaver


Anonymous said...

In the source count don't include Trees - or make its own category

Anonymous said...

I'm not overly worried about them adding new features as much as I'd like to see them fix things they broke when they forced everyone to start using the new image viewer. Now certain records can't be attached to a tree anymore and they got rid of having both page # and image # which was useful in some cases. They've known about the problems the new viewer introduced and have acknowledged they're a problem but have offered no timeline for a fix. They even knew about the problems while the new viewer was still in beta, but forced it on everyone anyway.

Jackie Corrigan said...

I would love for results to be sorted by the number of records attached. 99% of the time sources turn out to be other undocumented trees. Only rarely do I find a source to be a document that is worthwhile.

bgwiehle said...

Wishlist for ancestry tree matches:
1. Exclude other ancestry trees from the source count, or show as separate category. Sometimes I've been able to figure out who copied from whom but often tree owners cross-reference back!
2. Include the last edit date or date of last visit by the tree owner. Date may help distinguish between a preliminary and a superseded or abandoned tree.
3. Show the relationship of the root person to the searched person. Ancestors are not hard to identify, but some close collateral relationships take a lot of work to hunt down. Sometimes I'm looking for close connections (who may want my help) rather than just filling my gaps.

Number of sources, count of attached media and parents' names are already shown in the match list.

Potential problems with rating based on attached person counts: Since some people have duplicated sets of spouses and children, having more connected persons might not be a good measure (maybe an arbitrary cut-off?). And those who are presenting simple pedigrees would only show a single child in each generation.