Friday, August 31, 2007

Who will decide what is correct?

Like many genealogy researchers and family historians, I have watched the proliferation of genealogy databases that have erroneous data in them. The examples are numerous: the LDS Ancestral File database, contributed mainly by LDS church members; the user-contributed databases published on CDROMs by FamilyTreeMaker; the user-contributed Rootsweb WorldConnect databases; the user-contributed databases on Ancestry; and many other user-contributed databases.

I believe that almost all of this information was submitted by genealogy researchers based on their own research and/or research performed and shared by others. But the fact is that some of it - not all - some of it is erroneous for whatever reason.

My earlier post - "Was Daniel Boone an Ancestor of Pat Boone" - describes in some detail the dilemma. The result, at least for Pat Boone's patrilineal line, is that 3 out of 39 user-contributed databases were very likely correct. The other 36 submitted databases were very likely erroneous in one or more families.

My understanding is that several web sites intend to create a very large (humongous!) database of the world's genealogy data. I think that something like this is a goal of the New FamilySearch, Ancestry.com, FamilyLink, WeRelate, MyHeritage and undoubtedly other genealogy-related sites that will accept genealogy databases contributed by users. In some of these formulations, other researchers (the genealogy social network) would add content, vital statistics, and notes to them so that each person in the database was connected to the "correct" set of parents and children, back in time, ad infinitum (which should be interesting!).

All of this reminds me of the classic math/science joke in which the critical step is "then a miracle occurs."

Most genealogists are people of good faith who have researched individuals, connected families together, written books and articles, created web pages, and submitted data on hundreds of millions (or is it billions?) of individuals.

Frankly, only by examining the sources of the submitters, analyzing their work, and continually testing and proving hypotheses concerning the structure of every family can such a database be built well.

The problem with any plan to create massive family-linked databases is clear to me -- who will decide which data is correct?

* Will there be a review board of some type that verifies or certifies that a family structure - names, dates, places, relationships - is correct?
* Will they verify/certify one family at a time? Or an entire database?
* Will every database have its own review and verification/certification board, process and results?
* Will only data from a currently certified genealogist be accepted?
* Will there be a blacklist of known erroneous information? Or submitters?
* How will they treat new information that proves or disproves the currently "correct" information?
* How will they deal with submitters who withdraw their data from the web site?
* Won't this be a very large undertaking?
* What happens to the online databases that are already available? Will they be replaced and updated? Will they be taken down and made not available?
* What about the data published on CDROMs, in books, on web sites?

I am pretty sure that these questions have been asked by the smart people at the companies that hope to create these massive family-linked databases, and perhaps they have the answer.

I hope that these types of discussions are occurring with and within the professional and/or certified groups (e.g., BCG, APG, ICAPGEN) that are vital to this discussion and the resolution of these issues.

I don't think that censorship or blacklists are the answer to any of those questions. I do think that education, experience, sharing and collaboration are the long-term answer.

What do you think? I'm sure that many genealogy bloggers and readers have thoughts on this topic - I hope they will write their own posts or make rational and interesting comments on this post. If there are articles previously published on this issue, I would appreciate knowing about them.

UPDATED: 31 August, 12:20 PM for editorial corrections and additions.

4 comments:

Craig Manson said...

Randy,

The cyber-universe, especially the blogosphere, has proven itself adept in many contexts at exposing fraud, correcting mistakes, and asking the right questions. Your Pat Boone/Daniel Boone post is an example of that. With the capability to leave electronic "Post-It" notes on web pages, I think incorrect data will be corrected or fade away. People will evaluate evidence and choose the most persuasive.

Anonymous said...

I'm a fledgling genealogist, in the interest of doing the work my granny never had the resources to finish, and of course, knowing who my people were!

I subscribe to Ancestry.com, and visit numerous message boards, to put this info together.

The problem with Ancestry.com is that in the beginning of my stint with them, I was immediately shown 5 generations of ancestors for every entry I made (at least in some cases). I was amazed and accepted a few of those as gospel until I realized I sometimes already had different (and accurate) information than was being offered by Ancestry.com.

I've used "my account" there to store information digitally, but kept my family tree "private" so as not to spread erroneous information.

Lo and behold, the information is still spread freely by Ancestry.com, with a note that they cannot give away more since my family tree is private.

For example, if you look up one of my brick walls on Ancestry.com, William D. Waggoner b. 1816 in Tennessee, you might be shown a record of him which references my family tree (among others), but also gives the information that I have in the recent past posted his father as (possibly) Isaac Newton Waggoner, married to Sarah Boone. I am fantasizing about this parentage, really, and stored the info there in a space I thought wouldn't be leaked to the unsuspecting masses.

But it is leaked.

I took Isaac Newton Waggoner off my tree when I discovered this, but he hadn't disappeared from the search results when I last checked.

So, "private" is not really "hidden." You can find William D. Waggoner b. 1816, father Isaac Newton Waggoner, mother Sarah Boone, spouse Nancy Munsey. Ancestry.com will keep the rest "private" unless of course you look for each by name. Then it will reveal that next branch of the tree. It just won't show you the whole thing at once.

There is a lot to be desired with this bank of information, and a generous lot that is so helpful to my process. I only accept other family trees as leads, which I then substantiate with census records, probate records, etc.

I just wish Ancestry.com would provide a way to keep your information actually private until such time as you feel it's sourced and documented enough. But, that would defeat their purpose, which is...to disprove the Evolution Theory, by 2010?? (just kidding...)

Michelle

Breeze said...

I too decided to create a "private" tree on Ancestry because posting a tree is the only way to get their computer system to search for hints on your family. A lot of my family tree is not internet ready but now it's out there for others to see. I guess one way for errors not to spread is to notate most of your tree as "possible" or "doubtful."

Unknown said...

This is a topic of interest to me. One possible solution is to take a data-centric approach. Rather than taking a bunch of family trees and trying to merge them all together, start with raw data and let the tree emerge from the data. This is the way automatedgenealogy.com is proceeding at the moment, although I am nervous to be taking a very different approach than everyone else seems to be. At the moment we just associate together all the records for each person, which, admittedly, can introduce errors, but it does eliminate all the unsupported conjecture that is common in many family trees. The answer to your question is then "the individual genealogist"; we're just making their job a whole lot easier rather than doing it for them. In the end, a consensus tree will probably end up being constructed, but the path from here to there will be very different.

Examples of linking:
simple census linking
more elaborate linking

Any comments?