Saturday, January 10, 2009

Day 2 in SLC - A Visit to TGN - Part 3

I had a choice last night - to write another post about the meeting at TGN (and stay up late to get it right) or to get a good night's sleep and be ready for a big research day at the Family History Library. I chose the latter!

Following up on Part 1 and Part 2 on this topic - I see that I basically gave you facts and numbers that could be obtained from a TGN presentation or press release. I thought that they would provide background for this followup post.

One of the most interesting moments of the day was when Andrew Wait provided a vision for how the genealogy community could collaborate with data providers like Ancestry.com. The vision was an Individual Page for every person being researched (are we thinking of a billion or more pages?). This would be in a wiki format (presumably similar to Footnote.com's Person Pages), with photographs, stories, sources, research notes and proof arguments attached, along with data records (census, military, probate, deed, family papers, immigration, naturalization, cemetery, newspaper, etc. - whether from Ancestry.com, another database provider or from a contributor).

This sounds ideal to me, but it will require a major effort by the genealogy industry to educate the genealogy community, who are really the only people that can populate a wiki like this, at least for generations that don't have census, military, immigration and other records available in databases at present. There are already several wiki-based web sites for this type of collaboration - the largest one is at www.WeRelate.org, I think. Very few researchers have even tried the collaboration aspect of the WeRelate site, and few persons search the site. Frankly, I am really leery of a commercial company hosting a wiki of this nature, although Ancestry.com has the critical mass of users and the resources to introduce it and try to make it work. A collaboration of the commercial providers with FamilySearch and other non-commercial providers might be the answer.

Here are summaries from the last four presentations on Friday.

1) Kenny Freestone made his presentation on Friday about Ancestry Family Trees. He didn't talk at all about the legacy family trees - Ancestry World Tree, One World Tree or Online Family Trees. His presentation was all about Ancestry Family Trees - the Public Member and Private Member Trees that have been added over the past two years. Here are some of the main points that I gathered from the talk:

* "Hinting" - this means the "little green leaves" on the pedigree chart when you are working with an Ancestry Family Tree - is a difficult computer problem. The purpose of "hinting" is to provide the researcher with the most likely matches for a particular person in the Ancestry.com databases. It is the "low hanging fruit" that can be easily found with a name, location and lifespan match. The matches may be new information for a user. Since they implemented it, 170 million hints have been "accepted" - meaning the image and information have been attached to the user's family tree. He said that 85% of the offered hints are accepted by the user - that's a phenomenal number, I think.

* There are 8.3 million trees in Ancestry.com from 7.3 million users (remember that you don't have to be a subscriber to put your data in the tree system, but you do have to be a subscriber to accept the hints). There are 810 million names in the trees, and 14 million photos have been attached. 2 million invitations to family have been sent, but not many people have contributed content to someone else's tree. They have designed the tree system for collaboration between family members and other researchers with common ancestors.

* They see three major problems with the family tree collaboration system - data vampires (those who take data without giving any - transparency is the solution here); perpetuating errors (adding other persons' bad data creates more bad data - the solution may be user acceptance of merging data); and privacy (hiding details of living people - there are already Public trees and Private trees, and both permit someone to send a request for more info to the submitter; there will be a Hidden tree also for those that don't want any information shared but want a family tree online).

* There will be a new Tree Viewer on the Ancestry trees. It will close the gap between the current web-based tree and the FTM desktop tree. It will add a report capability like the FTM 2009 reports. The biggest issue for this is synchronization - if a user modifies his tree in FTM 2009 and tries to replace the web-based tree, the problem is lost attached data. They didn't say it explicitly, but I wouldn't be surprised if the web-based tree page looks like FTM 2009, with notes and sources available on the web page (as opposed to being well hidden now).
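
To make the "hinting" idea above a bit more concrete, here is a minimal Python sketch of a name, location and lifespan match. This is my own illustration and not Ancestry's algorithm, which was not described in detail. The Person class, the is_hint and find_hints functions, and the two-year tolerance are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record shape; real database records carry far more fields.
    name: str
    birth_year: int
    death_year: int
    place: str


def is_hint(tree_person, record, year_tolerance=2):
    """Return True when name, place and lifespan all line up (assumed rule)."""
    same_name = tree_person.name.strip().lower() == record.name.strip().lower()
    same_place = tree_person.place.strip().lower() == record.place.strip().lower()
    close_lifespan = (
        abs(tree_person.birth_year - record.birth_year) <= year_tolerance
        and abs(tree_person.death_year - record.death_year) <= year_tolerance
    )
    return same_name and same_place and close_lifespan


def find_hints(tree_person, records):
    """Collect every record that passes the simple three-way match."""
    return [record for record in records if is_hint(tree_person, record)]


# Usage with invented people: only the first record is close enough to hint.
ancestor = Person("John Smith", 1820, 1890, "Ohio")
records = [
    Person("john smith ", 1821, 1890, "ohio"),
    Person("John Smith", 1850, 1920, "Ohio"),
    Person("Jane Smith", 1820, 1890, "Ohio"),
]
hints = find_hints(ancestor, records)
```

A match this strict only finds the "low hanging fruit"; it would miss spelling variants and place-name changes, which is presumably part of why the real problem is a difficult one.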

2) Mike Wolfgramm's presentation on Content Technology touched on 19 items in the content pipeline, but he only described some of them (and I didn't catch all of the items in my notes). The ones I did capture included:

* They are seeking patents for many of their technology items, which is smart business.
* Scan Manager creates a scanned database faster with automated tools.
* Automatic cropping of images
* Smart watermarking puts the "Ancestry.com" or "MyFamily.com" marking in a blank space on an image.
* They are using ultra-violet or infrared imaging on hard-to-read images (the example was pencil entries in the 1851 England census that have faded badly due to age and water damage).
* Binarization - changing some gray-scale images to black and white to improve readability when OCRed. The claim was that Ancestry's OCR error rate is 0.04% (1 in 2,500).
* Keying tools - they have two companies in China keying entries into indexes based on character recognition (i.e., like a census line, where they don't have to make sense of the content), and one company in Uganda keying entries with context (Uganda's national language is English).

I'm not an expert in any of these technologies so I can't comment about them. I'm happy to see them trying to improve the technologies, though.
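
For readers wondering what binarization means in practice, here is a toy Python sketch under simple assumptions: the image is a grid of gray values from 0 (black) to 255 (white), and one fixed cutoff decides which way each pixel goes. The binarize function and the cutoff of 128 are my own inventions for illustration; the presentation did not say how Ancestry picks its thresholds.

```python
def binarize(gray_rows, threshold=128):
    """Map each 0-255 gray pixel to pure black (0) or pure white (255).

    The fixed threshold is an assumption; real systems often choose it
    per image or per region.
    """
    return [
        [255 if pixel >= threshold else 0 for pixel in row]
        for row in gray_rows
    ]


# A tiny invented "scan": faded ink (90, 60) on grayish paper (180-210).
faded = [
    [200, 90, 210],
    [180, 60, 190],
]
clean = binarize(faded)
```

After this step the faded ink is solid black and the paper is solid white, which is the kind of contrast that helps OCR software.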

3) Anne Mitchell's presentation was about the Search function, and drew many comments. Anne pointed out that most of the Ancestry databases have names, dates, places, ages, and relationships in the data fields. The names may be spelled phonetically, with different spellings, with abbreviations, or with translated words by the recorders, and be mis-transcribed by the recorders and/or indexers. For example, they have identified over 800 ways to spell Catherine, including nicknames.

* Rather than just stick with Exact matches to the user's information, their Ranked Matches, which create fuzzy searches, are based on an algorithm that gives different weights to different data items. They are trying to create smarter algorithms that drive the best matches to the top of the list.

* The Vertical Search (which users know as New Search) checks over 5 billion records and nodes, which reside on many different servers. The dynamic web page displaying the results then has to be built and delivered within a few seconds.

* They are trying to improve the Ranked search results so that the absolute best matches are at the top. Name penalization (move names that don't match down the list), date penalization (move dates that don't match down the list) and place penalization are being considered. It means changing the algorithms.
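
The weighting and penalization ideas in the points above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the field names, the weight and penalty values, and the score and ranked_search functions are hypothetical, and Ancestry's actual algorithm was not shown.

```python
# Assumed values: a matching field adds its weight, a mismatch subtracts
# its penalty, and a field missing from the record is ignored.
WEIGHTS = {"name": 5.0, "date": 3.0, "place": 2.0}
PENALTIES = {"name": 4.0, "date": 2.0, "place": 1.0}


def score(query, record, weights=WEIGHTS, penalties=PENALTIES):
    """Score one record against the searcher's query."""
    total = 0.0
    for field, wanted in query.items():
        found = record.get(field)
        if found is None:
            continue  # missing data neither helps nor hurts
        if found == wanted:
            total += weights.get(field, 0.0)
        else:
            total -= penalties.get(field, 0.0)
    return total


def ranked_search(query, records, weights=WEIGHTS, penalties=PENALTIES):
    """Return the records sorted so the best matches come first."""
    return sorted(
        records,
        key=lambda record: score(query, record, weights, penalties),
        reverse=True,
    )


# Usage with invented records.
query = {"name": "catherine", "date": 1850, "place": "ohio"}
records = [
    {"id": "C", "name": "katherine", "date": 1850, "place": "ohio"},
    {"id": "B", "name": "catherine", "date": 1861, "place": "ohio"},
    {"id": "A", "name": "catherine", "date": 1850, "place": "ohio"},
]
ranked_ids = [record["id"] for record in ranked_search(query, records)]
```

Because the weights and penalties are passed in as ordinary arguments, a searcher could in principle choose which factors to emphasize by supplying different values.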

4) Scott Sorensen's presentation on Emerging Technology was the last one of the day. His topics were:

* The Ancestry DNA prices will come down on 13 January ($79 for a Y-DNA 33 marker test, which was $149; and $149 for a Y-DNA 46 marker test, which was $199). Their goal is to attract more users who will contribute to their DNA database. They presently have about 30,000 entries in the database, and hope to reach 150,000. Persons who have had their DNA tested by another company can enter their data into Ancestry's database and family tree. They want to do more user education on genetic genealogy and ancient ancestry.

* They hope to add a Places feature on family trees so that the user can see a map with life events for their ancestors. They talked about a wiki for places that users can contribute their knowledge to. They also want to add user information about genealogy resources available at repositories and on the Internet (more than just on Ancestry.com).

* Family Tree Maker 2009 - This was covered in some depth and caused some discussion during the meeting. My table at dinner last night also discussed this in some detail. The future plans for FTM were also discussed tonight at the TGN-sponsored dinner, and I'll cover it all in a separate post.

There were parts of The Generations Network that were not addressed during this meeting, including MyFamily.com, Genealogy.com, Rootsweb.com, international sites and content, MyCanvas, the Learning Center, and Ancestry Magazine, etc.

I keep reminding myself that this is a Public Relations campaign by Ancestry to influence the attendees. We saw the best view possible of the company and the products. However, the candidness about past and present problems, the enthusiasm and work ethic of the employees and management, and the willingness of TGN to allow this information to be shared must be considered when evaluating these presentations and the meetings.

TGN has publicized many of their plans in a more public way than ever before. They know that the expectations of the subscribers and users will be raised based on this information, and that they have to deliver as promised.

My apologies for this too-long post about half a day of presentations, but I wanted to put my impressions and understandings down on paper. If a TGN employee thinks that I have ignored or miscopied something in their presentations, I urge them to contact me in Comments or at rjseaver@cox.net and I will set things straight with corrections.

In addition to the FTM post, I will write soon about the overall impression I have of the company and its staff and products.

4 comments:

Becky Wiseman said...

Thanks for the updates on the TGN meetings, Randy. If they "fix" the search functionality, I for one, will be a happy camper. Right now I have a love-hate relationship with the ancestry search capability. Most of the results returned on my searches are totally irrelevant most of the time.

These blogger meetings are a *big* PR campaign on their part as they are getting quite a bit of publicity from it. We can only hope that something fruitful and useful will come of it. Skeptic is my middle name...

Sheri Fenley said...

Great reporting Randy!
I have a question - Why are they outsourcing to China and Uganda? Why can't the work be done in the United States?

Harold Henderson said...

Thanks, Randy. It wasn't too long for me! Re search results: would it be difficult to write search-engine code so that I, the searcher, could say which factors to emphasize and which to "let go of" first in the absence of results?

Harold

Becky Wiseman said...

Sheri - my guess would be economics. The bottom line is money. Workers in China and Uganda (and many other countries) can be paid much less than American workers would expect to be paid.