Friday, May 3, 2013

Q&A About MyHeritage Census Records and Record Matches

After MyHeritage announced the availability of the 1790 to 1930 U.S. Census records (see MyHeritage Adds 1790 to 1930 United States Census Records), I emailed Daniel Horowitz, the Chief Genealogist and Translation Manager for MyHeritage, with some questions and asked permission to share the answers on Genea-Musings.

Here are my questions (Q) and Daniel's answers (A):

Q1)  What is the source of the census images?  Are they from FHL microfilms, or NARA films, or another source?

Q2)  Did MyHeritage create an independent index?  If so, using paid or volunteer indexers?  From which countries?  If it's not independent, where did you obtain it?

A1-2) We cannot comment about this due to confidentiality agreements.

Q3)  Why did you not include the NARA Microfilm Publication number and the NARA Roll number in the indexing?  Is the "Roll number" on the record summary an FHL microfilm number?

A3) We did not have this info available. However, we will try to obtain this and add it.

Q4)  How did you choose which fields to index?  For instance - for 1880 to 1930, the father's birthplace and mother's birthplace are very useful bits of data, and the Ancestry search can use them to narrow a search.

A4) Agreed. At the end of the day the transcription costs dictate what is indexed.

Q5)  Is the 1940 U.S. census completely imaged and indexed?  If not, which states are not complete?  If not, when will the 1940 census be completely indexed?

A5) The 1940 US census is complete (images + index) but the last batch of the index (about 4%) is still in our QA lab and we expect that we'll release this last batch in about one more week. For now the images on the site are complete and the index on the site is 96% complete.

Q6)  When I'm in a Record Match screen, why can't I click on another person in the household and "Confirm" or "Reject" that match also (if it's in the Record Match list)?  In my example today, the husband, mother and two children are all in my MyHeritage tree and I would prefer to Confirm them all by clicking on their name and confirming, rather than having to do it 5 times from five different Record Matches.

A6) This is a known issue that will be addressed in our product roadmap.

Q7)  Have the Record Match counts finished updating?  My number of Record Matches is very low for some census years.  My guess is that the search is not complete, and that it will take some time to process everyone's tree.  For instance, I have 116 source citations in my tree for the 1850 census, but the Record Matches show 5.  I have 420 sources for the 1900 census, and MyHeritage shows 588, which means I have work to do!

A7)  Record Matches are still being calculated. The 1850 census does not provide many matches, so do not expect many more, because it did not list each household member's designation (e.g. wife, son, servant). This was added in later census years. Without it, automatic matching has a harder time reaching conclusions without risking accuracy. We don't just give you a "match" if we find a James Smith born 1820 in a family tree and that's it; more evidence is needed that this is the right one, such as relatives and their birth dates. However, we may be able to optimize this and make some safe assumptions about relationships. For example, if we have a household with James Smith age 56, Wilma Smith age 50, and 3 other Smiths in the household aged 10 to 16, we can assume head, wife and children and try to match based on that assumption. Since the census was added to MyHeritage only yesterday, it will take us a bit more than one day to develop these extra smarts, but we will, and you will be able to get more matches from census years like 1850. Having 588 matches on 1900 vs. the 420 that you have does mean you have work to do; it means MyHeritage has an excellent and highly accurate matching technology, and thanks to that you have your work cut out for you...
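
To make Daniel's household example concrete, here is a rough Python sketch of that kind of "safe assumption." It is only an illustration: MyHeritage did not describe its actual algorithm, and everything here is my own assumption, including the `Person` class, the `infer_roles` function, the 15-year age gap, and the children's names and exact ages (Daniel only said three Smiths aged 10 to 16).

```python
from dataclasses import dataclass


@dataclass
class Person:
    given: str
    surname: str
    age: int


def infer_roles(household, min_parent_gap=15):
    """Guess head/spouse/child roles from surnames and ages alone.

    Hypothetical heuristic, not MyHeritage's method: the oldest person is
    treated as head, the next same-surname person close in age as spouse,
    and same-surname people at least `min_parent_gap` years younger than
    the younger of the couple as children. Anyone else stays "unknown".
    """
    if not household:
        return {}
    ordered = sorted(household, key=lambda p: p.age, reverse=True)
    head = ordered[0]
    roles = {head.given: "head"}
    spouse = None
    for person in ordered[1:]:
        youngest_parent_age = spouse.age if spouse else head.age
        if person.surname != head.surname:
            roles[person.given] = "unknown"
        elif spouse is None and head.age - person.age < min_parent_gap:
            spouse = person
            roles[person.given] = "spouse"
        elif youngest_parent_age - person.age >= min_parent_gap:
            roles[person.given] = "child"
        else:
            roles[person.given] = "unknown"
    return roles


# Daniel's example household; the children's names and ages are made up.
household = [
    Person("James", "Smith", 56),
    Person("Wilma", "Smith", 50),
    Person("Amos", "Smith", 16),
    Person("Ruth", "Smith", 13),
    Person("Eli", "Smith", 10),
]
roles = infer_roles(household)
```

With inferred roles like these in hand, a matcher could compare the whole family group against a tree rather than one name and birth year, which is the extra evidence Daniel says is needed.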

My thanks to Daniel for answering my questions quickly and succinctly.  There are some interesting comments there about the census records and Record Matching.

For what it's worth, my guess is that the census images were obtained from the Family History Library microfilms, since the source citations include the FHL microfilm numbers.  Who provided the indexes, or did MyHeritage do their own?  Anyone have a guess?
