Friday, May 29, 2009

Ancestry.com Content Count

The email from Ancestry.com provided more information about the web site content. Gary Gibb, the Vice President of U.S. Content, supplied this chart of images and records by category:


Based on this information, Ancestry.com (in all of its content areas) has about 240 million images of records (note that this does not include images uploaded by family tree users) and over 8.3 billion records. I'm not sure how a "record" is defined - it's not just names. A record is probably the summary of the indexed information for a particular person, and might be derived from an image. Some databases on Ancestry do not have an image associated with them. All of the records have some sort of source citation, although they are not Evidence Explained quality in most cases.

Those numbers seem fairly staggering to me. A recent press release by Ancestry.com claimed that they had 2.8 petabytes of data stored on over 5,200 servers (one petabyte is 1.048576 million gigabytes). My 2005 Dell Windows XP computer hard drive has almost 84 gigabytes of storage. At about 12,500 drives like mine per petabyte, the Ancestry.com data would need roughly 35,000 computers like mine just to hold it! Then there is the backup problem!
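For anyone who wants to check the storage arithmetic, here is a minimal sketch in Python. It uses only the figures quoted above (2.8 petabytes, 84 gigabytes per drive, and the binary convention of 1,048,576 gigabytes per petabyte); the variable names are mine, and it ignores file system overhead and the space my own files already take up.

```python
# Figures quoted in the post; names are illustrative only.
GB_PER_PB = 1024 ** 2      # 1,048,576 gigabytes per petabyte (binary convention)
ancestry_pb = 2.8          # petabytes claimed in the press release
drive_gb = 84              # capacity of one desktop hard drive, in gigabytes

total_gb = ancestry_pb * GB_PER_PB        # total data expressed in gigabytes
drives_per_pb = GB_PER_PB / drive_gb      # drives needed for a single petabyte
drives_needed = total_gb / drive_gb       # drives needed for all 2.8 petabytes

print(f"{total_gb:,.0f} GB in total")
print(f"{drives_per_pb:,.0f} drives per petabyte")
print(f"{drives_needed:,.0f} drives for the whole collection")
```

Under those assumptions, one petabyte alone fills about 12,500 such drives, and the full 2.8 petabytes comes to roughly 35,000.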

I was surprised by the number of Military images - about one third of the total image count on Ancestry. I looked at the Military databases page and noted that there are over 38 million World War I draft card records for over 24 million men registered. About 8.3 million persons are in the World War II Army enlistment records.

The Military category has 125 million records and 80 million images - about 1.5 records per image. Other categories have a much higher record-to-image ratio - the entire Ancestry collection has a record-to-image ratio of 34.5 (8.3 billion divided by 240 million). The record-to-image ratios for the database categories are:

* Vital Records - 28.3
* Census and Voter Lists - 32.5
* Court/Land/Probate - 3.5
* Directories and Member Lists - 276
* Immigration and Emigration - 5.7
* Military - 1.5
* Newspapers and Periodicals - 56.5
* Pictures, Maps and References - 15.2
* Stories, Memories and Histories - 17.1

I'm not sure that those numbers are useful, but since I took the time to calculate them I decided to post them.
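The ratios above are simple divisions, and a short Python sketch shows the calculation. Only the overall and Military counts appear in the text of this post, so those are the only inputs used here; the function name is my own, and because the inputs are the rounded figures from the text rather than the exact chart values, the results can differ from the listed ratios by about a tenth.

```python
def records_per_image(records: float, images: float) -> float:
    """Return the average number of indexed records per record image."""
    return records / images

# Rounded counts quoted in the post (the chart itself has the exact values).
overall = records_per_image(8.3e9, 240e6)     # whole Ancestry collection
military = records_per_image(125e6, 80e6)     # Military category

print(f"Overall:  {overall:.1f} records per image")
print(f"Military: {military:.1f} records per image")
```

The same function could be applied to each category's row in the chart to reproduce the rest of the list.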
