Monday, August 22, 2011

Comments on the 1940 U.S. Census RFQ and SOW

Links to the 1940 U.S. Census Request for Quote and Statement of Work to host and provide access to it, and selected highlights from the RFQ and SOW, were provided last night.  I have commentary and questions about this information:

1)  NARA could contract with more than one contractor.  That might be smart: one contractor might create a significantly better website to provide access than another, and if one contractor fails to complete the job, having another provides some insurance.  Because the contractor hosts and indexes the images for free, having more than one contractor costs NARA very little.  Competition creates quality products.

2)  The FREE access requirement is for searching for, browsing, downloading and sharing the images only - not for the indexing. 

3)  The contract is for one year, with four one-year options for renewal.  What happens if NARA decides not to renew the contract?  After the contract termination, NARA could host and provide access, using the image search index developed by the contractor, on their own site.

4)  I do not see a contract requirement to CREATE a name index.  A name index is mentioned only in Section 3 in the context that the Contractor can create a name index starting on 2 April 2012, but not before. 

5)  The Contractor can also host the census images, and create other products, on its own web site.  Apparently, the Contractor(s) won't have to purchase the images from NARA.  That provides a financial incentive to win the contract.

6)  NARA will offer the 1940 US Census images in digital or microfilm format for sale, after 2 April 2012.  Who would want a microfilm set?  How much will they cost?  Will the cost of the digital set be different from the microfilm set? 

7)  Purchasing the census images will provide an opportunity for all potential name indexers, and will probably spur a competition to complete the name index and to index many fields.  However, the Contract winner(s) will have an advantage early on.

8)  Potential name indexers might be able to use the census images on the Contractor-developed site for free, but that may be difficult to accomplish.

9)  Who are the candidates for submitting a proposal for the NARA Contract?  The obvious answers (to me) are,,,, or some other non-profit or commercial website with the required physical, technical and management resources. 

10)  How long will it take to index the 1940 U.S. Census?  There are 130 million names in the census, with 40 or 50 indexable fields on the sheets - that's a lot of indexing.  At one name every two minutes (I don't know how long it will really take), that's roughly 500 person-years of round-the-clock indexing.  Who has the resources to do this indexing?  My guess is that only and have the financial and/or volunteer resources to do this - who else has indexed large databases from scratch recently?

11) said in 1940 Census To Be Free on that "...more than 3.8 million original document images containing 130 million plus records will be available to search by more than 45 fields, including name, gender, race, street address, county and state, and parents’ places of birth. It will be’s most comprehensively indexed set of historical records to date."  Will be able to index the complete census by the end of 2013, when access to the index (and perhaps the images on Ancestry) will go behind their subscriber wall?  I think that Ancestry used paid indexers for the 1930 census, and many other large databases.  Do they have enough volunteer indexers to do this task in 21 months or less?

12)  Has FamilySearch made any statements about the 1940 U.S. census hosting, access and indexing?  I don't think I've seen a report about it, but I've heard some rumors.  The United States Census Population Schedules, 1940 (Family Search Historical Records) Research Wiki page says:  "This article describes a collection of historical records that is scheduled to become available for free online at FamilySearch" but doesn't provide any more details at this time.
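
As a rough check on the arithmetic in item 10, here is a minimal sketch of the person-year estimate.  The 130 million names and the two-minutes-per-name rate come from the post; the function name and the 2,000-hour working year are my own assumptions, added only to show how much the answer depends on what counts as a "year."

```python
def person_years(names, minutes_per_name, hours_per_year):
    """Convert a total indexing workload into person-years."""
    total_hours = names * minutes_per_name / 60
    return total_hours / hours_per_year

NAMES = 130_000_000      # names in the 1940 census (from the post)
MINUTES_PER_NAME = 2     # the post's guess at the indexing rate

# A "year" of nonstop indexing: 24 hours a day, 365 days a year.
round_the_clock = person_years(NAMES, MINUTES_PER_NAME, 24 * 365)

# Assumption: a working year of about 2,000 hours (50 weeks x 40 hours).
working_year = person_years(NAMES, MINUTES_PER_NAME, 2000)

print(f"Round-the-clock: {round_the_clock:,.0f} person-years")
print(f"2,000-hour year: {working_year:,.0f} person-years")
```

Under these assumptions the round-the-clock figure comes out near 495 person-years, close to the number quoted in item 10, while a 2,000-hour working year gives roughly 2,167 person-years.  Either way, the job needs thousands of indexers to finish in a year or two.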

What questions do you have about this 1940 U.S. Census contract to be awarded by NARA?  List them here in Comments, or write your own blog post, and I'll create a compendium of questions that need answers from NARA and/or the Contractor(s).

Denise asked (in Comments):  "It appears that NARA wants to keep access at no cost forever, but Ancestry is only offering it for free for a limited time."

My opinion:  I think that NARA will eventually host the census on its own site, with one of the name indexes available from some provider for use in a NARA facility.  That's the type of agreement they made with,,, etc. to permit indexing of other NARA databases they are free to access in a NARA facility.  Ancestry is offering free access through 2013 for the census images, not necessarily the complete index.  It's logical to assume they will index this collection as quickly as possible, state-by-state, but it may not be complete by the end of 2013.


Unknown said...

Randy, you said "I think that Ancestry used paid indexers for the 1930 census, and many other large databases."

Perhaps this time, Ancestry can use some volunteer indexers and actually have some accuracy about the records? I feel like Ancestry's indexes aren't even going through quality control anymore. It's almost as if they just put them out there and leave it up to the subscribers to fix. Quite frustrating. Hopefully, that won't happen with the 1940 census ... otherwise, what's the point of an index?

I'm really keeping my fingers crossed for FamilySearch ... mainly so I can get my hands on them and do some indexing! :)

I think a lot of smaller companies would be hard-pressed to compete with the likes of Ancestry and FamilySearch when it comes to allowing 10 million hits PER DAY and 25,000 concurrent users (or more) without crashing. I know I would certainly crash if someone hit me 10 million times in one day.

Geolover said...

 did not do more indexing for the 1930 US Census than was done originally, as far as I can see; for example, the birthplaces for non-heads-of-household are missing (except for non-related persons in households).

I don't see how can be in the running; they seem barely to have enough server power to keep the existing site running, and indexers have complained about server failures during peak-use periods. And it is sooooo slooooow.