Wednesday, June 1, 2011

NARA is Looking for a Host and Search Engine for the 1940 U.S. Census

...
I came upon a link to the U.S. Government's Federal Business Opportunities website today, and there is a very interesting Request For Information (RFI) titled "Hosting and providing access to the 1940 census."  The due date for the RFI is 22 June 2011.

This page indicates that:

"Solicitation Number:      NAMA-11-RFI-0004
:      Sources Sought
:      Added: Jun 01, 2011 5:13 pm

"The National Archives and Records Administration (NARA) is seeking industry approaches to award a no-cost contract to provide managed hosting and online access to digital images of the 1940 Census when it is released to the public on April 2, 2012. Managed hosting and online access includes providing the public with the ability to search and browse descriptions and digital images of the 1940 Census, zoom and pan the images, and download single or multiple images associated with each enumeration district.

"This RFI is intended strictly for market research and the potential solutions presented may or may not lead to a solicitation."

There is a link to an 8-page document titled 1940 Census RFI.docx (365.83 Kb).  A user can read or save this document.  I encourage readers with an interest in the 1940 U.S. Census to read it.

There are many really interesting statements in this document.  Let me try to summarize it for my readers:

*  NARA is asking for an organization to contract to host all of the 1940 census images, and to create a search engine so that users can browse the images.
*  The search engine must handle a request for an address, an Enumeration District, or a Geographic Location and provide a set of images for browsing purposes. 
*  There are maps with metadata for 325,000 ED descriptions (state, county, city or township, enumeration district) to help users locate and browse a specific set of census images.
*  The users must be able to zoom, pan and/or download the census page images.
*  NARA wants this hosted and searchable at no cost to NARA.
*  There are 3.8+ million 1940 census images and maps, from 4,745 microfilms.
*  NARA has 20 terabytes of digital image data (JPG files, 40 MB each when opened, 4 MB each compressed).
*  The system has to be in place by 2 April 2012 (10 months from today).
*  NARA requires a capability to serve 10 million hits per day for this collection, and 25,000 concurrent users.
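
To get a feel for what those numbers mean for a hosting provider, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above. The decimal units (1 TB = 1,000,000 MB) and the "every hit downloads one full compressed image" worst case are my own assumptions, not statements from the RFI, and the function names are just illustrative.

```python
# Figures quoted in the RFI summary above.
IMAGE_COUNT = 3_800_000        # "3.8+ million" census images and maps
COMPRESSED_MB_PER_IMAGE = 4    # size of one compressed JPG, in MB
TOTAL_DATA_TB = 20             # total digital image data, in TB
HITS_PER_DAY = 10_000_000      # required daily capacity

# Assumption (not from the RFI): decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def storage_tb(image_count, mb_per_image):
    """Total storage in TB if every image has the given size."""
    return image_count * mb_per_image / MB_PER_TB


def implied_mb_per_image(total_tb, image_count):
    """Average image size in MB implied by a total data volume."""
    return total_tb * MB_PER_TB / image_count


def average_hits_per_second(hits_per_day):
    """Daily hits spread evenly over 24 hours (real traffic will peak higher)."""
    return hits_per_day / SECONDS_PER_DAY


def worst_case_gbit_per_second(hits_per_day, mb_per_hit):
    """Average bandwidth in Gbit/s, ASSUMING every hit downloads one full image."""
    megabits_per_day = hits_per_day * mb_per_hit * 8
    return megabits_per_day / SECONDS_PER_DAY / 1000


if __name__ == "__main__":
    print(f"Storage at 4 MB per image: {storage_tb(IMAGE_COUNT, COMPRESSED_MB_PER_IMAGE):.1f} TB")
    print(f"Image size implied by 20 TB: {implied_mb_per_image(TOTAL_DATA_TB, IMAGE_COUNT):.2f} MB")
    print(f"Average hits per second: {average_hits_per_second(HITS_PER_DAY):.1f}")
    print(f"Worst-case average bandwidth: {worst_case_gbit_per_second(HITS_PER_DAY, COMPRESSED_MB_PER_IMAGE):.1f} Gbit/s")
```

At 4 MB per image the collection works out to about 15.2 TB, somewhat under the 20 TB that NARA quotes, so the average file may be closer to 5 MB, or the total may include other material. Ten million hits per day averages out to roughly 116 hits per second, and launch-day peaks would presumably run well above that average.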

Since this is only a Request for Information, NARA is under no obligation to contract with any of the potential suppliers that submit information.  An evaluation of the RFI responses, a Request For Quote (RFQ) to qualified suppliers, and then a Contract Award to one or more suppliers will likely follow, and the whole process may take several months to conclude.

My thoughts about this RFI include:

*  NARA sure waited a long time to ask suppliers to submit information to support this request.
*  NARA apparently did not plan for user access on their own computer servers, or planned all along to have non-NARA servers host the searchable files.
*  NARA wants a supplier to host the digital images and the search engine, and have sufficient server capacity to handle the perceived maximum hits and concurrent users.
*  There is no mention of an index, but we already knew that.  That comes later from one or more providers.
*  Surely Ancestry, FamilySearch, Footnote, and possibly other companies, have been discussing this for months, or years, with NARA.
 
I have some questions for my readers:

*  Which genealogy companies would have the capability to host the images and provide a search engine at no cost?  My guess is only FamilySearch.org and Ancestry.com. 
*  Could a small genealogy company, or a known non-genealogy company trying to break into the genealogy collection arena, possibly win this contract? 
*  Wouldn't whoever wins the contract have to start soon?  That's a lot of large computer files!  An RFQ and a Contract Award are still necessary before the work can even start.
*  3.8 million image files sounds like a lot, but don't FamilySearch, Ancestry, WorldVitalRecords and GenealogyBank have large image collections also? 
*  If an organization gets the contract, will they then have a head start on indexing the images? 
*  Would NARA award two or more contracts for the actual hosting and searching? 
*  Would two or more companies decide to work together to spread the hosting and indexing task and cost? 

What observations, comments, conclusions, and questions do you have?  Tell me, or write your own blog post about this issue.

The URL for this post is: http://www.geneamusings.com/2011/06/nara-is-looking-fior-host-and-search.html

(c) 2011. Randall J. Seaver. All Rights Reserved. If you wish to re-publish my content, please contact me for permission, which I will usually grant, with proper attribution. If you are reading this on any other genealogy website, then they have stolen my work.

6 comments:

Carol Yates Wilkerson said...

My first thought was that since it's a government entity perhaps they have already chosen who will do it and have to put it out there as a formality. We saw that a lot with government jobs at the shipyard where my husband worked.

Geolover said...

Familysearch.org does not have the server capacity to do this. Their present server setup barely handles existing traffic, and parts of the site are already very slow-loading. They may not have the development / engineering staff to handle a large number of additional servers and the requisite programming.

TheGeneticGenealogist said...

Randy - first of all, I'm shocked that the 1940 census has already been digitized!

Second, since NARA (a government agency) created the digital images, they are likely NOT subject to copyright protection, even if someone else hosts the images.

I really like this model.

Blaine

Anonymous said...

What on earth will the search engine search if there is no index of any kind, especially as they say it must find an address?

Linda McCauley said...

This is really interesting considering that less than a month ago David S. Ferriero (U.S. Archivist) said at the NGS Opening Session that the 1940 Census will launch on NARA's website on 2 Apr 2012, that they will have enough server space to handle it, and that the census is "sort of" indexed.

Anonymous said...

I would love to see Google do it! They have the best capabilities and money to do something like this.
I've always wished Google would get into genealogy... if nothing else, to end the strangle-hold monopoly Ancestry has on it.
Can you imagine how versatile, intuitive, and accurate your searches would be? They would eat Ancestry's lunch!