Monday, January 5, 2009

Are imaging services missing NARA records?

One of the more intriguing threads on the APG (Association of Professional Genealogists) mailing list was started by Tom Kemp under the title "Chicago Marriage Records 1871-1920 going online". While the original post was informative about the subject, controversy soon arose over whether the volunteer or commercial image providers (the ones working for Ancestry, Footnote, FamilySearch, others?) are missing specific records when capturing images from microfilms or original documents.

Peggy Reeves is an experienced genealogist who frequently uses the National Archives in Washington DC to pursue her own and her clients' research. Her first response to Tom's post, and her subsequent responses to other posters, made the following claims (see the original posts for context and more complete commentary):

* "They (the Footnote folks) are scanning in poor quality black-and-white images, and not from the original documents in most cases, but from antiquated NARA microfilm that is difficult to read in the first place." - from here.

* "The Ancestry Civil War pension index is a good example. Many of those cards are difficult to read on the NARA microfilm. Ancestry has a disclaimer that says 10% of the images are "missing". They are NOT "missing". The truth is that the subscription service chose not to include the ones that scanned as all black or all blank, and it's more like 20-30%, not 10%." - from here.

* "Footnote is now scanning Civil War widow's pension files from the original paper. The originals are on papers of many different colors, and sometimes faded and difficult to read. The technology exists to scan these valuable files with high-quality color scanning, but it is not being done." - from here.

* "The problem is that once these files are filmed (no matter how cheap and poor the images are), NARA will then take these files out of circulation so that no one can request to see the originals any more. In other words, we will all be stuck with whatever the subscription services do, and a great deal of valuable information will be forever lost to us ALL, because we won't be allowed to see the originals any more." - from here.

* "FamilySearch recruits volunteers and provides them for use by Footnote, Ancestry, and other Utah-based genealogy vendors. The vendors negotiate contracts with the various record custodians and then they send the FamilySearch volunteers in to bring home the bacon." - from here.

* "It is the longstanding policy of NARA to reduce handling of the original documents by taking them out of public view once they are microfilmed or scanned. Thus, the indexing and images that are getting botched or left out completely for the sake of getting the product to market faster will soon be gone forever from our view, because NARA will make us dependent on those scans. How's that for "preservation"?" - from here.

* "Today I had to get a Civil War pension file for someone. After I looked up their soldier on the microfilm, I decided to do a random spot-check. I rewound the microfilm and jotted down the first 25 pension cards on that particular film. With regard to evaluating the sources that we use, here's an actual example to evaluate. This is from the Civil War era pension index (the one digitized at Ancestry), #T-288, and this sample of the first 25 cards is from roll #402 (name, regiment, certificate #):
1. Roe, Charles - B1 TX Inf - ctf #1187312
2. Roe, Charles B., alias Charles Rogers - D 10 OH Cav & E 8 OH Inf - ctf #464620 inf, ctf #698145 widow Catherine B. Roe
3. Roe, Charles E. - B 1 FL Inf & L(?)22 U.S. Inf (SA War) ctf #1294569
4. Roe, Charles E. - K 89 IL Inf, QMS 89 IL Inf - ctf #1014449 inv, ctf #704736 widow Sarah
5. Roe, Charles E. - K 1 IA Cav & B 12 IA Inf - ctf #189672 inv, ctf #645225 widow Rebecca V.
6. Roe, Charles E. - I 5 MA Inf - ctf #1148997
7. Roe, Charles F. - Unassigned 3 U.S.C Inf, 11 U.S. Inf (Capt), 26 U.S. Inf, C 9 NY Inf - widow only, ctf #536186 Lydia F.
8. Roe, Charles H. - G 12 IL Inf - ctf #1000315
9. Roe, Charles H. - E 156 NY Inf - ctf #853656 inv, app only for widow Cathrene B. #1043795
10. Roe, Charles K. - I 4 MO S.M. Cav - ctf #1112664 inv, ctf #A-6-14-28 widow Frances
11. Roe, Charles O. - G 52 NY Inf - ctf #388195 inv, ctf #941982 widow Helen C.
12. Roe, Charles S. - G 6 IL Inf - ctf #1291570
13. Roe, Charles T. - I 146 OH Inf - ctf #578623
14. Roe, Chauncey C. - F 16 MI Inf - no ctf, app #962733
15. Roe, Chester K. - D 1 MMB USV Inf & A 1 MMB USV Inf - ctf #841542 inv, ctf #A-3-8-28 widow Mary E.
16. Roe, Christopher - F 106 OH Inf - ctf #1010257
17. Roe, Clarke - Unassigned 17 NY Inf - no ctf, app #1229984
18. Roe, Clem - G 3 WI Inf (SA War) (I didn't copy the # for this one), widow Emma
19. Roe, Cornelius B. - D 26 KY Inf - ctf #927453
20. Roe, Cyrus A. - I 50 NY Engineers - ctf #559866 inv, ctf #615547 widow Samira A.
21. Roe, Dalton - D 6 U.S. Inf & E 21 U.S. Inf (SA War) - no ctf, app #1243736
22. Roe, Daniel - B 65 IL Inf - no ctf, app #1267282
23. Roe, Daniel E. - F 27 IA Inf & K 4 VRC - ctf #1077346 inv, ctf #A-1-18-29 widow Louisa R.
24. Roe, Daniel J., Jr. - A 156 NY Inf - ctf #411967
25. Roe, Daniel M. - F 1 FL Cav - ctf #368117

"I invite each of you to evaluate your online source by scanning for each of these names in the Civil War pension database at Ancestry. Know how many you'll find? When I looked them up on the NARA computers today, I found only ONE out of the 25." - from here.

* "NARA researchers have always been able to request to see a document that is not clear enough to read on-screen. We know what is there because we can look at the microfilm indexes ourselves. But when new record groups are scanned from original documents and new indexes to these records are created by volunteers doing tedious work on a profit deadline (omitting the more difficult images, or misreading names for lack of experience), then the record disappears. In other words, how can we know to request to see the actual document if we can't see it on an index to know that it exists in the first place? We can't, and that is how records disappear from our view forever." - from here.

I checked the Ancestry Civil War Pension Index (the one for Microform Series T-288) for the persons on the list above, and Peggy is right: only ONE of them is in the Ancestry database, Charles B. Roe, alias Charles Rogers. I looked for other spellings, etc., and did not find any of the other 24.
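To put that spot check in numbers, here is a minimal Python sketch of the arithmetic. The function names (found_rate, prob_at_most) are my own, and the probability part assumes each card is missing independently of its neighbors, which is probably not true for 25 consecutive cards on one roll (a damaged stretch of film would take out a whole run). Treat it as a rough sanity check, not a measurement of the whole collection. The missing rates tried are the 1% in Ancestry's note, the 10% Peggy recalls from the disclaimer, and the 30% at the top of her own estimate.

```python
from math import comb


def found_rate(found: int, sampled: int) -> float:
    """Fraction of the sampled cards that turned up in the online index."""
    return found / sampled


def prob_at_most(found: int, sampled: int, p_present: float) -> float:
    """Binomial probability of finding `found` or fewer cards out of `sampled`,
    ASSUMING each card is present independently with probability `p_present`."""
    return sum(
        comb(sampled, k) * p_present**k * (1 - p_present) ** (sampled - k)
        for k in range(found + 1)
    )


# Spot check from the post: first 25 cards on roll 402 of T-288, 1 found online.
sampled = 25
found = 1

rate = found_rate(found, sampled)
print(f"Found {found} of {sampled}: {rate:.0%} present, {1 - rate:.0%} absent")

# How likely is a result this bad under each claimed missing rate?
for claimed_missing in (0.01, 0.10, 0.30):
    p = prob_at_most(found, sampled, 1 - claimed_missing)
    print(f"If {claimed_missing:.0%} were missing: P(<= {found} found) = {p:.3g}")
```

Under any of those rates, finding only 1 of 25 by chance is vanishingly unlikely, which suggests the gap on this roll is a block of missing cards rather than scattered unreadable ones. One roll cannot tell us how common such blocks are.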

So far, that is the only concrete example provided by Peggy in response to questions from APG list members.

There are more posts by Tom, Peggy and others, including responses from James Hastings, a NARA manager, and Chad Milliner, who works for Ancestry. Please see the full range of December posts on the list for the complete thread.

I don't know Peggy, and don't know if other experienced NARA researchers have the same opinions. From what I can tell by reading Peggy's posts, she is an experienced researcher, knows her way around the NARA collections, and has opinions about commercial services and the imaging/indexing job being done at NARA. She put herself out on a big limb by making these comments and has held her own, civilly, in responses.

The big issues for all genealogy researchers are:

1) Are all pages of a NARA file being imaged and indexed from the original documents or from microfilm images?

2) If the images are not readable on microfilm, are the original documents brought out, imaged and indexed?

3) If the original or microfilm images are not readable, are place-holders put in the image collection so that a user knows that there is an unreadable page?

4) Are the original documents in the NARA collection going to be available for researchers to observe in the original form in the future?

These issues are important to all of us, and have been treated as such by the posters on the APG list. They are important for all database image and indexing projects, not just the NARA projects.

My opinion is that these issues need to be investigated. More examples should be requested from Peggy and other experienced NARA researchers. Standards should be considered and issued by NARA, the companies and the genealogy community at large, and both NARA and the imaging/indexing companies should abide by the standards.

The needed standard, in my mind, is that a record collection should be completely imaged and indexed, using the best technology appropriate for the task (e.g., if the Navy Pension Cards are on blue cards that come out dark in black-and-white, then image them in color so that the contrast shows), and that original documents (or microfilms of indexes or original documents) should not be discarded or closed off completely from genealogy researchers.

My apologies for the length of this post, but I have not seen another genealogy blogger address this issue and I thought that it deserved exposure and comment. The larger world of the APG mailing list has been exposed to it, which is good, but not everyone reads the APG mailing list.

12 comments:

tagazio said...

These are some interesting accusations.

The one I find most interesting is FamilySearch providing volunteers to do the scanning for Ancestry, Footnote and others...so they all have zero labor costs for scanning. I seem to recall, in that flurry of announcements that FamilySearch and some of the commercial services were teaming up, that what they were doing sounded kind of vague...maybe this is what they were talking about.

I can understand not including records that scanned blank or all black, but they should at least insert place holder documents explaining why it happened.

This kind of puts everything in a different light...if true, what else is being left out? Are we only seeing parts of collections the commercial sites want us to see or are they including everything as we all assumed they were?

Tim

David said...

Good information, thanks for the post.

Abba-Dad said...

Ancestry says:

Please Note: Due to deficiencies in the microfilms of the original source cards (i.e. faded, illegible, etc.), about 1% of the pension cards were not included in this index, and may be re-scanned and included at a later date if legible digital scans can be created. The microfilm rolls of these original source cards may provide additional data for these missing images. The Family History Library in Salt Lake City or The National Archives and Records Administration (microfilm #T288) are excellent sources for the complete collection on microfilm.

So they claim 1% is missing, while you and Peggy found only 4% of that specific roll? That doesn't sound right, does it?

They also say that this 1% will be re-scanned at a later date, but Peggy claims NARA will offline the collection once the scanning is complete. So which one is it?

Is there anyone working on clearing this mess up?

Donna Hague Wendt said...

Well, I bet my missing Civil War ancestors are in the missing scans! Thanks for posting this interesting situation.

Elyse said...

Wow - I am kind of shocked that this is going on. I hope that they investigate this and an outsider with no opinion on the subject and no alliances comes up with a solution.

Jennifer said...

Very cool post, Randy. Thanks.

I think this epitomizes the fact that the road to digitization is fraught with peril. If the parties involved in creating the new, digital archives bungle this transition, we're going to be dealing with the consequences for a long time.

The thing I don't understand is how these projects intersect with NARA itself. When the government made these digitization agreements, weren't there provisions for the quality of the output? Or do they care that little?

Geolover said...

Jennifer asked if there weren't some output quality control provision.

The NARA-Ancestry contract that was posted for comment contained no quality control provision: nothing saying that NARA would review what was put up by Ancestry, nothing saying that 100% of a given database had to be posted.

Here and there are some huge gaps, such as incorrect links or missing images for about 2/3 of the World War I Draft Registration cards for Tioga County, Pennsylvania (every surname beginning with K through Z).

S. Lincecum said...

I've always known going to the original record and viewing it with your own eyes is the only way to really know what it says. Purposefully leaving out records because they are too hard to read or because businesses are too lazy or cheap to "do it right" does not shock me. The fact the record is then taken out of circulation altogether by the government does. That's messed up.

long island document scanning said...

You don't have to worry about equipment or labor costs when you decide to let document scanning pros handle the work.

electronic document management said...

It is possible to have zero labor cost for offering a scanning service, but only because you can get the money to pay for it elsewhere, such as data storage.

Janice Clements said...

Alternatives like file scanning might be the best in that kind of situation. It should be considered.

Jane said...

This is why I don't send our paperwork to the document shredding Los Angeles service until everything is polished and cleared. The authorities should look into this and clear up the mess.