Wednesday, February 23, 2011

More on "Software Programs, GEDCOM Files, and Source Citations"

In my recent post "Software Programs, GEDCOM Files and Source Citations - Some Recommendations," I made some recommendations concerning these issues.  I received only three comments, but I want to discuss the points they raised:

1)  Geolover commented: 

"...Now the question is, will the developers be willing to give up having proprietary variants, even if the present groups of "revisionists" (concerning existing GEDCOM or a suitable replacement) should agree on very workable code?"

Randy's comment:  If my recommendations were adopted, the software developers could keep their proprietary variants intact, but add code that would transfer whole source citations without the receiving program or website having to interpret the source citation elements. 

2)  Louis Kessler commented:

"I don't agree with your conclusions.

"1. Putting all info into one field is wrong. Reading programs will not be able to interpret what is there. The various parts of the source citation must be identified, and placing them in the various fields: Author, Title, Publication, etc. is the way that reading programs can understand what is there.

"2. Fields like Footnote, Subsequent footnote, and Bibliography are not data unto themselves. They are basically formatting and template information. Those are not data, and they are not tags in GEDCOM. Those are personal preferences for how you want your data displayed. Doing the formatting should be the job of the program. As long as it receives all the data it needs and understands the concept of that data, then it can format it appropriately to any template you want that it supports.

"3. Italics is another example of formatting. It is not data. This is the same as number 2. The receiving program should do your formatting.

"You've done an incredible job testing out the inputs and outputs of transferring the source citations between these 3 programs. Thank you.

"But you really need to simply identify whether or not these programs are transferring the data correctly, which is THE ONLY important thing. And ignore whether or not any formatting is being transferred, since it shouldn't be.

"Then, given the correct data, you can see if the program can format the source citation correctly with the templates it allows.

"Remember, GEDCOM is to transfer data. The only formatting it should transfer is formatting embedded in notes and data to allow those to look as much as possible like the original. But how to make everything else appear should be up to the program. The program that does the latter job the best for you should then earn your favor and become your program of choice."


Randy's comments:  I greatly appreciate Louis's comments, because he is both a software developer and a genealogy program user. 

I would completely agree with him IF the software developers and website programmers consistently used the appropriate GEDCOM fields to represent source citation elements.  As I've demonstrated in my little studies, they don't.  And the resulting citations are often mangled when transferred via the current GEDCOM. 

In a perfect genealogy software world, there would be ONE agreed-upon (from professional down to beginner, from software user to software developer and website programmer) set of Source Templates that would create "perfect" citations.  The genealogy software and website world is imperfect.  Realistically, the developers won't sacrifice their proprietary source templates.  However, they might be willing to add the complete source citations created by their software to the export, so that source citations are transferred "perfectly."

Using the Evidence! Explained models, the software companies have created hundreds of different templates, each with unique field names.  The template types and data fields in RootsMagic 4 are somewhat different from those in Legacy Family Tree 7, and Family Tree Maker 2011, and all other programs.  Nothing is exactly the same, nor do they always use similar GEDCOM tags.  Creating a comprehensive list of source citation tags for the "next GEDCOM" would be an onerous task, and then the software companies and website developers would have to agree on them.  Will that ever happen?

When there are four different elements in one Source Template in one program, and ten elements in another template in the same program, and another software program has different fields, it becomes too unwieldy to deal with using hundreds of GEDCOM tags.  There is no master "translator" for all of the software programs and online family tree websites.  There likely will be none in the foreseeable future.

Any attempt to force this source template information into the existing GEDCOM tags likely will result in source mangling and bad formatting when read and then published by another software program or family tree website.  An alternative to creating many more source tags might be to add new GEDCOM tags for Element 1, Element 2, Element 3, etc. These Elements would represent the different source citation fields and would be created in the order that the Source Templates create the citation.  However, that would still cause formatting problems for the reading program or website, unless formatting indicators (italics, underline, bold, etc.) were included. 

That's why I recommended what I did - to create new GEDCOM tags that seamlessly transfer whole citations with formatting, and that can be attached to Evidence, Conclusions and Assertions (AKA Facts).  These "Facts" clearly have definite data fields - names, dates, places, relationships, etc.  The different software programs do this pretty well because the fields for them were defined in the GEDCOM development back in the 1980s.  They don't transfer Source Citations well because the genealogy technology (Source Templates) has now exceeded the GEDCOM system capabilities.
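To make the idea concrete, here is a minimal Python sketch of what exporting whole citations might look like.  Everything in it is an assumption for illustration: the underscore tags `_FOOT`, `_SUBQ` and `_BIBL` are hypothetical user-defined tags (no program is known to use them), the `<i>` markup for italics is just one possible formatting indicator, and the sample citation is made up.  Long values are continued on CONC lines, which is how the current GEDCOM already handles long text.

```python
def split_value(text, width):
    """Split text into chunks of at most `width` characters for CONC lines.

    The cut is moved back so it falls inside a word where possible, because
    spaces at the edges of a GEDCOM line value can be lost on import.
    """
    chunks = []
    while len(text) > width:
        cut = width
        while cut > 1 and (text[cut - 1] == " " or text[cut] == " "):
            cut -= 1
        chunks.append(text[:cut])
        text = text[cut:]
    chunks.append(text)
    return chunks


def citation_lines(level, footnote, short_footnote, bibliography, width=60):
    """Return GEDCOM lines carrying three finished citation strings.

    _FOOT, _SUBQ and _BIBL are hypothetical user-defined tags.
    """
    lines = []
    tagged = (("_FOOT", footnote), ("_SUBQ", short_footnote), ("_BIBL", bibliography))
    for tag, value in tagged:
        first, *rest = split_value(value, width)
        lines.append(f"{level} {tag} {first}")
        # CONC values are joined to the previous value with no separator.
        lines.extend(f"{level + 1} CONC {chunk}" for chunk in rest)
    return lines


# Made-up sample citation; <i>...</i> marks italics (an assumed convention).
for line in citation_lines(
    2,
    "Jane Example, <i>A Sample Town History</i> (Sampleville: Example Press, 1900), 12.",
    "Example, <i>Sample Town History</i>, 12.",
    "Example, Jane. <i>A Sample Town History</i>. Sampleville: Example Press, 1900.",
):
    print(line)
```

A receiving program that does not recognize these tags could ignore them and fall back on the standard fields, while one that does recognize them could display the citations exactly as the sending program formatted them.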

There is no requirement in my recommendations that users use the Source Templates provided by the software developers, or that they follow any model at all.  If a user wishes to source a birth record with "Westminster MA book" or "birth certificate found in Aunt Mary's attic," that's fine.  It's a source citation, and would be put into the TITL tag in GEDCOM by every program.  It's the user's choice, and the current GEDCOM will transfer it well, because it is very simple. 

However, "quality" source citations are not simple - see Evidence! Explained.  It is the de facto genealogy standard, whether some users like it or not.  My online family tree will be judged by others based on its content, including the source citations.  In all likelihood, most genealogists will have to go through what I've been doing in my own database - making "shorthand" source citations into "quality" source citations.  It's the user's choice.

I think that what the users want is simplicity and ease-of-use - to be able to use the fill-in-the-blanks source templates fields, or use free-form fields, and create whole source citations that can be transferred from one program or website to another program or website, without losing or mangling the citation elements and format.

Your comments are welcome!  What do users really want?  How can it be done?

Will the Seaver Source Citation Saga continue?

4 comments:

Cousin Russ said...

Randy,

I really appreciate your continuing dialog on this topic.

As an End User of a Genealogy Software Package, I only want ONE thing.

The ability to Share my research with another person without loss of any information, especially my Source Citations.

The program or where the application is, should be transparent. Using the Same program as the "other" person, or a different program should be transparent.

HOW that gets done, will not be easy, I clearly understand.

I just want to be able to share my research seamlessly.

Russ

Tessa Keough said...

Randy - I mentioned your source citation saga in a comment on Clue Wagon's blog - the discussion over the importance and method of source citations. I think you also posted there. This is an important discussion to be having now and I wonder if any discussion took place at the RootsTech conference (or perhaps could be put on the agenda for 2012).

We need standard source citation transfer (if that is the correct term). I want to be able to use my favored database program (Legacy Family Tree) but also have the citations I enter in that program go seamlessly into a different program or in the cloud or whatever. I take time to source and I have been following your saga. I want to know that what goes into one program can be completely read by another program. I would not think this would be too difficult but it will require that the developers of the various programs are on the same page.

No one wants to redo their work; no one wants to go through hundreds of citations to see what was missed in the transfer from one program to another. Everyone wants ease of use, consistency and professional results.

Of course the big question is how do we get it - developers out there need to help the end users as we are the customers. I appreciate your comparisons of what the programs do right now and we need to encourage the developers and programmers to make our lives just a bit easier!

Perry said...

I completely agree with your assessment, Randy. It would be wonderful if every software package could agree on standard templates and GEDCOM formatting. But it's just never realistically going to happen.

I would be perfectly satisfied if GEDCOM had Footnote, Subsequent Footnote, and Bibliography tags. The user could use their favorite program's templates to create their citations, and when they needed to be imported into another program they could be imported as "free form" citations.

I would rather have all of my formatting transfer with no tag analysis than to have all the tags analyzed and have the formatting screwed up for every single citation.

TreeTraverser said...

I am thankful you are addressing the issue of source citations and source templates. With your popularity as a blogger, you are increasing the exposure of this issue. The ability to easily create, modify and distribute source citations is critically important to genealogical research.

One should not have to manually create free-form source citations when your genealogy program has a source template feature. Source templates should be a tool and not a hindrance. As a tool, it should prompt you for the information necessary and then construct an appropriate citation. It should also allow you to easily modify the citation when new or different information is discovered.

I bought into the idea of source templates with RootsMagic 4. I converted my free-form source citations using its source template feature. In the process I discovered I had incomplete source information, which the templates helped me to fill in. That's great, but now I'm stuck. All the effort I spent on those source citations can only be used within RootsMagic. I cannot export my research from RootsMagic to be used by any other application, even if that application understands the RootsMagic-proprietary format.

For example, the title of one of my sources appears as follows in a GEDCOM source record:

1 TITL Michigan, Michigan Marriages 1868-1925, , , ; digital image, FamilySear
2 CONC ch and Genealogical Society of Utah, FamilySearch (http://Family
2 CONC Search.org : accessed ).

Notice the extra punctuation where missing source information should appear, and also the other missing words. That information is all stuffed in the PAGE tag back in the source citation:

3 PAGE downloaded; 11 June 2008; LDS 2342749; Digital GS 4047122; Image 410; Volume 3, Record 44; John Doe; 10 June 1922

Genealogy software that provides a source template feature should export the resulting, well-formed source citation. Otherwise, the source information is useless to any other program. How a program internally stores and recovers its own template structure is immaterial. There are user-defined GEDCOM tags available for that.
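The problem above can be reproduced with a few lines of Python.  This is only a rough sketch, not how any genealogy program actually reads GEDCOM: the function names are made up for illustration, and the "empty slot" check is a simple heuristic that counts commas followed directly by another comma or semicolon.  It joins the CONC lines the way a reading program would and shows that the reassembled title still contains the blank template fields.

```python
import re


def join_continuations(gedcom_lines):
    """Merge CONC/CONT continuation lines into the line they extend."""
    records = []  # each item is [level, tag, value]
    for raw in gedcom_lines:
        level, _, rest = raw.strip().partition(" ")
        tag, _, value = rest.partition(" ")
        if tag == "CONC" and records:
            records[-1][2] += value          # CONC: append with no separator
        elif tag == "CONT" and records:
            records[-1][2] += "\n" + value   # CONT: append on a new line
        else:
            records.append([int(level), tag, value])
    return records


def count_empty_slots(citation):
    """Count commas followed directly by another comma or a semicolon."""
    return len(re.findall(r",\s*(?=[,;])", citation))


sample = [
    "1 TITL Michigan, Michigan Marriages 1868-1925, , , ; digital image, FamilySear",
    "2 CONC ch and Genealogical Society of Utah, FamilySearch (http://Family",
    "2 CONC Search.org : accessed ).",
]
title = join_continuations(sample)[0][2]
print(title)
print("empty slots:", count_empty_slots(title))
```

The joined title reads as one string, but the gaps remain because the missing values were exported separately in the PAGE tag, and a receiving program has no way to know where each one belongs.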

My source citations have been stuck in RootsMagic for almost a year now, and it seems this critical problem may never be addressed. I hope your series of posts forces software vendors to address the need to be able to export, share, and use source citations outside of their own programs.