Thursday, February 17, 2011

Software Programs, GEDCOM Files and Source Citations - Some Recommendations

I have experimented quite a bit with how Family Tree Maker 2011, Legacy Family Tree 7 and RootsMagic 4 genealogy software create and transfer source citations (see The Seaver Source Citation Saga Compendium).

I've come to the following conclusions as they apply to Source Citations in genealogy software and transferring them via GEDCOM to another user or via upload to an online family tree:

1)  Each software vendor uses a proprietary set of master Source Templates to represent the Evidence! Explained models, and each writes the source information into a GEDCOM file differently:

*  Family Tree Maker 2011 uses the standard tags for TITLe, AUTHor, PUBLisher and REPOsitory for Free-form and Template Sources in the GEDCOM file, but adds extraneous words and punctuation to the fields.

*  Legacy Family Tree 7 uses the standard tags for TITLe, AUTHor, PUBLisher and REPOsitory for Free-form and Template Sources in the GEDCOM file.

*  RootsMagic 4 does not use the AUTHor and PUBLisher tags, but adds that information to the TITLe field for Free-form Sources and Template Sources in the GEDCOM file.  It also creates its own unique set of GEDCOM tags for sources created from a number of Source Templates based on Evidence! Explained models.

*  Other software programs may have different Source Templates and GEDCOM tags - I've only worked with these three programs.
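As a rough illustration of the differences described in item 1, the following minimal Python sketch reads the SOUR records out of GEDCOM text and reports which of the standard TITL, AUTH, PUBL and REPO tags each record lacks. The sample records, the function names and the sample values are made up for illustration; they are not actual exports from any of the three programs.

```python
import re

# One GEDCOM line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?: (.*))?$")

def parse_source_records(gedcom_text):
    """Return {source_id: {tag: value}} for the level-1 tags of each SOUR record."""
    sources = {}
    current = None    # dict for the SOUR record being read, or None
    last_tag = None   # most recent level-1 tag, for CONC/CONT continuation
    for raw in gedcom_text.splitlines():
        match = LINE_RE.match(raw.lstrip())
        if not match:
            continue  # skip blank or malformed lines
        level = int(match.group(1))
        xref, tag, value = match.group(2), match.group(3), match.group(4) or ""
        if level == 0:
            # A new top-level record starts; only SOUR records are kept.
            current = sources.setdefault(xref, {}) if tag == "SOUR" and xref else None
            last_tag = None
        elif current is not None and level == 1:
            current[tag] = value
            last_tag = tag
        elif current is not None and level == 2 and last_tag and tag in ("CONC", "CONT"):
            # CONC joins text directly; CONT starts a new line.
            current[last_tag] += ("" if tag == "CONC" else "\n") + value
    return sources

def missing_fields(record, expected=("TITL", "AUTH", "PUBL", "REPO")):
    """List the expected tags that a parsed source record does not contain."""
    return [tag for tag in expected if tag not in record]

# Hypothetical sample: @S1@ uses separate fields, @S2@ packs everything into TITL.
SAMPLE = """\
0 HEAD
0 @S1@ SOUR
1 TITL Example County Deeds
1 AUTH Example County Clerk
1 PUBL Example City: Example Press, 1900
1 REPO @R1@
0 @S2@ SOUR
1 TITL Example County Clerk, Example Coun
2 CONC ty Deeds (Example City: Example Press, 1900)
1 REPO @R1@
0 @R1@ REPO
1 NAME Example Library
0 TRLR
"""

if __name__ == "__main__":
    for source_id, record in parse_source_records(SAMPLE).items():
        print(source_id, "missing:", missing_fields(record))
```

A check like this only shows which tags are present. It cannot tell whether a program has added extra words or punctuation inside a field, which still requires comparing the field contents by eye.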

2)  All three programs convey the PAGE (citation details) and REPOsitory tag information correctly and consistently.

3)  Each program can transfer the Source Citation information in the native program format to another user of the same program.  In other words, an FTM 2011 user can read the file obtained from another FTM 2011 user without loss of source content.

What is important in all of this is that source citation information, whether created from a Free-form source or a Source Template, be preserved when transferred to another user, including to or from an online family tree.  Many researchers work long and hard to cite their sources and to use recommended source citation models, such as those in Evidence! Explained.  Here are three recommendations:

1)  The surest way to have a Source Citation "survive" the GEDCOM experience for transfer to another program, or to an online family tree website, is to create the master Source Citations with all of the information in one field - for instance, RootsMagic 4 puts all of the author, title and publication information in the TITLe field.  However, this does not cover the Subsequent Footnote and Bibliography entries created by the different Source Templates.

2)  My second recommendation is simple and should work:  GEDCOM fields for FOOTnote, SUBSequent Footnote and BIBLiography could be created by the software programs (from Free-form or Template models) and the appropriate information put in those fields for transfer of the complete Master Source Citation.  This would preserve the proprietary software models, but would require agreement between software vendors and online family tree websites in order to make it work.

3)  The use of italics (or other formatting) in the Source citations should be included in any improvements made to the GEDCOM standard.
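The first two recommendations can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the sample values are hypothetical, and the tags _FOOT, _SUBS and _BIBL are invented here to stand in for the proposed FOOTnote, SUBSequent Footnote and BIBLiography fields. They are not part of GEDCOM 5.5. The leading underscore follows the GEDCOM 5.5 convention for user-defined tags, and a revised standard could adopt the plain names instead.

```python
def build_source_record(xref, author, title, publication,
                        footnote=None, subsequent=None, bibliography=None,
                        single_field=False):
    """Build the text of one GEDCOM SOUR record (illustrative sketch only).

    single_field=True follows recommendation 1: author, title and publication
    all go into TITL. The optional formatted strings follow recommendation 2
    and are written under hypothetical user-defined tags.
    Long values are not split into CONC/CONT lines in this sketch.
    """
    lines = [f"0 {xref} SOUR"]
    if single_field:
        lines.append(f"1 TITL {author}, {title} ({publication})")
    else:
        lines.append(f"1 AUTH {author}")
        lines.append(f"1 TITL {title}")
        lines.append(f"1 PUBL {publication}")
    # Hypothetical tags carrying the fully formatted citation strings.
    for tag, text in (("_FOOT", footnote), ("_SUBS", subsequent), ("_BIBL", bibliography)):
        if text:
            lines.append(f"1 {tag} {text}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Made-up source details, used only to show the two layouts.
    print(build_source_record(
        "@S1@", "Jane Doe", "Example Families",
        "Example City: Example Press, 1900",
        single_field=True))
    print(build_source_record(
        "@S2@", "Jane Doe", "Example Families",
        "Example City: Example Press, 1900",
        footnote="Jane Doe, Example Families (Example City: Example Press, 1900).",
        subsequent="Doe, Example Families.",
        bibliography="Doe, Jane. Example Families. Example City: Example Press, 1900."))
```

A receiving program that does not recognize the extra tags would normally ignore them, which is why this approach depends on agreement between vendors and websites.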

The Build a BetterGEDCOM group seeks to participate in establishing a standard for the transmission, sharing and updating of genealogical information.

At the RootsTech Conference last week in Salt Lake City, FamilySearch said that the present standard is being evaluated and that FamilySearch would be moving ahead to update the standard (thank you, James Tanner of the Genealogy's Star blog).

Jordan Jones wrote "RootsTech 2011: Towards a New Genealogical Data Model" on the GenealogyMedia.com blog, which covers the open discussion session chaired by The Ancestry Insider.  He mentioned the FamilySearch statement, and wrote:

"This is an exciting development in the intersection of genealogy and technology. If FamilySearch decides to share their work, and if a governance body can be identified or set up, and finally if that governance body has the trust of the genealogical community, including:
  • the major desktop and mobile application developers
  • the major web databases
  • the NGS
  • NEHGS (New England Historic Genealogical Society)
  • FGS (the Federation of Genealogical Societies)
  • BCG (the Board for Certification of Genealogists)
  • APG (the Association of Professional Genealogists)
"We could be near the start of a much more rich technology environment. A new data model, addressing issues with GEDCOM and upgraded and changed through a community governance model could lead to integrated set of independently developed software tools that would allow people to represent their research better than they can with GEDCOM, and better share their data or move it from one vended product to another."

Thank you, Jordan, for an excellent report and reasoned suggestions.

I offer my Source Citation studies and my recommendations above in the spirit of cooperation and collaboration, and I hope that they are considered in any creation of an improved genealogical data standard.

5 comments:

Geolover said...

Randy, your splendid intensive testing of ramifications of certain proprietary software coding has shown what some major weaknesses are.

Now the question is, will the developers be willing to give up having proprietary variants, even if the present groups of "revisionists" (concerning existing GEDCOM or a suitable replacement) should agree on very workable code?

Nolichucky Roots said...

Thank you so very much, Randy, for not only doing the testing, but sharing the results with us. I've been struggling with settling on a database program that would transfer data accurately and you've helped tremendously.

lkessler said...

Randy,

I don't agree with your conclusions.

1. Putting all info into one field is wrong. Reading programs will not be able to interpret what is there. The various parts of the source citation must be identified, and placing them in the various fields: Author, Title, Publication, etc. is the way that reading programs can understand what is there.

2. Fields like Footnote, Subsequent footnote, and Bibliography are not data unto themselves. They are basically formatting and template information. Those are not data, and they are not tags in GEDCOM. Those are personal preferences for how you want your data displayed. Doing the formatting should be the job of the program. As long as it receives all the data it needs and understands the concept of that data, then it can format it appropriately to any template you want that it supports.

3. Italics is another example of formatting. It is not data. This is the same as number 2. The receiving program should do your formatting.

You've done an incredible job testing out the inputs and outputs of transferring the source citations between these 3 programs. Thank you.

But you really need to simply identify whether or not these programs are transferring the data correctly, which is THE ONLY important thing. And ignore whether or not any formatting is being transferred, since it shouldn't be.

Then, given the correct data, you can see if the program can format the source citation correctly with the templates it allows.

Remember, GEDCOM is to transfer data. The only formatting it should transfer is formatting embedded in notes and data to allow those to look as much as possible like the original. But how to make everything else appear should be up to the program. The program that does the latter job the best for you should then earn your favor and become your program of choice.

Louis

GeneJ said...

Your findings have been most valuable to me. Thank you. We highlighted your posts on the BetterGEDCOM wiki and blog yesterday.

I hope the brilliant minds at BetterGEDCOM will help us devise ways that free users and developers from the current pain.

It's hard for me to believe current GEDCOM doesn't stifle innovation in this important area.

As for users, well ... if a footbridge is too narrow, more people will fall in the river or avoid passing altogether. --GJ
