
Wednesday, September 3, 2014

Genealogy Source Citations and FHISO - A Simpler Proposal

The Family History International Standards Organisation (FHISO) recently started a public mailing list to discuss family history standards - see the Technical Standing Committee (TSC) page at http://fhiso.org/mailman/listinfo/tsc-public_fhiso.org, and the list archives to date at http://fhiso.org/pipermail/tsc-public_fhiso.org/

There are some interesting ongoing discussions, but I am easily confused by terminology that I have not grasped yet.  A GEDCOM replacement to transfer genealogical data between programs (online trees, stand-alone programs, or both) is one of FHISO's goals, but not the only one.

One of the early discussion items with respect to a GEDCOM replacement was source citations, and how to implement them.

Louis Kessler has submitted a paper concerning sources and citations - see his blog post Standardizing Sources and Citation Templates (posted 28 August 2014).  Louis' recommendations include (see the complete list in his post):


*  FHISO should develop a set of standard source types and source element types.

*  FHISO should use a simple mechanism to transfer the source element values in the standard they will develop.

*  FHISO can allow, but should discourage user defined identifiers. FHISO should accept requests for new identifiers to be added to a future version of the standard.



Some of the discussion on the TSC list has suggested that there should be a standard list of source templates (examples might be Evidence Explained source templates) which could be used.  Other posts have noted that not every genealogist or software program or online database produces source citations in an American form, or a British form, etc., and that source citations should be able to be created in any language and in any format acceptable to the citation creator. After all, FHISO is an international organization.

My observations on this are:

1)  There are over 1,000 source citation templates in Evidence Explained and related publications.  

2)  Not EVERY source record type has a source citation template already created or implemented in genealogy software programs, by record providers, or in online family trees.

3)  Not every genealogist, or record provider, or family tree provider, wants to use source templates for source citations.

4)  Each source template in Evidence Explained may have unique fields.  There may be thousands of unique fields if EVERY source record type were to be modeled.

5)  GEDCOM uses a number of "standard" tags for source citations, including SOUR[ce], ABBR[eviation], TITL[e], NOTE, REPO[sitory], etc.  

6)  Genealogy software programs (notably RootsMagic, Legacy Family Tree, Family Tree Maker and perhaps others) have implemented additional custom tags to handle the Evidence Explained source template complexities.

7)  When the custom source template tags are used, many GEDCOM files do not transfer perfectly from one program or online service to another.  It is important that source citations retain their structure and format when exported and imported.

I have a suggestion for the FHISO TSC team to explore (I had this idea before reading Louis Kessler's post, but seeing his ELEM tags really helped me crystallize my suggestion):

1)  Don't use specific source templates, which in FHISO would probably have to account for differences in languages, alphabets, record terminology, local sourcing practices, etc.

2)  Do use a series of Elements (say 10, or 20, or 50, any number, really) that permit a user to create their own source citation using any source citation creation system that they wish to use.  If they want to use an Evidence Explained citation, then let them define it in Elements that would be presented in the order preferred by the user/creator.  If they want to use a British, or Hindi, or Hebrew, or Russian, or Chinese source citation system, that's fine.  Let the user create it.  We can always plug them into Google Translate to decipher them (I think?).

3)  For example, here is a fairly complex source citation (in "Footnote" or "First Reference Note" format) created by an Evidence Explained source template for an image of a City Directory entry found in an online database:

The Lakeside Annual Directory of the City of Chicago, 1909 (Chicago, Ill.: The Chicago Directory Company, 1909), page 205, entry for "Auble, Charles"; digital image, Fold3 (http://www.fold3.com : accessed 22 May 2010), searching Non-military records > City Directories > Illinois > Chicago > 1909.

I can break that source citation down into Elements like this:

Element 1:  The Lakeside Annual Directory of the City of Chicago, 1909
Element 2:  (Chicago, Ill.: The Chicago Directory Company, 1909), 
Element 3:  page 205, entry for "Auble, Charles";
Element 4:  digital image, 
Element 5:  Fold3 (http://www.fold3.com :
Element 6:  accessed 22 May 2010), 
Element 7:  searching Non-military records > City Directories > Illinois > Chicago > 1909.
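To make the idea concrete, here is a minimal Python sketch (an illustration only, not part of any FHISO or Louis Kessler proposal) that stores the Chicago directory citation as an ordered list of Elements. It assumes each Element keeps its own punctuation and end-of-element space, so re-creating the citation is just a matter of joining the strings in order with no separator.

```python
from typing import List

# Elements for the Fold3 city directory citation, in the creator's preferred order.
# Each Element keeps its own punctuation and end-of-element space.
chicago_elements = [
    "The Lakeside Annual Directory of the City of Chicago, 1909 ",
    "(Chicago, Ill.: The Chicago Directory Company, 1909), ",
    'page 205, entry for "Auble, Charles"; ',
    "digital image, ",
    "Fold3 (http://www.fold3.com : ",
    "accessed 22 May 2010), ",
    "searching Non-military records > City Directories > Illinois > Chicago > 1909.",
]


def rebuild_citation(elements: List[str]) -> str:
    """Re-create the full citation by stringing the Elements together in order."""
    return "".join(elements)


print(rebuild_citation(chicago_elements))
```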

4)  Here's another example, an American census record created by an Evidence Explained source template:

1940 United States Census, Worcester County, Massachusetts, population schedule, Leominster, enumeration district (ED) 14-181, Sheet 9-A, Family #202, Frederick Seaver household; digital images, Ancestry.com (http://www.ancestry.com : accessed 12 April 2012); citing National Archives Microfilm Publication T627, Roll 1651.

I can break that source citation down into Elements like this:

Element 1:  1940 United States Census, 
Element 2:  Worcester County, Massachusetts, population schedule, 
Element 3:  Leominster, 
Element 4:  enumeration district (ED) 14-181, 
Element 5:  Sheet 9-A, 
Element 6:  Family #202, 
Element 7:  Frederick Seaver household; 
Element 8:  digital images, 
Element 9:  Ancestry.com (http://www.ancestry.com : 
Element 10: accessed 12 April 2012);
Element 11: citing National Archives Microfilm Publication T627, Roll 1651.

5)  Note that I have included the punctuation and end-of-element spaces in the Elements above.  Italics and underlines might have to be accommodated, or not.
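If italics do need to be carried, one possible option (purely an assumption; the post leaves this open) is a per-Element italic flag that a receiving program can either ignore or render. This Python sketch renders such Elements as plain text or as HTML, keeping the end-of-element space outside the `<i>` tag. The `(text, italic)` shape is hypothetical.

```python
from html import escape
from typing import List, Tuple

# Hypothetical Element shape: (text, italic). The italic flag is an assumption.
Element = Tuple[str, bool]


def render_plain(elements: List[Element]) -> str:
    """Ignore formatting and just string the Element texts together."""
    return "".join(text for text, _ in elements)


def render_html(elements: List[Element]) -> str:
    """Wrap italic Elements in <i>...</i>, leaving trailing spaces outside the tag."""
    parts = []
    for text, italic in elements:
        if italic:
            core = text.rstrip()
            trailing = text[len(core):]
            parts.append(f"<i>{escape(core, quote=False)}</i>{trailing}")
        else:
            parts.append(escape(text, quote=False))
    return "".join(parts)


elements = [
    ("The Lakeside Annual Directory of the City of Chicago, 1909 ", True),
    ("(Chicago, Ill.: The Chicago Directory Company, 1909), ", False),
]
print(render_html(elements))
```

A program that doesn't support formatting could call `render_plain` and still get the same words and punctuation.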

6)  In the simplest case, for programs or websites that don't use source templates at all and just provide a single text field, there could be only one Element for the whole source citation.  Or the source citation could be broken up into two Elements for the "Master Source" and "Source Detail" that some software programs provide.

7)  The ABBR[eviation] and REPO[sitory] fields could also be used for a Master Source title or for repository information.  Source text, source comments, citation text, and citation comments could be accommodated as well.
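Items 6 and 7 suggest that one Element, a Master Source/Source Detail pair, or many Elements should all be valid, with ABBR and REPO carried alongside. Below is a minimal Python sketch of such a record. The class and field names are hypothetical, the ABBR value is made up for illustration, and where the census citation is split into "master" and "detail" is only one reasonable choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceCitation:
    """Hypothetical record for one citation under the Elements proposal."""
    elements: List[str]          # one or more Elements, in display order
    abbr: Optional[str] = None   # short Master Source title, as in GEDCOM ABBR
    repo: Optional[str] = None   # repository information, as in GEDCOM REPO

    def text(self) -> str:
        # Elements already include their punctuation and spacing.
        return "".join(self.elements)


master = "1940 United States Census, Worcester County, Massachusetts, population schedule, "
detail = (
    "Leominster, enumeration district (ED) 14-181, Sheet 9-A, Family #202, "
    "Frederick Seaver household; digital images, Ancestry.com "
    "(http://www.ancestry.com : accessed 12 April 2012); "
    "citing National Archives Microfilm Publication T627, Roll 1651."
)

# A program without templates might export the whole citation as one Element...
single = SourceCitation([master + detail])
# ...while a Master Source / Source Detail program might export two.
# The abbr string here is an illustrative value, not from the post.
split = SourceCitation([master, detail], abbr="1940 U.S. Census, Worcester Co., Mass.")

print(single.text() == split.text())
```

Either way the receiving program reconstructs the same citation text; the split only preserves more of the sender's structure.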

8)  If genealogy software programs, record providers and online family trees adopted this Elements feature, all they would have to do on Export is create the Elements from whatever source template they use.  On Import, all they would have to do is string the Elements together in order to re-create the source citation.  The information should then transfer from one program or website to another without being mangled.
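The Export/Import round trip can be sketched in Python. The line layout below is a hypothetical GEDCOM-like one using an `ELEM` tag, loosely inspired by the ELEM tags in Louis Kessler's paper; his exact syntax is not reproduced here. One practical detail the sketch highlights: because Elements carry trailing spaces, a reader has to take each value verbatim, and a real standard would need to say so explicitly, since some existing GEDCOM readers may trim trailing whitespace.

```python
from typing import List

PREFIX = "1 ELEM "  # hypothetical tag line for one Element


def export_source(source_id: str, elements: List[str]) -> List[str]:
    """Write one source as GEDCOM-like lines, one ELEM line per Element."""
    lines = [f"0 @{source_id}@ SOUR"]
    lines.extend(PREFIX + element for element in elements)
    return lines


def import_source(lines: List[str]) -> str:
    """Read the ELEM values back in file order (verbatim) and string them together."""
    elements = [line[len(PREFIX):] for line in lines if line.startswith(PREFIX)]
    return "".join(elements)


elements = [
    "1940 United States Census, ",
    "Worcester County, Massachusetts, population schedule, ",
    "Leominster, ",
    "enumeration district (ED) 14-181, ",
    "Sheet 9-A, ",
    "Family #202, ",
    "Frederick Seaver household; ",
    "digital images, ",
    "Ancestry.com (http://www.ancestry.com : ",
    "accessed 12 April 2012); ",
    "citing National Archives Microfilm Publication T627, Roll 1651.",
]

exported = export_source("S1", elements)
print("\n".join(exported))
print(import_source(exported))
```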

9)  The beauty of this system is that it is simple to use, and can be universal - any type of source template can be used, any source citation stylesheet or practice can be accommodated.  Of course, a poor source citation will remain a poor source citation.  But a good source citation will still be a good source citation reflecting the user's practice.

10)  It should be easy to implement, and easy to understand by users.  If users use source templates, then it is nearly invisible to users unless they look in the GEDCOM-like file.  Even so, it is easily understood (as long as the GEDCOM-like file is in a text format).

11)  I know that I have ignored the "Short Footnote" and "Bibliography" source formats that are part of Evidence Explained.  They are part of the source citation creation process in some genealogy software, but not in all programs, and I don't recall seeing them in an online tree.  They are mainly used in Reports and Books created by genealogy software programs.  They can still be part of the specific software program and be used in reports and books.  They just wouldn't be transferred in a GEDCOM-like file.

What do the FHISO leaders think?  What do genealogists think?  Is this workable?  I think that it would be much simpler and easier to implement than thousands of source template types in a variety of languages.  

What have I missed?  Your thoughts and comments and opinions are welcome!

The URL for this post is:  http://www.geneamusings.com/2014/09/genealogy-source-citations-and-fhiso.html

Copyright (c) 2014, Randall J. Seaver



3 comments:

  1. I don't think FHISO will see this, Randy, unless it is entered into their discussions, either via their call for papers or on their current TSC-public mailing list.

    I don't believe that Louis was suggesting standardising the "templates" (Louis can correct me if I'm wrong). He was thinking about a scheme where the citations are broken down into a number of elements, and where the actual formatted version (which "isn't part of the data") can be generated by some unspecified templating system when required.

    What you're suggesting has been proposed already, and is largely implemented in models like STEMMA, which has no limit on the number of source types that can be declared and no restrictions on the elements within them.

    My main comment on Louis's suggestion was that we need to accommodate discursive notes, layered citations, and multiple sources in the same reference note.

    I will put together an example of how each of these work in STEMMA in a blog-post soon.

  2. Randy, thank you for posting this. I can indeed imagine that elements are stored like this, but want to add a thought:

    When you look at the elements in your examples, you can see they have meanings that, regardless of their names, can be put in categories. By that I mean that, for instance, a page (in a church book) and a sheet number (in a census) have similar meanings, and can therefore be stored in the same element type. One can do similar things for titles that are used for collections, no matter whether they are blogs or magazines. The same goes for things like an entry for a person's name or a family, which in general terms is an item of interest. When you put terms in categories like that, and make sure that the categories are translated, one or two dozen element categories should be enough. These categories can then be translated at will.

    P.S. I removed my previous comment, because I found no other way to remove typos.
