Monday, February 15, 2016

Hmmm. There are Still No Standards for Names in Family Trees

I wrote Are there standards for names in family trees? on 29 July 2009, and FamilySearch Family Tree Rules for Entering Names on 31 August 2015, in an attempt to standardize the names in my RootsMagic family tree database.

1)  Over the years, I have collected articles and links to proposals for standardizing names, including:

*  Gary Mokotoff, "A Proposed Standard for Names, Dates and Places in a Genealogical Database," Avotaynu, Volume XXIV, Number 3 (Fall 2008), available online at http://iijg.org/wp-content/uploads/2014/02/AVOTAYNU_XXIV_3.pdf.


*  Judith Schaefer Phelps, "Getting It Right: Data Entry Standards for Genealogists," Columbine Genealogical and Historical Society, Centennial, Colorado, 2010, available online at http://www.columbinegenealogy.com/pdfs/Getting%20It%20Right.pdf.



*  FamilySearch Knowledge Article: "Entering Names in Family Tree" - see https://familysearch.org/ask/salesforce/viewArticle?urlname=Entering-names-on-Family-Tree&lang=en

*  FamilySearch Reading List:  "Rules for Entering Names" - see  http://broadcast.lds.org/elearning/fhd/Community/en/FamilyTreeCurriculum/level01/Adding%20Information/Rules%20Entering%20Names.pdf.

I'm sure that there are more articles like these, and some information in genealogical reference books also.  If you know of any more articles, please make a comment with a link if possible.

2)  In my 2009 article Are there standards for names in family trees?, I had a list of names used by submitters of RootsWeb WorldConnect family trees when they had no given name or no surname.  I updated that list today and added some other alternatives:

* 114,548 entries for surname:  LNU
* 19,160 entries for surname:  MNU 
* 109,333 entries for first name: FNU
* 123,976 entries for surname: Unk
* 5,895,335 entries for surname: Unknown
* 244,951 entries for surname: (Unknown)
*  99,920 entries for surname:  NN
*  11 entries for surname: NNNNN
*  7,170 entries for surname:  XXX
*  7,697 entries for surname: XXXXX
* 25,852 entries for surname: --?--
*  52,069 entries for surname: (--?--)
*  138,478 entries for surname: [--?--]
*  37,147 entries for surname:  -----
*  9,549 entries for surname: _ (1 underscore)
*  10,170 entries for surname: __ (2 underscores)
*  13,609 entries for surname: ___ (3 underscores)
*  35,770 entries for surname: ____ (4 underscores)
*  164,969 entries for surname: _____ (5 underscores)
* 25,926 entries for surname: ______ (6 underscores)
* 1,477,387 entries for surname: ?
* 8,174 entries for surname: ?????
* 79 entries for surname:  Whoknows
* 70 entries for surname: Dontknow
* 44 entries with surname: Mystery
* 1,534 entries for surname: Who
* 40 entries for surname: Who?

I also did the same list for Ancestry Member Trees (using Exact Search):

* 453,540 entries for surname:  LNU
* 297,901 entries for surname: MNU 
* 232,837 entries for first name: FNU
* 300,595 entries for surname: Unk
*  204,803 for given name: Unk
*  4,347,478 entries for surname:  Unknown
*  2,586,736 entries for first name: Unknown
*  188,974 entries for surname:  NN
*  16 entries for surname: NNNNN
*  31,296 entries for surname: XXX
*  10,619 entries for surname: XXXXX
*  7,735 entries for given name: XXXXX
*  0 entries for surname: --?--
*  0 entries for surname: (--?--)
*  0 entries for surname: [--?--]
*  0 entries for surname:-----
*  0 entries for surname: _ (1 underscore)
*  0 entries for surname: __ (2 underscores)
*  0 entries for surname:  ___ (3 underscores)
*  0 entries for surname: ____ (4 underscores)
*  0 entries for surname: _____ (5 underscores)
* 0 entries for surname: ______ (6 underscores)
* 0 entries for surname: ?
* 0 entries for surname: ?????
* 62 entries for surname: Whoknows
* 205 entries for surname: Dontknow
* 13 entries for surname: IDontknow
* 135 entries with surname: Mystery
* 4,512 entries for surname: Who

I did the same search for entries in the FamilySearch Family Tree:

* 2,348 entries for surname: LNU
* 2,057 entries for surname: MNU 
* 1,300 entries for first name: FNU
* >2,500 entries for surname: Unk
* >2,500 entries for surname: Unknown
* >2,500 entries for surname: (Unknown)
*  >2,500 entries for surname:  NN
*  4 entries for surname: NNNNN
*  1,250 entries for surname: XXX
*  171 entries for surname: XXXXX
*  0 entries for given name: XXXXX
*  0 entries for surname: --?--
*  0 entries for surname: (--?--)
*  0 entries for surname: [--?--]
*  0 entries for surname: -----
*  0 entries for surname:  _ (1 underscore)
*  0 entries for surname:  __ (2 underscores)
*  0 entries for surname:  ___ (3 underscores)
*  0 entries for surname:  ____ (4 underscores)
*  0 entries for surname:  _____ (5 underscores)
* 0 entries for surname:  ______ (6 underscores)
* 0 entries for surname: ?
* 0 entries for surname: ?????
* 1 entry for surname: Whoknows
* 18 entries for surname: Dontknow
* 1 entry for surname: IDontknow
* 2 entries with surname: Mystery
* 266 entries for surname: Who

I'm not sure what all of that tells me, except that there are a number of methods of adding names for persons whose names are unknown.  It does appear that Ancestry and FamilySearch require alphabetical characters for their name fields.  
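As a rough illustration of how many variants a program would have to recognize, here is a minimal Python sketch that flags a name field as a placeholder. The word list and patterns are assumptions drawn from the tallies above, not a standard, and the function name `is_placeholder` is hypothetical. Ambiguous values such as "Who" and "Mystery" are deliberately left out because they can be real surnames.

```python
import re

# Hypothetical placeholder words taken from the tallies above; adjust as needed.
PLACEHOLDER_WORDS = {
    "lnu", "mnu", "fnu", "unk", "unknown", "nn",
    "whoknows", "dontknow", "idontknow",
}

# Fields made only of punctuation, e.g. "?", "-----", "_____", "[--?--]".
# \u2014 is the em-dash used in the NGSQ-style variant.
PUNCT_ONLY = re.compile(r"^[\s\?\-_\u2014\(\)\[\]]+$")

# Runs of a single filler letter, e.g. "XXX" or "NNNNN".
FILLER_RUN = re.compile(r"^(x{3,}|n{2,})$", re.IGNORECASE)


def is_placeholder(name: str) -> bool:
    """Return True if a name field looks like a stand-in for an unknown name."""
    text = name.strip()
    if not text:
        # Assumption: a blank field counts as unknown in this sketch.
        return True
    if PUNCT_ONLY.match(text) or FILLER_RUN.match(text):
        return True
    # Strip wrapping brackets or parentheses, e.g. "(Unknown)" -> "unknown".
    core = text.strip("()[]").strip().lower()
    return core in PLACEHOLDER_WORDS
```

For example, `is_placeholder("[--?--]")` and `is_placeholder("(Unknown)")` both return True, while `is_placeholder("Seaver")` returns False.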

=============================================


The URL for this post is:  http://www.geneamusings.com/2016/02/hmmm-there-are-still-no-standards-for.html

Copyright (c) 2016, Randall J. Seaver



15 comments:

Matthew and Elyse said...

I can tell you I use [--?--] in FTM and sync to my Ancestry account, so that option is in my public tree! I do know that, when entering information directly into Ancestry, Ancestry won't let me use the "[]", so it's just "--?--".

Peggy said...

I've been using Mary H. Slawson's Getting it Right: the Definitive Guide to Recording Family History Accurately, 2002. Her rule is to leave missing given and surnames blank (a very few exceptions). However, there does not seem to be a rule when both given and surnames are missing, which can occur for the unknown mother of known siblings or in recording the place of birth (from the census) for a mother, otherwise unknown. How do we get to a consensus?

Barb said...

I feel that as long as it is pretty clear that the tree owner doesn't have this info, but obviously addressed it, any format will do. No standard needed, but consistency should be used throughout the database. Some of the ones in your list are actually pretty clever.

Densie said...

I don't use FNU, MNU or LNU because I've seen relatives try to pronounce those as names and don't want my unknown ancestors stuck with unpronounceable names for eternity.

I use Unknown for first or last names when I know one but not the other and NN for the few times I need to name someone but don't have any name for them at all (a mother of known siblings, for example) or I know facts such as birthplace.

I also leave them blank but some programs I've imported into don't allow that.

Anonymous said...

Appreciate your posting as I'd been looking for some definitive FamilySearch guidance re last name capitalization. Unfortunately a FamilySearch user went through most of the Shackford names I'd posted, changing them to SHACKFORD. Frustrating, as it makes reports look a bit less professional. Sadly she also posted a rude comment, but that caused my complaint re the changes to be responded to. FamilySearch personnel got her to stop making the changes and also removed the rude comments. But if she restarts with these unhelpful updates I can use this guidance information from FamilySearch as part of my concern. Have suggested on one of their forums that they add these tips on how to enter names as a formal TIP right near the name block. With all the other name standardization tips and Ancestry's new videos on how to use these tools, I anticipate a future jump towards name standardization in these online programs.

Louis Kessler said...

Also see: http://www.tamurajones.net/FNULNUMNUUNK.xhtml

Cousin Russ said...

Randy,

My "standard" is 5 underscores "_____" (without the quotes) for ALL unknown Names. For two reasons. 1) It's clear that I don't know the Birth / Given name or the Birth Surname. 2) It does NOT interfere with searching and shaky leaf hints from Ancestry.com.

I have File Notes in my genealogy software for each file I create with that statement so any one looking at my file will KNOW what I have done and I am consistent with this.

When looking at names in the file index or, more importantly, in Charts and Reports I don't have to remember what the "codes" might mean.

By doing so, I have actually found some of those unknown names through Shaky Leaf hints.

I record the Names, exactly the way they are in the Record that I am looking at. If the record clearly is showing a married name, I record that AND add my Unknown name 5 underscores for her surname.

Each of us has to determine what works for us. When I talk about this subject, and I do, I suggest that you See what it looks like in Charts and Trees. Does that work for you? AND will it work for a family member, or another researcher?

So far, this method has worked for me.

Thanks for bringing this topic up again.

Russ

Anonymous said...

I use [—?—] (em-dashes, not regular dashes) because this format is seen in the NGSQ. If Dr. Jones approves its use that is good enough for me :) If you do a search in FamilySearch for this you will find it because I have my One-Name Study sync'd.

Nettie said...

My method or procedure since days of Family Origins and suggested on one of the email lists back then, was to do Mary or Elizabeth [married surname]. I do not upload any tree to online websites only to my own web site as a pdf.
Why? RootsMagic has an index running right next to main data entry form. How many Mary uk or Elizabeth uk would you have? I had about 30 of them. So my goal for my RM database has been putting the surname between the [ ]. This made my workflow and time element go faster with keeping it consistent. Then I would know immediately which family I was working with.
If you are planning on publishing thru NGS or any other genealogical society's quarterly, each has their own criteria. You do have to take what is in your database and submit it in a word processing format, and you can change the [ ] to whatever they suggest.
My point is have your own set of rules/procedure for your database and work within your own procedure.

cleaverkin said...

What Russ Worthington said. Recently I went through the exercise of converting all my unknown given names and surnames from "Unknown" to 5 underscores, mostly for correctness, but also because now they all alphabetize at the beginning of the list.

They also show up in my Ancestry tree that way - apparently Ancestry's member tree search doesn't correctly handle underscores as a search parameter.

I've read that [--?--] (with em-dashes) is possibly "more correct", but as noted previously, some software either can't generate it correctly, or doesn't process it in a reasonable way.

cleaverkin said...

I also dislike the idea of using blanks, as I think it's important to be able to distinguish between "unknown" and "I forgot to enter it".

John said...

I believe that Ancestry.com and FamilySearch.org should issue a joint statement on what is the preferred method of entering a person whose first name or last name is unknown. They should also specify what is acceptable and what is not acceptable. This is the only way we are going to get standardization.
My preference would be for FNU and LNU even against the known objections by the purists.
I also believe the same methodology should be used when indexing things like census records where the wife's maiden name is not known. John and Mary Smith (husband and wife) should be indexed as John Smith and Mary LNU.
John Carruthers
Victoria BC

Cousin Russ said...

John,

Don't think that will happen.

IF it were to happen, then you would need to expand your "joint statement" to include ALL Genealogy software programs to do the same. I was yelled at by one desktop software program's technical support that I wasn't entering the Names correctly into their program.

Each of us needs to make our own choice based on our individual needs, including how and where we use a search engine and/or how we want our Charts and Reports to look.

It goes way beyond those two websites.

But, I am only one user.

Russ

Tony Proctor said...

Oh dear, oh dear! There's a huge gulf, here, between software best practices and what people are currently doing, would like to do, or recommending what we should all do.

Any scheme that involves an explicit term representing an unknown field -- name or otherwise -- that should be manually entered is just wrong! Reliance on a manifest substitution that the end-user is in control of is a hangover from the non-digitised world. Ignoring the possible translation and ambiguity issues, it should not be a choice by the end-user. I recommend following Louis Kessler's advice in reading Tamura Jones's post here.

Any good software product will have an option to indicate "unknown" without the end-user having to conjure-up some substitution, and the internal representation of that setting in the database or data files would be irrelevant to the end-user, standardised, AND unambiguous in relation to normal data values.
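One way to picture that idea is the minimal Python sketch below. It is only an illustration under stated assumptions, not how any existing genealogy program works: the names `NameStatus`, `NamePart` and `display` are hypothetical. The status of a name part is stored separately from its value, and the placeholder a user sees is just a display preference.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NameStatus(Enum):
    KNOWN = "known"
    UNKNOWN = "unknown"          # researched, but not found
    NOT_ENTERED = "not_entered"  # nobody has filled this in yet


@dataclass(frozen=True)
class NamePart:
    """A given name or surname, with its status stored apart from its value."""
    status: NameStatus
    value: Optional[str] = None

    def __post_init__(self):
        # A value must be present exactly when the status is KNOWN.
        if (self.status is NameStatus.KNOWN) != bool(self.value):
            raise ValueError("value must be set if and only if status is KNOWN")


def display(part: NamePart, unknown_style: str = "[--?--]") -> str:
    """Render a name part; unknown_style is a per-user display preference."""
    if part.status is NameStatus.KNOWN:
        return part.value
    if part.status is NameStatus.UNKNOWN:
        return unknown_style
    return ""  # NOT_ENTERED is shown as blank in this sketch
```

With this layout, `display(NamePart(NameStatus.UNKNOWN), "_____")` shows five underscores to one user while another sees `[--?--]`, and no placeholder text is ever stored as if it were a real name.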

Fake, Fudged, Dummy, and other such "special" values were bad choices even in the 1970s. Heaven spare us from software design by proxy....

Tony Proctor said...

This article has resurfaced so I'll take a minute to make a point that I'd forgotten last time. Gary Mokotoff's paper (see start of article) falls into the same one-dimensional trap of many data standards: (a) what you see on the screen is what's stored, and (b) a database has to store one name (for person or place) so we'll go with 'this one'.

Both of these are wrong in the world of digital data.

a) The mark-up used BY software doesn't have to be anything like that used 50 years ago when written by hand in ink. Modern data mark-up has to be consistent and unambiguous, but it's not the same as what's seen or entered by the user. The same data mark-up can be used to display any number of variations to the end-user, dependent upon their personal preference.

b) Relational databases are very good at storing multiple names for the same entity; this was a fundamental part of their design. Since people and places both commonly have multiple names then this feature should be used. There are only two excuses for not doing this: ignorance or laziness on the part of the product designers.
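Point (b) can be sketched with Python's built-in `sqlite3` module. The table and column names here (`person`, `person_name`, `name_type`) are assumptions made up for illustration, not the schema of any real product. One person row has several name rows, and a missing surname is stored as NULL instead of a placeholder string.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE person_name (
        person_id  INTEGER NOT NULL REFERENCES person(id),
        name_type  TEXT NOT NULL,   -- e.g. 'birth', 'married', 'alias'
        given      TEXT,            -- NULL means not recorded
        surname    TEXT             -- NULL means not recorded
    );
""")

# One person, two names: the birth surname is not known, the married one is.
conn.execute("INSERT INTO person (id) VALUES (1)")
conn.executemany(
    "INSERT INTO person_name (person_id, name_type, given, surname) "
    "VALUES (?, ?, ?, ?)",
    [(1, "birth", "Mary", None), (1, "married", "Mary", "Smith")],
)

# Fetch every name recorded for person 1.
names = conn.execute(
    "SELECT name_type, given, surname FROM person_name "
    "WHERE person_id = ? ORDER BY name_type",
    (1,),
).fetchall()
```

Here `names` holds both rows for the same person, so the married name from a record and the still-unknown birth surname can sit side by side.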