Thursday, March 1, 2012

Puzzling Over the Evidence-Conclusion Process

A lot of thinking, and writing, was done on this subject during 2011 and early 2012 in the BetterGEDCOM Project.  See these pages:

*   Evidence and Conclusion Process: What is the Evidence and Conclusion Process and why is it important to BetterGEDCOM (I think this was a group effort). The description of the process is:

"The Evidence and Conclusion Process consists of the following steps:
  1. A researcher finds a source of information that contains evidence that mentions persons he may be interested in. He creates a record to document the source.
  2. The researcher creates records to document each item of evidence in the source that mentions the persons of interest.
  3. The researcher creates records to document each event mentioned in the evidence, where an event describes something that happened to one or more persons. Events occur at specific times and places, involve one or more persons as role players, and may serve to establish or change relationships between persons.
  4. The researcher creates records to document each person mentioned in each event, containing only information available from the evidence. The event and associated person records must be treated as a cohesive whole since the person records may hold information that is true only within the context of the event, e.g., the person’s name or age or place of residence, at the time the event occurred.
  5. The researcher continues this process, completing steps 1 to 4, for a number of sources, until he has built up a number of groups of associated event and person records that contain all he has discovered about a set of persons.
  6. The researcher reasons about the available person records and sorts them into groups, where each group contains the person records that the researcher believes refer to a single real person. The researcher builds these groups based on experience and good practices, and records the justification for each grouping decision. Because later evidence may prove some groupings to be incorrect, the grouping operation cannot destroy or remove original event or person records. Groups need to be supported by the model.
  7. The researcher reasons about the event records associated with the persons in each of the person groups in order to infer the relationships that existed between the real persons represented by the groups. The data model allows him or her to establish these relationships, possibly through new records that represent inferred genealogical events, or possibly by establishing relationship links between groups."

*  Research Process, Evidence & GPS -- contains some diagrams and process descriptions developed in a collaborative environment led by Adrian Bruce.  In particular, see the “input” and “output” that Adrian discusses for each of his research steps:
  1. Set a focused goal
  2. Create or revise research plan
  3. Carry out research -- Understand the Records
  4. Select & Analyse the Evidence
  5. Has the Objective been met for this Work-Portion? -- Record Conclusions
  6. Go onto next work portion in research plan
  7. Check overall goal has been met
*  How do scholarly genealogists approach the evidence process? -- provides discussion by GeneJ Composer and links to blog posts by Mark Tucker and The Ancestry Insider.  

Louis Kessler, in Evidence and Conclusion Modelling in Behold, summarizes the Evidence/Conclusion Modeling process as:
Problem: Your current program has no ability to properly document the evidence you used and the conclusions you formulated.
Solution: Behold’s source-based data entry is the first step. Each source you use becomes evidence - evidence you use to formulate a conclusion. While you enter your sources, Behold will make it easy for you to add your conclusion information to your family information while quickly linking it to the evidence.
Benefits: You’ll never forget how you arrived at your conclusions. You will in the future be able to update your data with confidence as you compare your new evidence to your past evidence to allow you to properly modify your conclusions.

That's probably enough for my readers to chew on... I don't seem to have the patience to read and understand completely everything on these sites - or I'm not smart enough to figure it out.  Probably both.  If there are other explanatory websites or blog posts, please tell me in Comments and I'll add them to the list above.

Frankly, I'm confused. If I enter a Source into my genealogy program (I can do that...), and identify all of the assertions contained in that source (I can do that - e.g., a name, parents' names, a birth date, a birth place), what then do I do with that evidence? I think I have to have (or create) a person (say Devier J. Smith) to attach the evidence assertions to, right? Do I need to wait until I've gathered many bits of evidence (say from a Bible entry, a family paper, an obituary, a biography, some census records, etc.), and then attach all of them somehow to a person that I've concluded they belong to? How is the evidence kept straight without identifying the person referred to? Do I have to add twelve Devier J. Smiths (with various name spellings and birth dates) because I have 12 different sources for his birth name and birth date?
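
For what it's worth, here is a minimal Python sketch of how the seven-step process quoted above might answer that last question. It assumes one person record per source, kept unchanged, plus a separate group that links the records believed to describe one real person. The class names (Source, PersonRecord, PersonGroup) and all of the sample values are made up for illustration; they come from neither BetterGEDCOM nor any genealogy program.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    title: str


@dataclass(frozen=True)
class PersonRecord:
    """One person exactly as a single source describes him (steps 2 to 4)."""
    source: Source
    name: str
    birth_date: str  # kept as the source gives it, not corrected


@dataclass
class PersonGroup:
    """Records the researcher believes refer to one real person (step 6)."""
    label: str
    justification: str
    members: list = field(default_factory=list)


# Placeholder values for illustration only, not real research data.
bible = PersonRecord(Source("Bible entry"), "Devier J. Smith", "date as written in Bible")
census = PersonRecord(Source("Census record"), "D. J. Smith", "age as written in census")
obituary = PersonRecord(Source("Obituary"), "Devier Smith", "date as written in obituary")

all_records = [bible, census, obituary]

group = PersonGroup(
    label="Devier J. Smith",
    justification="Researcher's reasoning for the grouping goes here.",
    members=list(all_records),
)

# Grouping only links records. The originals are untouched, so a record
# can be taken out of the group again if later evidence contradicts it.
group.members.remove(census)
print(len(all_records), len(group.members))
```

Under this reading, twelve sources would indeed mean twelve person records, but the researcher would work with the one group rather than with the twelve records one at a time.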

When do I form a conclusion about a family structure based on the evidence collection?  The parent-child relationship is probably the most important assertion of all for the family tree!   I haven't sourced ANY relationship assertions yet in my database.  It never has crossed my mind until now.  What original sources provide irrefutable evidence of a parent-child relationship?  A birth record, a Bible record or a baptism record, I guess, if they are contemporaneous with the event, but "official" government vital records are available only since the 1800s, and in some states, the early 1900s.  'Tis a puzzle, methinks!

I appreciate that the BetterGEDCOM folks, and Louis Kessler, Tim Forsythe, the Ancestry Insider, and others have been thinking about this issue for some time.   I wish that I had a better handle on it!


Copyright (c) 2012, Randall J. Seaver.


Updated 8 p.m.  Added a section provided by GeneJ in Comments.  Thanks, GeneJ!

10 comments:

Russ Worthington said...

Randy,

I totally agree with you and I tried to participate, actively, in Better GEDCOM. I asked the same questions you have, and I am at the same place: confused (or, in my case, not smart enough to figure it out).

What is interesting, is that Louis didn't know that there was a program "out there" that might be considered Evidence Based. I have tried to explain how I enter Evidence, and I think I can be considered Evidence based.

The one difference with the Better GEDCOM folk is that most of the active members are Developers. GeneJ has been very active from the beginning, but uses a very different program than you or I use. They didn't want to hear what I had to say, so I stopped saying anything, but I read all of the discussions on their Wiki.

Bottom line, I am still confused. But, at this point, I am not willing to change how I operate.

Thank you and I do appreciate your comments.

Russ

GeneJ said...

The work done by Adrian Bruce, “Research Process, Evidence & GPS” represents a BetterGEDCOM collaborative effort to describe a modern research process.
http://bettergedcom.wikispaces.com/Research+Process%2C+Evidence+%26+GPS

In particular, see the “input” and “output” Adrian discusses in his steps:
1. Set a focused goal
2. Create or revise research plan
3. Carry out research
3.5 Understand the Records
4. Select & Analyse the Evidence
5. Has the Objective been met for this Work-Portion?
5.5 Record Conclusions
6. Go onto next work portion in research plan
7. Check overall goal has been met

Also, the various points he makes (Caveats, Application Software, Data Analysis) following his first diagram.

It’s in this plainspoken and descriptive manner that we, as genealogists, need to begin to relate to technologists.

I’ve read the alternative 7 steps you quoted in your article many times. I don’t work that way. Russ, who commented just above, will recall the many times we discussed just that. –GJ

Dr. Bill (William L.) Smith said...

I still do not believe I (consciously chosen rather than "we") ever want to get to the place where "everything I do in family history research" is on the computer. That seems to be the direction of the 'technologist' (all my IT friends, in all fields). The computing devices are great for recording and reporting. But, I still want to write my 'research report' myself, based on my analysis of the facts, the evidence, the conclusions - whatever you want to call them. That cannot (ever) all be on the computing device, controlled by the computing device - the be all, end all - of everything.
Perhaps I really have reached the state of being 'old-fashioned' - but that is where I am when I read this. Off-base, or on? ;-)

Louis Kessler said...

"If I enter a Source into my genealogy program (I can do that...), and identify all of the assertions contained in that source (I can do that - e.g., a name, parents' names, a birth date, a birth place), what then do I do with that evidence? I think I have to have (or create) a person (say Devier J. Smith) to attach the evidence assertions to, right?"

Yes, in my opinion you are correct. I think of "evidence" as what a source becomes when it supports an assertion. You need to link the assertion to the source for the source to become "evidence".


"Do I need to wait until I've gathered many bits of evidence (say from a Bible entry, a family paper, an obituary, a biography, some census records, etc.), then I attach all of them somehow to a person that I've concluded they belong to? How is the evidence kept straight without identifying the person referred to?"

It depends on the program. If your program, like most programs, attaches the sources to assertions/conclusions, then yes, you'll have to create the person first. But if your program, like only a few, allows you to create source details that are not attached to anybody, then you can enter your source data first if you want, and attach it later when you want.


"Do I have to add twelve Devier J. Smiths (with various name spellings and birth dates) because I have 12 different sources for his birth name and birth date?"

That's up to you. Some people want to do it that way and will tell you it's the correct way of doing it. But I think that makes it hard to differentiate the assumed correct from the incorrect. My personal preference is to enter just one Devier J. Smith with what I think is the most likely correct spelling and birth date. I'd attach him and his birth event to the 12 different sources and I'd state in my "assertion/conclusion" (a comment for the person and birth event) the reasoning why I think this is correct. Someone can then review the 12 linked sources and see if they agree with your conclusion.

Louis
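
To illustrate the preference Louis describes (one person, one concluded birth, many linked sources), here is a minimal Python sketch. The Assertion and Person classes and the sources_for helper are hypothetical names chosen for this example, and the values are placeholders; this is not how Behold or any other program actually stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class Assertion:
    """A concluded fact, the reasoning behind it, and the sources cited."""
    fact: str
    value: str
    reasoning: str
    sources: list = field(default_factory=list)


@dataclass
class Person:
    name: str
    assertions: list = field(default_factory=list)


def sources_for(person, fact):
    """Return every source linked to one fact, so a reader can review them."""
    return [s for a in person.assertions if a.fact == fact for s in a.sources]


birth = Assertion(
    fact="birth",
    value="(most likely birth date)",  # placeholder, not real data
    reasoning="Why this spelling and date were judged most likely correct.",
    sources=["Bible entry", "family paper", "obituary"],
)
devier = Person("Devier J. Smith", [birth])

print(sources_for(devier, "birth"))
```

The trade-off is visible in the structure: there is only one Devier to look at, but the conflicting spellings and dates survive only inside the cited sources and the written reasoning.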

Geolover said...

Randy, The current post is interesting (to me) as it sort of alludes to a major weakness in GEDCOM-based programs. They do not help to evaluate the evidence. One can state a source, put all data from the source in note or comment or document transcript, but there's no major help in comparing what shows what (I do spreadsheets or WP tables for this) or evaluating quality. Note I do not much care for simple numeric ratings because the nature of source, informant, etc. can have more or less subtle differences.

You say, "When do I form a conclusion about a family structure based on the evidence collection? The parent-child relationship is probably the most important assertion of all for the family tree! I haven't sourced ANY relationship assertions yet in my database. It never has crossed my mind until now."

This is really a key element that the present GEDCOM system lacks, as noted in my first paragraph above. Plus, the 7-Step Program breaks down in the GEDCOM system when you don't know which one of an uncertain number of individuals by the same name will turn out to be the son Willie Wonka mentioned in Charlie Wonka's will, and the goal is to establish which Willie is the one who was father of already proven ancestor Sam Wonka.

It took careful examination of tax lists that allocated lands to the sundry Willies, the wills of two of the Willies, two out-of-state Deeds by one Willie and Susie his wife, a home-County Superior Court record of Willie and Susie acknowledging one of the aforesaid Deeds, a home-County deed by another Willie and Maggie his wife disposing of the land bequeathed in one will (the only mention of Maggie in a record other than her marriage record and her father's will that did not give her married surname), and extensive acquisition and disposal of land by one of the Willies and, after his death, by his heirs, plus similar but lesser activity by the younger Willie who made a will. All this established that there were three Willies: father, son and son-in-law. Tracking all the associated persons and building the family relationships was fun but complex, and the relationships were not really proved without all of the life-path elements.

Then there is the extensive court record of an estate which includes many depositions regarding the financial business of the decedent and of his administrator. Sundry court records identify the administrator as a son of the decedent, but who exactly asserted this is not always clear. The administrator made a deposition concerning the decedent's financial business with his third wife, and in passing the administrator notes he is "supposed to be" son of the decedent. Yoicks. The conclusion-based genealogy programs don't give much of a way to point to this statement. Only a GPS really could do it justice, but then what should the conclusion be? I've been tempted to just put it up for a vote.

Taco Goulooze said...

Well Bill, it's still you who has to set a research goal, and it's still you who has to find a source of information, it's still you who has to enter that information. The fact that you set a goal already implies you have an idea where your research should take you, both physical (to find the information needed) and mental (to make sense of the information gathered). We should never forget that whatever software you use, it should be used to assist you with your work, not to restrict how you work.

That said, I would like to point out that point 1 of that first set of 7 points can never be a starting point, as it already assumes that the researcher knows what he or she is looking for, and where to look for it.

abercrombie uk said...

Hi dear. Thanks for your sharing, I just need them,it is very kind of you.

Tim Forsythe said...

GEDCOM supports an ASSOciation tag that can be used to document those parent-child relationships. Unfortunately, there are no predefined RELAtive types, so applications must agree on what strings to use. I use "father" and "mother" for instance. I also use ASSO tags for other relationship claims such as heir, grandparent, nepos, etc. I generally don't use them for non-ancestral claims such as siblings, cousins, etc., unless I think it might become relevant later on to help clarify relationships for less than clear family descents like what are sometimes found in medieval genealogies. Since it can be a lot of work, we have to decide how diligent we want to be. Frankly, it has become a habit for me: anytime I add a person, I immediately set the parental associations of their children (I only reference a person from their children, not both ways).
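
As an illustration of the ASSO usage Tim describes, the following Python sketch builds the GEDCOM lines for a child's record that points at a parent. The function name indi_with_parent_links, the cross-reference IDs (@I1@, @I2@, @S1@), and the choice of the Sam and Willie Wonka example from Geolover's comment are all assumptions made for this example. The layout (ASSO with subordinate RELA and SOUR lines) follows the GEDCOM 5.5.1 association structure as I understand it; other GEDCOM versions may require additional lines, so check the specification before relying on it.

```python
def indi_with_parent_links(xref, name, parents):
    """Build GEDCOM lines for one individual with ASSO links to parents.

    parents is a list of (rela, parent_xref, source_xref) tuples, where
    rela is a free-text string such as "father" and source_xref may be
    None when the claim has no source citation yet.
    """
    lines = [f"0 {xref} INDI", f"1 NAME {name}"]
    for rela, parent_xref, source_xref in parents:
        lines.append(f"1 ASSO {parent_xref}")
        lines.append(f"2 RELA {rela}")
        if source_xref:
            # Cite the source that supports this relationship claim.
            lines.append(f"2 SOUR {source_xref}")
    return lines


record = indi_with_parent_links(
    "@I1@", "Sam /Wonka/", [("father", "@I2@", "@S1@")]
)
print("\n".join(record))
# 0 @I1@ INDI
# 1 NAME Sam /Wonka/
# 1 ASSO @I2@
# 2 RELA father
# 2 SOUR @S1@
```

The link lives on the child's record only, matching the one-way referencing described above.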

I have had people complain to me that their software does not support this tag. Software is meant to make things easier, not to restrict them or make them conform. This is why I used a GEDCOM editor that supports ALL tags and record types, and allows adding new user tags when and where I need them.

As far as data entry, we may be overthinking the entire process. The point of evidence-based genealogy is to make sure we document separately every unique claim for each person. It doesn't matter which record is entered first and which last. People should do it in whatever way they are most comfortable with. Obviously if the person's record already exists, which it does in many cases, then the next step is probably to enter the source, and then fill in all the claims. If you want to enter the claims first and then go back and add the source, have at it. As long as the end result is the same it doesn't matter. If you are not sure if the claims belong to the same person, it would probably be better to add separate individual records and merge them later if need be, than to try to break them apart later. We've all had plenty of practice at both.

None of this prevents genealogists from putting pen to paper to derive conclusions. It is simply a method used to gather all the relevant information about a person, and document where it came from. How it is used from there is open to our imaginations.

Ginger Smith said...

Hi Randy, I too have not had the patience to read through all of the do's and do not's of better gedcoms and what to do with our evidence.

However to address your question of what to do with 12 pieces of evidence that mention 12 Devier J. Smiths...I have this problem with my Godwins in Sampson Co., NC.

There are several Nathan Godwins. I have identified one of them as mine. I have collected all of the evidence in an Excel spreadsheet. That is what I use to compare and contrast. I only have them attached to a "Nathan Godwin." These pieces of evidence do not go into my genealogy software until I am sure they belong to my particular Nathan Godwin, i.e., one document identifies a relationship in terms of naming his brother or children. Land deeds and grants are the only pieces of evidence beyond vital records (and vital records are nonexistent in this time and place) that I have been able to track and confidently place in my gen software. So my answer is this: I use a combination of both genealogy software and computer resources like MS Word and Excel to save, track, and work with my evidence.

PS. I wrote a post this morning about being an evidence-based genealogist, and about how I use a hybrid system in my RootsMagic software: http://genealogybyginger.blogspot.com/2012/03/am-i-evidence-based-genealogist-or.html
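
The hybrid workflow Ginger describes could be sketched roughly as follows in Python. The worksheet columns, the sample rows, and the split_worksheet function are invented for this example, and the rule used (a row is ready for the genealogy program only when the document names a relative) is a simplification of her description, not her actual spreadsheet.

```python
import csv
import io

# Hypothetical worksheet: column names and rows are made up for illustration.
WORKSHEET = """source,name,names_relative
Deed A,Nathan Godwin,yes
Land grant B,Nathan Godwin,no
Deed C,Nathan Godwin,yes
"""


def split_worksheet(text):
    """Separate sources ready for the tree from those still under study."""
    ready, pending = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row["names_relative"] == "yes":
            ready.append(row["source"])
        else:
            pending.append(row["source"])
    return ready, pending


ready, pending = split_worksheet(WORKSHEET)
print("Enter in genealogy program:", ready)
print("Keep in spreadsheet:", pending)
```

Nothing is discarded in this arrangement; the pending rows simply wait in the spreadsheet until another document ties them to the right Nathan.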

ACProctor said...

Re: “I still do not believe I (consciously chosen rather than "we") ever want to get to the place where "everything I do in family history research" is on the computer.”

This is an important point Randy. Although I’m primarily a technologist, I also appreciate that there isn’t – and never will be – a single way of researching, documenting, and storing family history data. It worries me that we may be trying to overly prescribe how things are done for the benefit of computer orientated storage.

I made a passing comment on BetterGEDCOM, fairly recently, that may have been lost in a Lilliputian discussion of Persona. The essence of that comment was that any new format for the exchange and long-term storage of our data must be able to represent all our data without bias or presumption about the process used to obtain it. In other words, it should be as applicable to rigorously and methodically derived data as to the naïve collecting of names and dates.

This is a very fine line. It doesn’t dilute these discussions of best practices but a data format should be more concerned with being able to distinguish the types of data (and to link them together) rather than mandate a specific process for deriving them.