Tuesday, May 17, 2011

Data Mining and Fixing the Database - an Update

Since my last summary of the status of my genealogy database (Steadily Improving my Family Tree Database) in late February, I've completed these tasks in the database using RootsMagic 4:

*  All of my master sources are now in free-form style, but are modelled after Evidence Explained quality citations.  I completed the free-form census citations last week (there were over 200 to do for the 1900 census).

*  I ran a Problem Report several weeks ago and corrected many of the errors found in the database.

*  My "to-be-entered-into-the-database" pile of papers was about two feet tall, so I sorted it out and put the "best source material" on top and started working my way through it.  The "to-be-filed" paper pile is now about four inches high.  This goes pretty quickly because I don't have to enter many new people - it's mostly adding source citations to the database events, dates and places.  There is LOTS more to do here!  I also have quite a few document images on my computer from the FHL and FHC that I need to go through and enter into the database.  A data miner's work is never done!

*  I started making surname notebooks for my 16 great-great-grandparents, but am stuck on four of them.  There is no room in the bookcase for more notebooks.  I'm not sure what I can do, except try to make more piles to make space so that I can create more surname notebooks.  I need to buy more "nice" notebooks and dividers to continue this task (which is actually relatively fun).  If I did one a day, the job would be over sooner.

*  With so many state vital record databases appearing on FamilySearch.org, I've started "data mining" for my ancestral families and my Seaver, Carringer, Auble and Vaux one-name studies.  This involves picking a state and going through the vital records databases for the surnames or family names.  In the process, I've added quite a few marriages and deaths, female spouses' surnames, and mothers' surnames to the database.  FamilySearch makes it easy to create the Sources from their Research Wiki page for the collection.  So far, I've done DC, Iowa, Kansas, Nebraska, and New Jersey.  These are mainly IGI extracted data, I think, although it's hard to tell.  There are many more states to go, although my folks were mainly from the northern tier from Nebraska to Maine.  I had the New England states pretty well done years ago before the databases came online.

The database status is now (with additions since the 23 February update):

*  People:  39892 (+ 173)
*  Families: 15830 (+ 85)
*  Events: 105397 (+ 540)
*  Alternate names: 394
*  Places: 4577 (+ 90)
*  Sources: 652 (- 19)
*  Citations: 21130 (+ 938)
*  Repositories: 61 (+ 4)
*  Multi-media items: 0 (+ 0)

The only category that is lower than three months ago is the number of Sources.  I had some duplicates, so I combined them.  I had some "widows" with no citations, so I deleted them.

At this point, the database is about 20% sourced to EE quality (I'm not saying that all of the sources are authoritative!), has standardized place names, and is almost ready to be sent off to Ancestry.com to replace the one that I uploaded in January.  It will be a big improvement.

Getting to this point has been a challenging task over the past year or so.  I'm not done, but I'm a lot happier with, and prouder of, the product.

1 comment:

Eileen said...

You should be proud. That was a tremendous amount of work.

I have been following all your efforts and have really appreciated all the work you've done analyzing citations. I would like to apply some of your ideas to my database but I confess that I am a little unsure of what you mean by "free form" citations, especially when it appears that you complete every field in the citation entry window.