Tuesday, May 17, 2011

Data Mining and Fixing the Database - an Update

Since my last summary of the status of my genealogy database (Steadily Improving my Family Tree Database) in late February, I've completed these tasks in RootsMagic 4:

*  All of my master sources are now in free-form style, but are modeled on Evidence Explained-quality citations.  I completed the free-form census citations last week (there were over 200 to do for the 1900 census).

*  I ran a Problem Report several weeks ago and corrected many of the errors found in the database.

*  My "to-be-entered-into-the-database" pile of papers was about two feet tall, so I sorted it out and put the "best source material" on top and started working my way through it.  The "to-be-filed" paper pile is now about four inches high.  This goes pretty quickly because I don't have to enter many new people - it's mostly adding source citations to the database events, dates and places.  There is LOTS more to do here!  I also have quite a few document images on my computer from the FHL and FHC that I need to go through and enter into the database.  A data miner's work is never done!

*  I started making surname notebooks for my 16 great-great-grandparents, but am stuck on four of them.  There is no room in the bookcase for more notebooks.  I'm not sure what I can do, except try to make more piles to make space so that I can create more surname notebooks.  I need to buy more "nice" notebooks and dividers to continue this task (which is actually relatively fun).  If I did one a day, the job would be over sooner.

*  With so many state vital record databases appearing on FamilySearch.org, I've started "data mining" for my ancestral families and my Seaver, Carringer, Auble and Vaux one-name studies.  This involves picking a state and going through the vital records databases for the surnames or family names.  In the process, I've added quite a few marriages and deaths, female spouses' surnames, and mothers' surnames to the database.  FamilySearch makes it easy to create the Sources from their Research Wiki page for the collection.  So far, I've done DC, Iowa, Kansas, Nebraska, and New Jersey.  These are mainly IGI extracted data, I think, although it's hard to tell.  There are many more states to go, although my folks were mainly from the northern tier from Nebraska to Maine.  I had the New England states pretty well done years ago before the databases came online.

The database status is now (with additions since the 23 February update):

*  People:  39892 (+ 173)
*  Families: 15830 (+ 85)
*  Events: 105397 (+ 540)
*  Alternate names: 394
*  Places: 4577 (+ 90)
*  Sources: 652 (- 19)
*  Citations: 21130 (+ 938)
*  Repositories: 61 (+ 4)
*  Multi-media items: 0 (+ 0)

The only category that is lower than it was three months ago is the number of Sources.  I had some duplicates, so I combined them.  I had some "widows" with no citations, so I deleted them.

At this point, the database is about 20% sourced to EE quality (I'm not saying that all of the sources are authoritative!), has standardized place names, and is almost ready to be sent off to Ancestry.com to replace the one that I uploaded in January.  It will be a big improvement.

Getting to this point over the past year or so has been a challenging task.  I'm not done, but I'm a lot happier with, and prouder of, the product.


Eileen said...

You should be proud. That was a tremendous amount of work.

I have been following all your efforts and have really appreciated all the work you've done analyzing citations. I would like to apply some of your ideas to my database but I confess that I am a little unsure of what you mean by "free form" citations, especially when it appears that you complete every field in the citation entry window.
