Tuesday, February 5, 2019

Using GeneticAffairs.com to Create DNA Match AutoClusters - Part 2: The Cluster Graphic

I wrote Using GeneticAffairs.com to Create DNA Match AutoClusters - Part I: Getting the Clusters three weeks ago, describing the login process and how I got to the point of creating the AutoClusters for my AncestryDNA matches.

I actually ran the AncestryDNA clusters twice.  The first time I used a range of 20 cM to 900 cM, which covered over 1,000 of my AncestryDNA matches (it actually analyzed over 600 of them).  That worked, but the graphic chart was very "busy": the names all ran together, and the cluster images were very small and "dull" because of the size limitations of the HTML graphic.

So I increased the minimum to 25 cM and received another set of AutoCluster files.  This run analyzed only about 400 matches, but the chart has essentially the same problems.  I will use this second AutoCluster result for this blog post.

Apparently, the only way to get easily readable names on the chart and more vibrant cluster colors is to reduce the number of matches by adjusting the minimum and maximum cM values.  I was told that 200 names or fewer works well.  Maybe next time.
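
As a rough illustration of that trade-off, here is a minimal sketch that counts how many matches fall inside a cM window and finds the lowest minimum cM that keeps the count at or under a target.  It assumes the shared cM values have already been pulled into a plain array.  The function names and the sample numbers are made up for illustration and are not part of Genetic Affairs.  The count in range is only a guide, since my first run charted fewer names than were in range.

```javascript
// Hypothetical helper: count matches whose shared cM falls in [minCm, maxCm].
function countInRange(cmValues, minCm, maxCm) {
  return cmValues.filter((cm) => cm >= minCm && cm <= maxCm).length;
}

// Hypothetical helper: raise the minimum cM one step at a time until the
// number of matches in range is at or below the target.
function lowestMinCmFor(cmValues, maxCm, target, startMinCm = 20, step = 1) {
  let minCm = startMinCm;
  while (minCm < maxCm && countInRange(cmValues, minCm, maxCm) > target) {
    minCm += step;
  }
  return minCm;
}

// Made-up cM values, not real match data.
const sampleCm = [22, 24, 25, 27, 31, 38, 45, 60, 110, 800];

// How many sample matches fall between 25 cM and 900 cM.
console.log(countInRange(sampleCm, 25, 900));

// Lowest minimum cM that leaves 5 or fewer sample matches below 900 cM.
console.log(lowestMinCmFor(sampleCm, 900, 5));
```

With real data, the target would be the 200 names mentioned above, and the array would come from the cM column of the match spreadsheet.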

A)  When Genetic Affairs sends the email with the AutoCluster data, they send three files:

*  AncestryDNA-Profile-yourname.csv.zip.  This is a zipped .csv file.  It can be unzipped and opened as a spreadsheet that lists all of the AncestryDNA matches within the specified limits.  This file gave me over 5,000 listed matches with more than 25 cM.

*  AutoCluster_Ancestry_your_name_date_time.csv.zip.  This file can be unzipped and opened as a spreadsheet that lists only the AncestryDNA matches that were placed in a cluster of matches - with two or more persons in a cluster.  This file gave me only 398 persons with over 25 cM, in 52 clusters.

*  AutoCluster_Ancestry_your_name_date_time.html.zip.  This file can be unzipped and opened to see the AutoCluster in your browser as a graphic.

B)  The AncestryDNA AutoCluster HTML page:

1)  Here is a 100% image of the AncestryDNA AutoCluster HTML page in my Windows 7 browser (two screen captures below):

There are names (all 398 of them!) on the left axis and on the top axis - they are the same names on both axes.  I have not obscured them for privacy because they are impossible to read at this size.  On the right margin are the 52 different Cluster colors, with the number of persons in each cluster.  In the middle of the screen are the 52 clusters, shown as squares of varying size.  For example, Cluster 1 has 37 members, almost all of whom match each other.

You can barely see the clusters on the 100% images above because of the size restrictions of the HTML file.

2)  I increased the magnification of the window to 500%, and the clusters become a little more visible.  Here is the top left of the chart showing the first two clusters.  I truncated the names a bit so they can't be deciphered:
As you can see, Cluster 1 is orange and has 37x37 dots.  A dot is colored if the name on the left is a DNA match to the name on the top; if they don't match, the dot is smaller and gray.  Cluster 2 is green and has 27x27 dots.  For some names, there are larger gray dots outside of their cluster.  Each of these indicates that the two persons match each other but were placed in different clusters.  These can be great clues because the match in the first cluster is probably related to the matched person in the second cluster.
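
The way each dot is decided can be sketched in a few lines.  This is only an illustration of the chart logic as I understand it, not the actual Genetic Affairs code.  The names, the cluster numbers, the shared-match pairs, and the function name classifyCell are all made up.

```javascript
// Hypothetical data: the cluster number assigned to each match.
const clusterOf = { Ann: 1, Bob: 1, Cal: 1, Dee: 2, Eli: 2 };

// Hypothetical shared-match pairs; each pair is listed once.
const sharedPairs = [
  ["Ann", "Bob"], ["Ann", "Cal"], ["Bob", "Cal"],
  ["Dee", "Eli"],
  ["Cal", "Dee"], // a match that crosses from Cluster 1 to Cluster 2
];

// Build a lookup so a pair can be found in either order.
const pairKey = (a, b) => [a, b].sort().join("|");
const shared = new Set(sharedPairs.map(([a, b]) => pairKey(a, b)));

// Classify one dot of the grid: the row name against the column name.
function classifyCell(row, col) {
  if (row === col) return "self";
  if (!shared.has(pairKey(row, col))) return "no match";
  return clusterOf[row] === clusterOf[col] ? "in-cluster" : "cross-cluster";
}

// Collect the cross-cluster pairs, which are the clues described above.
const names = Object.keys(clusterOf);
const crossCluster = [];
for (const row of names) {
  for (const col of names) {
    if (row < col && classifyCell(row, col) === "cross-cluster") {
      crossCluster.push([row, col]);
    }
  }
}
console.log(crossCluster);
```

In this sketch, "in-cluster" corresponds to a colored dot, "cross-cluster" to a larger gray dot outside the cluster, and "no match" to a small gray dot.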

3)  At 300% magnification, I can see six clusters:
I ran my mouse over one of the dots in Cluster 2, and the "names" of the left-hand match and the top match are shown in red.  This is helpful.

4)  Down in Cluster 6, the first match person also matches almost everyone in Cluster 7, and the second person in Cluster 6 matches some of the Cluster 7 persons:

This is because match persons 1 and 2 share more DNA (cM) with me, and they also match the persons in Cluster 7, as I do.  The first match person is my first cousin who shares my grandparents, Frederick W. Seaver and Alma Bessie Richmond.  She also matches persons in several other clusters, since she is descended from all of my 2nd great-grandparents on that side of my family - Seaver, Richmond, Hildreth, White, Rich, Oatley, and Smith, and their ancestors too.

C)  The purpose of these AutoClusters is to try to group my DNA Matches so that I can figure out who the common ancestors are for each Cluster, and also for the Cluster match persons.  

For example, if I think that Cluster 6 is a Seaver cluster, then everyone in Cluster 6 should have a Seaver ancestor back in time.  At this point, I'm not sure of that - Cluster 6 may be a Richmond or another surname line that I share with my first cousin.

Ideally, there would be a Seaver cluster, a Richmond cluster, a Hildreth cluster, a White cluster, etc.  The persons in a Cluster will not all share the same most recent common ancestor with me.  For example, the common ancestors I share with a match at 800 cM (my first cousin) are not the same as those I share with a match at 30 cM (likely a 3rd or 4th cousin).  However, my 1st cousin and I will both share a Common Ancestor with that more distant cousin several more generations back on our Seaver or other surname line.

If I know that a specific cluster has a specific common ancestor surname, then I can identify the shared autosomal DNA chromosome segments if those DNA Matches have tested on FamilyTreeDNA, 23andMe, MyHeritageDNA or GEDmatch.  I cannot do that on AncestryDNA because, unfortunately, AncestryDNA does not provide a chromosome browser.  With that knowledge, I can add the known shared chromosome segments to my DNAPainter file.

I found this AutoCluster chart fascinating and potentially helpful, but it is hard to use as delivered because I cannot easily read the names or easily figure out the common ancestors.

D)  The second CSV spreadsheet file is the most useful to me, since I can see names and cM values and who else they match in each cluster.  

The next blog post will demonstrate using a spreadsheet to try to determine the common ancestors for each AncestryDNA Cluster. 


Disclosure:  I have no material connection to Genetic Affairs and am "just a user" of their service.

The URL for this post is:  https://www.geneamusings.com/2019/02/using-geneticaffairscom-to-create-dna.html

Copyright (c) 2019, Randall J. Seaver



Unknown said...

Thanks Randy. The names can be made readable with a script, but they are in a listing below the charts, in any case.

Diane Gould Hall said...

I too have played around with Genetic Affairs and got similarly unreadable results. I like the idea though and will go back and play some more.

Unknown said...

OK, if I have this right, copy this into the URL location of your browser when the chart is showing, and it should change the view:
javascript:var foo = document.querySelectorAll('text');for (var i = 0; i<foo.length; i++) { foo[i].setAttribute('font-size', '40%'); };

(be careful to check the copy, as you may have to manually add the "javascript" even after copying)
I did not create this, but it was posted on one of the Facebook pages, so I apologize for not being able to give proper credit for it.

Bill said...


Just saw Part 2 today and had not previously read Part 1. Very interesting capabilities even if it's not all worked out yet.

Looking at the process to accomplish this registration - giving them your login credentials to DNA testing information - aren't you concerned that your DNA information (and that of your matches) might get out into the wild? What protection is there against the possibility that police might use this without a warrant?

Even larger question - who owns this site, could it be "sold" to someone who would use it for undesirable purposes?

Bill Greggs

Randy Seaver said...


My understanding is that the information that Genetic Affairs captures on AncestryDNA is the DNA Match information (how many cMs, how many segments, the Shared matches with more than 20 cM), and not the actual raw DNA information (the 700,000 lines of ATCG stuff). If that's the case, it's no more than what they give me for my matches, or you for your matches. I can't see your matches, you can't see mine. No other tester can see my raw DNA data or yours.

Genetic Affairs shares my DNA match clusters only with me in an email. I could share it with other matches, but I haven't.

AncestryDNA has not permitted law enforcement access to the raw DNA data or to the DNA match data to date. They would have to change their Terms and conditions to do that. Law enforcement would have to request it by some sort of warrant. To date, FTDNA and GEDmatch have permitted law enforcement to access the DNA match information by submitting a set of raw DNA from the criminal's sample - just "another user." They have used the match data (like "this crook matches Randy, Nancy and Lisa, let's see who their common ancestor is, and then we'll see who the descendants are.") So it's more of a genealogy study with the DNA match data used to identify common ancestors of the criminal's DNA matches.

The fellow who created and owns the site and the program is Evert-Jan Blom who is active on the Facebook group for "Genetic Genealogy Tips and Techniques." (https://www.facebook.com/groups/geneticgenealogytipsandtechniques/).

Bill said...

Thanks for the info, Randy. I looked at the website and I see your point that they (currently) claim that they are just pulling the match info, but in fact the login does give access to your entire account.

Leaves me a little uncomfortable....

Bill Greggs