Making Sense of Ancestry.com DNA Matches

Mike Markowski
mike.ab3ap@gmail.com

After getting my results from an Ancestry.com DNA test, my excitement gave way to feeling a bit overwhelmed by the DNA match data. It was hard to make sense of it in any detail. As a result, I focused on my nearest matches and otherwise relied mostly on traditional document searches to build up my family tree. The DNA matches were interesting but didn't seem to help in putting together a tree.

Recently, I gave more thought to how DNA match data might be stitched together to yield useful information. After a few false starts, I realized that an adjacency matrix, a long-established computer science structure that records which pairs of items are connected, is the way to go.
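
To make the idea concrete, here is a minimal sketch of an adjacency matrix built from shared-match lists. It is only an illustration, not the code from the paper or the zip file: the names, the shared_matches dictionary, and the build_adjacency function are all made up for this example.

```python
# Hypothetical input: for each DNA match, the other matches listed as
# "shared" with that person. Names and structure are invented here.
shared_matches = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice"],
    "Carol": ["Alice"],
    "Dave": [],
}


def build_adjacency(shared):
    """Return (names, matrix) where matrix[i][j] == 1 if i and j share DNA."""
    names = sorted(shared)
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for person, others in shared.items():
        for other in others:
            if other in index:  # ignore people who are not in our match list
                i, j = index[person], index[other]
                matrix[i][j] = 1
                matrix[j][i] = 1  # sharing DNA is mutual, so keep it symmetric
    return names, matrix


names, matrix = build_adjacency(shared_matches)
for name, row in zip(names, matrix):
    print(f"{name:6s}", row)
```

Each row answers the question "who does this person share DNA with?", and people whose rows look alike are candidates for the same family line.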

Ideas for Linking DNA Matches

Here are some links to descriptions of what I did and to software that implements the method:

Paper Describing Algorithms: The following two links present the same paper in two ways:

Software Implementing Algorithms: Python and Graphviz must be installed on your computer before you can run dna. I use the popular Anaconda Python distribution.

Many thanks to Erik Mols for writing software, included in the zip file, to convert MyHeritage data to the format this software requires. His code works for the Dutch language site and will need some changes for other languages.

Results!

Here are samples to whet your appetite. For genealogical research it is more useful, and probably critical, to limit the size of the output, but these large graphs give confidence that the program can coordinate the many details. Colors and family surnames are not generated by the program; I overlaid them manually to show how family lines are grouped at the grandparent level. Grouping beyond the grandparent level is not reliable until many more people take DNA tests.


Fully interconnected graph

Here is a directed graph showing the same data in a different format. I tend to prefer this type of graph. This is not a family tree, but a graph showing shared DNA, highest at top, least at bottom.


Directed graph
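
For readers curious how a graph like this can be handed to Graphviz, here is a small sketch that writes DOT text from Python. It is an assumption-laden illustration rather than the program's actual method: the to_dot function, the sample names, and the centimorgan values are invented, and pointing each edge from the larger match to the smaller one is simply one way to get the highest shared DNA toward the top.

```python
def to_dot(shared_cm, links):
    """Build Graphviz DOT text for a directed graph of DNA matches.

    shared_cm: hypothetical dict of match name -> cM shared with the tester.
    links:     hypothetical list of (name, name) pairs that share DNA.
    """
    lines = ["digraph dna {", "  rankdir=TB;"]
    # One node per match, labelled with the amount of shared DNA.
    for name, cm in sorted(shared_cm.items(), key=lambda kv: -kv[1]):
        lines.append(f'  "{name}" [label="{name}\\n{cm} cM"];')
    # Point each edge from the larger match to the smaller one, so the
    # dot layout tends to place high-cM matches above low-cM ones.
    for a, b in links:
        hi, lo = (a, b) if shared_cm[a] >= shared_cm[b] else (b, a)
        lines.append(f'  "{hi}" -> "{lo}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(
    {"Alice": 850, "Bob": 210, "Carol": 95},
    [("Bob", "Alice"), ("Carol", "Bob")],
)
print(dot_text)
```

Saving the printed text to a file such as matches.dot and running "dot -Tpng matches.dot -o matches.png" should render it, assuming Graphviz is installed.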
