This week I’d like to change gears and do a case study, something I plan to do more of in the future. The case I’ll be covering today is Hudson John Doe 2019, which I worked on through the DNA Doe Project. I was involved in this case on many levels: from the initial contact with the agency, through managing the logistics of the lab pipeline and the bioinformatics analysis, to co-leading the team of investigative genetic genealogists that made the identification. The case is especially interesting because, as far as I know, these are among the oldest remains ever identified through the use of investigative genetic genealogy (IGG).
The investigation began on August 16, 2019, when an excavator unearthed a pine box during the development of a new subdivision in Hudson, OH. The box (a coffin) contained partially fossilized skeletal human remains. Anthropologists from Mercyhurst University were called in to examine the remains, and they determined that the remains likely belonged to a white male who was approximately 40 to 70 years old at the time of his death. They also estimated that he had been buried for at least 50 to 75 years, and perhaps much longer. The area was not known to be a cemetery, but there were reports that people had been buried on the property in the mid-1800s.
The Hudson Police Department reached out to the DNA Doe Project in October 2019, and we began working with them in the hope of identifying these unknown remains. It took some time to determine the most suitable lab for extracting DNA from such an old sample, but at the end of February 2020 the detective shipped the left humerus and right femur to Astrea Forensics in Santa Cruz, CA. Astrea was spun out of the UCSC Paleogenomics Lab and has extensive experience with ancient DNA and remains. They performed four DNA extractions in total, two each from the humerus and the femur. The yields were fairly large, ranging from 107.2 ng to 412 ng. Note, however, that these figures come from fluorometric quantification of total DNA, so they include both the endogenous human DNA we were interested in and exogenous contaminant DNA. You can see a table detailing the extractions below.
We then had Astrea ship all four extracts to HudsonAlpha Discovery in Huntsville, AL, for further QC, library prep, and sequencing. HudsonAlpha performed a round of shallow QC sequencing to obtain metrics such as mapping and duplicate rates. Unfortunately, the human-genome mapping rates for all four libraries showed that only a small fraction of the DNA in each sequencing library was the human DNA of our John Doe. The remainder was contaminant DNA, mostly bacterial. A table detailing the Alu-based human quantification values and mapping rates is included below.
Sequencing a library with such a small proportion of informative reads can be cost-prohibitive, but there was a workaround. One of the libraries was selected for hybrid capture-based whole-genome enrichment, which greatly improved the ratio of human to bacterial DNA and, with it, dramatically improved the mapping rate. That library then went on to full deep sequencing.
My company, Saber Investigations, then handled the bioinformatics analysis. While enrichment greatly improved the mapping rate, the sequencing results were still hampered by the low endogenous DNA content, which produced a duplicate rate of 91.1%; these duplicate reads are uninformative. Mean coverage was 0.3X, and only 14.2% of the genome was covered by at least one read. In other words, over 85% of the genome had no sequencing data at all. Getting the most out of this data required a number of bioinformatics techniques, including mixture/contamination assessment, raw read pre-processing, imputation-based variant (SNP) calling, and filtering of the SNPs to avoid “matchiness” in the GEDmatch batching. Through trial and error, testing a total of twelve different files generated with different filtering thresholds, we eventually obtained adequate matching results.
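To make the relationship between these summary numbers concrete, here is a minimal Python sketch that computes a duplicate rate, mean depth, and breadth of coverage from toy per-position read depths. The `summarize` function, its inputs, and the choice to express the duplicate rate as a fraction of mapped reads are assumptions for illustration only. This is not the pipeline used on this case; a real analysis would derive these values from aligned reads with standard tools.

```python
from dataclasses import dataclass


@dataclass
class CoverageSummary:
    duplicate_rate: float  # fraction of mapped reads flagged as duplicates (assumed denominator)
    mean_depth: float      # average read depth over all positions, zeros included
    breadth_1x: float      # fraction of positions covered by at least one read


def summarize(depths: list[int], mapped_reads: int, duplicate_reads: int) -> CoverageSummary:
    """Hypothetical helper: summarize toy per-position depths and read counts."""
    if not depths or mapped_reads <= 0:
        raise ValueError("need at least one position and one mapped read")
    covered = sum(1 for d in depths if d >= 1)
    return CoverageSummary(
        duplicate_rate=duplicate_reads / mapped_reads,
        mean_depth=sum(depths) / len(depths),
        breadth_1x=covered / len(depths),
    )


# Toy data: ten positions, with all three reads stacked on a single position.
summary = summarize([0, 0, 0, 3, 0, 0, 0, 0, 0, 0], mapped_reads=1000, duplicate_reads=911)
print(summary)
```

In this toy example the mean depth is 0.3X, yet only 10% of positions are covered, because the reads pile up instead of spreading out. Clustered reads are one reason breadth of coverage can be much lower than the mean depth alone might suggest.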
Once the SNP data file was uploaded to GEDmatch (February 2021), I co-led a small team of DNA Doe Project volunteer genetic genealogists who began working to identify John Doe. The top match shared approximately 169.1 cM of DNA with John Doe, which suggested a relationship in the range of second to third cousin. Further research showed that this match had a great-great-great-grandfather who lived in Hudson, OH. Research into additional DNA matches allowed us to confirm that this was indeed the ancestral line through which our match and John Doe appeared to be related. The team also uncovered a land survey map from the 1850s showing that this family owned property in the approximate location where the remains were found.
A third great-grandparent relationship is essentially never seen in traditional genetic genealogy research. However, because the amount of shared DNA is roughly halved with each generation, one can work back from grandparent to third great-grandparent to estimate how much shared DNA would be expected. That estimate fit what we observed between the DNA match and John Doe.
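The halving argument can be sketched as a short calculation. The starting figure of roughly 3,400 cM shared between a parent and child is an approximate round number assumed here for illustration, and `expected_shared_cm` is a hypothetical helper. Real sharing varies considerably around these averages because recombination is random.

```python
# Assumed approximate cM shared between a parent and child (illustrative only).
PARENT_CHILD_CM = 3400.0

RELATIONSHIPS = [
    "parent",
    "grandparent",
    "great-grandparent",
    "2nd great-grandparent",
    "3rd great-grandparent",
]


def expected_shared_cm(generations: int, parent_child_cm: float = PARENT_CHILD_CM) -> float:
    """Expected cM shared with a direct ancestor `generations` steps back,
    assuming shared DNA is halved with each additional generation."""
    if generations < 1:
        raise ValueError("generations must be >= 1")
    return parent_child_cm / 2 ** (generations - 1)


for gen, label in enumerate(RELATIONSHIPS, start=1):
    print(f"{label:>22}: ~{expected_shared_cm(gen):.1f} cM")
```

Under these assumptions the point estimate for a third great-grandparent is about 212.5 cM, in the same general range as the 169.1 cM observed between the top match and John Doe.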
Based on this genetic genealogy research, the DNA Doe Project and Hudson Police Department announced the identification on November 15, 2021. John Doe was identified as Richard Bunts/Bunce, who was born in 1793 and died in Hudson, OH, in 1852. This means that when he was discovered, he had been dead (and, we assume, buried) for nearly 170 years. These are the oldest remains the DNA Doe Project, and possibly any IGG organization, has identified to date.
Note: This case was also presented as a poster at the International Symposium on Human Identification 2021 (ISHI). You can view a PDF of the poster below.