In my last post, I gave my take on what the review of DNA evidence in the Knox/Sollecito appeal said about the alleged murder weapon. The other item they examined is the clasp from Meredith Kercher’s bra, on which, the prosecution claims, Raffaele Sollecito’s DNA was found.
The main issue for the court with regard to the clasp, as with the knife, looks like being the likelihood or otherwise that the DNA results observed might have been the result of contamination. That’s something I’ll look at in my next post.
Before that, I thought it would be a good idea to consider whether there might be any problems with the interpretation of the DNA found on the clasp as belonging to Sollecito. Clearly, if there are, then it may be possible to reject the clasp as evidence before even getting to the question of contamination.
Stefano Conti and Carla Vecchiotti, the authors of the review, do raise concerns about the way electronic printouts from the DNA tests were analysed. If you’ve read the review and, particularly, if you’ve only read the conclusion to the review, you might be forgiven for thinking that the match to Sollecito is in some doubt. However, I don’t think this is a correct reading of what has been written, and I don’t think the authors can possibly have intended to leave that impression. I’ll explain why.
Two different sorts of DNA test were run on the sample from the clasp: a test for autosomal DNA (the type of test most people are thinking of when they refer to a “DNA profile”) and a test looking specifically for Y chromosomes.
In each case, Conti and Vecchiotti draw attention to peaks on the graphs that were originally interpreted as “stutters”: small false peaks which are particularly common in readings of mixed DNA samples such as this one, where the DNA of the victim was also present. It’s normal practice to just pretend they are not there. The review, though, questions whether all of these peaks really are stutters. The authors suggest that any peak over 50 RFU might in theory represent a genuine allele, regardless of its size compared to the “main” peaks observed.
With regard to the tests run to identify Y-chromosomes in the sample, the review agrees with the finding of compatibility between the sample and Sollecito’s Y haplotype. You can see how this is difficult to deny by looking at page 129 of the review. At each locus point in the computer printout (i.e. each group of peaks you can see), the largest of the peaks corresponds perfectly in each case to Sollecito’s profile.
What the review is suggesting here is not that there is any doubt about that, but that some of the smaller peaks which are above 50 RFU might not be stutters. Instead, they may point to the presence of an additional Y-chromosome or chromosomes other than Sollecito’s. This, in turn, calls into question a working hypothesis of police scientist Patrizia Stefanoni in the original tests: that the DNA she was looking at represented a mixture of the victim plus one other person. I’ll examine the potential implications of this in my next post.
In her autosomal DNA testing of the sample from the clasp, Stefanoni found a match with Sollecito over 16 locus-points. That’s a very strong finding.
To give you an idea of how strong, I’ve done my own calculation. For each locus, I looked up the population frequencies for the relevant alleles (using the highest value present relating to Italy) in this database. Because some of these are overlaps with Kercher’s profile (i.e. they would be present even if Sollecito’s DNA were not) I then erred very much on the safe side and gave them a dummy frequency of 1. I then used the numbers to calculate the genotype frequency for each locus (slightly tedious to explain how that is done – please let me know if you care).
All you then need to do to get an overall frequency (or “random match probability”) for the 16 loci is multiply all the genotype frequencies together and invert the result (i.e. divide one by it).
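For anyone who does care, here is a minimal sketch of that arithmetic in Python. It assumes the standard Hardy-Weinberg rule for turning allele frequencies into a genotype frequency (p squared for a homozygote, 2pq for a heterozygote), which I believe is the usual approach but is an assumption here. The function names and the three sets of allele frequencies are made up purely for illustration; they are not the clasp figures, and the dummy-frequency adjustment described above is not modelled.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Genotype frequency under Hardy-Weinberg (an assumption of this sketch).

    p, q are allele frequencies; pass q=None for a homozygous locus.
    """
    if q is None:
        return p * p      # homozygote: both alleles the same
    return 2 * p * q      # heterozygote: two different alleles


def random_match_probability(genotype_freqs):
    """Multiply the per-locus frequencies together and invert.

    Returns N, to be read as "one in N".
    """
    return 1 / prod(genotype_freqs)


# Hypothetical allele frequencies for three loci (illustration only).
loci = [(0.1, 0.2), (0.25, None), (0.05, 0.3)]
freqs = [genotype_frequency(p, q) for p, q in loci]
one_in = random_match_probability(freqs)
print(f"one in about {one_in:,.0f}")
```

Even with only three made-up loci the "one in" figure runs into the thousands, which gives a feel for how quickly the number grows as more loci are multiplied in.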
I’ve either explained all that clearly or I haven’t. In either case, what I came up with was a random match probability of one in about six thousand billion.
I’m not a DNA expert. This is a very rough estimate and there may be methodological issues with my calculation. For example, maybe it would be better to use worldwide frequency figures rather than just concentrating on Italy. But the real point is not the specific figure I came up with, just the fact that it is very, very large. Genotype frequency values tend to be lower than 0.3. So, whatever specific values you use, once you’ve multiplied a load of them together and then inverted, you are guaranteed to end up with a very large number.
The specific problem identified in the review is that at four of the locus points there are peaks present which were disregarded as stutters in the original testing, but which could be considered genuine peaks, according to Conti and Vecchiotti. You can get an idea of what they are talking about from page 120 of the review. The large peaks seen here are those that match with the victim. The others are, according to Stefanoni, a mixture of stutters and matches to Sollecito.
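To put a rough floor under "very large", here is a small sketch that takes the 0.3 figure above as a ceiling for every locus and works out the smallest "one in N" you could then get. The variable names are mine, and treating 0.3 as a hard upper limit at every locus is a simplifying assumption rather than a property of any real dataset.

```python
# Assumed ceiling on each genotype frequency (taken from the text above).
MAX_GENOTYPE_FREQ = 0.3

# Smallest possible "one in N" if every locus sat exactly at the ceiling.
floor_16_loci = 1 / MAX_GENOTYPE_FREQ ** 16
floor_12_loci = 1 / MAX_GENOTYPE_FREQ ** 12

print(f"16 loci: at least one in {floor_16_loci:,.0f}")
print(f"12 loci: at least one in {floor_12_loci:,.0f}")
```

On that assumption, 16 loci cannot give odds worse than one in a couple of hundred million, and real genotype frequencies are mostly well below the ceiling, which is why actual figures come out so much larger.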
If we accept the possibility that some of the stutters may be real peaks, then this may have an effect on the random match probability.
There may be a number of potential ways of attempting to account for this. But, even if we were to bend over backwards for the defence and scrap the affected loci altogether, we would still have an overall frequency of one in about 22 billion over the 12 remaining loci. That slashes the odds quite considerably, but it still doesn’t exactly result in a small number.
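As a quick sanity check on those two figures, this sketch works out what the four scrapped loci must have contributed between them. It uses only the two rounded numbers quoted in this post, so the result is approximate, and the variable names are my own.

```python
# Rounded figures quoted in this post.
one_in_16_loci = 6_000e9   # about six thousand billion
one_in_12_loci = 22e9      # about 22 billion

# Factor by which the four dropped loci had multiplied the odds.
factor_from_four_loci = one_in_16_loci / one_in_12_loci

# Implied average (geometric mean) genotype frequency across those four loci.
implied_avg_freq = (1 / factor_from_four_loci) ** (1 / 4)

print(f"four loci contributed a factor of about {factor_from_four_loci:.0f}")
print(f"implied average genotype frequency: {implied_avg_freq:.2f}")
```

The implied average comes out a little under 0.3 per locus, which is at least consistent with the rule of thumb mentioned earlier.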
Now, you might be tempted to formulate a line of thinking, based on the idea that those ambiguous stutters/peaks might represent some unknown person or persons, that perhaps the DNA reading which looks like it matches Sollecito is actually a random combination of DNA from other people. The trouble is, though, that this proposition doesn’t make any difference to the maths. What we’re talking about is just the frequency (to all intents and purposes, the probability) of that particular combination of alleles occurring in sequence. How they got there doesn’t really matter, to the extent that there’s just no realistic statistical possibility that they represent anything other than Sollecito’s DNA.
Still not convinced? Okay, well consider the Y-haplotype DNA. That’s a perfect match over 17 loci. A Y-haplotype match is generally not held to be usable for identification simply because it is not unique. Your Y-chromosome (if you have one) is, barring random mutations, likely to be shared with your father and other Y-chromosomed members of your immediate family. Or, if your Y-haplotype is a very common one, your not-so-immediate family. It’s not impossible that you have exactly the same Y-chromosome as hundreds of thousands of other people. Or it might be a handful. But, without knowing which it is, your Y-chromosome can’t be used to identify you.
Unfortunately, it also seems that available data isn’t extensive enough to reliably assign a population frequency to a Y-haplotype. However, it is possible to classify a Y-haplotype as “rare”. At the original trial, Stefanoni referred to having checked Sollecito’s Y-haplotype in something called the YHRD database. You can do this yourself if you want. The search form is here and Sollecito’s Y-haplotype reading is on page 130 of the review. What you’ll find is that, out of 36,477 Y-haplotypes in the database, none match Sollecito’s.
[Note: I redid this search in April 2013, by which time the database had grown to 112,005 Y-haplotypes, but there was still no match for Sollecito.]
I’m thinking of the Y-haplotype as being like the bonus number in a lottery game. It’s not worth much as an identifier on its own, but it’s priceless if you’ve already matched all the main numbers.
NOTE: This post originally contained some incorrect figures and has now been corrected.