Signal Detection Theory: How We Evaluate When Our Judgements Fail Us


In 2002, a jury convicted Lana Canen of the murder of Helen Sailor based primarily on a partial fingerprint found on a pill bottle in the victim’s house in Elkhart, IN. At 44 years of age, she received a sentence of 55 years in prison.

She was exonerated in 2010 after a private fingerprint examiner found that the print recovered at the scene was not a match for Canen's. This case represents one of the relatively few false positives within the field of latent print examination. Research has demonstrated that, as a whole, latent fingerprint examination has a very low false positive rate (3).

Mammograms, widely used early-detection screenings for breast cancer, were introduced in the early 1960s and were widely hailed as a method of detecting early-stage breast cancer (1). Hundreds of thousands of women received annual mammograms to identify abnormal tissue within the breast.

However, unlike fingerprint examination, mammography has a very high false positive rate, with thousands of women being called back for unnecessary biopsies and surgeries every year (2). While these two examples sit at opposite ends of the spectrum, one with a very low false positive rate and the other with a very high one, both reveal vital elements of decision making and allow us to quantify how often we come to the correct conclusion.

The difference in false positive probability between latent fingerprint examination and breast cancer screening also demonstrates how consequences factor into decision making. The implications of a false positive in a mammogram seem low compared to sending an innocent person to prison. However, new data have emerged on the negative psychological consequences of false positives in mammography.

As a result, we need not only new methodologies but also a way to evaluate the effectiveness of new breast cancer screening techniques. Signal detection theory not only allows for a clear determination of when a screening technique is inaccurate, but can also be used to compare different tests and determine which methodology is best at separating the signal from the noise.

Signal detection theory allows for the evaluation of the probability of a false positive versus the probability of a true positive, or "hit." Imagine two distributions: one describing the population of samples that are pure noise, and a second, shifted slightly to the right, describing the samples that contain the signal we are looking for.

There will be some overlap between the two groups, and it is where we place our "criterion" threshold that determines how many times we get a correct "hit" or an incorrect "false alarm." One way to decrease the probability of a false positive is to move the criterion further to the right. However, in doing this, the number of true positives detected also decreases.
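This trade-off can be sketched numerically. The sketch below uses Python's standard-library `statistics.NormalDist` with hypothetical noise and signal distributions; the means and shared standard deviation are illustrative assumptions, not values from any real screening test.

```python
from statistics import NormalDist

# Hypothetical equal-variance distributions; the parameters are
# illustrative assumptions, not values from any real screening test.
noise = NormalDist(mu=0.0, sigma=1.0)    # samples with no signal
signal = NormalDist(mu=1.5, sigma=1.0)   # samples containing the signal

def rates(criterion):
    """Hit and false-alarm probabilities for a given decision threshold.

    Anything above the criterion is called "signal present", so the hit
    rate is the signal mass above it and the false-alarm rate is the
    noise mass above it.
    """
    hit = 1 - signal.cdf(criterion)
    false_alarm = 1 - noise.cdf(criterion)
    return hit, false_alarm

for c in (0.5, 1.0, 1.5):
    h, f = rates(c)
    print(f"criterion={c:.1f}  hit={h:.3f}  false alarm={f:.3f}")
```

Sliding the criterion from 0.5 to 1.5 cuts the false-alarm rate, but the hit rate falls with it, which is exactly the dilemma described above.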

This relationship between false and true positives hints at the fundamental problem seen in mammography. If the criterion is made stricter to avoid unnecessary biopsies, the probability of missing cancerous samples increases.

We can change the criterion, but we cannot reduce the overlap between the two distributions, the sensitivity of the test, without a change in data collection. Sensitivity is a feature inherent to the data. The metric d' is used to describe the separation between the two distributions.

For example, a d’ of zero would indicate that the two distributions are identical. As d’ increases, the overlap between the two distributions decreases, and in practical terms, the sensitivity increases.
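In the equal-variance Gaussian model, d' can be estimated directly from observed hit and false-alarm rates as the difference of their z-scores (inverse standard-normal CDF values). A minimal sketch, assuming that model holds:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def d_prime(hit_rate, false_alarm_rate):
    """Distance between the signal and noise means, in standard-deviation
    units, under the equal-variance Gaussian assumption: d' = z(H) - z(F)."""
    return z(hit_rate) - z(false_alarm_rate)

# Identical distributions: hits come no more often than false alarms.
print(d_prime(0.50, 0.50))   # d' = 0, the distributions fully overlap
# Good separation: many hits, few false alarms.
print(d_prime(0.84, 0.16))   # d' near 2
```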

Imagine a hearing exam in which you are tasked with identifying notes as high or low pitch. Notes that are clearly very high or very low are easy to categorize, but midrange notes become harder and harder to place. As the overlap between the high and low distributions increases, the risk of miscategorizing grows. The same risks apply in mammography.

Ultimately, our ability to find the signal within the noise depends on the degree of sensitivity inherent in our test, meaning that the more different a cancerous mammogram is from a non-cancerous one, the more accurate our final decision making will be.

So what happens when we cannot know the distributions of the two groups in advance, and the only data we have are the false-positive and true-positive probabilities? This is where empirical Receiver Operating Characteristic (ROC) curves are appropriate.

ROC curves graph the false-positive probability on the x-axis and the true-positive probability on the y-axis. A scatter plot is then made from the false-positive and true-positive probabilities observed at different criterion thresholds.

The resulting curve reveals the d' of the distributions. A straight line from (0,0) to (1,1) indicates that the d' for these data is zero. As d' increases, the curve bows further toward the upper left-hand corner.
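Such a curve can be traced by sweeping the criterion across a pair of model distributions and recording the resulting (false-alarm, hit) pairs. The sketch below again uses hypothetical equal-variance Gaussians, here with d' = 1 chosen purely for illustration.

```python
from statistics import NormalDist

# Illustrative distributions with d' = 1 (assumed, not from real data).
noise = NormalDist(mu=0.0, sigma=1.0)
signal = NormalDist(mu=1.0, sigma=1.0)

# Sweep the criterion from very lenient (-4) to very strict (+4).
criteria = [-4 + 8 * i / 100 for i in range(101)]
roc = [(1 - noise.cdf(c), 1 - signal.cdf(c)) for c in criteria]

# With d' > 0, every point sits on or above the chance diagonal:
# the hit rate is never below the false-alarm rate.
for false_alarm, hit in roc:
    assert hit >= false_alarm
```

Plotting `roc` (false-alarm rate on the x-axis, hit rate on the y-axis) reproduces the bowed curve described above; running the same sweep with identical distributions traces the straight diagonal instead.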

Going back to the previous examples, how is this applicable to the real world? Signal detection and ROC curves can be used to evaluate any test and can detect differences in the sensitivities of different methodologies. If there were another way to screen for breast cancer, an ROC curve could be used to demonstrate the difference between a traditional mammogram and the new method by quantifying the difference in their sensitivities.

An ROC curve bowed further toward the upper left-hand corner would indicate that the new methodology had a larger d', and thus more sensitivity with which to distinguish cancerous samples from non-cancerous ones. Signal detection and ROC curves are not limited to assessing decisions made by human judgment.
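One common way to quantify such a comparison is the area under each ROC curve (AUC): the methodology with the larger d' yields the larger area. A sketch under the same equal-variance Gaussian assumption, with the two d' values chosen purely for illustration:

```python
from statistics import NormalDist

def roc_points(d_prime, n=201):
    """(false-alarm, hit) pairs from an equal-variance Gaussian model."""
    noise, signal = NormalDist(0.0, 1.0), NormalDist(d_prime, 1.0)
    criteria = [-5 + 10 * i / (n - 1) for i in range(n)]
    return [(1 - noise.cdf(c), 1 - signal.cdf(c)) for c in criteria]

def auc(points):
    """Trapezoidal area under the curve, ordered by false-alarm rate."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical comparison: an existing test vs. a more sensitive one.
old_auc = auc(roc_points(1.0))   # roughly 0.76
new_auc = auc(roc_points(2.0))   # roughly 0.92
print(old_auc, new_auc)
```

An AUC of 0.5 corresponds to the chance diagonal (d' = 0); a perfect test approaches 1.0.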

They also aid in evaluating the accuracy of judgment calls made by algorithms. By rigorously evaluating our decision making, we can identify where we fail to determine the correct signal and design better screening techniques to better separate the signal from the noise.

  1. Institute of Medicine and National Research Council. (2001). Mammography and Beyond: Developing Technologies for the Early Detection of Breast Cancer: A Non-Technical Summary, Patlak M., Nass S. J., Henderson I. C., & Lashof J. C. (Eds.). Washington, D.C.: The National Academies Press.
  2. Salz T., Richman A. R., & Brewer N. T. (2010). Meta-analyses of the effect of false-positive mammograms on generic and specific psychosocial outcomes. Psychooncology, 19(10), 1026-1034.
  3. Ulery B. T., Hicklin R. A., Buscaglia J., & Roberts M. A. (2011). Accuracy and reliability of forensic latent fingerprint decisions. Proc. Natl. Acad. Sci. U.S.A., 108(19), 7733-7738.