April 29, 2020

The Screening Dilemma

Let's start with a question:

The prevalence of a disease is 1 per thousand. The test has a sensitivity and specificity of 99%. The test result is positive. What is the likelihood of actually being ill?

When I asked around, I got a lot of different answers, ranging from 1% to 100%. Now, what do you think?

In order to answer the question, let's look at what sensitivity and specificity actually mean, particularly through the lens of testing. To break it down: what are the possible outcomes when we run a test? Everyone tested falls into one of four groups:

  • Group a, who are disease positive and test positive
  • Group b, who are disease negative but test positive
  • Group c, who are disease positive but test negative
  • Group d, who are disease negative and test negative

To borrow from Wikipedia:

Prevalence in epidemiology is the proportion of a particular population found to be affected by a medical condition (typically a disease or a risk factor such as smoking or seat-belt use) at a specific time. It is derived by comparing the number of people found to have the condition with the total number of people studied, and is usually expressed as a fraction, a percentage, or the number of cases per 10,000 or 100,000 people.
Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
The positive predictive value (PPV) is the proportion of positive results in statistics and diagnostic tests that are true positive results.

To bring this into formulas:

  • Prevalence  = number of ill people in sample / total number of people in sample = (a+c) / (a+b+c+d)
  • Sensitivity = a / (a+c)
  • Specificity = d / (b+d)
  • Positive Predictive Value = a / (a+b)
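
As a minimal sketch, the four formulas can be written down in Python. The class name ScreeningCounts and the example counts are made up purely for illustration; only the formulas themselves come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Counts for the four groups a, b, c and d (hypothetical helper)."""

    a: int  # disease positive, test positive (true positives)
    b: int  # disease negative, test positive (false positives)
    c: int  # disease positive, test negative (false negatives)
    d: int  # disease negative, test negative (true negatives)

    def prevalence(self) -> float:
        # Ill people (a + c) divided by everyone in the sample.
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)

    def sensitivity(self) -> float:
        # Share of ill people that the test flags as positive.
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        # Share of healthy people that the test clears as negative.
        return self.d / (self.b + self.d)

    def positive_predictive_value(self) -> float:
        # Share of positive results that belong to ill people.
        return self.a / (self.a + self.b)


# Made-up counts for illustration only, not data from a real study.
example = ScreeningCounts(a=45, b=10, c=5, d=940)
print(f"prevalence  = {example.prevalence():.3f}")
print(f"sensitivity = {example.sensitivity():.3f}")
print(f"specificity = {example.specificity():.3f}")
print(f"PPV         = {example.positive_predictive_value():.3f}")
```

Note that sensitivity and specificity are each computed within one disease group (ill or healthy), while the positive predictive value mixes both groups. That is why it depends on how common the disease is.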

Back to our little riddle

Using the formulas from above we can run the following calculation. Assume 100,000 people are tested. Given the prevalence, only 100 of them are ill. Of these, 99 test positive and 1 tests negative. Of the 99,900 who are not ill, 999 test positive and 98,901 test negative. The positive predictive value is therefore 99 / (99 + 999) = 99 / 1,098, or about 9.02%.
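
The same arithmetic can be checked with a few lines of Python. This is only a sketch: the function name expected_counts is a hypothetical helper, and it works with expected (average) counts rather than simulating individual test results.

```python
def expected_counts(population, prevalence, sensitivity, specificity):
    """Return the expected sizes of groups a, b, c and d (hypothetical helper)."""
    ill = population * prevalence
    healthy = population - ill
    a = ill * sensitivity        # ill and test positive
    c = ill - a                  # ill but test negative
    d = healthy * specificity    # healthy and test negative
    b = healthy - d              # healthy but test positive
    return a, b, c, d


# Numbers from the riddle: 1 per thousand, 99% sensitivity and specificity.
a, b, c, d = expected_counts(100_000, 0.001, 0.99, 0.99)
ppv = a / (a + b)

print(f"a = {a:.0f}, b = {b:.0f}, c = {c:.0f}, d = {d:.0f}")
print(f"PPV = {ppv:.2%}")
```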

Despite the high sensitivity and specificity, the probability of actually being ill is only about 9%. This means that about 91% of those who tested positive are not ill and would be over-treated!

This is exactly the problem with many screening tests: when the disease is rare, the healthy people tested vastly outnumber the ill ones, so even a small false-positive rate swamps the true positives. For reasonable screening, the specificity in particular has to be well over 99%. In our example it would have to be about 99.9% just to bring the positive predictive value up to roughly 50%.
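
To see how strongly the result depends on specificity, here is a small sketch that computes the positive predictive value directly from prevalence, sensitivity and specificity (Bayes' theorem applied to the formulas above). The function names ppv and required_specificity are hypothetical helpers, and the prevalence and sensitivity are simply the values from the riddle.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value from rates instead of counts."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def required_specificity(prevalence, sensitivity, target_ppv):
    """Specificity at which the PPV equals target_ppv (hypothetical helper)."""
    # Solve ppv(...) == target_ppv for the false-positive rate.
    max_false_pos_rate = (
        prevalence * sensitivity * (1 - target_ppv)
        / (target_ppv * (1 - prevalence))
    )
    return 1 - max_false_pos_rate


# Riddle values: prevalence 1 per thousand, sensitivity 99%.
for spec in (0.99, 0.995, 0.999, 0.9999):
    print(f"specificity {spec:.2%} -> PPV {ppv(0.001, 0.99, spec):.1%}")

needed = required_specificity(0.001, 0.99, 0.5)
print(f"specificity for a PPV of 50%: {needed:.3%}")
```

Under these assumptions, a specificity of exactly 99.9% lands just below a positive predictive value of 50%; the break-even point sits slightly above it.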