False Positive Test Results

Alignments to Content Standards: S-CP.B.8 S-CP.A.4

Task

A certain test for mononucleosis has a 99% chance of correctly diagnosing a patient with mononucleosis and a 5% chance of misdiagnosing a patient who does not have the infection. Suppose the test is given to a group where 1% of the people have mononucleosis. If a randomly selected patient's test result is positive, what is the probability that she has mononucleosis? Explain.

IM Commentary

The purpose of this task is to examine, in a concrete situation, a common statistical fallacy in which two conditional probabilities are confused. For this problem the two probabilities in question are:

the probability of the test result being positive if the patient has mononucleosis
the probability that the patient has mononucleosis if the test result is positive.

The given figure of 99% is for the former, but what the patient is really interested in is the latter, which can be far smaller, as is the case here. The two conditional probabilities listed above are related through the Multiplication Rule, the second equation in standard S-CP.B.8; rearranging that equation to express one conditional probability in terms of the other gives Bayes' Theorem. Much background information about Bayes' Theorem can be found at http://en.wikipedia.org/wiki/Bayes'_theorem.

The confusion between different conditional probabilities lies at the heart of many statistical fallacies, such as the prosecutor's fallacy: see http://www.agenarisk.com/resources/probability_puzzles/prosecutor.shtml. Some of these fallacies are easy to identify, but others can be difficult to diagnose. More information about testing for mononucleosis is available at http://en.wikipedia.org/wiki/Infectious_mononucleosis. While the precise numbers in this problem are fictitious, the overall scenario is accurate. The importance of false positive test results is discussed in depth at http://www.agenarisk.com/resources/probability_puzzles/diagnosis.shtml.

Two arguments are presented. The first builds a table describing the situation under the hypothesis that 10,000 individuals were tested for mono. This solution is closely related to the 7th grade standards 7.SP.A and 7.SP.C; it is nonetheless appropriate for high school because neither the model nor the sample size is provided. The second solution gives an abstract argument using the Multiplication Rule. In the first case, some argument is needed to verify that the chosen number of 10,000 patients does not influence the calculation. In the second case, it is important to realize that the calculation produces a theoretical probability and that experimental results could differ.

This task was designed for an NSF supported summer program for teachers and undergraduate students held at the University of New Mexico from July 29 through August 2, 2013 (http://www.math.unm.edu/mctp/).

The Standards for Mathematical Practice focus on the nature of the learning experiences by attending to the thinking processes and habits of mind that students need to develop in order to attain a deep and flexible understanding of mathematics. Certain tasks lend themselves to the demonstration of specific practices by students. The practices that are observable during exploration of a task depend on how instruction unfolds in the classroom. While a task may be connected to several practices, the commentary will spotlight one practice connection in depth. Possible secondary practice connections may be discussed, but not to the same degree of detail.

This task helps illustrate Mathematical Practice Standard 6, Attend to precision. Students must attend to precision throughout this exercise, from communicating their arguments clearly to others to choosing an appropriate sample size or applying the probability rules correctly. They must confirm that the chosen sample size does not affect the calculation. When using the probability rules, students must realize that the results are theoretical and that experimental results may differ, especially if the sample size is small. This problem-solving process lends itself to a discussion of how sample size affects the accuracy of the calculations and of the level of precision appropriate in this context. Teachers may guide this discussion with questions such as, “How might the sample size impact the results?” or “Does your solution method impact your calculations? Why or why not?” The real-world context of the problem links it to MP.4, Model with mathematics. As with many modeling tasks, students will need to “Make sense of problems and persevere in solving them” (MP.1).

Solutions

Solution 1: Making a table

Imagine the test is given to 10,000 people, 1% of whom actually have mono. Then 100 people have it and 9,900 people do not. Of the 100 people who have it, 99% will test positive and 1% will test negative. Of the 9,900 people who do not have it, 5% will test positive and 95% will test negative. The table below summarizes this information:

	Test is positive	Test is negative	Total number of people
Have mono	99	1	100
Do not have mono	495	9,405	9,900
Total number of people	594	9,406	10,000

So 594 people test positive, but only 99 of them actually have mono. Thus about $\frac{99}{594}\approx 0.167$ or 17% of the people who test positive for mono actually have the disease.
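The table can also be reproduced with a short computation. The sketch below assumes the rates stated in the task (1% prevalence, 99% positive rate among people with mono, 5% positive rate among people without it); the function name `mono_table` and the row labels are illustrative choices, not part of the original task. Exact fractions are used so that the counts match the table without rounding.

```python
from fractions import Fraction

def mono_table(n, prevalence=Fraction(1, 100),
               sensitivity=Fraction(99, 100),
               false_positive_rate=Fraction(5, 100)):
    """Return the expected counts in the two-way table for n people tested (hypothetical helper)."""
    have = n * prevalence        # people who have mono
    do_not = n - have            # people who do not have mono
    return {
        ("have", "positive"): have * sensitivity,
        ("have", "negative"): have * (1 - sensitivity),
        ("not", "positive"): do_not * false_positive_rate,
        ("not", "negative"): do_not * (1 - false_positive_rate),
    }

table = mono_table(10_000)
positives = table[("have", "positive")] + table[("not", "positive")]
share = table[("have", "positive")] / positives   # fraction of positives who have mono
print(positives, share, float(share))
```

Run as written, this should print `594 1/6 0.16666666666666666`, matching the row and column totals above and showing that 99/594 is exactly 1/6.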

Note that all of the numbers in the table are proportional to 10,000, the hypothetical size of the sample. If the sample size changes, the numbers change but their ratios stay the same. For example, if 20,000 people were tested, every number in the table would double, so the fraction $\frac{99}{594}$ would become $\frac{198}{1188} = \frac{99}{594}$, leaving the value unchanged. So our predicted answer of 17% does not depend on the size of the sample, although the counts in an actual group of people tested would only be expected to come close to these proportions when the sample is large.
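The scaling argument can be checked directly by recomputing the share of positive results that are true positives for several sample sizes. This is a minimal sketch assuming the task's rates; the function name `positive_predictive_share` and the list of sample sizes are illustrative choices.

```python
from fractions import Fraction

def positive_predictive_share(n):
    """Fraction of positive testers who have mono, for a hypothetical sample of n people."""
    have = Fraction(n, 100)                  # 1% have mono
    do_not = n - have                        # 99% do not
    true_pos = have * Fraction(99, 100)      # 99% of those with mono test positive
    false_pos = do_not * Fraction(5, 100)    # 5% of those without mono test positive
    return true_pos / (true_pos + false_pos)

for n in (100, 10_000, 20_000, 1_000_000):
    print(n, positive_predictive_share(n))
```

Every sample size gives the same ratio, 1/6. For 100 people, however, the expected number of true positives is 99/100 of a person, which hints at why an actual small group of patients cannot be expected to reproduce these proportions.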

Solution 2: Using Probability Rules (S-CP.B.8)

Here we let $A$ be the event “the patient has mononucleosis” and $B$ the event “the patient tests positive for mononucleosis.” The problem asks for $P(A|B)$, the probability that the patient has mononucleosis given that the test result came back positive. Solving the Multiplication Rule $P(A)P(B|A) = P(B)P(A|B)$ for $P(A|B)$ gives $$ P(A|B) = \frac{P(B|A)P(A)}{P(B)}. $$ For the terms on the right-hand side, we are given that $P(B|A) = 0.99$: if a patient has mononucleosis, the test result comes back positive in 99% of cases. We are also given $P(A) = 0.01$, since 1% of the group has mononucleosis. To find $P(B)$, let $x$ denote the size of the sample, that is, the number of people tested for mononucleosis. Then $$ P(B) = \frac{0.99 \times 0.01x + 0.05 \times 0.99x}{x}. $$ In this expression, $0.01x$ is the number of people with mono, so $0.99 \times 0.01x$ is the number in this group who test positive. Similarly, $0.99x$ is the number of people who do not have mono, and multiplying by $0.05$ gives the number in this group who test positive. So the numerator is the total number of people who test positive, and the denominator is the number of people in the sample. The $x$'s cancel, giving $$ P(B) = 0.99 \times 0.01 + 0.05 \times 0.99 = 0.0594. $$ Putting all of our information together, we have

\begin{align} P(A|B) &= \frac{P(B|A)P(A)}{P(B)}\\ &= \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99}\\ &= \frac{0.0099}{0.0594} = \frac{1}{6}\\ &\approx 17\%. \end{align}
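The same calculation can be written as a small function of the three given rates. This is a sketch only; the function and parameter names are hypothetical. Here $P(B)$ is computed as $P(B|A)P(A) + P(B|\text{not }A)P(\text{not }A)$, which is exactly the expression left after the $x$'s cancel.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A|B) by Bayes' Theorem (hypothetical helper).

    prior               -- P(A), the share of the group with mono
    sensitivity         -- P(B|A), chance of a positive result given mono
    false_positive_rate -- P(B|not A), chance of a positive result without mono
    """
    # P(B): positives from people with mono plus positives from people without it
    p_b = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_b

print(round(posterior(0.01, 0.99, 0.05), 3))  # → 0.167
```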

The fact that the $x$'s cancel out when we calculate $P(B)$ indicates that this probability does not depend on the sample size. In practice, however, if the sample size is too small, then an experimental value found by testing people in a study could differ substantially from this estimate. The larger the sample size, the closer experimental and theoretical probabilities should be.
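To see the gap between theoretical and experimental probabilities, one can simulate testing groups of different sizes. The sketch below assumes each person independently has mono with probability 0.01 and, independently of everyone else, tests positive with probability 0.99 (with mono) or 0.05 (without); the function name and the seed value are arbitrary illustrative choices. The outputs depend on the random draws, so no specific values are claimed here.

```python
import random

def simulate_share(n, seed=0):
    """Simulate testing n people; return the share of positives who have mono (None if no positives)."""
    rng = random.Random(seed)        # fixed seed so runs are reproducible
    true_pos = false_pos = 0
    for _ in range(n):
        has_mono = rng.random() < 0.01
        if has_mono:
            true_pos += rng.random() < 0.99     # bool counts as 0 or 1
        else:
            false_pos += rng.random() < 0.05
    positives = true_pos + false_pos
    return true_pos / positives if positives else None

for n in (100, 10_000, 1_000_000):
    print(n, simulate_share(n))
```

With 100 simulated patients there are only about six positive results on average, so the observed share can land far from 1/6; with a million patients it typically lands very close to the theoretical value.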