# Coffee and Crime

Alignments to Content Standards: S-ID.B.6 S-ID.C.7 S-ID.C.8 S-ID.C.9

Many counties in the United States are governed by a county council. At public county council meetings, county residents are usually allowed to raise issues of concern. At a recent public county council meeting, one resident expressed concern that 3 new coffee shops from a popular coffee shop chain were planning to open in the county, and said they believed this would lead to an increase in property crimes in the county. (Property crimes include burglary, larceny-theft, motor vehicle theft, and arson. Source: http://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/2010/crime-in-the-u.s.-2010/property-crime, accessed December 5, 2012.)

To support this claim, the resident presented the following data and scatterplot (with the least-squares line shown) for 8 counties in the state:

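| County | Shops | Crimes |
|--------|------:|-------:|
| A | 9 | 4000 |
| B | 1 | 2700 |
| C | 0 | 500 |
| D | 6 | 4200 |
| E | 15 | 6800 |
| F | 50 | 20800 |
| G | 5 | 2800 |
| H | 24 | 15400 |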

The scatterplot shows a positive linear relationship between "Shops" (the number of coffee shops of this coffee shop chain in the county) and "Crimes" (the number of annual property crimes for the county). In other words, counties with more of these coffee shops tend to have more property crimes annually.

1. Does the relationship between Shops and Crimes appear to be linear? Would you consider the relationship between Shops and Crimes to be strong, moderate, or weak?
2. Compute the correlation coefficient. Does the value of the correlation coefficient support your answer to question 1? Explain.
3. The equation of the least-squares line for these data is:

$$\text{Predicted Crimes} = 1434 + 415.7 \text{(Shops)}$$

Based on this line, what is the estimated number of additional annual property crimes for a given county that has 3 more coffee shops than another county?

4. Do these data support the claim that building 3 additional coffee shops will necessarily cause an increase in property crimes? What other variables might explain the positive relationship between the number of coffee shops for this coffee shop chain and the number of annual property crimes for these counties?

5. If the following two counties were added to the data set, would you still consider using a line to model the relationship? If not, what other types (forms) of model would you consider?

| County | Shops | Crimes |
|--------|------:|-------:|
| I | 25 | 36900 |
| J | 27 | 24100 |

## IM Commentary

Note: The data in this task are roughly based on actual values but have been modified to facilitate the task and to disguise the counties in question.

This task addresses many standards regarding the description and analysis of bivariate quantitative data, including regression and correlation. Students should recognize that the pattern shown is a strong, positive, linear association, and thus a correlation coefficient near +1 is plausible. Students should also be able to interpret the slope of the least-squares line as the estimated change in $y$ per one-unit increase in $x$ (and thus, for a 3-unit increase in $x$, they should expect an estimated increase in $y$ equal to 3 times the slope).
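As an illustration of the quantities described above, the sketch below computes the least-squares line and the correlation coefficient for the 8 counties directly from the standard formulas (slope $b = S_{xy}/S_{xx}$, intercept $a = \bar{y} - b\bar{x}$, and $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$). The function name `least_squares` is a hypothetical helper introduced only for this example; it is a minimal sketch, not a prescribed solution method.

```python
import math

# Data for counties A-H from the task table: Shops (x) and Crimes (y)
shops = [9, 1, 0, 6, 15, 50, 5, 24]
crimes = [4000, 2700, 500, 4200, 6800, 20800, 2800, 15400]


def least_squares(x, y):
    """Hypothetical helper: return (intercept, slope, r) for the line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


intercept, slope, r = least_squares(shops, crimes)
print(f"Predicted Crimes = {intercept:.0f} + {slope:.1f}(Shops), r = {r:.3f}")

# Estimated difference in annual property crimes for a county with 3 more shops
print(f"3 * slope = {3 * slope:.1f}")
```

Because this uses the unrounded slope, the last line can differ slightly in the final decimal place from multiplying the rounded value 415.7 by 3.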

From a perspective of context, students should consider other variables that may explain the association (e.g., counties with higher populations or higher population density may have both more coffee shops and more property crimes). This also reinforces the fact that correlation (even strong correlation) does not necessarily imply causation. Depending on students' knowledge of experiments and observational studies, the class can discuss the risk of inferring causation from data collected in an observational study. Lastly, students should consider how a trend observed in a small sample of bivariate observations may change drastically with the addition of just a few observations.

## Solution

1. The relationship does appear to be linear. It would be considered strong and positive, given how closely the points follow a line with positive slope.
2. $r = 0.968$. Yes: the pattern shows a very strong, positive, linear association, so a correlation coefficient this close to +1 supports that description.
3. $415.7 \cdot 3 = 1247.1$. According to the model, a county with 3 more coffee shops than another county is predicted to have about 1247 more annual property crimes.
4. Association (no matter how strong) does not necessarily imply causation. It is unlikely that building a new coffee shop would cause crime rates to increase, for such logic would imply that coffee drinkers engage in more criminal behavior than non-coffee drinkers, that the coffee shop attracts criminals to the county, etc. From a perspective of context, students should consider other variables that may be responsible for the association (e.g., counties with higher populations or higher population density may have both more coffee shops and more property crimes). As stated in the commentary above, depending on students' knowledge of experiments and observational studies, the class can discuss the risk of stating or implying causation based on data from an observational study.
5. With the addition of the two observations, the scatterplot now displays a curved relationship with one outlier at (50, 20800). The scatterplot still shows a positive relationship between "Shops" (the number of coffee shops of this chain in the county) and "Crimes" (the number of property crimes for the county in the previous year), but the relationship no longer appears to be linear (or at least not as linear as before). When only a few observations are used to assess a trend, adding just one or two points can change its appearance substantially. The new plot is shown below. This relationship might be modeled using a quadratic or an exponential curve.
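To see numerically how much two extra counties can change a summary of a small data set, the minimal sketch below computes the correlation coefficient for the original 8 counties and again after appending counties I and J. The helper name `correlation` is hypothetical and simply applies the standard formula $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$.

```python
import math


def correlation(x, y):
    """Hypothetical helper: Pearson correlation coefficient r for paired lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Counties A-H from the original table: Shops and Crimes
shops = [9, 1, 0, 6, 15, 50, 5, 24]
crimes = [4000, 2700, 500, 4200, 6800, 20800, 2800, 15400]

# Counties I and J from question 5 appended to the same lists
shops_10 = shops + [25, 27]
crimes_10 = crimes + [36900, 24100]

r8 = correlation(shops, crimes)
r10 = correlation(shops_10, crimes_10)
print(f"r with 8 counties:  {r8:.3f}")
print(f"r with 10 counties: {r10:.3f}")
```

Note that $r$ only measures the strength of a linear association, so a drop in $r$ by itself does not show that the pattern is curved; judging the form of the relationship still requires looking at the scatterplot.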