# Interpreting the correlation coefficient

• Fit a linear function for a scatter plot that suggests a linear association (S-ID.B.6c).

• Use available technology to compute correlation coefficients (S-ID.C.8).

• Understand that the correlation coefficient measures how tightly the data cluster around a fitted line (S-ID.C.8).

• Understand the significance of correlation coefficients close to 1 or –1 (S-ID.C.8).

• Interpret the rate of change and constant term of a line fitted to data in the context of the data (S-ID.C.7).

Students now know that a calculator can find the line of best fit, and that best fit has something to do with residuals. But how can we gauge just how good a fit that “best” line is? Conversely, how close to linear is a linear-ish set of data? When they use their technology of choice to compute the line of best fit, students will learn to read the correlation coefficient and interpret its value.
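For teachers who want to see what the technology is doing behind the scenes, here is a minimal sketch in Python. It computes the least-squares line and the correlation coefficient r from their standard formulas. The function names (`fit_line`, `correlation`) and the data are invented for illustration and do not come from any of the tasks.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def correlation(xs, ys):
    """Return the correlation coefficient r, a value between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented example data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 61, 58, 70, 74, 83]

slope, intercept = fit_line(hours, scores)
r = correlation(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print(f"r = {r:.3f}")
```

A value of r close to 1 or –1 means the points cluster tightly around the line. The sign of r matches the sign of the slope.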

## Tasks

WHAT: Students use technology to make two different scatter plots. They then generate lines of best fit and interpret them to compare the two data sets, plot residuals, and argue which is the better predictor of used car price: age or mileage. Requiring a robust justification, verbally or in writing, will ensure that students engage with MP3. Students use the lines of best fit to predict the price of a car given its age and mileage.

WHY: This task gives students an opportunity to practice their recently acquired tech skills and interpret residuals, while learning about the significance of the correlation coefficient. In this task, correlations are negative, an important addition to students’ repertoire of experiences with correlation.
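The comparison in this task could be sketched in Python roughly as follows. The car data are invented for illustration and are not the task's data set, and the helper names (`summarize`, `residuals`) are hypothetical. The sketch fits a line to each predictor, reports r, and computes the residuals students would plot.

```python
import math

def summarize(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def residuals(xs, ys, slope, intercept):
    """Return observed minus predicted y for each point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Invented data for six used cars (not from the task).
ages = [1, 2, 3, 5, 7, 9]                      # years
miles = [12, 30, 35, 60, 72, 110]              # thousands of miles
prices = [24.0, 21.5, 19.0, 15.5, 12.0, 8.5]   # thousands of dollars

results = {}
for label, xs in (("age", ages), ("mileage", miles)):
    slope, intercept, r = summarize(xs, prices)
    results[label] = (slope, intercept, r, residuals(xs, prices, slope, intercept))
    print(f"{label}: price = {slope:.2f} * {label} + {intercept:.2f}, r = {r:.3f}")

# The predictor whose r is farther from 0 fits its line more tightly.
better = max(results, key=lambda name: abs(results[name][2]))
print("Tighter linear fit:", better)
```

Both slopes and both r values come out negative here because price falls as age or mileage rises. Students can compare the absolute values of r and look at the residual lists for patterns before arguing which predictor is better.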

## External Resources

#### Description

WHAT: Students plot total points vs minutes played by each of the starters on the Los Angeles Lakers and Detroit Pistons (MP5). Then, one at a time, students will remove one player's data from the set and determine what effect, if any, the removal of that player's data has on the line of best fit and correlation coefficient (S-ID.C.8). For the Lakers, students will notice that the correlation coefficient is 0.75 when the data for all players is considered. However, when the data for Kobe Bryant is removed, the r-value increases to 0.95; when the data for any other player is removed, the correlation coefficient either stays the same or decreases. This indicates that the data for Kobe Bryant might be an outlier (MP7).

WHY: Students get a detailed view of how adding or removing one point can affect the correlation coefficient. This gives them a perspective on the meaning of the correlation coefficient (S-ID.C.8), and also suggests one effect of an outlier on the measures for a set of data (S-ID.A.3).
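The remove-one-point investigation can be sketched in Python as below. The players and numbers are invented for illustration. They are not the Lakers or Pistons data, and the function names (`pearson_r`, `leave_one_out`) are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def leave_one_out(labels, xs, ys):
    """Map each label to the value of r computed with that one point removed."""
    result = {}
    for i, label in enumerate(labels):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        result[label] = pearson_r(rest_x, rest_y)
    return result

# Invented data: player E scores far more than the trend of the others suggests.
players = ["A", "B", "C", "D", "E"]
minutes = [20, 25, 30, 35, 38]
points = [8, 10, 12, 14, 40]

r_all = pearson_r(minutes, points)
r_without = leave_one_out(players, minutes, points)

print(f"all players: r = {r_all:.3f}")
for player, r in r_without.items():
    print(f"without {player}: r = {r:.3f}")
```

In this made-up set, r rises only when player E is removed, which is the same pattern that flags a possible outlier in the activity.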

#### Description

WHAT: It seems that certain countries are perennial powerhouses in the Winter Games. So, is there a way to use existing data to predict how many medals an individual nation will end up taking home? Two researchers may have found a solution. In this lesson, students create and use scatter plots and lines of best fit to examine several variables that may help predict Olympic performance (S-ID.B.6). Students use the graphs to analyze the predictive power of different variables such as Gross Domestic Product, interpreting the coefficients of fitted lines in the context of the data (S-ID.C.7).

WHY: This lesson is an opportunity to examine correlation coefficients as indicators of the strength of association between the number of medals a country wins and different variables (S-ID.C.8). Students demonstrate their ability to read and interpret scatter plots and lines of best fit by supporting their predictions with the given data (MP3).

*Note that a paid subscription is required to access this resource.*