Task
Jane wants to sell her Subaru Forester, but doesn’t know what the listing price should be. She checks on craigslist.com and finds 22 Subarus listed. The table below shows age (in years), mileage (in miles), and listed price (in dollars) for these 22 Subarus. (Collected on June 6th, 2012 for the San Francisco Bay Area.)
| Age (years) | Mileage (miles) | Price (dollars) |
|---|---|---|
| 8 | 109428 | 12995 |
| 5 | 84804 | 14588 |
| 3 | 55321 | 20994 |
| 3 | 57474 | 18991 |
| 1 | 11696 | 19981 |
| 13 | 125260 | 6888 |
| 10 | 67740 | 9888 |
| 11 | 97500 | 6950 |
| 6 | 36967 | 19700 |
| 12 | 148000 | 3995 |
| 2 | 29836 | 18990 |
| 3 | 32349 | 21995 |
| 10 | 161460 | 5995 |
| 4 | 68075 | 12999 |
| 3 | 30007 | 22900 |
| 8 | 66000 | 13995 |
| 10 | 93450 | 8488 |
| 3 | 35518 | 22995 |
| 3 | 30047 | 20850 |
| 8 | 107506 | 11988 |
| 11 | 89207 | 8995 |
| 13 | 141235 | 5977 |
- (a) Make appropriate plots with well-labeled axes that would allow you to see if there is a relationship between price and age and between price and mileage. Describe the direction, strength, and form of the relationships that you observe. Does either mileage or age seem to be a good predictor of price?
- (b) If appropriate, describe the strength of each relationship using the correlation coefficient. Do the values of the correlation coefficients agree with what you see in the plots?
- (c) Pick the stronger relationship and use the data to find an equation that describes this relationship. Make a residual plot and determine if the model you chose is a good one. Write a few sentences explaining why (or why not) the model you chose is appropriate.
- (d) If Jane’s car is 9 years old with 95000 miles on it, what listing price would you suggest? Explain how you arrived at this price.
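One way to check the arithmetic behind the correlation, fitting, and pricing questions is sketched below. It is only an illustration, not the intended solution: it assumes an ordinary least-squares line and the Pearson correlation coefficient, and the names `CARS`, `correlation`, `least_squares`, and `suggested_price` are made up for this example.

```python
from math import sqrt

# (age in years, mileage in miles, listed price in dollars), copied from the table.
CARS = [
    (8, 109428, 12995), (5, 84804, 14588), (3, 55321, 20994),
    (3, 57474, 18991), (1, 11696, 19981), (13, 125260, 6888),
    (10, 67740, 9888), (11, 97500, 6950), (6, 36967, 19700),
    (12, 148000, 3995), (2, 29836, 18990), (3, 32349, 21995),
    (10, 161460, 5995), (4, 68075, 12999), (3, 30007, 22900),
    (8, 66000, 13995), (10, 93450, 8488), (3, 35518, 22995),
    (3, 30047, 20850), (8, 107506, 11988), (11, 89207, 8995),
    (13, 141235, 5977),
]


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson correlation coefficient r between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares(xs, ys):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


ages = [age for age, _, _ in CARS]
mileages = [miles for _, miles, _ in CARS]
prices = [price for _, _, price in CARS]

r_age = correlation(ages, prices)
r_mileage = correlation(mileages, prices)

# Fit a line using whichever predictor has the larger |r|,
# then plug in the matching value for Jane's car (9 years, 95000 miles).
if abs(r_age) >= abs(r_mileage):
    predictor, xs, jane_x = "age", ages, 9
else:
    predictor, xs, jane_x = "mileage", mileages, 95000

slope, intercept = least_squares(xs, prices)
suggested_price = intercept + slope * jane_x

print(f"r(price, age)     = {r_age:.3f}")
print(f"r(price, mileage) = {r_mileage:.3f}")
print(f"price = {intercept:.0f} + ({slope:.4f}) * {predictor}")
print(f"suggested listing price: about ${suggested_price:.0f}")
```

The printed price is a starting point for discussion rather than a final answer; students should still explain why the chosen predictor and a linear model are reasonable.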
IM Commentary
This problem could be used for either a lesson or an assessment, or it could be adapted to a take-home project where students pick a product, collect data, and examine predictors for price.
If this is being used as an introductory lesson, more scaffolding would be needed to lead students to the solution in part (a) that two scatterplots would be the right plots to make. In general, students have a hard time deciding what is an appropriate display for data. In this problem they have to determine that the variables are quantitative, and that a scatterplot is a nice way to display a relationship between two quantitative variables. Once they have made scatterplots of price versus mileage and price versus age, they need to practice verbalizing what they observe by describing direction and strength, as well as a form such as linear or quadratic.
Both the price versus mileage and the price versus age scatterplots show a strong, negative linear relationship. The appropriateness of a linear model can be confirmed by making a residual plot: the residual plot shows no pattern that would indicate that a linear model might not be an appropriate way to describe the relationship. Note that there are several common forms for residual plots. It would be appropriate to plot residuals or standardized residuals versus either the explanatory variable or the predicted values. The form of the plot will be similar for any of these residual plots.
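The quantities that go into these residual plots can be computed as in the sketch below, which fits price against age by least squares. The choice of age as the explanatory variable, the variable names, and the simple scaling used for the "standardized" column (each residual divided by the residual standard error, ignoring leverage) are assumptions made for illustration.

```python
from math import sqrt

# Age (years) and listed price (dollars) for the 22 cars, in table order.
AGES = [8, 5, 3, 3, 1, 13, 10, 11, 6, 12, 2,
        3, 10, 4, 3, 8, 10, 3, 3, 8, 11, 13]
PRICES = [12995, 14588, 20994, 18991, 19981, 6888, 9888, 6950,
          19700, 3995, 18990, 21995, 5995, 12999, 22900, 13995,
          8488, 22995, 20850, 11988, 8995, 5977]

n = len(AGES)
mean_age = sum(AGES) / n
mean_price = sum(PRICES) / n

# Least-squares line: predicted price = intercept + slope * age.
slope = (sum((a - mean_age) * (p - mean_price) for a, p in zip(AGES, PRICES))
         / sum((a - mean_age) ** 2 for a in AGES))
intercept = mean_price - slope * mean_age

predicted = [intercept + slope * a for a in AGES]
residuals = [p - fit for p, fit in zip(PRICES, predicted)]  # observed - predicted

# Residual standard error, used here as a simple scale for the residuals.
std_error = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
scaled = [e / std_error for e in residuals]

# Any of these columns can go on the axes of a residual plot:
# residual (or scaled residual) versus age, or versus predicted price.
print(f"{'age':>4} {'predicted':>10} {'residual':>10} {'scaled':>7}")
for a, fit, e, z in sorted(zip(AGES, predicted, residuals, scaled)):
    print(f"{a:>4} {fit:>10.0f} {e:>10.0f} {z:>7.2f}")
```

When reading the output, look for curvature or a fan shape as age increases; residuals scattered evenly above and below zero support the linear model.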
Although technology makes dealing with a data set this size much easier, the problem could be done by hand as well. You might want to reduce the number of data points used. Once the scatterplot is sketched, students can practice approximating the line of best fit with a line fitted "by eye". Residuals can also be measured and plotted by hand in this case.
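To compare a line fitted "by eye" with the least-squares line, students can total the squared residuals for each. The sketch below does this for price versus age. The by-eye slope and intercept (`EYE_SLOPE`, `EYE_INTERCEPT`) are hypothetical values standing in for whatever line a student draws.

```python
# Age (years) and listed price (dollars) for the 22 cars, in table order.
AGES = [8, 5, 3, 3, 1, 13, 10, 11, 6, 12, 2,
        3, 10, 4, 3, 8, 10, 3, 3, 8, 11, 13]
PRICES = [12995, 14588, 20994, 18991, 19981, 6888, 9888, 6950,
          19700, 3995, 18990, 21995, 5995, 12999, 22900, 13995,
          8488, 22995, 20850, 11988, 8995, 5977]

# Hypothetical line sketched by eye: price = 24000 - 1500 * age.
EYE_INTERCEPT = 24000
EYE_SLOPE = -1500


def sum_squared_residuals(intercept, slope):
    """Total of (observed price - predicted price)^2 over all cars."""
    return sum((p - (intercept + slope * a)) ** 2
               for a, p in zip(AGES, PRICES))


# Least-squares line for comparison.
n = len(AGES)
mean_age = sum(AGES) / n
mean_price = sum(PRICES) / n
fit_slope = (sum((a - mean_age) * (p - mean_price) for a, p in zip(AGES, PRICES))
             / sum((a - mean_age) ** 2 for a in AGES))
fit_intercept = mean_price - fit_slope * mean_age

sse_eye = sum_squared_residuals(EYE_INTERCEPT, EYE_SLOPE)
sse_fit = sum_squared_residuals(fit_intercept, fit_slope)

print(f"by-eye line:        sum of squared residuals = {sse_eye:.0f}")
print(f"least-squares line: sum of squared residuals = {sse_fit:.0f}")
```

The least-squares total can never exceed the by-eye total, so the gap between the two numbers gives students a concrete measure of how close their sketched line came.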