Why Randomize?
Task
For the 100 rectangles shown, the small squares that compose them have area 1, so the area of the rectangle is the number of squares, e.g., rectangle 7 has area 12.
We want to estimate the mean (average) area of the population of 100 rectangles using the average area of a sample of 5 rectangles. We will use two methods and try to decide which method is better.
- Take a quick look at the chart and then select 5 rectangles as your sample. This is your judgment (J) sample.
- Calculate the mean area of the rectangles in your judgment sample. Combine your mean with those of the other members of the class to form a plot of the sample means.
- Describe the shape, center and spread of the distribution of sample means from the judgment sample.
- Now select at random 5 distinct numbers between 1 and 100 and mark the 5 corresponding rectangles on the chart. This is your random (R) sample.
- Repeat step (b) for the random samples.
- Repeat step (c) for the random sample means. Describe the key differences between the two distributions of sample means.
- Decide which sampling method you think is the better one for estimating the population mean by a sample mean, and explain your reasoning.
IM Commentary
The key to much of statistical inference lies in finding good estimates of the shape, center and spread of a large population using the shape, center and spread of a distribution of sample statistics (the sampling distribution). One way to estimate the mean is to take a number of random samples from the population, calculate their means, and consider the distribution of the results, the sampling distribution. Random sampling produces sampling distributions that center on the true value of the mean being estimated. In this case we say that the method of estimation is unbiased.
In addition, the distribution of means of random samples has a standard deviation that can be anticipated in advance by methods of statistical theory that will come later in the studentsâ study of statistics.
The exercise demonstrates that judgment (non-random) samples tend to be biased in the sense that they produce samples that are not balanced with respect to the population characteristics of interest. In this example, the judgment samples tend to favor the large rectangles.
Solution
Here is a typical display of resulting sample means for a class of 28 students.
The judgment sampling distribution has a mean of 10.14, while the random sampling distribution has a mean of 7.39. (The actual population mean is 7.4.) The judgment samples tend to contain more of the larger rectangles, perhaps because the eye is drawn to them more easily.
Note that both distributions tend be approximately normal (or at least mound shaped) in distribution, but the distribution from random sampling tends to have less variability.
Random sampling gives unbiased results (the distribution of sample statistics centers at the population parameter being estimated) with smaller variation than most other methods.
Why Randomize?
For the 100 rectangles shown, the small squares that compose them have area 1, so the area of the rectangle is the number of squares, e.g., rectangle 7 has area 12.
We want to estimate the mean (average) area of the population of 100 rectangles using the average area of a sample of 5 rectangles. We will use two methods and try to decide which method is better.
- Take a quick look at the chart and then select 5 rectangles as your sample. This is your judgment (J) sample.
- Calculate the mean area of the rectangles in your judgment sample. Combine your mean with those of the other members of the class to form a plot of the sample means.
- Describe the shape, center and spread of the distribution of sample means from the judgment sample.
- Now select at random 5 distinct numbers between 1 and 100 and mark the 5 corresponding rectangles on the chart. This is your random (R) sample.
- Repeat step (b) for the random samples.
- Repeat step (c) for the random sample means. Describe the key differences between the two distributions of sample means.
- Decide which sampling method you think is the better one for estimating the population mean by a sample mean, and explain your reasoning.