The previous section described the spread of data by looking at the range and at boxplots. It began with an example of three small data sets with the same mean, median, and range, but with different boxplots. But even boxplots can hide important differences. Data sets with similar boxplots can be spread out very differently. Here is an example:
Data Set A: 3, 3, 4, 6, 6, 6, 6, 10, 14, 15, 15, 15, 15, 16, 16
Data Set B: 3, 4, 5, 6, 9, 10, 10, 10, 10, 10, 11, 15, 15, 16, 16
(Yes, we chose these numbers so that it would be easy for you to see the main idea.)
1. Find the ranges, means, five-number summaries, and boxplots of Data Sets A and B. How do they compare with each other?
2. Make dotplots for Data Sets A and B. How do they compare with each other?
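Once you have worked question 1 by hand, a short script can check your arithmetic. The following is a minimal Python sketch, not part of the original text. The function name `five_number_summary` is made up for illustration, and it assumes one common textbook convention for quartiles: the first and third quartiles are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the number of data items is odd. Other conventions can give slightly different quartiles.

```python
from statistics import mean, median

set_a = [3, 3, 4, 6, 6, 6, 6, 10, 14, 15, 15, 15, 15, 16, 16]
set_b = [3, 4, 5, 6, 9, 10, 10, 10, 10, 10, 11, 15, 15, 16, 16]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data, leaving out the middle value when the
    count is odd. Needs at least two data items.
    """
    values = sorted(data)
    half = len(values) // 2
    lower_half = values[:half]    # the smallest `half` values
    upper_half = values[-half:]   # the largest `half` values
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


for name, data in (("A", set_a), ("B", set_b)):
    print("Data Set", name)
    print("  range:", max(data) - min(data))
    print("  mean:", mean(data))
    print("  five-number summary:", five_number_summary(data))
```

Compare the printed values with your own answers before moving on to the dotplots.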
As you can see from your dotplots, these two data sets are spread out differently along the number line. In Set B, about half of the data is at or very near the mean. In Set A, most of the data is at or near two values fairly far from the mean, one on each side of it. Set A is an example of a bimodal data set: two data values that are fairly far apart have a much larger frequency than the other values. The dotplot for such a data set "peaks" in two distinct places.
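If you would like to compare your hand-drawn dotplots with a quick text version, the sketch below prints one row per whole number and one asterisk per data item. It is only an illustration: the function name `text_dotplot` is made up here, and the plot is turned on its side, so the "peaks" show up as the longest rows.

```python
from collections import Counter

set_a = [3, 3, 4, 6, 6, 6, 6, 10, 14, 15, 15, 15, 15, 16, 16]
set_b = [3, 4, 5, 6, 9, 10, 10, 10, 10, 10, 11, 15, 15, 16, 16]


def text_dotplot(data):
    """Build a sideways dotplot: one row per whole number, one * per item."""
    counts = Counter(data)  # how many times each value occurs
    rows = []
    for value in range(min(data), max(data) + 1):
        # Values that never occur get an empty row (a Counter gives 0 for them).
        rows.append(f"{value:>2} | " + "*" * counts[value])
    return "\n".join(rows)


print("Data Set A")
print(text_dotplot(set_a))
print()
print("Data Set B")
print(text_dotplot(set_b))
```

In the output for Set A, look for two long rows that are far apart. In the output for Set B, look for one long row at the mean.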
There's another natural way to analyze how data are spread out. We could look at the difference between each data value and some measure of center, such as the mean. This difference is called the deviation of the value from the mean. We'll start by borrowing some "shorthand" from the graphing calculator; it makes writing these ideas a lot less tedious. Let's agree to represent the mean of a data set by $\bar{x}$. (Read "x bar.") The letter x itself will stand for any data item in the set. (A letter that is used to stand for any one of a collection of numbers is called a variable.) Now we can write the deviation of any data item, x, from the mean, $\bar{x}$, as $x - \bar{x}$.
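To see what deviations from the mean look like for Data Sets A and B, here is a small Python sketch. The names `deviations` and `x_bar` are chosen for illustration only; `x_bar` holds the mean, and each entry of the result is one data item minus the mean.

```python
from statistics import mean

set_a = [3, 3, 4, 6, 6, 6, 6, 10, 14, 15, 15, 15, 15, 16, 16]
set_b = [3, 4, 5, 6, 9, 10, 10, 10, 10, 10, 11, 15, 15, 16, 16]


def deviations(data):
    """Return the deviation of each data item from the mean of the data."""
    x_bar = mean(data)                 # the mean ("x bar")
    return [x - x_bar for x in data]   # one deviation per data item


print("Deviations for Set A:", deviations(set_a))
print("Deviations for Set B:", deviations(set_b))
```

A negative deviation belongs to a data item below the mean, a positive deviation to one above the mean, and a deviation of 0 to a data item equal to the mean.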
Learning Outcomes
After studying this section, you will be able to:
Compute the absolute value of a number and of the difference between two numbers;
Interpret the absolute value of the difference of two numbers as the distance between them on a number line;
Compute the mean absolute deviation of data and interpret how it describes spread.