Let's take a throughput example to further illustrate this deception. Vendor X makes the claim, "In independent tests across a variety of sessions, our firewall product performed signature-based inspections on an average of 31,000 packets per second." Well, this sounds fantastic, especially if other vendors typically achieve rates of only 5,000 to 10,000 packets per second. But look at the data from which this average was calculated:
That last data point is what statisticians call an "outlier." An outlier is an oddball data point that lies far away from the rest of the data, way out in the tail of the distribution. Setting the statistical jargon aside, it's easy to see that this one extreme data point makes the average entirely unrepresentative of the data. There are no data points at all near the claimed average of 31,000, so in this case the average does a very poor job of representing what is usual in this group of numbers.
A better, and more honest, choice for these numbers would have been the "median." If each data point is a stepping stone, the median is the stone where you are halfway across the river: half of the values lie below it and half above. In this case the median would be 5,600, the middle number in the sorted list above. As a representation of what is typical or usual in a set of data, the median is relatively unaffected by outliers and is thus a safe choice for unruly data. If Vendor X had used the median instead of the average, we would have gotten a much clearer idea of what the product could do in a realistic situation.
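To see the effect in numbers, here is a minimal Python sketch. The five sample values are hypothetical: they are not Vendor X's test results, and were invented only so that they reproduce the two figures quoted in the text (an average of 31,000 and a median of 5,600). The helper `count_near` and its 50 percent tolerance are likewise assumptions made for illustration.

```python
from statistics import mean, median

# Hypothetical throughput samples in packets per second.
# These are NOT Vendor X's actual measurements; they were made up
# to match the average (31,000) and median (5,600) cited in the text.
samples_pps = [4_800, 5_200, 5_600, 6_100, 133_300]

average_pps = mean(samples_pps)    # pulled upward by the single outlier
median_pps = median(samples_pps)   # middle value of the sorted list


def count_near(values, center, tolerance=0.5):
    """Count values within +/- tolerance (as a fraction) of center."""
    return sum(1 for v in values if abs(v - center) <= tolerance * center)


print(f"average: {average_pps:,} pps, "
      f"samples near it: {count_near(samples_pps, average_pps)}")
print(f"median:  {median_pps:,} pps, "
      f"samples near it: {count_near(samples_pps, median_pps)}")
```

With these made-up numbers, no sample lands within 50 percent of the average, while four of the five land within 50 percent of the median. When a vendor reports only an average, asking for the median (or the raw data) is a quick way to check whether a single extreme session is doing all the work.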
6. Connections vs. Transactions