Lie with statistics

6 Apr

I just finished reading How to Lie with Statistics by Darrell Huff, first printed in 1954. It is an enjoyable book with jokes, satire and cartoons. It covers the blunders, misinterpretations, misrepresentations and outright lies that statistics make possible, all of which are still very much applicable today.

I have been looking at statistical values (mean, median, etc.) from a science perspective. In science and engineering, the aim is to be as correct and as close to reality as possible. After all, things can crash and burn because of wrong numbers. But in social science, statistics can be a work of art.

So it is fascinating to find out how easily numbers can legally give different, unrepresentative views, and how easily we accept them without question and, worse, pass them around. Some of these common ‘errors’ were already mentioned in Gary’s class, but I will summarize them anyway.
1. Sample with built-in bias
The sample is not representative of the general situation. For example, only certain groups in the entire population are taken into account. Or the question itself is misleading.
2. The well-chosen average
People can choose between the mean, median and mode, whichever suits what they want to portray.
3. The little figures that are not there
The sample size is so small that it is just not representative. For example, a report claims that only one-third of female college lecturers are married. But the sample involves only 3 lecturers, and one of them is married. Well??
4. Much ado with practically nothing
Exaggerating small differences through unwise classifications or clever drawing of charts. Sometimes the difference is so small that it is actually within the (often unmentioned) margin of error.
5. Exaggeration using a one-dimensional picture
For example: to show that people’s earnings have doubled, we can use a picture twice the height of the initial picture.
   now [moneybag.jpg] ==> 10 years later [moneybag.jpg]
But this is misleading because the second picture is not only twice the height of the first, but also twice the width. Overall, it is four times bigger than the first picture, so it makes readers think that the increase is greater than it actually is.
6. The semiattached figure
It means proving something through something else that is not actually so relevant.
The example given: a report said that the number of deaths chargeable to railroads was 4712, which may scare people away from taking the train. But nearly half of those victims were people in cars that collided with trains at crossings. Others were riding the rods. Only 132 out of 4712 were passengers on the trains.
7. Post hoc rides again
Two clocks are perfectly in sync with each other. Only clock B has a bell. So when clock A shows 12 o’clock, clock B rings. People then assume that clock B rang because of clock A.
8. Statisticulate
Basically, ‘lying legally with statistics’: using the wrong base value, weird reasoning, or unrealistic estimates.
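
To make items 2 and 5 a bit more concrete, here is a small Python sketch. The salary figures are made up by me purely for illustration (they are not from the book), and `area_ratio` is just a hypothetical helper name. It shows how the same data gives three quite different "averages", and why a picture drawn twice as tall and twice as wide looks four times as big.

```python
from statistics import mean, median, mode

# Item 2: the well-chosen average.
# Made-up yearly salaries: several low earners and one very well-paid owner.
salaries = [20_000] * 5 + [30_000] * 3 + [50_000, 250_000]

avg_mean = mean(salaries)      # 49000, pulled up by the single large salary
avg_median = median(salaries)  # 25000.0, the middle of the sorted list
avg_mode = mode(salaries)      # 20000, the most common salary

print("mean:  ", avg_mean)
print("median:", avg_median)
print("mode:  ", avg_mode)


# Item 5: the one-dimensional picture.
def area_ratio(scale: float) -> float:
    """Return how many times larger a picture's area becomes when both
    its height and its width are multiplied by `scale`."""
    return scale * scale


# Doubling the height (and, to keep proportions, the width) quadruples the area.
print("area ratio for a 2x picture:", area_ratio(2))
```

All three averages are "correct", so whoever reports the number can pick the one that tells the story they prefer: 49000 if they want salaries to look high, 20000 if they want them to look low.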

How to be careful with statistics then?
1. Who says so? Does he have obvious bias?
2. How can he know?
3. What’s missing?
4. Did somebody change the subject?
5. (most important of all) Does it make sense?

I would say ‘if you have time to spare, read the book.’ I had a good laugh.