Spurious Accuracy

“At one point, the drone was estimated to be approximately 98 feet from the passenger jet.”

“Estimated?”  “Approximately?”  98 feet actually looks astonishingly accurate doesn’t it?  Is someone having a laugh?  No, not exactly; it’s just the sort of thing that happens when people do silly things with numbers.

We’ll come back to that one.  For now, to get an idea of what’s going on, let’s take another example, adapted from Darrell Huff’s magnificent How to Lie with Statistics.

Suppose you’re a would-be statistical researcher and you’ve decided to write something on how long people sleep.  You’re going to talk to 100 people (which isn’t a huge number for a study but things have been published on less data) about it.  And here’s where it starts to go wrong …

For each person you interview, you simply ask them how long they slept the night before.  You get a variety of responses a bit like:

  • “Oh, I don’t know: maybe about eight hours.”
  • “About seven hours.”
  • “Seemed like I was awake half the night: no more than five hours, I’d say.”
  • “I’m not sure. Six hours, maybe.  No, seven; let’s say seven.”
  • “Seven hours, twenty minutes.”

Obviously, you’re going to get an odd mix of answers.  Apart from the fact that you can’t really trust their memories anyway, the figures just aren’t going to be very precise.  Well, the 7h20 guy might have a good reason for saying that but the self-proclaimed insomniac isn’t very reliable.  The thing is, you can’t record some of the data to a level of accuracy that you know you can’t match across all of it, so you record rough figures for everyone.  That’s probably: 8, 7, 5, 7, 7 for the five cases above; then you do it for another 95 people.  The responsible thing to do at this point would be to find the median, which means putting all 100 values in order and finding the middle one (or splitting the difference between the 50th and 51st values if you really have to).  That might give you 7.  Conclusion: “the average person sleeps for about seven hours a night”.

The problem with this is that it’s just not very interesting.  No-one’s going to publish that: not even the Daily Mail.  So, you’re going to have to make the result sound more significant.  How can you do that?  Well, scientific sounds significant; and more accurate looks more scientific.  So, how about calculating a different average: the mean?  Add up the 100 values: let’s say that gives you 719.  Divide by the 100 (values) and you get 7.19.  Two decimal places!  Now, that looks impressive: it’s accurate, so it must be scientific, which makes it significant.  The Daily Mail will run with that any day of the week, particularly if they can make it scary somehow.  Nice.

Sadly, though, you had no right to do this with the data.  Very little – if any – of your original data was taken down with this level of accuracy so it’s meaningless to report any result this way.  There’s way, way too much experimental error.  Even reporting a figure of 7.2 is probably dishonest.  The only remotely responsible figure is, er … about 7.
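To see the difference in numbers, here is a minimal Python sketch.  It uses only the five rough answers quoted above plus the invented total of 719 from the story; the variable names are purely illustrative.  It simply shows that the arithmetic will happily hand you more digits than the data can support, and that rounding back to whole hours undoes the damage.

```python
from statistics import mean, median

# The five rough answers quoted above, recorded to the nearest hour.
hours = [8, 7, 5, 7, 7]

middle = median(hours)   # the middle value once the answers are sorted
raw_mean = mean(hours)   # arithmetic produces digits the data never had

# The story's made-up total for all 100 people.
study_mean = 719 / 100

# Report only to the resolution the answers were recorded at: whole hours.
reported_sample = round(raw_mean)
reported_study = round(study_mean)

print(f"median: {middle}, raw mean: {raw_mean}, reported: about {reported_sample}")
print(f"study mean: {study_mean}, reported: about {reported_study}")
```

Either way you look at it, the only figure that survives is “about 7”.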

So, is this what’s happened in our opening example?  Well, no; not quite.  But it’s a related problem in which spurious accuracy results from changing units.

What’s happened here can be summarised by the following fictional dialogue:

  • Reporter: “Boss, I’ve got this story about a drone nearly hitting a plane”
  • Editor: “How nearly?”
  • Reporter: “Eh?”
  • Editor: “How close was it?”
  • Reporter: “The pilot couldn’t really say: more than 20 metres, she said; less than 50.  30 metres, maybe?”
  • Editor: “OK, let’s go with that.  Except our readers don’t understand metric.  Convert it to feet.”
  • Reporter: “Will do, boss.”

And there you have it … 30m converts to 98.4252… feet.  Knock off the bit after the decimal point.  (Because it would be silly to be that accurate, wouldn’t it?!?!)  And the drone was “approximately 98 feet” away!  It could have been worse, of course: it could have been reported as “approximately 98 feet 5 1/8 inches” but somehow a molecule of common sense remained.  Actually, it sounds like it could realistically have been anywhere from 70 to 150 feet.  “100 feet” would be entirely sensible as an estimate: it’s only to give an idea.  However, people are often reluctant to show this sort of initiative: they sometimes don’t understand the numbers well enough; and occasionally there may actually be legal restrictions!
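A rough sketch of what the conversion should have looked like, in Python.  The helper round_sig is a hypothetical name invented for this illustration, and treating “30 metres” as good to one significant figure is an assumption (arguably a generous one, given the pilot’s 20-to-50 range).

```python
import math

FEET_PER_METRE = 1 / 0.3048  # the international foot is defined as 0.3048 m


def round_sig(value, sig=1):
    """Round value to `sig` significant figures (illustrative helper)."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig - 1 - exponent)


estimate_m = 30  # the reporter's guess: one significant figure at best

raw_feet = estimate_m * FEET_PER_METRE       # what the calculator says
honest_feet = round_sig(raw_feet, sig=1)     # what the guess can support

print(f"raw: {raw_feet:.4f} ft, reported: about {honest_feet:.0f} ft")
```

The point is that the conversion factor is exact but the input isn’t, so the output should carry no more significant figures than the guess that went in.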

So, you can probably imagine how this happened …

But does it really matter?  Well, yes.  Here’s a similar example taken from http://www.fallacyfiles.org/fakeprec.html, in which the result is just plain wrong.

“Consider a precise number that is well known to generations of parents and doctors: the normal human body temperature of 98.6 degrees Fahrenheit. Recent investigations involving millions of measurements have revealed that this number is wrong; normal human body temperature is actually 98.2 degrees Fahrenheit. The fault, however, lies not with … [the] … original measurements — they were averaged and sensibly rounded to the nearest degree: 37 degrees Celsius. When this temperature was converted to Fahrenheit, however, the rounding was forgotten, and 98.6 was taken to be accurate to the nearest tenth of a degree. Had the original interval between 36.5 degrees Celsius and 37.5 degrees Celsius been translated, the equivalent Fahrenheit temperatures would have ranged from 97.7 degrees to 99.5 degrees.”  (John Allen Paulos, A Mathematician Reads the Newspaper (Anchor, 1995), p. 139)

… which means we’d have remembered 98F as being ‘about right’, which is, er … about right.  And sometimes, ‘about right’ is right!
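The interval in that quotation is easy enough to check.  Here’s a small Python sketch that converts the whole range rather than just the rounded figure; the function and variable names are made up for the illustration, and the half-degree width is simply what “rounded to the nearest degree” implies.

```python
def c_to_f(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


centre_c = 37.0       # the rounded Celsius figure
half_width_c = 0.5    # "nearest degree" means it could be half a degree either way

naive_f = c_to_f(centre_c)                 # converting the rounded value alone
low_f = c_to_f(centre_c - half_width_c)    # bottom of the original interval
high_f = c_to_f(centre_c + half_width_c)   # top of the original interval

print(f"naive: {naive_f:.1f} F")
print(f"honest: somewhere from {low_f:.1f} F to {high_f:.1f} F")
```

An interval nearly two Fahrenheit degrees wide is not something you can sensibly quote to a tenth of a degree.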

Daft levels of accuracy are almost always … wrong.

Consider, as an exercise, the various ridiculous calculations that could have led to that headline!

There’s been an inexhaustible supply of examples of spurious accuracy over the years – in all fields.  These are just a few:

  • A sink-hole appeared in the middle of a road.  Reported as being “66 feet in diameter”.  (No ‘approximately’ on this one.)  Origin: More than 10m, less than 30.  20m = 66 feet!
  • “Old Scroat is a healthier cigarette.”  Origin: someone listed a number (not all) of brands of cigarettes in a table of how much deadly chemicals they had in them.  One of them had to be at the good end of the table.  That was Old Scroat.  The difference was negligible: they were all lethal!
  • A ‘circular walk’ around Countrytown is only a (very) rough circle and has a (very) rough radius of 6 miles.  But, hey, the circumference of a circle is 2 x π x r (r = 6) = 37.699… so that’s 37.7 miles!
  • A man escaped from a bull by running across a field.  He ‘reached a speed of 15.8mph’.  Origin: The field was ‘about 100 metres square’ and he took ‘perhaps no more than 20 seconds to run from one corner to another’.  So Pythagoras gives us  √(2 x 100²)/20  =  7.071068 metres/second  =  25.45584 km/h  =  15.81753 mph.  (Could have been worse!)
  • A student describes an experiment with repeated data and reports a standard deviation to 20 decimal places.  Origin: that’s what came out of the spreadsheet!
  • The polls said Remain and Clinton; but the votes said Leave and Trump!  Origin: polls are generally over-optimistic in terms of liberal objectives and are only accurate to a range of several percent … but report results to within a single percent.
  • And so on …
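To take just one of those, the bull calculation can be reproduced in a few lines of Python.  The sketch below also nudges the two rough inputs by 10% each way; that 10% is a purely hypothetical amount of slack chosen for illustration (nobody measured anything), and the function name is invented.

```python
import math

METRES_PER_MILE = 1609.344


def corner_to_corner_mph(side_m, seconds):
    """Average speed along the diagonal of a square field, in miles per hour."""
    diagonal_m = math.sqrt(2 * side_m ** 2)   # Pythagoras
    return diagonal_m / seconds * 3600 / METRES_PER_MILE


# The figures as reported: 'about 100 metres', 'perhaps no more than 20 seconds'.
headline = corner_to_corner_mph(100, 20)

# Hypothetical slack: suppose each rough figure could be 10% out either way.
slowest = corner_to_corner_mph(90, 22)    # smaller field, slower run
fastest = corner_to_corner_mph(110, 18)   # bigger field, quicker run

print(f"headline: {headline:.5f} mph")
print(f"plausible range: {slowest:.0f} to {fastest:.0f} mph")
```

Even with that modest wobble in the inputs, the answer moves by several miles per hour, so the “.8” in 15.8mph is telling you nothing at all.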

Be careful out there!

About Vic Grout

Professor of Computing Futures at Wrexham Glyndwr University, Wales, UK.
