Thursday, August 06, 2009

Significant digits 

Not sure this got noticed yesterday.
[S]omeone hoping to dump an '87 [Plymouth] Sundance in the cash-for-clunkers program was shocked recently when the Environmental Protection Agency re-checked fuel economy figures. In the new math, some Sundances got 19 miles per gallon, just ahead of the clunker-cutoff of 18. It and 77 other cars were bumped from the bad-enough-for-cash list.

In a statement, the EPA said "more precise" data calculated "to four decimal places" caused the revisions.

Just how precise can a fuel-economy test be? Not that precise. After all, 0.0001 miles is about six inches, and, if you could count it, a car getting around 18 miles to the gallon would consume about half a drop of fuel in that distance.
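
The arithmetic in that passage holds up. Here is a quick check in Python, a rough sketch rather than anything from the article. The 0.05 mL drop size is my own assumption (drops vary with the liquid and the dropper), and the function names are made up for illustration.

```python
# Standard unit conversions.
INCHES_PER_MILE = 5280 * 12      # 63,360 inches in a statute mile
ML_PER_US_GALLON = 3785.41       # milliliters in a US liquid gallon

# Assumption: a "drop" is about 0.05 mL. This is a common rule of thumb,
# not a figure taken from the quoted article.
ML_PER_DROP = 0.05


def distance_inches(miles):
    """Convert a distance in miles to inches."""
    return miles * INCHES_PER_MILE


def fuel_drops(miles, mpg):
    """Drops of fuel burned over `miles` by a car rated at `mpg`."""
    gallons = miles / mpg
    return gallons * ML_PER_US_GALLON / ML_PER_DROP


inches = distance_inches(0.0001)
drops = fuel_drops(0.0001, 18)

print(f"0.0001 miles is {inches:.1f} inches")
print(f"fuel used at 18 mpg: {drops:.2f} drops")
```

Under that drop-size assumption, the distance comes out a bit over six inches and the fuel a bit under half a drop, in line with the article's figures.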

The article goes on to point out all the different places we see this measurement mistake. Of course, false accuracy has been a longtime issue. Prof. Philipp Bagus reviewed the work of the economist Oskar Morgenstern. Bagus notes that in the physical sciences we would have an explicit statement of measurement error,
Yet in economics there is simply no error estimate. This means that we do not know the accuracy of the economic data presented to us. This is even more troubling when we consider that in social or economic data there are more possible sources of error than in the physical sciences. We therefore face the question of why the problem of accuracy of economic data is rarely mentioned or passed over in silence in economics, while in the physical sciences this problem is widely acknowledged.
The EPA made a mistake in using the data as it did, but that's easy to recognize. Morgenstern's career was in no small part dedicated to finding the many ways economic data fails to provide a context for what I would call the "economic significance" of some change (and often fails to measure statistical significance, but that's another point).

Today's example will be the unemployment insurance numbers, which some will proclaim to mean "the worst is behind us." Yet buried deep within:
Jobless claims tend to be volatile in late June and July when automakers typically halt production and idle workers to re-equip factories to build new models. GM and Chrysler Group LLC halted production earlier than usual as they worked through bankruptcy proceedings.
So those initial claims already have come and gone; the data in this period is "volatile," meaning the signal-to-noise ratio has fallen. I'm not sure what the current numbers mean, and I think I'd wait for tomorrow's employment report before I started to make statements about the worst being behind us.

There are those who are so convinced economic data is fouled by such errors that they won't use it at all. A majority of the profession is more positivist, but that only increases the burden on us to be clear about data fragility.