Friday, January 15, 2010

The importance of knowing your statistic's quality 

Because I seem to have exasperated poor Benjamin, let me give an example of what I mean by watching your measurements. Principal components analysis (PCA) lets you construct a synthetic temperature measure to stand in for whatever poor or non-existent data you have. My caution is not a refutation, though Benjamin would have you believe I am skeptical of the whole global warming hypothesis. (That means nothing: I'm skeptical of the law of demand. What you want to know is how diffuse my prior beliefs are, in a Bayesian sense.) My caution comes from experience: PCA can give you very different synthetic measures depending on which proxies you choose. That is my considered opinion, having written papers on the subject as a social scientist.
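A minimal sketch of how a PCA-based synthetic series is built, assuming the common recipe of standardizing each proxy and taking the first principal component. The function names and the toy proxies are hypothetical, not anyone's actual reconstruction. Because the loadings come from the correlations among whichever proxies you feed in, swapping one proxy for another changes the synthetic series, which is the sensitivity described above.

```python
import math

def standardize(series):
    # Rescale a proxy series to mean 0 and sample standard deviation 1.
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / (n - 1))
    return [(x - mean) / sd for x in series]

def first_pc(proxies, iters=1000):
    """Return (scores, loadings) of the first principal component.

    proxies: list of equal-length series, one per proxy.
    """
    z = [standardize(p) for p in proxies]
    k, n = len(z), len(z[0])
    # Correlation matrix of the standardized proxies.
    r = [[sum(z[i][t] * z[j][t] for t in range(n)) / (n - 1) for j in range(k)]
         for i in range(k)]
    # Power iteration converges to the leading eigenvector (the loadings).
    w = [1.0] * k
    for _ in range(iters):
        v = [sum(r[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    # The sign of a principal component is arbitrary; make the loadings sum positive.
    if sum(w) < 0:
        w = [-x for x in w]
    scores = [sum(w[i] * z[i][t] for i in range(k)) for t in range(n)]
    return scores, w

def corr(a, b):
    # Pearson correlation between two equal-length series.
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)

# Hypothetical toy proxies (NOT real data): two share a rising trend, two share a cycle.
t = range(40)
rising_1 = [0.05 * i + 0.3 * math.sin(i) for i in t]
rising_2 = [0.04 * i + 0.2 * math.cos(i) for i in t]
cycle_1 = [math.sin(i / 3.0) for i in t]
cycle_2 = [math.sin(i / 3.0 + 0.3) + 0.1 * math.cos(i) for i in t]

# Same technique, two different proxy selections.
synthetic_a, loadings_a = first_pc([rising_1, rising_2, cycle_1])
synthetic_b, loadings_b = first_pc([cycle_1, cycle_2, rising_1])
print(corr(synthetic_a, synthetic_b))
```

Comparing `corr(synthetic_a, synthetic_b)` across proxy sets is one rough way to see how much the choice of proxies moves the synthetic measure.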

Now perhaps that isn't a problem in the natural sciences. I don't know; I'd be surprised if that were true, but I would respect a scientist's opinion on it. But one thing we probably COULD agree on is that if you had actual temperature data, you would rather use that than a synthetic measure.

Comes then to the discussion one Lubos Motl, a Czech physicist who has collected 351 years of annual average temperatures from a single site in central England. He computes temperature trends over moving windows of different lengths (starting with a 30-year window, but varying it between a decade and a century) and looks at the results. What does he find?
In the late 17th and early 18th century, there was clearly a much longer period when the 30-year trends were higher than the recent ones. There is nothing exceptional about the recent era. Because I don't want to waste time with the creation of confusing descriptions of the x-axis, let me list the ten 30-year intervals with the fastest warming trends:

1691 - 1720, 5.039 °C/century
1978 - 2007, 5.038 °C/century
1977 - 2006, 4.95 °C/century
1690 - 1719, 4.754 °C/century
1979 - 2008, 4.705 °C/century
1688 - 1717, 4.7 °C/century
1692 - 1721, 4.642 °C/century
1694 - 1723, 4.524 °C/century
1689 - 1718, 4.446 °C/century
1687 - 1716, 4.333 °C/century

You see, the early 18th century actually wins: even when you calculate the trends over the "sufficient" 30 years, the trend was faster than it is in the most recent 30 years.
Motl's graphs show this better than that list does, but you get the point. Three hundred years ago, before the industrial age and before CO2 was in abundance, central England warmed just as fast as it has over the last 30 years. That's bad for those who blame CO2 for the warming.
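Because the underlying data is a single annual series, a ranking like the one above is easy to replicate in principle. The sketch below assumes each "trend" is an ordinary least-squares slope of annual mean temperature on year, scaled to °C per century; the post does not say exactly how Motl fit his trends, and the function names are hypothetical. Loading the actual central England series is left out, so the example runs on a made-up straight-line series as a sanity check.

```python
def ols_slope(xs, ys):
    # Ordinary least-squares slope of ys regressed on xs.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def rolling_trends(years, temps, window=30):
    """Return (first_year, last_year, trend in degrees C per century) for every window."""
    out = []
    for start in range(len(years) - window + 1):
        xs = years[start:start + window]
        ys = temps[start:start + window]
        # The slope is in degrees per year; multiply by 100 for degrees per century.
        out.append((xs[0], xs[-1], 100.0 * ols_slope(xs, ys)))
    return out

def fastest_warming(years, temps, window=30, k=10):
    # The k windows with the steepest warming trend, steepest first.
    trends = rolling_trends(years, temps, window)
    return sorted(trends, key=lambda row: row[2], reverse=True)[:k]

# Hypothetical span of 351 annual values, matching the length of the record.
years = list(range(1659, 2010))
# Made-up series rising exactly 1 degree C per century, NOT real data.
temps = [9.0 + 0.01 * (y - 1659) for y in years]
for first, last, trend in fastest_warming(years, temps, k=3):
    print(f"{first} - {last}, {trend:.3f} °C/century")
```

On real data, the `window` argument can be varied between 10 and 100 to repeat the decade-to-century sensitivity check described above.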

Suppose you wanted to be skeptical of this measurement. What could you say? You could argue the thermometer was bad: the earliest part of this record comes from a time of rapid development in thermometer technology. Maybe the calibration just isn't good enough.

Or you could argue that central England is itself anomalous. Motl calls it a "decent proxy" for world temperature, but maybe it isn't. You would want to read more about that before you accepted it.

So here's the problem in a nutshell. I have a measure that I know, that I think measured temperature well, and that measured it directly. I understand the underlying statistical technique well, and because the data is right there, I can replicate it. That gives me more confidence than I have in PCA. But because they might have changed the thermometer, or because you might not be able to generalize Northern Hemisphere temperatures from a single point in central England, I could not use it definitively. It's a skepticism not unlike my skepticism of people's claims about job losses based on the household survey. It could be that PCA's problems are not as severe as these. And there may be other studies, some that reach different conclusions from the PCA studies and some that support them. At the end of the day, the studies update your beliefs; you learn. And what you think is true evolves, with questioning and skepticism all along the way.

Or at least, that's what Scholars do.
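The Bayesian point about how diffuse a prior is can be made concrete with the simplest textbook case: a normal prior updated by a normally distributed estimate. This is a minimal sketch under that assumption only; the function names and every number below are hypothetical, not results from Motl's analysis or from any PCA study.

```python
import math

def update_normal(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal update: returns (posterior mean, posterior sd)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_precision = prior_precision + data_precision
    # The posterior mean is a precision-weighted average of prior and estimate.
    post_mean = (prior_precision * prior_mean + data_precision * estimate) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)

def update_on_studies(prior_mean, prior_sd, studies):
    # Fold in a sequence of (estimate, standard error) pairs, one study at a time.
    mean, sd = prior_mean, prior_sd
    for estimate, std_error in studies:
        mean, sd = update_normal(mean, sd, estimate, std_error)
    return mean, sd

# Hypothetical studies: (estimated effect, standard error).
studies = [(1.5, 0.8), (0.4, 0.5), (1.1, 0.6)]
diffuse = update_on_studies(0.0, 10.0, studies)  # weak prior: the studies dominate
tight = update_on_studies(0.0, 0.2, studies)     # strong prior: the studies barely move it
print(diffuse, tight)
```

The same evidence moves the diffuse prior much further than the tight one, which is why the spread of someone's prior tells you more than whether they call themselves skeptical.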
