Copernicus versus the Scientific Method

1530-1556

I have to confess that for a long time I didn’t really “get” Copernicus. That is to say, while I knew that Copernicus is right and Ptolemy is wrong, I wasn’t clear on just why Copernicus had a better scientific theory, partly because I didn’t bother understanding Ptolemy. So here’s a brief summary of the two. (Howard Margolis’s book helped me out.) There’s a larger point here: what makes Copernicus’s theory better doesn’t quite fit with a lot of pronouncements about “the Scientific Method.”

First, a diagram that illustrates Ptolemy’s model.

To account for the motion of the planets, Ptolemy needed to assume that the five planets (not counting the sun and moon) have both cycles (the big circles) and epicycles (the little circles). If you’re going to put Earth at the center of the system, then you have to have epicycles to account for the motions of the planets, like the retrograde motions where planets seem to go backwards.

Here’s something to note about this diagram: some of the cycles and epicycles vary independently, while others are exactly tied to the motions of the sun. For Mercury and Venus, the epicycles vary independently, taking different periods of time (88 days, 225 days) to complete a circuit. Their cycles, by contrast, take exactly one Earth year to complete a circuit. Furthermore, the center of each epicycle, the point that rides around on the big circle (the deferent, in the usual terminology), is always exactly in line with the sun. For Mars, Jupiter and Saturn, on the other hand, the cycles vary independently (1.88, 11.86, and 29.46 years to make a complete circuit). But the epicycles take exactly one Earth year to complete a circuit. Furthermore, in each case the line from the center of the epicycle to the planet is exactly parallel to the line from Earth to Sun. Note that it’s hard to see any reason why the epicycle for Jupiter, say, couldn’t take 3.14 years to complete a circuit. But instead somehow, mysteriously, it’s connected with the sun’s motion around the earth.

(A large fraction of diagrams on the web purporting to illustrate Ptolemaic astronomy get this crucial point wrong! They show higgledy-piggledy lines from epicycle center to planet, non-parallel and pointing any which way. So I’m not the only person not to get Ptolemy.)

Copernicus’s model, by contrast, doesn’t just replace five circles (the cycles for Mercury and Venus, and the epicycles for Mars, Jupiter and Saturn) with one (for the Earth). It also automatically explains why the five superfluous cycles show an otherwise unexplained synchrony.
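For readers who like to check this sort of thing numerically, here is a minimal Python sketch of the geometry for Jupiter. It assumes perfectly circular, coplanar orbits, and the radii (1 and 5.2 astronomical units) are rounded modern values, not figures from Ptolemy or Copernicus. The function names and the `epicycle_period` parameter are made up for illustration.

```python
import math

def on_circle(radius, period_years, t_years, phase=0.0):
    """Point at time t on a circle centred on the origin (units: AU, years)."""
    angle = 2 * math.pi * t_years / period_years + phase
    return (radius * math.cos(angle), radius * math.sin(angle))

def copernican_view_from_earth(planet_radius, planet_period, t):
    """Sun-centred model: the planet's position minus the Earth's position."""
    px, py = on_circle(planet_radius, planet_period, t)
    ex, ey = on_circle(1.0, 1.0, t)  # Earth: 1 AU, 1 year (assumed circular)
    return (px - ex, py - ey)

def ptolemaic_outer_planet(cycle_radius, cycle_period, t, epicycle_period=1.0):
    """Earth-centred model: a big circle plus an epicycle of radius 1 AU."""
    cx, cy = on_circle(cycle_radius, cycle_period, t)
    # A phase of pi makes the epicycle arm point the way the Earth-to-Sun line does.
    ax, ay = on_circle(1.0, epicycle_period, t, phase=math.pi)
    return (cx + ax, cy + ay)

def gap(t, epicycle_period=1.0):
    """Distance between the two models' positions for Jupiter at time t."""
    a = copernican_view_from_earth(5.2, 11.86, t)
    b = ptolemaic_outer_planet(5.2, 11.86, t, epicycle_period)
    return math.hypot(a[0] - b[0], a[1] - b[1])

for t in (0.0, 0.3, 1.7, 5.0):
    print(t, gap(t), gap(t, epicycle_period=3.14))
```

With a one-year epicycle the two models agree to within rounding error at every time tried. With the hypothetical 3.14-year epicycle they drift apart as soon as t moves away from zero. In the Earth-centred version the one-year period is a free setting that just happens to match the sun; in the sun-centred version it falls out of the subtraction.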

People who read Copernicus’s 1543 book carefully (not many at first) could see he had a real explanation for something that’s just a coincidence in Ptolemy. But contrary to what you may have heard, and what students get taught, about the Scientific Method, Copernicus did not formulate a hypothesis, then collect data to test it and show that it made the right predictions. Ptolemy and Copernicus make the same predictions about where the planets will appear in the sky. (Both are slightly off because they assume circular rather than elliptical orbits.) Eventually other scientists would gather data in support of Copernicus, but the explanatory economy of his theory was a very strong reason for believing in it even before that.

Fortunately, there is a modern theory of how induction works, Solomonoff induction, that can explain why Copernicanism is a better theory. According to Ray Solomonoff, induction has two parts. First there is Bayes’ Rule. Bayes’ Rule is an application of probability theory that tells you how you should revise probability estimates in the face of new evidence. Eliezer Yudkowsky gives one of the best introductions around to a counter-intuitive approach that has become enormously influential in recent years. (It’s fun to read too.)
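As a rough illustration of the updating step, here is Bayes’ Rule for a single yes-or-no hypothesis in a few lines of Python. The function name and the numbers are purely illustrative assumptions, chosen to show how a rare hypothesis can stay fairly improbable even after seemingly strong evidence.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' Rule for one hypothesis H and one piece of evidence E.

    Returns P(H | E) given P(H), P(E | H) and P(E | not H).
    """
    true_branch = prior * p_evidence_if_true
    false_branch = (1.0 - prior) * p_evidence_if_false
    return true_branch / (true_branch + false_branch)

# Illustrative numbers only: a 1% prior, evidence that shows up 80% of the
# time when H is true and 9.6% of the time when H is false.
updated = posterior(0.01, 0.8, 0.096)
print(round(updated, 4))
```

The evidence here is roughly eight times more likely under the hypothesis than without it, yet the updated probability stays below 8%, because the starting probability was so low.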

But Bayes’ Rule is only part of the story. The rule assumes that you have already assigned some prior probabilities to events before you look at the evidence. It doesn’t tell you where these prior probabilities come from. Solomonoff argues that we can use the theory of algorithmic complexity, as developed by Kolmogorov, to assign prior probabilities. Roughly, if your theory were turned into a computer program, how long would the program be? The longer the program, the lower the prior probability. (Probabilities fall off exponentially with the length of the program, and are normalized to sum to one.) Suppose I give you a sequence of numbers corresponding to the first 1000 decimal digits of π. A computer program to calculate the first 1000 digits of π is going to be a lot shorter than just a list of the first 1000 digits, so the theory that I generated the list by calculating π is astronomically more likely than the theory that I generated the list at random. This is a formalization of Occam’s Razor, the principle that simple explanations with fewer working parts are better.
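To make the arithmetic concrete, here is a toy Python sketch of a length-based prior. It is only a cartoon of the idea: real Solomonoff induction sums over all programs for a universal machine and cannot actually be computed. The three toy hypotheses and the 500-bit figure for a π program are assumptions invented for the example, not measurements.

```python
import math

def normalized_priors(lengths_in_bits):
    """Weight each hypothesis by 2 ** -length, then scale the weights to sum to one."""
    weights = {name: 2.0 ** -bits for name, bits in lengths_in_bits.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def log2_prior_odds(shorter_bits, longer_bits):
    """Base-2 logarithm of the prior odds favouring the shorter description."""
    return longer_bits - shorter_bits

# Three made-up hypotheses whose programs are 3, 5 and 8 bits long.
toy = normalized_priors({"A": 3, "B": 5, "C": 8})
print(toy)

# Spelling out 1000 decimal digits takes about 1000 * log2(10) bits.
literal_bits = 1000 * math.log2(10)
program_bits = 500  # hypothetical length of a short pi-computing program
odds_exponent = log2_prior_odds(program_bits, literal_bits)
print(f"prior odds of roughly 2 ** {odds_exponent:.0f} in favour of the program")
```

The odds are kept as an exponent because the raw weights are too small to store as ordinary floating-point numbers, which is a fair measure of what "astronomically more likely" means here.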

So collecting evidence in support of a theory is part of good induction. But proposing more economical theories, accounting for more data with fewer working parts, is another part. Sometimes a new theory is so much better than the alternatives that we can assign it a much higher probability even before we collect more evidence. Copernicus’s explanation of some striking coincidences that were otherwise unexplained was the first act in the modern Scientific Revolution.