When the Four Colour Theorem (FCT) was finally ‘proved’ in 1976, it upset a lot of mathematicians. It was the first significant mathematical concept to be proved with a good deal of help from a computer and, for many, that didn’t make it a real proof. Although we’re largely (maybe not entirely) OK with it now, the objections at the time weren’t just theorists’ snobbery. At the heart of it all were some fundamental questions about the role a computer could or should play in formal logic.
Essentially, the FCT says that the maximum number of different colours needed to colour a map, so that no bordering countries are the same colour, is four. (Colours can touch at a point but not at an edge.) It’s easy to show that five will always do the trick and, in fact, most normal maps only need three. However, certain types of map certainly seemed to need four so was four always enough?
It’s worth trying a few examples of this with a pencil and paper and you’ll quickly get a feel for what the problem is. (You might need to consider a special case where a country is completely surrounded by other countries to get up to four.) Four colours certainly seem to be enough but how can we be sure?
In fact, after a while, you start to see patterns and there are elements of repetition as the maps grow in size. It starts to look very much as if four colours are enough but that isn’t a proof. How can we be sure we’ve considered all of the cases? What’s needed is a bit of insight into the different patterns and how they might be similar to each other and some confidence that we’ve covered all the angles. And that’s why the 1976 FCT proof was so controversial; it was people that provided the insight but a computer that checked out all the cases.
And, as they say, ‘The Devil’s in the detail.’ How could the mathematical world be sure the computer had got it right? Computers might not make mistakes (as such) but they only do what they’re told to do. How are they told what to do? Well, people write programs to instruct computers; the FCT computer was following a program. So what we’re really asking is was its program correct? Mathematicians had pretty clear boundaries back in 1976: a formal proof could be checked for correctness in every line, every logical step, but could a program be verified in a similar way? Since then this very notion of ‘verifiable programming’, ‘formal methods of software development’ to give it a modern turn, has come a long way but not everything’s been resolved entirely. And to some extent, the lessons learned by programmers have forced mathematicians to reconsider the strength of their own analyses of ‘conventional’ written proofs.
The FCT is a pretty tricky beast but the same principles apply in simpler situations. Here’s the ‘Teaser’ question from last week’s Sunday Times News Review section:
For anyone interested in problem-solving with examples that often involve a nice combination of some elementary maths with some assistance from a bit of nifty programming (although sometimes it’s just one or the other), these Teasers are a terrific resource. Essentially here, we’re being asked to find a set of starting cubes with consecutive whole-number edge lengths such that, when we break them into their smaller ‘unit cubes’, there are as many ‘part-painted’ unit cubes (i.e. the outside ones) as there are ‘non-painted’ unit cubes (i.e. the inside ones). As with the FCT, our maths may only take us so far …
Obviously, the number of unit cubes in a starting cube of edge length n is n^{3}. The unpainted, inside unit cubes will be those left after the ‘outer layer’ is removed on all sides, which is itself a cube of edge length n-2. So there are (n-2)^{3} of these. The part-painted unit cubes are, of course, the outer layer that we’ve just removed so these must be the difference between the original n cube and the n-2 inner cube. So there must be n^{3} – (n-2)^{3} of these. Alternatively, if you can picture the cube well enough, you can count the outside unit cubes directly. Start with two big opposite sides of n^{2}; then two more opposite sides of n(n-2) (you’ve already counted some edges), which just leaves two opposite sides of (n-2)^{2} unaccounted for. This all adds up to 2n^{2} + 2n(n-2) + 2(n-2)^{2}, which reduces down to 6n^{2} – 12n + 8 with a bit of simple algebra. (In fact, you can also write this straight down if you think, ‘six faces, minus twelve edges counted twice, add the corners back on’.)
Remember we’re looking for a set of starting cubes with consecutive whole-number edge lengths such that the total number of inside unit cubes is equal to the total number of outside unit cubes. So we’re after a sequence of consecutive edge lengths (in other words, just a sequence of consecutive whole numbers) n_{1}, n_{2}, n_{3}, … such that
(n_{1}-2)^{3} + (n_{2}-2)^{3} + (n_{3}-2)^{3} + …
= 6n_{1}^{2} – 12n_{1} + 8 + 6n_{2}^{2} – 12n_{2} + 8 + 6n_{3}^{2} – 12n_{3} + 8 + …
or, in fancy maths notation, we’re looking for the first and last numbers a and b such that
Now, you might be at an interesting point here. Are you a better mathematician than a programmer or a better programmer than a mathematician? If the former, then off you go … try to solve this to find a and b (but, be warned; often these things aren’t easy – sometimes they’re not even possible). If the latter, then you’re already going to be feeling the urge to write a program to work this out for you. The following C# code will do, written as a simple console application in Visual Studio. It would take a competent programmer a few minutes to write; probably a lot less time than it would take them to solve the equation … if they could solve it at all.
We’re trying a and b in the range 2 to 20, although b obviously has to be larger than a. (We can allow a cube of size 2; it’ll just be all outside and no inside. On the other hand, if 20‘s not big enough, we could increase it and try again.) For each a and b range, we add up the non-painted (inside) and part-painted (outside) unit cubes for all the different starting cubes. If the totals are the same, we snap out the values and carry on. This runs to give the, hopefully correct, output of:
OK, so we send the answer (3024) off to the Sunday Times and, if we’re drawn out of the hat, we win £20. Well, we would if the answer’s right, of course. How confident should we be? (At the time of writing, the ‘correct solution’ hasn’t been published yet.) Well, we’re probably justified in being reasonably confident; after all, now that we know the values of a and b that we’re looking for, we can check the solution by hand like a good, old-fashioned mathematician:
n |
n cubed |
n-2 |
n-2 cubed |
n cubed – n-2 cubed |
4 |
64 |
2 |
8 |
56 |
5 |
125 |
3 |
27 |
98 |
6 |
216 |
4 |
64 |
152 |
7 |
343 |
5 |
125 |
218 |
8 |
512 |
6 |
216 |
296 |
9 |
729 |
7 |
343 |
386 |
10 |
1000 |
8 |
512 |
488 |
11 |
1331 |
9 |
729 |
602 |
12 |
1728 |
10 |
1000 |
728 |
Totals: |
3024 |
3024 |
So that looks pretty good. The table supports the findings of the program. Provided we’ve understood the problem the way the setter envisioned it, we might well have the same answer. But how does it stack up as a formal proof that this is the right result? Well, performing simple calculations with pencil and paper to establish something right or wrong looks like sound mathematics. We might reasonably take the table as proof that the range 4, …, 12 gives us the result we need. There are two issues, however, with the table:
- It was actually produced on a spreadsheet, which is really just another program, and
- It doesn’t prove that the range 4, …, 12 is the only valid solution; there could be other sequences that our program missed.
The first point doesn’t really seem like much of an objection on its own; we can easily check the result genuinely by hand. But would we? Or would we probably still use a calculator (still a computer running a program)? For most of us, working with three- and four-digit numbers, which would actually give us the greater confidence; our accuracy or the calculator’s? The second point, however, really does take us into some messy ground.
If we entertain any doubts that 4, …, 12 is the only valid solution range then the simplest way (logically, at least) to convince ourselves is to try out all of the other ranges by hand to see that they don’t work. In other words, reproduce the above table for all the other ranges and show that the totals are different. The problem is then that there wouldn’t be much point in running our original program in the first place as it hasn’t saved us any effort. A compromise might be to automate the table production within the spreadsheet; huge numbers of copy-pastes will still take too long (and take up too much room); maybe a macro to run through all of the different values to see which … Oh, hang on; that’s another program – we’re back where we started.
For what it’s worth, the setter is clearly only expecting one valid range – or at least, if there are more, then they all give the same answer of 3024. So where have they got their answer from? Where would we stand if, actually, they’d made a mistake? (Mathematically? Ethically? Legally?) What have either of us really proved? Chances are, no-one’s going to get too worked up over £20 (although times are hard these days) but mathematicians absolutely hate these uncertainties and they’re hard to entirely avoid.
In the ‘real world’, whatever that might be to a mathematician or computer scientist, we have fortunately made some progress over the years. The ‘formal methods’ approach to software development has gradually produced a form of Principia Mathematica of programming. In this way, starting from the absolutely trivial and building upwards, various code and data structures have been shown mathematically to be valid in appropriate contexts. This ‘knowledge base’ of ‘correct’ code has been expanded, step-by-step, in a layered, or hierarchical, manner and the philosophical divide between mathematicians and computer scientists has narrowed – if not quite disappeared – as a result. There’s probably not yet a a proof of exactly our problem in the knowledge base but we could hopefully build one if we could be bothered, based on something close enough. More significantly, there’s now a simplified version of the FCT proof, which most people accept. We appear to be heading in the right direction. Not everything’s as clear as it could be though and debate still rumbles on in some areas – it probably always will. In the end, we have to accept that Principia Mathematica was ultimately unsuccessful in what it was trying to achieve. Is there a Gödel‘s Incompleteness Theorem for software lurking somewhere? Well in one sense, there is of course; it’s Turing’s computability proofs. (The Halting Problem, etc.) But can we now separate program termination from terminating with the right answer? Can it be undecidable whether a program that does stop, actually stops having done what it was supposed to do? Hmmm …
Anyway, we might come back to that. Putting mathematical rigour aside for now, if this has whetted your appetite for ‘problem-solving programming’, here’s this week’s ‘Teaser’ (although, admittedly, this one’s a little harder):
Happy hunting …
July 19th, 2015 at 5:53 pm
[…] post follows on (loosely) from a previous discussion on maths and computing and asks what it really means to ‘prove’ something in each […]