Most of us, watching TV from time to time, find ourselves saying, “Why’s that made the news? Surely there are more important stories?” That may well be true, so is there a logical way to approach the issue?
Let’s start with a couple of extreme examples, to try to get a feel for this:
- On May 7th this year, the UK went to the polls to elect a new government. Just about every news programme on radio and TV, as well as the whole Press, led with variants on the ‘story’ “Voters start voting in the General Election”. But was this actually ‘news’?
- At the other end of the scale, what about the dog that can play the piano? (Or any similar ‘and finally’ space-filler.) Is it worth the space?
Well, the election is clearly important – we won’t question that, but the date was set years ago so the start of voting was hardly unexpected. So, in a simple sense it scores highly for ‘significance’ but poorly for ‘surprise’. The piano-playing dog is very much the reverse – (probably) insignificant but (maybe) surprising.
So, can we measure these values? Well, maybe. In fact, it turns out that the surprise element is the easiest one to start with …
Communication engineers measure what we’re calling ‘surprise’ as ‘information’ and have established an entire theory around it. The essence of information theory is that the information in a ‘message’ (the news story, in our case) is related to how much effort is involved in (efficiently) relaying the message (telling the story), which – it turns out, in this context – is the same as the ‘value’ of the message. If a message/story is already (fairly well) known, then its information is (close to) zero because we don’t really need to bother to send/tell it. On this basis, “Voters vote on election day” has very little information (no surprise). The dog actually does better in this respect. In general, if I(story) is a function describing (just) the information in a story then it would appear that:
I(“Voters vote”) < I(“Dog plays piano”)
The truly extreme cases are then I(certain) = 0 and I(impossible) = infinity. (If something genuinely can’t happen then finding out that it has will be utterly mind-blowing and, at least in principle, truly telling the story properly will be impossible. But to really get this last bit, we have to appreciate that there’s a huge logical difference between ‘impossible’ and ‘very unlikely’: an alien invasion is pretty unlikely but we at least understand the concept.)
“When you have eliminated the impossible, whatever remains, however improbable, must be the truth.” – Sherlock Holmes, The Sign of the Four (Arthur Conan Doyle, 1890)
We can take this a bit further if we want to … Take a pack of cards and select one – maybe the seven of clubs, say. If we want to help someone guess which card it is then telling them the rank (seven) is clearly more use than telling them the suit (clubs) because the former leaves them fewer to guess from than the latter. So giving information from a large number of options seems more useful than from a smaller number of options. In this case
I(13) > I(4)
and, probably in general then:
I(x) > I(y) if x>y.
Also, in an intuitive sense, giving both the rank and the suit defines the card precisely in the same way as if we’d labelled them individually, which of course we do. That may sound confusing but all we’re really saying is:
I(“seven of clubs”) = I(“it’s a seven”) + I(“it’s a club”),
which means, in numbers:
I(52) = I(13) + I(4),
which might suggest a further pattern (since 52 = 13 x 4). Is it possible then that
I(xy) = I(x) + I(y)
generally for information?
To firm up this last thought a bit more, consider a grid of x by y cells (a map, perhaps). Any individual cell could be identified by either giving the x and y values (coordinates) separately or by simply labelling each cell itself from 1, 2, 3, … up to xy-2, xy-1, xy. Either identifies the cell exactly so:
I(xy) = I(x) + I(y),
the same relationship as before. Hardly a proof of the equation but probably good enough for us. But what does it mean?
Well, this might stretch our memories of school maths rather but there is actually a fairly simple function that behaves like this. The logarithm (log) of a value, relative to a ‘base’, is the power to which we have to raise the base to get that value.
(It’s not really that hard. For example, 3^4 = 81 so log_3(81) = 4.)
Now the other thing we might remember about logarithms is that, because for example 2^5 = 2 x 2 x 2 x 2 x 2 = 32 can be written as 4 x 8 = (2 x 2) x (2 x 2 x 2) = 2^2 x 2^3, and in general then b^(x+y) = b^x b^y, we can turn this around for logarithms to get log_b(xy) = log_b(x) + log_b(y). (In our example, log_2(32) = log_2(4) + log_2(8).)
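For anyone who'd like to see that 'turning around' step spelled out, here is a short derivation in standard notation. The symbols m and n are introduced here purely for illustration; they aren't used anywhere else in the argument.

```latex
\begin{align*}
\text{Let } x &= b^{m} \text{ and } y = b^{n}, \text{ so that } m = \log_b x \text{ and } n = \log_b y. \\
xy &= b^{m}\, b^{n} = b^{m+n} \\
\log_b (xy) &= m + n = \log_b x + \log_b y \\
\text{Example: } \log_2 32 &= \log_2 4 + \log_2 8 = 2 + 3 = 5.
\end{align*}
```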
OK, once again, simply having I(xy) = I(x) + I(y) and log_b(xy) = log_b(x) + log_b(y) (in other words, having our information function behave like the log function) doesn’t exactly prove that it is the log function but it will, once again, do for us. We can hopefully accept that:
I_b(S) = log_b(S)
for any message or story taken from a possible range of S values. We’ve had to introduce the base b from the logarithm function here because real communications systems will work with a particular number base (binary, base 2 or decimal, base 10, for example) but, if we assume it’s never going to change for our purposes, we can simply write
I(S) = log S,
which is nice and simple … but what does it tell us?
Well, the graph of the log function is interesting in this respect. It shows that, although the ‘information content’ of a story does indeed increase as the ‘possible stock’ increases, it doesn’t increase quickly … a lot less than linearly, in fact. What does this mean in practice? Well, these ‘theory-into-practice’ arguments are always less than logically solid but it might mean that we get a diminishing return on our efforts to find more interesting stories to cover. It does however still imply that “voters vote” shouldn’t be of any interest to anyone – at least in an information sense.
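As a rough check on the argument so far, here is a minimal Python sketch. It assumes base 2 (so the units are bits) and that every option is equally likely; the function name `information` is ours, chosen purely for illustration.

```python
import math


def information(options, base=2):
    """I(S) = log S: the information in picking one of `options` equally likely cases."""
    if options < 1:
        raise ValueError("need at least one possible option")
    return math.log(options, base)


# Card example: rank (13 options), suit (4 options), whole pack (52 options).
rank_info = information(13)
suit_info = information(4)
card_info = information(52)

# Knowing the rank helps more than knowing the suit: I(13) > I(4).
assert rank_info > suit_info

# Additivity: I(52) = I(13) + I(4), up to floating-point rounding.
assert math.isclose(card_info, rank_info + suit_info)

# A 'certain' story (only one possible option) carries no information.
assert information(1) == 0.0

# Diminishing returns: each tenfold increase in the 'possible stock'
# adds the same fixed amount of information, not ten times as much.
for stock in (10, 100, 1000, 10000):
    print(f"{stock:>6} options -> {information(stock):6.2f} bits")
```

Each extra factor of ten in the number of options adds only about 3.3 bits, which is the 'a lot less than linearly' behaviour of the log graph.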
But this is only one aspect, of course: the surprise element we started off with. What of significance?
This is harder to deal with because it’s not so easy to perform a calculation. Also, the context matters in a different but similarly difficult-to-measure way. (‘Congratulations to the happy couple’ and ‘Call an ambulance immediately’ are both rare messages but require very different levels of prioritisation and response.)
The best we can probably hope for here is an agreed ranking in an agreed context (in our case, that of a news programme). Such a ranking seems implicit on many news channels …
R(“Major disaster”) > R(“Famous person dies”) > R(“Big political event”) > R(“Famous person does something”) > … > R(“Run-of-the-mill political event”) > … > R(“Dog plays piano”)
where R(S) is the ‘ranking’ of the story S. However, even this concept can’t be consistently applied because ‘disasters’, ‘famous people’, ‘politics’, etc. clearly have their own internal significance scales which cause overlap. (Some disasters are worse than others, some people more famous than others, etc. so a really famous person might be more important than a ‘minor’ disaster? (Don’t shoot the messenger for the cynicism of the message!)) Significance is also a function of time: few of the world’s trouble spots have improved over the past few months but many have dropped out of the news because there’s nothing new to say about them. (Compassion overload is a recognised problem for aid charities.) I(S) and R(S) are tightly coupled. All of this is best illustrated by noting that, in practice, R(“Voters vote”) > R(S) ‘for all’ S (“Voters vote” outranks all stories) despite I(“Voters vote”) being close to zero.
It certainly doesn’t seem as if we can attempt an explicit formula for R(S) = … as we contrived for I(S).
But even if we can agree on a fixed ranking, in a given context, at a fixed point in time, there’s a problem … because the huge number of potential news stories out there, from which a relative few have to be selected, will naturally group themselves into the classifications above – or any other classification that we might prefer. In other words, on any given day, there will be a selection of disasters to choose from, several deaths, lots of politics, many famous people doing things, etc. Simply sorting these stories as R(S1) > R(S2) > R(S3) > … will almost certainly mean that S1, S2, S3, … are all the same type of story (probably disasters on most days). That won’t make for an interesting news programme.
And it’s at this point that we have to accept that even ‘serious’ news channels have an ‘agenda’. At the very least, they have to attract viewers and that probably means maximising interest, surprise, significance and variety. Choosing S1, S2, S3, … to maximise R(S1) + R(S2) + R(S3) + … will give nothing but disasters, whereas choosing S1, S2, S3, … to maximise I(S1) + I(S2) + I(S3) + … gives a diet of talented pets. Perhaps we could try defining a ‘combined importance’ measure as:
C(S) = I(S) x R(S) or C(S) = I(S) + R(S) ?
The problem with the first version is that close-to-zero values for either I(S) or R(S) will give small values for C(S), possibly favouring the mediocre, which score moderately on both scales? The second may be difficult because we’ll somehow have to get I(S) and R(S) aligned to the same scale (normalised), else one or other will dominate. Even then the grouping issue will probably remain. Choosing S1, S2, S3, … so that C(S1) > C(S2) > C(S3) > … is likely to pick S1, S2, S3, … from the same broad categories.
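To make both objections a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the stories, their categories and their scores are invented for illustration, and both I(S) and R(S) are assumed to have already been squeezed onto the same 0-to-1 scale.

```python
# Hypothetical stories: (title, category, surprise I, significance R).
# All scores are made up and assumed to be normalised to the range 0 to 1.
stories = [
    ("Voters vote",        "politics", 0.05, 1.00),
    ("Earthquake",         "disaster", 0.70, 0.95),
    ("Flood",              "disaster", 0.60, 0.90),
    ("Train crash",        "disaster", 0.65, 0.80),
    ("Minister reshuffle", "politics", 0.50, 0.45),
    ("Dog plays piano",    "pets",     0.95, 0.05),
]


def combined_product(i, r):
    """C(S) = I(S) x R(S)"""
    return i * r


def combined_sum(i, r):
    """C(S) = I(S) + R(S)"""
    return i + r


def top(stories, combine, n=3):
    """Return (title, category) for the n stories with the largest C(S)."""
    ranked = sorted(stories, key=lambda s: combine(s[2], s[3]), reverse=True)
    return [(title, category) for title, category, _, _ in ranked[:n]]


print("product:", top(stories, combined_product))
print("sum:    ", top(stories, combined_sum))
```

With these invented numbers, both versions fill the top three slots with disasters, which is the grouping issue again. Under the product version, the middling "Minister reshuffle" (0.50 x 0.45) also beats both "Voters vote" and "Dog plays piano", each of which is dragged down by one near-zero score.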
The next approach then is presumably to try to formulate the problem as a more complex one. If simple sorting, maximising or minimising don’t work, then are we looking at some form of multi-objective optimisation? Well, we might be but such formulations are notoriously hard to define in terms of input and structure in the first place and even harder to solve. In fact, we’ve already touched on this in a previous discussion about network optimisation. There the problem was turning link bandwidth into profit (or spirituality); here it’s turning arbitrary measures of interest into viewing figures. Maybe not quite as hard but still pretty hard.
It would appear then that any formulaic, and therefore probably algorithmic, approach to news selection is unrealistic at this moment in time. ‘Drop the Dead Donkey’ is still very much a human exercise – so, for now, the ‘Is that really news?’ question will remain. Also, we’ll never keep everyone happy anyway because opinions and priorities differ. However, this is one to watch out for as AI develops and more and more of the jobs we traditionally regard as too hard for automation eventually succumb. (In the context of reporting individual stories, a start has already been made.) Is it only a matter of time before news selection becomes a job for a computer?
For the time being, news programme planners are likely to pick stories that are representative of their ‘genre’. They’ll include one or two from each of the ‘disaster’, ‘famous’, etc. pools before moving on to the next – rather than attempting to exhaust each one in turn. Whether or not a particular story makes the cut will depend more on its novelty and significance relative to similar stories, rather than its absolute score – however that might be measured. Ultimately, of course, this means that stories aren’t chosen for inclusion in isolation or on their independent merits, but for the shape, balance and appeal of the programme as a whole. And this, in turn, depends on what audience the channel is aiming at, what impression it’s trying to give and, ultimately, its agenda.
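The 'one or two from each pool' idea can be sketched in a few lines of Python. This is only an illustration of the general shape, not a claim about how any newsroom actually works: the function name `pick_balanced`, the stories and the single score per story are all hypothetical.

```python
from collections import defaultdict


def pick_balanced(stories, per_category=1, slots=4):
    """Take the best `per_category` stories from each category in turn.

    `stories` is a list of (title, category, score) tuples; the score is a
    hypothetical stand-in for however novelty and significance are combined.
    """
    pools = defaultdict(list)
    for story in stories:
        pools[story[1]].append(story)

    # Within each pool, a story competes only against similar stories.
    for pool in pools.values():
        pool.sort(key=lambda s: s[2], reverse=True)

    # Visit the categories in order of their best story, taking a few from
    # each before moving on, and go round again if slots remain.
    order = sorted(pools, key=lambda c: pools[c][0][2], reverse=True)
    running_order = []
    while len(running_order) < slots and any(pools[c] for c in order):
        for category in order:
            take = pools[category][:per_category]
            pools[category] = pools[category][per_category:]
            running_order.extend(take)
    return running_order[:slots]


# Invented example data.
stories = [
    ("Earthquake",      "disaster",    0.9),
    ("Flood",           "disaster",    0.8),
    ("Train crash",     "disaster",    0.7),
    ("Actor dies",      "famous",      0.6),
    ("Budget vote",     "politics",    0.5),
    ("Dog plays piano", "and finally", 0.1),
]

for title, category, score in pick_balanced(stories):
    print(f"{category:12} {title}")
```

A plain sort on the score would fill three of the four slots with disasters. Here the flood and the train crash lose out to the piano-playing dog, not because the dog scores higher but because it is the best of its kind.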
And, of course, the moment we start discussing ‘agendas’, we introduce the concept of ‘bias’, but that’s probably best left for another day …
October 7th, 2015 at 9:33 am
Thanks Vic… for the headache at half 9 in the morning! (Maths at this time is never fun)
Great post as always and hope enrolment has gone smooth. Look forward to seeing you all on graduation day!