7 November 2015


I’m going to assume that anyone who reads this post has almost certainly landed here via Google, and is therefore likely to be familiar with the Good Judgment Project (GJP), the brainchild of Professor Philip Tetlock at the University of Pennsylvania, described in his and Dan Gardner’s recently published book, Superforecasting: The Art and Science of Prediction. So I won’t provide a description of the GJP or the book.

Before going any further, I should say that I participated in the last GJP tournament and my score over some 135 questions was undistinguished – well below that of the top 2% whom Tetlock defines as superforecasters (SFs). So you are welcome to dismiss anything that follows as sour grapes.

Undoubtedly, the book has been reviewed with enthusiasm in the UK by, among others, John Rentoul (not once, not twice but three times: “a terrific piece of work that deserves to be widely read”), Dominic Cummings (“Everybody reading this could do one simple thing: ask their MP whether they have done Tetlock’s training programme.”), Daniel Finkelstein (“This book shows that you can be better at forecasting. Superforecasting is an indispensable guide to this indispensable activity.”) and Dominic Lawson (“fascinating and breezily written”).

A key point to appreciate about individual SFs is that their capability as forecasters is based not so much on what they know - the forecasting questions range widely, so being a subject matter expert is not an advantage - as on the way they set about it. Nor are SFs mostly in the IQ top 1% (135+) of the population, although they are in the top 20% for intelligence and knowledge (Page 109). However, they “are almost uniformly highly numerate people” (Page 130) and “embrace[d] probabilistic thinking” (Page 152). The other factors in their success, and how anyone could set about improving their own forecasting skills, are the subject of the book.

SFs, being numerate, will have no problem with the Brier scoring system used by the GJP. Brier scoring may well be an eye-glazing subject for most people, but it is one that ought to be understood if an appreciation of superforecasting is to be more than superficial. Hence the Annex below for anyone interested.

Superforecasting, having more significant issues to address, does not bother the reader with much on the way the Brier score is calculated, but it does feature on pages 167-8, where the importance of objectively updating forecasts in the light of new information is being emphasised. A GJP question asked in the first week of January 2014 was whether “the number of Syrian refugees reported by the UN Refugee Agency as of 1 April 2014” would be under 2.6 million. This chart* from page 167 shows successive forecasts made by one SF, Tim:

On page 168 the reader is told that “Tim’s final Brier score was an impressive 0.07.” The smaller the Brier Score, the better the forecast, of course. By my reckoning, that is the score which Tim would have received if he had made an initial forecast of 81% probability of the answer to the question being “Yes” and stuck with it. But, as is clear, he started with slightly better than evens in early January and then moved towards certainty in late March, thereby achieving SF status, on this as on many other questions.
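The quoted 0.07 is easy to reproduce. For a binary question the GJP uses the two-term Brier score, summing the squared error over both possible outcomes, which for a “Yes” outcome works out to 2(1 − p)². A minimal sketch (my own code, not the GJP’s):

```python
# Two-term Brier score as used by the GJP for a binary question.
# p_yes is the forecast probability of "Yes"; outcome_yes is True/False.
def brier(p_yes: float, outcome_yes: bool) -> float:
    o = 1.0 if outcome_yes else 0.0
    # (error on "Yes")^2 + (error on "No")^2, which equals 2*(p_yes - o)**2
    return (p_yes - o) ** 2 + ((1 - p_yes) - (1 - o)) ** 2

# A constant forecast of 81% "Yes", with "Yes" the eventual outcome,
# gives the 0.07 quoted on page 168:
print(round(brier(0.81, True), 2))  # 0.07
```

A perfect forecast (p = 1 for a “Yes” outcome) scores 0; a maximally wrong one scores 2, which is why GJP Brier scores run from 0 to 2 rather than 0 to 1.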

But what if the UN Refugee Agency (UNHCR) had needed its forecast in January for planning purposes, so that a more accurate forecast in March would have come too late? The next chart attempts to break out Tim’s forecasts on a monthly basis.

For January it looks as though Tim’s average probability was about 67%, so the Brier Score for this month would have been 0.22, rather poorer than the glittering 0.07. The GJP methodology is time-independent in the sense that it does not attach a greater value (weighting) to early forecasts as opposed to later ones. It might be interesting to see the effect of some discounting, as in (but opposite in time to) Discounted Cash Flow where early cash flows are valued more highly than late ones. This would seem appropriate if an important driver of a real-world forecast is someone’s need to make decisions well before outcomes are known.
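One way such discounting might work - entirely my own sketch, not anything the GJP does - is to weight each day’s Brier score by a decay factor, so that early accuracy counts for more, by analogy with DCF:

```python
# Sketch of a time-discounted Brier score (my own assumption, not GJP
# methodology): earlier daily scores get more weight, by analogy with
# Discounted Cash Flow, where early cash flows are valued more highly.
def discounted_brier(daily_p_yes, outcome_yes, daily_discount=0.99):
    o = 1.0 if outcome_yes else 0.0
    weights = [daily_discount ** day for day in range(len(daily_p_yes))]
    scores = [2 * (p - o) ** 2 for p in daily_p_yes]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# January alone: a steady 67% "Yes" forecast scores 2*(0.33)^2, i.e. 0.22.
january = [0.67] * 31
print(round(discounted_brier(january, True, daily_discount=1.0), 2))  # 0.22
```

With a discount factor below 1, a forecaster who only becomes accurate near the close of the question would score worse than one who was accurate early, which is the property the planning scenario above seems to call for.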

Another time-related aspect of the GJP scoring is the termination date of the question. A scenario from the past might be helpful. On 30 September 1938 the leaders of Britain, France, Germany and Italy signed the Munich Pact allowing German occupation of the Sudetenland. The British Prime Minister, Neville Chamberlain, returned to London where he was greeted by enthusiastic crowds and spoke of “peace for our time”. However, Churchill said that “England … has chosen shame, and will get war”. So a GJP question in October 1938 might well have asked “Will Britain and Germany be at war by 31 August 1939?”.

A Chamberlain supporter would have started his (or her) answer at a probability much less than 50%, a Churchill supporter at a substantially higher one. As the events of 1939 unfolded, both parties would probably have increased their estimates. However, when the question closed, the Chamberlain supporter would certainly have had the lower Brier Score. Of course, if the question had been “Will Britain and Germany be at war by 30 September 1939?”, their Brier Scores would have been reversed** and the Churchill supporter’s initial pessimism would have been vindicated. So, depending on the date of the question to within just a few days, a good (ie low) Brier Score can be obtained for questionable judgement in a broader sense. There were 135 questions in the recent GJP tournament; 23 of them ended on 31 May and another 39 between 1 and 10 June, dates which probably have more to do with Penn U’s academic year than geopolitics.
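The termination-date effect can be put in numbers. The monthly probabilities below are my own invention, purely illustrative of an optimist and a pessimist both raising their estimates through 1939; war was in fact declared on 3 September 1939, so the 31 August question resolves “No” and the 30 September question resolves “Yes”:

```python
# Illustrative only: the monthly "Yes" probabilities below are invented,
# sketching a Chamberlain supporter (optimist) and a Churchill supporter
# (pessimist), both updating upwards from October 1938 onwards.
def mean_brier(monthly_p_yes, outcome_yes):
    o = 1.0 if outcome_yes else 0.0
    return sum(2 * (p - o) ** 2 for p in monthly_p_yes) / len(monthly_p_yes)

chamberlain = [0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8]
churchill   = [0.7, 0.7, 0.75, 0.75, 0.8, 0.85, 0.85, 0.9, 0.9, 0.95, 0.95]

# Question closing 31 August 1939: not yet at war, so outcome is "No".
print(mean_brier(chamberlain, False) < mean_brier(churchill, False))  # True

# Question closing 30 September 1939: war declared 3 September, so "Yes".
print(mean_brier(chamberlain, True) > mean_brier(churchill, True))  # True
```

The same two forecast histories produce opposite rankings depending on which of two closing dates, a month apart, the question happens to use.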

It is probably worth bearing in mind that SFs are people adept at minimising time-independent Brier Scores on questions which terminate on arbitrary dates, and that this is the basis on which their ability as forecasters has been assessed.

As pointed out earlier, the reviews of Superforecasting have been enthusiastic and keen to see its approach being adopted in business and government. I came across this interesting comment on Forbes/Pharma & Healthcare by an admiring reviewer, Frank David, who works in biomedical R&D:
So, companies may be able to nurture more “superforecasters” – but how can they maximize their impact within the organization? One logical strategy might be to assemble these lone-wolf prediction savants into “superteams” – and in fact, coalitions of the highest-performing predictors did outperform individual “superforecasters”. However, this was only true if the groups also had additional attributes, like a “culture of sharing” and diligent attention to avoiding “groupthink” among their members, none of which can be taken for granted, especially in a large organization. “A busy executive might think ‘I want some of those’ and imagine the recipe is straightforward,” Tetlock wryly observes about these “superteams”. “Sadly, it isn’t that simple.”
A bigger question for companies is whether even individual “superforecasters” could survive the toxic trappings of modern corporate life. The GJP’s experimental bubble lacked the competitive promotion policies, dysfunctional managers, bonus-defining annual reviews and forced rankings that complicate the pure, single-minded quest for fact-based decision-making in many organizations. All too often, as Tetlock ruefully notes, “the goal of forecasting is not to see what’s coming. It is to advance the interest of the forecaster and the forecaster’s tribe,” [original emphasis] and it’s likely many would find it difficult to reconcile the key tenets of “superforecasting” with their personal and professional aspirations.
Very likely - “the toxic trappings of modern corporate life” - how true indeed.

* I may be taking the chart on page 167 too seriously, but a couple of points arise. The grey space above a probability of 1 is, of course, just artistic licence. However, I don’t understand (apart from further artistic licence) why the successive forecasts have been joined by straight lines as shown. Surely a forecast remains extant until it is superseded by the next one, and the forecast line is a series of steps, as shown for January 2014 below?
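That step-wise reading is also how a daily-averaged score would naturally be computed: each forecast is carried forward, day by day, until the next one supersedes it. A sketch (the dates and probabilities for “Tim” here are illustrative, not read off the book’s chart):

```python
from datetime import date, timedelta

# Carry-forward scoring: each forecast stays in force until superseded,
# so the forecast line is a series of steps, and the daily Brier scores
# are averaged over the life of the question.
def daily_brier(forecasts, open_date, close_date, outcome_yes):
    o = 1.0 if outcome_yes else 0.0
    scores, current_p = [], None
    day = open_date
    while day <= close_date:
        # adopt the latest forecast made on or before this day
        for f_date, f_p in forecasts:
            if f_date <= day:
                current_p = f_p
        if current_p is not None:
            scores.append(2 * (current_p - o) ** 2)
        day += timedelta(days=1)
    return sum(scores) / len(scores)

# Illustrative forecast history: three step changes over the question's life.
forecasts = [(date(2014, 1, 6), 0.55), (date(2014, 2, 10), 0.70),
             (date(2014, 3, 20), 0.95)]
score = daily_brier(forecasts, date(2014, 1, 6), date(2014, 3, 31), True)
print(round(score, 2))  # 0.25
```

On this reading, the days between forecasts matter: a forecaster who hesitates before updating carries the stale probability, and its daily penalty, until the step up.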

** US readers, and this blog has a few, may need to be reminded that the UK declared war on Germany on 3 September 1939. Germany declared war on the USA on 11 December 1941.

Only superforecasters working in teams are likely to approach perfection - please let me know by commenting if you spot any mistakes in this post!


The red line is the Brier Score, S, as a function of (1-p); the dotted line, for comparison, is S = (1-p).
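Assuming the red curve is the two-term score for a “Yes” outcome, S = 2(1 − p)², the chart can be reproduced numerically:

```python
# The annex chart in numbers (assuming the red curve is the two-term
# Brier score for a "Yes" outcome): S = 2*(1-p)**2, quadratic in the
# error (1-p); the dotted comparison line is the straight S = 1-p.
def brier_for_yes(p: float) -> float:
    return 2 * (1 - p) ** 2

for p in [1.0, 0.9, 0.81, 0.67, 0.5, 0.0]:
    print(f"p = {p:.2f}  1-p = {1-p:.2f}  Brier = {brier_for_yes(p):.4f}")
```

Being quadratic, the curve sits below the dotted line for errors under 0.5 and above it thereafter, rising to 2 for a maximally wrong forecast; small errors are penalised gently, large ones severely.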


The Superforecasters' prediction for the UK "Brexit" referendum earlier this month was less than stellar - see this post.
