— There are no good forecasts, or every forecast is good.


A casual Google search suggests that an extensive amount of research and thought has gone into answering this very question. Lately, however, I’ve been thinking about a different question that isn’t asked as extensively in the machine learning setting: what are good “imprecise” forecasts? I don’t have the complete answer yet, but along the way I might have a few words to add to the age-old question of what makes a good forecast.

First of all, what is a forecast? Consider any uncertain event, like rain tomorrow or the stock price the day after tomorrow: these are events that haven’t happened yet, and yet we want to plan a course of action to prepare for them. To plan ahead, one can refer to forecasts issued ahead of time, like the chance of rain tomorrow. We assume a forecaster who converts historical and current information into forecasts, and we’re concerned with whether this is a good forecaster. The issued forecast is a number $p \in [0,1]$ for some uncertain event, which we assume to be binary, $X \in \{0,1\}$. Since my treatment follows the framework of game-theoretic probability pioneered by Glenn Shafer and Vladimir Vovk [1], I’ll attach the “betting” interpretation to the number $p$.

Given the uncertain outcome $X$ (whose value is yet to materialise), a forecast $p$ is the fair price for the uncertain outcome $X$: by the very act of making this forecast, the forecaster is signalling that they are indifferent between having $X$ or $p$, where the value of $X$ is assumed to be on the same unit scale. This is arguably the most ancient interpretation of probability: probability does not exist in nature, but one puts money on the table to signal the strength of the evidence one has towards the uncertain outcome $X$. Game-theoretic probability takes this primitive interpretation of probability and extends it to include two other very natural agents in the setup: the sceptic and nature.

So the setup goes like this: the forecaster announces the fair price $p$ for some uncertain outcome $X$, which can materialise to either $0$ or $1$. The sceptic will now engage in a betting game with the forecaster, i.e. the sceptic will literally buy or sell the gambles the forecaster makes available. So what is a gamble, and which gambles does the forecaster make available? A gamble is just an uncertain outcome with some desirability constraint attached. For example, when the fair price as per the forecaster is $p$ (i.e. the forecaster values $X$ the same as $p$), then for any number $q \leq p$, the uncertain outcome $X - q$ is desirable to the forecaster: the forecaster is in favour of buying $X$ for $q$, i.e. $X-q$ has net positive value for the forecaster. Similarly, for any number $r\geq p$, the outcome $X-r$ is not desirable to the forecaster; however, $r-X$ is desirable, which means that the forecaster is willing to sell $X$ for the price $r$. Next, for the forecast $p$ we can define the set of available gambles as $\{-(X - q), -(r-X) \ \mid \ q \leq p, \ r \geq p\}$. By making the forecast, and assuming the betting interpretation of probability, these are the gambles made available to the sceptic. So what does the sceptic do? The sceptic starts with some unit wealth and strategically puts some proportion of that wealth on these gambles. For instance, given a gamble $G$ from the set of available gambles, the sceptic stakes a fraction $\lambda$ of their wealth on $G$, hoping to end up with $1 + \lambda \cdot G$ as their wealth after this transaction. But the sceptic does not stop at just one round; they keep choosing strategic stakes, so that the accumulated wealth after, say, $t$ rounds is

$\quad \quad \quad W_t = \prod_{i=1}^{t}\left(1 \ + \ \lambda_i\cdot G_i\right), \quad G_i \in \{-\left(X_i-q\right), \ -\left(r-X_i\right) \ \mid \ q\leq p_i, \ r \geq p_i\}$.
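To make the wealth process concrete, here is a minimal Python sketch of the sceptic’s side of the game. It is my own illustration rather than anything from [1]; the function name, the naive always-buy strategy, and the fair-coin nature are all assumptions made for the toy run.

```python
import numpy as np

rng = np.random.default_rng(0)

def sceptic_wealth(forecasts, outcomes, stakes, sides):
    """Accumulate the sceptic's wealth W_t = prod_i (1 + lambda_i * G_i).

    sides[i] = +1 means the sceptic buys X_i at the forecast price
    (the gamble G_i = X_i - p_i, i.e. -(r - X_i) with r = p_i);
    sides[i] = -1 means the sceptic sells X_i at the forecast price
    (the gamble G_i = p_i - X_i, i.e. -(X_i - q) with q = p_i).
    """
    wealth = 1.0
    for p, x, lam, side in zip(forecasts, outcomes, stakes, sides):
        gamble = side * (x - p)        # payoff of the chosen gamble
        wealth *= 1.0 + lam * gamble   # multiplicative update of the sceptic's wealth
    return wealth

# Toy run: nature flips a fair coin, the forecaster always announces 0.5,
# and a naive sceptic always buys with 10% of their current wealth.
T = 1000
outcomes = rng.integers(0, 2, size=T)
print(sceptic_wealth(np.full(T, 0.5), outcomes, np.full(T, 0.1), np.ones(T)))
```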

After each round, the agent nature reveals the realised value of $X$. Note that the forecaster is independent of nature and has no bearing whatsoever on how nature behaves. However, the forecaster claims to have some information about how nature may behave, and the sceptic is trying to test this claim by actually making use of the announced forecasts.

A standard argument (which I won’t go into in much detail) says that the process $W_t$ is a super-martingale as per the forecaster, which means that $\mathbb{E}\left[\, W_t \mid \mathcal{F}_{t-1} \,\right] \leq W_{t-1}$, where one can think of $\mathcal{F}_{t-1}$ as encapsulating all the knowledge available before nature reveals $X_t$ at round $t$. “As per the forecaster” means that this is what the forecaster signs up for by the very act of making forecasts and providing a set of available gambles: this agent believes the sceptic will not be able to make money. The whole setup now reveals that if the sceptic does indeed end up making unbounded money, then the forecaster has no clue whatsoever about the behaviour of nature. Informally, the forecaster sets up the game and decides its rules, and if the sceptic (who has everything stacked against them the way the game is set up) ends up winning, i.e. making money, in this rigged game, then the forecaster is not making very good forecasts.
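As a toy illustration of this falsification-by-wealth logic (my own example with an assumed fixed-fraction strategy, not a construction from [2]): when the forecasts match how nature actually behaves, the naive sceptic’s wealth does not grow, but when the forecasts are systematically off, the same sceptic’s wealth explodes, which we read as evidence against the forecaster.

```python
import numpy as np

rng = np.random.default_rng(1)

def final_wealth(true_rate, forecast, lam=0.2, T=2000):
    """Sceptic always buys X at the forecast price, staking a fixed fraction lam of wealth."""
    x = rng.random(T) < true_rate                         # nature's outcomes
    log_wealth = np.sum(np.log1p(lam * (x - forecast)))   # sum of log(1 + lam * G_i)
    return float(np.exp(log_wealth))

# Forecaster is right about nature: the sceptic's wealth does not grow
# (the wealth process is a supermartingale under the forecasts).
print(final_wealth(true_rate=0.5, forecast=0.5))
# Forecaster is wrong: the same naive sceptic makes an astronomical amount of money,
# which falsifies the forecasts.
print(final_wealth(true_rate=0.7, forecast=0.5))
```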

This is a very standard argument, and it has laid the groundwork for what is now a rather popular framework of testing by betting [2]. However, what didn’t occur to me before is the notion of information asymmetry that the business of forecasting is based on. There are no universally good forecasts: a forecast is good only relative to the information the sceptic (or the evaluator, in general) has. In the setup described above, we can look at the sceptic a bit more closely. In this rigged game, the sceptic is trying to get ahead by strategically choosing $\lambda$ at each time step, and to do so the sceptic is allowed to use any information that does not depend on the future, i.e. anything available before nature reveals the uncertain $X$. It is easy to see that if the sceptic has more information, or is able to design an informative mechanism to foresee the future better than the forecast does, the forecaster can be defeated in their own game.
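Here is a small sketch of that asymmetry, under assumptions of my own choosing (the 80%-informative private signal, the marginal forecast of 0.5, and the fixed stake are all made up for the illustration): the forecasts are perfectly correct marginally, yet a sceptic who sees a side signal before betting grows their wealth without bound.

```python
import numpy as np

rng = np.random.default_rng(2)
T, lam = 2000, 0.2

signal = rng.integers(0, 2, size=T)                          # private side information the sceptic sees
outcome = np.where(rng.random(T) < 0.8, signal, 1 - signal)  # X agrees with the signal 80% of the time
forecast = 0.5                                               # marginally correct: P(X = 1) is indeed 0.5

# The sceptic buys X when the signal says 1 and sells when it says 0.
side = np.where(signal == 1, 1.0, -1.0)
log_wealth = np.sum(np.log1p(lam * side * (outcome - forecast)))
print(float(np.exp(log_wealth)))   # grows exponentially: a better-informed sceptic defeats a "good" forecaster
```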

I like this sentence by Paul Vitányi in his very nice overview of the notion of randomness [3]: “describing ‘randomness’ in terms of ‘unpredictability’ is problematic and possibly unsatisfactory.” At the risk of digressing a bit, I’ll give a brief informal overview of the program of studying randomness via unpredictability. This program was initiated by Richard von Mises as an attempt to justify the application of an axiomatic and abstract theory of probability to real life. The question is: given an infinite sequence of $0$s and $1$s, such as $01001001000111010101\ldots$, when can we say whether this sequence is truly random (i.e. generated by some random source) or not? Thinking of it as the outcome sequence of some coin with heads rate $p$, von Mises’ notion was based on the ‘Law of Excluded Gambling Strategy’: no gambler betting fixed amounts on the flips of the coin can make more money in the long run by betting according to some system than by betting on every flip indiscriminately. Here, a system is defined by a selection rule that looks at the past and strategically decides whether to bet on the next coin flip or not. This notion quickly ran into problems because it was not clear which selection rules are admissible, and the resolution came from the notion of computability: there should not exist any computable strategy or system that makes the gambler money, and if one does exist, the sequence can be deemed non-random. Subsequent notions of randomness make this more tangible by allowing the gambler to decide not only when to bet but also how much to bet, and if any such martingale process can make unbounded money on the outcome sequence, then the sequence is deemed non-random. However, we do not have a good mechanism to test if something is truly random.
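A toy version of the selection-rule idea, with a rule of my own choosing (none of this is from [3]): an admissible rule looks only at the past, and on a genuinely random sequence the bits it selects should show roughly the same frequency of ones as the whole sequence, whereas on a structured sequence some rule will expose the pattern.

```python
import numpy as np

def selected_frequency(bits, rule):
    """Frequency of 1s on the positions a selection rule chooses to bet on.

    The rule sees only the past, bits[:i], and returns True to select position i."""
    selected = [bits[i] for i in range(len(bits)) if rule(bits[:i])]
    return sum(selected) / max(len(selected), 1)

rng = np.random.default_rng(3)
random_bits = rng.integers(0, 2, size=10_000).tolist()
periodic_bits = [0, 1] * 5_000                                # clearly non-random

after_a_zero = lambda past: len(past) > 0 and past[-1] == 0   # bet right after seeing a 0
print(selected_frequency(random_bits, after_a_zero))    # close to 0.5, as for the whole sequence
print(selected_frequency(periodic_bits, after_a_zero))  # exactly 1.0: the rule exposes the structure
```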

Randomness and forecasts are two sides of the same coin, and the betting interpretation is a unified way of studying both. Hence, one common message is that of Popperian falsifiability [4]: both randomness and forecasts are assessed by falsification, not verification. One can run sophisticated tests to try to falsify forecasts, but one cannot certify that the forecasts are truly good. And the falsification strategy depends on the information, or the degree of sophistication (compute, etc.), that the sceptic (or evaluator) has.

Cynthia Dwork and colleagues [6] operationalise this via their now celebrated framework of outcome indistinguishability, which has seen some success. The framework assumes a forecaster and a computational form of tests, which they call distinguishers, and a forecaster is called outcome indistinguishable from nature if no distinguisher can falsify the forecaster based on the observational data coming from nature. However, the framework assumes that the distinguisher operates under the same information access as the forecaster. In the machine learning prediction scenario, when we’re designing algorithms for high-dimensional objects like people, we have some representation in the form of what we call a covariate $\mathbf{x}$, and the predictor gives a forecast for some outcome, like the chance of defaulting on a loan, or access to some welfare program like housing. It is clear that we need some evaluation mechanism for such a predictor to be able to rely on its predictions, and outcome indistinguishability offers one: a predictor is good with respect to some set of distinguishers if it cannot be falsified by any distinguisher from the set. The popular notions of multi-accuracy [7] and multi-calibration [8] follow from it.
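To ground this, here is a minimal audit sketch in the multi-accuracy spirit. The function, the two distinguishers, and the synthetic data are assumptions of mine for illustration, not the constructions in [6], [7], or [8]: the predictor is accurate on average, but one subgroup distinguisher falsifies it.

```python
import numpy as np

def multiaccuracy_audit(x, p, y, distinguishers, tol=0.05):
    """For each distinguisher (here, a subgroup test on the covariates x),
    check whether the average residual y - p on that subgroup is near zero.
    A gap larger than tol means that distinguisher falsifies the predictor."""
    gaps = {}
    for name, group in distinguishers.items():
        mask = group(x)
        if mask.any():
            gaps[name] = float(np.mean((y - p)[mask]))
    return {name: gap for name, gap in gaps.items() if abs(gap) > tol}

# Toy data: the outcome also depends on x[:, 1], which the predictor ignores;
# the predictor is accurate on average but not on the x[:, 1] > 0.5 subgroup.
rng = np.random.default_rng(4)
n = 20_000
x = np.column_stack([0.6 * rng.random(n), rng.random(n)])
true_prob = x[:, 0] + 0.3 * (x[:, 1] > 0.5)
y = (rng.random(n) < true_prob).astype(float)
p = x[:, 0] + 0.15                     # marginally accurate, but ignores x[:, 1]

distinguishers = {"everyone": lambda x: np.ones(len(x), dtype=bool),
                  "x1_high": lambda x: x[:, 1] > 0.5}
print(multiaccuracy_audit(x, p, y, distinguishers))   # only the "x1_high" distinguisher flags a gap
```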

It seems like we can make peace with this, and design better distinguishers or tests to falsify forecasts more decisively. However, while outcome indistinguishability respects computational boundedness, it does not, as per my understanding, consider information asymmetry, i.e. the case when the evaluator has more information than the forecaster. This post has gotten long, and I’ll stop here. I’ll elaborate more on information asymmetry next.

References:

[1] Glenn Shafer and Vladimir Vovk. Game-Theoretic Foundations of Probability and Finance. https://www.probabilityandfinance.com/2019_book/index.html

[2] Glenn Shafer. Testing by Betting: A Strategy for Statistical and Scientific Communication. https://glennshafer.com/assets/downloads/articles/article104_jrss_shafer-testingbybetting-with-discussion-response.pdf

[3] Paul M. B. Vitányi. Randomness. https://arxiv.org/abs/math/0110086

[4] Popperian Falsifiability. https://en.wikipedia.org/wiki/Falsifiability