"But to us, probability is the very guide of life." (The Analogy of Religion, Bishop Joseph Butler)

2. BAYES THEOREM AND ITS IMPLICATIONS

2.1 BAYES THEOREM

Bayes (1763) first stated the following result (given below in modern notation). Bayes theorem is:

p(A GIVEN B) is proportional to p(B GIVEN A).p(A)

with the constant of proportionality independent of A, where p(A) is the unconditional probability of A and p(A GIVEN B) is the conditional probability of A, given B.

This result can be interpreted as follows. If B is the observed data, and A is a hypothesis about the model that generated the data, then:

p(A GIVEN B) = the probability that the hypothesis is true after observing the data; called the posterior probability.
p(A) = the probability that the hypothesis is true before observing the data; called the prior probability.
p(B GIVEN A) = the probability of observing the data, given that the hypothesis is true; called the likelihood.
p(B) = the probability of observing the data unconditionally (whether the hypothesis is true or not).

For example, suppose we have some prior information about T, a parameter of a model. This information is expressed as a probability distribution. Suppose the parameter T can take only a certain (finite) number of values Ti. Then a probability greater than 0 can be attached to each value Ti, such that the total sums to 1. Now suppose some new data become available. For each possible value Ti of the parameter T, we can calculate the probability of observing the new data; denote this by L(data GIVEN T = Ti). This, the likelihood function, expresses all the information contained in the data. Then, by Bayes theorem, the posterior probability that T = Ti is given by:

p(T = Ti GIVEN data) is proportional to L(data GIVEN T = Ti).p(T = Ti)

But SUMi[p(T = Ti GIVEN data)] = 1, as the events T = Ti are exclusive and exhaustive. Hence:

p(T = Ti GIVEN data) = L(data GIVEN T = Ti).p(T = Ti) / SUMj[L(data GIVEN T = Tj).p(T = Tj)]   equation 2.1.1

This is easily generalized to the continuous case (Lindley 1965a, p.118), as follows. If a parameter Y can take values y, then:

p(Y = y GIVEN data) = L(data GIVEN Y = y).p(Y = y) / INTEGRAL[L(data GIVEN Y = y).p(Y = y)]dy   equation 2.1.2

In both the discrete and the continuous case, L is the likelihood, as defined above. One method of estimating the parameter Y is to choose the value of y that maximizes the likelihood function, L(data GIVEN Y = y). Thus, of the various possible values of Y, we choose the value that gives the observed data the greatest probability. This is called the maximum likelihood (ML) estimator.

2.2 THE BAYESIAN/FREQUENTIST CONTROVERSY

There are two main schools of thought on probabilities. The first of these can be described as the frequentist school: a probability is defined as a limit. The statement that the probability of throwing a six with a fair die is 1/6 means that, as the number of throws tends to infinity,

(number of sixes)/(number of throws) tends to 1/6

The second school of thought is called Bayesian. Suppose we want to ascribe a probability to the hypothesis that an income elasticity is less than one. No frequency definition is possible, as there can be no repetition of the "event". The only meaning that a probability can have is as a degree of belief that the hypothesis is true. The distinction is explained more fully in Lindley (1965a, pp.30-41), Savage (1954) and Savage (1962).

It should, however, be emphasised that both Bayesians and frequentists accept the results given in 2.1 above. If the Bayesian definition of probability is accepted, then the formulae of 2.1 describe the way that beliefs should be revised in the light of new information. If the frequentist definition is used, then 2.1 shows how estimates of relative frequency should be revised in the light of new information.

One problem is that of the "prior probability". Where does it come from?
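As a concrete illustration of the discrete update of equation 2.1.1, the following sketch applies it to a hypothetical parameter T with three possible values; the prior and likelihood numbers are invented for illustration and are not from the text. It also picks out the ML estimate, which ignores the prior and simply maximizes the likelihood:

```python
def bayes_update(prior, likelihood):
    """Discrete Bayes update (equation 2.1.1): prior and likelihood are
    dicts keyed by the possible values Ti; returns the posterior."""
    unnormalized = {t: likelihood[t] * prior[t] for t in prior}
    total = sum(unnormalized.values())  # the denominator SUMj[L . p]
    return {t: u / total for t, u in unnormalized.items()}

# Hypothetical prior over T and likelihood of some observed data.
prior = {0.2: 0.30, 0.5: 0.40, 0.8: 0.30}
likelihood = {0.2: 0.10, 0.5: 0.40, 0.8: 0.25}

posterior = bayes_update(prior, likelihood)
# The posterior probabilities sum to 1, as the Ti are exclusive
# and exhaustive.

# The ML estimator chooses the Ti that maximizes L(data GIVEN T = Ti).
ml_estimate = max(likelihood, key=likelihood.get)  # here, 0.5
```

Note that the normalizing denominator never has to be computed separately: it is just the sum of the unnormalized products, which is why Bayes theorem is usually quoted only up to proportionality.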
There are various possible sources:

1. The prior probability may come from a previous posterior probability; likewise, the posterior probability calculated from Bayes theorem may be used later as a prior probability.
2. The prior probability may come from a subjective estimate.
3. The prior probability may come from information other than the data to be analysed.

Many statisticians are dubious about the use of subjective sources for prior probabilities, arguing that each researcher could specify different prior probabilities, and so calculate different posterior probabilities. The results are therefore not reproducible, and so unscientific. Very few econometricians would entirely go along with this view, as is evidenced by the following. Econometricians tend to build assumptions into their models (by selecting which variables to use, by selecting which data to use, and by constraining some of the parameters). They also tend to reject a model if the estimated signs of one or more parameters disagree with their (subjective) prior notions. In this respect they are applying subjective prior probabilities.

In any case, the problem is not all black and white. It is possible to specify subjective prior probabilities representing as much uncertainty as is desired, and such a prior would have very little influence on the posterior probabilities if there is significant information in the data to be analysed. It is even possible (as will be seen later) to specify a prior that has no influence whatsoever on the posterior probabilities. Thus, as a comment by Chatfield (given in Harrison and Stevens, 1976, discussion p.231) points out, you do not need to be a Bayesian to adopt the method of recursive Bayesian estimation. A Bayesian viewpoint does, however, help, in that the various concepts have a more intuitively appealing meaning. Econometricians rarely concern themselves with the Bayesian/frequentist controversy on probability.
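The claim that a vague prior has little influence when the data are informative can be illustrated numerically. The sketch below (the grid, priors and data are all hypothetical, not from the text) updates two quite different priors over a grid of parameter values with the same informative data; both posteriors end up with their mode at the same place:

```python
from math import comb

# Parameter: an unknown probability p, restricted to a finite grid,
# so the discrete update of equation 2.1.1 applies.
grid = [i / 100 for i in range(1, 100)]

def posterior(prior, successes, trials):
    """Discrete Bayes update for binomial data: 'successes' out of
    'trials', with 'prior' a list of probabilities over 'grid'."""
    lik = [comb(trials, successes)
           * p**successes * (1 - p)**(trials - successes) for p in grid]
    unnorm = [l * q for l, q in zip(lik, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# A diffuse (flat) prior, and a subjective prior peaked around p = 0.3.
flat = [1.0 / len(grid)] * len(grid)
shape = [1 - abs(p - 0.3) for p in grid]
peaked = [s / sum(shape) for s in shape]

# Informative data: 80 successes in 100 trials.
post_flat = posterior(flat, 80, 100)
post_peaked = posterior(peaked, 80, 100)

# Despite the very different priors, both posteriors peak at p = 0.8:
# the data dominate.
mode_flat = grid[post_flat.index(max(post_flat))]
mode_peaked = grid[post_peaked.index(max(post_peaked))]
```

With only a handful of observations the two posteriors would differ noticeably; it is the weight of evidence in the likelihood that makes the choice of prior nearly irrelevant here.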
Econometric estimation is mainly concerned with the estimation of model parameters, such as a price elasticity. Probabilistic statements about the possible range of values that the elasticity could take (for example, -0.5 with a standard deviation of 0.1) would be difficult to interpret in a frequentist way, as the concept of repeated trials is clearly inappropriate (we might be discussing the price elasticity of wool in the UK in 1977, and there is no possibility of repeated trials). However, the Bayesian concept of degree of belief fits very well; we could say that we are 68% confident that the elasticity lies between -0.6 and -0.4 (using the above example, and assuming normality). Thus the Bayesian viewpoint is very natural to econometricians. This paper will henceforth adopt the Bayesian approach to probability and inference.

Another advantage of the Bayesian approach is the simple and natural way that prior information can be added to the information contained in the data (which is expressed by the likelihood function (Maddala 1977, p.39)), using the equations of 2.1 above. The researcher is not, however, forced to specify prior information; the prior distribution can be as vague and diffuse as desired, or may even be distributed uniformly over the range (-infinity, infinity), an improper prior, or Jeffreys prior (Jeffreys, 1961). This is sometimes called great prior uncertainty.
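The 68% figure in the elasticity example can be checked directly: under normality, the probability of lying within one standard deviation of the mean is about 0.68. A quick sketch (the numbers are taken from the example above; the helper function is mine, built from the standard error-function identity for the normal CDF):

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution,
    via the identity Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Degree of belief about the elasticity: normal, mean -0.5, s.d. 0.1.
mean, sd = -0.5, 0.1
belief = normal_cdf(-0.4, mean, sd) - normal_cdf(-0.6, mean, sd)
# belief is approximately 0.683, i.e. roughly 68% confidence that the
# elasticity lies between -0.6 and -0.4
```

On the Bayesian reading this is a direct probability statement about the parameter itself, which is exactly the interpretation the frequentist definition cannot supply here.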