'The trial's beginning!' Alice's Adventures in Wonderland, Lewis Carroll.

8. THE BENEFITS OF KALMAN FILTER MODELS COMPARED TO CONVENTIONAL METHODS; A MONTE CARLO STUDY

Theoretical considerations explained in previous parts of this paper, the results of part 5, and the results of other researchers as given in part 7, all lead us to believe that the Kalman filter may provide better-fitting models than conventional regression methods, in terms of forecasting ability. It would be more useful, however, to know how close the parameter estimates were to reality, and whether the parameters estimated by the Kalman filter were any closer to reality than those estimated by least squares. For that we would need to know what the parameters are in reality, and this, sadly, we do not. But it is possible to know the parameters of a model which does not come from reality; we can simply specify the parameter values and generate the data from this model. To simulate the errors in observation, and the fact that more things affect the dependent variable than the explanatory variables used in the model, random numbers will be added to the simulated data.

A rather pedantic point: the random numbers are in fact pseudo-random, as they are generated by a deterministic process, N = (N*69069 + 1) mod 2^32. The high-order 24 bits of N are then converted to floating point. This is the pseudo-random number generator provided as a system subroutine with the DEC VAX computer (see the VAX-11 Fortran language reference manual). This paper shall, however, continue to call them random numbers for brevity. This procedure produces uniformly distributed random numbers in the range (0,1). These can be transformed into normal random numbers distributed as N(0,1) by the Box and Muller (1958) method.
normal1 = (-2 ln uniform1)^0.5 * sin(2 pi uniform2)
normal2 = (-2 ln uniform1)^0.5 * cos(2 pi uniform2)

where uniform1 and uniform2 are two random numbers uniformly distributed in the range (0,1), normal1 and normal2 are two random numbers normally distributed as N(0,1), pi is 3.14159..., and logs are natural logarithms. This algorithm is given by Box and Muller (1958). A simple transformation will convert these normal random numbers with zero mean and unit variance to normal random numbers with mean A and variance B^2:

N(A, B^2) = A + B * N(0,1)

If the stochastic properties of the data and parameters are known, then perhaps it should not be necessary to simulate OLS and the Kalman filter. Fildes (1983) says that it has proved difficult to understand the Harrison-Stevens model's performance in all except the simplest circumstances if only analytical methods are used. A major problem is that an analytic calculation of, for example, the DSSE is very difficult for models such as those used in this paper. The forecast performance depends not only on the stochastic properties of the data and parameters, but also on V, on W, and even on the values of the explanatory variables. When this analysis has then to be applied to several time periods, the algebra becomes very complex. It is far easier to simulate the estimation many times and summarize the achieved values of the DSSE; as will be seen later in this paper, 100 simulations are sufficient to show a significant difference between OLS and the Kalman filter. Cantarelis (1979) used simulation to test the Harrison-Stevens model's sensitivity to some of its parameters.

Some calculations will be made, however. Suppose the parameters at time t are known precisely. Then the forecasting error will be given by the additive noise of equation 8.1.1, and so will have a distribution N(0, 0.0009). The squared error therefore has an expected value of 0.0009.
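The generator and transformation above can be sketched in Python as follows. This is an illustration, not the VAX routine itself; the seed and the exact bit extraction are assumptions about that routine's details.

```python
import math

# The linear congruential generator described in the text:
# N = (N*69069 + 1) mod 2^32; the high-order 24 bits give a uniform in [0,1).
def make_uniform(seed=12345):
    n = seed
    def uniform():
        nonlocal n
        n = (n * 69069 + 1) % 2**32
        return (n >> 8) / 2**24   # high-order 24 bits as a float
    return uniform

# Box-Muller (1958): two uniforms -> two independent N(0,1) variates.
def box_muller(uniform):
    u1 = uniform()
    while u1 == 0.0:              # guard against log(0)
        u1 = uniform()
    u2 = uniform()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.sin(2 * math.pi * u2), r * math.cos(2 * math.pi * u2)
```

A variate with mean A and standard deviation B is then obtained as A + B * z for either returned z, as in the transformation above.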
Because the DSSE is the sum of the squares of four normal random variables with variance 0.0009, DSSE/0.0009 must be distributed as a chi-square distribution with 4 degrees of freedom. Such a chi-square distribution has a mean of 4 and a variance of 8, so the mean and variance of the DSSE must be 4 x 0.0009 = 0.0036 and 8 x 0.0009^2 = 0.0000065 (a standard deviation of .0025), respectively. The simulations done later in this chapter use 100 runs; the variance to be expected in the mean of the observed DSSE's is therefore .0000065/100, and so the expected standard deviation of the mean is .00025.

This analysis, of course, yields an underestimate of DSSE. The parameters will not be known precisely, and so the error in the parameter estimates will contribute to the DSSE. The extent to which this is so will depend on the accuracy of the parameter estimates, which in turn depends on the estimation method, the number of observations analysed, and even the values of the explanatory variables. In practice (as will be seen in simulation 1 below) this analytically derived value of DSSE turns out not to be a very great underestimate for OLS, but a rather greater underestimate for the Kalman filter. When the parameters drift (simulations 2 to 6) the forecasting error will be made up of the additive noise as above, plus the noise on the parameters multiplied by the values of the explanatory variables, and so should be expected to be much greater. But of course the explanatory variables themselves are changing, which greatly complicates the problem of calculating even an approximate distribution for the DSSE. Very rough calculations (making rough approximations to the size of the changes in the explanatory variables) point to a DSSE whose mean is several times as great as when the parameters do not change, depending on which simulation is examined.

8.1 THE MODELS

The model for generating the data will be rather like the model used in Solomon, 1980.
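The analytic distribution under perfectly known parameters is easy to confirm by simulation. In this Python sketch, random.gauss stands in for the VAX generator used in the text.

```python
import random
import statistics

# Monte Carlo check of the DSSE distribution under perfectly known parameters:
# DSSE/0.0009 ~ chi-square(4), so E[DSSE] = 4*0.0009 = 0.0036 and
# sd[DSSE] = (8*0.0009**2)**0.5, roughly .0025.
random.seed(0)
V = 0.0009            # additive noise variance
sd = V ** 0.5         # standard deviation .03

# each DSSE is the sum of four squared one-step forecast errors
dsse = [sum(random.gauss(0.0, sd) ** 2 for _ in range(4)) for _ in range(20000)]

print(statistics.mean(dsse))    # close to 0.0036
print(statistics.stdev(dsse))   # close to 0.0025
```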
log NDCt = B1 + B2 log PCEt + B3 log (PW/PS)t-1 + et        equation 8.1.1

where
  NDCt is wool consumption in year t
  PCEt is consumer expenditure in year t, in real terms
  (PW/PS)t is the price of wool deflated by the price of synthetic fibre, in year t
  et is a random number distributed N(0,V)

There are 19 observations of the data in each simulation run. The parameters of the model, B1, B2 and B3, will be set to 0, 1.0 and -0.5 respectively in year zero. Thereafter, they will be allowed to drift, according to the following process:

Bit = Bi,t-1 + fit        equation 8.1.2

where the fit are random numbers, distributed N(0,Fi).

When estimating a model using OLS, one would normally specify the model in such a way that the parameters are expected to remain constant. If it then turns out that the parameters drift, one would hope that a Durbin-Watson test of the residuals would alert one to the danger of a misspecified model by warning of autocorrelation. But the Durbin-Watson test is very often inconclusive; also, autocorrelation can be caused by other factors. Davidson et al. (1978, p679) state that measurement errors are not likely to cause autocorrelation. The contention is that there is no substitute for a well specified model. But an estimation technique that leads to better forecasts in the presence of some misspecification is obviously desirable. As was indicated in 1.2.1 above, the use of preliminary data may also raise problems in the use of OLS. If preliminary data are given equal weight with previous data, this violates the OLS assumption of homoscedasticity. The results may suggest re-specification, which final data may not support. The Kalman filter is a means of avoiding this difficulty.

V will be set to 0.0009, representing a standard deviation of .03. The figures for the parameters of the model, B1, B2 and B3, come from Solomon (1980); B1 is of course just a scaling factor, and might as well start at zero; B2 is the income elasticity and will start at 1.0; B3 is the price elasticity and will start at -0.5.
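The data-generating process of equations 8.1.1 and 8.1.2 can be sketched as follows. The explanatory-variable series here are illustrative placeholders, not the actual PCE and PW/PS data, and random.gauss again stands in for the VAX generator.

```python
import random

# One simulation run of the data-generating process of 8.1.1 and 8.1.2.
random.seed(1)

V = 0.0009                       # additive noise variance (sd = .03)
F = [0.2, 0.01, 0.01]            # drift variances F1, F2, F3
B = [0.0, 1.0, -0.5]             # parameter values in year zero

data = []
for t in range(19):              # 19 observations per run
    # random-walk drift of each parameter (equation 8.1.2)
    B = [b + random.gauss(0.0, f ** 0.5) for b, f in zip(B, F)]
    log_pce = 4.0 + 0.02 * t     # placeholder explanatory variables
    log_pw_ps = -0.01 * t
    e = random.gauss(0.0, V ** 0.5)
    # equation 8.1.1
    log_ndc = B[0] + B[1] * log_pce + B[2] * log_pw_ps + e
    data.append((log_ndc, log_pce, log_pw_ps))
```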
But the work of chapter 5 of this paper shows that the price elasticity has changed substantially over time, and if this is true then it may also be true that the other parameters have changed. The table in part 5.4.2 shows that an rBe estimate of the price elasticity could be half of the OLS estimate; this is because the rBe estimate is local to the end of the time-series, while the OLS estimate is (roughly speaking) an average over the whole time-series (which is centred on 1969). Thus the price elasticity can halve in 9 years, from .5 to .25 in magnitude.

If x1, x2, ... xn are each distributed independently normally as N(0,s^2), then x1 + x2 + ... + xn is distributed normally as N(0, ns^2). So if the values of Bit drift as described by equation 8.1.2, then after n years Bin would be distributed as N(Bi0, nFi). To make a change of .25 over 9 years a fairly easy (but not too easy) change, we shall set

(9 F3)^0.5 = .25, so F3 = .007

This will be rounded up to set F3 = .01. The same figure will be used for F2, allowing the income elasticity to move with a similar amount of freedom. F1 will be given rather more freedom, and will be set to 0.2. Thus 8.1.2 becomes Bit = Bi,t-1 + fit, with fit distributed as N(0,F) and

    (0.2  0    0  )
F = (0    .01  0  )
    (0    0    .01)

8.2 THE RESULTS

The data generated by the process described in 8.1 were used to estimate a model using ordinary least squares, and the same model was estimated using the Kalman filter, with three different values for W (the parameter controlling the rate at which data lose relevance). The three estimations using the Kalman filter will be referred to as KF1, KF2 and KF3. When estimating KF1, KF2 and KF3, the prior means for all of the B's were 0.0 and the prior variances were 100 (representing great prior uncertainty). For each of these estimations the dynamic sum of squared errors (DSSE) was calculated, using forecasts for the last four periods.
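The DSSE calculation can be sketched as follows. This is an interpretation of the procedure described above, not the author's code; `estimate` and `forecast` are hypothetical stand-ins for the OLS or Kalman filter fitting routines.

```python
# For one simulation run: re-fit the model on the data available before each
# of the last four periods, make a one-step forecast, and sum the squared
# forecast errors. y and X are the 19 observations of the run; `estimate`
# and `forecast` are hypothetical callables supplied by the caller.
def dsse(y, X, estimate, forecast, horizon=4):
    total = 0.0
    n = len(y)
    for t in range(n - horizon, n):
        params = estimate(y[:t], X[:t])     # fit on data before period t
        err = y[t] - forecast(params, X[t]) # one-step forecast error
        total += err ** 2
    return total
```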
This procedure was repeated 100 times (a number sufficient to demonstrate statistically significant improvement in forecasting ability without undue use of computer time; for the significance of the stopping rule in statistical experiments, see Savage 1954). The mean value and standard deviation of the DSSE were calculated for this sample.

     KF1          KF2                KF3

     (0  0  0)    (.01  0     0   )  (.1  0    0  )
W =  (0  0  0)    (0    .001  0   )  (0   .01  0  )
     (0  0  0)    (0    0     .001)  (0   0    .01)

V = .0009

For the first simulation, the data were generated with static parameters; the Bs were constant, as F1 = F2 = F3 = 0. The model was estimated using OLS, KF1, KF2 and KF3.

Table 9  SIMULATION 1

                              OLS     KF1     KF2     KF3
Mean DSSE                     .0042   .0041   .0059   .0064
Standard deviation of DSSE    .0032   .0031   .0045   .0048

KF1 is only slightly different from OLS; the mean DSSE's are almost equal. This is as expected, as with zero W the Kalman filter is equivalent to OLS. The slight difference is caused by the value of V input to the Kalman filter being different from the variance estimated by OLS. KF2 and KF3 are markedly worse than OLS. The mean DSSE for KF2, .0059, is

(.0059 - .0042) / (.0032 / 100^0.5) = 5.3

standard deviations away from the value expected if the KF2 DSSE values were drawn from a population with the same mean and variance as the OLS estimates. This result is not surprising. The parameters of the model were in fact static, and so the assumption that they could drift has led to worse forecasts. The mean DSSE is not much greater than the value calculated above on the assumption of perfect parameter estimates. We would expect the Kalman filter to do better when the data are generated with parameters that are not static. The second set of simulations uses the F's suggested above (.2, .01, .01).
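The Kalman filter recursion underlying KF1, KF2 and KF3 can be sketched as follows. This is a minimal pure-Python illustration of the standard predict/update cycle for a random-walk coefficient model, under the priors given above; the author's actual code is not given in the text, so names and structure here are assumptions.

```python
def kalman_step(m, C, x, y, V, W):
    """One predict/update cycle. m is the vector of coefficient means,
    C its covariance matrix, x the regressor row (e.g. [1, log PCE,
    log PW/PS]), y the observation; V and W are as in the text."""
    n = len(m)
    # predict: random-walk state, so the mean is unchanged and C grows by W
    R = [[C[i][j] + W[i][j] for j in range(n)] for i in range(n)]
    # one-step forecast of y and its variance
    f = sum(x[i] * m[i] for i in range(n))
    Q = sum(x[i] * R[i][j] * x[j] for i in range(n) for j in range(n)) + V
    # update: Kalman gain A = R x / Q
    A = [sum(R[i][j] * x[j] for j in range(n)) / Q for i in range(n)]
    err = y - f
    m_new = [m[i] + A[i] * err for i in range(n)]
    C_new = [[R[i][j] - A[i] * Q * A[j] for j in range(n)] for i in range(n)]
    return m_new, C_new, f
```

Feeding the 19 observations of a run through this step in turn, the forecasts f for the last four periods give the squared errors summed into the DSSE; with W = 0 the recursion reproduces OLS, as noted for KF1.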
Table 10  SIMULATION 2

                              OLS     KF1     KF2     KF3
Mean DSSE                     .0212   .0209   .0128   .0141
Standard deviation of DSSE    .0232   .0230   .0120   .0134

If the Kalman filter results were in fact drawn from a population with the same mean and standard deviation as the OLS results, the expected value of the mean DSSE would be .0212 with a standard deviation of .00232 about this mean. KF2 is 3.6 standard deviations from this value, so we can say with a high degree of confidence that the KF2 DSSE's are, on the average, better than the OLS. The KF3 mean DSSE is 3.1 standard deviations from the OLS value, again a significant difference.

To explore the stability of this result under different values of F, a third simulation was done, with F set to (.1, .01, .01).

Table 11  SIMULATION 3

                              OLS     KF1     KF2     KF3
Mean DSSE                     .0229   .0225   .0127   .0139
Standard deviation of DSSE    .0294   .0293   .0113   .0123

If the Kalman filter results were in fact drawn from a population with the same mean and standard deviation as the OLS results, the expected value of the mean DSSE would be .0229 with a standard deviation of .00294 about this mean. KF2 is 3.5 standard deviations from this value, so we can say with a high degree of confidence that the KF2 DSSE's are, on the average, better than the OLS. The KF3 mean DSSE is 3.1 standard deviations from the OLS value, again a significant difference.

For simulation 4, F was set to (.05, .01, .01).

Table 12  SIMULATION 4

                              OLS     KF1     KF2     KF3
Mean DSSE                     .0219   .0218   .0118   .0128
Standard deviation of DSSE    .0215   .0215   .0102   .0111

If the Kalman filter results were in fact drawn from a population with the same mean and standard deviation as the OLS results, the expected value of the mean DSSE would be .0219 with a standard deviation of .00215 about this mean. KF2 is 4.7 standard deviations from this value, so we can say with a high degree of confidence that the KF2 DSSE's are, on the average, better than the OLS.
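The significance figures quoted for each simulation all come from the same calculation: the gap between the mean DSSE's, measured in standard errors of the OLS mean over the 100 runs. A small sketch, using the Table 10 and Table 12 values:

```python
import math

# Number of standard deviations separating the Kalman filter mean DSSE
# from the OLS mean DSSE, using the standard error of the OLS mean.
def deviations(mean_ols, sd_ols, mean_kf, runs=100):
    return (mean_ols - mean_kf) / (sd_ols / math.sqrt(runs))

# Simulation 2, KF2 (Table 10):
print(round(deviations(0.0212, 0.0232, 0.0128), 1))  # 3.6
# Simulation 4, KF2 (Table 12):
print(round(deviations(0.0219, 0.0215, 0.0118), 1))  # 4.7
```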
The KF3 mean DSSE is 4.2 standard deviations from the OLS value, again a significant difference.

To test the effect of changing the F governing the rate at which the income elasticity changes, F was set to (.1, .005, .01).

Table 13  SIMULATION 5

                              OLS     KF1     KF2     KF3
Mean DSSE                     .0218   .0216   .0105   .0113
Standard deviation of DSSE    .0234   .0231   .0085   .0095

If the Kalman filter results were in fact drawn from a population with the same mean and standard deviation as the OLS results, the expected value of the mean DSSE would be .0218 with a standard deviation of .00234 about this mean. KF2 is 4.8 standard deviations from this value, so we can say with a high degree of confidence that the KF2 DSSE's are, on the average, better than the OLS. The KF3 mean DSSE is 4.5 standard deviations from the OLS value, again a significant difference.

For simulation 6, the first element of F was changed by an order of magnitude; F was set to (.01, .005, .01).

Table 14  SIMULATION 6

                              OLS     KF1     KF2     KF3
Mean DSSE                     .0180   .0178   .0111   .0120
Standard deviation of DSSE    .0217   .0214   .0122   .0137

If the Kalman filter results were in fact drawn from a population with the same mean and standard deviation as the OLS results, the expected value of the mean DSSE would be .0180 with a standard deviation of .00217 about this mean. KF2 is in fact 3.2 standard deviations from this value, so we can say with a high degree of confidence that the KF2 DSSE's are, on the average, better than the OLS. The KF3 mean DSSE is 2.8 standard deviations from the OLS value, again a significant difference.

8.3 CONCLUSIONS

When parameters drift (simulations 2 to 6) the Kalman filter gives better forecasting performance than OLS; the OLS mean DSSE's are between roughly 50% and 110% worse than the KF2 and KF3 mean DSSE's in those simulations. The fact that this is true for all those simulations indicates that this result is not confined to one particular set of F's, or to one particular W-matrix.
When parameters do not drift, the Kalman filter is inferior to OLS; the mean DSSE's of KF2 and KF3 in simulation 1 are some 40-50% worse than the mean DSSE of OLS.