'I'll tell thee everything I can; there's little to relate.'
    Through the Looking Glass, Lewis Carroll

7. PAST APPLICATIONS OF THE KALMAN FILTER

Nearly all the applications of the Kalman filter have been in process control; for example, Battin (1962) on rocket navigation. More recently, Harrison and Stevens (1971, 1976) have brought the technique to the notice of statisticians in this country.

7.1 HISTORIC DEVELOPMENT OF KALMAN FILTERING

The difficulty in recounting the historic development of Kalman filtering is in knowing how far back to start. The recursive process shows us how we should modify our view of the world when presented with fresh information; this is something that has been done for a very long time. This paper shall, however, begin to trace the Kalman filter from the 18th century, and the story shall be kept fairly brief. There is not one line of development, of course, but rather a tangled web. The figure on the next page shows diagrammatically how the various lines fit together, with various published works used as signposts.

7.1.1 FILTERING

The Reverend Thomas Bayes (1763) was the first to show how to combine new information with old information to produce updated estimates. Norbert Wiener (Wiener, 1949) invented the theory of filtering (i.e. separating signal from noise), although the practice of filtering goes back many years in the history of telephone engineering. Rudolf Kalman (1960) expressed the problem in terms of differential equations, which makes it possible to design an optimal filter. This result was then used extensively by control engineers, in controlling everything from rockets (for example, in orbit and trajectory estimation for the Apollo Moon mission) to chemical plants; much was published, mainly in engineering journals. It was not until somewhat later that statisticians began to publish papers using the Kalman filter; for example, Whittle (1969).

7.1.2 LEAST SQUARES

Another thread in the story is in the field of least squares.
Gauss (1821) and Legendre (1806) both suggested the idea of minimising squared deviations; Gauss pointed out: 'But of all principles ours is the most simple; by others we would be led into the most complicated calculations'. Galton (1886) first used the term 'regression', in papers describing the relationship between the heights of children and of their parents. Plackett (1950) showed how least squares parameter estimates can be updated recursively: given the parameter estimates and their covariance matrix as estimated from n observations, Plackett gives formulae for updating these estimates and the covariance matrix to take account of an additional m observations. Gauss (1821) gives the formulae for one additional observation (according to Plackett). Young (1969) showed that Kalman's equations can be obtained by an extension of Plackett's work; this is of course also a demonstration of the equivalence of the least squares principle and the Kalman filter.

7.1.3 TIME SERIES FORECASTING

Meanwhile, statisticians were developing non-causal time series forecasting (by which is meant the analysis of time series without recourse to explanatory variables). Kendall (1973) describes the following:
- Decomposition into trend, seasonal and error terms
- Exponential smoothing
- Double and treble exponential smoothing
- Holt-Winters
- Moving averages
- Spectral analysis
- Harrison's seasonal model
- Box-Jenkins (more fully described in Box and Jenkins, 1970)

Harvey (1981a) and Harvey (1981b) describe (in addition to some of the above):
- Autoregressive processes
- Moving average processes
- Mixed processes
- ARMAX models
- Linear filters
- State space models and the Kalman filter

All these methods have in common the fact that they do not estimate causal models but are forecasting methods only.
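Plackett's recursion for a single additional observation (the case he attributes to Gauss, and the starting point of Young's extension to the Kalman equations) can be sketched in modern matrix terms. This is an illustrative sketch, not code from any of the papers cited; the simulated data are invented for the demonstration.

```python
import numpy as np

def rls_update(b, P, x, y):
    # Plackett's recursion for one new observation: update the
    # estimate b and the unscaled covariance P = (X'X)^-1 without
    # refitting from scratch.
    k = P @ x / (1.0 + x @ P @ x)      # gain vector
    b = b + k * (y - x @ b)            # correct b by the new residual
    P = P - np.outer(k, x) @ P         # shrink the covariance
    return b, P

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Batch least squares on the first 40 observations
P = np.linalg.inv(X[:40].T @ X[:40])
b = P @ X[:40].T @ y[:40]

# Feed in the remaining 10 observations one at a time
for i in range(40, 50):
    b, P = rls_update(b, P, X[i], y[i])

# The recursive estimate matches full-sample least squares
b_full = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(b, b_full))  # True
```

The gain vector k plays exactly the role of the Kalman gain; adding a parameter model with non-zero variance W is, in essence, Young's extension.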
Exceptions are the Box-Jenkins method, which does have a facility (very rarely used), called the transfer function, for estimating the relationship between two variables, and the spectral analysis method, which also has such a facility (called cross-spectral analysis, and also very rarely used). For example, Maddala (1977) devotes five pages to Box-Jenkins methods, but does not even mention transfer functions. Armstrong (1978) has studied Box-Jenkins as a forecasting technique; he concludes that it is no better than simpler autoprojective methods, such as exponential smoothing. Armstrong reaches a similar conclusion about spectral analysis. In 1967, Trigg and Leach described a forecasting system using exponential smoothing in which the smoothing parameter is allowed to vary in response to a tracking signal; this is exactly what the Kalman filter does. The most recent development in time series forecasting is the method of Harrison and Stevens (1976), a generalisation of Harrison and Stevens (1971). This uses the Kalman filter to determine the local level and rate of change of a variable, and whether the variable has just undergone a step change or a transient; they call this method Bayes or Bayesian forecasting. Like Box-Jenkins and spectral analysis, it has a way of introducing causal variables, but this is rarely used.

7.1.4 ECONOMETRICS

Econometricians have been estimating models with varying parameters for a long time (see Rubin, 1950). The most famous of these is perhaps the Hildreth-Houck model (1968), in which the parameters vary randomly about a fixed point. Berg (1973) edited a special issue of the 'Annals of Economic and Social Measurement' devoted to time-varying parameters, in which the Kalman filter appears.
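The Trigg and Leach scheme mentioned in section 7.1.3 can be sketched as follows. This is a minimal illustration: the smoothing constant gamma for the error statistics and the step-change test series are invented for the demonstration, not taken from their paper.

```python
def trigg_leach(series, gamma=0.2):
    # Adaptive exponential smoothing after Trigg and Leach (1967):
    # the smoothing constant is set each period to the absolute value
    # of the tracking signal (smoothed error / smoothed absolute
    # error), so the forecast adapts quickly when the errors become
    # persistently one-sided.
    forecast = series[0]
    e_bar = 0.0        # smoothed forecast error
    a_bar = 1e-9       # smoothed absolute error (tiny, avoids 0/0)
    forecasts = []
    for y in series:
        forecasts.append(forecast)
        err = y - forecast
        e_bar = gamma * err + (1 - gamma) * e_bar
        a_bar = gamma * abs(err) + (1 - gamma) * a_bar
        alpha = abs(e_bar) / a_bar         # tracking signal, in [0, 1]
        forecast = forecast + alpha * err  # adaptive smoothing update
    return forecasts

# A series with a step change: the tracking signal drives alpha
# towards 1 and the forecast locks on to the new level at once.
series = [10.0] * 20 + [20.0] * 20
f = trigg_leach(series)
```

The self-adjusting gain alpha is the point of resemblance to the Kalman filter, whose gain also grows when the model is tracking the data badly.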
Since then, Athans (1974) has published a paper advocating the use of Kalman filtering in econometric modelling, and Chow (1981) has published a book which is mainly devoted to the control and optimisation of economic systems using the Kalman filter, but which also has a chapter on the estimation of econometric models. Econometricians seem scarcely to have noticed this new technique. Of eight textbooks consulted (Maddala 1977, Johnston 1972, Armstrong 1978, Dhrymes 1970, Harvey 1981b, Zellner 1971, Judge et al 1982 and Common 1976), only Maddala and Harvey mention Kalman filters; Maddala devotes only half a page to the subject, while Harvey (1981b) mentions them in passing three times, and refers the reader to Harvey (1981a) for a fuller treatment. Thus, most of the papers published using the Kalman filter have been on control engineering, and nearly all the rest have appeared in the statistics literature. This paper is an attempt to use the Kalman filter technique to estimate a causal model of wool and energy consumption, and to show how this model differs from the same model estimated by conventional regression techniques.

7.2 HARRISON AND STEVENS 1976

This paper would not be complete without some mention of the work done by Harrison and Stevens. They call their technique 'Bayesian forecasting', but it has come to be called the Harrison-Stevens method, and this less confusing term shall be used in this paper. The Harrison and Stevens method was first published in Harrison and Stevens 1971, and generalised in Harrison and Stevens 1975A and Harrison and Stevens 1976. As described in Harrison and Stevens 1971, the method works as follows.
At all times the system is in one of four conditions:
- just after a transient (or outlier)
- just after a step change
- just after a slope change
- no change in slope or level

The Harrison and Stevens method attaches a probability to each of these four states, and also maintains estimates of the level and slope (rate of change) of the data for each of the four states. Each time an observation is analysed, each state could move to any of the four states, so there are now 16 possible states, for each of which there are estimates of level and slope. These are then collapsed to four possible states, with estimates of level and slope for each of them, and with an estimate of the probability of each. This process continues for each observation. The procedure is very easily extended to allow seasonality in the observations, and is also able to cope with causal variables (Johnston and Harrison, 1980). This forecasting procedure has been marketed by Mr. Stevens as a private consultant, by SIA under the name of SHAFT, and by SCICON under the name SCYCIC. It is also available on the Manchester Business School computer. In Harrison and Stevens 1976, the general Kalman filter model (equations 6.1.1 and 6.1.2) is shown to reduce, under various different assumptions, to:
- A regression model with static parameters
- A regression model with random-walking parameters
- A model for data with no upward or downward trend
- A model for data with a linear time trend
- An additive seasonal model
- A seasonal model using periodic functions
- Autoregressive models
- Moving average models
- Smoothing

This paper explores the first two of these.

7.3 BOROOAH AND CHAKRAVARTY, 1978

Borooah and Chakravarty explored the relationship between building society behaviour and activity in the market for new private dwellings. They set up a fairly complex model with over a dozen endogenous variables, and estimated its equations over the period 1956 Q1 to 1975 Q4 using OLS.
Three of the equations (those describing individuals' behaviour: housing starts, housing completions and net change in building society deposits) were selected as being most susceptible to change in their parameters. These three equations were re-estimated over the period 1956 Q1 to 1969 Q4, and the resulting values were used as the starting values for the Kalman filter. For both of the OLS estimations, the parameters are correctly signed and significantly different from zero, but the magnitudes of some parameters are very different, reinforcing the belief that the parameters have shifted. When the Kalman filter is applied, parameter estimates are produced for each quarter, for each parameter, and these are tabulated. Some of the parameters are fairly stable, but some of them move substantially; it is possible to see approximately when major changes in the parameters happen. Finally, Borooah and Chakravarty tabulate the residuals from OLS and from Kalman filtering; in two of the three equations there is a dramatic reduction. The root mean squares of these residuals are shown below.

                 OLS    KALMAN FILTER
Equation 1     81.55             2.96
Equation 2      5.02             8.87
Equation 3      4.82             0.34

The improvement in the fit is startling for equations 1 and 3, and more than compensates for the deterioration in equation 2. The values of V and W (the variances attached to the observation model 6.1.1 and the parameter model 6.1.2) are not reported. Without knowing V and W, and the priors used for the parameters, it is not possible to reproduce or evaluate the work. In an unpublished update of this work, Borooah and Chakravarty find that in four out of six cases the filtering technique produces smaller forecast errors. Their recommendation is to re-specify and re-estimate models infrequently, and between these revisions to use the Kalman filter to incorporate the new information.
They find that the sample fit improves, and prediction gets better, unless there are sudden changes in the direction of data series.

7.4 MEADE 1979

Meade uses the Kalman filter to estimate a model which relates two variables, X and Y, by first using univariate methods on each of the two series, and then using the forecasting errors on each of these variables. The two sets of errors are linked by a linear model:

    error_y,t = SUM_i ( W_i,t q_i,t error_x,t-i )

where W_i,t is a coefficient, and q_i,t is one with probability p_i and zero with probability 1 - p_i. The effect of this is to give a variable lag between Y and X. Meade's results (using simulated data) give good estimates of the lag, and when the lag changes the method very soon picks up the new lag. Meade goes on to analyse data given by Box and Jenkins 1970. He reports an improvement in forecasting ability over the univariate model; the error for one step ahead forecasts is reduced from 1.5 to 0.5.

7.5 JOHNSTON AND HARRISON 1980

Johnston and Harrison estimated a model of cider sales using the Harrison and Stevens (1976) method. The model used was (my notation):

    log Q_t = log A_t + log S_t + B1 Weather_t + B2 Price_t + transfer response + u_t

where
    Q_t is cider despatches in month t
    A_t is the current estimate of the intercept
    S_t is a seasonal factor
    Weather_t is a measure of how good the weather was in month t
    Price_t is the price of cider in real terms
    transfer response = B3(Price_t+1 - Price_t) + B4(Price_t - Price_t-1) + B5(Price_t-1 - Price_t-2) + B6(Price_t-2 - Price_t-3)

This is not entirely a causal model, as in addition to the causal price and weather variables there is the term A_t, an intercept that moves along a time trend which is fitted locally. The whole thing is estimated using the method described by Harrison and Stevens (1976), with the observation noise variance estimated using a method developed by Cantarelis (1979). The model performed rather better than a similar model without the causal variables; the root mean square error was 13% less.
The aggregate sales of the company were also forecast using this system 12 months forward; the errors were 0.13% compared with 6.9% using a model without the causal variables.

7.6 HUGHES 1980

Hughes (1980) estimated a model of petrol (gasoline) consumption, using as explanatory variables the price of petrol, per capita income, consumer prices, and a dummy to represent the effect of government restrictions; quarterly data from 1969 to 1979 were used. The model was estimated using OLS and the Kalman filter; Hughes reported that there were distinct changes in the parameters in the Kalman filter model; the post-1974 price elasticity is about 50% greater than the pre-1974 price elasticity. The mean square error for the Kalman filter model is reported as being 19.5% less than that of the OLS-estimated model. The Kalman filter model is closer to the actual data than the OLS model for 61% of the observations. The Hughes paper is also interesting for the way the initial estimates of the parameters are obtained, following a suggestion first made by Athans (1974). The Kalman filter requires a prior estimate of the mean and variance of the parameters. Athans suggested, and Hughes follows him in this, that the data be analysed by OLS, and that the parameters (and their variances) estimated by the OLS model be used as the prior for the Kalman filter. But this is not prior information; it is information extracted from the data to be analysed. Prior information is simply information that is available before the time series is analysed. Such information is usually available; if there is genuinely total ignorance about the parameters of the model (perhaps the analyst has not been told what the figures represent), then great prior uncertainty could be used (a very large variance). In this case there is no such excuse, as Hughes refers to a comprehensive survey of price and income elasticities of demand for gasoline in New Zealand.
Hughes's paper is also rather odd in another respect. Although he claims to use time-varying parameters, in fact his model for updating the parameters is

    B_t = B_t-1,    W = 0

(using my Kalman filter notation). Thus there is no systematic variation in the parameters, and no down-weighting of old data. In fact, the only difference between Hughes's version of the Kalman filter and OLS is that the Kalman filter model is estimated recursively and so displays the changing parameters period by period (and uses these changing parameters to forecast; this is probably why the forecasts produced by the Kalman filter are a little better than those produced by OLS). The OLS procedure, on the other hand, conceals the year-by-year estimates of the parameters behind an average.

7.7 BURMEISTER AND WALL, 1982

Burmeister and Wall (1982) set up a model of unobserved rational expectations for the German hyperinflation, using the Kalman filter. Their technique involves treating V and W (using my notation) as unknowns, in addition to the unknown structural parameters B. They then perform an optimisation over V, W and B to minimise a function of the one-step-ahead prediction errors. The estimation failed for a model in levels; Burmeister and Wall say that this was because of severe negative correlation between two of the model parameters. A model in first differences also failed, because of serial correlation in the residuals. A more elaborate model of the money supply process cured this. They tested the sensitivity of their estimates to changes in the prior parameter variance, and found that their estimates were highly insensitive to the choice of prior variance. They also estimated a model in which W = 0. This leads to substantial differences in the expectations parameter, but not in the others. Their conclusion is that rational expectations do not always lie on a convergent path.

7.8 MCWHORTER, NARASIMHAM AND SIMONDS, 1977

McWhorter et al.
(1977) examine the forecasting performance of six non-causal models and four causal models estimated by various methods, including the Kalman filter. The models are all Klein's model 1 (Klein, 1950). The estimation techniques were:

 1. Forecast of no change                       NC
 2. Simple exponential smoothing                ES
 3. Second order exponential smoothing          ES-2
 4. Box-Jenkins                                 BJ
 5. No change on first differences              NC-FD
 6. Simple exponential smoothing, first diff.   ES-FD
 7. Ordinary least squares                      OLS
 8. Two stage least squares                     2SLS
 9. Three stage least squares                   3SLS
10. Kalman filter                               KF

The ten techniques were evaluated by making ex-ante forecasts (as in DSSE, defined in chapter 5). The econometric techniques (OLS, 2SLS, 3SLS, KF) compare very unfavourably with the time-series techniques. The no change and exponential smoothing models compare remarkably well with the Box-Jenkins model, but no one method dominates. Among the econometric techniques, the Kalman filter does slightly better than OLS and much better than 2SLS and 3SLS for one quarter ahead forecasts. For four quarter ahead forecasts, the Kalman filter is soundly beaten by all the other techniques. The coefficients estimated by the various econometric methods are not reported, but are said to be very different. The V and W of the Kalman filter are not reported either, so it is difficult to comment on the model.

7.9 MITCHELL, 1982

Mitchell (1982) uses the Kalman filter on money multiplier observations for two different definitions of money supply. He estimates the shocks hitting the different definitions of money supply, and shows how these estimates can be used to revise the target values of money supply. V and W are not reported. His conclusion is that it is optimal to use a Kalman filter on these observations to infer the shocks to velocity.
7.10 MCNELIS AND NEFTCI, 1982

McNelis and Neftci (1982) estimate two single-equation partial models of the US economy using the Kalman filter, so as to be able to examine the way the models' parameters vary over time, and to assess whether there is any causal relationship between these parameters and changes in policy variables. They find that the models' parameters do indeed vary over time, with a particularly large change in 1974-1975, ascribed to the 1974 oil price increase. They find that there is a causal relationship between several policy variables and the time-varying parameters in one of the equations.

7.11 AOKI, 1982

Aoki (1982) shows that when the parameters of a model are time-invariant, it can be put into ARMA (autoregressive-moving average) form or Kalman filter form; the relative merits of these two formulations are not clear-cut. He then shows that the calculation of a response to an impulse is very simple using the Kalman filter. He concludes that the Kalman filter is particularly appropriate for modelling dynamic, interdependent economic systems.

7.12 KURE, 1983

Kure (1983) describes the use of the Kalman filter to monitor the stability parameter (called metacentric height) of floating platforms. Information about the angle of roll, and the rate at which that angle changes, is input to a Kalman filter. The estimation process leads to an estimate of metacentric height in a very short time (measured in minutes). The Kalman filter is particularly appropriate for this work, as data arrive separated by only short intervals, and it is highly desirable to have a very up-to-date estimate of metacentric height. While a platform is being loaded, the metacentric height changes. Until now, the only way of keeping track of it has been to keep account of the weights being loaded, unloaded and repositioned. With this new system, a continuously updated estimate can be presented for shipboard decision-making.
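The recursive cycle common to the applications above can be sketched in its simplest, scalar form: a level that random-walks with variance W, observed with noise of variance V (the notation of equations 6.1.1 and 6.1.2). This is a generic illustration, not Kure's actual roll model; V, W and the prior are invented values.

```python
def kalman_1d(observations, V=1.0, W=0.01, b0=0.0, P0=1000.0):
    # Scalar Kalman filter.  b is the current estimate of the
    # (possibly drifting) level, P its variance.  The large P0
    # expresses great prior uncertainty about the starting level.
    b, P = b0, P0
    estimates = []
    for y in observations:
        P = P + W              # predict: the level may have drifted
        K = P / (P + V)        # Kalman gain
        b = b + K * (y - b)    # update: correct by the new residual
        P = (1 - K) * P        # posterior variance
        estimates.append(b)
    return estimates

# Fifty noiseless observations of a level of 5: the estimate converges
# almost at once, because the prior is so diffuse.
levels = kalman_1d([5.0] * 50)
```

Setting W = 0 removes the drift and turns the recursion into a recursive estimate of a fixed mean; this is the degenerate case discussed under Hughes (1980) above.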
7.13 FILDES, 1983 AND FILDES, 1982

The 1983 paper is a summary of the 1982 paper. Fildes (1982) is a comparison of the Harrison-Stevens forecasting method (Bayesian forecasting) with de-seasonalised single exponential smoothing. Fildes emphasises the importance of comparing forecasts made outside the sample used for fitting (ex-ante forecasts). He also points out that it is necessary to test each method on many time series. Harrison-Stevens compares unfavourably with exponential smoothing for most forecast lead times and for most measures of forecasting accuracy (table 4.6 of Fildes, 1982). Another comparison is among three different implementations of Harrison-Stevens, and exponential smoothing. Again, exponential smoothing outperforms Harrison-Stevens.

7.14 MAKRIDAKIS ET AL. 1982

1001 time series were forecast using 24 different methods (including Harrison-Stevens), and the ex-ante forecasts were evaluated using five different measures. The paper presents a large amount of information about the performance of the various forecasting methods; an attempt will be made here to summarise some of the results on Harrison-Stevens. Using mean absolute percentage error as the measure, Harrison-Stevens is inferior to such simple techniques as deseasonalised single exponential smoothing (DSES) for forecasting horizons from 1 to 18 periods, but does better than many other methods. Using mean square error as the measure, Harrison-Stevens does well for up to six period ahead forecasts, but for longer time-horizons it is out-performed by the simpler techniques. Using the average ranking of the methods as the measure, Harrison-Stevens is again outperformed by DSES for more than six period ahead forecasts, but does better than most other methods. A very interesting measure is the percentage of the time that a naive forecasting method outperforms the various techniques.
Again, Harrison-Stevens is more often outperformed by a naive method than is DSES when the naive method consists of "forecast = last actual"; the opposite is true when the naive method is applied to deseasonalised data. The simple methods tend to do better than the sophisticated methods (such as Harrison-Stevens) on micro data; the reverse is true for macro data. Overall, eight methods were named as being more accurate than the others; Harrison-Stevens was one of these.

7.15 HARVEY, 1983

The Kalman filter is used to develop a unified framework within which various extrapolative forecasting procedures can be fitted. The exponentially weighted moving average procedure is expressed in Kalman filter form, and in this form its properties are examined. Double exponential smoothing and Holt's smoothing method are also put into Kalman filter form, and it is shown that seasonality terms can be added easily. Harvey then goes on to treat causal models within the Kalman filter framework. In particular, he shows how data irregularities (such as missing data, or the aggregation of two or more observations) can be treated very easily in this framework.

7.16 MCWHORTER, SPIVEY AND WROBLESKI, 1976

McWhorter et al. examine the problem of specifying H, V and W and the starting value for B (using my notation). They point out that these are not generally known in an econometric context (in an engineering context they might be known: in the ballistics field, for example, H would represent the laws of motion, and V might represent the accuracy of the radar). These parameters can, therefore, be misspecified, and McWhorter et al. examine the consequences of misspecification using Monte Carlo simulation. The conclusion is that Kalman filtered estimators (the estimates of B) are sensitive to misspecification of H and of the starting value for B. In particular, if H is not the identity matrix but is specified as such, the effects on the estimation of B are serious.
The accuracy of forecasts of Y, however, is robust to misspecification; it is most influenced by the relative sizes of V and W.

7.17 SUMMARY

Considerable work has been done using the Kalman filter for parameter estimation, but not a great amount in the field of econometrics. V and W and the prior for B are rarely reported, which makes it hard to comment fully on the results. The work described in this paper develops the Kalman filter in the direction of causal modelling, and tries to find where the Kalman filter has an advantage over OLS. Some of this ground has been covered in the papers listed above, but many of these papers were published after the first draft of this paper was prepared (e.g. Harvey, 1983). Views on the relative merits of the Kalman filter and other forecasting methods remain mixed. While some of the papers summarised in this chapter show enthusiasm for the Kalman filter, some of those which report comparative studies reveal that it can be inferior to longer-established and simpler techniques.
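As a closing illustration of the contrast drawn throughout this chapter, a regression with a random-walking parameter (the second of the Harrison and Stevens reductions, and the direction pursued in this paper) can be set against OLS. Everything here is simulated for the demonstration; V, W, the prior and the data-generating process are invented values, not drawn from any of the papers reviewed.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
# A parameter that random-walks around 1.0 (step s.d. 0.1)
beta_true = 1.0 + np.cumsum(0.1 * rng.normal(size=T))
y = beta_true * x + 0.1 * rng.normal(size=T)

V, W = 0.01, 0.01            # observation and parameter variances
b, P = 0.0, 1000.0           # diffuse prior for the parameter
kf_path = []
for t in range(T):
    P = P + W                            # parameter model: B_t = B_t-1 + noise
    K = P * x[t] / (x[t]**2 * P + V)     # gain for observation y = B x + noise
    b = b + K * (y[t] - x[t] * b)        # update by the new residual
    P = (1 - K * x[t]) * P
    kf_path.append(b)

b_ols = (x @ y) / (x @ x)    # OLS: one fixed estimate for the whole sample

# OLS averages over the drift; the filter tracks it period by period.
kf_err = np.mean((np.array(kf_path) - beta_true) ** 2)
ols_err = np.mean((b_ols - beta_true) ** 2)
print(kf_err < ols_err)
```

This is the point made under Hughes (1980) in reverse: with W greater than zero, the filter down-weights old data and follows the moving parameter, while OLS conceals the movement behind an average.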