'Would you tell me, please, which way I ought to go from here?'
'That depends a good deal on where you want to get to,' said the Cat.
Alice's Adventures in Wonderland, Lewis Carroll

11. SUGGESTIONS FOR FURTHER WORK

As the Kalman filter has scarcely been used by econometricians, there is a large number of possibilities that have not been explored, and this paper has done little more than scratch the surface. This chapter suggests areas which look as if they might repay some research.

11.1 USING THE KF PROGRAM

The Kalman filter program is printed in Appendix D. It is written in FORTRAN to run on a DEC VAX computer, but was originally written for an HP 3000 and converted. The conversion was easy, so it should be possible to convert the program to any other machine fairly easily. A machine-readable version is available from the author of this paper. The program is well commented and fairly easy to follow; the notation used in the program is that of Harrison and Stevens, 1976.

A new subroutine, GETDATA, should be written for each problem. This subroutine has three functions. The first time it is called, it does all problem-dependent initialising. It must set MVARS (less than or equal to 20), the number of dependent variables, and NPARS (less than or equal to 24), the number of parameters (including constants, if required). IFMATS controls the printing of each step of the recursive process (IFMATS = 0 suppresses most of it). The matrix G (H in this paper, describing how the parameters are forecast from the previous estimate) is usually set to the identity; whatever value is required, it should be set here. The initial means and variances of the parameters are read or set. Any files that need to be read by the program should be opened. NTIME (the number of time periods to be analysed) should be set, and also NFORC (the number of forecasts). IFFMATS controls the printing of the final values of the matrices; zero suppresses it.
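The three duties of GETDATA (initialisation, per-iteration reading, and forecasting) can be sketched as a hypothetical Python analogue. All names and values here are illustrative only; the real subroutine is the FORTRAN of Appendix D and communicates through COMMON blocks rather than a dictionary:

```python
# Hypothetical Python analogue of the GETDATA interface (illustrative only).
def get_data(phase, itime, state):
    """Problem-specific callback, called with phase 'init', 'step' or 'forecast'."""
    if phase == "init":
        state["MVARS"] = 1                    # dependent variables (<= 20)
        state["NPARS"] = 2                    # parameters (<= 24)
        state["G"] = [[1.0, 0.0],             # parameter-transition matrix
                      [0.0, 1.0]]             # (H in the text); usually identity
        state["prior_mean"] = [0.0, 0.0]      # initial parameter means
        state["prior_var"] = [[100.0, 0.0],
                              [0.0, 100.0]]   # vague initial variances
        state["NTIME"] = 40                   # time periods to analyse
        state["NFORC"] = 5                    # forecasts to make
    elif phase == "step":
        # read/transform the dependent and explanatory variables for period
        # itime, and supply V and W for this period
        state["V"] = [[1.0]]
        state["W"] = [[0.001, 0.0], [0.0, 0.001]]
    elif phase == "forecast":
        # supply future explanatory variables and W; V may be zero, or
        # non-zero to get a better idea of forecasting accuracy
        state["V"] = [[0.0]]
    return state
```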
GETDATA is then called before each iteration until ITIME = NTIME, and it must read in the dependent and explanatory variables and do any necessary transformations on them. It must also read or calculate V and W.

GETDATA is then called for forecasting (if forecasting is required). Values for the explanatory variables and for W must be read. V may be set to zero, or to a non-zero value to get a better idea of the forecasting accuracy. The dependent variables need not, of course, be set. The forecast is put into the array YHAT.

GETDATA is problem-specific, but it is the only problem-specific part of the program. Program limits are: 20 dependent variables and 24 parameters. There is no limit on the number of observations, as they are read and processed one at a time.

11.2 SYSTEMATICALLY VARYING PARAMETERS

This topic was introduced in part 3.7.1 of this paper, but with only one parameter. When there are multiple parameters, there is the possibility of interaction between them. This is commonly used in rocket control systems (Battin 1962), where the parameters to be estimated include position and velocity; the position parameter changes systematically according to the velocity.

This facility might be useful in an econometric model where parameters interact. For example, if there are two parameters representing the current level and current rate of change of a variable, then clearly the current level changes systematically in accordance with the current rate of change. Another example of interaction of parameters might occur in a model of a country's economy, where a parameter representing the percentage of GDP invested might partly depend on a parameter representing the anticipated rate of GDP growth.

Berg (1973) contains several papers on time-varying parameters, but is mostly concerned with stochastically changing parameters. Maddala (1977) has a section on time-varying parameters.
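The level/rate-of-change interaction can be made concrete with a non-identity parameter-transition matrix H. A minimal sketch (the numerical values are illustrative only):

```python
# Two parameters: b[0] = current level, b[1] = current rate of change.
# The level changes systematically according to the rate of change, so the
# parameter-forecast step uses a non-identity H in place of the usual identity.
H = [[1.0, 1.0],    # new level = old level + old rate of change
     [0.0, 1.0]]    # rate of change carried forward unchanged

def forecast_parameters(H, b):
    """Systematic part of the parameter model: b_forecast = H . b"""
    return [sum(h * x for h, x in zip(row, b)) for row in H]

b = [10.0, 0.5]                    # level 10, growing by 0.5 per period
print(forecast_parameters(H, b))   # level advances to 10.5; rate unchanged
```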
Very little work has appeared in the econometric literature on systematically changing parameters; this may be because no simple technique has been generally available for coping with them, rather than because of a positive rejection of them. Now that such a technique is available, perhaps such models will emerge.

11.3 FORECASTING

One way of using the KF to prepare forecasts is to use the standard KF algorithm, but with the gain (G) of the filter set to zero at each iteration. This gives us the opportunity to set V to a finite quantity without the normal consequence of the dependent variable affecting the parameters. The uncertainty attached to the forecasts is given by equation 6.1.2.5 in the form of a variance-covariance matrix. The off-diagonal terms of this matrix describe how the forecasts of each of the dependent variables are related to the other dependent variables; this matrix is not usually available when conventional econometric methods are used.

If a non-zero W is used, the parameters (and hence the forecasts) will become less and less precise as we look further into the future. This is surely right, as our uncertainty about the future should increase with distance.

I shall now address the problem of how to treat residuals (see parts 1.1.5, 1.2.5 and 3.6.5) when preparing forecasts, and the problem that our forecasts might be better than the preliminary data (Ball 1978, p. 17).

11.3.1 HOW TO TREAT RESIDUALS

One reason given in the literature for treating residuals in a special way is that it is very expensive (in computer time) to completely re-estimate a large simultaneous equation system using (for example) full information maximum likelihood when additional data become available. This cost is greatly reduced if the Kalman filter is used, as it is not necessary to re-analyse all the data. Only the new data need be processed, using the previous parameter estimates and variances as priors.
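This incremental re-estimation can be sketched for a one-parameter model. The Python below is a hypothetical illustration (not the program of Appendix D), and the numerical values are invented:

```python
def kf_step(prior_mean, prior_var, x, y, V, W):
    """One Kalman-filter update for the scalar model y = b*x + v.
    Only the new observation is processed; the previous estimate and
    its variance serve as the prior."""
    bf, cf = prior_mean, prior_var + W   # forecast parameter (H = 1 here)
    yf = bf * x                          # forecast dependent variable
    var_yf = x * x * cf + V              # variance of that forecast
    gain = cf * x / var_yf               # Kalman gain
    be = bf + gain * (y - yf)            # updated mean
    ce = cf - gain * x * cf              # updated variance
    return be, ce

# new data arrive: update without re-analysing any old observations
mean, var = 1.0, 0.5                     # previous estimate and variance
mean, var = kf_step(mean, var, x=2.0, y=2.4, V=1.0, W=0.01)
```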
Another reason given in the literature is that the latest residuals may seem to indicate that there has been a structural shift in the parameters. The Kalman filter can treat this by having a non-zero W at the time when the structural shift appears.

Another situation calling for special treatment of the most recent residuals is when it is felt that some special circumstances, which will not persist in future, are causing large residuals. The special treatment called for is then to ignore the implication of the residuals (i.e. set V very large).

11.3.2 PRELIMINARY DATA LESS ACCURATE THAN FORECASTS

Ball (1978) cites an example in the London Business School macro-economic forecasts of the forecasts being more accurate (i.e. closer to the final data) than the preliminary data.

At the International Wool Secretariat, in February of each year, very preliminary estimates of wool consumption are made for the previous year for eight countries. These estimates are based on data for the first two quarters of the year and on preliminary data for the third quarter. Wool goods are, of course, bought more heavily in winter than in summer, so the fourth quarter has a disproportionately large share of the year's wool consumption. The first year that these estimates were made, they turned out (when final data became available) to be less accurate than the forecasts (made shortly before), and given the problems of making the estimates (some of which are quoted above) this is not very surprising.

The second year that these estimates were made, I had a problem. When I came to prepare the forecasts of wool consumption, I could:

a) ignore the very preliminary data;
b) treat the very preliminary data like all the other data.

I could see that both these alternatives had drawbacks. Ignoring the data would be throwing away useful information (even though the very preliminary data had less information in them than the final data, we still felt that they contained some information).
Treating the very preliminary data like all the other data would surely be giving excessive weight to some very inaccurate numbers. I thought about this problem on and off for several months, moving towards some way of taking a weighted average of the model-based forecasts and the very preliminary data, using the accuracies of the two numbers as weights. It is from this problem, and this intuitive solution to it, that this paper was born.

The solution to the problem, in the terms used by this paper, is to choose a V for the very preliminary data that describes its imprecision. The Kalman filter will then weight it accordingly, and will weight the forecast for the same time-period according to the estimated forecasting accuracy.

11.4 FORECASTING WITH UNCERTAINTY IN THE EXPLANATORY VARIABLES

When calculating the uncertainty in the forecasts, the uncertainty in the explanatory variables should obviously be taken into account, especially when the explanatory variables themselves are forecasts, in which case there can be very large uncertainties. Harrison and Stevens, 1976, show how the case of uncertainty in the forecasts of the explanatory variables can be treated. Suppose the explanatory variables X at some future time can be represented by a normal distribution, N(Xm, XC). Then the forecast of Y at time t is given by:

Yt = Xm . MFt

and the variance attached to this forecast is given by:

Var(Yt) = Xm . CFt . XmT + Vt + Z

where Z = [Zij], using

Zij = trace[XCij . (MFt . MFtT + CFt)]

Z is the additional uncertainty arising from the uncertainty attached to X.

11.5 ESTIMATION WITH UNCERTAINTY ATTACHED TO THE EXPLANATORY VARIABLES

Just as there can be uncertainty attached to the dependent variables, so there can also be uncertainty attached to the explanatory variables. Johnston, 1972, p. 281, calls this "errors in variables", and shows that applying ordinary least squares results in biased and inconsistent estimates.
As OLS has been shown to be equivalent to Kalman filtering, this must mean that the Kalman filter yields biased and inconsistent estimates also. However, it may be possible to devise a recursive estimation process that will cope with stochastic explanatory variables, simply by applying Bayes' theorem repeatedly and using the standard results for the variance of a product and of a quotient. We shall follow the scheme of 3.2. Equations 3.2.1 and 3.2.2 are unchanged.

BFt = BEt-1    equation 11.5.1

Var(BFt) = Var(BEt-1)    equation 11.5.2

We shall suppose that Xt is distributed normally with mean XMt and variance XCt, and that XMt and XCt are known at time t. Then the forecast of Y is (using the usual result for the variance of a product):

YFt = BFt . XMt    equation 11.5.3

Var(YFt) = (BFt)^2 . XCt + (XMt)^2 . Var(BFt) + 2 . BFt . XMt . Cov(BFt, Xt)    equation 11.5.4

The covariance term is needed because, although XMt is known, Xt is a random variable. The observation of B is given by:

BOt = YOt / Xt    equation 11.5.5

and the variance of B is given by the formula for the variance of a quotient. Then BEt and Var(BEt) can be calculated exactly as in equations 3.2.7 and 3.2.8, and, if required, YEt and Var(YEt) can be calculated exactly as in 3.2.9 and 3.2.10.

11.6 ESTIMATING V

In this paper, V has been treated as known; i.e. Vt is known at time t for all t. But suppose V is not known? Then four ways of estimating V can be listed.

The first way is by tracking the forecasting error of the Kalman filter. At time t, the mean squared forecasting error over times 1 to t-1 is known, and an estimate of Vt could be based on this mean. This has the drawback of not distinguishing between observation error (V) and model error (W), as both of these would contribute to the forecasting error.

The second way is that suggested by Athans (1974). This is to estimate the model using ordinary least squares, then use the mean squared errors as V in a Kalman filter.
The third way is to treat V as a parameter of the model to be estimated. In the multivariate case, write V = aI (or some more complicated multi-parameter function). A range of values that spans all likely a's should be chosen (for example, .0001, .0002, .0005, .001, .002, .005, .01, .02, .05, .1, .2, .5, 1). Each value is assigned an initial prior probability pi, i = 1 to 13, and as each data point arrives the probabilities are recursively updated. This procedure is described by Harrison and Stevens, 1976, p. 221.

Cantarelis (1979), Cantarelis and Johnston (1979) and Mehra (1970) describe methods of estimating V (a problem which they call "on-line variance estimation"). McWhorter et al. (1977) use an approximate generalized least squares procedure for estimating V, W and B0, by making a rather restrictive assumption about the diagonal elements of W (it is assumed that they are all equal). The values found are not reported. Burmeister and Wall (1982) make V and W parameters of an optimization procedure, which involves searching over all possible values of B, V and W to find those that minimize a function of the one-step-ahead forecasting errors. They do not report the values of V and W that were found.

11.7 ESTIMATING W

As can be seen from the previous chapter, the problem of choosing W is the most difficult problem of Kalman filtering. We can get no clue from a pre-analysis using OLS, as the equivalent of OLS in Kalman filter terms is the assumption that W = 0. Nor can any consideration of the likely errors in the dependent variable help; that is all subsumed in V.

One way of approaching the problem is via a consideration of the speed with which parameters are likely to drift. For example, if a parameter is likely to drift by 0.1 per year, then a sensible value for W is 0.01. Then, if the parameter is known precisely in year t, its variance in year t+1 is 0.01; the standard error is therefore 0.1.
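The arithmetic of this drift argument, extended to n years ahead under the random-walk parameter model, can be checked in a few lines (an illustrative sketch, not part of the Appendix D program):

```python
import math

W = 0.01                       # parameter-model variance added per year
# Starting from a precisely known parameter, uncertainty accumulates
# additively under the random-walk model: variance after n years = n * W,
# so the standard error grows with the square root of the horizon.
for n in (1, 4, 25):
    sd = math.sqrt(n * W)
    print(n, sd)               # 1 year -> 0.1, 4 years -> 0.2, 25 years -> 0.5
```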
The work done in this paper seems to indicate that W should be between 10^-3 and 10^-5. Values of W smaller than 10^-6 mean that information is scarcely degraded at all with time, whereas W's much greater than 10^-3 mean that the historic data become irrelevant very quickly. Anyone who has tried to estimate an econometric model with only a few years' observations will appreciate that large W's give very fuzzy models.

11.8 NON-LINEAR MODELS

The generalization of the Kalman filter to non-linear systems is quite complicated, and I would refer the reader to Jazwinski (1970) for a rigorous treatment. The estimation process has to iterate many times within each recursion, and the elegant simplicity of the Kalman filter is lost. For models that are not too severely non-linear, however, I would suggest (without proof) the following.

Observation model:

Yt = f(Xt, Bt) + vt, where vt is N(0, Vt)    eq. 11.8.1

Parameter model:

Bt = H . Bt-1 + wt, where wt is N(0, Wt)    eq. 11.8.2

Then Bt is N(MEt, CEt), where MEt and CEt are given by:

Forecast parameters:

MFt = H . MEt-1    eq. 11.8.3

CFt = H . CEt-1 . HT + Wt    eq. 11.8.4

Forecast dependent variable:

YFt = f(Xt, MFt)    eq. 11.8.5

Var(YFt) = F' . CFt . F'T + Vt    eq. 11.8.6

Kalman gain:

Gt = CFt . F'T . (Var(YFt))^-1    eq. 11.8.7

Update parameter estimates:

MEt = MFt + Gt . (Yt - YFt)    eq. 11.8.8

CEt = CFt - Gt . F' . CFt    eq. 11.8.9

where F' is the matrix of partial first derivatives of f with respect to the parameter vector Bt.

11.9 NON-NORMAL ERRORS

Part 2.3.1 shows how prior information can be combined with data when the distributions are Binomial and Beta. It might be possible to develop recursive Bayesian estimation or Kalman filtering for these or other conjugate distributions. It should also be possible to treat any probability distribution by using numerical integration instead of the algebraic method of the Kalman filter.
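As a sketch of the numerical-integration idea, the following grid-based Bayesian update handles a deliberately non-normal (double-exponential) observation density for a single parameter; the model, grid and values are all hypothetical:

```python
import math

# Grid-based Bayesian updating: any likelihood can be used, at the cost of
# replacing the Kalman filter's algebra with numerical integration.
grid = [i * 0.01 for i in range(301)]        # candidate values of b in [0, 3]
prior = [1.0 / len(grid)] * len(grid)        # flat prior over the grid

def laplace_lik(y, mu, scale=1.0):
    """A non-normal (double-exponential) observation density."""
    return math.exp(-abs(y - mu) / scale) / (2 * scale)

def update(prior, x, y):
    """One recursion: multiply prior by likelihood, renormalise numerically."""
    post = [p * laplace_lik(y, b * x) for p, b in zip(prior, grid)]
    total = sum(post)
    return [p / total for p in post]

post = update(prior, x=2.0, y=3.0)           # one observation of y = b*x + v
mean = sum(b * p for b, p in zip(grid, post))  # posterior mean of b
```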
This might be rather slow by comparison, but might not be so slow as to consume excessive amounts of computer time.

11.10 BACKWARD KALMAN FILTERING

There is no overwhelming reason why data must be analysed in the order in which they happened. The Kalman filter can be run in reverse quite happily, which might be a good idea if the values of the parameters in the past are of greater interest than the values in the present. This would be particularly appropriate in the study of economic history.

11.11 DECISION MAKING

One of the purposes of econometric analysis is to aid decision making. The estimates of the parameters (and their standard errors) are reported to the decision-maker, who can then use this information in the decision-making process (see Solomon, 1980, p. 74 for an example of how optimum promotion levels can be calculated using price elasticities).

Another approach is for the decision-maker to specify a loss function (which need not be, and usually is not, symmetric). Then, if the loss function depends in some way on the parameter estimates, minimization of expected loss can be carried out given a joint probability distribution of the uncertain parameters (such as is provided by the Kalman filter). The resulting decision is not necessarily the same as when the parameters are treated as being known with complete precision, especially when the loss function is asymmetric. This subject is covered in more detail in many works on decision analysis, such as Raiffa and Schlaifer, 1961.

Decisions are made in order to affect the future, so clearly there is a need for information about the parameters that is as up-to-date as possible, rather than information about the parameters ten years ago. But if a model is estimated using OLS over a twenty-year period, then the OLS averaging process gives us an average centered about ten years ago. The Kalman filter avoids this problem.
By allowing the relevance of older data to decline (non-zero W), the parameter estimates become averages weighted towards the present.

Decision makers often have prior information ("gut feelings") about the parameters of importance to their decisions. This prior information can be given to the Kalman filter very easily, if it can be quantified into means and standard errors of estimates.

11.12 INCORPORATION OF THE KALMAN FILTER INTO PACKAGES

In order to make the Kalman filter easier to use, I would suggest that Kalman filtering be made accessible via the standard econometric packages, so that the ordinary econometrician (i.e. one who does not have substantial amounts of time to spend on understanding new techniques) and the business forecaster can use it.

11.13 REDEFINITION OF DSSE

Throughout this paper, DSSE has been calculated for one-step-ahead forecasts. It would clearly be desirable to do further work with DSSE computed on n-step-ahead forecasts, for more general n. Obviously a DSSE computed on n-step-ahead forecasts (DSSE-n) could not be compared with DSSE-m for different n and m, but I would suggest doing some work on DSSE-5 and on DSSE-20, to represent medium- and long-range forecasting.

Another way of measuring forecasting error would be to compute the n-step-ahead forecast only once per time-series. This might make the properties of the metric more amenable to analysis.
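A sketch of how the squared n-step-ahead forecast errors underlying a DSSE-n might be collected; the series and the naive forecaster here are purely illustrative, and the DSSE formula itself is as defined earlier in the paper:

```python
def n_step_errors(series, forecast, n):
    """Squared n-step-ahead forecast errors: at each origin t the model is
    asked for a forecast of period t+n using only data up to t."""
    return [(series[t + n] - forecast(series[:t + 1], n)) ** 2
            for t in range(len(series) - n)]

# naive forecaster for illustration: the n-step-ahead forecast is the last value
naive = lambda history, n: history[-1]

y = [1.0, 1.2, 1.1, 1.4, 1.6, 1.5, 1.7]
e5 = n_step_errors(y, naive, n=5)   # medium-range errors (DSSE-5 ingredient)
e1 = n_step_errors(y, naive, n=1)   # one-step errors, as used in this paper
```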