'Why,' said the Dodo, 'the best way to explain it is to do it.'
Alice's Adventures in Wonderland, Lewis Carroll

5. RECURSIVE BAYESIAN ESTIMATION, ONE DEPENDENT AND ONE EXPLANATORY VARIABLE; SOME RESULTS

A model of wool consumption will be estimated using the method of chapter 4, and will be compared with a model estimated using ordinary least squares, and with a model estimated using weighted least squares. To compare the predictive power of these methods (recursive Bayesian estimation and ordinary least squares) it is necessary to develop a standard of comparison. This will be the forecasting error in one-step-ahead forecasts, and is explained in more detail in 5.1 below.

5.1 THE MEASUREMENT OF THE FIT OF A MODEL - DYNAMIC SUM OF SQUARED ERRORS

It will be necessary to compare the explanatory power of models estimated using recursive Bayesian estimation with the explanatory power of models estimated using other methods. A commonly used method of gauging the explanatory power of models is to compare the ex post forecasts made by the model with the data; a model with a smaller sum of squared errors (SSE) is considered to be explaining the data better.

This technique can be used for assessing the fit of models estimated using recursive Bayesian estimation. Forecasts one period ahead can be compared with the actual observation and the sum of squared errors cumulated. It is important to note, however, that at any given time, the one-period-ahead forecast is based on only the data available before that time. The usual practice in calculating SSE after using OLS is to fit the model using all the data, calculate the model's forecasts over the time period, and hence SSE. This means that the data from periods T, T+1, T+2 etc. are used to calculate the forecast for T, and in practical, ex ante forecasting, these data are not available. Fildes and Howell (1977) stress the need for ex ante rather than ex post testing.
A more realistic approach would be to calculate the forecast for period T by fitting the model to data for t = 1,2,3,...,T-1. The disadvantage of this approach for OLS is that almost as many regressions must be done as there are observations, a time-consuming process. However, this should not be a problem, given the power of modern electronic computers, and so this will be the procedure adopted by this paper to compare models estimated using OLS with models estimated using recursive Bayesian estimation. To distinguish this procedure from the usual SSE procedure, it will be called dynamic sum of squared errors (DSSE).

The models of this paper are estimated on annual data, and there are about 20 observations (19 for the wool model and 21 for the energy models). The DSSE's will be calculated using T = number of observations - 4, number of observations - 3, number of observations - 2, number of observations - 1. In other words, forecasts will be made for the last four years.

5.2 THE MODEL

The model to be estimated is that of Solomon, 1980, p.59.

    log(NDC_t / TFC_t) = A + B . log(PW/PS_{t-1})

where
    NDC_t     is wool consumption in year t
    TFC_t     is total fibre consumption in year t
    PW/PS_t   is the price of wool deflated by the price of synthetic fibre in year t

The data and data definitions are given in Appendix F. The estimates of the precision of NDC come from a consideration of the method for preparing the data, the quality of the sources, and from a consideration of the differences between estimates made for the same year using different assumptions. The calculations to produce the estimates of precision are given in chapter 10 of this paper.

This model was chosen as it is a simple model, the data are readily available, and considerable work has been done on wool consumption models already (for example, Solomon, 1980). Obviously it would be possible to specify better models (and this will be done later in this paper), but at this point a very simple model is required.
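As a minimal sketch of the DSSE calculation of 5.1 applied to a model of this form, the following Python fragment refits OLS on the data before each forecast period and cumulates the squared one-step-ahead errors. The function name dsse, its arguments and the use of NumPy are assumptions made for illustration and are not taken from the original study; y and x stand for the already-aligned series log(NDC_t / TFC_t) and log(PW/PS_{t-1}).

```python
import numpy as np

def dsse(x, y, n_forecasts=4):
    """Dynamic sum of squared errors for y = A + B*x fitted by OLS.

    Each of the last n_forecasts observations is forecast from a model
    fitted only to the observations that precede it.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    total = 0.0
    # T is a 0-based index, so it runs over n-4, n-3, n-2, n-1
    for T in range(n - n_forecasts, n):
        # Fit on the T observations before the forecast period
        X_past = np.column_stack([np.ones(T), x[:T]])
        coef, *_ = np.linalg.lstsq(X_past, y[:T], rcond=None)
        # One-step-ahead forecast and its squared error
        forecast = coef[0] + coef[1] * x[T]
        total += (y[T] - forecast) ** 2
    return total
```

With 19 observations and the default n_forecasts = 4, this performs four regressions, on the first 15, 16, 17 and 18 observations respectively.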
This model is not too badly mis-specified, as is evidenced by the following.

Table 1

            Coefficient    t-value    Correlation    D.W.
            on PW/PS
Belgium       -.48           -8          .90          1.5
France        -.53          -12          .95          1.0
Germany       -.29          -13          .95          2.2
Holland       -.29           -6          .82          1.0
Italy         -.43          -10          .92          1.9
Japan         -.38           -5          .79          1.6
UK            -.49          -10          .92          1.5
USA          -1.05          -10          .93          1.0

The sign of the coefficient of the PW/PS variable is correct for all countries, and all the t-values are significant. The hypothesis of zero autocorrelation can be rejected with 95% confidence for France, Holland and the USA; it cannot be rejected for the other five countries. Similar models are sometimes used in technological substitution work.

5.3 THE MODEL ESTIMATIONS

The model was estimated several times, as follows :

OLS     Using OLS
rBe1    Using rBe, with Prec(BF_t) = Prec(BF_{t-1})
rBe2    Using rBe, but relaxing the assumption that Prec(BF_t) = Prec(BF_{t-1}) (a declining relevance for older data, the precision decaying as in 3.6.4)
rBe3    As rBe2, but with the decay in the precision in B related to the size of the change in the explanatory variable.
rBe4    As rBe1 (W_t = 0) but the precision of the data is a variable; 10% before 1970, 5% after 1970, and 12% for the last (preliminary) observation.
rBe5    Precision of the data as rBe4, but with declining relevance for older data (W as for rBe2).

The model was estimated for eight countries (Belgium, France, Germany, Holland, Italy, Japan, UK and USA).

The decay in precision in runs rBe2, rBe3 and rBe5 was done by :

    Var(BF_t) = Var(BE_{t-1}) + W_t

In the rBe2 and rBe5 sets of runs, W_t was a constant,

    (  .005   -.002 )
    ( -.002    .005 )

In the rBe3 set of runs, W_t was

    ( .01    0                                           )
    ( 0      0.5 * (log(PW/PS_t) - log(PW/PS_{t-1}))^2   )

The choice of W in rBe2 to rBe5 requires some explanation. For rBe2, it was (rather arbitrarily) supposed that data 10 years old would be half as relevant as current data. If we assume that the relevance decays at a constant rate, then this rate must be 7% per year, as (0.5)^0.1 = .933, so that relevance falls by 1 - .933 = .067 each year.
Expressed as a fraction this is .07; squaring this we get .0049; rounded to .005. There is no point in trying to justify this procedure rigorously, as the initial supposition (10 years = 1/2 relevance) was quite arbitrary; it is very encouraging, however, that this arbitrary choice gives such a marked improvement in forecasting accuracy (as will be seen in 5.4.1).

For rBe3, it was hypothesised that the change in wool consumption accompanying a large change in wool prices carries more information about the effect of wool prices than the change accompanying a small change in wool prices. Accordingly, when a larger change happens, it would be desirable to downgrade the precision of the estimates of price elasticity made on previous data, to give greater weight to the larger change in wool prices. The change is squared to constrain the weight to be positive, and to emphasize the importance of large changes. Again, this weight is rather arbitrary, and it is very encouraging that the model fit is so greatly improved (see 5.4.1).

The rBe4 and rBe5 estimations were designed to discover whether the main improvement in DSSE under recursive Bayesian estimation comes from the variable precision of the data (i.e. V a variable), or from the declining relevance of the data (i.e. W not equal to zero).

In all of these runs, great prior uncertainty was used (a very large variance for the prior value of B, 400, was used with a prior mean value of 0.0).

5.4 THE RESULTS

5.4.1 THE MODEL FIT

The fit, as measured by DSSE (see 5.1), is displayed below.
Table 2 DYNAMIC SUM OF SQUARED ERRORS

            OLS     rBe1    rBe2    rBe3    rBe4    rBe5
Belgium     .011    .011    .010    .009    .011    .010
France      .004    .004    .008    .006    .004    .008
Germany     .005    .006    .007    .011    .008    .007
Holland     .053    .053    .057    .065    .051    .057
Italy       .019    .019    .026    .026    .019    .026
Japan       .246    .246    .129    .169    .257    .130
UK          .123    .123    .053    .059    .135    .053
USA         .890    .890    .213    .260    .760    .217
------------------------------------------------------
Average     .169    .169    .063    .076    .155    .063
------------------------------------------------------

Of the eight countries' DSSE's, four are improved by non-zero W's and four are worsened. All of the improvement in the average DSSE comes from the three countries whose DSSE's were largest using OLS. The reason why non-zero W's are so beneficial when the model is forecasting particularly badly is simple; a bad forecast leads the procedure to make a larger change in the parameters if non-zero W's are being used.

As the model is in logs, the residuals represent logs of multiplicative errors, so it is meaningful to combine the DSSE's of the eight countries in an average. The average DSSE is not changed by the move from OLS to rBe1 (as would be expected), but is dramatically improved (63% lower) as soon as we admit the declining relevance of older data (rBe2), and is reduced (but less so, 55% lower than OLS) when the relevance of data is linked to the size of changes in the explanatory variable (rBe3).

The results from rBe4 and rBe5 show that the main benefit of the recursive Bayesian estimation procedure comes from allowing the precision attached to the estimates of the parameters to decay; the effect of variable data precision is quite small.
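The explanation given in 5.4.1, that a bad forecast produces a larger parameter revision when W is non-zero, can be illustrated with a minimal sketch. It assumes that the recursive estimator takes the standard Kalman-filter form for a regression with drifting coefficients, which may differ in detail from the method of chapter 4. Only the variance step Var(BF_t) = Var(BE_{t-1}) + W_t, the constant W matrix and the diffuse prior (mean 0.0, variance 400) are taken from 5.3; the function name, the synthetic data, the observation variance V = 0.01 and the use of the same prior variance for both A and B are illustrative assumptions.

```python
import numpy as np

def recursive_estimate(x, y, V, W, prior_var=400.0):
    """Kalman-style recursive estimate of (A, B) in y_t = A + B*x_t + e_t.

    V is the observation variance; W is the 2x2 matrix added to the
    parameter covariance before each observation (W = 0 gives ordinary
    recursive least squares).
    """
    beta = np.zeros(2)                 # prior mean 0.0 (from 5.3)
    P = prior_var * np.eye(2)          # prior variance 400, assumed for A and B
    path, sq_err = [], []
    for x_t, y_t in zip(x, y):
        h = np.array([1.0, x_t])
        P_f = P + W                    # Var(BF_t) = Var(BE_{t-1}) + W_t
        err = y_t - h @ beta           # one-step-ahead forecast error
        S = h @ P_f @ h + V            # variance of that forecast error
        K = P_f @ h / S                # gain: how far the error moves (A, B)
        beta = beta + K * err
        P = P_f - np.outer(K, h) @ P_f
        path.append(beta.copy())
        sq_err.append(err ** 2)
    return np.array(path), np.array(sq_err)

# Synthetic data, NOT the wool series: the slope shifts from -1.0 to -0.6
# at index 12 of 19 observations.
t = np.arange(19)
x = 0.5 * (-1.0) ** t
slope = np.where(t < 12, -1.0, -0.6)
y = 0.2 + slope * x

V = 0.01                               # assumed observation variance
W_zero = np.zeros((2, 2))              # constant relevance, as in rBe1
W_decay = np.array([[0.005, -0.002],
                    [-0.002, 0.005]])  # constant W, as in rBe2

path_0, err_0 = recursive_estimate(x, y, V, W_zero)
path_w, err_w = recursive_estimate(x, y, V, W_decay)

print("final slope, W = 0:      ", path_0[-1, 1])
print("final slope, constant W: ", path_w[-1, 1])
print("DSSE, last four periods: ", err_0[-4:].sum(), err_w[-4:].sum())
```

On these synthetic data the run with W = 0 stays close to the OLS slope fitted to the whole sample, while the run with the constant W ends nearer the post-break slope and has the smaller sum of squared errors over the last four periods.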
For example, if we examine two runs for the USA:

Table 3

                  rBe1                    rBe2
PERIOD    PARAMETER   SQUARED     PARAMETER   SQUARED
                      ERROR                   ERROR
  12       -0.883      .095        -0.869      .062
  13       -0.982      .145        -0.955      .042
  14       -1.019      .013        -0.808      .019
  15       -0.873      .244        -0.524      .190
  16       -0.929      .146        -0.602      .168
  17       -0.981      .284        -0.645      .042
  18       -1.020      .260        -0.659      .002
  19       -1.033      .200        -0.656      .001

The bad forecasts (large squared error) of periods 12 and 13 do not lead rBe1 to revise the parameter downwards by very much; rBe2 does revise its parameter estimate sharply. This means that in periods 17 to 19 it is producing far better forecasts than rBe1. There seems to be a structural change in the parameter round about periods 14 or 15, and this is captured by rBe2 but not by rBe1.

It could also be asked whether a similar improvement in DSSE could be obtained by weighting the observations. To test this possibility, a further set of runs was made, using weighted least squares. The weights used were geometrically declining weights;

    weight_t = (1 - r)^(T-t)    for t = 1,2,...,T.

For example, for r = 0.1 the weights are 1.0 for the last observation, 0.9 for the previous observation, 0.81 before that, 0.729 before that and so on. This procedure is suggested by Gilchrist (1967) and by Rouhiainen (1978), who suggests r = 0.75 (using my notation). The DSSE is tabulated below for various values of r.
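The weighting scheme just described can be sketched as follows. This is an illustrative fragment rather than the code used for the runs; the function names and the use of NumPy are assumptions, and the fit is the standard weighted least squares solution obtained by scaling each observation by the square root of its weight.

```python
import numpy as np

def geometric_weights(n_obs, r):
    """Weights (1 - r)^(T - t) for t = 1..T, with T = n_obs."""
    t = np.arange(1, n_obs + 1)
    return (1.0 - r) ** (n_obs - t)

def wls_fit(x, y, r):
    """Weighted least squares estimate of (A, B) in y = A + B*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    root_w = np.sqrt(geometric_weights(len(y), r))
    X = np.column_stack([np.ones(len(y)), x])
    # Minimising sum(w_t * residual_t^2) is OLS on rows scaled by sqrt(w_t)
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return coef
```

Setting r = 0 gives every observation a weight of 1 and so reproduces OLS; larger values of r concentrate the fit on the most recent observations.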
Table 4 DSSE USING WEIGHTED LEAST SQUARES

     r =    .05     .07     .10     .25     .50     .75
Belgium     .011    .011    .010    .008    .010    .012
France      .003    .002    .002    .003    .006    .009
Germany     .006    .006    .006    .007    .006    .006
Holland     .050    .050    .049    .048    .052    .071
Italy       .019    .019    .019    .021    .024    .037
Japan       .252    .253    .254    .225    .138    .116
UK          .129    .131    .132    .117    .071    .039
USA         .845    .828    .800    .592    .211    .009
------------------------------------------------------
Average     .164    .163    .159    .128    .065    .037
------------------------------------------------------

As can be seen, weighted least squares does indeed produce an improvement in DSSE in Japan, UK and USA if r = 0.75, but the improvement is fairly sensitive to the choice of r.

5.4.2 THE PRICE ELASTICITY

OLS estimates the price elasticity using the whole data set weighted equally. rBe1 does the same, but rBe2, rBe3, rBe4 and rBe5 give more recent data greater weight than older data. In addition, rBe3 gives data accompanying a large price change greater weight than data accompanying a small price change. This makes a considerable difference to the estimate of the price elasticity at the end of the time series.

Table 5 PRICE ELASTICITY (B)

             OLS      rBe1     rBe2     rBe3     rBe4     rBe5
Belgium     -.480    -.485    -.189    -.253    -.438    -.196
France      -.533    -.533    -.266    -.284    -.463    -.256
Germany     -.290    -.287    -.236    -.223    -.282    -.223
Holland     -.292    -.301    -.071    -.127    -.236    -.071
Italy       -.425    -.423    -.289    -.290    -.389    -.294
Japan       -.376    -.355    -.106    -.152    -.293    -.096
UK          -.491    -.483    -.265    -.292    -.435    -.271
USA        -1.051   -1.033    -.656    -.694    -.957    -.665

There is very little difference between the OLS and the rBe1 estimates (as expected); the main difference is between OLS/rBe1/rBe4 and rBe2/rBe3/rBe5. The elasticities estimated by rBe2 and rBe3 are all considerably lower in absolute value than those estimated by OLS and rBe1 (though rBe2 and rBe3 are not very far from each other). This argues that price elasticities are rather lower now than they used to be. This is confirmed by the elasticities estimated by weighted least squares.
Table 6 WEIGHTED LEAST SQUARES PRICE ELASTICITY

     r =     .05      .07      .10      .25      .50      .75
Belgium     -.429    -.463    -.452    -.366    -.218    -.056
France      -.513    -.503    -.489    -.404    -.377    -.594
Germany     -.287    -.286    -.284    -.272    -.264    -.475
Holland     -.262    -.249    -.229    -.141    -.135    -.163
Italy       -.422    -.420    -.415    -.371    -.273    -.288
Japan       -.368    -.363    -.354    -.225    +.207    -.224
UK          -.481    -.476    -.467    -.387    -.118    -.021
USA        -1.055   -1.050   -1.034    -.729    -.049    +.004

5.5 DISCUSSION OF RESULTS

Allowing data relevance to be less for older data leads to an improvement in forecasting ability. This means that the parameters that we have assumed constant are in fact not constant. Although in the specified model constant elasticity is postulated, perhaps the elasticity is constant only locally, and in the longer run (as, for example, improved synthetic fibres come on the market) the price elasticity is drifting.

It could be argued that the elasticity should rise (as synthetic fibre companies introduce fibres which are more and more like, and so more and more substitutable for, wool), or it could be argued that the elasticity is falling (end uses where wool can be substituted most easily, e.g. trousers, have already fallen to synthetics, and what is left is not so readily substitutable). A priori, it is not known which of these hypotheses is more plausible, so the elasticity is allowed to drift over time with a model estimated using recursive Bayesian estimation. The result of this estimation is that the second hypothesis (declining elasticities) fits the data better.

The effect on forecasting performance of variable data precision is very much smaller than the effect of decaying data relevance. We might expect that the main effect of variable data precision will be in the treatment of preliminary data; as there is only the one preliminary observation, this would obviously have less effect on ex post forecasts than decaying data relevance, as it is only one-off.
However, the last observation is a very special observation, simply because it is the last, and so it is the jumping off point for ex ante forecasts. Thus we might expect that the main effect of the variable data precision would be on the forecasts about the future, via the lower precision of the last observation.

Another interesting point is the improvement in DSSE when W_t is allowed to vary in proportion to the square of the size of the change in the independent variable. Thus, emphasising those observations that carry most information pays off in an improved forecasting performance for the three countries with the worst forecasting performance. This suggests that it would be worth emphasising the information-rich observations when estimating larger models. But observations may be rich with information about only some parameters, and the weighted least squares method would not be able to cope with this. Later on we shall see that the Kalman filter is able to give greater weight to certain observations, but with this greater weight affecting only some parameters.

The improvement in forecasting error that results from using recursive Bayesian estimation is sufficiently encouraging to warrant further exploration of the recursive Bayesian estimation technique. Rather than explore the simple model presented above, the theory of recursive Bayesian estimation will be developed so as to permit the estimation of more complex models, with several dependent and several explanatory variables.