'Begin at the beginning,' the King said gravely, 'and go on till you come to the end; then stop.'
Alice's Adventures in Wonderland, Lewis Carroll

1. INTRODUCTION

This paper describes a technique that is fairly new to econometrics, and its application to various model estimation problems and to forecasting. The technique will at first be called recursive Bayesian estimation (rBe), and later Kalman filtering. Five problems will be listed which lead to dissatisfaction with conventional econometric techniques (by which is meant estimation methods commonly available in econometric packages, such as OLS (Ordinary Least Squares) and WLS (Weighted Least Squares)); possible ways of treating these problems with conventional methods will be examined.

The literature survey will examine work done in this and related fields, and conclude that there is a serious gap in the literature; very little work has been done using recursive Bayesian estimation on causal models. The survey of the historic development and past applications of Kalman filtering and recursive Bayesian estimation will be deferred until chapter 7, as it is necessary to introduce several concepts before any discussion can be meaningful.

Causal models can be estimated using recursive Bayesian estimation for one dependent and one independent variable (the analogue of simple regression). This will then be extended to one dependent and several independent variables (the analogue of multiple regression) and to several dependent and several independent variables (the analogue of a simultaneous equation estimation procedure). The rest of the paper will be devoted to applications.

1.1 THE PROBLEMS TO BE TREATED

Conventional econometric techniques do not treat satisfactorily the five problems of estimation and forecasting listed below.
There are refinements of conventional techniques applicable to these problems but, as will be seen, they are awkward and not intuitively appealing, and so are little known and rarely used. They also have technical weaknesses and create new difficulties.

1) Preliminary data; should they be used or ignored or treated differently from final data?
2) Imprecise data; most data are imprecise to some extent. Does this matter, and how should the problem be treated? What if some data are less precise than other data?
3) Incorporation of other information; how should information extraneous to the data be incorporated into the estimated model and the forecasts?
4) Irrelevance of old data; much econometric analysis assumes that data prior to a certain date are irrelevant while all data after that date have equal relevance (as is shown by the practice of giving equal weight to all observations after a certain date, and using no observations before that date).
5) Residuals; in forecasting, should the differences between the estimates and observations at the end of the time-series be ignored, or taken note of? (This is explained more fully in 1.1.5 below.)

1.1.1 PRELIMINARY DATA

The problem is as follows. Preliminary data are more up-to-date than final data, but are not as accurate as final data, by definition. Sometimes preliminary data are only slightly less accurate than final data; sometimes they are much less accurate. Five questions can be listed.

(a) Should preliminary data be used when estimating the model?
(b) How should the lower accuracy be taken account of by the estimation method?
(c) Should preliminary data be used when preparing forecasts?
(d) If the forecast is expected to be better than the preliminary data, should the preliminary data be ignored?
(e) How should the preliminary data be used in forecasting?
For example, Ball (1978, p 17) demonstrates that the London Business School forecasts are better than the Government's published preliminary data for private fixed investment. Should the forecasters then ignore the preliminary data in preparing their forecasts? The conclusion reached by most economists is that they should use the most up-to-date data (Howrey, 1978). Ball (1978) does the same, in spite of his demonstration that in at least one case his forecasts are better than the preliminary data.

Preliminary data can be very imprecise; for example, King (1982) compares the accuracy of preliminary data for quarterly US GNP with various updates (each figure is updated at least eight times); the preliminary growth estimates can have an error of up to 5 percentage points.

1.1.2 IMPRECISE DATA

It is generally recognised (for example Johnston 1963, p 148) that most economic statistics contain errors of measurement, so that they are only approximations to the underlying "true" values.

A further problem: what happens if some of the data are less precise than the rest? Preliminary data (see above) are a particular case of this, but not the only one. For example, older data may be less precise than newer data, owing to improved data collection methods.

1.1.3 INCORPORATION OF OTHER INFORMATION

Normal econometric practice is to estimate parameters from a time series of data. Sometimes (but not often) this time series is supplemented by cross-sectional, longitudinal or experimental data. Very rarely will a researcher be willing to admit to incorporating information from subjective sources.

In fact subjective information is used all the time, but rarely recognised as such. The researcher chooses a model on partly subjective grounds; the variables to include and those to leave out may be chosen a priori from economic theory. The researcher may decide to put some variables in per head terms (which is equivalent to a subjective estimate of the influence of population).
The functional form may be chosen subjectively; one of several possible income variables may be selected, usually on subjective grounds.

Where a researcher has prior information (whether subjective, or from other work) on the parameters of the model he is estimating, intuition argues that he should use it. Indeed, this may be the only way of dealing with problems of multicollinearity (Solomon, 1980). This prior information should, of course, be made explicit, but to ignore any information is to throw away a valuable asset. The problem is how to incorporate this prior, and presumably uncertain, information into the estimation.

1.1.4 THE IRRELEVANCE OF OLD DATA

The usual (implicit) assumption of econometrics is that all the data used in estimating the model are of equal relevance to the estimates of the parameters. This may not be the case. If what is required are estimates of parameters as up-to-date as possible (for example, for policy decision-making) then more recent data are perhaps more relevant than older data. The practice of using data that do not go back beyond a certain point in time is often a crude expression of this (although sometimes this may simply reflect the fact that older data may not be available). The relevance of the data should not, however, be either zero or one, as this practice would imply. If the latest data have unit relevance, then older data should have a lower relevance which is some function of how out-of-date they are. This would of course vary from model to model, but how should this problem be treated?

As well as data being out-of-date, there may be other good reasons why some data are less relevant than other data. Again, a crude expression of this is found in the common practice among econometricians of omitting data for wartime years from the estimation process. Again, however, it is surely an over-simplification to assume that the relevance of data is either zero or one.
There could be major unmodelled disturbances which, although not as violent as wars, may nevertheless lead one to reduce the relevance of these years in the model estimation process. Gilchrist (1967) argues that the hypothesis that all the data should be given equal weight is by no means self-evident.

1.1.5 RESIDUALS

How should residuals be used, if at all? The residuals at the end of the data series are of particular interest, as they should perhaps influence the forecasts.

Young (1982) states that it is common practice to use non-zero disturbances for forecast periods. He argues that this practice can improve any econometric forecast, especially as the period between sample and forecast increases. The reason for this (according to Young) is that when the residuals exhibit systematic bias, there must be information in the residuals, which can be "mined". This systematic bias may have two causes in the case of linear models; these are:

- Sampling error or bias; especially bias from a specification error, but also because the sample of data may not be unbiased. The commonest specification error is perhaps the omitted variable.
- Measurement error; in particular the large errors associated with preliminary data.

The ideal cure for specification error is to respecify the model correctly; this may not always be possible, as it may not be known which variable has been omitted (or the variable may be unobservable).

Suppose a model (solid line) is fitted to the observations (short dashes) above. Should the forecasts be the dotted line, or should we take notice of the most recent residuals, and forecast the long-dashed line? Or should we use some compromise between the two, such as starting on the dashed line and gradually moving towards the dotted line? If we use the dotted line, then the first forecast is for an abrupt rise, which may be contrary to what most economists (and indeed our model) would forecast.
If we use the dashed line, are we not taking undue notice of what may be a temporary aberration?

1.2 CONVENTIONAL SOLUTIONS TO THE ABOVE PROBLEMS

1.2.1 PRELIMINARY DATA

In the specification and estimation of econometric models the distinction between preliminary and final data is usually ignored (according to Howrey, 1978) and this leads to unnecessarily large forecast errors.

It is fairly obvious that more accurate preliminary data would lead to better forecasts; Cole (1969b) concludes that the use of preliminary rather than revised data (for all variables) in estimating Zellner's quarterly consumption function can lead to a doubling of forecast errors (see Cole, 1969b, p81). Of this increase in error, 70% was caused by the direct effect of the data error and 30% was due to the indirect effect of data errors on parameter estimation.

It is less obvious that a more efficient treatment of preliminary data would lead to an improvement in forecasting accuracy (albeit less spectacular). Howrey (1978) has investigated the effect of using a Kalman filter approach (which is the approach to be taken by this paper). He finds that in the case of two fairly simple models, the naive use of preliminary data (i.e. treating preliminary data in the same way as final data) results in forecast error variances that are up to 47% and 67% greater for the two models, compared with the same models estimated using the Kalman filter. The models he considers are based on simple autoregressive processes.

Thus the answers to the five questions raised in 1.1.1 above are as follows:

a) Should preliminary data be used when estimating the model?
b) How should the lower accuracy be taken into account by the estimation method?

Howrey (1978) thinks that the preliminary data should be used and suggests a method which assumes that the final data are free from observation error and that the data revisions exhibit bias and serial correlation.
It is not clear, moreover, from his paper how his procedure can be applied to the estimation of causal (as opposed to autoregressive) models. As his method is based on the Kalman filter, however, it is very similar to the method to be discussed later in this paper. But Howrey tries to model the relationship between preliminary and final data explicitly, as additional equations in the model, whereas the models of this paper treat preliminary data as unbiased but less precise estimates than the final data, and so need no additional equations.

A more conventional method for dealing with this problem is to treat it as a problem of heteroscedasticity. One of the ordinary least squares (OLS) assumptions is that the residuals have a common variance; this is violated by the preliminary data, which have a higher variance. Maddala (1977, p. 260) says that in this case the OLS estimator is unbiased, but is less efficient (has higher variance) than a weighted least squares (WLS) estimator. If the variances of the residuals are

$$s_t^2 = s^2 Z_t^2$$

where the $Z_t$ are known (i.e. the variances are known up to a multiplicative constant), the problem can be overcome as follows. In the case of one independent variable

$$Y_t = a + b x_t + e_t$$

the equation is divided by $Z_t$, and becomes

$$\frac{Y_t}{Z_t} = \frac{a}{Z_t} + \frac{b x_t}{Z_t} + \frac{e_t}{Z_t}$$

The residuals $e_t/Z_t$ are now homoscedastic, and the equation can be estimated by OLS (regressing $Y_t/Z_t$ on $1/Z_t$ and $x_t/Z_t$ with no constant term). This is rather a lot of trouble to go to for the sake of one observation (or perhaps a few), and a literature survey has not revealed anyone using this method to deal with the lower accuracy of preliminary data.

c) Should preliminary data be used when preparing forecasts?

Howrey (1978) gives an example which shows that if preliminary data are ignored the forecast error variances can be ten times as great as they would be if the preliminary data were treated in even a naive way.
Obviously the reduction in error variances will depend on the quality of the preliminary data, but it does seem clear that the preliminary data should be used. Common practice is to use preliminary data in forecasting, but to treat them in the same way as final data.

d) If the forecast is expected to be better (i.e. more accurate) than the preliminary data, should the preliminary data be ignored?

Even if the preliminary data are less accurate than the forecasts are expected to be, they are nevertheless useful information, and should be treated as such. Suppose the forecasting model suggests a growth of 10% for the year just ended, with an expected error of + or -5% (one sigma limits), and the preliminary data suggest a fall of 20% with an expected error of + or -10%. Any forecasting procedure which ignored these preliminary data would be running grave risks of flying in the face of the evidence. Clearly preliminary data should not be ignored completely even when their accuracy is quite low.

e) How should the preliminary data be used in forecasting?

The literature offers the Kalman filter approach (Howrey, 1978), which is the approach used in this paper.

1.2.2 IMPRECISE DATA

It should be recognised that the vast majority of the data used by econometricians is to some extent imprecise. In most countries, population statistics are generated from ten-yearly censuses. The figures for nine out of ten years are, therefore, estimates arrived at by slightly indirect means. GNP statistics are calculated in three ways, and these three methods are never completely in agreement. Government production statistics are compiled from a census of firms larger than a certain size (and how much effort do these firms put into getting their returns accurate?).

My own experience is mainly with textile production and consumption data. Often market analysts feel that the figures that are estimated are within 10-20% of the true value.
Often an upper limit on the accuracy of consumer survey data can be calculated from the size of the sample of consumers responding to questionnaires (using the well known pq/n formula). Perhaps textile data are rather more imprecise than most other data, but the problem is surely more common than the literature would suggest; the problem is hardly ever treated.

It should be noted that the problem of imprecise data is not the errors-in-variables problem covered by Johnston (1972, p. 281); the errors-in-variables problem arises from imprecise explanatory variables. Stapleton and Young (1981) say that "the problem of errors in the dependent variable has not been studied previously".

When the data are not just imprecise, but of variable precision, the problems become much more complicated. Variable precision data can arise as follows. Old data (for example, pre 1970) could be less accurate than newer data, because of improved data collection methods. Last year's data may be less accurate still, because they may be preliminary. There may even be an estimate for the current year, based on several months' data, and even less precise.

Pierce (1980) has attempted to classify and measure sources of error in economic data; he lists five major sources of error:

1. Conceptual error; what we are actually measuring is not what we are trying to measure. For example, we would like to measure consumption, but our figures are production + imports - exports.
2. Transitory error; an evanescent fluctuation in a data series due to causes extraneous to our concept of the series. For example, consumption figures calculated as above might be in error because of a stock rundown in one month.
3. Sampling error; for example when we measure something by a sample survey.
4. Seasonal adjustment error.
5. Reporting error; for example clerical errors.

Pierce (1980) also distinguishes between preliminary and final data; this classification of error cuts across the five kinds listed above.
Pierce concludes (p. 7) that the major sources of error are the seasonal adjustment error, the transitory error and the sampling error.

The National Institute of Economic and Social Research point out the problem of data revision (NIER, 1981), and also express surprise at the size of some of the data revisions. They do not suggest any way of dealing with this problem, nor do they suggest that any special action need be taken.

1.2.3 INCORPORATION OF OTHER INFORMATION

Most econometric models are estimated on just time-series data. Some others (e.g. Balestra and Nerlove, 1966 and Tobin, 1950) use a combination of cross-section data and time-series data. Maddala (1977, p192) states that in most practical applications the data are not combined correctly. The unwarranted assumption is made that an elasticity (or other parameter) estimated from cross-sectional data can be used as an exact restriction on the time-series estimation. This is incorrect because the cross-section analysis does not produce an estimate with zero variance, and also because the estimate from cross-section data may not be wholly appropriate for a time-series model, as pointed out by Kuh and Meyer (1957) and Solomon (1980).

There are other ways of combining information from other sources (such as cross-section analysis, longitudinal analysis, analysis of other related time-series, experimental data) with the information in the time-series. All of these methods agree that the two lots of information must be treated together, rather than separately. Maddala (1971) describes a maximum likelihood (ML) estimator. The least squares with dummy variables (LSDV) estimator is also used quite commonly (Maddala, 1977). Another frequently used estimator is the Generalized Least Squares (GLS) estimator (used by Theil and Goldberger, 1961, and called by them a mixed estimator).
The question is examined in detail by Buse (1979) and by Dielman and Wright (1977), who list and examine 13 methods of combining extraneous information with time-series analysis.

The work that has been done on combining time-series with cross-section data is also applicable to combining time-series with other kinds of information (the parameters estimated from related data are assumed to be related to the parameters of the model to be estimated). Less work has been done on the combination of time-series data with subjective estimates of parameters.

Ridge regression points towards another way of combining prior information with time-series data. This is a rather unusual use of the ridge technique, suggested by Maddala (1977, p384). Some writers are opposed to the use of ridge (e.g. Campbell and Smith, 1975, Newhouse and Oman, 1971 and Conniffe and Stone, 1973) on the grounds that it introduces an uncontrolled amount of bias into the parameter estimates. The idea suggested by Maddala is to control the bias.

The OLS estimate is:

$$B_E = (X^T X)^{-1} X^T Y \qquad \text{(equation 1.2.3.1)}$$

The generalized ridge estimator is:

$$B = (X^T X + kQ)^{-1} X^T Y \qquad \text{(equation 1.2.3.2)}$$

where Q is a positive semidefinite matrix. Maddala (1977, p384) shows that if the prior distribution of B is Normal with mean d and variance $t^2 D$, then the posterior distribution of B is Normal with mean

$$(X^T X + k D^{-1})^{-1} (X^T Y + k D^{-1} d) \qquad \text{(equation 1.2.3.3)}$$

and variance

$$s^2 (X^T X + k D^{-1})^{-1}, \quad \text{where } k = s^2 / t^2 \qquad \text{(equation 1.2.3.4)}$$

Thus the general ridge estimator is assuming a prior distribution with zero mean (if d = 0, equation 1.2.3.3 reduces to equation 1.2.3.2 with $D^{-1} = Q$). This may be appropriate in some cases, but when the prior information is that d is not equal to 0 it would be more appropriate to use equations 1.2.3.3 and 1.2.3.4 for the posterior mean and variance of B.

Thus there are very many ways of combining information from other sources with time-series data to produce parameter estimates.
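
Equations 1.2.3.1 to 1.2.3.4 can be illustrated with a short numerical sketch. The Python code below is not part of the original work: it assumes NumPy, uses simulated data, and the function names and the values chosen for s2, t2, d and D are hypothetical. It is meant only to show that a zero prior mean reproduces the generalized ridge estimator with Q equal to the inverse of D, and how a non-zero prior mean d enters the posterior.

```python
import numpy as np

def ols_estimate(X, Y):
    """Equation 1.2.3.1: (X'X)^-1 X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def generalized_ridge(X, Y, k, Q):
    """Equation 1.2.3.2: (X'X + kQ)^-1 X'Y."""
    return np.linalg.solve(X.T @ X + k * Q, X.T @ Y)

def posterior_mean_and_variance(X, Y, d, D, s2, t2):
    """Equations 1.2.3.3 and 1.2.3.4 for a Normal(d, t2 * D) prior on B."""
    k = s2 / t2
    D_inv = np.linalg.inv(D)
    A = X.T @ X + k * D_inv
    mean = np.linalg.solve(A, X.T @ Y + k * D_inv @ d)
    variance = s2 * np.linalg.inv(A)
    return mean, variance

# Simulated data (an assumption for illustration only): intercept plus one regressor.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
Y = X @ np.array([1.0, 0.8]) + rng.normal(scale=0.5, size=30)

D = np.eye(2)        # prior covariance shape (hypothetical)
s2, t2 = 0.25, 1.0   # residual variance and prior scale (hypothetical)

b_ols = ols_estimate(X, Y)
# Zero prior mean: the posterior mean equals the generalized ridge estimator.
b_zero_prior, _ = posterior_mean_and_variance(X, Y, np.zeros(2), D, s2, t2)
b_ridge = generalized_ridge(X, Y, s2 / t2, np.linalg.inv(D))
# Non-zero prior mean d: the estimate is pulled towards d rather than towards zero.
b_informative, v_informative = posterior_mean_and_variance(
    X, Y, np.array([1.0, 1.0]), D, s2, t2
)
```

As t2 grows the prior becomes diffuse, k tends to zero, and the posterior mean approaches the OLS estimate.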
The method developed later in this paper has the advantage of being more intuitively appealing and easier to understand than those discussed above.

1.2.4 THE IRRELEVANCE OF OLD DATA

One possible way of dealing with the differing relevances of the data would be to use weighted least squares, choosing weights that are proportional to the relevance of the data. This could introduce heteroscedasticity (weighted least squares is often recommended as a solution to the heteroscedasticity problem, and if applied to homoscedastic data will cause heteroscedasticity). Furthermore, weighted least squares has already been suggested (1.2.1) as a possible answer to the problem of preliminary data, and the weights needed to solve this problem may conflict with the weights needed to solve the data relevance problem.

When we say that old (for example) data are less relevant to parameter estimation than new data, what we are implying is that we suspect that there may be some change in the parameters over the period covered by the time-series. Clearly the optimum strategy would be to change the form of the model to eliminate this drift. This may, however, be impossible. We may, for example, suspect that an income elasticity is falling, and over 20 years may have fallen from 1.0 to 0.9. If our data series is such that the standard deviation of the parameter estimate is 0.3, this change in elasticity can be regarded as small and impossible to verify, as it is well within the error margins of the model. An attempt to estimate elasticities for the first 10 years and the second 10 years separately is unlikely to succeed in revealing a statistically significant difference in the elasticity. We would not want to drop the first 10 years' data from our estimation, as this would be throwing away relevant information.
Intuitively, however, it would seem right to give the estimates from the older data a bit less weight than the newer data, if our main interest is the current elasticity, as is the case when forecasts or policy decisions have to be made.

One could estimate an elasticity from the older data, increase the variance attached to this estimate, and then use this degraded elasticity estimate as prior information to be added to the analysis of the newer data, using one of the methods described in 1.2.3 above. This is a bit clumsy and complicated. If it is felt that some individual data points should be given less relevance (see 1.1.4 above), this would lead to a very cumbersome multi-stage procedure. Another problem would be how to segment the data.

Rouhiainen (1978) uses a method called discounted least squares, suggested by Tornqvist (1957) and discussed in more detail by Gilchrist (1967). In discounted least squares, the estimator $B_E$ is given by

$$B_E = (X^T W X)^{-1} X^T W Y$$

where W is a diagonal matrix of weights $w_1, w_2, w_3, \ldots, w_t$, with $w_i = p^{t-i}$. Thus, the weights decline exponentially as we move towards the past. Rouhiainen (1978) found that his forecasting error was greatly reduced (from 10.7% to 0.1% in one case, from 8.8% to 3.4% in another) by using p = 0.25 compared with p = 1 (which is ordinary least squares).

One problem with this method is that the discounting is at the same rate for all the parameters of the model. There may, however, be problems where (for example) the income elasticity should use discounted weights, but not the price elasticity, as we may believe that older data are less relevant than newer data for income effects, but that older data are as relevant as newer data for price effects. In this case, we would want to discount for the income effect, but not for the price effect, and the procedure described by Rouhiainen (1978) would not cope.

Another way to cope with the data relevancy problem is to use a model whose parameters change with time.
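
Before turning to such models, the discounted least squares estimator described above can be made concrete with a minimal sketch. The Python code below is an illustration only, not the procedure used by Rouhiainen: it assumes NumPy, the function name is hypothetical, and the data are constructed (without noise) so that a single slope coefficient drifts from 1.0 to 0.9 over 20 observations, echoing the example in this section. The discount factor p = 0.8 is likewise an arbitrary choice.

```python
import numpy as np

def discounted_least_squares(X, Y, p):
    """Return (X'WX)^-1 X'WY with W = diag(p**(t-i)), i = 1..t."""
    t = len(Y)
    w = p ** (t - np.arange(1, t + 1))  # newest observation (i = t) gets weight 1
    XtW = X.T * w                        # equivalent to X.T @ np.diag(w)
    return np.linalg.solve(XtW @ X, XtW @ Y)

# Constructed data (an assumption): the slope drifts from 1.0 down to 0.9.
true_slope = np.linspace(1.0, 0.9, 20)
x = np.linspace(1.0, 2.0, 20)
X = x.reshape(-1, 1)
Y = true_slope * x

b_ols = discounted_least_squares(X, Y, 1.0)  # p = 1 is ordinary least squares
b_dls = discounted_least_squares(X, Y, 0.8)  # p < 1 favours recent observations
```

With these data the discounted estimate lies closer to the current slope of 0.9 than the ordinary least squares estimate does. Because a single p applies to every row of X, the sketch also makes plain why the method cannot discount one coefficient without discounting the others.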
Cooley and Prescott (1973b) describe a model in which the intercept is allowed to vary with time (but not the coefficients of the variables) and this model is tested in Cooley and Prescott (1973a). They found that the mean squared forecasting error is dramatically better than for a conventional fixed intercept model when the intercept is in fact changing, and only slightly worse when the intercept is in fact stationary.

A more general approach is to allow all the parameters to vary over time. Rubin (1950) suggested this; this work has more recently been developed by Cooley and Prescott (in Berg, 1973 and in Cooley and Prescott, 1976) and by others (Berg, 1973). The models can be divided into:

1. Random coefficient models. Coefficients are assumed to vary stochastically; this could be with a fixed mean or it could be with a mean that drifts (i.e. is autocorrelated).
2. Systematically changing coefficients. Here the coefficients change in some structured way (for example, by a Markov process, or as a function of some variables).
3. Coefficients that change systematically and stochastically.

Swamy (1971) gives an exposition of random coefficient models, but is mainly concerned with analysis of a time-series of cross-sections.

Singh et al. (1976) estimate a model in which the coefficients change systematically with calendar time using three estimation methods; they find that such a model is often justified by the data.

Tsurumi and Tsurumi (1980) use a varying parameter model to show that as colour TVs moved to the saturation stage of the product life cycle, the price elasticity increased substantially.

Raj and Ullah (1981) give a thorough treatment of the subject of coefficients that vary systematically and/or stochastically. They list five main causes of this variation.
These are:

- Omitted variables
- Use of proxy variables
- Incorrect functional form
- Policy variables
- Aggregation over time or decision units

They show that OLS is not the best linear unbiased estimator when coefficients vary stochastically; GLS (Generalised Least Squares) should be used. They also show that Bayesian estimation gives the same result as GLS, when diffuse priors are used.

Raj and Ullah (1981) contains several empirical applications of their methods.

- in a model linking changes in money wages to unemployment, changes in labour productivity and changes in money supply, they conclude that better forecasts are generated with stochastically varying coefficients than with fixed coefficients.
- in a model of the consumption function for several countries, the coefficients are reported to have undergone a systematic shift.
- in a model of the demand for money in Canada, some specifications of the model yield stable coefficients, some specifications yield shifting coefficients.

One omission in Raj and Ullah's survey of varying-coefficient models is the lack of any treatment of Kalman filtering.

1.2.5 RESIDUALS

The most common practice is to ignore the problem described in 1.1.5 and summarized below.

Suppose a model (solid line) is fitted to the observations (short dashed line) (see figure 1 on page 18). Should the forecasts from this model be the dotted line, or should the most recent residuals be taken note of, and the forecast be the dashed line? Or should we use some compromise between the two, such as starting on the dashed line, and gradually moving towards the dotted line?

The usual practice is to take no especial notice of the most recent residuals; the forecast is the dotted line (severe cases of autocorrelated residuals may be picked up by the Durbin-Watson test). But the fact that the latest residuals are above the model estimates may mean that there has been some kind of structural shift, which would lead us to forecast the dashed line.
Some practitioners do this, either explicitly or implicitly. One practice is to use the model to forecast the percentage changes, and then apply these changes to the last observation; this is equivalent to forecasting along the dashed line. Other practitioners strike a compromise between these two positions; the first forecast is along the dashed line, and subsequent forecasts get closer to the dotted line.

The National Institute of Economic and Social Research (NIESR) recognise the problem of residuals (NIER, 1981). They offer two suggestions for dealing with the problem:

- a "mechanical" method, such as by averaging the last 16 residuals and adjusting the constant term in the equation accordingly.
- "judgementally", by deciding what the cause of the residuals is, and then attributing a particular path to the effect of that cause.

NIESR found that "judgemental" treatment of residuals produced better forecasts than the "mechanical" system described above, but point out that a different mechanical system might have produced better forecasts than the judgemental system.

Muller (1982) uses an ARIMA process to deal with this problem. She finds that forecasts can be improved significantly when she includes mechanical adjustments based on single equation residuals of previous periods.

It would seem, intuitively, that what should be done should depend partly on how accurate the recent data are, and partly on how relevant the recent data are to the future.

1.3 THE FIVE PROBLEMS TOGETHER

The five problems listed above will often occur together, in the course of estimating one model. When they do come together, and especially if they do so for a simultaneous equation model, it becomes particularly difficult to treat them. The methods described below, however, provide a natural and simple way of dealing with them all, together with some other common model estimation problems.
1.4 COPING WITH UNCERTAINTY

The discussion in this section is not meant to be a rigorous treatment (this is done later in this paper) but rather an explanation of why a solution to the five problems was sought in recursive Bayesian estimation.

The five problems described above all have one thing in common; they are problems of uncertainty. The preliminary data problem is concerned with the fact that the preliminary data are less certain (i.e. less precise) than the revised data. The problems of imprecise data and data of variable precision are concerned with uncertainty in the data. The problem of incorporation of other information is one of uncertainty; it seems intuitively (and this will be proved later) that the different pieces of information should be weighted according to how certain they are.

The problem of the irrelevance of old data has to do with the uncertainty of parameter estimates. The older data indicate one set of parameter estimates, and the newer data indicate another set. It seems intuitively that how the two should be combined depends on how uncertain the two sets of estimates are, and on how much uncertainty should be added to the old estimates on account of their age, and consequent out-of-dateness. This will be demonstrated more rigorously later in this paper.

The problem of residuals is one of uncertainty attached to the most recent data together with the data relevance problem.

Since all the five problems are connected with uncertainty, a key result will be given here without proof; the proof can be found in Lindley, 1965b.

Suppose we have an uncertain estimate of something, $m_A$. This could be an estimate of a parameter, or it could be a datum, or it could be a forecast. Suppose now we get another estimate of the same thing, $m_B$. What should now be our best estimate, $m_C$? This depends on the uncertainty attached to $m_A$ and $m_B$. Suppose the probability distributions attached to the two estimates are Normal; $N(m_A, s_A^2)$ and $N(m_B, s_B^2)$.
Then the probability distribution attached to the combined estimate is also Normal, $N(m_C, s_C^2)$, where:

$$m_C = \frac{m_A/s_A^2 + m_B/s_B^2}{1/s_A^2 + 1/s_B^2}$$

$$\frac{1}{s_C^2} = \frac{1}{s_A^2} + \frac{1}{s_B^2}$$

This result is given by Lindley (1965b, p2). Thus the combined estimate is the weighted arithmetic mean of the two old estimates, the weights being the reciprocals of the variances. Lindley (1965b, p8) calls the reciprocal of the variance "precision", and this nomenclature will be used in this paper.

This result is central to this entire paper, which is why it is introduced so early. It shows how to combine two pieces of imprecise information to yield information of greater precision.

1.5 THE IMPORTANCE OF CAUSAL MODELS

This paper concentrates on causal models and does not attempt to treat non-causal (also called time-series) models at all. This is because causal models are very important to econometrics. Econometric models are constructed for two main purposes. These are:

a) Parameter estimation for policy decision-making or for deciding between alternative economic models, or simply to try to understand more about how the world works.
b) Forecasting.

For the first purpose, causal models are clearly necessary (by a causal model this paper means a model which attempts to capture cause-and-effect, that is, a model in which changes in some variables are assumed to cause changes in others).

For the second purpose, causal or non-causal models can be used. Armstrong (1978, p372) says that causal methods have advantages over non-causal (which Armstrong calls naive) methods for long range forecasting. For short range forecasting, Armstrong says that there is no difference between causal and non-causal methods.

Thus, causal methods are better at long range forecasting, no worse than non-causal methods at short range forecasting, and are able to provide information about the world (via the parameter estimates) that non-causal models cannot provide.
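The combination rule stated in 1.4 can be illustrated numerically. What follows is a minimal sketch in Python, not taken from the FORTRAN programs of the appendices. The function name combine_estimates and the figures used (a model forecast of 100 with variance 4, and a preliminary datum of 106 with variance 8) are hypothetical, chosen only for illustration.

```python
from typing import Tuple


def combine_estimates(m_a: float, var_a: float,
                      m_b: float, var_b: float) -> Tuple[float, float]:
    """Combine two Normal estimates of the same quantity.

    Each estimate is weighted by its precision (the reciprocal of its
    variance). Returns the combined mean and the combined variance.
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    prec_a = 1.0 / var_a          # precision of the first estimate
    prec_b = 1.0 / var_b          # precision of the second estimate
    prec_c = prec_a + prec_b      # precisions add
    m_c = (m_a * prec_a + m_b * prec_b) / prec_c
    return m_c, 1.0 / prec_c


# Hypothetical figures, for illustration only:
# a forecast of 100 (variance 4) and a preliminary datum of 106 (variance 8).
m_c, var_c = combine_estimates(100.0, 4.0, 106.0, 8.0)
print(m_c, var_c)
```

With these assumed figures the combined estimate is 102, which lies closer to the more precise of the two inputs, and its variance (about 2.67) is smaller than either input variance, as the precision formula requires.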
The models that will be estimated later in this paper are causal models.

1.6 A GUIDE TO THIS THESIS

After thinking for many years that there was no good answer to all five of the problems described earlier, I realized that all these problems were connected with uncertainty (as explained in 1.4). This led me to the application of Bayes' theorem, and to the development of the techniques described in chapters 3 and 4, which I named recursive Bayesian estimation; chapter 5 shows how recursive Bayesian estimation (rBe) can be applied.

At that point, I found that the Kalman filter had already been invented, and that it looked very similar to my own rBe. Chapter 6 demonstrates this similarity, and looks at the Kalman filter from an econometrician's point of view.

Although I found that much of my mathematical work covered ground already explored by other researchers, there had been very little application to econometric problems. Many of the papers (summarized in chapter 7) on past applications of the Kalman filter were published after the first draft of this paper was finished (e.g. Harvey, 1983).

Chapter 8 presents a Monte Carlo simulation comparing the Kalman filter with OLS. Chapters 9 and 10 are models of energy and wool respectively; the wool demand model uses multiple dependent variables.

Chapter 11 is a non-rigorous series of suggestions for future work, and chapter 12 presents the summary and conclusions. The appendices include the data used, and (more usefully) the FORTRAN programs that implement the Kalman filter.