'I'm afraid I ca'n't put it more clearly,' Alice replied very politely.
    Alice's Adventures in Wonderland, Lewis Carroll

4. RECURSIVE BAYESIAN ESTIMATION: ONE DEPENDENT AND SEVERAL EXPLANATORY VARIABLES

4.1 THE NEED FOR MORE VARIABLES

The estimation procedure developed in section 3 above suffers from a major drawback: only one parameter can be estimated. This may not be satisfactory for some applications, although an estimate of the constant in the equation Y = A + B.X could be made by estimating B from the differenced equation, and then estimating A_E by

    A_E = E(Y) - B_E . E(X)

as in ordinary least squares. There is, however, a more pressing need for more variables than just the need for an estimate of the constant. Very few econometric models use only one explanatory variable; multiple regression is far more common than simple regression. Below we shall derive the formulae for recursive Bayesian estimation with one dependent but several explanatory variables.

4.2 THE MATRIX INVERSION LEMMA

First we need an important result called the matrix inversion lemma. For any matrices A, B, C of appropriate dimensions, and where B^{-1} and C^{-1} exist,

    B^{-1} - B^{-1} A (A^T B^{-1} A + C^{-1})^{-1} A^T B^{-1} = (B + A C A^T)^{-1}        eq. 4.2.1

This result is given by Bodewig (1956), p.189, and an elegant proof derived from a consideration of probability distributions is given by Lindley and Smith (1972), p.5.

4.3 THE RECURSIVE ESTIMATION PROCEDURE

See appendix B for notation.
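The lemma of eq. 4.2.1 can be checked numerically. A minimal sketch in Python with NumPy; the dimensions and the particular matrices A, B, C below are arbitrary illustrative choices, constructed only so that B^{-1} and C^{-1} exist:

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 5, 2                       # B is k-by-k, C is m-by-m, A is k-by-m

A = rng.standard_normal((k, m))
B = np.eye(k) + 0.1 * rng.standard_normal((k, k))
B = B @ B.T                       # symmetric positive definite, so B^{-1} exists
C = 2.0 * np.eye(m)               # C^{-1} exists

Binv = np.linalg.inv(B)
Cinv = np.linalg.inv(C)

# Left-hand side of eq. 4.2.1
lhs = Binv - Binv @ A @ np.linalg.inv(A.T @ Binv @ A + Cinv) @ A.T @ Binv
# Right-hand side of eq. 4.2.1
rhs = np.linalg.inv(B + A @ C @ A.T)

assert np.allclose(lhs, rhs)
```

The value of the lemma in what follows is that the left-hand side replaces the inversion of a k-by-k matrix with the inversion of an m-by-m one; in the recursive setting below m = 1, so no matrix inversion is needed at all.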
Suppose a (scalar) variable Z_t is related to a k-dimensional (row) vector X_t by

    Z_t = X_t B ,    t = 1, 2, ..., n-1

Suppose that Z_t cannot be observed directly; what can be observed is Y_t, which is an unbiased estimator for Z_t but is subject to noise v_t:

    Y_t = Z_t + v_t

so

    Y_t = X_t B + v_t

We shall write this as

    Y = X B + v        eq. 4.3.1

where
    Y is an (n-1) by 1 vector of observations
    X is an (n-1) by k matrix of n-1 observations of each of k explanatory variables
    B is a k by 1 vector of parameters to be estimated
    v is an (n-1) by 1 vector of residuals
    t = 1, 2, ..., n-1

The following four assumptions about the v_t will be made:
    1. E(v) = 0
    2. Var(v) = I s^2
    3. v is independent of X; E(X^T v) = 0
    4. The explanatory variables are linearly independent; (X^T X)^{-1} exists.

Let B_E be the estimate of B that minimizes the sum of squares of the errors. Using the usual result for OLS estimates, B_E is given by

    B_E = (X^T X)^{-1} X^T Y        eq. 4.3.2

The variance of B_E is given by

    Var(B_E) = (X^T X)^{-1} X^T E(v v^T) X (X^T X)^{-1} = s^2 (X^T X)^{-1}, as E(v v^T) = I s^2

and so

    Prec(B_E) = (X^T X) / s^2        eq. 4.3.3

Suppose that an additional observation of X and Y becomes available, X_O and Y_O, increasing the total number of observations to n. Define

    S_{n-1}   as (X^T X)^{-1}, calculated from n-1 observations
    S_n       as (X^T X)^{-1}, calculated from n observations
    T_{n-1}   as X^T Y, calculated from n-1 observations
    T_n       as X^T Y, calculated from n observations
    B_{E,n-1} as B_E, calculated from n-1 observations
    B_{E,n}   as B_E, calculated from n observations

Then

    B_{E,n-1} = S_{n-1} T_{n-1}              eq. 4.3.4
    B_{E,n}   = S_n T_n                      eq. 4.3.5
    T_n = T_{n-1} + X_O^T Y_O                eq. 4.3.6
    S_n^{-1} = S_{n-1}^{-1} + X_O^T X_O      eq. 4.3.7

Taking the matrix inversion lemma quoted in 4.2 above, and replacing B with S_{n-1}^{-1} (B^{-1} with S_{n-1}), A with X_O^T (A^T with X_O), and C with 1, we get

    (S_{n-1}^{-1} + X_O^T X_O)^{-1} = S_{n-1} - S_{n-1} X_O^T (X_O S_{n-1} X_O^T + 1)^{-1} X_O S_{n-1} = S_n        eq. 4.3.8

And so B_{E,n} = S_n
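The batch formulae of eqs. 4.3.1-4.3.3 can be sketched as follows; the sample size, true parameter vector, and noise scale are illustrative assumptions, and the result is cross-checked against NumPy's own least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 3                          # illustrative sample size and dimension
B_true = np.array([2.0, -1.0, 0.5])   # assumed "true" B, for simulation only

X = rng.standard_normal((n, k))       # columns linearly independent (assumption 4)
v = 0.1 * rng.standard_normal(n)      # E(v) = 0, Var(v) = I s^2 (assumptions 1-2)
Y = X @ B_true + v                    # the model of eq. 4.3.1

BE = np.linalg.inv(X.T @ X) @ X.T @ Y      # eq. 4.3.2
s2_hat = (Y - X @ BE) @ (Y - X @ BE) / (n - k)  # usual unbiased estimate of s^2
prec_BE = (X.T @ X) / s2_hat               # eq. 4.3.3

# cross-check against NumPy's least-squares solver
BE_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(BE, BE_lstsq)
```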
T_n
    = [S_{n-1} - S_{n-1} X_O^T (X_O S_{n-1} X_O^T + 1)^{-1} X_O S_{n-1}] (T_{n-1} + X_O^T Y_O)
    = S_{n-1} T_{n-1} + S_{n-1} X_O^T Y_O - S_{n-1} X_O^T (X_O S_{n-1} X_O^T + 1)^{-1} X_O S_{n-1} (T_{n-1} + X_O^T Y_O)

Define G_n by

    G_n = S_{n-1} X_O^T (X_O S_{n-1} X_O^T + 1)^{-1}        eq. 4.3.9
        = (X^T X)^{-1} X_O^T (X_O (X^T X)^{-1} X_O^T + 1)^{-1}

G_n is called the gain (Young, 1974). It can be interpreted as a smoothing factor. So

    B_{E,n} = B_{E,n-1} + G_n [(X_O S_{n-1} X_O^T + 1) Y_O - X_O S_{n-1} T_{n-1} - X_O S_{n-1} X_O^T Y_O]

    B_{E,n} = B_{E,n-1} - G_n (X_O B_{E,n-1} - Y_O)        eq. 4.3.10

The precision of B_{E,n}, Prec(B_{E,n}), is defined as S_n^{-1} / s^2, so

    Prec(B_{E,n}) = (S_{n-1}^{-1} + X_O^T X_O) / s^2 = Prec(B_{E,n-1}) + X_O^T X_O / s^2        eq. 4.3.11

This gives a recursive procedure for updating B_E and Prec(B_E) when a new observation is added to the time series. This result is given by Harvey (1981), who calls it Recursive Least Squares, without the intermediate definition of G_n (which is brought out here to show the relationship with the Kalman Filter later). The result is reproduced here so that its relationship both to the results of chapter 3 and to the Kalman Filter can be seen. More importantly, we can relax the assumptions about the parameters (see 4.5).

4.4 EQUIVALENCE OF THIS RESULT WITH RECURSIVE BAYESIAN ESTIMATION FOR ONE EXPLANATORY VARIABLE

Equations 4.3.10 and 4.3.11 can be compared with equations 3.2.7 and 3.2.8. Equation 3.2.7 can be written (writing n for t, and using B_{F,t} = B_{E,t-1}, eq. 3.2.1)

    B_{E,n} = [Prec(B_{E,n-1}) B_{E,n-1} + Prec(B_{O,n}) B_{O,n}] / [Prec(B_{E,n-1}) + Prec(B_{O,n})]

Adding and subtracting B_{E,n-1} Prec(B_{O,n}) in the numerator,

    B_{E,n} = (Prec(B_{E,n-1}) B_{E,n-1} + B_{E,n-1} Prec(B_{O,n}) - B_{E,n-1} Prec(B_{O,n}) + Prec(B_{O,n})
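Because eq. 4.3.8 is an exact matrix identity, the recursion of eqs. 4.3.8-4.3.10 reproduces the batch OLS estimate exactly, not merely approximately. A minimal sketch, with illustrative data and the recursion initialized from the first k observations by the batch formula:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 60, 3
X = rng.standard_normal((n, k))
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

# Initialize from the first k observations with the batch formulae
S = np.linalg.inv(X[:k].T @ X[:k])         # S = (X^T X)^{-1}
BE = S @ X[:k].T @ Y[:k]                   # eq. 4.3.2

# Fold in each new observation with eqs. 4.3.9, 4.3.10 and 4.3.8
for t in range(k, n):
    xo = X[t:t+1]                          # 1-by-k row X_O
    yo = Y[t]                              # scalar Y_O
    G = S @ xo.T / (xo @ S @ xo.T + 1.0)   # gain, eq. 4.3.9 (the inverse is a scalar)
    BE = BE - (G * (xo @ BE - yo)).ravel() # eq. 4.3.10
    S = S - G @ xo @ S                     # eq. 4.3.8

BE_batch = np.linalg.inv(X.T @ X) @ X.T @ Y    # eq. 4.3.2 on all n observations
assert np.allclose(BE, BE_batch)
```

Note that, as in the text, each update inverts only the scalar X_O S_{n-1} X_O^T + 1; no k-by-k inversion is performed after initialization.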
B_{O,n}) / (Prec(B_{E,n-1}) + Prec(B_{O,n}))

so that

    B_{E,n} = B_{E,n-1} - Prec(B_{O,n}) (B_{E,n-1} - B_{O,n}) / [Prec(B_{O,n}) + Prec(B_{E,n-1})]

Multiplying numerator and denominator by X_{O,n},

    B_{E,n} = B_{E,n-1} - Prec(B_{O,n}) (X_{O,n} B_{E,n-1} - X_{O,n} B_{O,n}) / [X_{O,n} (Prec(B_{O,n}) + Prec(B_{E,n-1}))]

But X_{O,n} B_{O,n} = Y_{O,n}, so

    B_{E,n} = B_{E,n-1} - Prec(B_{O,n}) (X_{O,n} B_{E,n-1} - Y_{O,n}) / [X_{O,n} (Prec(B_{O,n}) + Prec(B_{E,n-1}))]

which has the same form as equation 4.3.10, but with

    G_n = Prec(B_{O,n}) / [X_{O,n} (Prec(B_{O,n}) + Prec(B_{E,n-1}))]

But Prec(B_{O,n}) = X_{O,n} X_{O,n} Prec(Y_{O,n}), so

    Prec(B_{O,n}) / [X_{O,n} (Prec(B_{O,n}) + Prec(B_{E,n-1}))]
        = Prec(Y_{O,n}) X_{O,n} / [X_{O,n} Prec(Y_{O,n}) X_{O,n} + Prec(B_{E,n-1})]
        = Var(B_{E,n-1}) Prec(Y_{O,n}) X_{O,n} [X_{O,n} Var(B_{E,n-1}) X_{O,n} Prec(Y_{O,n}) + 1]^{-1}

But, from eq. 4.3.9,

    G_n = S_{n-1} X_{O,n}^T (X_{O,n} S_{n-1} X_{O,n}^T + 1)^{-1}

and so the equivalence between 4.3.10 and 3.2.7 is complete, with S_{n-1} in equation 4.3.9 standing for Var(B_{E,n-1}) Prec(Y_{O,n}) in equation 3.2.7. Since Prec(B_{E,n-1}) = S_{n-1}^{-1} / s^2, we have S_{n-1} = Var(B_{E,n-1}) / s^2, and as Prec(Y_{O,n}) = 1/s^2 we can also see the equivalence of Var(B_{E,n-1}) / s^2 with Var(B_{E,n-1}) Prec(Y_{O,n}).

Thus we have a procedure for recursive estimation with one dependent and several explanatory variables which reduces, for one explanatory variable, to the Bayesian procedure of 3.2 above. We shall call this procedure recursive Bayesian estimation, and later in this paper show its relationship to Kalman Filtering.

4.5 RELAXATION OF ASSUMPTIONS ABOUT PARAMETERS

Again, the assumption that B is constant can be relaxed, as can the assumption that the data all have equal weight. This is done in a very similar way to 3.7.1 and 3.6.4. First, the estimation procedure derived above will be summarized. Suppose we want to estimate B in the model

    Y = X B + v

Suppose that after n-1 observations we have an estimate for B, B_{E,n-1}, and that the nth observation is X_{O,n}, Y_{O,n}. Then

    B_{E,n} = B_{E,n-1} - G_n (X_{O,n} B_{E,n-1} - Y_{O,n})        eq. 4.3.10

where

    G_n = S_{n-1} X_{O,n}^T (X_{O,n} S_{n-1} X_{O,n}^T + 1)^{-1}        eq. 4.3.9

    Prec(B_{E,n}) = Prec(B_{E,n-1}) + X_{O,n}^T X_{O,n} / s^2        eq. 4.3.11

Suppose Y_{O,n} = X_{O,n} B_{E,n-1}.
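The equivalence of the two forms of the gain can be verified numerically in the scalar case; the values of s^2, S_{n-1} and X_{O,n} below are arbitrary assumed numbers:

```python
# Scalar sanity check of section 4.4, with assumed numbers
s2 = 0.25          # noise variance s^2
S = 0.8            # S_{n-1} (a scalar when k = 1)
xo = 1.7           # X_{O,n}

# Matrix form of the gain, eq. 4.3.9, in the scalar case
G_matrix = S * xo / (xo * S * xo + 1.0)

# Bayesian form derived in section 4.4
prec_Y = 1.0 / s2                 # Prec(Y_{O,n}) = 1/s^2
prec_BO = xo * xo * prec_Y        # Prec(B_{O,n}) = X_O^2 Prec(Y_O)
prec_BE = 1.0 / (s2 * S)          # Prec(B_{E,n-1}) = S_{n-1}^{-1}/s^2
G_bayes = prec_BO / (xo * (prec_BO + prec_BE))

assert abs(G_matrix - G_bayes) < 1e-12
```

The agreement is exact up to rounding, since the two expressions are algebraically identical.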
Then X_{O,n} B_{E,n-1} - Y_{O,n} = 0, and so B_{E,n} = B_{E,n-1}. Thus if the forecast of Y is equal to the observation of Y, B_E is unchanged; B_{E,n-1} is acting as the forecast of B_n, B_{F,n}. Thus the assumption of constant B is equivalent to

    B_{F,n} = B_{E,n-1}

If this assumption is relaxed, and instead we assume B_{F,n} = H B_{E,n-1}, where H is a known matrix, then 4.3.9 is unaffected, as B does not appear in it. 4.3.10 becomes

    B_{E,n} = H B_{E,n-1} - G_n (X_{O,n} H B_{E,n-1} - Y_{O,n})        eq. 4.5.1

and 4.3.11 becomes

    Prec(B_{E,n}) = Prec(H B_{E,n-1}) + X_{O,n}^T X_{O,n} / s^2        eq. 4.5.2

Let us relax the assumption about how B varies even further, and suppose that it also moves stochastically:

    B_{F,n} = H B_{E,n-1} + w,    where w is N(0, W)

4.3.9 is still unaffected, and 4.5.1 is unaffected (as w has mean zero), but 4.3.11 becomes rather complicated; the derivation of its equivalent will have to wait until 6.1, when an alternative way of expressing 4.3.11 will emerge which lends itself to this calculation.
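One step of the relaxed update, eq. 4.5.1, can be sketched as follows; the transition matrix H, the state S_{n-1}, the prior estimate, and the new observation are all hypothetical numbers chosen only for illustration:

```python
import numpy as np

k = 2
H = np.array([[1.0, 0.1],
              [0.0, 1.0]])                # assumed known transition matrix H
S = np.eye(k)                             # S_{n-1}, illustrative
BE = np.array([0.5, -0.3])                # B_{E,n-1}, illustrative
xo = np.array([[1.0, 2.0]])               # X_{O,n}, 1-by-k
yo = 0.7                                  # Y_{O,n}

G = S @ xo.T / (xo @ S @ xo.T + 1.0)      # eq. 4.3.9, unaffected by H

BF = H @ BE                               # forecast B_{F,n} = H B_{E,n-1}
BE_new = BF - (G * (xo @ BF - yo)).ravel()  # eq. 4.5.1

# With H = I the update reduces to the constant-parameter form, eq. 4.3.10
BE_const = BE - (G * (xo @ BE - yo)).ravel()
BF_id = np.eye(k) @ BE
assert np.allclose(BF_id - (G * (xo @ BF_id - yo)).ravel(), BE_const)
```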