Causality in the Sciences

Phyllis McKay Illari, Federica Russo, and Jon Williamson

Print publication date: 2011

Print ISBN-13: 9780199574131

Published to Oxford Scholarship Online: September 2011

DOI: 10.1093/acprof:oso/9780199574131.001.0001


17 The error term and its interpretation in structural models in econometrics

Damien Fennell

Oxford University Press

Abstract and Keywords

This chapter explores what the error term represents in structural models in econometrics and the assumptions about the error terms that are used for successful statistical and causal inference. The error term is of particular interest because it acts as a cover‐all term for parts of the system that are not fully known about and not explicitly modelled. The chapter attempts to bring together some of the key assumptions imposed on the error term for different purposes (statistical and causal inference) and to ask to what extent the conditions imposed on the error term can be empirically tested in some way.

Keywords:   error term, econometrics, structural models, causal inference



17.1 Introduction

Structural econometrics attempts the extremely difficult task of making causal inferences from non‐experimental data. Its core approach, which in its modern form dates from the ground‐breaking paper of Trygve Haavelmo (1944), is to postulate a statistical model that carries structural (or causal) content. The model may be postulated from theory, from observations and from other background knowledge. One then uses sample data to test its observable implications (to check its adequacy) and to infer remaining unknown features (for example, to estimate parameters if the model is parametric).1

In a highly general form, we can denote a structural model in the following way. Denote the variables of interest to the econometrician as a vector of random variables Z, some of whose components (though not necessarily all) are observable. The probabilistic part of the model postulates conditions on the joint probability distribution of Z. Structural content can be introduced in several ways. For example, some can be introduced with a partition of Z into exogenous variables, X, and endogenous variables, Y. Though not all concepts of exogeneity are structural,2 in structural models the exogeneity assumption often assumes (in some form) that endogenous variables are causally determined by other variables in the model while the exogenous variables are not. Further structural content can be introduced by interpreting conditional probabilistic independencies among the variables as indicative of causal relations,3 or by assuming functional relations among the variables, where these functional relations carry some structural interpretation (as would typically be the case when the functional relations are derived from economic theory). Importantly, functional relations are also a way that error terms are introduced to models, since it is highly unlikely that our knowledge of functional relations will be so powerful as to permit us to claim that exact deterministic relations hold among (independently defined) random variables. Thus the functional relations will typically explicitly represent some omitted content using a vector of error terms, U, to sustain the deterministic relation postulated among the other random variables. In summary, the general structural model can be presented as shown in Table 17.1.

Table 17.1 A general characterization of a structural model

Probabilistic assumptions

Z ~ D(Z), where the joint distribution D(Z) is assumed to have certain properties (e.g. independence among certain variables, a certain distributional form, etc.)

Structural assumptions

Z = (Y X), where Y is a vector of endogenous variables and X is a vector of exogenous random variables (where the exogeneity concept has structural, not just probabilistic, content).

Structural content assigned to probabilistic relations (e.g. conditional independencies).

G(Z, U) = 0, i.e. a set of functional relations (with structural interpretation) that hold among the components of Z, where U denotes an unobservable vector of ‘error’ terms.

There is a lot of work in econometrics considering the problems of statistical inference, and to a lesser extent causal inference, for general models specified in a way similar to that above. For just two examples, see Hendry (1995) and Spanos (1999). As much of this work is highly technical, I believe there is space for a work that attempts to keep things simple, yet which nevertheless gives a taste of some of the difficult issues facing econometricians in causal inference. In this paper, therefore, I try to raise some of the important issues by looking at the error terms in simple, textbook models. Specifically, the chapter looks at the simplest linear models with errors‐in‐the‐equations and, in keeping with their continued use in econometrics, it looks at simultaneous equations models. Of course, most structural equations models in econometrics are more complicated than this, and as a result it could be argued that what follows is of little relevance. This I think is too uncharitable, as any structural model which postulates deterministic functional relations among observable random variables will include error terms to represent omitted content. Structural equation models of this sort are widely used in econometrics and the philosophical and methodological issues raised here are relevant mutatis mutandis to these more complex models.

Finally, it should be emphasized that what follows is not intended to present a way of doing econometric modelling. I am not claiming (though I believe that claims along these lines can be reasonably made) that looking at the error term is the right or best way to build or select an econometric model. To do this would require a more general and technical approach. Instead, the aim of this paper is to be expository, to highlight the kind of issues one ought to be aware of when doing econometrics or when trying to understand some of the philosophical challenges to doing structural modelling in econometrics. It does this by considering what the error term represents in very simple structural models in econometrics, and it explores some of the assumptions about the error terms that are used for successful statistical and causal inference. Ultimately, this is important for understanding the scope of econometric models. The error term is also of particular interest because it acts as a cover‐all term for parts of the system that are not fully known about and not explicitly modelled. Therefore, there is always a danger that the error term hides important information which should have been modelled, and which may render the model inadequate in certain ways. That said, as the general model above shows, the error term is merely one part of a more general model, and tests on the error term are ultimately tests of the general model proposed. Thus, the more important point, emphasised by Spanos and others, is that one should test the general model assumed against observation.

The structure of the chapter is as follows. The first part presents a simple econometric model, like that found in introductory textbooks in econometrics, and sets out some of the assumptions imposed on the error term for successful statistical inference, in particular, those well‐known conditions required for the ordinary least squares (OLS) method of estimation to yield ‘good’ (consistent, unbiased, etc.) estimates. These assumptions are well‐known and widely discussed. The chapter then asks which of these assumptions can be tested from observations on the residuals4 for the estimated model. Given the central role of the identification problem for simultaneous models in econometrics, the second part of the chapter investigates what conditions may be imposed on the error term in order to have an identifiable model. Again, as in the previous section, the chapter considers to what extent these conditions imposed on the error term can be tested. The third part of the chapter briefly presents a causal interpretation of the simultaneous equations model based on Herbert Simon (1953). Under this reading the error term is seen to denote the net impact of causal factors not explicitly modelled in a mechanism. In this section, the chapter also presents a restriction on error terms which is necessary for causal inference. To finish, the chapter concludes with an overview of conditions imposed on the error term in (simple) econometric structural models. It notes that one assumption in particular, the orthogonality assumption, that the error terms are uncorrelated with the explanatory variables in a model, plays an important role for statistical and causal inference and for securing identifiability.

17.2 The role of the error term in estimation

To provide a concrete focus, consider the following simple ‘textbook’ simultaneous equations supply and demand model. This is a simultaneous equation model and as such represents the equilibrium relations between price and quantity as determined by underlying dynamic supply and demand mechanisms. For the purposes of this discussion, I assume that the equations have been chosen by appeal to theory and knowledge of the market being modelled.5

\[
q = \alpha p + \beta i + u_1 \tag{17.1}
\]
\[
q = \gamma p + \delta c + u_2 \tag{17.2}
\]

In this model, q denotes the equilibrium quantity of a good transacted, p denotes the equilibrium price for the good, i denotes consumer income, c denotes production costs, and u_1 and u_2 are the error terms that denote factors not explicitly modelled in the supply and demand equations. In this simple model, assume that q, p, i and c are observable. The error terms denote omitted factors; the error terms are unobservable. In this model, q and p are determined in terms of i, c and the error terms. The population parameters α, β, γ and δ are unknown. The functional form is assumed to hold and i, c and the error terms are assumed to follow a particular joint probability distribution.

To relate this model to the general structural model above, in this model we have Z = (X Y), where X = (i c), Y = (q p) and U = (u_1 u_2). The functional relations (17.1) and (17.2) are the two equations of G(Z, U) = 0. The probabilistic assumptions on Z follow from the assumptions on the distributions of X (i and c), those on U and the assumed relationships of (17.1) and (17.2). The model is structural in virtue of the assumption that income and costs are not determined by the equilibrium quantity and price (the exogeneity assumption) and the assumption that the two functional relations (17.1) and (17.2) hold in virtue of the demand and supply mechanisms that generate the equilibrium relations.

For the purposes of what follows, I assume, perhaps artificially, that the model has been selected in a reasonable way (either from observation or by good background knowledge). At this stage then the econometrician's inferential problem is to infer the parameter values from sample observations of q, p, i and c.

The simplest estimation method for estimating linear equations in econometrics is the ordinary least squares (OLS) method. This approach picks estimates for the parameters that minimize the sum of the squared deviations of the estimated values for the left‐hand variable from the observed values for that variable. Provided certain assumptions are met, OLS provides consistent, unbiased and efficient estimators.6 Of these assumptions, some directly involve the error terms. These are: (i) errors have a constant variance (are ‘homoscedastic’); (ii) errors are normally distributed; (iii) errors are uncorrelated with the right‐hand variables (orthogonality assumption). So in the example above OLS cannot be applied because p, being determined in the model, is unlikely to be orthogonal to the error term in either equation.7 Therefore, to estimate this model one first solves for the reduced form equations (the solutions for p and q):

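\[
p = \frac{\delta c - \beta i + u_2 - u_1}{\alpha - \gamma} = \alpha' c + \beta' i + v_1 \tag{17.3}
\]
\[
q = \frac{\alpha(\delta c - \beta i + u_2 - u_1)}{\alpha - \gamma} + \beta i + u_1 = \gamma' c + \delta' i + v_2 \tag{17.4}
\]

where

\[
\begin{aligned}
\alpha' &= \frac{\delta}{\alpha - \gamma}, & \beta' &= \frac{-\beta}{\alpha - \gamma}, & v_1 &= \frac{u_2 - u_1}{\alpha - \gamma}, \\
\gamma' &= \frac{\alpha\delta}{\alpha - \gamma}, & \delta' &= \beta - \frac{\alpha\beta}{\alpha - \gamma}, & v_2 &= u_1 + \frac{\alpha(u_2 - u_1)}{\alpha - \gamma}.
\end{aligned}
\tag{*}
\]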

Now, if u_1 and u_2 are both uncorrelated with c and i, have mean zero, are normally distributed and have constant variance, then it follows that v_1 and v_2 meet all of these assumptions also. Then OLS can be applied to the reduced form equations to yield good (consistent, unbiased) estimates for parameters α′, β′, γ′ and δ′. Then consistent estimates for α, β, γ and δ can be obtained by using formulae (*) above.8 This method of estimating parameters for simultaneous equation models is called ‘indirect least squares’ (ILS).
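The two‐step logic of ILS can be illustrated with a small simulation. What follows is only a sketch under stated assumptions: the parameter values, the distributions chosen for i, c and the errors, and the function names `simulate_market`, `ols` and `indirect_least_squares` are hypothetical choices made for illustration and do not come from the chapter. The sketch estimates the reduced form by OLS, inverts formulae (*) to recover the structural parameters, and compares the result with OLS applied directly to the demand equation.

```python
import numpy as np

def simulate_market(n, alpha, beta, gamma, delta, rng):
    """Draw n equilibrium observations (illustrative distributions only)."""
    i = rng.normal(10.0, 2.0, n)   # consumer income, exogenous
    c = rng.normal(5.0, 1.0, n)    # production costs, exogenous
    u1 = rng.normal(0.0, 1.0, n)   # demand error, independent of i and c
    u2 = rng.normal(0.0, 1.0, n)   # supply error, independent of i and c
    # Solve demand = supply for the equilibrium price, then get quantity.
    p = (delta * c - beta * i + u2 - u1) / (alpha - gamma)
    q = alpha * p + beta * i + u1
    return q, p, i, c

def ols(y, *regressors):
    """Least-squares coefficients of y on the regressors (no intercept)."""
    X = np.column_stack(regressors)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def indirect_least_squares(q, p, i, c):
    """Estimate the reduced form by OLS, then invert formulae (*)."""
    a1, b1 = ols(p, c, i)          # estimates of alpha', beta'
    g1, d1 = ols(q, c, i)          # estimates of gamma', delta'
    alpha = g1 / a1                # gamma'/alpha' = alpha
    gamma = d1 / b1                # delta'/beta'  = gamma
    delta = a1 * (alpha - gamma)   # from alpha' = delta/(alpha - gamma)
    beta = -b1 * (alpha - gamma)   # from beta'  = -beta/(alpha - gamma)
    return alpha, beta, gamma, delta

rng = np.random.default_rng(0)
true = dict(alpha=-1.0, beta=0.5, gamma=1.0, delta=-0.8)  # hypothetical values
q, p, i, c = simulate_market(200_000, rng=rng, **true)

ils_estimates = indirect_least_squares(q, p, i, c)
ols_demand = ols(q, p, i)          # OLS applied directly to the demand equation

print("ILS estimates (alpha, beta, gamma, delta):", np.round(ils_estimates, 3))
print("Direct OLS on demand (alpha, beta):", np.round(ols_demand, 3))
```

In this setup the direct OLS estimate of α is pulled away from its true value because p depends on u_1, while the ILS estimates settle near the values used to generate the data.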

What is important to note here is that although a different estimation method has been used for the simultaneous model (ILS rather than OLS) similar assumptions have been imposed on the error terms, u, as would have been if OLS were a feasible estimation technique. Therefore, the assumptions on the error terms that ensure OLS yields good estimates in the non‐simultaneous equations model are also assumptions on the error terms that ensure ILS yields consistent9 estimates in the simultaneous equations model.10 Of course, the desirable properties of ILS (and OLS) estimators depend on these assumptions being met. I now consider these assumptions, their significance, and how they might be tested empirically.

The first assumption is that the error terms have constant variance. If this assumption is not met then OLS estimates are no longer efficient, though they remain unbiased and consistent.11 There is a generalization of OLS, called ‘generalized least squares’, which may, provided there is information about the changing variance of the error, be used to provide efficient estimates. The second assumption is that the error terms are normally distributed. Interestingly, some desirable properties of OLS, such as consistency and unbiasedness, hold independently of this assumption. Nevertheless, there are important advantages to the normality assumption, since it provides the basis for the distributions for a whole host of important test‐statistics. If the normality assumption is not met then the distributions of the test statistics and of the estimates will almost certainly differ.12 This is a practical problem, however, and in principle if non‐normal distributions were specified for error terms it would be possible to numerically construct new test statistics and new distributions for the estimates. In conclusion, though these two assumptions on the error terms are important, their failure does not jeopardise the most desirable properties of the OLS estimates. Though this sounds promising, one should worry whether one can infer that it is these assumptions that have failed. I discuss this below.
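The efficiency point can be illustrated with a small Monte Carlo experiment. This is a sketch under assumptions that are not in the chapter: a one‐regressor model without intercept, an error standard deviation that grows with x and is treated as known, and weighted least squares standing in for generalized least squares (it is the special case with a diagonal error covariance matrix). The function names and all numerical values are hypothetical.

```python
import numpy as np

def slope_ols(x, y):
    """OLS slope for y = alpha * x + u (no intercept)."""
    return np.sum(x * y) / np.sum(x * x)

def slope_wls(x, y, sigma):
    """Weighted least squares slope, weighting each point by 1 / sigma^2."""
    w = 1.0 / sigma**2
    return np.sum(w * x * y) / np.sum(w * x * x)

def monte_carlo(n_rep=2000, n=200, alpha=2.0, seed=0):
    """Repeatedly redraw heteroscedastic errors and re-estimate the slope."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 5.0, n)   # regressor, held fixed across replications
    sigma = 0.5 * x**2             # assumed (known) error standard deviation
    ols_draws = np.empty(n_rep)
    wls_draws = np.empty(n_rep)
    for r in range(n_rep):
        y = alpha * x + rng.normal(0.0, sigma)   # non-constant error variance
        ols_draws[r] = slope_ols(x, y)
        wls_draws[r] = slope_wls(x, y, sigma)
    return ols_draws, wls_draws

ols_draws, wls_draws = monte_carlo()
print("mean of OLS estimates:", ols_draws.mean())
print("mean of WLS estimates:", wls_draws.mean())
print("variance of OLS estimates:", ols_draws.var())
print("variance of WLS estimates:", wls_draws.var())
```

Both estimators centre on the value of α used to generate the data, but the weighted estimator has the smaller sampling variance, which is the sense in which OLS loses efficiency without losing unbiasedness.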

The third and last assumption is the orthogonality assumption, that the right‐hand variables are uncorrelated with the error term. If this assumption is not met then the OLS estimates of the slope coefficient of an explanatory variable will be biased and inconsistent. Therefore, it is a key assumption that the error term must meet in order for the OLS estimates to be acceptable.

Having set out the significance of these three assumptions on the error terms for OLS estimation, I now consider to what extent these assumptions can be empirically tested. When attempting to empirically investigate error terms, one place to look is at the residuals for an estimated model, that is, at the differences between the estimated value for the left‐hand variable and the actual value it takes. If all of the assumptions of the model are met then the set of residuals is a sample for the error terms. In this way, the model makes predictions about the likely samples of residuals. Investigating the sample residuals can then give useful information about the error terms and whether the assumptions about them hold. In the case of normality, if one could be sure that all the other assumptions were met, then one could infer from deviations from the normal distribution to likely failure of the normality assumption of the errors. Likewise with the constant variance assumption, if there were signs of changing variance, then one would, provided one were sure all the other assumptions of the model were met, have reason to suspect that this assumption for the errors had failed. However, there is a key problem here in that one does not typically know that all the other assumptions hold. Thus when one has a sample that would be highly unlikely under the assumed model (cf. a low p‐value for a null hypothesis) then one has reason to suspect that at least one of the assumptions of the model is false, but one cannot infer which has failed. This is a form of the Duhem–Quine problem (see Ariew 2007): one cannot infer from a false implication of a hypothesis which assumption(s) of the hypothesis fail. This problem is one reason why writers in econometric methodology, like Spanos (1999, p. 739), stress the importance of model specification which tests the model as a whole. If one has an incorrectly specified model, then the assumptions of one's statistical tests will probably not hold and the tests will be unreliable guides to inference.

The orthogonality assumption is also difficult to test. This can be seen by considering the simplest model of all, a regression model with only one explanatory variable, x.

\[
y = \alpha x + u. \tag{17.5}
\]

In this case, the sample correlation between residuals and the sample values of x will be zero by definition of the OLS estimate for α. Hence, regardless of the sample, in this simple regression model, the residuals are always uncorrelated with the right‐hand variable. Therefore, in this case there is no way to test from residuals whether or not the assumption that x is uncorrelated with the error term in the model is met. In regression models with more than one right‐hand variable, the more general result is that the residuals are uncorrelated with the sum of the products of the right‐hand variables with the OLS estimates of their slope coefficients. Therefore, in these cases the residuals may be correlated in the sample with one or more of the right‐hand variables. Whether or not, however, such correlations can be used to make inferences about the covariance between the error term and the right‐hand variables also depends on whether there is any covariance between the right‐hand variables.
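The point that residuals cannot reveal a failure of orthogonality in the one‐regressor case can be checked numerically. The sketch below is an illustration under assumptions of my choosing: the data are generated so that x is deliberately correlated with the error, the model has no intercept (so the relevant sample quantity is the uncentred cross‐product of residuals and x), and all numbers and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
alpha_true = 2.0                            # hypothetical population parameter

u = rng.normal(0.0, 1.0, n)                 # error term, unobservable in practice
x = rng.normal(0.0, 1.0, n) + 0.8 * u       # x is correlated with u by design
y = alpha_true * x + u

# OLS slope for y = alpha * x + u, and the implied residuals.
alpha_hat = np.sum(x * y) / np.sum(x * x)
residuals = y - alpha_hat * x

# Sample cross-moment of residuals with x: zero by construction of alpha_hat.
residual_x_moment = np.sum(residuals * x) / n
# Sample cross-moment of the true errors with x: clearly non-zero here,
# but it could only be computed because the simulation generated u itself.
error_x_moment = np.sum(u * x) / n

print("OLS estimate of alpha:", alpha_hat)
print("residual-x cross-moment:", residual_x_moment)
print("error-x cross-moment:", error_x_moment)
```

The estimate of α is biased upwards in this setup, yet the residuals look perfectly orthogonal to x, so nothing in the residuals signals the problem.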

Econometricians have developed methods for dealing with this problem. They have developed tests for whether variables are suitably orthogonal to the error and methods for consistent estimation where an explanatory variable is not orthogonal to the error term (instrumental variable estimation). Crucially, these methods tend to work by augmenting the model in some way so that the variable whose orthogonality with the error is suspect is itself modelled in terms of other variables. For instance, in the case where an explanatory variable is correlated with the error term, the instrumental variables method attempts to find an additional ‘instrumental’ variable which is correlated with the explanatory variable but not the error term, which can be used in place of the non‐orthogonal variable for estimation. In short, testing for orthogonality failures between regressors and error terms is difficult, and generally requires using methods more sophisticated than the mere analysis of residuals. Importantly, it appears to necessitate specification testing, that is, trying to find out if some important variables have been omitted in the model, variables whose explicit inclusion could overcome a failure of orthogonality.
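A minimal sketch of the instrumental variables idea, for a single regressor and a single instrument, is given below. It rests on assumptions made only for illustration: the instrument z is generated to be correlated with x and independent of the error, the model has no intercept, and the simple ratio estimator used here is the textbook one‐instrument case rather than a general procedure. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
alpha_true = 2.0                            # hypothetical population parameter

u = rng.normal(0.0, 1.0, n)                 # error term
z = rng.normal(0.0, 1.0, n)                 # instrument: independent of u
x = z + 0.8 * u + rng.normal(0.0, 1.0, n)   # regressor: depends on z and on u
y = alpha_true * x + u

# OLS uses x itself to form the moment condition, which fails here.
alpha_ols = np.sum(x * y) / np.sum(x * x)
# IV uses z in place of x in the moment condition: sum of z * (y - a * x) = 0.
alpha_iv = np.sum(z * y) / np.sum(z * x)

print("OLS estimate:", alpha_ols)
print("IV estimate:", alpha_iv)
```

The IV estimate is only as good as the assumption that z is orthogonal to the error, and that assumption is itself not testable from these data alone, which is the difficulty the text describes.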

17.3 The error term and identifiability

Identifiability is an important condition for performing statistical and causal inference in econometrics. If a model has unknowns that cannot be inferred uniquely from observation, then the model (or that part of it) is said to be unidentifiable. The classic, historical example of non‐identifiability in economics is that of measuring supply and demand curves from observed market movements.13 The problem is that observations of price and quantity transacted in a market are the result of both supply and demand mechanisms acting together. Therefore, the identification problem in this case is how to attribute any observed shifts in observed price and quantity of goods sold to supply and/or demand changes. The solution to the problem is to introduce some additional background ‘a priori’14 constraints (using background knowledge) to further limit the number of possible models that are consistent with observation.

The simple example presented above of a simultaneous equation model is identifiable because one can solve uniquely for the structural parameters from the reduced form parameters (the coefficients in equations (17.3) and (17.4) above) which can be estimated from observation. In this example, identifiability follows from the form of the equations (17.1) and (17.2) which have sufficiently few unknown parameters so that their values can be solved for from the estimates of coefficients in (17.3) and (17.4). Here identifiability is being secured by the a priori exclusion of variables from equations (17.1) and (17.2). This method of ensuring identifiability in simultaneous equation models is generalized in the well‐known Rank Condition for identification.15 This is a condition on the matrix of parameters in the model which, if and only if it is met, ensures that the parameters can be solved for from the reduced form equations (which can be estimated using OLS). This condition is necessary and sufficient for identifiability by using exclusions of variables from equations. What is important about identifiability by exclusion here is that it does not impose any conditions on the error terms in the model. All that matters for identifiability secured in this way is that there be sufficiently many exclusions of variables from the equations in the model.

This may seem to suggest that there is no interesting connection between identifiability and error terms. However, this is incorrect, since identifiability can also be secured by imposing constraints on the covariance matrix of the error terms in a model. A well‐known example is that of the general (non‐simultaneous) recursive model:

\[
\begin{aligned}
x_1 &= u_1 \\
x_2 &= \alpha_{21} x_1 + u_2 \\
&\;\;\vdots \\
x_n &= \alpha_{n1} x_1 + \cdots + \alpha_{n,n-1} x_{n-1} + u_n
\end{aligned}
\]

In this model only the first equation is identifiable by exclusions.16 The other equations do not meet the rank condition and are thus not identifiable without some further constraint. The natural additional constraint to impose here is to assume that the error terms in the equations are orthogonal to each other (i.e. have a diagonal covariance matrix) with which the model becomes fully identifiable. So in this example identifiability of the model depends on an additional assumption that the error terms are orthogonal. Unlike the previous simultaneous equations example, identifiability here rests in part on the error terms meeting an orthogonality condition.17

Interestingly, if one uses OLS to estimate the coefficients in the equations in this model then the required orthogonality assumption for OLS implies, given the functional form of the model, that the errors are orthogonal to one another. Therefore, using OLS to estimate this model implicitly assumes the orthogonality assumption for the error terms, and this renders the model identifiable. Unfortunately, however, this implicit orthogonality assumption is not testable from residuals in this case. This is because, as in the case of the two variable regression model above (17.5), the residuals that are generated by using an OLS estimation technique will be uncorrelated by construction with the right‐hand variables, and thus, given the assumed functional form, will be uncorrelated by construction with one another. Therefore, regardless of the data, the residuals that result from OLS will be mutually uncorrelated. Therefore, analysing the residuals will not give any indication as to the correctness or otherwise of the assumed orthogonality of the error terms in the model. Justification of the implicit orthogonality assumption which ensures identifiability must be provided in some other way.
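This can be seen in a small simulation of a three‐equation recursive model. The sketch is illustrative only: the coefficient values, the error covariance matrix and the helper `ols_residuals` are hypothetical, the equations are fitted without intercepts, and the errors are deliberately generated with non‐zero correlations so that the identifying assumption is false in the simulated data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Errors are correlated on purpose, violating the identifying assumption.
error_cov = np.array([[1.0, 0.6, 0.3],
                      [0.6, 1.0, 0.5],
                      [0.3, 0.5, 1.0]])
u = rng.multivariate_normal(np.zeros(3), error_cov, size=n)

# Recursive model with hypothetical coefficients.
x1 = u[:, 0]
x2 = 0.7 * x1 + u[:, 1]
x3 = -0.4 * x1 + 1.2 * x2 + u[:, 2]

def ols_residuals(y, X):
    """Residuals from a least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

e1 = x1                                       # first equation has no parameters
e2 = ols_residuals(x2, x1[:, None])
e3 = ols_residuals(x3, np.column_stack([x1, x2]))

# Sample cross-moments between pairs of residuals and pairs of true errors.
residual_moments = np.array([e1 @ e2, e1 @ e3, e2 @ e3]) / n
error_moments = np.array([u[:, 0] @ u[:, 1],
                          u[:, 0] @ u[:, 2],
                          u[:, 1] @ u[:, 2]]) / n

print("residual cross-moments:", residual_moments)
print("true error cross-moments:", error_moments)
```

The residual cross‐moments vanish up to rounding error even though the true errors are strongly correlated, so the residuals carry no evidence for or against the orthogonality of the errors.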

This type of problem, that constraints used for identification are not directly testable from the observations used to parameterise the model, is common. In fact, it is unsurprising since constraints used to secure identifiability are provided to supplement the insufficient power of the observations for determining a unique model as the most empirically adequate (i.e. to solve the identification problem). It is only where there is a surplus of identifying constraints, that is, where not all the identifiability constraints are required for inferring a unique model, that the observations used to pick the unique model can also be used to test surplus identifying constraints. In such a situation, the model is said to be overidentified. The example just given is not overidentified; it is just identified, that is, the observations and the orthogonal error terms are together just sufficient to pick out one unique model. Therefore, the chosen model is tailor‐made to have uncorrelated residuals, since without this assumption, there would not be a unique model that fits observation. In short, with the exception of overidentified models, testing identifiability restrictions requires some observations or information in addition to the observations used to parameterise the model. This is the case for the orthogonal errors assumption used to identify the recursive model presented here.

17.4 The causal interpretation of the error term and its role in causal inference

So far, the chapter has ignored the causal aspect of structural models. Yet what is distinctive about structural models, in contrast to forecasting models, is that they are supposed to be, when successfully supported by observation, informative about the impact of interventions in the economy. As such, they carry causal content about the structure of the economy. Therefore, structural models do not model mere functional relations supported by correlations; their functional relations have causal content which supports counterfactuals about what would happen under certain changes or interventions.

This suggests an important question: just what is the causal content attributed to structural models in econometrics? And, from the more restricted perspective of this paper, what does this imply with respect to the interpretation of the error term? What does the error term represent causally in structural equation models in econometrics? And finally, what constraints are imposed on the error term for successful causal inference? In order to begin to answer these, I first present a simple causal semantics, developed by Herbert Simon (1953) especially for the kind of simultaneous (and non‐simultaneous) equations models looked at in this chapter.

In Simon (1953) a formal definition of causal order for structural equations models is presented. To obtain the causal order, one first distinguishes between two types of variables in the model, endogenous and exogenous.18 The endogenous variables are those that are determined by the model (for example q and p in the simultaneous equations example above), while the exogenous variables (for example income, i, and cost of production, c) and the error terms have values that are taken as given, from outside the model. One then solves for the endogenous variables one‐by‐one using the fewest equations required to solve for them; this stipulates an order for the solution of the endogenous variables. Any variable that is used to solve for another variable and is solved for prior to it causally precedes it. One variable directly causally precedes another if it causally precedes it and if it appears in the same equation as the other variable. The resulting ordering among the variables is the causal order.19

Consider the earlier supply and demand example, where one categorizes q and p as endogenous and i and c as exogenous.

\[
\begin{aligned}
q &= \alpha p + \beta i + u_1 && \text{(demand)} \\
q &= \gamma p + \delta c + u_2 && \text{(supply)}
\end{aligned}
\]

Here one constructs the causal order by noting that q and p can only be solved for together in terms of i and c and the error terms, using both equations. Moreover, since both i and c appear in an equation with q and p, both are direct causes of p and q. Also, since p and q are both determined together (in the same minimal set of equations) they are ‘co‐determined’. Thus, the causal order can be represented by (where the arrows denote direct causal precedence):

[Figure omitted: diagram of the causal order, in which i and c are direct causes of the co‐determined p and q.]

Simon's causal order yields an intuitive result for the example since it makes explicit that income and production costs are direct causes of equilibrium price and quantity, while equilibrium price and quantity are co‐determined, just what one would expect for an equilibrium model of supply and demand.
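The solution procedure described above can be sketched in code. This is a rough illustration under stated assumptions, not Simon's own formalism: each equation is represented only by the set of variable names appearing in it, a group of k equations is treated as solvable when it contains exactly k endogenous variables not yet solved for, and the function name `causal_order` and the data layout are hypothetical. The sketch reports each block of co‐determined variables together with the variables taken as given when solving for it.

```python
from itertools import combinations

def causal_order(equations, endogenous):
    """Order the endogenous variables by the smallest solvable groups of equations.

    equations: dict mapping an equation name to the set of variables in it.
    endogenous: set of variable names to be solved for.
    Returns a list of (block, givens) pairs in order of solution, where block
    is a set of co-determined variables and givens are the other variables
    appearing in the equations used to solve for that block.
    """
    remaining = dict(equations)
    solved = set()
    order = []
    while remaining:
        block = None
        # Look for the smallest group of k equations with exactly k unknowns.
        for k in range(1, len(remaining) + 1):
            for names in combinations(sorted(remaining), k):
                variables = set().union(*(remaining[name] for name in names))
                unknowns = (variables & endogenous) - solved
                if len(unknowns) == k:
                    block = (names, unknowns, variables - unknowns)
                    break
            if block:
                break
        if block is None:
            raise ValueError("remaining equations cannot be solved as a block")
        names, unknowns, givens = block
        order.append((unknowns, givens))
        solved |= unknowns
        for name in names:
            del remaining[name]
    return order

# The supply and demand model, with error terms treated as given.
supply_demand = {
    "demand": {"q", "p", "i", "u1"},
    "supply": {"q", "p", "c", "u2"},
}
order = causal_order(supply_demand, endogenous={"q", "p"})
print(order)
```

For the supply and demand model neither equation can be solved on its own, so the sketch returns a single block containing q and p, with i, c and the two error terms as the variables taken as given.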

Although Simon's causal order helps to make explicit the content of the functional form of structural equations, it is limited progress because it is merely a formal relation among the variables defined from the functional form of the equations. Despite its ‘causal’ label, as it stands it says nothing about the content of ‘cause’. Luckily, Simon helps by briefly discussing how variables and equations should be interpreted. Simon states that the exogenous variables should be taken to denote factors that are directly controllable by an ‘experimenter’ or ‘nature’, and endogenous variables taken to denote factors that are indirectly controllable. Equations are taken to denote mechanisms and error terms are taken to denote the joint role of omitted directly controllable factors in a mechanism. The core idea is that the experimenter or nature has hypothetical20 direct access to the directly controllable factors and is free to change them. Changing these then has an impact on the other indirectly controllable factors in virtue of the mechanisms that connect the indirectly controllable factors to the directly controllable factors. Under this interpretation, the causal order arises from the joint action of mechanisms, and maps out how changes in a factor will ‘in general’ lead to changes in other factors. It sets out that changing a cause ‘in general’ changes its effects, whereas it is possible to change an effect without changing one of its causes.21 It is this related series of possible changes under direct changes that is represented by the formal ordering relation.

With some causal semantics in place, the causal interpretation of the error term can be investigated in some more detail. According to the brief discussion above, the error term denotes the net impact of factors in a mechanism, those not explicitly modelled.22 Yet what does this mean? Can any variable be omitted from an equation and simply ‘brought into’ an error term? The answer is quite simply ‘no’, as the following example illustrates.

Suppose one starts with the earlier simultaneous equation example.

$$
\begin{aligned}
q &= \alpha p + \beta i + u_1 && \text{(demand)} \\
q &= \gamma p + \delta c + u_2 && \text{(supply)}
\end{aligned}
$$

Now imagine that one were to ‘omit’ price from the demand equation by bringing it into the error term (let $u_1' = u_1 + \alpha p$), to obtain a new first equation.

$$
\begin{aligned}
q &= \beta i + u_1' && \text{(new demand)} \\
q &= \gamma p + \delta c + u_2 && \text{(supply)}
\end{aligned}
$$

For these modified equations, the causal order obtained following Simon's method is:

[Figure: causal order of the modified system]

This causal order is different from that of the original system, even though the first model was assumed in constructing the second. Omitting price from the demand equation and bringing it, as an omitted factor, into the error term changes the causal meaning of the model. Most strikingly perhaps, there is no longer an equilibrium relation between price and quantity but instead price is a direct cause of quantity transacted. This raises a worry: the error terms in a model should not include, as omitted factors, factors like p, because if they do then the apparent causal semantics of the model may misrepresent the underlying system.

Therefore, the causal interpretation of the error term as the joint impact of factors that are simply ‘omitted’, i.e. not explicitly modelled, is too weak, since it does not rule out cases like that just presented. To clarify the causal interpretation of the error term, one could perform an analysis to attempt to find the weakest interpretation of the error term under which the causal ordering relations among the explicitly modelled variables remain unchanged (if one were to bring factors out of, or into, the error term).23 However, for my purposes here, I will simply assume that the factors omitted in the error terms are such that if they were to be introduced explicitly into the model they would be denoted by exogenous variables. This requires that factors omitted from the model, whose net impact is represented in the error terms, be causally prior to the factors whose causes are being modelled by the equations. Though this is stronger than necessary, it is intuitive and avoids the difficulty presented above.

To finish this section, I now consider briefly a key constraint that the error term may need to meet if the model is to be used for causal inference. To keep the discussion simple, I look only at the simplest model.

$$
y = \alpha x + u.
$$

Interpreting this model using Simon, where x is exogenous and y endogenous, amounts to reading the right‐hand variable, x, as a direct cause of y, and u as denoting the net impact of a set of omitted direct causes of y. Here the aim is (as in the problem of statistical inference) to infer the unknown value of α given observations of y and x. In this case though, since the problem is one of causal inference, I consider a simple experiment as an ideal way of inferring α.

The obvious experiment that comes to mind is to vary x, to see by how much y changes as a result. This sounds straightforward: one changes x, y changes, and one calculates α as follows

$$
\alpha = \Delta y / \Delta x.
$$

Everything seems straightforward. However, there is a concern since u is unobservable: how does one know that u has not also changed in changing x? Suppose that u does change, so that there is hidden in the change in y a change in u; that is, the change in y is incorrectly measured by

$$
\Delta y_{\text{false}} = \Delta y + \Delta u.
$$

And thus that α is falsely measured as

$$
\alpha_{\text{false}} = \Delta y_{\text{false}} / \Delta x = \Delta y / \Delta x + \Delta u / \Delta x = \alpha + \Delta u / \Delta x.
$$

Therefore, in order for the experiment to give the correct measurement for α, one needs either to know that u has not also changed or to know by how much it has changed. Since u is unobservable this cannot be known by observation. This leaves as the only option to know, in virtue of the knowledge of how the change was brought about, that in changing x, u has not also been unwittingly changed. Intuitively, this requires that it is known that whatever cause(s) of x are used to change x, they are not causes of any of the factors hidden in u. This is to require that x have what Cartwright (1989, chap. 1) calls an ‘open back path’ with respect to y, that is, a cause which only causes y via x. The open back path provides a channel by which x could be varied to measure its impact on y. Such an open back path provides a ‘clean’ way to intervene in x for the purposes of causal inference.24
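The arithmetic of the experiment can be made concrete with a small numerical sketch. The values below (a true coefficient of 2, a change in x from 1 to 3, and a hidden shift in u of 1) are purely illustrative assumptions, and the helper `measured_alpha` is a hypothetical name introduced here; nothing in it comes from the chapter beyond the model y = αx + u.

```python
def measured_alpha(alpha, x0, x1, u0, u1):
    """Return the experimenter's estimate, change in y over change in x,
    for the model y = alpha * x + u, given before/after values of x and u."""
    y0 = alpha * x0 + u0
    y1 = alpha * x1 + u1
    return (y1 - y0) / (x1 - x0)


ALPHA = 2.0  # hypothetical true coefficient (an assumption for illustration)

# Clean intervention: x moves from 1 to 3 while u stays fixed.
clean = measured_alpha(ALPHA, x0=1.0, x1=3.0, u0=0.5, u1=0.5)

# Contaminated intervention: the same change in x also shifts u by 1.
contaminated = measured_alpha(ALPHA, x0=1.0, x1=3.0, u0=0.5, u1=1.5)

print(clean)         # 2.0, the true alpha
print(contaminated)  # 2.5, i.e. alpha plus (change in u) / (change in x)
```

In the contaminated case the estimate is off by exactly Δu/Δx = 1/2, and nothing in the observed values of x and y alone reveals that this has happened.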

More generally, the example above shows a need to constrain the error term in the equation in a non‐simultaneous structural equation model as follows. It requires that each right‐hand variable have a cause that causes y but not via any factor hidden in the error term. This imposes a limit on the common causes that the factors in the error term can have with those factors explicitly modelled.

To finish, consider briefly the testability of the assumptions brought to light in this section. Given that these assumptions directly involve the factors omitted in the error term, testing them empirically seems impossible without information about what is hidden in the error term. But given that the error term is unobservable, this places the modeller in a difficult situation: how is one to know that some important factor has not been left out of the model, undermining the desired inferences in some way? It also shows that there will always be an element of faith in the assumptions about the error term.

17.5 Conclusion: Many different error term assumptions? Or a few in many guises?

This chapter has attempted to draw out what the error term represents in structural models and some of the conditions imposed upon it for inferential purposes. In the analysis of statistical inference (the OLS method of estimation), it was assumed that the error was normally distributed, had constant variance and was orthogonal to the explanatory variables. In the discussion of identifiability, it was shown that though identifiability can sometimes be achieved purely by exclusions of variables from the equations, this is not always the case, and that constraints on the covariance of the errors, such as mutual orthogonality, are also used to achieve identifiability. Finally, the chapter briefly presented a causal interpretation of the error term, as the net impact of omitted causal factors from a mechanism, and showed that for causal inference purposes, it is important that there be a cause of any explicitly modelled causal factor that does not cause the effect of interest through a factor hidden in the error term.

Though this analysis seems to yield a large number of conditions the error term must meet, it is important not to assume that these conditions are unconnected. In particular, there is a strong connection between the orthogonality assumption, which is central for estimation, and the open‐back‐path requirement observed for causal inference. This can be seen by adopting a principle which licenses a move from correlations to causes, for instance, Reichenbach's principle of the common cause, that probabilistic dependencies imply a causal connection or common cause(s).25 This principle implies that if the orthogonality assumption between the error and an explanatory variable fails, then either there is a factor in the error that causes the factor denoted by the variable, or vice versa, or there is a common cause of an explicitly modelled factor and a factor hidden in the error term. In the first two cases, no open back path is possible, while in the third the common causal factor is itself not an open back path. Therefore, given Reichenbach's principle, a failure of orthogonality suggests that a path which is not an open back path is being varied, the situation which frustrated causal inference. Therefore, there appears to be an intimate connection between the orthogonality requirement and the open‐back‐path requirement. This intimate connection is also visible in the instrumental variables method for overcoming a failure of orthogonality, in which a variable (or variables) is found which is correlated with the variable that fails orthogonality but uncorrelated with the error term. Interpreting these correlations using Reichenbach's principle, the instrumental variables method is a search for some causal structure by which the variable (which failed orthogonality with the error) can be varied without varying the error term. In short, it is a search for an open back path.26
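The link between orthogonality and the open back path can be illustrated with a small simulation. This is a minimal sketch under stated assumptions rather than anything taken from the chapter: the data-generating process, the coefficient values, the sample size and the helper `cov` are all hypothetical. Here the error u is a cause of x (so orthogonality fails), while z causes x and reaches y only through x, playing the role of an open back path.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ALPHA = 2.0             # hypothetical true coefficient in y = ALPHA * x + u
N = 50_000              # illustrative sample size

z, x, y = [], [], []
for _ in range(N):
    zi = rng.gauss(0, 1)                  # instrument: causes x, unrelated to u
    ui = rng.gauss(0, 1)                  # unobserved error term
    xi = zi + 0.8 * ui + rng.gauss(0, 1)  # x depends on u: orthogonality fails
    z.append(zi)
    x.append(xi)
    y.append(ALPHA * xi + ui)

# OLS slope: cov(x, y) / var(x). It absorbs the dependence between x and u.
ols = cov(x, y) / cov(x, x)

# Instrumental variables estimate: cov(z, y) / cov(z, x).
# It uses only the variation in x that comes from z.
iv = cov(z, y) / cov(z, x)

print(f"OLS estimate: {ols:.3f}")  # noticeably above ALPHA
print(f"IV estimate:  {iv:.3f}")   # close to ALPHA
```

Under these assumptions the OLS slope overstates α by roughly Cov(x, u)/Var(x), whereas the instrumental variables estimate recovers α up to sampling error, because z varies x without varying u.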

In any event, the point here isn't to explore the important connections between the conditions on the error term required for causal inference and those for statistical inference, but rather to show that such connections exist and are fundamental. This, of course, should not be surprising since ultimately the problems of statistical and causal inference overlap greatly. After all, the estimation methods of structural models aim to measure strengths of causal connection.

The second point in highlighting the connection between the orthogonality and the open‐back‐path condition is to highlight the centrality of this kind of assumption for inference. As seen above, their failure frustrates inference, and their testing is not a straightforward matter of analysing residuals. Therefore, this condition, and more specifically the relationship between what is hidden in the error term and what is explicitly modelled, deserves careful scrutiny. In the econometrics literature, this condition is typically discussed under the term ‘exogeneity’ of the explanatory variables in a model, though there are many different definitions of exogeneity and disputes over which is correct for which purposes.27 The analysis of this chapter suggests, seen from the perspective of the orthogonality assumption and its causal cousin the open‐back‐path requirement, that such exogeneity assumptions can play different roles (estimation, causal inference) when viewed from different perspectives.


This research was supported by the AHRC ‘Contingency and Dissent in Science’ project at the CPNSS, London School of Economics. I am very grateful for their support. I am also very grateful to participants of the ERROR conference, June 2006, Virginia Tech, Blacksburg, Virginia, for helpful feedback. Finally, I would like to thank Michel Mouchart and an anonymous referee for very helpful comments.


Bibliography references:

Ariew, R. (2007). ‘Pierre Duhem’, The Stanford Encyclopedia of Philosophy, E. N. Zalta (ed.), URL = 〈http://plato.stanford.edu/entries/duhem/〉.

Cartwright, N. (1989). Nature's Capacities and their Measurement, Oxford: Clarendon Press.

Engle, R. F., D. F. Hendry and J.‐F. Richard (1983). Exogeneity, Econometrica, 51, 277–304.

Fennell, D. (2005). A Philosophical Analysis of Causality in Econometrics, PhD thesis, University of London.

Fisher, F. (1966). The Identification Problem in Econometrics, Huntington: Krieger Publishing Company.

Gujarati, D. (1995). Basic Econometrics, 3rd edition, New York: McGraw‐Hill.

Haavelmo, T. (1944). ‘The Probability Approach to Econometrics’, Econometrica, 12, suppl., 1–115.

Hendry, D. (1995). Dynamic Econometrics, Oxford: Oxford University Press.

Hoover, K. (2001). Causality in Macroeconomics, Cambridge: Cambridge University Press.

Maddala, G. S. (2001). Introduction to Econometrics, 3rd edition, New York: John Wiley and Sons.

Morgan, M. (1990). The History of Econometric Ideas, Cambridge: Cambridge University Press.

Reiss, J. (2003). Practice ahead of theory: Instrumental variables, natural experiments and inductivism in econometrics. Causality: metaphysics and methods technical reports, CTR 12/03, Centre for the Philosophy of the Natural and Social Sciences, London School of Economics.

Simon, H. (1953). Causal ordering and identifiability, reprinted in H. Simon, Models of Man, New York: John Wiley and Sons.

Simon, H. (1954). Spurious causation: A causal interpretation, reprinted in H. Simon, Models of Man, New York: John Wiley and Sons.

Spanos, A. (1999). Probability Theory and Statistical Inference, Cambridge: Cambridge University Press.

Spirtes, P., C. Glymour and R. Scheines (1993). Causation, Prediction and Search, New York: Springer‐Verlag.

Woodward, J. (2003). Making Things Happen: A Theory of Causal Explanation, New York: Oxford University Press.


(1) In practice the process of model selection and inference of parameters will tend to be interlinked; for example, inferring certain parameter values (e.g. zeros) may lead one to simplify the model.

(2) See, for example, weak exogeneity in Engle et al. (1983).

(3) This assumes a bridge principle from conditional independencies to causal relations. This is a key element in theories of probabilistic causality, and has in the last twenty years been developed to a highly sophisticated degree in Causal Bayes Net methods. See, for example, the faithfulness condition in Pearl (2000).

(4) In the chapter, I use ‘residuals’ to denote the sample of the error terms (assuming the model to hold) and ‘error terms’ to denote the population random variable in the model of which (provided the model were true) the residuals would be a sample.

(5) This glosses over a difficult part of model building, particularly in this case where one is trying to sustain claims that the equilibrium relation can be represented by two static equations as is done here.

(6) See Gujarati (1995, chap. 3).

(7) If OLS were used here then the estimates would be biased and inconsistent.

(8) This assumes that one can solve for the original, structural parameters in terms of the reduced form parameters. If this is possible and if the reduced form parameters are identifiable, then the model is identifiable. The example chosen here is identifiable so this is not a problem here. In the next section, identifiability and the conditions it may impose on the error term are considered in more detail.

(9) Note that ILS estimates need not be unbiased nor efficient.

(10) That said, there is an important difference, nevertheless. In OLS the orthogonality assumption between the right‐hand variables and the error need only hold between right‐hand variables in one equation and the error term in that equation. In contrast, in the ILS case since right‐hand variables from other equations may also appear on the right‐hand side of the reduced form equations, we have made the stronger assumption that each error term from each structural equation is orthogonal to every variable not determined in the model.

(11) See Gujarati (1995, chap. 11) for a more detailed discussion of the consequences of non‐constant variances of the error terms.

(12) Though if samples are large, a central limit theorem can be used to show that distributions of error terms will be approximately normal, see Gujarati (1995, p. 316–317).

(13) See Morgan (1990) for a historical account of the development of ideas in relation to identification in econometrics.

(14) The term ‘a priori’ is typically used to describe the knowledge used to secure identifiability. This is meant as knowledge prior to that provided by the observations used to parameterize the model. It does not mean that this knowledge is not in itself empirical.

(15) For more on the rank condition see Fisher (1966, p. 39–41), Gujarati (1995, p. 657–669) and Maddala (2001, p. 348–352).

(16) Though this isn't particularly useful since there is no parameter to estimate in the first equation!

(17) See Fisher (1966, chap. 4) for detailed discussion of identification conditions using both exclusions and covariance matrix (of the errors) constraints.

(18) The account given here is based on the more detailed analysis of Fennell (2005, chap. 2). Note that this version deviates slightly from that of Simon (1953). However, the differences are not significant here.

(19) Note that Simon's causal order depends on the functional form of the equations since the order of solution by which the causal order is defined depends on the equations in which the variables appear.

(20) It is important that the direct control here is hypothetical. Directly controllable factors need not in fact be directly controlled by some actual experimenter. This is why, I believe, Simon permitted ‘interventions’ by nature. The point is rather that the causal relations are such that if the ‘directly controllable’ factors were intervened upon surgically, by an agent or by nature, then they would change directly in virtue of these interventions and the other indirectly controllable factors would change as a result. I take the idea here to be similar to Woodward's more developed (2003) analysis of the causal relation in terms of hypothetical interventions.

(21) Though this gives us some idea of how to causally interpret the structural equations models, there is much which is not discussed by Simon. For example, it is also important in his interpretation that mechanisms be invariant to changes brought about by the experimenter or nature. Otherwise, the equations expressing the mechanisms could be completely changed upon intervention and the equations would tell us nothing about what happens to the indirectly controllable factors as a result of changes to the directly controllable factors. Also, it is important that the directly controllable factors be independent of each other in the sense that the experimenter must be ‘free’ to change them. Fleshing out these issues is an important step, one which I do not attempt here, in setting out a clear interpretation of the structural models. In addition, I do not argue for Simon's semantics over other possibilities, though this too is important work to do.

(22) This type of interpretation of the error term is widespread. For example, according to Kevin Hoover ‘error terms might be thought to represent those INUS conditions that, though they help to determine the effects and are not constant, are not explicitly measured or modelled’ (2001, p. 50). Herbert Simon states that ‘“error terms” … measure the net effects of all other variables (not introduced explicitly) upon the system’ (1954, p. 40). Nancy Cartwright (1989, p. 29) states that the error terms are ‘supposed to represent the unknown or unobservable factors that may have an effect’.

(23) See Fennell (2005, chap. 3) for an analysis of this kind.

(24) Similar conditions to the open back path requirement appear widely in the literature. For instance, James Woodward incorporates a similar condition into his definition of an intervention variable, see Woodward (2003, p. 98).

(25) There are related principles such as the Causal Markov condition which also allow one to make inferences from correlations to causes. See Spirtes, Glymour and Scheines (1993) for more details.

(26) See Reiss (2003) for a causal discussion of instrumental variables.

(27) See Engle et al. (1983).