Statistical ramblings of a Moonbat

I will try to use this medium to talk about my statistical musings

Granger Causality


Granger causality is one of several tools extensively used to infer cause-and-effect (causal) relationships from data. However, it rests on some strong assumptions about the data that limit its applicability; these are listed later. Let us now formally introduce this causality test for a simple linear model.

Let X be a variable that is of predictive interest to you. Assume you have access to historical values of X from some past time p up to the current time t . Additionally, let our system model currently have N control variables, represented by the matrix W , whose historical values W_{p:t} are available from time p to t . These control variables W_{p:t} together with X_{p:t} are assumed to form a linear predictive model for the future value of X , represented by X_{t+1} , as

X_{t+1}=\sum_{i=p}^{t}{\left\{\sum_{j=1}^{N}{\left[\alpha_{ij}*W_{ij}\right]} + \beta_i*X_i\right\}} .

In the above equation, \alpha_{ij};i\in\left\{p,p+1,\cdots,t\right\}, j\in\left\{1,2,\cdots,N\right\} and \beta_i;i\in\left\{p,p+1,\cdots,t\right\} are the coefficients of this linear predictive model, estimated using linear regression. Now, if we would like to verify whether a new control variable Y , with historical values Y_{p:t} , can improve the prediction of X_{t+1} , the best way to test this hypothesis is to expand the above linear model as

\hat{X}_{t+1}=\sum_{i=p}^{t}{\left\{\sum_{j=1}^{N}{\left[\alpha_{ij}*W_{ij}\right]} + \beta_i*X_i + \gamma_i*Y_i \right\}}

where \gamma_i;i\in\left\{p,p+1,\cdots,t\right\} are the coefficients of the new control variable. If the new prediction \hat{X}_{t+1} is better by a statistically significant margin, for instance if its residual variance is lower than that of the original model for X_{t+1} , then Y is said to Granger-cause X . The standard way to check for this reduction in variance is to test the joint hypothesis \gamma_i=0;i\in\left\{p,p+1,\cdots,t\right\} . If the residuals from both of the above linear models are normally distributed, then an F-test reveals whether adding the coefficients \gamma_i;\left\{p:t\right\} has resulted in reducing the variance.
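As a concrete sketch, the two nested regressions and the F-test above can be carried out with ordinary least squares. For simplicity this example omits the W control variables and keeps only lags of X and the candidate variable Y ; the simulated data, the lag length of 2, and the coupling coefficients are illustrative assumptions, not part of the model above.

```python
import numpy as np
from scipy import stats

def lagged(v, lag):
    """Stack columns v[t-1], ..., v[t-lag] for t = lag, ..., len(v)-1."""
    n = len(v)
    return np.column_stack([v[lag - k - 1 : n - k - 1] for k in range(lag)])

def granger_f_test(x, y, lag):
    """F-test: do lagged values of y reduce the residual variance of predicting x?"""
    target = x[lag:]
    ones = np.ones((len(target), 1))
    # Restricted model: intercept plus lags of x (the beta terms).
    X_r = np.hstack([ones, lagged(x, lag)])
    # Unrestricted model: additionally includes lags of y (the gamma terms).
    X_u = np.hstack([X_r, lagged(y, lag)])
    rss_r = np.sum((target - X_r @ np.linalg.lstsq(X_r, target, rcond=None)[0]) ** 2)
    rss_u = np.sum((target - X_u @ np.linalg.lstsq(X_u, target, rcond=None)[0]) ** 2)
    df_num = lag                          # number of gamma coefficients set to zero
    df_den = len(target) - X_u.shape[1]   # residual degrees of freedom
    f_stat = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, p_value

# Simulated example where y drives x with a one-step delay.
rng = np.random.default_rng(0)
n = 500
y = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.normal()

f_stat, p_value = granger_f_test(x, y, lag=2)
```

A small p-value here means the joint hypothesis that all \gamma_i are zero is rejected, i.e. the lags of Y carry predictive information about X beyond its own history.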

However, some of the caveats of the above method are:

  • The linear system model assumption for the output variable X
  • The normality of the residuals from the linear models. The F-test is particularly sensitive to this requirement; non-normal residuals can skew its results.
  • Finally, the minimum number of data samples required to verify this causality can be quite large, depending on the number of control variables N and the number of past-value (lag) terms \left(t-p\right) of those control variables.
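To make the last caveat concrete, the expanded model has one coefficient per lag per variable, so the parameter count grows with both the lag depth and the number of controls. The factor of ten samples per parameter used below is a common rule of thumb, not something prescribed by the test itself.

```python
def n_parameters(lag, n_controls):
    """Coefficients in the expanded model: each of the `lag` time steps
    contributes n_controls alphas (for W), one beta (for X), and one gamma (for Y)."""
    return lag * (n_controls + 2)

# Even a modest setup needs many samples: with N = 10 control variables
# and a lag of t - p = 12, the model already has 144 coefficients.
params = n_parameters(lag=12, n_controls=10)
rough_min_samples = 10 * params  # assuming ~10 samples per parameter
```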

Written by ranabasheer

December 28, 2012 at 1:17 pm

Posted in Statistics, tutorial
