## Granger Causality

Granger causality is one of several tools widely used to infer cause-and-effect (causal) relationships from data. Strictly speaking, it tests predictive causality: whether the past of one variable helps predict the future of another. However, it relies on some strong assumptions about the data that limit its applicability; these are listed later. Let us now formally introduce the test for a simple linear model.

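Let $y$ be a variable of predictive interest. Assume you have access to the historical values of $y$, from some past time $t-p$ up to the current time $t$. Additionally, let our system model currently have $N$ control variables represented by the matrix $X$, whose historical values, represented as the vectors $\mathbf{x}_{t-p}, \dots, \mathbf{x}_t \in \mathbb{R}^N$ (rows of $X$), are available from time $t-p$ to $t$. These control variables and $y$ are assumed to form a linear predictive model for the future value of $y$, represented by $\hat{y}_{t+1}$, as

$$\hat{y}_{t+1} = \sum_{i=0}^{p} a_i\, y_{t-i} + \sum_{i=0}^{p} \mathbf{b}_i^\top \mathbf{x}_{t-i}.$$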

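In the above equation, $a_i$ and $\mathbf{b}_i$ are the coefficients of this linear predictive model, estimated using linear (least-squares) regression. Now, to verify whether a new control variable $z$, with historical values $z_{t-p}, \dots, z_t$, can improve the prediction of $y$, we expand the above linear model as

$$\tilde{y}_{t+1} = \sum_{i=0}^{p} a_i\, y_{t-i} + \sum_{i=0}^{p} \mathbf{b}_i^\top \mathbf{x}_{t-i} + \sum_{i=0}^{p} c_i\, z_{t-i},$$

where $c_i$ are the coefficients of the control variable $z$ (the $a_i$ and $\mathbf{b}_i$ are re-estimated in the expanded model, so their values generally change). If the prediction error of this new model, $y_{t+1} - \tilde{y}_{t+1}$, has a statistically significantly lower variance than the error $y_{t+1} - \hat{y}_{t+1}$ of the original model, then $z$ is said to Granger-cause $y$: its past carries information about the future of $y$ beyond what the past of $y$ and $X$ already provides. The standard way to check for this lower variance is to test the joint null hypothesis $c_0 = c_1 = \dots = c_p = 0$. If the residuals from both linear models are normally distributed, an F-test comparing their residual sums of squares reveals whether adding the coefficients $c_i$ has significantly reduced the variance.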
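The comparison of the two regressions can be sketched in Python with NumPy. This is a minimal illustration, not the author's implementation: the helper names (`lag_matrix`, `design`, `rss`, `granger_f_stat`), the intercept added to both models, and the synthetic data are all assumptions. It fits the original and the expanded model by least squares and forms the usual F-statistic $F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/q}{\mathrm{RSS}_u/(n-k)}$, where $q$ is the number of added lag coefficients of the new variable and $n-k$ is the residual degrees of freedom of the expanded model.

```python
import numpy as np


def lag_matrix(series, lags):
    """Columns series[t - l] for l = 0..lags, one row per time t that has a
    full lag window and a next value series[t + 1] to predict."""
    series = np.asarray(series, dtype=float)
    rows = len(series) - lags - 1
    return np.column_stack([series[lags - l : lags - l + rows] for l in range(lags + 1)])


def design(y, X, lags, z=None):
    """Regressors for predicting y[t + 1]: an intercept (our assumption), lags
    of y, lags of every column of X and, for the expanded model, lags of z."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    cols = [np.ones(len(y) - lags - 1), lag_matrix(y, lags)]
    cols += [lag_matrix(X[:, j], lags) for j in range(X.shape[1])]
    if z is not None:
        cols.append(lag_matrix(z, lags))
    return np.column_stack(cols)


def rss(A, target):
    """Residual sum of squares of the least-squares fit target ~ A."""
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    return float(resid @ resid)


def granger_f_stat(y, X, z, lags):
    """F-statistic for H0: every lag coefficient of z is zero."""
    target = np.asarray(y, dtype=float)[lags + 1:]  # y[t + 1]
    A_r = design(y, X, lags)          # original model: lags of y and X
    A_u = design(y, X, lags, z)       # expanded model: adds lags of z
    rss_r, rss_u = rss(A_r, target), rss(A_u, target)
    q = lags + 1                      # number of restrictions, one per z lag
    dof = len(target) - A_u.shape[1]  # residual degrees of freedom (n - k)
    F = ((rss_r - rss_u) / q) / (rss_u / dof)
    return F, q, dof


# Hypothetical synthetic data: y depends on its own past, X[:, 0] and z.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))   # two existing control variables
z = rng.normal(size=n)        # candidate variable that drives y
z_noise = rng.normal(size=n)  # candidate variable unrelated to y
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * X[t - 1, 0] + 0.8 * z[t - 1] + rng.normal(scale=0.5)

print("z:       F=%.1f, q=%d, dof=%d" % granger_f_stat(y, X, z, lags=2))
print("z_noise: F=%.1f, q=%d, dof=%d" % granger_f_stat(y, X, z_noise, lags=2))
```

With normal residuals, the statistic follows an $F(q, n-k)$ distribution under the null hypothesis, so a large value for `z` and a value near 1 for `z_noise` is what one would expect here. If SciPy is available, `scipy.stats.f.sf(F, q, dof)` converts the statistic into a p-value.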

However, some of the caveats of the above method are:

- The assumption that the output variable follows a linear model of its own past and the past of the control variables; nonlinear dependencies can go undetected.
- The normality of the residuals from both linear models. The F-test is particularly sensitive to this requirement, and non-normal residuals can skew its results.
- Finally, the number of data samples required to verify this causality can be large, because the expanded model estimates one coefficient per lag for the output variable, for each control variable and for the new variable. The required sample count therefore grows with both the number of control variables and the number of past values (lags) used.
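The last two caveats can be checked numerically before trusting an F-test. The sketch below is illustrative only: `min_samples` and `jarque_bera` are hypothetical helper names, and the parameter count assumes an intercept plus the same number of lags for the output variable, every control variable and the new variable. `min_samples` gives the bare minimum series length for the F-test to be defined at all (at least one residual degree of freedom); a test with useful power needs far more data. `jarque_bera` computes the Jarque-Bera statistic from the sample skewness and kurtosis of the residuals, a common normality check, which under normality is approximately chi-square distributed with 2 degrees of freedom.

```python
import numpy as np


def min_samples(num_controls, lags):
    """Smallest series length n leaving one residual degree of freedom in the
    expanded model (assumed: intercept + (lags + 1) lags each of the output
    variable, the num_controls control variables and the new variable)."""
    k = 1 + (lags + 1) * (num_controls + 2)  # coefficients to estimate
    # Only n - lags - 1 rows are usable; they must exceed k, so n >= k + lags + 2.
    return k + lags + 2


def jarque_bera(resid):
    """Jarque-Bera statistic of the residuals; approximately chi-square with
    2 degrees of freedom under normality (5% critical value about 5.99)."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    m2 = np.mean(r ** 2)
    skew = np.mean(r ** 3) / m2 ** 1.5
    kurt = np.mean(r ** 4) / m2 ** 2
    return r.size / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


print(min_samples(2, 2))    # 17
print(min_samples(10, 12))  # 171
rng = np.random.default_rng(1)
print(jarque_bera(rng.normal(size=1000)))       # expected small: normal residuals
print(jarque_bera(rng.exponential(size=1000)))  # expected large: skewed residuals
```

Going from 2 control variables with 2 lags to 10 control variables with 12 lags raises the bare minimum from 17 to 171 samples, which illustrates how quickly the data requirement grows.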
