# Statistical ramblings of a Moonbat

I will try to use this medium to talk about my statistical musings.

## Granger Causality

Granger causality is one of several tools that are extensively used to figure out cause-and-effect (causal) relationships from data. However, there are some strong assumptions about the data that limit the applicability of Granger causality; these will be listed later. Let us now formally introduce this causality test for a simple linear model.

Let $X$ be a variable that is of predictive interest to you. Assume you have access to historical values of $X$ from some past time $p$ up to the current time $t$. Additionally, let our system model currently have $N$ control variables represented by the matrix $W$, whose historical values, represented as $W_{p:t}$, are available from time $p$ to $t$. These control variables $W_{p:t}$ and $X_{p:t}$ are assumed to form a linear predictive model for the future value of $X$, represented by $X_{t+1}$, as

$X_{t+1}=\sum_{i=p}^{t}{\left\{\sum_{j=1}^{N}{\left[\alpha_{ij}*W_{ij}\right]} + \beta_i*X_i\right\}}$.

In the above equation, $\alpha_{ij};i\in\left\{p,p+1,\cdots,t\right\}, j\in\left\{1,2,\cdots,N\right\}$ and $\beta_i;i\in\left\{p,p+1,\cdots,t\right\}$ are coefficients of this linear predictive model that are estimated using linear regression. Now, if we would like to verify whether a new control variable $Y$, with historical values $Y_{p:t}$, can improve the prediction of $X_{t+1}$, the best way to test this hypothesis is to expand the above linear model as
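In practice, the baseline model above is fit as an ordinary least-squares regression on a lagged design matrix. Here is a minimal sketch in Python; the series, sizes, and coefficients are all simulated for illustration and do not come from any real data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated history: one target series X and N = 2 control series W,
# observed over T time steps (illustrative numbers, not from the post).
T, N, lag = 200, 2, 3
W = rng.normal(size=(T, N))
X = np.zeros(T)
for t in range(1, T):
    X[t] = 0.5 * X[t - 1] + 0.3 * W[t - 1, 0] + rng.normal(scale=0.1)

def lagged_design(X, W, lag):
    """Stack the last `lag` values of X and of each control column as regressors."""
    rows = []
    for t in range(lag, len(X)):
        row = [X[t - k] for k in range(1, lag + 1)]
        for j in range(W.shape[1]):
            row += [W[t - k, j] for k in range(1, lag + 1)]
        rows.append(row)
    return np.asarray(rows), X[lag:]

# Fit the alpha and beta coefficients jointly by least squares.
A, y = lagged_design(X, W, lag)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef  # residuals of the baseline model
```

The residuals `resid` are what the F-test below operates on, so keeping them around is the whole point of this step.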

$\hat{X}_{t+1}=\sum_{i=p}^{t}{\left\{\sum_{j=1}^{N}{\left[\alpha_{ij}*W_{ij}\right]} + \beta_i*X_i + \gamma_i*Y_i \right\}}$

where $\gamma_i;i\in\left\{p,p+1,\cdots,t\right\}$ are the coefficients of the new control variable. If this new prediction, represented by $\hat{X}_{t+1}$, improves a quality metric such as prediction error variance over $X_{t+1}$ by a statistically significant margin, then $Y$ has some magical cause-and-effect relationship with $X$. The best way to check for lower variance is to test the joint null hypothesis that $\gamma_i=0$ for all $i\in\left\{p,p+1,\cdots,t\right\}$ simultaneously. If the residuals from both of the above linear models are normally distributed, then an F-test will reveal whether the addition of the coefficients $\gamma_i;\left\{p:t\right\}$ has resulted in a significant reduction in variance.
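The restricted-versus-unrestricted comparison can be sketched numerically. In this toy example the restricted model uses only past $X$ and the unrestricted model adds past $Y$ (the controls $W$ are omitted for brevity); the series and coefficients are simulated, and `scipy` is assumed to be available for the F-distribution tail probability:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(1)

# Toy series in which Y genuinely helps predict X one step ahead
# (all names and constants here are illustrative).
T, lag = 300, 2
Y = rng.normal(size=T)
X = np.zeros(T)
for t in range(1, T):
    X[t] = 0.4 * X[t - 1] + 0.8 * Y[t - 1] + rng.normal(scale=0.1)

def design(series_list, lag):
    """Regressor matrix holding `lag` past values of each series."""
    rows = []
    for t in range(lag, len(series_list[0])):
        row = []
        for s in series_list:
            row += [s[t - k] for k in range(1, lag + 1)]
        rows.append(row)
    return np.asarray(rows)

target = X[lag:]
A_r = design([X], lag)     # restricted model: past X only
A_u = design([X, Y], lag)  # unrestricted model: past X and past Y

def rss(A, y):
    """Residual sum of squares of an OLS fit."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ coef) ** 2)

rss_r, rss_u = rss(A_r, target), rss(A_u, target)
q = lag                           # number of restrictions: the gamma_i set to zero
dof = len(target) - A_u.shape[1]  # residual degrees of freedom, unrestricted model
F = ((rss_r - rss_u) / q) / (rss_u / dof)
p_value = f_dist.sf(F, q, dof)    # small p-value => Y "Granger-causes" X
```

Because the unrestricted model nests the restricted one, `rss_u` can never exceed `rss_r`; the F statistic asks whether the drop is larger than chance would allow.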

However, some of the caveats of the above method are:

- The linear system model assumption for the output variable $X$.
- The normality of the residuals from the linear models. The F-test is particularly sensitive to this normality requirement; non-normal residuals can skew its results.
- Finally, the minimum number of data samples required to verify this causality can be quite large, depending on the number of control variables $N$ and the number of past-value (lag) terms $\left(t-p\right)$ of those control variables.
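To put the last caveat in numbers, here is a back-of-the-envelope parameter count for the unrestricted model; the ten-samples-per-coefficient rule of thumb is my assumption for illustration, not something from the test itself:

```python
# Each of the N controls, the target X itself, and the candidate Y
# contribute one coefficient per lag term in the unrestricted model.
def n_params(n_controls, lag):
    return lag * (n_controls + 2)  # the alpha_ij, beta_i, and gamma_i

# Hypothetical setup: 5 control variables with 12 lag terms apiece.
params = n_params(n_controls=5, lag=12)  # 84 coefficients to estimate
needed = 10 * params                     # ~840 samples by the rule of thumb
```

With monthly data, 840 samples is seventy years of history, which is exactly why the lag length and control set have to be kept small in practice.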

Written by ranabasheer

December 28, 2012 at 1:17 pm

Posted in Statistics, tutorial