[Statistics] Path Analysis

This post covers path analysis from a theoretical perspective.

1. Path Analysis Model

Path analysis is a statistical modeling technique that allows researchers to explore and quantify the relationships between variables. It is an extension of multiple regression analysis that enables the investigation of complex causal relationships among variables by incorporating both direct and indirect effects.

In a path analysis model, variables are represented as nodes or boxes, and arrows or paths indicate the hypothesized relationships between them. The analysis aims to estimate the strength and direction of these relationships and to determine the overall fit of the model to the data. Path analysis can be seen as a special case of structural equation modeling (SEM) built from regression equations, so it has both predictive and structural elements.

A path analysis model consists only of observed variables, which makes it the simplest model among SEMs. In other words, the variables are assumed to be measured without measurement error (observational error), so measurement error is not modeled. The model can also estimate both direct and indirect effects within a single model.
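
Because every variable is observed, each endogenous variable in a simple path model can be written as an ordinary regression on the variables that point to it. The sketch below illustrates this idea on simulated data. The variable names (x, m, y), the helper ols_slopes, and the true coefficients 0.5, 0.3, and 0.4 are assumptions made for illustration only; dedicated SEM software would normally estimate all equations jointly and also report fit indices.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Return the OLS slopes of y on the predictors (intercept is dropped)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

# Simulated data for a hypothetical model: X -> M, X -> Y, M -> Y
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)            # assumed path X -> M
y = 0.3 * x + 0.4 * m + rng.normal(size=n)  # assumed paths X -> Y and M -> Y

# One regression per endogenous variable
(x_to_m,) = ols_slopes(m, x)
x_to_y, m_to_y = ols_slopes(y, x, m)

print(f"X -> M: {x_to_m:.3f}")
print(f"X -> Y: {x_to_y:.3f}")
print(f"M -> Y: {m_to_y:.3f}")
```

With a large simulated sample, the estimated path coefficients should land close to the values used to generate the data.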

2. Key components of a path analysis model

Exogenous and Endogenous Variables:
Exogenous variables, also known as independent variables or predictors, are those that are not influenced by other variables in the model. Endogenous variables, on the other hand, are influenced by one or more variables in the model and are often the outcome or dependent variables of interest.

Direct and Indirect Effects:
Path analysis allows for the estimation of both direct effects (the direct influence of one variable on another) and indirect effects (the influence of one variable on another through one or more mediating variables). These effects can be quantified and tested for significance.

Covariates and Error Terms:
Covariates, also known as control variables, are additional variables that are included in the model to account for potential confounding factors. Error terms represent the unexplained variance or random factors in the model.

Model Fit Assessment:
The fit of the path analysis model to the data is assessed using various fit indices, such as the chi-square test, comparative fit index (CFI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). These indices indicate how well the model fits the observed data.
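
As a rough illustration of how two of these indices are derived from chi-square statistics, here is a minimal sketch. It assumes one common convention for RMSEA (dividing by N - 1; some software uses N instead), and the chi-square values in the usage lines are made-up numbers, not results from any real model.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA from the model chi-square, its degrees of freedom, and sample size n.

    Assumes df > 0 (a saturated model with df = 0 has no RMSEA).
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """CFI comparing the fitted model with a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

# Hypothetical values, for illustration only
print(f"RMSEA = {rmsea(12.0, 5, 300):.3f}")
print(f"CFI   = {cfi(12.0, 5, 250.0, 10):.3f}")
```

In practice these indices are reported by SEM software, so the functions above are only meant to show where the numbers come from.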

3. Understanding Indirect Effect

The total effect is the sum of the direct effect and the indirect effect.

1) Model with 1 mediating variable

a = direct effect of X on Y (X -> Y)

b*c = indirect effect of X on Y (X -> M -> Y), where b is the path X -> M and c is the path M -> Y

a + b*c = total effect of X on Y
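
One way to see why the total effect is a + b*c is to write the model as two regression equations and substitute the first into the second. The intercepts i_1 and i_2 and the error terms e_1 and e_2 are notation introduced here for illustration.

$ M = i_1 + bX + e_1 $

$ Y = i_2 + aX + cM + e_2 $

$ Y = (i_2 + c \, i_1) + (a + bc)X + (c \, e_1 + e_2) $

The coefficient of X in the last line, a + bc, is the total effect: a one-unit change in X changes Y by a directly and by b*c through M.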

2) Model with 2 mediating variables

a = direct effect of X on Y

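b1*c1 = first indirect effect of X on Y (X -> M1 -> Y), where b1 is the path X -> M1 and c1 is the path M1 -> Y

b2*c2 = second indirect effect of X on Y (X -> M2 -> Y), where b2 is the path X -> M2 and c2 is the path M2 -> Y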

b1*c1 + b2*c2 = total indirect effect of X on Y

a + b1*c1 + b2*c2 = total effect of X on Y
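
The decomposition above can be checked numerically. The following sketch simulates a model with two mediators, estimates each path by OLS, and compares a + b1*c1 + b2*c2 with the slope obtained by regressing Y on X alone. The data-generating coefficients and the helper ols_slopes are assumptions chosen for illustration.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Return the OLS slopes of y on the predictors (intercept is dropped)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

# Simulated data for a hypothetical two-mediator model
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
m1 = 0.5 * x + rng.normal(size=n)
m2 = -0.3 * x + rng.normal(size=n)
y = 0.2 * x + 0.4 * m1 + 0.6 * m2 + rng.normal(size=n)

(b1,) = ols_slopes(m1, x)             # X -> M1
(b2,) = ols_slopes(m2, x)             # X -> M2
a, c1, c2 = ols_slopes(y, x, m1, m2)  # X -> Y, M1 -> Y, M2 -> Y

total_indirect = b1 * c1 + b2 * c2
total = a + total_indirect

# Slope of Y on X alone, with the mediators left out
(total_from_regression,) = ols_slopes(y, x)

print(f"direct effect (a):     {a:.3f}")
print(f"total indirect effect: {total_indirect:.3f}")
print(f"total effect:          {total:.3f}")
print(f"slope of Y on X alone: {total_from_regression:.3f}")
```

With OLS estimates of a linear model, the last two printed values agree up to floating-point error, which is the sample version of "total effect = direct effect + indirect effects".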

4. Testing the Indirect Effect

1) Sobel test

The Sobel test and bootstrapping are two well-known methods for testing the statistical significance of the indirect effect.

The Sobel test is the more classical method. It uses the formula below, where se(b) and se(c) are the standard errors of the path coefficients b and c.

$ Z_{bc}= \frac{b \times c}{\sqrt{{b^2} \times se(c)^2+{c^2} \times se(b)^2} } $


The main drawback of the Sobel test is its assumption that the estimate of the indirect effect (b*c) follows a normal distribution.

For example, if we repeatedly drew samples of 300 people from the population, the test assumes that the b*c values computed from those samples would form a normal distribution.

In fact, the b*c values often do not follow a normal distribution, because the product of two normally distributed estimates is generally not normal itself, and many researchers now consider the assumption rather unrealistic.

As a result, bootstrapping has become more popular than the Sobel test.
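
For reference, the Sobel statistic is simple to compute once the two path coefficients and their standard errors are known. The sketch below is a minimal implementation; the function name sobel_test and the input values in the usage line are hypothetical, and the p-value relies on the same normality assumption discussed above.

```python
import math

def sobel_test(b, se_b, c, se_c):
    """Return (z, two-sided p-value) for the indirect effect b*c."""
    se_bc = math.sqrt(b**2 * se_c**2 + c**2 * se_b**2)
    z = (b * c) / se_bc
    # Two-sided p-value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical estimates, for illustration only
z, p = sobel_test(b=0.5, se_b=0.1, c=0.4, se_c=0.1)
print(f"z = {z:.3f}, p = {p:.4f}")
```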

2) Bootstrapping

Unlike the Sobel test, bootstrapping does not require the indirect effect to follow a normal distribution, which is the greatest advantage of the method.

The core idea of bootstrapping is to create multiple samples from the sample itself, not from the population.

This is called re-sampling, and the sampling has to be done with replacement.

For example, suppose we have a data set with 300 cases.

We create a new sample of 300 cases, each case being randomly selected from the original 300 cases with replacement.

There is no strict criterion for the number of bootstrap samples, but many articles use 1,000 or 5,000.

Next, we estimate b*c in each bootstrap sample and collect the values into a distribution.

The standard error and confidence interval of the indirect effect are derived from this distribution. If the confidence interval does not include zero, the indirect effect is considered statistically significant.

In a nutshell, this is an empirical approach that needs no assumptions about the shape of the distribution of b*c.
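
The procedure can be sketched in a few lines. This example assumes a single-mediator model estimated by OLS, simulated data with 300 cases, 1,000 bootstrap samples, and a percentile confidence interval, which is only one of several interval types in use. The function names and the data-generating coefficients are made up for illustration.

```python
import numpy as np

def indirect_effect(x, m, y):
    """Estimate b*c: b from M ~ X, c from Y ~ X + M (both by OLS)."""
    ones = np.ones(len(x))
    b = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    c = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return b * c

def bootstrap_indirect(x, m, y, n_boot=1000, seed=0):
    """Return the bootstrap SE and 95% percentile CI of b*c."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    se = estimates.std(ddof=1)
    ci_low, ci_high = np.percentile(estimates, [2.5, 97.5])
    return se, ci_low, ci_high

# Simulated data set with 300 cases (hypothetical coefficients)
rng = np.random.default_rng(42)
x = rng.normal(size=300)
m = 0.5 * x + rng.normal(size=300)
y = 0.3 * x + 0.4 * m + rng.normal(size=300)

point = indirect_effect(x, m, y)
se, ci_low, ci_high = bootstrap_indirect(x, m, y)
print(f"b*c = {point:.3f}, SE = {se:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Each pass through the loop resamples whole cases, so the x, m, and y values of a person always stay together.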

(Thumbnail image: photo by Mika Baumeister on Unsplash)
