[SPSS] Ordinal Logistic Regression Analysis

This post covers how to perform ordinal logistic regression analysis with SPSS.

    1. Preparation

    For the analysis, we will use the 'demo.sav' file, one of the example datasets that SPSS provides.

    The research question is whether 1) gender, 2) level of education, or 3) age is associated with job satisfaction.

    Since 'gender' is a string variable (gender: f = Female, m = Male), we will recode it into a numeric variable (female: 0 = Male, 1 = Female).

    RECODE gender ('f'=1) ('m'=0) (MISSING=SYSMIS) INTO female.
    VARIABLE LABELS  female 'Gender (0 = Male, 1 = Female)'.
    VALUE LABELS female 0 'Male', 1 'Female'.
    FORMATS female (f8.0).
    EXECUTE.
    

    2. Assumption check

    Ordinal outcome: the dependent variable should be measured as an ordinal variable.

    Multicollinearity: we should check the VIF; one way is to run a linear regression analysis with the same predictors and request collinearity diagnostics, as sketched below.
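
    A minimal sketch of that check, assuming the variables used in the PLUM command of section 3 (jobsat, female, ed, age); the ordinal outcome is treated as scale here purely so the VIF values can be computed.

    * Linear regression run only to obtain collinearity diagnostics.
    * TOL adds tolerance and VIF to the coefficients table; COLLIN adds the diagnostics table.
    REGRESSION
      /STATISTICS COEFF R ANOVA TOL COLLIN
      /DEPENDENT jobsat
      /METHOD=ENTER female ed age.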

    Parallel lines: the effect of each independent variable should be the same across the levels of the outcome variable (the proportional odds assumption).


    3. Analysis

    Analyze - Regression - Ordinal



    Put the dependent variable into 'Dependent', categorical independent variables into 'Factor(s)', and numeric independent variables into 'Covariate(s)'.

    Here, level of education is measured as a scale variable, but we can treat it as a categorical variable.

    Similarly, job satisfaction is also measured as a scale variable, but we can treat it as an ordinal variable.



    Under the 'Output' option, we want to check the box for 'Test of parallel lines' to test the assumption.

    Below is the syntax.

    PLUM jobsat BY female ed WITH age
      /CRITERIA=CIN(95) DELTA(0) LCONVERGE(0) MXITER(100) MXSTEP(5) PCONVERGE(1.0E-6) SINGULAR(1.0E-8)
      /LINK=LOGIT
      /PRINT=FIT PARAMETER SUMMARY TPARALLEL.
    

    4. Interpretation

    We want to look at the very last table first, which shows us the results of the test of parallel lines.

    A p-value below .05 means that the data likely violate the parallel lines assumption.

    In such a scenario, we would consider using multinomial logistic regression analysis.
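
    If it comes to that, below is a minimal sketch of the fallback model, assuming the same variables as the PLUM command above; note that treating the outcome as nominal discards the ordering information, which is the trade-off of this fallback.

    * Hypothetical fallback: multinomial model with the highest outcome category as reference.
    NOMREG jobsat (BASE=LAST ORDER=ASCENDING) BY female ed WITH age
      /PRINT=PARAMETER SUMMARY LRT FIT.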

    For the sake of explanation, we will assume that the data met the parallel lines assumption (p-value higher than .05) and continue with the interpretation.

    At the very top of the SPSS output, we see a warning message.

    The message warns us that some cells are empty, which relates to the issue of sparseness.

    For example, for those who are male, did not complete high school, and are 18 years old, we only observe the job satisfaction value 'Somewhat satisfied' and no others.

    You can see the detailed information by checking 'Cell information' in the 'Output' dialog (the same place where we checked 'Test of parallel lines'), which produces a long, intricate crosstab.
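
    In syntax, this corresponds to adding the CELLINFO keyword to the /PRINT subcommand of the earlier PLUM command:

    PLUM jobsat BY female ed WITH age
      /LINK=LOGIT
      /PRINT=CELLINFO FIT PARAMETER SUMMARY TPARALLEL.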

    We then have the case processing summary where we can check the number of valid cases.

    For the Model Fitting Information table, we want a p-value less than .05, which means that the model fits better than the null model (intercept-only model).

    For the Goodness-of-Fit table, however, we want a p-value above .05.

    In this case, it is less than .05, so we should consider rebuilding the model; note, however, that these chi-square statistics can be unreliable when many cells are empty, as the warning above indicated.

    Personally, I do not pay much attention to the pseudo R-squared unless it is strikingly small; it is mainly useful when comparing multiple models.

    Lastly, the parameter estimates table shows the significance (p-value) and the log odds ratio for each predictor.

    The log odds ratios (which are different from odds ratios) indicate a positive association with the dependent variable when they have a + sign, and a negative association when they have a - sign.

    In terms of gender, we found no evidence of a statistically significant difference in job satisfaction.

    Compared to people with a post-undergraduate degree (level of education = 5), people who did not finish high school (level of education = 1) have a higher likelihood of falling at a higher level of job satisfaction (log OR: 0.435, 95% CI: 0.228-0.642, p < .001).

    We can see that higher age is associated with higher job satisfaction.

    Specifically, for every one-unit (year) increase in age, there is a predicted increase of 0.048 in the log odds of falling at a higher level of job satisfaction (a log odds ratio of 0.048 corresponds to an odds ratio of 1.0492).

    Note that the log odds ratio is not the odds ratio that we see in binary or multinomial logistic regression output.

    However, the log odds ratio value can be difficult to grasp, so we may want to transform it into a regular odds ratio.

    SPSS does not provide a convenient function for this transformation in the output tables, so we may use calculation websites such as the following: http://vassarstats.net/tabs_odds.html

    You can type in the log odds and then click 'Calculate' to obtain the regular odds.

    In this way, you can convert all the log odds to regular odds, including the 95% CI.

    For example, we can say that compared to people with a post-undergraduate degree (level of education = 5), people who did not finish high school (level of education = 1) have a higher likelihood of falling at a higher level of job satisfaction (OR: 1.55, 95% CI: 1.26-1.90, p < .001).

    For your information, the transformation here refers to solving ln(k) = x for k, that is, k = e^x, where x is the log odds ratio that SPSS provides and k is the regular odds ratio that we want.
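
    If you prefer to stay inside SPSS, one workaround is a small scratch dataset: type the log odds values in, apply the built-in EXP() function, and list the results. Below is a minimal sketch using the education estimate and its confidence bounds from above; 'logodds' and 'odds' are made-up variable names for the illustration.

    * Hypothetical scratch dataset holding the estimate and its 95% CI bounds.
    DATA LIST FREE / logodds.
    BEGIN DATA
    0.435 0.228 0.642
    END DATA.
    * EXP() computes e^x, turning each log odds ratio into an odds ratio.
    COMPUTE odds = EXP(logodds).
    EXECUTE.
    LIST.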
