[SPSS] Multinomial Logistic Regression Analysis

This post covers how to perform multinomial logistic regression analysis with SPSS.

    1. Preparation

    For the analysis, we will use the 'demo.sav' file, one of the example datasets that SPSS provides.

    The research question is whether 1) gender, 2) education level, or 3) age is associated with the price category of the primary vehicle that people use.

    The variable 'gender' is a string variable, so it needs to be recoded into a numeric variable (female: 0 = Male, 1 = Female).

    RECODE gender ('f'=1) ('m'=0) (MISSING=SYSMIS) INTO female.
    VARIABLE LABELS  female 'Gender (0 = Male, 1 = Female)'.
    VALUE LABELS female 0 'Male', 1 'Female'.
    FORMATS female (f8.0).
    EXECUTE.
    

    2. Assumption check

    Sparseness (sparsity): there should be enough cases in each cell of the outcome-by-predictor tables; if there are not, we should consider merging categories or even dropping some (a crosstab check is sketched below).
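
    As a quick check, the outcome can be cross-tabulated against the categorical predictors; a minimal sketch, assuming the variables carcat, female, and ed from this example:

    * Check the cell counts of the outcome against each categorical predictor.
    CROSSTABS
      /TABLES=carcat BY female ed
      /CELLS=COUNT.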

    Outliers: there should be no cases with extremely large or small values on the continuous predictors (a boxplot check is sketched below).
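
    For the continuous covariate, boxplots and extreme values can be inspected; a minimal sketch for 'age':

    * Boxplot and the highest/lowest cases of age.
    EXAMINE VARIABLES=age
      /PLOT=BOXPLOT
      /STATISTICS=DESCRIPTIVES EXTREME.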

    Multicollinearity: we should check the VIF; one way is to run a linear regression analysis with the same predictors, as sketched below.
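
    A minimal sketch, assuming the variables from this example; only the collinearity diagnostics (Tolerance and VIF) in the coefficients table are of interest here, not the linear model itself:

    * Linear regression used only to obtain the collinearity diagnostics (Tolerance/VIF).
    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
      /DEPENDENT carcat
      /METHOD=ENTER female ed age.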


    3. Analysis

    Analyze - Regression - Multinomial Logistic

    Put the dependent variable into 'Dependent', categorical independent variables into 'Factor(s)', and numeric independent variables into 'Covariate(s)'.

    Here, level of education is recorded as a scale variable, but we can treat it as a categorical variable.


    We can also change the reference category of the dependent variable (by default, the last category is used as the reference).
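
    For example, to make the first category the reference instead of the last, only the BASE keyword needs to change; a minimal sketch (the full pasted syntax is shown below):

    NOMREG carcat (BASE=FIRST ORDER=ASCENDING) BY female ed WITH age
      /INTERCEPT=INCLUDE
      /PRINT=PARAMETER SUMMARY LRT FIT.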


    Under 'Statistics', we can choose a few more options, and 'Goodness of fit' is recommended.


    Below is the syntax.

    NOMREG carcat (BASE=LAST ORDER=ASCENDING) BY female ed WITH age
      /CRITERIA CIN(95) DELTA(0) MXITER(100) MXSTEP(5) CHKSEP(20) LCONVERGE(0) PCONVERGE(0.000001) SINGULAR(0.00000001)
      /MODEL
      /STEPWISE=PIN(.05) POUT(0.1) MINEFFECT(0) RULE(SINGLE) ENTRYMETHOD(LR) REMOVALMETHOD(LR)
      /INTERCEPT=INCLUDE
      /PRINT=FIT PARAMETER SUMMARY LRT CPS STEP MFI.
    

    4. Interpretation

    The first result is the summary.

    For the Model Fitting Information, we want p less than .05, which means that the model is better than the null model (the intercept-only model).

    However, we want p above .05 for Goodness-of-fit. In this case, it is less than .05, so we should consider rebuilding the model.

    Personally, I do not care much about the pseudo-R-square unless it is very small or a model comparison is required.

    Lastly, the likelihood ratio tests tell us which variables make a significant contribution to the model in terms of explaining the outcome variable.


    The parameter estimates show the significance (p-value) and the odds ratio for each predictor.

    As age increases, people are less likely to use an economy vehicle rather than a luxury vehicle (OR: 0.92, 95% CI: 0.92-0.93, p < .001).

    Notice that a 2-year increase in age corresponds to multiplying the odds by 0.92^2 (about 0.85, i.e., roughly a 15% decrease), not to subtracting 8% twice.

    In terms of gender, we found no evidence of a statistically significant difference between men and women in the price category of their primary vehicle.

    Compared to people with a post-undergraduate degree (level of education = 5), people with a high school degree (level of education = 2) have 176% higher odds of using an economy vehicle rather than a luxury vehicle (OR: 2.76, 95% CI: 2.02-3.78, p < .001).
