Steps for Conducting Simple Linear Regression

Research question: Does Time since PhD predict Number of Publications?

Open SPSS:

  • Select ANALYZE -> REGRESSION -> LINEAR.
  • Move PUBS (dependent variable) into DEPENDENT and TIME (independent variable) into INDEPENDENT.

  • Click on Statistics -> Select CONFIDENCE INTERVALS and DESCRIPTIVES.
  • Click on Continue.
  • Click on Save -> Select Mean and Individual under Prediction Intervals.
  • Click on Continue -> Click on Paste to get the following syntax:

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA
  /CRITERIA=PIN(.05) POUT(.10) CIN(95)
  /NOORIGIN
  /DEPENDENT PUBS
  /METHOD=ENTER TIME
  /SAVE MCIN ICIN.

  • Run the syntax and you will get the following output:

The “Correlations” table gives the Pearson correlation coefficient and the p-value for a one-tailed test. To obtain the two-tailed p-value, double the one-tailed p-value. For this example, the two-tailed p-value is .008.

  • R is the bivariate correlation between X and Y in simple linear regression. It has the same value as the one in the table titled “Correlations”.
  • R SQUARE is the coefficient of determination, which is the proportion of the variability in Y explained by the set of predictor variables in the sample. Here, 43.1% of the variability in the number of publications is explained by time since PhD.
  • Adjusted R square is the estimated proportion of variability in the dependent variable explained by the set of independent variables in the population (adjusted for sample size and number of predictors).
  • Std. Error of the Estimate is the square root of the mean square error (MSE) reported in the following table titled “ANOVA”.
  • The estimates of the intercept (labeled (Constant)) and of the coefficient for years since PhD (i.e., the slope) appear in the B column under UNSTANDARDIZED COEFFICIENTS. The estimated regression line is Y = 4.731 + 1.983X.
  • Std. Error of an unstandardized coefficient is the estimated standard deviation of the sampling distribution of that regression coefficient. A t-statistic is used to test the significance of each regression coefficient, and the resulting p-value is reported in the Sig. column. In this example, the regression coefficient for time since PhD (i.e., the slope) is significantly different from zero (t(13) = 3.139, p = .008). The constant (i.e., the intercept) is not significantly different from zero (t(13) = 0.846, p = .413).
  • Standardized coefficient (Beta) is .657. Note that this is the same as the correlation (this is the case only for simple linear regression). B (unstandardized) and Beta (standardized) are related by Beta = B × (s_X / s_Y), where s_X and s_Y are the sample standard deviations of the predictor and the dependent variable. This relationship also holds for each predictor in multiple linear regression.
  • In the Data window, there are now four additional columns in the dataset. The two columns named LMCI_1 and UMCI_1 contain the lower and upper bounds of the confidence interval for the mean, and the two columns named LICI_1 and UICI_1 contain the lower and upper bounds of the prediction interval for an individual value. For example, when the number of years since PhD is 9 (see row 13 of the dataset), the 95% CI for the mean number of publications is 16.28 to 28.88, and the 95% PI for the number of publications is -1.63 to 46.78.
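
The figures reported in the output above can be cross-checked against one another. The following is a minimal Python sketch (not part of the SPSS workflow) that uses only the rounded values quoted in the text. The sample size n = 15 is an assumption inferred from the 13 residual degrees of freedom, and the variable names are illustrative.

```python
# Rounded values copied from the SPSS output described in the text
r = 0.657              # Pearson correlation (equals Beta in simple regression)
r_squared_reported = 0.431
t_slope = 3.139        # t statistic for the slope
b0, b1 = 4.731, 1.983  # intercept and slope
n = 15                 # ASSUMPTION: inferred from df = n - 2 = 13
k = 1                  # number of predictors

# R square is the squared correlation (small gap is due to rounding of r)
r_squared = r ** 2

# Adjusted R square penalizes for sample size and number of predictors
adj_r_squared = 1 - (1 - r_squared_reported) * (n - 1) / (n - k - 1)

# With a single predictor, the ANOVA F statistic equals the squared slope t
f_stat = t_slope ** 2

# Both saved intervals are centered on the predicted value at X = 9
y_hat_9 = b0 + b1 * 9
ci_mid = (16.28 + 28.88) / 2
pi_mid = (-1.63 + 46.78) / 2

print(round(r_squared, 3), round(adj_r_squared, 3), round(f_stat, 2))
print(round(y_hat_9, 2), round(ci_mid, 2), round(pi_mid, 2))
```

Small differences in the last digit are expected because the inputs are already rounded.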

Graph of Confidence and Prediction Bands

SPSS can produce a graph showing the confidence band around the regression line and the prediction band for individual values of Y using the following syntax:

IGRAPH
  /VIEWNAME='Scatterplot'
  /X1=VAR(TIME) TYPE=SCALE
  /Y=VAR(PUBS) TYPE=SCALE
  /COORDINATE=VERTICAL
  /FITLINE METHOD=REGRESSION LINEAR INTERVAL(95.0)=MEAN INDIVIDUAL LINE=TOTAL SPIKE=OFF
  /YLENGTH=5.2
  /X1LENGTH=6.5
  /CHARTLOOK='NONE'
  /SCATTER COINCIDENT=NONE.
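
Both bands are centered on the fitted line and differ only in width. The following is a minimal Python sketch, assuming the standard textbook formulas for simple linear regression, of how the two half-widths are related. The function and argument names are illustrative and are not SPSS labels. The second part backs out the residual standard error implied by the intervals reported for X = 9, using 2.160 as the two-sided 95% critical value of t with 13 degrees of freedom.

```python
import math

def band_half_widths(x, n, x_mean, sxx, s, t_crit):
    """Half-widths of the mean (confidence) and individual (prediction) bands at x.

    n: sample size, x_mean: mean of X, sxx: sum of squared deviations of X,
    s: residual standard error, t_crit: critical t value with n - 2 df.
    """
    leverage = 1.0 / n + (x - x_mean) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(leverage)        # band for the mean of Y
    pi_half = t_crit * s * math.sqrt(1.0 + leverage)  # band for one new Y
    return ci_half, pi_half

# Intervals reported in the text for X = 9
T_CRIT = 2.160                       # t critical value, 13 df, 95% two-sided
ci_half_9 = (28.88 - 16.28) / 2      # half-width of the CI for the mean
pi_half_9 = (46.78 - (-1.63)) / 2    # half-width of the prediction interval

# From the formulas above: pi_half^2 - ci_half^2 = (t_crit * s)^2
s_implied = math.sqrt(pi_half_9 ** 2 - ci_half_9 ** 2) / T_CRIT
print(round(s_implied, 2))
```

Up to rounding, the implied value should agree with the Std. Error of the Estimate in the Model Summary table. The prediction band is always wider than the confidence band because it adds the variability of an individual observation to the uncertainty in the fitted line.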

Write-up (APA Format):

A simple linear regression was conducted to investigate how well the number of years since PhD predicts the number of publications. The results were statistically significant, F(1, 13) = 9.86, p < .01. The estimated regression equation was: number of publications = 4.73 + 1.98 × (number of years since PhD). The adjusted R² was .387, indicating that 38.7% of the variance in the number of publications was explained by the number of years since PhD. According to Cohen (1988), this is a large effect.
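
One way to make the Cohen (1988) classification concrete is to convert R² to Cohen's f² = R² / (1 − R²) and compare it with the conventional benchmarks of .02 (small), .15 (medium), and .35 (large). The following is a minimal Python sketch; the function names are illustrative.

```python
def cohen_f2(r_squared):
    """Cohen's f-squared effect size for a regression R-squared."""
    return r_squared / (1.0 - r_squared)

def effect_label(f2):
    """Label f-squared using Cohen's (1988) conventional benchmarks."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"

f2_adjusted = cohen_f2(0.387)  # based on the adjusted R-squared in the write-up
print(round(f2_adjusted, 3), effect_label(f2_adjusted))
```

Using the unadjusted R² of .431 gives a larger f², so the effect is classified as large in either case.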