Durbin Watson Test: Formula, Interpretation, R, Python, SPSS and Excel Guide

Regression Diagnostics and Residual Autocorrelation

The Durbin Watson Test is a regression diagnostic used to check whether consecutive residuals are autocorrelated. This guide explains the Durbin Watson statistic, its formula, the null hypothesis, the 0–4 interpretation scale, critical-value logic, the R, Python, SPSS and Excel workflows, the diagnostic charts, and verified results from the student-por.csv dataset.

Quick Answer: Durbin Watson Test Result

A Durbin Watson Test was conducted on residuals from the main G3 regression model. The verified statistic was d = 1.8615. The lag-1 residual correlation was about 0.0685. Since the statistic is close to 2 and remains inside the common 1.5 to 2.5 rule-of-thumb range, the model does not show serious first-order autocorrelation.

Main outcome: G3
Sample size: 649
DW statistic: 1.8615
Conclusion: No serious issue

Final report sentence: A Durbin Watson Test was used to examine first-order autocorrelation in the residuals of the G3 regression model. The result was d = 1.8615, with lag-1 residual correlation of about 0.0685. Because the statistic is near 2 and falls within the 1.5–2.5 rule-of-thumb range, the residuals do not show serious first-order autocorrelation.

What Is the Durbin Watson Test?

The Durbin Watson Test is used after fitting a regression model. It examines whether the residual from one observation is related to the residual from the previous observation. In time-series regression, this matters because errors may follow a pattern across time. If residuals are positively autocorrelated, a positive error today may be followed by another positive error tomorrow. If residuals are negatively autocorrelated, a positive error may be followed by a negative error.

Most short online explanations stop at the statement “2 means no autocorrelation.” That is not enough for a serious data-analysis post. A better explanation must show the regression model, residuals, formula components, lagged residual plot, residual ACF, software code, and a careful warning about observation order. This guide does that using R, Python and SPSS verification.

Important statistical note: this test is most meaningful when observations have a real order, such as time order, production order, spatial route order or another sequence. The student-por.csv dataset is cross-sectional, so the row order is used here as a reproducible teaching order. That makes this a useful tutorial example, but not proof of a real time-series process.

If you are learning normality and variance-testing workflows alongside regression diagnostics, you may also find these related guides useful: the D'Agostino-Pearson test for skewness-kurtosis normality checking, the Cramér-von Mises test for goodness of fit, Cochran's C test for checking the largest variance, and the Brown-Forsythe test for robust equality-of-variance analysis.

Durbin Watson Test Formula

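The Durbin Watson statistic is calculated from regression residuals. If the residual at observation t is written as e[t] and there are n ordered observations, the statistic is:

d = Σ(e[t] - e[t-1])² / Σe[t]²

The sum in the numerator runs over t = 2, ..., n (each residual minus the one before it), and the sum in the denominator runs over t = 1, ..., n.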

The numerator measures how much consecutive residuals change. The denominator measures the total squared residual size. If residuals change randomly around zero, the statistic is usually close to 2. If consecutive residuals are too similar, the numerator becomes smaller and the statistic moves below 2. If consecutive residuals tend to alternate signs, the numerator becomes larger and the statistic moves above 2.
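The behaviour described above follows from the approximation d ≈ 2(1 − r), where r is the lag-1 residual correlation. The following is a minimal, dependency-free sketch that checks this on simulated series. The function names and the AR(1) simulation are illustrative assumptions, not part of the verified student-por.csv analysis.

```python
import random

def durbin_watson(residuals):
    """d = sum of squared consecutive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def lag1_correlation(residuals):
    """Pearson correlation between each residual and the one before it."""
    x, y = residuals[:-1], residuals[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_ar1(n, rho, seed):
    """Simulated series e[t] = rho * e[t-1] + noise (hypothetical data only)."""
    rng = random.Random(seed)
    e = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0, 1))
    return e

# Positive, zero and negative dependence between neighbours
for rho in (0.8, 0.0, -0.8):
    e = simulate_ar1(2000, rho, seed=1)
    d = durbin_watson(e)
    r = lag1_correlation(e)
    print(f"rho={rho:+.1f}  d={d:.3f}  2*(1-r)={2 * (1 - r):.3f}")
```

In this sketch, strongly positive dependence pushes d well below 2, strongly negative dependence pushes it toward 4, and independent noise leaves it near 2.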

DW statistic range | General interpretation | Practical meaning
Near 2 | No clear first-order autocorrelation | Consecutive residuals are not strongly related.
Below 2 | Possible positive autocorrelation | Residuals may move in the same direction across adjacent observations.
Above 2 | Possible negative autocorrelation | Residuals may alternate direction across adjacent observations.
0 to 4 | Full possible range | 0 is extreme positive serial correlation; 4 is extreme negative serial correlation.

Rule-of-thumb interpretation

A common practical rule is that values between about 1.5 and 2.5 usually do not indicate a serious first-order autocorrelation problem. This rule is helpful for a quick diagnostic, but formal decisions should consider the number of observations, number of predictors, model structure and critical values.

Critical-value interpretation

The formal table method uses lower and upper bounds, often called dL and dU. For a positive autocorrelation test, if the statistic is below dL, there is evidence of positive autocorrelation. If it is above dU, the test does not reject no autocorrelation. If it falls between dL and dU, the result is inconclusive. For negative autocorrelation, the same logic is applied to 4 − d.
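The bounds logic can be written as a small decision function. This is a sketch only: the function name is hypothetical, and the bounds used in the example calls (dL = 1.60, dU = 1.70) are placeholders chosen for illustration, not values taken from a Durbin Watson table. Real bounds depend on the sample size, the number of predictors and the significance level.

```python
def dw_bounds_decision(d, d_lower, d_upper):
    """Classify d with table bounds dL and dU, covering both directions.

    The negative side applies the same bounds to 4 - d, which is
    equivalent to comparing d with 4 - dU and 4 - dL.
    """
    if not 0 <= d <= 4:
        raise ValueError("d must lie between 0 and 4")
    if d < d_lower:
        return "evidence of positive autocorrelation"
    if d <= d_upper:
        return "inconclusive (positive side)"
    if d < 4 - d_upper:
        return "do not reject no autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive (negative side)"
    return "evidence of negative autocorrelation"

# Placeholder bounds for illustration only (not from a published table)
D_LOWER, D_UPPER = 1.60, 1.70
for d in (1.2, 1.65, 2.0, 2.35, 2.9):
    print(d, "->", dw_bounds_decision(d, D_LOWER, D_UPPER))
```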

Durbin Watson Test Null Hypothesis and Alternative Hypothesis

Hypothesis | Meaning | Applied to this example
H0 | The regression disturbances have no first-order autocorrelation (ρ = 0). | The errors of the G3 regression are uncorrelated from one row to the next.
H1 | The regression disturbances have first-order autocorrelation (ρ ≠ 0 for a two-sided test). | The errors of the G3 regression show positive or negative serial dependence.

Dataset and Regression Model Used

This worked example uses the student-por.csv student performance dataset. The main outcome variable is G3, the final grade. The regression model predicts G3 from earlier grades, study-related variables and family/academic background variables.

Item | Verified value | Explanation
Rows | 649 | Total student observations used in the model.
Main outcome | G3 | Final grade, ranging from 0 to 20.
Main model predictors | G1, G2, studytime, failures, absences, age, Medu, Fedu | Academic, study and background predictors.
Order used | Existing row order | Used as a reproducible demonstration order because the dataset is cross-sectional.

External dataset source: UCI Machine Learning Repository: Student Performance dataset.

Verified Results in R, Python and SPSS

The analysis was reproduced in three environments. R used an OLS model and the lmtest workflow. Python used statsmodels OLS residuals and a permutation check. SPSS imported the clean CSV, saved predicted values and residuals, and manually computed the statistic from consecutive residual differences.

Main Model Result

Software | DW statistic | Lag-1 residual correlation | p-value / diagnostic result | Interpretation
R | 1.8615 | 0.0686 | lmtest two-sided p ≈ 0.0672 | No serious first-order autocorrelation by the 1.5–2.5 rule.
Python | 1.861535 | 0.068590 | Permutation two-sided p ≈ 0.070746 | No serious first-order autocorrelation by the 1.5–2.5 rule.
SPSS | 1.861535 | 0.068491 | Manual SPSS residual calculation | No serious first-order autocorrelation by the 1.5–2.5 rule.

SPSS Regression Model Summary

Model statistic | Value | Meaning
R | 0.922 | Strong association between fitted and observed G3.
R Square | 0.851 | The model explains about 85.1% of G3 variation.
Adjusted R Square | 0.849 | Explanatory power after adjusting for the number of predictors.
Std. Error of Estimate | 1.256 | Typical prediction error size in grade units.
F statistic | 456.111 | Overall model significance test.
Model Sig. | .000 | SPSS display for p < .001; the model is statistically significant overall.

Main Regression Coefficients from SPSS

Predictor | B | Std. Error | Beta | t | Sig. | Short interpretation
Constant | -0.501 | 0.774 | | -0.648 | 0.518 | Not significant.
G1 | 0.143 | 0.037 | 0.122 | 3.910 | 0.000 | Earlier grade G1 positively predicts G3.
G2 | 0.885 | 0.034 | 0.798 | 25.744 | 0.000 | G2 is the strongest predictor of G3.
studytime | 0.097 | 0.062 | 0.025 | 1.556 | 0.120 | Not significant after controlling for other variables.
failures | -0.235 | 0.095 | -0.043 | -2.471 | 0.014 | Previous failures negatively predict G3.
absences | 0.023 | 0.011 | 0.033 | 2.085 | 0.038 | Small positive coefficient in this controlled model.
age | 0.023 | 0.044 | 0.009 | 0.520 | 0.604 | Not significant.
Medu | -0.045 | 0.058 | -0.016 | -0.776 | 0.438 | Not significant.
Fedu | 0.022 | 0.059 | 0.007 | 0.371 | 0.711 | Not significant.

Model Comparison

Model | Predictors | DW statistic | Interpretation
Main model | G1, G2, studytime, failures, absences, age, Medu, Fedu | 1.861535 | Close to 2; no serious first-order autocorrelation.
Simple model | G1, G2 | 1.851560 | Still close to 2; no serious first-order autocorrelation.
Background model | studytime, failures, absences, age, Medu, Fedu, traveltime, health | 1.807975 | Slightly more positive residual dependence, but still not severe by the rule of thumb.

Durbin Watson Test Charts and Interpretation

1. Actual vs Fitted G3

Figure 1. Actual versus fitted G3 values for the regression model used before checking residual autocorrelation.

This chart shows whether the regression model predicts G3 reasonably well. The points follow the diagonal trend closely, which agrees with the high R Square value of about 0.851. Some low-grade outliers remain, but the model captures most of the final-grade pattern.

2. Regression Residuals by Observation Order

Figure 2. Regression residuals plotted in the row order used for the diagnostic.

The residuals fluctuate around zero, which is what we want to see. There are some spikes, especially around outlier cases, but the plot does not show a strong smooth trend. This supports the conclusion that there is no serious first-order residual autocorrelation.

3. Consecutive Residual Differences

Figure 3. Consecutive residual differences, e[t] − e[t−1], used in the numerator of the statistic.

This chart shows how much each residual changes from the previous residual. Most changes remain near zero, with occasional large jumps. These large jumps contribute to the numerator of the statistic, but they do not create a severe autocorrelation pattern by themselves.

4. Durbin Watson Numerator Contributions

Figure 4. Squared consecutive residual differences showing which adjacent observations contribute most to the numerator.

The tallest spikes identify observation positions where adjacent residuals changed sharply. This chart is useful because it opens the formula visually: the statistic is not a black-box number; it is built from these consecutive residual changes divided by total squared residuals.
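The same idea can be expressed in code: list the squared consecutive differences and report which positions contribute most to the numerator. This is a minimal sketch with a hypothetical function name and made-up toy residuals, not the residuals from the G3 model.

```python
def numerator_contributions(residuals, top=3):
    """Return (position, squared difference, share of numerator) for the largest jumps.

    Position is 1-based and refers to the later observation of each adjacent pair.
    """
    squared = [(t, (residuals[t] - residuals[t - 1]) ** 2)
               for t in range(1, len(residuals))]
    total = sum(value for _, value in squared)
    largest = sorted(squared, key=lambda item: item[1], reverse=True)[:top]
    return [(t + 1, value, value / total) for t, value in largest]

# Toy residuals with one sharp spike at the fourth observation (illustrative only)
toy_residuals = [0.5, -0.2, 0.1, 3.0, -0.4, 0.3]
for position, value, share in numerator_contributions(toy_residuals, top=2):
    print(f"observation {position}: squared change {value:.2f}, share {share:.1%}")
```

In the toy data, the jump into the spike and the jump out of it dominate the numerator, which is the pattern the tallest bars in the chart represent.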

5. Lagged Residual Scatter

Figure 5. Current residual plotted against the previous residual.

This is one of the most important plots. If strong positive autocorrelation existed, the points would form a clear upward-sloping pattern. Here the fitted trend is only slightly upward, matching the small lag-1 residual correlation of about 0.0685.

6. Durbin Watson Statistic Across Regression Models

Figure 6. DW statistics for the main model, G1+G2 model and background model.

All three models produce statistics below 2 but still near 2. The background model has the lowest value, suggesting slightly more positive dependence, but all three remain within the common 1.5–2.5 range.

7. Permutation Null Distribution

Figure 7. Permutation null distribution showing the observed statistic slightly left of the DW = 2 reference.

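The observed statistic is a little left of 2, which is consistent with weak positive autocorrelation. However, it is not extremely far from the center of the null distribution. The Python permutation two-sided p-value of about 0.0707 agrees closely with the R two-sided p-value of about 0.0672. Both are above 0.05, so neither two-sided test rejects the null hypothesis of no first-order autocorrelation at the 5% level.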
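A permutation check of this kind can be sketched as follows. The guide does not list the exact permutation code that produced p ≈ 0.0707, so everything here is an assumption made for illustration: the function names, the number of shuffles, and in particular the two-sided definition, which counts a shuffle as extreme when its statistic is at least as far from 2 as the observed one.

```python
import random

def durbin_watson(residuals):
    """d = sum of squared consecutive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e ** 2 for e in residuals)

def permutation_dw_test(residuals, n_perm=2000, seed=0):
    """Return (observed d, two-sided permutation p-value).

    Shuffling destroys any order dependence, so the shuffled statistics
    approximate the distribution of d when the order carries no information.
    """
    rng = random.Random(seed)
    observed = durbin_watson(residuals)
    shuffled = list(residuals)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(durbin_watson(shuffled) - 2) >= abs(observed - 2):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    return observed, (extreme + 1) / (n_perm + 1)

# Illustrative input: a strictly alternating series (extreme negative dependence)
alternating = [1.0, -1.0] * 50
d, p = permutation_dw_test(alternating)
print(f"d = {d:.3f}, permutation p = {p:.4f}")
```

To apply the sketch to a fitted model, pass the residuals in their diagnostic order, for example list(model.resid) from statsmodels.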

8. Residual Autocorrelation by Lag

Figure 8. Residual autocorrelation values from lag 1 to lag 20.

The lag bars are mostly small. A few lags show mild positive values, but the chart does not show a very strong autocorrelation pattern. Since the statistic mainly targets lag-1 autocorrelation, the first bar is especially important and remains small.
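The lag bars in such a chart can be computed directly from the residuals. The sketch below uses the standard sample autocorrelation estimator (lagged products of mean-centered residuals divided by the total sum of squares), which can differ slightly from the Pearson correlation of lagged pairs. The function names and the ±1.96/√n reference band are assumptions for illustration; the band is a common large-sample approximation under independent residuals.

```python
import math

def residual_acf(residuals, max_lag=20):
    """Sample autocorrelation of the residuals for lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    c0 = sum(c * c for c in centered)
    return [sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def lags_outside_band(acf_values, n, z=1.96):
    """Lags whose autocorrelation exceeds the approximate +/- z / sqrt(n) band."""
    band = z / math.sqrt(n)
    return [k for k, r in enumerate(acf_values, start=1) if abs(r) > band]

# Toy series with strong alternation (illustrative only)
toy = [1.0, -1.0] * 10
acf_values = residual_acf(toy, max_lag=3)
print([round(r, 2) for r in acf_values])
print("lags outside band:", lags_outside_band(acf_values, len(toy)))
```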

How to Run the Durbin Watson Test in R, Python, SPSS and Excel

Durbin Watson Test in R

In R, the easiest method is to fit a linear model and use dwtest() from the lmtest package. You can also manually compute the statistic from residuals.

# Install lmtest once if it is not already available
install.packages("lmtest")
library(lmtest)

# The UCI file is semicolon-delimited
student <- read.csv("student-por.csv", sep = ";", stringsAsFactors = FALSE)

# Make sure every model variable is numeric
num_cols <- c("G1", "G2", "G3", "studytime", "failures",
              "absences", "age", "Medu", "Fedu")
student[num_cols] <- lapply(student[num_cols], as.numeric)

# Main OLS model; residuals follow the row order of the file
model <- lm(G3 ~ G1 + G2 + studytime + failures + absences + age + Medu + Fedu,
            data = student)

# Formal test: two-sided first, then one-sided for positive autocorrelation
dwtest(model, alternative = "two.sided")
dwtest(model, alternative = "greater")

# Manual check of the statistic and the lag-1 residual correlation
res <- resid(model)
dw_manual <- sum(diff(res)^2) / sum(res^2)
lag1_cor <- cor(res[-length(res)], res[-1])

dw_manual
lag1_cor

Durbin Watson Test in Python

In Python, use statsmodels to fit the regression and calculate the Durbin Watson statistic from the residuals.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# The UCI file is semicolon-delimited
student = pd.read_csv("student-por.csv", sep=";")

# Coerce model variables to numeric; unparseable entries become NaN
cols = ["G1", "G2", "G3", "studytime", "failures", "absences", "age", "Medu", "Fedu"]
for col in cols:
    student[col] = pd.to_numeric(student[col], errors="coerce")

# Drop incomplete rows while keeping the original row order
data = student[cols].dropna()

y = data["G3"]
X = data[["G1", "G2", "studytime", "failures", "absences", "age", "Medu", "Fedu"]]
X = sm.add_constant(X)  # add the intercept column

model = sm.OLS(y, X).fit()
residuals = model.resid

dw_statistic = durbin_watson(residuals)
# Series.corr aligns on index labels, so residuals[:-1].corr(residuals[1:])
# would correlate each residual with itself; autocorr shifts the series first
lag1_correlation = residuals.autocorr(lag=1)

print(model.summary())
print("Durbin Watson statistic:", dw_statistic)
print("Lag-1 residual correlation:", lag1_correlation)

Durbin Watson Test in SPSS

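SPSS can report the statistic directly in the Model Summary table when the REGRESSION command includes the /RESIDUALS DURBIN subcommand (the Durbin-Watson checkbox under Statistics in the Linear Regression dialog). The verified workflow below instead saves the residuals and calculates the statistic manually.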

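* Fit the main model and save predicted values and residuals.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF R ANOVA COLLIN
  /DEPENDENT G3
  /METHOD=ENTER G1 G2 studytime failures absences age Medu Fedu
  /SAVE PRED(prd_main) RESID(res_main).

* Restore the original row order (caseid is assumed to hold each case's row number).
SORT CASES BY caseid(A).
EXECUTE.

* Build the lagged residual, the consecutive difference and the squared terms.
COMPUTE lag_res_main = LAG(res_main).
IF ($CASENUM = 1) lag_res_main = $SYSMIS.
COMPUTE res_diff_main = res_main - lag_res_main.
COMPUTE res_diff2_main = res_diff_main ** 2.
COMPUTE res2_main = res_main ** 2.
EXECUTE.

* Sum the squared terms over all cases and attach the totals to every row.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /n_main = N(res_main)
  /sum_diff2_main = SUM(res_diff2_main)
  /sum_res2_main = SUM(res2_main).

* Durbin Watson statistic: summed squared differences over summed squared residuals.
COMPUTE dw_main = sum_diff2_main / sum_res2_main.
EXECUTE.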

Durbin Watson Test in Excel

Excel does not need a special function for the statistic. You can compute it directly from a residual column.

Excel column | Content | Example formula
A | Observation order | 1, 2, 3, …
B | Actual G3 | Observed final grade
C | Fitted G3 | Predicted final grade from regression
D | Residual | =B2-C2
E | Residual difference | =D3-D2 (starts in row 3, because the first residual has no predecessor)
F | Squared residual difference | =E3^2
G | Squared residual | =D2^2
Any empty cell | Durbin Watson statistic | =SUM(F3:F650)/SUM(G2:G650)

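For this dataset, with a header in row 1 and the 649 observations in rows 2 to 650, the Excel calculation should return approximately 1.8615 if the same residuals and same observation order are used. As a cross-check that skips the helper columns, =SUMXMY2(D3:D650,D2:D649)/SUMSQ(D2:D650) computes the same ratio directly from the residual column.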

How to Report the Result

A good report should mention the regression model, the statistic, the direction of possible autocorrelation, and the final interpretation. Avoid writing only “DW = 1.86” without explaining what it means.

APA-style report: A Durbin Watson Test was conducted to evaluate first-order autocorrelation in the residuals of a regression model predicting G3 final grade from G1, G2, studytime, failures, absences, age, mother’s education and father’s education. The statistic was d = 1.8615. This value is close to 2 and falls within the 1.5–2.5 rule-of-thumb range, indicating no serious first-order autocorrelation in the residuals.

Plain-language report: The regression residuals do not show a serious pattern from one observation to the next. The statistic is slightly below 2, so the residuals have weak positive dependence, but not enough to suggest a major autocorrelation problem.

When Should You Use This Test?

Use this diagnostic when your regression residuals have a meaningful order. The most common case is time-series regression, where observations are arranged by day, month, year or another time unit. It can also be used when observations follow a production sequence, repeated measurement order, route order or another meaningful sequence.

Situation | Use it? | Reason
Monthly sales regression | Yes | Months have a natural order.
Stock return regression | Yes, with caution | Financial observations are time ordered.
Machine output by production order | Yes | Production sequence may create correlated errors.
Random cross-sectional survey | Usually no | Row order may be arbitrary and not meaningful.
Model with lagged dependent variable | Use caution | Other diagnostics such as Breusch-Godfrey may be more appropriate.

Common Mistakes

1. Using row order without explaining it

The statistic depends on the order of residuals. If the row order has no real meaning, the result can be misleading. In this guide, the row order is clearly described as a demonstration order.
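The dependence on order is easy to demonstrate. In the sketch below, the same simulated values produce very different statistics depending only on how they are arranged. The data are random numbers generated for illustration, not the G3 residuals, and the function name is hypothetical.

```python
import random

def durbin_watson(residuals):
    """d = sum of squared consecutive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e ** 2 for e in residuals)

# Independent noise: the original order carries no information
rng = random.Random(42)
values = [rng.gauss(0, 1) for _ in range(500)]

d_original = durbin_watson(values)        # expected to be near 2
d_sorted = durbin_watson(sorted(values))  # neighbours become almost identical

print(f"original order: d = {d_original:.3f}")
print(f"sorted order:   d = {d_sorted:.3f}")
```

Sorting does not change a single value, yet it drives the statistic toward 0. A dataset that has been sorted by the outcome or by a strong predictor can therefore show apparent autocorrelation that reflects only the arrangement of the rows.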

2. Treating 1.5 to 2.5 as a universal law

The 1.5–2.5 range is a practical rule of thumb, not a substitute for formal critical values or a p-value from a proper test implementation.

3. Ignoring visual diagnostics

A statistic alone is not enough. Residual sequence plots, lagged residual scatter plots and ACF charts help explain what the statistic is detecting.

4. Forgetting that the test targets first-order autocorrelation

The statistic mainly focuses on lag-1 residual autocorrelation. If you suspect higher-order serial correlation, use additional tests such as Breusch-Godfrey or Ljung-Box.

5. Confusing autocorrelation with model fit

A model can have a high R Square and still have autocorrelated residuals. Similarly, a model can have weak fit but no residual autocorrelation. These are different diagnostics.

Download SPSS Output and Verification Files

The SPSS PDF verifies the main model import, regression output, saved residuals and manual calculation of the statistic.

FAQs About the Durbin Watson Test

What does the Durbin Watson Test check?

It checks whether consecutive regression residuals are autocorrelated, especially at lag 1.

What does a Durbin Watson statistic of 2 mean?

A value close to 2 usually indicates no serious first-order autocorrelation in the residuals.

What does a value below 2 mean?

A value below 2 suggests possible positive autocorrelation, meaning adjacent residuals may tend to move in the same direction.

What does a value above 2 mean?

A value above 2 suggests possible negative autocorrelation, meaning adjacent residuals may tend to alternate direction.

What was the result in this example?

The main model produced d = 1.8615, so the residuals did not show serious first-order autocorrelation by the 1.5–2.5 rule-of-thumb interpretation.

Can the test be done in R?

Yes. In R, it can be run with the lmtest package using the dwtest() function, or calculated manually from residuals.

Can the test be done in Python?

Yes. In Python, statsmodels provides a durbin_watson function for regression residuals.

Can the test be done in SPSS?

Yes. SPSS can save regression residuals and the statistic can be calculated from those residuals, as shown in this guide.

Can the test be done in Excel?

Yes. If residuals are available, Excel can calculate the statistic using the squared differences of consecutive residuals divided by total squared residuals.

Is the test suitable for cross-sectional data?

It is most meaningful when observations have a real order. For ordinary cross-sectional data, row order may be arbitrary, so the result must be interpreted cautiously.

About the author

Online Internet Cafe publishes practical guides for statistics, research methods, data analysis tools and ethical project support.
