Regression Diagnostics and Residual Autocorrelation
The Durbin Watson Test is a regression diagnostic used to check whether consecutive residuals are autocorrelated. This guide explains the Durbin Watson statistic, its formula, the null hypothesis, the 0–4 interpretation, critical-value logic, and workflows in R, Python, SPSS and Excel, with charts and verified results from the student-por.csv dataset.
Quick Answer: Durbin Watson Test Result
A Durbin Watson Test was conducted on residuals from the main G3 regression model. The verified statistic was d = 1.8615. The lag-1 residual correlation was about 0.0685. Since the statistic is close to 2 and remains inside the common 1.5 to 2.5 rule-of-thumb range, the model does not show serious first-order autocorrelation.
Final report sentence: A Durbin Watson Test was used to examine first-order autocorrelation in the residuals of the G3 regression model. The result was d = 1.8615, with lag-1 residual correlation of about 0.0685. Because the statistic is near 2 and falls within the 1.5–2.5 rule-of-thumb range, the residuals do not show serious first-order autocorrelation.
What Is the Durbin Watson Test?
The Durbin Watson Test is used after fitting a regression model. It examines whether the residual from one observation is related to the residual from the previous observation. In time-series regression, this matters because errors may follow a pattern across time. If residuals are positively autocorrelated, a positive error today may be followed by another positive error tomorrow. If residuals are negatively autocorrelated, a positive error may be followed by a negative error.
Most short online explanations stop at the statement “2 means no autocorrelation.” That is not enough for a serious data-analysis post. A better explanation must show the regression model, residuals, formula components, lagged residual plot, residual ACF, software code, and a careful warning about observation order. This guide does that using R, Python and SPSS verification.
Important statistical note: this test is most meaningful when observations have a real order, such as time order, production order, spatial route order or another sequence. The student-por.csv dataset is cross-sectional, so the row order is used here as a reproducible teaching order. That makes this a useful tutorial example, but not proof of a real time-series process.
If you are learning normality and variance-testing workflows alongside regression diagnostics, these related guides may also be useful: the D'Agostino-Pearson test for skewness-kurtosis normality checking, the Cramér-von Mises test for goodness of fit, Cochran's C test for checking the largest variance, and the Brown-Forsythe test for robust equality-of-variance analysis.
Durbin Watson Test Formula
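The Durbin Watson statistic is calculated from regression residuals. If the residual at observation t is written as e[t] and the model is fitted to n ordered observations, the statistic is:
d = Σ(e[t] - e[t-1])² / Σe[t]²
Here the numerator sums over t = 2, …, n and the denominator sums over t = 1, …, n.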
The numerator measures how much consecutive residuals change. The denominator measures the total squared residual size. If residuals change randomly around zero, the statistic is usually close to 2. If consecutive residuals are too similar, the numerator becomes smaller and the statistic moves below 2. If consecutive residuals tend to alternate signs, the numerator becomes larger and the statistic moves above 2.
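As a minimal sketch of how the formula behaves, the snippet below computes the statistic for simulated error series. The helper names `durbin_watson_stat` and `simulate_ar1`, the coefficients ±0.7 and the seed are illustrative assumptions, not part of the verified workflow in this guide.

```python
import random


def durbin_watson_stat(residuals):
    """Sum of squared consecutive differences over the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def simulate_ar1(rho, n=2000, seed=0):
    """Hypothetical helper: e[t] = rho * e[t-1] + standard normal noise."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        errors.append(rho * errors[-1] + rng.gauss(0.0, 1.0))
    return errors


d_positive = durbin_watson_stat(simulate_ar1(0.7))      # neighbours are similar
d_independent = durbin_watson_stat(simulate_ar1(0.0))   # no dependence
d_negative = durbin_watson_stat(simulate_ar1(-0.7))     # neighbours alternate

print(round(d_positive, 2), round(d_independent, 2), round(d_negative, 2))
```

Under these assumptions the first value should fall well below 2, the second near 2 and the third well above 2, matching the description above.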
| DW statistic range | General interpretation | Practical meaning |
|---|---|---|
| Near 2 | No clear first-order autocorrelation | Consecutive residuals are not strongly related. |
| Below 2 | Possible positive autocorrelation | Residuals may move in the same direction across adjacent observations. |
| Above 2 | Possible negative autocorrelation | Residuals may alternate direction across adjacent observations. |
| 0 to 4 | Full possible range | 0 is extreme positive serial correlation; 4 is extreme negative serial correlation. |
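The 0 to 4 range and the central value of 2 follow from expanding the numerator. As a sketch, write $\hat{r}_1$ for the lag-1 residual correlation (a symbol introduced here for illustration) and ignore the end terms $e_1^2$ and $e_n^2$, which are negligible when $n$ is large:

```latex
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
  = \frac{\sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2 - 2\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}
  \approx 2\,(1 - \hat{r}_1)
```

Because $\hat{r}_1$ lies between -1 and 1, d lies between about 0 and 4. With the lag-1 correlation of about 0.0685 reported for the main model in this guide, the approximation gives 2(1 − 0.0685) ≈ 1.863, close to the computed 1.8615.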
Rule-of-thumb interpretation
A common practical rule is that values between about 1.5 and 2.5 usually do not indicate a serious first-order autocorrelation problem. This rule is helpful for a quick diagnostic, but formal decisions should consider the number of observations, number of predictors, model structure and critical values.
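The rule of thumb can be written as a small screening function. This is a sketch only: the function name `rule_of_thumb` and its return labels are assumptions made for illustration, and the cut-offs are the informal 1.5 and 2.5 values described above, not formal critical values.

```python
def rule_of_thumb(d, lower=1.5, upper=2.5):
    """Classify a Durbin Watson statistic with the informal 1.5-2.5 screen."""
    if not 0.0 <= d <= 4.0:
        raise ValueError("the statistic must lie between 0 and 4")
    if d < lower:
        return "possible positive autocorrelation"
    if d > upper:
        return "possible negative autocorrelation"
    return "no serious first-order autocorrelation"


# The main-model statistic reported in this guide.
print(rule_of_thumb(1.8615))
```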
Critical-value interpretation
The formal table method uses lower and upper bounds, often called dL and dU, which depend on the sample size, the number of predictors and the chosen significance level. For a test of positive autocorrelation, a statistic below dL is evidence of positive autocorrelation, a statistic above dU means the null hypothesis of no autocorrelation is not rejected, and a statistic between dL and dU is inconclusive. For negative autocorrelation, the same logic is applied to 4 − d.
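The bounds logic can be sketched as a three-way decision. The function name `bounds_decision` is hypothetical, and the bounds used in the example are placeholders chosen only to exercise each branch; real values of dL and dU must be looked up in a published table for the relevant sample size and number of predictors.

```python
def bounds_decision(d, d_lower, d_upper, alternative="positive"):
    """Apply the dL/dU bounds logic to a Durbin Watson statistic."""
    if not 0.0 <= d <= 4.0:
        raise ValueError("the statistic must lie between 0 and 4")
    if d_lower > d_upper:
        raise ValueError("d_lower must not exceed d_upper")
    if alternative == "positive":
        value = d
    elif alternative == "negative":
        value = 4.0 - d  # mirror the statistic around 2
    else:
        raise ValueError("alternative must be 'positive' or 'negative'")
    if value < d_lower:
        return f"evidence of {alternative} autocorrelation"
    if value > d_upper:
        return "null of no autocorrelation not rejected"
    return "inconclusive"


# Placeholder bounds for illustration only; NOT taken from a real table.
D_LOWER, D_UPPER = 1.60, 1.70
for d in (1.20, 1.65, 1.90, 2.90):
    print(d,
          bounds_decision(d, D_LOWER, D_UPPER, "positive"),
          bounds_decision(d, D_LOWER, D_UPPER, "negative"),
          sep=" | ")
```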
Durbin Watson Test Null Hypothesis and Alternative Hypothesis
| Hypothesis | Meaning | Applied to this example |
|---|---|---|
| H0 | The regression disturbances/residuals have no first-order autocorrelation. | The G3 regression residuals are not seriously autocorrelated. |
| H1 | The regression disturbances/residuals have first-order autocorrelation. | The G3 regression residuals show positive or negative serial dependence. |
Dataset and Regression Model Used
This worked example uses the student-por.csv student performance dataset. The main outcome variable is G3, the final grade. The regression model predicts G3 from earlier grades, study-related variables and family/academic background variables.
| Item | Verified value | Explanation |
|---|---|---|
| Rows | 649 | Total student observations used in the model. |
| Main outcome | G3 | Final grade, ranging from 0 to 20. |
| Main model predictors | G1, G2, studytime, failures, absences, age, Medu, Fedu | Academic, study and background predictors. |
| Order used | Existing row order | Used as a reproducible demonstration order because the dataset is cross-sectional. |
External dataset source: UCI Machine Learning Repository: Student Performance dataset.
Verified Results in R, Python and SPSS
The analysis was reproduced in three environments. R used an OLS model and the lmtest workflow. Python used statsmodels OLS residuals and a permutation check. SPSS imported the clean CSV, saved predicted values and residuals, and manually computed the statistic from consecutive residual differences.
Main Model Result
| Software | DW statistic | Lag-1 residual correlation | p-value / diagnostic result | Interpretation |
|---|---|---|---|---|
| R | 1.8615 | 0.0686 | lmtest two-sided p ≈ 0.0672 | No serious first-order autocorrelation by the 1.5–2.5 rule. |
| Python | 1.861535 | 0.068590 | Permutation two-sided p ≈ 0.070746 | No serious first-order autocorrelation by the 1.5–2.5 rule. |
| SPSS | 1.861535 | 0.068491 | Manual SPSS residual calculation | No serious first-order autocorrelation by the 1.5–2.5 rule. |
SPSS Regression Model Summary
| Model statistic | Value | Meaning |
|---|---|---|
| R | 0.922 | Strong association between fitted and observed G3. |
| R Square | 0.851 | The model explains about 85.1% of G3 variation. |
| Adjusted R Square | 0.849 | Adjusted explanatory power after accounting for predictors. |
| Std. Error of Estimate | 1.256 | Typical prediction error size in grade units. |
| F statistic | 456.111 | Overall model significance test. |
| Model Sig. | .000 | SPSS displays p-values below .0005 as .000, so report p < .001; the model is statistically significant overall. |
Main Regression Coefficients from SPSS
| Predictor | B | Std. Error | Beta | t | Sig. | Short interpretation |
|---|---|---|---|---|---|---|
| Constant | -0.501 | 0.774 | — | -0.648 | 0.518 | Not significant. |
| G1 | 0.143 | 0.037 | 0.122 | 3.910 | 0.000 | Earlier grade G1 positively predicts G3. |
| G2 | 0.885 | 0.034 | 0.798 | 25.744 | 0.000 | G2 is the strongest predictor of G3. |
| studytime | 0.097 | 0.062 | 0.025 | 1.556 | 0.120 | Not significant after controlling for other variables. |
| failures | -0.235 | 0.095 | -0.043 | -2.471 | 0.014 | Previous failures negatively predict G3. |
| absences | 0.023 | 0.011 | 0.033 | 2.085 | 0.038 | Small positive coefficient in this controlled model. |
| age | 0.023 | 0.044 | 0.009 | 0.520 | 0.604 | Not significant. |
| Medu | -0.045 | 0.058 | -0.016 | -0.776 | 0.438 | Not significant. |
| Fedu | 0.022 | 0.059 | 0.007 | 0.371 | 0.711 | Not significant. |
Model Comparison
| Model | Predictors | DW statistic | Interpretation |
|---|---|---|---|
| Main model | G1, G2, studytime, failures, absences, age, Medu, Fedu | 1.861535 | Close to 2; no serious first-order autocorrelation. |
| Simple model | G1, G2 | 1.851560 | Still close to 2; no serious first-order autocorrelation. |
| Background model | studytime, failures, absences, age, Medu, Fedu, traveltime, health | 1.807975 | Slightly more positive residual dependence, but still not severe by the rule of thumb. |
Durbin Watson Test Charts and Interpretation
1. Actual vs Fitted G3

This chart shows whether the regression model predicts G3 reasonably well. The points follow the diagonal trend closely, which agrees with the high R Square value of about 0.851. Some low-grade outliers remain, but the model captures most of the final-grade pattern.
2. Regression Residuals by Observation Order

The residuals fluctuate around zero, which is what we want to see. There are some spikes, especially around outlier cases, but the plot does not show a strong smooth trend. This supports the conclusion that there is no serious first-order residual autocorrelation.
3. Consecutive Residual Differences

This chart shows how much each residual changes from the previous residual. Most changes remain near zero, with occasional large jumps. These large jumps contribute to the numerator of the statistic, but they do not create a severe autocorrelation pattern by themselves.
4. Durbin Watson Numerator Contributions

The tallest spikes identify observation positions where adjacent residuals changed sharply. This chart is useful because it opens the formula visually: the statistic is not a black-box number; it is built from these consecutive residual changes divided by total squared residuals.
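A chart of this kind can be rebuilt from the residuals alone. The sketch below uses made-up residuals (not values from student-por.csv) and a hypothetical helper `numerator_contributions` that expresses each squared consecutive difference as a share of the total squared residuals, so the shares add up to the statistic.

```python
def numerator_contributions(residuals):
    """Each squared consecutive difference divided by the total squared residuals."""
    total = sum(e ** 2 for e in residuals)
    return [(residuals[t] - residuals[t - 1]) ** 2 / total
            for t in range(1, len(residuals))]


# Made-up residuals for illustration only.
example = [0.5, 0.4, -0.3, -2.8, 0.6, 0.2, -0.1]
shares = numerator_contributions(example)
d = sum(shares)  # the contributions sum to the Durbin Watson statistic

# Zero-based index of the largest jump; it links observations index+1 and index+2.
largest = max(range(len(shares)), key=shares.__getitem__)
print(f"d = {d:.4f}; largest jump between observations {largest + 1} and {largest + 2}")
```

Sorting the shares in descending order gives a quick list of the positions that deserve a closer look for outliers or data-entry problems.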
5. Lagged Residual Scatter

This is one of the most important plots. If strong positive autocorrelation existed, the points would form a clear upward-sloping pattern. Here the fitted trend is only slightly upward, matching the small lag-1 residual correlation of about 0.0685.
6. Durbin Watson Statistic Across Regression Models

All three models produce statistics below 2 but still near 2. The background model has the lowest value, suggesting slightly more positive dependence, but all three remain within the common 1.5–2.5 range.
7. Permutation Null Distribution

The observed statistic is a little left of 2, which is consistent with weak positive autocorrelation. However, it is not far from the center of the null distribution. The Python permutation two-sided p-value of about 0.0707 agrees with the R two-sided p-value of about 0.0672, and neither falls below the conventional 0.05 level.
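This guide does not list the exact permutation procedure, so the following is only a sketch of one reasonable version. It assumes the residual order is shuffled repeatedly and that two-sided extremeness is measured as distance from 2, which is the expected value of the statistic under shuffling when the residuals have mean zero. The function names, the number of permutations and the simulated series are all illustrative assumptions.

```python
import random


def durbin_watson_stat(residuals):
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    return numerator / sum(e ** 2 for e in residuals)


def permutation_p_value(residuals, n_permutations=2000, seed=0):
    """Share of shuffled orders whose statistic is at least as far from 2 as observed."""
    rng = random.Random(seed)
    observed = durbin_watson_stat(residuals)
    shuffled = list(residuals)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(durbin_watson_stat(shuffled) - 2.0) >= abs(observed - 2.0):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Simulated, strongly autocorrelated series standing in for real residuals.
rng = random.Random(1)
series = [rng.gauss(0.0, 1.0)]
for _ in range(199):
    series.append(0.6 * series[-1] + rng.gauss(0.0, 1.0))
mean = sum(series) / len(series)
series = [e - mean for e in series]  # centre, as OLS residuals with an intercept are

observed, p_value = permutation_p_value(series)
print(f"d = {observed:.3f}, permutation p = {p_value:.4f}")
```

Shuffling residuals treats them as exchangeable, which is only approximately true for OLS residuals, so a p-value from this sketch is a diagnostic aid and not an exact test.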
8. Residual Autocorrelation by Lag

The lag bars are mostly small. A few lags show mild positive values, but the chart does not show a very strong autocorrelation pattern. Since the statistic mainly targets lag-1 autocorrelation, the first bar is especially important and remains small.
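The bars in a residual ACF chart can be computed directly. The sketch below assumes the usual sample autocorrelation formula and the common approximate ±1.96/√n reference band for white noise; the helper name `residual_acf` and the alternating toy series are illustrative only.

```python
import math


def residual_acf(residuals, max_lag=10):
    """Sample autocorrelation of the residuals at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c ** 2 for c in centered)
    return [sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


# Toy series that alternates sign, so lag 1 is strongly negative.
toy = [1.0, -1.0] * 50
acf = residual_acf(toy, max_lag=2)
band = 1.96 / math.sqrt(len(toy))  # approximate 95% band under white noise

for lag, value in enumerate(acf, start=1):
    flag = "outside band" if abs(value) > band else "inside band"
    print(f"lag {lag}: {value:+.3f} ({flag})")
```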
How to Run the Durbin Watson Test in R, Python, SPSS and Excel
Durbin Watson Test in R
In R, the easiest method is to fit a linear model and use dwtest() from the lmtest package. You can also manually compute the statistic from residuals.
install.packages("lmtest")  # run once; skip if the package is already installed
library(lmtest)
# The UCI file is semicolon-delimited
student <- read.csv("student-por.csv", sep = ";", stringsAsFactors = FALSE)
# Make sure every model variable is numeric
num_cols <- c("G1", "G2", "G3", "studytime", "failures", "absences", "age", "Medu", "Fedu")
student[num_cols] <- lapply(student[num_cols], as.numeric)
# Fit the main OLS model; rows stay in file order
model <- lm(G3 ~ G1 + G2 + studytime + failures + absences + age + Medu + Fedu,
            data = student)
# Two-sided test, then a one-sided test for positive autocorrelation
dwtest(model, alternative = "two.sided")
dwtest(model, alternative = "greater")
# Manual check of the statistic and the lag-1 residual correlation
res <- resid(model)
dw_manual <- sum(diff(res)^2) / sum(res^2)
lag1_cor <- cor(res[-length(res)], res[-1])
dw_manual
lag1_cor
Durbin Watson Test in Python
In Python, use statsmodels to fit the regression and calculate the Durbin Watson statistic from the residuals.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
# The UCI file is semicolon-delimited
student = pd.read_csv("student-por.csv", sep=";")
cols = ["G1", "G2", "G3", "studytime", "failures", "absences", "age", "Medu", "Fedu"]
# Coerce the model variables to numeric; invalid entries become NaN and are dropped
for col in cols:
    student[col] = pd.to_numeric(student[col], errors="coerce")
data = student[cols].dropna()
y = data["G3"]
X = data[["G1", "G2", "studytime", "failures", "absences", "age", "Medu", "Fedu"]]
X = sm.add_constant(X)  # add the intercept column
model = sm.OLS(y, X).fit()
residuals = model.resid
dw_statistic = durbin_watson(residuals)
# Series.corr aligns on index labels, so slicing would pair each residual with itself;
# autocorr(lag=1) pairs e[t] with e[t-1] as intended
lag1_correlation = residuals.autocorr(lag=1)
print(model.summary())
print("Durbin Watson statistic:", dw_statistic)
print("Lag-1 residual correlation:", lag1_correlation)
Durbin Watson Test in SPSS
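SPSS can report the statistic in its linear regression output (for example with the /RESIDUALS DURBIN subcommand of REGRESSION), but the verified workflow below also saves residuals and calculates the statistic manually. The syntax assumes the data file contains a caseid variable that records the original row order.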
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF R ANOVA COLLIN
/DEPENDENT G3
/METHOD=ENTER G1 G2 studytime failures absences age Medu Fedu
/SAVE PRED(prd_main) RESID(res_main).
SORT CASES BY caseid(A).
EXECUTE.
COMPUTE lag_res_main = LAG(res_main).
IF ($CASENUM = 1) lag_res_main = $SYSMIS.
COMPUTE res_diff_main = res_main - lag_res_main.
COMPUTE res_diff2_main = res_diff_main ** 2.
COMPUTE res2_main = res_main ** 2.
EXECUTE.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=
/n_main = N(res_main)
/sum_diff2_main = SUM(res_diff2_main)
/sum_res2_main = SUM(res2_main).
COMPUTE dw_main = sum_diff2_main / sum_res2_main.
EXECUTE.
Durbin Watson Test in Excel
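Excel does not need a special function for the statistic. You can compute it directly from a residual column. The layout below assumes headers in row 1 and the 649 observations in rows 2 to 650, so the difference columns E and F start in row 3.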
| Excel column | Content | Example formula |
|---|---|---|
| A | Observation order | 1, 2, 3, … |
| B | Actual G3 | Observed final grade |
| C | Fitted G3 | Predicted final grade from regression |
| D | Residual | =B2-C2 |
| E | Residual difference | =D3-D2 |
| F | Squared residual difference | =E3^2 |
| G | Squared residual | =D2^2 |
Durbin Watson statistic = SUM(F3:F650) / SUM(G2:G650)
For this dataset, the Excel calculation should return approximately 1.8615 if the same residuals and same observation order are used.
How to Report the Result
A good report should mention the regression model, the statistic, the direction of possible autocorrelation, and the final interpretation. Avoid writing only “DW = 1.86” without explaining what it means.
APA-style report: A Durbin Watson Test was conducted to evaluate first-order autocorrelation in the residuals of a regression model predicting G3 final grade from G1, G2, studytime, failures, absences, age, mother’s education and father’s education. The statistic was d = 1.8615. This value is close to 2 and falls within the 1.5–2.5 rule-of-thumb range, indicating no serious first-order autocorrelation in the residuals.
Plain-language report: The regression residuals do not show a serious pattern from one observation to the next. The statistic is slightly below 2, so the residuals have weak positive dependence, but not enough to suggest a major autocorrelation problem.
When Should You Use This Test?
Use this diagnostic when your regression residuals have a meaningful order. The most common case is time-series regression, where observations are arranged by day, month, year or another time unit. It can also be used when observations follow a production sequence, repeated measurement order, route order or another meaningful sequence.
| Situation | Use it? | Reason |
|---|---|---|
| Monthly sales regression | Yes | Months have a natural order. |
| Stock return regression | Yes, with caution | Financial observations are time ordered. |
| Machine output by production order | Yes | Production sequence may create correlated errors. |
| Random cross-sectional survey | Usually no | Row order may be arbitrary and not meaningful. |
| Model with lagged dependent variable | Use caution | Other diagnostics such as Breusch-Godfrey may be more appropriate. |
Common Mistakes
1. Using row order without explaining it
The statistic depends on the order of residuals. If the row order has no real meaning, the result can be misleading. In this guide, the row order is clearly described as a demonstration order.
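The dependence on order is easy to demonstrate. In the sketch below, the same made-up residual values are evaluated in a smooth original order and in a shuffled order; the series, the seed and the helper name are illustrative assumptions.

```python
import math
import random


def durbin_watson_stat(residuals):
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    return numerator / sum(e ** 2 for e in residuals)


# Made-up residuals that drift smoothly, so neighbours are similar by construction.
ordered = [math.sin(t / 10.0) for t in range(200)]
d_ordered = durbin_watson_stat(ordered)

# Same values, arbitrary order: the sequential pattern is destroyed.
shuffled = ordered[:]
random.Random(42).shuffle(shuffled)
d_shuffled = durbin_watson_stat(shuffled)

print(f"original order: d = {d_ordered:.3f}; shuffled order: d = {d_shuffled:.3f}")
```

The set of residuals is identical in both cases, so any difference in the statistic comes purely from the ordering. This is why the order used must be stated and justified.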
2. Treating 1.5 to 2.5 as a universal law
The 1.5–2.5 range is a practical rule of thumb, not a substitute for formal critical values or a p-value from a proper test implementation.
3. Ignoring visual diagnostics
A statistic alone is not enough. Residual sequence plots, lagged residual scatter plots and ACF charts help explain what the statistic is detecting.
4. Forgetting that the test targets first-order autocorrelation
The statistic mainly focuses on lag-1 residual autocorrelation. If you suspect higher-order serial correlation, use additional tests such as Breusch-Godfrey or Ljung-Box.
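To illustrate the limitation, the sketch below simulates errors that depend on the value two steps back and compares the Durbin Watson statistic with a hand-computed Ljung-Box Q statistic, Q = n(n + 2) Σ r_k² / (n − k) summed over lags 1 to h. The simulation settings and helper names are assumptions for illustration; in practice a library implementation that also returns a p-value would normally be used.

```python
import random


def durbin_watson_stat(residuals):
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    return numerator / sum(e ** 2 for e in residuals)


def ljung_box_q(residuals, max_lag=5):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c ** 2 for c in centered)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Simulated errors with lag-2 dependence only: e[t] = 0.6 * e[t-2] + noise.
rng = random.Random(3)
errors = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
for _ in range(1998):
    errors.append(0.6 * errors[-2] + rng.gauss(0.0, 1.0))

d = durbin_watson_stat(errors)
q = ljung_box_q(errors, max_lag=5)
print(f"Durbin Watson d = {d:.3f}, Ljung-Box Q(5) = {q:.1f}")
```

Under these assumptions d stays near 2 while Q is far above the 5% chi-square critical value for 5 degrees of freedom (about 11.07), so the lag-2 dependence is visible only to the higher-order check. For regression residuals the chi-square reference is an approximation.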
5. Confusing autocorrelation with model fit
A model can have a high R Square and still have autocorrelated residuals. Similarly, a model can have weak fit but no residual autocorrelation. These are different diagnostics.
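A small simulation makes the distinction concrete. The sketch below fits a simple straight line to made-up data with a strong trend and strongly autocorrelated errors; all values, the seed and the helper `fit_line` are illustrative assumptions.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Trend plus AR(1) errors with coefficient 0.9 (simulated, not real data).
rng = random.Random(1)
n = 500
x = [float(t) for t in range(n)]
errors = [rng.gauss(0.0, 1.0)]
for _ in range(n - 1):
    errors.append(0.9 * errors[-1] + rng.gauss(0.0, 1.0))
y = [2.0 * xi + e for xi, e in zip(x, errors)]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

mean_y = sum(y) / n
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1.0 - ss_res / ss_tot
d = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n)) / ss_res

print(f"R squared = {r_squared:.4f}, Durbin Watson d = {d:.3f}")
```

Here the fit is nearly perfect while the statistic sits far below 2, which shows that a high R Square says nothing about whether the residuals are independent.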
Download SPSS Output and Verification Files
The SPSS PDF verifies the main model import, regression output, saved residuals and manual calculation of the statistic.
FAQs About the Durbin Watson Test
What does the Durbin Watson Test check?
It checks whether consecutive regression residuals are autocorrelated, especially at lag 1.
What does a Durbin Watson statistic of 2 mean?
A value close to 2 usually indicates no serious first-order autocorrelation in the residuals.
What does a value below 2 mean?
A value below 2 suggests possible positive autocorrelation, meaning adjacent residuals may tend to move in the same direction.
What does a value above 2 mean?
A value above 2 suggests possible negative autocorrelation, meaning adjacent residuals may tend to alternate direction.
What was the result in this example?
The main model produced d = 1.8615, so the residuals did not show serious first-order autocorrelation by the 1.5–2.5 rule-of-thumb interpretation.
Can the test be done in R?
Yes. In R, it can be run with the lmtest package using the dwtest() function, or calculated manually from residuals.
Can the test be done in Python?
Yes. In Python, statsmodels provides a durbin_watson function for regression residuals.
Can the test be done in SPSS?
Yes. SPSS can save regression residuals and the statistic can be calculated from those residuals, as shown in this guide.
Can the test be done in Excel?
Yes. If residuals are available, Excel can calculate the statistic using the squared differences of consecutive residuals divided by total squared residuals.
Is the test suitable for cross-sectional data?
It is most meaningful when observations have a real order. For ordinary cross-sectional data, row order may be arbitrary, so the result must be interpreted cautiously.


