Goodness-of-Fit and Normality Assessment
The Cramer von Mises Test (also written Cramér-von Mises) is a goodness-of-fit method that compares an empirical distribution with a theoretical distribution. In this guide, the test checks whether G3 final grades from student-por.csv follow a fitted normal distribution. The article covers the formula, the W2 statistic, the null hypothesis, the Monte Carlo p-value, chart interpretation, the R, Python and SPSS workflows, and an Excel method.
Quick Answer: Cramer von Mises Test Result
A Cramer von Mises Test was conducted to evaluate whether G3 final grades follow a fitted normal distribution. The fitted normal model used mean = 11.9060 and SD = 3.2307. The verified result was W2 = 1.1241, with a Monte Carlo p-value of about 0.00002. Since p < 0.05, the analysis rejects the fitted normal distribution for G3.
Cramer von Mises Test Overview
The Cramer von Mises Test measures how far an empirical cumulative distribution function is from a theoretical cumulative distribution function. In simple terms, it checks whether the observed data curve follows the curve expected under a chosen distribution.
In this post, the chosen theoretical distribution is a normal distribution fitted to G3 final grades. The mean and standard deviation are estimated from the data, then the empirical distribution of G3 is compared with that fitted normal curve. Because the parameters are estimated from the same sample, this workflow uses Monte Carlo simulation to estimate the p-value.
The W2 statistic agrees across R, Python and SPSS. R and Python also estimate the Monte Carlo p-value, while SPSS is used to verify the W2 calculation by hand from sorted values, fitted normal CDF values, empirical midpoints and squared differences.
What Is the Cramer von Mises Test?
The test is a goodness-of-fit method: the analyst asks whether observed data plausibly come from a specified distribution. Compared with a visual histogram alone, the W2 statistic gives a formal numerical summary of distributional mismatch.
In simple language: the test compares the observed cumulative pattern of the data with the cumulative pattern expected from a chosen theoretical distribution.
For G3 final grades, the method shows that the fitted normal distribution is not a good description of the data. The histogram, empirical CDF, Q-Q plot and contribution chart all support the same conclusion.
Cramer von Mises Test Formula
The one-sample W2 statistic can be written as:
W² = 1/(12n) + Σ [F(xᵢ) - (2i - 1)/(2n)]²
Here, n is the sample size, xᵢ is the i-th smallest observation (the data sorted in ascending order), F(xᵢ) is the fitted theoretical CDF evaluated at that observation, and (2i - 1)/(2n) is the empirical midpoint for rank i. The sum runs over i = 1 to n.
The statistic becomes larger when the fitted distribution and empirical distribution differ more strongly. In this guide, the W2 value for G3 is 1.1241, which is far larger than the simulated 95% critical value under the fitted-normal Monte Carlo workflow.
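The formula above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the workflow used for the verified results: the function name `cvm_w2` and both small samples are hypothetical and are not taken from student-por.csv. The first sample sits exactly on the empirical midpoints of a uniform CDF, so the squared-difference sum is zero and W² reduces to its floor of 1/(12n).

```python
from statistics import NormalDist, mean, stdev


def cvm_w2(values, cdf):
    """One-sample W2: 1/(12n) plus the sum of squared CDF-vs-midpoint gaps."""
    x_sorted = sorted(values)
    n = len(x_sorted)
    squared_diffs = 0.0
    for i, x in enumerate(x_sorted, start=1):
        midpoint = (2 * i - 1) / (2 * n)  # empirical midpoint for rank i
        squared_diffs += (cdf(x) - midpoint) ** 2
    return 1 / (12 * n) + squared_diffs


# Perfect fit: four points placed exactly at the uniform midpoints (hypothetical).
perfect = [0.125, 0.375, 0.625, 0.875]
w2_perfect = cvm_w2(perfect, lambda u: u)  # uniform(0, 1) CDF is F(u) = u

# Toy integer grades (hypothetical), tested against a normal fitted to them.
toy_grades = [8, 10, 10, 11, 12, 13, 13, 14, 15, 17]
fitted = NormalDist(mean(toy_grades), stdev(toy_grades))
w2_toy = cvm_w2(toy_grades, fitted.cdf)

print("perfect fit W2:", w2_perfect)
print("toy grades W2: ", w2_toy)
```

Passing the CDF as an argument keeps the same function usable for any fitted distribution, not only the normal.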
Cramer von Mises Test Null Hypothesis and Alternative Hypothesis
The hypotheses depend on the chosen theoretical distribution. In this example, the theoretical model is a fitted normal distribution.
| Hypothesis | Meaning | Decision rule |
|---|---|---|
| H0 | G3 final grades are drawn from a normal distribution, with mean and SD estimated from the sample. | If the p-value is 0.05 or greater, do not reject the fitted distribution. |
| H1 | G3 final grades do not follow the fitted normal distribution. | If p-value is less than 0.05, reject the fitted distribution. |
Because the Monte Carlo p-value is about 0.00002, the analysis rejects the fitted normal distribution. The mismatch between the G3 distribution and the normal model is statistically significant at the 0.05 level.
Dataset and Variables Used
This example uses the student-por.csv dataset. The verified workflow uses 649 rows and no missing cells in the selected analysis variables. The main outcome is G3, which represents the final grade. The supporting variables are G1, G2, absences and studytime.
| Variable | Role | Meaning |
|---|---|---|
| G3 | Main outcome | Final grade from 0 to 20. |
| G1 | Comparison variable | First-period grade. |
| G2 | Comparison variable | Second-period grade. |
| absences | Comparison variable | Number of school absences. |
| studytime | Group variable | Weekly study time category. |
External data source: UCI Machine Learning Repository: Student Performance dataset.
Verified Cramer von Mises Test Results
The analysis was verified in R, Python and SPSS. R and Python produced the Monte Carlo p-value. SPSS manually reproduced the W2 statistic by sorting the G3 values and calculating the fitted CDF, empirical midpoint and squared-difference contribution for each observation.
Final report sentence: A Cramer von Mises Test was conducted to evaluate whether G3 final grades follow a fitted normal distribution. The fitted normal model used mean = 11.9060 and SD = 3.2307. The result was W2 = 1.1241, Monte Carlo p = 0.00002. Because p < 0.05, the analysis rejected the fitted normal distribution for G3.
Main G3 Result
| Variable | N | Mean | SD | W2 statistic | Monte Carlo p-value | Decision |
|---|---|---|---|---|---|---|
| G3 | 649 | 11.9060 | 3.2307 | 1.1241 | 0.00002 | Reject fitted normal distribution |
Variable Comparison Results
| Variable | N | Mean | SD | W2 statistic | Interpretation |
|---|---|---|---|---|---|
| G1 | 649 | 11.3991 | 2.7453 | 0.7091 | Rejects fitted normality in the R/Python workflow. |
| G2 | 649 | 11.5701 | 2.9136 | 0.8293 | Rejects fitted normality in the R/Python workflow. |
| G3 | 649 | 11.9060 | 3.2307 | 1.1241 | Rejects fitted normality in the main test. |
| absences | 649 | 3.6595 | 4.6408 | 7.1169 | Shows the strongest departure from fitted normality. |
G3 by Studytime Group
| Studytime group | N | Mean G3 | SD | W2 statistic | Meaning |
|---|---|---|---|---|---|
| 1: <2 hours | 212 | 10.8443 | 3.2186 | 0.8288 | Largest group-wise W2 among studytime groups. |
| 2: 2 to 5 hours | 305 | 12.0918 | 3.2431 | 0.5356 | Moderate distributional departure. |
| 3: 5 to 10 hours | 97 | 13.2268 | 2.5021 | 0.1411 | Smaller mismatch than groups 1 and 2. |
| 4: >10 hours | 35 | 13.0571 | 3.0384 | 0.1045 | Smallest W2 among studytime groups. |
Cramer von Mises Test Result Images and Chart Interpretation
1. Histogram with Fitted Normal Curve

This chart gives the first visual clue that the fitted normal distribution is not perfect for G3. The grade values are discrete, concentrated around the middle, and include a visible low-end pile near zero. The smooth normal curve cannot fully capture these features, especially the bounded grade scale and the unusual low scores.
2. Empirical CDF vs Fitted Normal CDF

This is the most direct visual explanation of the Cramer von Mises Test. The step-like empirical CDF represents the observed data, while the smooth curve represents the fitted normal distribution. Where the two lines separate, the test accumulates squared differences. The visible gaps around the lower tail and central grade levels contribute to the large W2 statistic.
3. Squared-Difference Contribution Plot

This chart shows which sorted observations contribute most to W2. Larger spikes mean larger differences between the fitted normal CDF and the empirical midpoint. The repeated grade values create structured peaks because many students share the same integer grade. The largest contribution area appears around the lower-middle portion of the sorted distribution, helping explain why the final W2 value is large.
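The per-observation contributions behind a chart like this can be listed directly. The sketch below uses a hypothetical toy sample of integer grades with two zeros at the low end (not the real G3 column), and the helper name `w2_contributions` is an assumption made for illustration. Tied values share one fitted CDF value but receive different midpoints, which is what produces the structured peaks.

```python
from statistics import NormalDist, mean, stdev


def w2_contributions(values):
    """Return (rank, value, squared difference) for each sorted observation."""
    x_sorted = sorted(values)
    n = len(x_sorted)
    fitted = NormalDist(mean(x_sorted), stdev(x_sorted))
    rows = []
    for i, x in enumerate(x_sorted, start=1):
        midpoint = (2 * i - 1) / (2 * n)
        rows.append((i, x, (fitted.cdf(x) - midpoint) ** 2))
    return rows


# Hypothetical integer grades with a small pile at zero.
toy_grades = [0, 0, 8, 9, 10, 10, 10, 11, 11, 11,
              11, 12, 12, 12, 13, 13, 14, 15, 16, 18]

rows = w2_contributions(toy_grades)
w2_total = 1 / (12 * len(toy_grades)) + sum(c for _, _, c in rows)
top = max(rows, key=lambda r: r[2])  # observation with the largest contribution

for rank, value, contrib in rows:
    print(f"rank {rank:2d}  grade {value:2d}  contribution {contrib:.5f}")
print("W2:", round(w2_total, 4), " largest contribution at rank", top[0])
```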
4. Normal Q-Q Plot

The Q-Q plot compares sample quantiles with theoretical normal quantiles. If G3 were close to normal, the points would follow the reference line more smoothly. Instead, the plot shows clear stair-step behavior because G3 is measured in integer grades. The lower tail also departs strongly because of very low scores. This supports the formal rejection of fitted normality.
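The coordinates of a normal Q-Q plot can be computed without a plotting library. This sketch assumes plotting positions of (2i - 1)/(2n), the same midpoints used in W²; other conventions exist, and the source does not say which one its chart used. The toy grades are hypothetical.

```python
from statistics import NormalDist


def qq_pairs(values):
    """Pair each sorted value with a standard-normal quantile."""
    x_sorted = sorted(values)
    n = len(x_sorted)
    std_normal = NormalDist()  # mean 0, SD 1
    return [
        (std_normal.inv_cdf((2 * i - 1) / (2 * n)), x)
        for i, x in enumerate(x_sorted, start=1)
    ]


# Hypothetical integer grades; repeated values create flat steps.
toy_grades = [0, 8, 10, 10, 10, 11, 11, 12, 12, 13, 14, 16]
pairs = qq_pairs(toy_grades)

for theoretical, observed in pairs:
    print(f"{theoretical:6.2f}  {observed}")
```

Each repeated grade maps to several different theoretical quantiles, so the points line up horizontally in steps instead of tracing a smooth line.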
5. Monte Carlo Null Distribution

This chart shows how unusual the observed W2 value is under simulated fitted-normal samples. The dashed line marks the simulated 95% critical value, while the observed W2 is far to the right. Because the observed statistic is much larger than almost all simulated values, the p-value is extremely small, about 0.00002.
6. W2 Comparison Across Variables

This chart compares distributional mismatch across four variables. G1, G2 and G3 all depart from fitted normality, but absences is much more non-normal than the grade variables. This makes practical sense because absences are count data, often right-skewed and clustered near low values. The chart helps readers see that the test is useful beyond one variable.
7. G3 W2 by Studytime Group

This chart repeats the G3 normality check within each studytime group. The <2 hours group has the largest W2 value, while the >10 hours group has the smallest. This says nothing about which group has better grades; it means the >10 hours group's distribution is closer to its own fitted normal curve in this workflow. The groups also differ in size (212 versus 35 students), and W2 tends to grow with sample size for a given shape mismatch, so the group values are best read alongside N rather than ranked on their own.
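Group size matters when reading group-wise W2 values. As a rough sketch, the code below builds two hypothetical groups with exactly the same right-skewed shape (exponential quantiles placed at the midpoints) but with 35 and 212 observations, the smallest and largest group sizes in the table. The data and the helper names are illustrative assumptions, not part of the verified workflow.

```python
import math
from statistics import NormalDist, mean, stdev


def w2_fitted_normal(values):
    """W2 against a normal whose mean and SD are estimated from the values."""
    x_sorted = sorted(values)
    n = len(x_sorted)
    fitted = NormalDist(mean(x_sorted), stdev(x_sorted))
    ssq = sum(
        (fitted.cdf(x) - (2 * i - 1) / (2 * n)) ** 2
        for i, x in enumerate(x_sorted, start=1)
    )
    return 1 / (12 * n) + ssq


def skewed_sample(n):
    """Deterministic right-skewed sample: exponential quantiles at midpoints."""
    return [-math.log(1 - (2 * i - 1) / (2 * n)) for i in range(1, n + 1)]


# Same shape, different group sizes (hypothetical groups).
groups = {"n=35": skewed_sample(35), "n=212": skewed_sample(212)}
w2_by_group = {name: w2_fitted_normal(vals) for name, vals in groups.items()}

for name, w2 in w2_by_group.items():
    print(name, round(w2, 4))
```

The larger group produces the larger W2 even though both groups are equally non-normal in shape, which is why a small W2 in a small group is weak evidence of better fit.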
Cramer von Mises Test in R
In R, the W2 statistic can be computed by sorting the data, calculating fitted normal CDF values, comparing them with empirical midpoints, and summing squared differences.
student <- read.csv("student-por.csv", sep = ";", stringsAsFactors = FALSE)
g3 <- as.numeric(student$G3)
g3 <- g3[!is.na(g3)]
n <- length(g3)
mu <- mean(g3)
sigma <- sd(g3)
x_sorted <- sort(g3)
fitted_cdf <- pnorm(x_sorted, mean = mu, sd = sigma)
empirical_midpoint <- (2 * seq_len(n) - 1) / (2 * n)
w2 <- (1 / (12 * n)) + sum((fitted_cdf - empirical_midpoint)^2)
w2
The verified R output gives W2 = 1.1241 and a Monte Carlo p-value of about 0.00002.
Cramer von Mises Test in Python
Python follows the same logic. The code below shows the core W2 calculation. A simulation loop can then be added to estimate the p-value.
import pandas as pd
import numpy as np
from math import erf, sqrt
student = pd.read_csv("student-por.csv", sep=";")
g3 = pd.to_numeric(student["G3"], errors="coerce").dropna().to_numpy()
n = len(g3)
mu = np.mean(g3)
sigma = np.std(g3, ddof=1)
x_sorted = np.sort(g3)
def normal_cdf(x, mean, sd):
    z = (x - mean) / (sd * sqrt(2))
    return 0.5 * (1 + np.vectorize(erf)(z))
fitted_cdf = normal_cdf(x_sorted, mu, sigma)
empirical_midpoint = (2 * np.arange(1, n + 1) - 1) / (2 * n)
w2 = (1 / (12 * n)) + np.sum((fitted_cdf - empirical_midpoint) ** 2)
print(w2)
The verified Python result matches R: W2 = 1.1241, Monte Carlo p = 0.00002, so the fitted normal distribution is rejected.
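The simulation loop itself is not shown in this guide, so the following is only one plausible way to write it: a parametric bootstrap that draws normal samples of the same size, refits the mean and SD in every simulated sample, and counts how often the simulated W2 reaches the observed one. The number of simulations, the seed, the (count + 1)/(n_sim + 1) p-value convention and the skewed toy data are all assumptions made for illustration. Because W2 with fitted normal parameters does not change under shifts or rescaling of the data, simulating from a standard normal is enough.

```python
import random
from statistics import NormalDist, mean, stdev


def w2_fitted_normal(values):
    """W2 against a normal whose mean and SD are estimated from the values."""
    x_sorted = sorted(values)
    n = len(x_sorted)
    fitted = NormalDist(mean(x_sorted), stdev(x_sorted))
    ssq = sum(
        (fitted.cdf(x) - (2 * i - 1) / (2 * n)) ** 2
        for i, x in enumerate(x_sorted, start=1)
    )
    return 1 / (12 * n) + ssq


def simulate_null(n, n_sim, rng):
    """Sorted W2 values from normal samples, parameters refitted each time."""
    stats = []
    for _ in range(n_sim):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stats.append(w2_fitted_normal(sim))
    return sorted(stats)


rng = random.Random(2024)  # illustrative seed
sample = [rng.expovariate(1.0) for _ in range(100)]  # hypothetical skewed data

observed = w2_fitted_normal(sample)
null_stats = simulate_null(len(sample), n_sim=1000, rng=rng)

exceed = sum(1 for s in null_stats if s >= observed)
p_value = (exceed + 1) / (len(null_stats) + 1)
crit95 = null_stats[(95 * len(null_stats)) // 100]  # simulated 95% critical value

print("observed W2:", round(observed, 4))
print("95% critical value:", round(crit95, 4))
print("Monte Carlo p-value:", p_value)
```

With this convention the smallest reportable p-value is 1/(n_sim + 1), so resolving a p-value near 0.00002 would need on the order of 50,000 simulations.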
Cramer von Mises Test in SPSS
SPSS does not provide a simple one-click Cramer von Mises normality output in the standard workflow. For this guide, SPSS was used to manually verify the W2 calculation.
SPSS Manual G3 Result
| N | Mean G3 | SD G3 | Sum of squared differences | W2 statistic |
|---|---|---|---|---|
| 649 | 11.906009 | 3.230656 | 1.123946 | 1.124074 |
SPSS Syntax Used
GET DATA
/TYPE=TXT
/FILE='D:\cramer_von_mises_test\student_cvm_spss_clean.csv'
/ENCODING='UTF8'
/DELIMITERS=","
/QUALIFIER='"'
/FIRSTCASE=2
/VARIABLES=
studytime F1.0
G1 F2.0
G2 F2.0
G3 F2.0
absences F3.0.
CACHE.
EXECUTE.
SORT CASES BY G3(A).
COMPUTE sorted_order = $CASENUM.
EXECUTE.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=
/n_total=N(G3)
/mean_G3=MEAN(G3)
/sd_G3=SD(G3).
COMPUTE fitted_normal_cdf = CDF.NORMAL(G3, mean_G3, sd_G3).
COMPUTE empirical_midpoint = ((2 * sorted_order) - 1) / (2 * n_total).
COMPUTE squared_difference = (fitted_normal_cdf - empirical_midpoint) ** 2.
EXECUTE.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=
/sum_squared_difference=SUM(squared_difference).
COMPUTE W2_G3 = (1 / (12 * n_total)) + sum_squared_difference.
EXECUTE.
Download SPSS verification PDF: Cramer von Mises Test SPSS Output PDF.
Cramer von Mises Test in Excel
Excel can reproduce the W2 statistic manually, although R and Python are better for Monte Carlo p-value simulation.
- Put the G3 values in one column.
- Sort the G3 values from smallest to largest.
- Calculate the fitted normal CDF for each sorted value.
- Calculate the empirical midpoint for each sorted observation.
- Subtract the empirical midpoint from the fitted CDF.
- Square each difference.
- Add the squared differences and add 1/(12n).
=NORM.DIST(sorted_value, mean_G3, sd_G3, TRUE)
=(2*row_order-1)/(2*n)
=(fitted_cdf-empirical_midpoint)^2
=1/(12*n)+SUM(squared_differences)
How to Report the Cramer von Mises Test
A strong report should include the variable, fitted distribution, estimated parameters, W2 statistic, p-value method and final decision.
APA-style report: A Cramer von Mises Test was conducted to evaluate whether G3 final grades followed a fitted normal distribution. The fitted normal model used mean = 11.9060 and SD = 3.2307. The result was W2 = 1.1241, Monte Carlo p = 0.00002. Since p < 0.05, the fitted normal distribution was rejected.
Plain-language report: G3 final grades do not closely follow a normal distribution. The empirical distribution differs strongly from the fitted normal curve, especially because the grades are discrete, bounded and include unusual low-end values.
Common Mistakes in Cramer von Mises Test
1. Treating the histogram as the whole test
A histogram is useful, but the W2 statistic formally summarizes the gap between the empirical CDF and theoretical CDF.
2. Forgetting that parameters are estimated
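When the mean and standard deviation are estimated from the same data, the standard critical values for a fully specified distribution no longer apply and give p-values that are too large. Simulation, as used in this guide, or critical values adjusted for estimated parameters should be used instead.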
3. Confusing W2 with variance
W2 is not a variance. It is a goodness-of-fit statistic based on cumulative distribution differences.
4. Ignoring discrete data
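G3 grades are integer scores. This creates stair-step patterns in the empirical CDF and Q-Q plot. Because a continuous distribution such as the normal cannot reproduce tied values, the ties themselves add to W2, so part of the W2 value reflects discreteness rather than overall shape.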
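The effect of integer rounding on W2 can be illustrated with a deterministic sketch. The code places 649 hypothetical values exactly at the quantiles of a normal distribution with mean 11.9 and SD 3.2 (numbers chosen only to resemble a grade scale), then rounds them to integers and recomputes W2. Nothing here uses the real G3 data, and unlike real grades the rounded values are not clipped to the 0 to 20 range.

```python
from statistics import NormalDist, mean, stdev


def w2_fitted_normal(values):
    """W2 against a normal whose mean and SD are estimated from the values."""
    x_sorted = sorted(values)
    n = len(x_sorted)
    fitted = NormalDist(mean(x_sorted), stdev(x_sorted))
    ssq = sum(
        (fitted.cdf(x) - (2 * i - 1) / (2 * n)) ** 2
        for i, x in enumerate(x_sorted, start=1)
    )
    return 1 / (12 * n) + ssq


n = 649
ideal = NormalDist(11.9, 3.2)  # hypothetical grade-like scale

# An idealised, perfectly normal-shaped sample: one value per midpoint quantile.
smooth = [ideal.inv_cdf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]
# The same sample after rounding every value to the nearest integer.
rounded = [round(x) for x in smooth]

w2_smooth = w2_fitted_normal(smooth)
w2_rounded = w2_fitted_normal(rounded)

print("W2, continuous values:", round(w2_smooth, 5))
print("W2, rounded values:   ", round(w2_rounded, 5))
```

The rounded sample has a much larger W2 than the continuous one even though both come from the same bell-shaped source, which shows how much ties alone can contribute at this sample size.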
5. Reporting only p-value
A complete report should include W2, sample size, fitted distribution, estimated parameters, p-value method and interpretation.
Download Cramer von Mises Test Files
The SPSS PDF contains the verified manual output, including import checks, W2 calculation, variable comparison and studytime-group comparison.
Sources and Method Notes
This guide uses verified R, Python and SPSS outputs from the student performance dataset. The data source is the UCI Machine Learning Repository (Student Performance dataset).
FAQs About Cramer von Mises Test
What is the Cramer von Mises Test?
It is a goodness-of-fit test that compares an empirical distribution with a theoretical distribution using cumulative distribution differences.
What is the W2 statistic?
W2 is the statistic that summarizes the squared differences between the fitted theoretical CDF and the empirical CDF midpoints.
What was the result in this example?
The result for G3 was W2 = 1.1241 with Monte Carlo p = 0.00002, so the fitted normal distribution was rejected.
Can this test be used for normality?
Yes, it can be used as a goodness-of-fit check against a fitted normal distribution, especially when the workflow accounts for estimated parameters.
Can I run it in R?
Yes. R can compute the W2 statistic directly and use simulation to estimate a p-value.
Can I run it in Python?
Yes. Python can compute the fitted CDF, empirical midpoint, squared differences and Monte Carlo p-value.
Can I run it in SPSS?
SPSS can manually verify the calculation by sorting the data, calculating fitted CDF values, empirical midpoints and squared differences.
Can I run it in Excel?
Yes. Excel can calculate W2 manually, but R or Python is better for simulation-based p-values.


