Normality Diagnostics and Probability Plots
A P-P Plot Normality Check is a visual diagnostic that compares empirical cumulative probabilities from a sample with theoretical cumulative probabilities from a fitted normal distribution. This guide explains the P-P plot formula, its interpretation, workflows in R, Python, SPSS and Excel, and the charts and verified results from the student-por.csv dataset, using the G3 final grade as the main outcome variable.
Quick Answer: P-P Plot Normality Check Result
A P-P Plot Normality Check was applied to the final grade variable G3 from the student-por.csv dataset. The plot compared empirical cumulative probabilities with fitted normal cumulative probabilities. The main P-P plot tracked the diagonal through much of the middle probability range, but visible step-like clusters appeared because the grade variable is discrete: it takes integer values only, with an observed range of 0 to 19.
The analysis used 649 valid cases. The verified G3 mean was 11.9060, the standard deviation was 3.2307, and the maximum P-P probability deviation for G3 was approximately 0.1227. Formal tests in Python and SPSS supported the same conclusion: the final grade distribution is not perfectly normal.
Final report sentence: A P-P Plot Normality Check was conducted for G3 final grade. The P-P plot showed partial alignment with the diagonal reference line, but step-like probability clusters and supporting Q-Q, histogram, ECDF and formal normality tests indicated that G3 was not perfectly normally distributed. SPSS reported Shapiro-Wilk W = .926, p < .001, and Python reported Shapiro-Wilk p = 2.41599e-17.
Important reporting note: SPSS may display very small p-values as .000. Do not write p = .000 in the final article, thesis or assignment. Report it as p < .001.
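The reporting rule above can be automated with a small helper. The following is a minimal sketch, not part of the original workflow; the function name `format_p` is an assumption chosen for illustration, and it follows the common APA convention of three decimals with no leading zero.

```python
def format_p(p: float) -> str:
    """Format a p-value for a report without ever printing 'p = .000'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        # Very small values are reported as an inequality.
        return "p < .001"
    # Three decimals with the leading zero stripped (0.012 -> .012).
    return "p = " + f"{p:.3f}".lstrip("0")


print(format_p(2.41599e-17))  # the Python Shapiro-Wilk p-value quoted above
print(format_p(0.0123))
```

A helper like this keeps report text consistent when the same p-value comes from different software with different display rules.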
What Is a P-P Plot Normality Check?
A P-P plot, or probability-probability plot, compares two cumulative probability distributions. In a P-P Plot Normality Check, the observed cumulative probabilities from the sample are compared with the cumulative probabilities expected from a fitted normal distribution.
The method is visual. If the sample follows the fitted normal distribution closely, the points should lie near the diagonal reference line. If the points form a systematic curve or repeated departure from the diagonal, the distribution is not well described by the fitted normal model.
In this worked example, the outcome variable is G3 final grade. Since G3 is a discrete score with repeated integer values, the P-P plot naturally forms visible vertical clusters. That clustering is not a software error. It is a feature of the data scale and must be explained during interpretation.
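The clustering can be reproduced on a tiny example. The sketch below uses made-up integer grades (not values from student-por.csv) and a hypothetical helper named `tie_clusters`. It shows that all observations sharing one grade receive the same theoretical probability (one x position) but different empirical probabilities (a range of y positions), which is exactly what produces a vertical group.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def tie_clusters(values):
    """Return {value: (theoretical_p, lowest_empirical_p, highest_empirical_p)}."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    empirical = defaultdict(list)
    for i, x in enumerate(xs, start=1):
        # Tied values get consecutive ranks, so their empirical probabilities differ.
        empirical[x].append((i - 0.5) / n)
    return {x: (fitted.cdf(x), min(ps), max(ps)) for x, ps in empirical.items()}


# Made-up integer grades for illustration only.
grades = [8, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 14]
clusters = tie_clusters(grades)
for value, (x_pos, y_low, y_high) in clusters.items():
    print(f"grade {value}: x = {x_pos:.3f}, y from {y_low:.3f} to {y_high:.3f}")
```

With 12 observations but only 6 distinct grades, the plot has 6 x positions, and the most frequent grade spans the widest vertical range.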
Practical note: A P-P plot should not be treated as a single yes/no normality test. It is best interpreted together with a Q-Q plot, histogram, empirical CDF, Shapiro-Wilk test and Kolmogorov-Smirnov test with Lilliefors correction.
If you are building a full normality and assumption-checking workflow, you may also need related guides such as the Kolmogorov-Smirnov Test, D'Agostino-Pearson Test, Cramér-von Mises Test, Brown-Forsythe Test, Goldfeld-Quandt Test, Durbin-Watson Test and Influence Diagnostics.
P-P Plot Normality Check Formula
The P-P Plot Normality Check is built from cumulative probabilities. The sample values are sorted, empirical cumulative probabilities are calculated, and the fitted normal cumulative probability is computed for each sorted value.
Empirical cumulative probability = (i - 0.5) / n
Theoretical normal cumulative probability = Fnormal(xi; mean, standard deviation)
Here, i is the rank of the sorted value, n is the number of valid observations, and Fnormal is the cumulative distribution function of the fitted normal distribution. For this example, the fitted normal distribution used the sample mean and standard deviation of G3:
G3 n = 649
G3 mean = 11.9060
G3 SD = 3.2307
The plot then places the theoretical normal cumulative probability on the x-axis and the empirical cumulative probability on the y-axis. Points close to the diagonal suggest closer agreement with the fitted normal distribution. Curved or systematic departures suggest non-normality.
| Step | Calculation | Purpose |
|---|---|---|
| Sort observed values | Arrange G3 from smallest to largest | Creates the ordered sample distribution. |
| Calculate empirical probability | (i - 0.5) / n | Estimates the observed cumulative probability. |
| Fit normal distribution | Use G3 mean and standard deviation | Defines the theoretical normal comparison curve. |
| Calculate theoretical probability | Normal CDF for each sorted G3 value | Shows expected cumulative probability under normality. |
| Create scatter plot | X = theoretical probability, Y = empirical probability | Creates the P-P plot normality diagnostic. |
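The five steps in the table can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the function name `pp_points` and the five-value sample are made up, and the sample standard deviation (n - 1 denominator) is used to match the R and Python workflows later in this guide.

```python
from statistics import NormalDist, mean, stdev


def pp_points(values):
    """Return (theoretical_p, empirical_p) pairs for a normal P-P plot."""
    xs = sorted(values)                      # step 1: sort the observed values
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = stdev(xs)                           # sample SD with n - 1 denominator
    if sd == 0:
        raise ValueError("all values are identical; no normal fit is possible")
    fitted = NormalDist(mean(xs), sd)        # step 3: fit the normal distribution
    return [
        (fitted.cdf(x), (i - 0.5) / n)       # step 4 (x-axis) and step 2 (y-axis)
        for i, x in enumerate(xs, start=1)
    ]


# Made-up sample for illustration only.
sample = [1, 2, 3, 4, 5]
points = pp_points(sample)
for theoretical, empirical in points:
    print(f"theoretical = {theoretical:.4f}, empirical = {empirical:.4f}")
```

Each returned pair is one point for the scatter plot in step 5; the closer the two numbers in a pair, the closer that point sits to the diagonal.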
P-P Plot Normality Check Null Hypothesis and Alternative Hypothesis
The P-P plot itself is a graph, not a formal hypothesis test. However, it is often interpreted beside formal normality tests. For the supporting Shapiro-Wilk and Kolmogorov-Smirnov tests, the hypotheses are:
| Hypothesis | Meaning | Applied to this example |
|---|---|---|
| H0 | The sample follows a normal distribution. | G3 final grades are normally distributed. |
| H1 | The sample does not follow a normal distribution. | G3 final grades are not normally distributed. |
Because the SPSS Shapiro-Wilk test for G3 was significant and Python also produced extremely small p-values, the formal test decision is to reject the null hypothesis of perfect normality. The visual P-P plot should therefore be described as a helpful diagnostic rather than as proof of normality.
Dataset Used for the P-P Plot Normality Check
This worked example uses the student-por.csv student performance dataset. The key grade variables are G1, G2 and G3. G1 is the first period grade, G2 is the second period grade, and G3 is the final grade. The main normality check in this guide focuses on G3.
| Item | Verified value | Explanation |
|---|---|---|
| Dataset file | student-por.csv | Student performance dataset used for the analysis. |
| Rows used | 649 | Total valid observations for G1, G2 and G3. |
| Original columns | 33 | Original variables in the dataset. |
| Main variable | G3 | Final grade used for the main P-P Plot Normality Check. |
| Software used | R, Python, SPSS, Excel | R and Python generated charts; SPSS validated normality tests; Excel method explains manual calculation. |
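External dataset source: UCI Machine Learning Repository: Student Performance dataset. Note that the copy distributed by UCI is semicolon-delimited, whereas the code in this guide assumes a comma-delimited file; adjust the separator if you use the original download.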
Verified Results in R, Python and SPSS
The P-P Plot Normality Check was reproduced in R, Python and SPSS. R created the main visual diagnostic workflow. Python validated the same values and produced improved publication-style charts. SPSS confirmed the import, descriptives, P-P plots, Q-Q plot, normality tests and residual checks.
Descriptive Statistics for G1, G2 and G3
| Grade variable | N | Minimum | Maximum | Mean | Standard deviation | Short interpretation |
|---|---|---|---|---|---|---|
| G1 | 649 | 0 | 19 | 11.3991 | 2.7453 | First period grade; lowest mean among the three. |
| G2 | 649 | 0 | 19 | 11.5701 | 2.9136 | Second period grade; slightly higher than G1. |
| G3 | 649 | 0 | 19 | 11.9060 | 3.2307 | Final grade; highest mean and largest standard deviation. |
P-P Plot Deviation Summary
| Grade variable | Mean used | SD used | Mean absolute deviation | Maximum absolute deviation | Comparison note |
|---|---|---|---|---|---|
| G1 | 11.3991 | 2.7453 | 0.0265 | 0.0855 | Smallest maximum probability deviation. |
| G2 | 11.5701 | 2.9136 | 0.0287 | 0.0868 | Slightly larger deviation than G1. |
| G3 | 11.9060 | 3.2307 | 0.0332 | 0.1227 | Largest deviation; selected as the main worked example. |
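The guide does not spell out how the two deviation columns were computed. One plausible reading, sketched below as an assumption, is the pointwise absolute gap between the empirical probability (i - 0.5) / n and the fitted normal probability, summarized by its mean and its maximum. The function name `pp_deviation` and the sample are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def pp_deviation(values):
    """Return (mean_abs_dev, max_abs_dev) between empirical and fitted-normal probabilities."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Vertical distance of each P-P point from the diagonal reference line.
    gaps = [abs((i - 0.5) / n - fitted.cdf(x)) for i, x in enumerate(xs, start=1)]
    return sum(gaps) / n, max(gaps)


# Made-up sample for illustration only.
mean_dev, max_dev = pp_deviation([1, 2, 3, 4, 5])
print(f"mean absolute deviation = {mean_dev:.4f}")
print(f"maximum absolute deviation = {max_dev:.4f}")
```

If the table was produced with a different plotting position or a different summary, the numbers will differ slightly, so treat this as a way to understand the columns rather than to reproduce them digit for digit.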
SPSS Tests of Normality
| Variable | Kolmogorov-Smirnov statistic | Kolmogorov-Smirnov Sig. | Shapiro-Wilk statistic | Shapiro-Wilk Sig. | Interpretation |
|---|---|---|---|---|---|
| G1 | .086 | .000 | .986 | .000 | Reject perfect normality; report p < .001. |
| G2 | .088 | .000 | .962 | .000 | Reject perfect normality; report p < .001. |
| G3 | .124 | .000 | .926 | .000 | Strongest departure among the three grade variables. |
Python Normality Test Validation
| Python test | Value for G3 | Meaning |
|---|---|---|
| Shapiro-Wilk p-value | 2.41599e-17 | Strong evidence against perfect normality. |
| D'Agostino-Pearson p-value | 1.58764e-25 | Strong evidence against perfect normality. |
| Jarque-Bera p-value | 1.87631e-62 | Strong evidence against perfect normality. |
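To make the Jarque-Bera row less of a black box, the statistic can be computed by hand. The sketch below is an illustration under assumptions: it uses the textbook definition based on biased sample moments and the asymptotic chi-square reference distribution with 2 degrees of freedom, whose upper-tail probability is exactly exp(-JB / 2). The function name `jarque_bera_sketch` is made up, and the result should agree with a library implementation only if that library uses the same definition.

```python
import math


def jarque_bera_sketch(values):
    """Return (JB statistic, asymptotic p-value) from biased sample moments."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    m4 = sum((v - centre) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2           # equals 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Chi-square with 2 degrees of freedom: P(X > jb) = exp(-jb / 2).
    return jb, math.exp(-jb / 2.0)


# Made-up sample for illustration only.
jb, p = jarque_bera_sketch([1, 2, 3, 4, 5])
print(f"JB = {jb:.4f}, p = {p:.4f}")
```

Because the statistic grows with n, even moderate skewness or excess kurtosis in a sample of several hundred cases can produce a very small p-value.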
P-P Plot Normality Check Charts and Interpretation
1. Main R P-P Plot for G3

The main P-P plot shows that the G3 distribution partly follows the diagonal reference line, especially in the middle probability region. However, the repeated vertical groups show that the grade variable is discrete. This means the plot should be interpreted with caution and supported by formal normality tests.
2. P-P Plot Comparison for G1, G2 and G3

This comparison makes the main result clearer. G1 and G2 have smaller maximum deviations, while G3 has a maximum deviation of about 0.1227. That is why the article uses G3 as the main normality-check example.
3. Q-Q Plot Supporting the P-P Plot

The Q-Q plot shows stronger tail departures than the P-P plot. This is expected: cumulative probabilities are bounded between 0 and 1, so a P-P plot compresses the tails and gives more resolution near the center of the distribution, while a Q-Q plot works on the data scale and lets extreme values move far from the reference line.
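The difference between the two plots is easiest to see in how their coordinates are built. The sketch below is a minimal illustration with a hypothetical function name `qq_points` and made-up data containing one low score; it pairs each sorted observation with the fitted normal quantile at the same plotting position (i - 0.5) / n.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Return (theoretical_quantile, observed_value) pairs for a normal Q-Q plot."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # inv_cdf maps a probability back to the data scale of the fitted normal.
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


# Made-up grades with one very low score, for illustration only.
grades = [0, 9, 10, 11, 11, 12, 12, 13, 14, 15]
for expected, observed in qq_points(grades):
    print(f"expected = {expected:6.2f}, observed = {observed:6.2f}")
```

In output like this, the lowest observation sits well away from its expected quantile on the grade scale, whereas the same point in a P-P plot can only move within the narrow band of probabilities near 0.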
4. Histogram and Normal Curve for G3

The histogram confirms that G3 is not a smooth continuous normal variable. It has repeated integer values, a concentration around the middle grades and a visible low-score tail. This supports the formal test conclusion of non-normality.
5. Empirical CDF versus Fitted Normal CDF

The ECDF chart makes the P-P plot logic easier to understand. The step line is the observed cumulative distribution, while the dashed curve is the fitted normal cumulative distribution. Gaps between these lines correspond to departures visible in the P-P plot.
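The gaps described above can be tabulated directly. The following sketch is an assumption-laden illustration, not the code behind the chart: for each distinct value it records the ECDF just before and at the step, the fitted normal CDF, and the larger of the two gaps. The largest gap over all values is the Kolmogorov-Smirnov type distance; the Lilliefors correction changes how that distance is converted to a p-value, not the distance itself. Function names and data are made up.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev


def ecdf_vs_normal(values):
    """Return rows of (value, ecdf_before, ecdf_at, normal_cdf, gap) per distinct value."""
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    counts = Counter(values)
    rows, cumulative = [], 0
    for x in sorted(counts):
        before = cumulative / n          # ECDF just below the step at x
        cumulative += counts[x]
        at = cumulative / n              # ECDF at x, after the step
        f = fitted.cdf(x)
        rows.append((x, before, at, f, max(abs(at - f), abs(before - f))))
    return rows


def ks_distance(values):
    """Largest vertical gap between the ECDF and the fitted normal CDF."""
    return max(row[4] for row in ecdf_vs_normal(values))


# Made-up sample for illustration only.
d = ks_distance([1, 2, 3, 4, 5])
print(f"largest ECDF gap = {d:.4f}")
```

Checking both sides of each step matters for discrete data, because the ECDF jumps at every distinct value and the largest gap can occur on either side of a jump.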
6. Regression Residual P-P Plot

For regression analysis, normality is usually checked on residuals, not only on the raw dependent variable. The residual P-P plot shows a visible curved pattern, meaning residual normality is also not strongly supported in this model.
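The guide does not state which regression model produced the residual plot, so the sketch below uses a hypothetical one-predictor model (an earlier grade predicting a final grade) with made-up numbers. It only illustrates the order of operations: fit the line, take residuals, then build P-P coordinates from the residuals rather than from the raw outcome.

```python
from statistics import NormalDist, mean, stdev


def ols_residuals(x, y):
    """Residuals from a one-predictor least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def residual_pp_points(residuals):
    """Return (theoretical_p, empirical_p) pairs for a residual P-P plot."""
    rs = sorted(residuals)
    n = len(rs)
    # Residuals from a model with an intercept average to zero.
    fitted = NormalDist(0.0, stdev(rs))
    return [(fitted.cdf(r), (i - 0.5) / n) for i, r in enumerate(rs, start=1)]


# Made-up grades for illustration only; the predictor choice is an assumption.
earlier = [10, 11, 9, 13, 12, 8, 14, 11]
final = [11, 11, 8, 14, 12, 9, 15, 10]
residuals = ols_residuals(earlier, final)
residual_points = residual_pp_points(residuals)
print(residual_points[:3])
```

Statistical packages often standardize residuals with the model's residual standard error instead of the plain sample standard deviation, so their plots may differ slightly from this sketch.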
7. P-P Plot for G3 by School

The school-wise plot shows that the normality pattern differs between school groups. This does not replace a full subgroup statistical analysis, but it is useful for visual diagnosis and interpretation.
8. P-P Plot for G3 by Sex

The sex-wise plot again shows visible step-like probability clusters. This reinforces the main teaching point: P-P plots are valid diagnostic tools, but their shape must be interpreted in the context of the measurement scale.
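Grouped plots such as the school and sex panels can be summarized numerically by fitting a separate normal distribution within each group. The sketch below is illustrative only: the group labels "A" and "B" and the grades are made up, the function names are hypothetical, and the deviation is assumed to be the largest pointwise gap between empirical and fitted probabilities.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def max_pp_deviation(values):
    """Largest |empirical - theoretical| probability gap for one group."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    return max(abs((i - 0.5) / n - fitted.cdf(x)) for i, x in enumerate(xs, start=1))


def deviation_by_group(records):
    """Map each group label to its maximum P-P deviation."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    # Each group gets its own fitted mean and standard deviation.
    return {label: max_pp_deviation(vals) for label, vals in groups.items()}


# Made-up (group, grade) records for illustration only.
records = [
    ("A", 10), ("A", 11), ("A", 12), ("A", 12), ("A", 13), ("A", 15),
    ("B", 0), ("B", 9), ("B", 10), ("B", 10), ("B", 11), ("B", 12),
]
by_group = deviation_by_group(records)
for label, deviation in by_group.items():
    print(f"group {label}: maximum deviation = {deviation:.4f}")
```

Small groups give noisy deviations, so a per-group number like this is a visual aid and not a substitute for a subgroup analysis.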
9. Maximum P-P Probability Deviation Summary

The deviation summary is a compact way to compare the three grade variables. G3 has the largest maximum deviation, which agrees with the visual plots and formal normality-test results.
Python Validation Charts for P-P Plot Normality Check
The Python workflow produced improved publication-style validation charts. These charts are cleaner for article presentation and confirm that the result is not dependent on R alone.




The Python charts carry clearer titles, subtitles and footers, which suits article presentation. The statistical conclusion is the same as in R and SPSS: G3 does not follow a perfect normal distribution.
How to Run a P-P Plot Normality Check in R, Python, SPSS and Excel
P-P Plot Normality Check in R
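In R, the workflow is to read the dataset, sort G3, calculate empirical probabilities, calculate fitted normal probabilities and create a scatter plot with a diagonal reference line. For samples with more than 10 observations, ppoints(n) returns (i - 0.5) / n, which matches the formula given earlier.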
# P-P Plot Normality Check in R
student <- read.csv("student-por.csv", sep = ",", stringsAsFactors = FALSE)
x <- na.omit(as.numeric(student$G3))
x_sorted <- sort(x)
n <- length(x_sorted)
empirical_probability <- ppoints(n)
mean_g3 <- mean(x_sorted)
sd_g3 <- sd(x_sorted)
normal_probability <- pnorm(x_sorted, mean = mean_g3, sd = sd_g3)
pp_data <- data.frame(
value = x_sorted,
empirical_probability = empirical_probability,
normal_probability = normal_probability
)
plot(
pp_data$normal_probability,
pp_data$empirical_probability,
xlab = "Theoretical normal cumulative probability",
ylab = "Empirical cumulative probability",
main = "P-P Plot Normality Check for G3"
)
abline(0, 1, lty = 2)
P-P Plot Normality Check in Python
In Python, SciPy can calculate theoretical normal probabilities and Matplotlib can create the plot.
# P-P Plot Normality Check in Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm, shapiro, normaltest, jarque_bera
student = pd.read_csv("student-por.csv")
x = student["G3"].dropna().astype(float).sort_values().to_numpy()
n = len(x)
empirical_probability = (np.arange(1, n + 1) - 0.5) / n
mean_g3 = np.mean(x)
sd_g3 = np.std(x, ddof=1)
normal_probability = norm.cdf(x, loc=mean_g3, scale=sd_g3)
plt.scatter(normal_probability, empirical_probability)
plt.plot([0, 1], [0, 1], linestyle="--")
plt.xlabel("Theoretical normal cumulative probability")
plt.ylabel("Empirical cumulative probability")
plt.title("P-P Plot Normality Check for G3")
plt.show()
print("Shapiro-Wilk:", shapiro(x))
print("D'Agostino-Pearson:", normaltest(x))
print("Jarque-Bera:", jarque_bera(x))
P-P Plot Normality Check in SPSS
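SPSS can create P-P plots with the PPLOT command. The EXAMINE command provides descriptives, normality tests, histograms and normal plots. Note that the syntax below requests Blom's proportion estimate, (i - 3/8) / (n + 1/4), through /FRACTION=BLOM rather than the (i - 0.5) / n formula used in R and Python; use /FRACTION=RANKIT to match that formula exactly.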
* P-P Plot Normality Check in SPSS.
GET DATA
/TYPE=TXT
/FILE='C:\path\to\student-por.csv'
/ENCODING='UTF8'
/DELCASE=LINE
/DELIMITERS=","
/QUALIFIER='"'
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/VARIABLES=
school A2
sex A1
age F8.0
address A1
famsize A3
Pstatus A1
Medu F8.0
Fedu F8.0
Mjob A12
Fjob A12
reason A12
guardian A8
traveltime F8.0
studytime F8.0
failures F8.0
schoolsup A3
famsup A3
paid A3
activities A3
nursery A3
higher A3
internet A3
romantic A3
famrel F8.0
freetime F8.0
goout F8.0
Dalc F8.0
Walc F8.0
health F8.0
absences F8.0
G1 F8.0
G2 F8.0
G3 F8.0.
CACHE.
EXECUTE.
PPLOT VARIABLES=G1 G2 G3
/DISTRIBUTION=NORMAL
/FRACTION=BLOM
/TIES=MEAN
/NOSTANDARDIZE
/TYPE=P-P
/PLOT=BOTH
/NOLOG.
EXAMINE VARIABLES=G1 G2 G3
/PLOT BOXPLOT STEMLEAF NPPLOT HISTOGRAM
/COMPARE GROUPS
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
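The PPLOT syntax above uses /FRACTION=BLOM, while the R and Python workflows use (i - 0.5) / n. The sketch below compares the two plotting positions for the sample size used in this guide. It assumes the standard definition of Blom's formula, (i - 3/8) / (n + 1/4), and ignores the /TIES=MEAN handling, so it only quantifies the difference between the formulas and is not a reproduction of the SPSS output. The function name is hypothetical.

```python
def plotting_position_gap(n):
    """Largest absolute difference between Blom and (i - 0.5) / n positions."""
    largest = 0.0
    for i in range(1, n + 1):
        blom = (i - 0.375) / (n + 0.25)   # Blom's proportion estimate
        midpoint = (i - 0.5) / n          # formula used in R and Python here
        largest = max(largest, abs(blom - midpoint))
    return largest


gap = plotting_position_gap(649)
print(f"largest difference for n = 649: {gap:.6f}")
```

For a sample of this size the two formulas differ only in the fourth decimal place, so the choice does not change the visual conclusion.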
P-P Plot Normality Check in Excel
Excel can create a manual P-P plot by calculating empirical cumulative probabilities and fitted normal cumulative probabilities.
| Excel column | Content | Example formula or action |
|---|---|---|
| A | G3 values | Paste the final grade values. |
| B | Sorted G3 | Sort from smallest to largest. |
| C | Rank | 1, 2, 3, ..., n |
| D | Empirical cumulative probability | =(C2-0.5)/COUNT($B:$B) |
| E | Theoretical normal cumulative probability | =NORM.DIST(B2,AVERAGE($B:$B),STDEV.S($B:$B),TRUE) |
| Chart | Scatter plot | X = column E, Y = column D |
After creating the scatter plot, add a diagonal reference line from (0, 0) to (1, 1). Points near the line indicate stronger agreement with the fitted normal distribution.
How to Report the P-P Plot Normality Check Result
A good report should describe the visual pattern, the reason for any visible clustering, the supporting plots and the formal normality tests. Avoid using the P-P plot alone as proof of normality.
APA-style report: Normality of G3 final grade was assessed using a P-P plot, Q-Q plot, histogram and formal normality tests. The P-P plot showed partial alignment with the diagonal reference line but also visible step-like departures due to the discrete grade scale. The Shapiro-Wilk test was significant, W = .926, p < .001, suggesting that G3 was not normally distributed.
Plain-language report: The final grade data were not perfectly normal. The P-P plot was useful for comparing observed and fitted normal cumulative probabilities, but the Q-Q plot, histogram, ECDF and formal tests showed clear departures from normality.
When Should You Use a P-P Plot Normality Check?
Use a P-P Plot Normality Check when you want a visual comparison between the empirical cumulative distribution of your data and a fitted theoretical distribution, most commonly the normal distribution. It is useful in education research, psychology, social science, regression diagnostics, thesis data analysis and general statistical assumption checking.
| Situation | Use it? | Reason |
|---|---|---|
| Checking whether a continuous variable is approximately normal | Yes | The plot compares observed and theoretical cumulative probabilities. |
| Supporting Shapiro-Wilk or Kolmogorov-Smirnov tests | Yes | The plot adds visual context to formal p-values. |
| Checking regression residuals | Yes | Regression normality is usually assessed on residuals. |
| Detecting tail problems | Use Q-Q plot too | Q-Q plots are better for tail departures and outliers. |
| Proving normality with one graph | No | A P-P plot is diagnostic, not final proof. |
Common Mistakes
1. Treating the P-P plot as a final normality test
A P-P plot is visual evidence. It should be combined with Q-Q plots, histograms, ECDF charts and formal tests.
2. Ignoring the measurement scale
Discrete variables such as grade scores often create step-like patterns. These patterns should be explained instead of being treated as software errors.
3. Reporting p = .000
SPSS often prints very small p-values as .000. In formal writing, report these values as p < .001.
4. Checking only the raw dependent variable in regression
For regression models, normality is usually checked on residuals, not only on the raw dependent variable.
5. Confusing P-P plots with Q-Q plots
P-P plots compare cumulative probabilities. Q-Q plots compare quantiles. They are related, but they are not identical.
Download SPSS Output and Verification Files
The SPSS PDF verifies the clean data import, descriptive statistics, P-P plot output, Q-Q plot output, Explore normality tests, grouped normality checks and regression residual normality check.
FAQs About P-P Plot Normality Check
What does a P-P Plot Normality Check do?
It compares empirical cumulative probabilities from the sample with theoretical cumulative probabilities from a fitted normal distribution.
When should I use a P-P plot?
Use it when you want to visually check how closely your data follow a fitted distribution, especially a fitted normal distribution.
Is a P-P plot the same as a Q-Q plot?
No. A P-P plot compares cumulative probabilities, while a Q-Q plot compares quantiles. Q-Q plots are usually better at revealing departures in the tails.
What was the main P-P plot result in this example?
The G3 P-P plot showed partial diagonal alignment but also visible step-like departures. The maximum P-P probability deviation was approximately 0.1227.
Was G3 normally distributed?
No. The visual charts and formal tests showed that G3 was not perfectly normally distributed. SPSS reported Shapiro-Wilk W = .926, p < .001.
Why does the P-P plot show vertical clusters?
G3 is a discrete grade variable. Many students have the same integer grade, so the P-P plot forms vertical groups.
Can this plot be made in SPSS?
Yes. SPSS can create P-P plots with the PPLOT command and normality tests with the EXAMINE command.
Can this plot be made in R?
Yes. R can calculate empirical probabilities with ppoints and theoretical normal probabilities with pnorm.
Can this plot be made in Python?
Yes. Python can calculate normal cumulative probabilities with SciPy and plot the result with Matplotlib.
Can this plot be made in Excel?
Yes. Excel can calculate empirical probabilities and theoretical normal probabilities using NORM.DIST, then create a scatter plot.


