
P-P Plot Normality Check: Complete R, Python, SPSS and Excel Guide


Normality Diagnostics and Probability Plots

P-P Plot Normality Check is a visual diagnostic method used to compare empirical cumulative probabilities from a sample with theoretical cumulative probabilities from a fitted normal distribution. This complete guide explains the P-P plot formula, interpretation, R workflow, Python workflow, SPSS output, Excel checking method, charts and verified results from the student-por.csv dataset using G3 final grade as the main outcome variable.


Quick Answer: P-P Plot Normality Check Result

A P-P Plot Normality Check was applied to the final grade variable G3 from the student-por.csv dataset. The plot compared empirical cumulative probabilities with fitted normal cumulative probabilities. The main P-P plot followed the diagonal in several middle regions, but visible step-like clusters appeared because the grade variable is discrete, taking integer values with an observed range of 0 to 19.

The analysis used 649 valid cases. The verified G3 mean was 11.9060, the standard deviation was 3.2307, and the maximum P-P probability deviation for G3 was approximately 0.1227. Formal tests in Python and SPSS supported the same conclusion: the final grade distribution is not perfectly normal.

Main variable: G3
Sample size: 649
G3 mean: 11.9060
Max P-P deviation: 0.1227

Final report sentence: A P-P Plot Normality Check was conducted for G3 final grade. The P-P plot showed partial alignment with the diagonal reference line, but step-like probability clusters and supporting Q-Q, histogram, ECDF and formal normality tests indicated that G3 was not perfectly normally distributed. SPSS reported Shapiro-Wilk W = .926, p < .001, and Python reported Shapiro-Wilk p = 2.41599e-17.

Important reporting note: SPSS may display very small p-values as .000. Do not write p = .000 in the final article, thesis or assignment. Report it as p < .001.
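To avoid retyping .000 values by hand, the rule can be applied automatically. The following is a minimal Python sketch; format_p is a hypothetical helper name, and the three-decimal, no-leading-zero style mirrors how this guide writes statistics such as W = .926.

```python
def format_p(p: float) -> str:
    """Format a p-value for reporting: 'p < .001' for tiny values, else three decimals without the leading zero."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        return "p < .001"
    text = f"{p:.3f}"          # e.g. 0.0412 -> "0.041"
    if text.startswith("0"):
        text = text[1:]        # "0.041" -> ".041"
    return f"p = {text}"


# SPSS shows the G3 Shapiro-Wilk significance as .000; the exact Python value is tiny.
print(format_p(2.41599e-17))   # p < .001
```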

What Is a P-P Plot Normality Check?

A P-P plot, or probability-probability plot, compares two cumulative probability distributions. In a P-P Plot Normality Check, the observed cumulative probabilities from the sample are compared with the cumulative probabilities expected from a fitted normal distribution.

The method is visual. If the sample follows the fitted normal distribution closely, the points should lie near the diagonal reference line. If the points form a systematic curve or repeated departure from the diagonal, the distribution is not well described by the fitted normal model.

In this worked example, the outcome variable is G3 final grade. Since G3 is a discrete score with repeated integer values, the P-P plot naturally forms visible vertical clusters. That clustering is not a software error. It is a feature of the data scale and must be explained during interpretation.
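The clustering can be reproduced on a tiny example. In the sketch below the grades are made-up integers, not the real G3 data. Tied values share one theoretical probability, so they sit at the same horizontal position, while their empirical probabilities keep increasing, so they stack vertically.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical integer grades with repeated values (illustration only).
grades = np.sort(np.array([8, 10, 10, 10, 12, 12, 14, 15]))
n = len(grades)

empirical = (np.arange(1, n + 1) - 0.5) / n
theoretical = norm.cdf(grades, loc=grades.mean(), scale=grades.std(ddof=1))

# Each distinct grade gets one x position but several y positions.
for g in np.unique(grades):
    mask = grades == g
    print(g, np.round(theoretical[mask][0], 3), np.round(empirical[mask], 3))
```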

Practical note: A P-P plot should not be treated as a single yes/no normality test. It is best interpreted together with a Q-Q plot, histogram, empirical CDF, Shapiro-Wilk test and Kolmogorov-Smirnov test with Lilliefors correction.

If you are building a full normality and assumption-checking workflow, you may also need related guides such as the Kolmogorov-Smirnov Test, D'Agostino-Pearson Test, Cramér-von Mises Test, Brown-Forsythe Test, Goldfeld-Quandt Test, Durbin-Watson Test and Influence Diagnostics.

P-P Plot Normality Check Formula

The P-P Plot Normality Check is built from cumulative probabilities. The sample values are sorted, empirical cumulative probabilities are calculated, and the fitted normal cumulative probability is computed for each sorted value.

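Empirical cumulative probability:

$$ p_i^{\text{emp}} = \frac{i - 0.5}{n} $$

Theoretical normal cumulative probability:

$$ p_i^{\text{norm}} = F_{\text{normal}}\left(x_{(i)};\ \bar{x},\ s\right) = \Phi\left(\frac{x_{(i)} - \bar{x}}{s}\right) $$

Here, i is the rank of the sorted value x_(i), n is the number of valid observations, x̄ and s are the sample mean and standard deviation, F_normal is the cumulative distribution function of the fitted normal distribution, and Φ is the standard normal cumulative distribution function. For this example, the fitted normal distribution used the sample mean and standard deviation of G3: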

G3 n    = 649
G3 mean = 11.9060
G3 SD   = 3.2307
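As a quick arithmetic check with these values, the smallest sorted observation (i = 1) and the largest (i = 649) receive the empirical probabilities below. A G3 score of 12, chosen here only as an illustration, receives a fitted normal probability just above one half (Φ is the standard normal cumulative distribution function):

$$ \frac{1 - 0.5}{649} \approx 0.00077, \qquad \frac{649 - 0.5}{649} \approx 0.99923 $$

$$ F_{\text{normal}}(12;\ 11.9060,\ 3.2307) = \Phi\left(\frac{12 - 11.9060}{3.2307}\right) = \Phi(0.0291) \approx 0.5116 $$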

The plot then places theoretical normal cumulative probability on one axis and empirical cumulative probability on the other axis. Points close to the diagonal suggest closer agreement with the fitted normal distribution. Curved or systematic departures suggest non-normality.

Step | Calculation | Purpose
Sort observed values | Arrange G3 from smallest to largest | Creates the ordered sample distribution.
Calculate empirical probability | (i - 0.5) / n | Estimates the observed cumulative probability.
Fit normal distribution | Use the G3 mean and standard deviation | Defines the theoretical normal comparison curve.
Calculate theoretical probability | Normal CDF for each sorted G3 value | Shows the expected cumulative probability under normality.
Create scatter plot | X = theoretical probability, Y = empirical probability | Creates the P-P plot normality diagnostic.
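The steps in the table can be collected into one reusable function. This is a minimal Python sketch; the name pp_coordinates is hypothetical, and it assumes the normal is fitted with the sample mean and the ddof=1 standard deviation, matching R's sd() and Excel's STDEV.S.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def pp_coordinates(values) -> pd.DataFrame:
    """Return P-P plot coordinates against a normal fitted to the data."""
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])                       # drop missing values, then sort
    n = x.size
    empirical = (np.arange(1, n + 1) - 0.5) / n        # (i - 0.5) / n
    theoretical = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))
    return pd.DataFrame({
        "value": x,
        "empirical_probability": empirical,
        "normal_probability": theoretical,
    })
```

With the dataset loaded as in the Python section, pp_coordinates(student["G3"]) returns one row per student, ready to plot with normal_probability on the x-axis and empirical_probability on the y-axis. Returning a data frame also makes the values easy to export and compare with an Excel sheet.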

P-P Plot Normality Check Null Hypothesis and Alternative Hypothesis

The P-P plot itself is a graph, not a formal hypothesis test. However, it is often interpreted beside formal normality tests. For the supporting Shapiro-Wilk and Kolmogorov-Smirnov tests, the hypotheses are:

Hypothesis | Meaning | Applied to this example
H0 | The sample follows a normal distribution. | G3 final grades are normally distributed.
H1 | The sample does not follow a normal distribution. | G3 final grades are not normally distributed.

Because the SPSS Shapiro-Wilk test for G3 was significant and Python also produced extremely small p-values, the formal test decision is to reject the null hypothesis of perfect normality. The visual P-P plot should therefore be described as a helpful diagnostic rather than as proof of normality.


Dataset Used for the P-P Plot Normality Check

This worked example uses the student-por.csv student performance dataset. The key grade variables are G1, G2 and G3. G1 is the first period grade, G2 is the second period grade, and G3 is the final grade. The main normality check in this guide focuses on G3.

Item | Verified value | Explanation
Dataset file | student-por.csv | Student performance dataset used for the analysis.
Rows used | 649 | Total valid observations for G1, G2 and G3.
Original columns | 33 | Original variables in the dataset.
Main variable | G3 | Final grade used for the main P-P Plot Normality Check.
Software used | R, Python, SPSS, Excel | R and Python generated the charts; SPSS validated the normality tests; the Excel method explains the manual calculation.

External dataset source: UCI Machine Learning Repository: Student Performance dataset.

Verified Results in R, Python and SPSS

The P-P Plot Normality Check was reproduced in R, Python and SPSS. R created the main visual diagnostic workflow. Python validated the same values and produced improved publication-style charts. SPSS confirmed the import, descriptives, P-P plots, Q-Q plot, normality tests and residual checks.

Descriptive Statistics for G1, G2 and G3

Grade variable | N | Minimum | Maximum | Mean | Standard deviation | Short interpretation
G1 | 649 | 0 | 19 | 11.3991 | 2.7453 | First period grade; lowest mean among the three.
G2 | 649 | 0 | 19 | 11.5701 | 2.9136 | Second period grade; slightly higher than G1.
G3 | 649 | 0 | 19 | 11.9060 | 3.2307 | Final grade; highest mean and largest standard deviation.

P-P Plot Deviation Summary

Grade variable | Mean used | SD used | Mean absolute deviation | Maximum absolute deviation | Note
G1 | 11.3991 | 2.7453 | 0.0265 | 0.0855 | Smallest maximum probability deviation.
G2 | 11.5701 | 2.9136 | 0.0287 | 0.0868 | Slightly larger deviation than G1.
G3 | 11.9060 | 3.2307 | 0.0332 | 0.1227 | Largest deviation; selected as the main worked example.
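The guide does not spell out exactly how the deviation columns were computed. Assuming they are the point-by-point gaps |empirical - theoretical| on the sorted sample, the sketch below reproduces them (the helper names pp_deviation_summary and ks_statistic_from_pp are hypothetical). It also shows why the maximum gaps track the SPSS Kolmogorov-Smirnov statistics so closely. With (i - 0.5)/n positions, the Kolmogorov-Smirnov statistic D for the same fitted normal equals the maximum P-P gap plus 1/(2n), which adds only about 0.0008 when n = 649. SciPy's kstest p-value assumes the mean and SD were known in advance, so it is not the Lilliefors-corrected value that SPSS reports; only the statistic is comparable. For a Lilliefors p-value in Python, statsmodels provides statsmodels.stats.diagnostic.lilliefors.

```python
import numpy as np
from scipy.stats import kstest, norm


def pp_deviation_summary(values):
    """Mean and maximum of |(i - 0.5)/n - fitted normal CDF| over the sorted sample.

    Assumes the normal is fitted with the sample mean and the ddof=1 SD.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    empirical = (np.arange(1, n + 1) - 0.5) / n
    theoretical = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))
    gaps = np.abs(empirical - theoretical)
    return gaps.mean(), gaps.max()


def ks_statistic_from_pp(values):
    """Kolmogorov-Smirnov D for the same fitted normal: the maximum P-P gap plus 1/(2n)."""
    n = np.asarray(values).size
    return pp_deviation_summary(values)[1] + 0.5 / n


# Check the identity against SciPy on simulated integer grades (not the real data).
rng = np.random.default_rng(0)
sample = rng.integers(0, 20, size=200).astype(float)
d_scipy = kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1))).statistic
print(ks_statistic_from_pp(sample), d_scipy)   # agree up to floating-point rounding
```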

SPSS Tests of Normality

Variable | Kolmogorov-Smirnov statistic (Lilliefors) | Kolmogorov-Smirnov Sig. | Shapiro-Wilk statistic | Shapiro-Wilk Sig. | Interpretation
G1 | .086 | .000 | .986 | .000 | Reject perfect normality; report p < .001.
G2 | .088 | .000 | .962 | .000 | Reject perfect normality; report p < .001.
G3 | .124 | .000 | .926 | .000 | Strongest departure among the three grade variables; report p < .001.

Python Normality Test Validation

Python test | Value for G3 | Meaning
Shapiro-Wilk p-value | 2.41599e-17 | Strong evidence against perfect normality.
D'Agostino-Pearson p-value | 1.58764e-25 | Strong evidence against perfect normality.
Jarque-Bera p-value | 1.87631e-62 | Strong evidence against perfect normality.

P-P Plot Normality Check Charts and Interpretation

1. Main R P-P Plot for G3

Main R P-P plot for final grade G3. The plot compares empirical cumulative probabilities with fitted normal cumulative probabilities.

The main P-P plot shows that the G3 distribution partly follows the diagonal reference line, especially in the middle probability region. However, the repeated vertical groups show that the grade variable is discrete. This means the plot should be interpreted with caution and supported by formal normality tests.

2. P-P Plot Comparison for G1, G2 and G3

R comparison of P-P plots for G1, G2 and G3. G3 shows the largest maximum probability deviation.

This comparison makes the main result clearer. G1 and G2 have smaller maximum deviations, while G3 has a maximum deviation of about 0.1227. That is why the article uses G3 as the main normality-check example.

3. Q-Q Plot Supporting the P-P Plot

Q-Q plot for G3. Q-Q plots are especially useful for checking tail behavior and extreme departures.

The Q-Q plot shows stronger tail departures than the P-P plot. This is expected because Q-Q plots are more sensitive to tail behavior, while P-P plots often emphasize cumulative probability agreement around the center of the distribution.
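A small numeric check shows how strongly the normal CDF compresses the tails. Moving 0.5 standard deviations near the center changes the cumulative probability by about 0.19. The same move between 3 and 3.5 standard deviations changes it by only about 0.001. As a result, a tail departure that is obvious on a Q-Q plot, which compares values directly, becomes a tiny vertical gap near the corners of a P-P plot. The sketch uses only SciPy's standard normal CDF.

```python
from scipy.stats import norm

# Same 0.5-SD step on the standard normal scale, once near the center ...
center_gap = norm.cdf(0.5) - norm.cdf(0.0)
# ... and once far out in the upper tail.
tail_gap = norm.cdf(3.5) - norm.cdf(3.0)

print(f"center: {center_gap:.4f}, tail: {tail_gap:.5f}")
# center: 0.1915, tail: 0.00112
```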

4. Histogram and Normal Curve for G3

Histogram, density curve and fitted normal curve for G3. The distribution has a strong center and a visible low-score tail.

The histogram confirms that G3 is not a smooth continuous normal variable. It has repeated integer values, a concentration around the middle grades and a visible low-score tail. This supports the formal test conclusion of non-normality.

5. Empirical CDF versus Fitted Normal CDF

ECDF versus fitted normal CDF for G3. This chart shows the same cumulative probability idea behind the P-P plot.

The ECDF chart makes the P-P plot logic easier to understand. The step line is the observed cumulative distribution, while the dashed curve is the fitted normal cumulative distribution. Gaps between these lines correspond to departures visible in the P-P plot.
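The step line in the ECDF chart is the proportion of observations less than or equal to each value. The minimal Python sketch below evaluates that step function and the fitted normal CDF on a common grid. The helper names ecdf and ecdf_vs_normal are hypothetical, and the fitted normal is again assumed to use the sample mean and ddof=1 standard deviation.

```python
import numpy as np
from scipy.stats import norm


def ecdf(sample, grid):
    """Proportion of observations <= each grid value (the step line in the ECDF chart)."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, grid, side="right") / s.size


def ecdf_vs_normal(sample, grid):
    """Return the ECDF and the fitted normal CDF evaluated on the same grid."""
    s = np.asarray(sample, dtype=float)
    fitted = norm.cdf(grid, loc=s.mean(), scale=s.std(ddof=1))
    return ecdf(s, grid), fitted
```

For G3, a grid such as np.linspace(0, 20, 401) covers the grade scale. The vertical distance between the two returned arrays is the gap described above.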

6. Regression Residual P-P Plot

P-P plot for regression residuals from a model predicting G3 using G1, G2, studytime, absences and failures.

For regression analysis, normality is usually checked on residuals, not only on the raw dependent variable. The residual P-P plot shows a visible curved pattern, meaning residual normality is also not strongly supported in this model.
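A residual P-P plot needs one extra step: fit the model first, then apply the same P-P calculation to the residuals. The sketch below uses ordinary least squares with an intercept via NumPy's lstsq. The helper name residual_pp is hypothetical, and comparing residuals with a normal that uses their own mean and ddof=1 SD is an assumption. SPSS's regression P-P plot standardizes residuals with the model's standard error of the estimate, so its points may differ very slightly.

```python
import numpy as np
from scipy.stats import norm


def residual_pp(y, X):
    """P-P coordinates for OLS residuals (hypothetical helper).

    y: outcome values, e.g. G3.
    X: 2-D predictors, e.g. G1, G2, studytime, absences, failures.
    An intercept column is added before fitting by least squares.
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(y.size), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = np.sort(y - design @ beta)
    n = resid.size
    empirical = (np.arange(1, n + 1) - 0.5) / n
    # Residual mean is ~0 because of the intercept; SD uses ddof=1 (an assumption).
    theoretical = norm.cdf(resid, loc=resid.mean(), scale=resid.std(ddof=1))
    return theoretical, empirical
```

With the dataset loaded, residual_pp(student["G3"], student[["G1", "G2", "studytime", "absences", "failures"]]) returns x and y arrays for the residual P-P scatter plot.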

7. P-P Plot for G3 by School

Grouped P-P plot for G3 by school. Grouped plots help identify whether the normality pattern changes across subgroups.

The school-wise plot shows that the normality pattern differs between school groups. This does not replace a full subgroup statistical analysis, but it is useful for visual diagnosis and interpretation.
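Grouped P-P plots repeat the calculation within each subgroup. In the sketch below, each group gets its own fitted normal (group mean and ddof=1 SD). That is one reasonable choice, but it is an assumption and not confirmed as the method behind the charts. The helper name grouped_pp is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def grouped_pp(df: pd.DataFrame, value_col: str, group_col: str) -> pd.DataFrame:
    """P-P coordinates computed separately within each level of group_col."""
    frames = []
    for level, sub in df.groupby(group_col):
        x = np.sort(sub[value_col].dropna().to_numpy(dtype=float))
        n = x.size
        frames.append(pd.DataFrame({
            group_col: level,
            "value": x,
            "empirical_probability": (np.arange(1, n + 1) - 0.5) / n,
            # Each group is compared with a normal fitted to that group only.
            "normal_probability": norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1)),
        }))
    return pd.concat(frames, ignore_index=True)
```

The same function covers both grouped charts: grouped_pp(student, "G3", "school") and grouped_pp(student, "G3", "sex").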

8. P-P Plot for G3 by Sex

Grouped P-P plot for G3 by sex. Both groups show step-like patterns because the outcome is a discrete grade variable.

The sex-wise plot again shows visible step-like probability clusters. This reinforces the main teaching point: P-P plots are valid diagnostic tools, but their shape must be interpreted in the context of the measurement scale.

9. Maximum P-P Probability Deviation Summary

Maximum P-P plot probability deviation for G1, G2 and G3. Larger values mean stronger separation between empirical and fitted-normal probabilities.

The deviation summary is a compact way to compare the three grade variables. G3 has the largest maximum deviation, which agrees with the visual plots and formal normality-test results.

Python Validation Charts for P-P Plot Normality Check

The Python workflow produced improved publication-style validation charts. These charts are cleaner for article presentation and confirm that the result is not dependent on R alone.

Improved Python P-P plot for final grade G3.
Python comparison of P-P plots for G1, G2 and G3.
Python Q-Q plot supporting the normality interpretation for G3.
Python histogram, fitted normal curve and kernel density for G3.

The Python charts are especially useful for WordPress presentation because the titles, subtitles and footers are clearer. The statistical conclusion remains the same as R and SPSS: G3 does not follow a perfect normal distribution.

How to Run a P-P Plot Normality Check in R, Python, SPSS and Excel

P-P Plot Normality Check in R

In R, the workflow is to read the dataset, sort G3, calculate empirical probabilities, calculate fitted normal probabilities and create a scatter plot with a diagonal reference line.

# P-P Plot Normality Check in R

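# The copy used here is comma-separated; the original UCI download uses
# semicolons, so switch to sep = ";" if you read that file directly.
student <- read.csv("student-por.csv", sep = ",", stringsAsFactors = FALSE)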

x <- na.omit(as.numeric(student$G3))
x_sorted <- sort(x)

n <- length(x_sorted)
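# ppoints(n) returns (i - 0.5) / n when n > 10, matching the formula above
# (for n <= 10 it uses (i - 3/8) / (n + 1/4) instead)
empirical_probability <- ppoints(n)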

mean_g3 <- mean(x_sorted)
sd_g3 <- sd(x_sorted)

normal_probability <- pnorm(x_sorted, mean = mean_g3, sd = sd_g3)

pp_data <- data.frame(
  value = x_sorted,
  empirical_probability = empirical_probability,
  normal_probability = normal_probability
)

plot(
  pp_data$normal_probability,
  pp_data$empirical_probability,
  xlab = "Theoretical normal cumulative probability",
  ylab = "Empirical cumulative probability",
  main = "P-P Plot Normality Check for G3"
)

abline(0, 1, lty = 2)

P-P Plot Normality Check in Python

In Python, SciPy can calculate theoretical normal probabilities and Matplotlib can create the plot.

# P-P Plot Normality Check in Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm, shapiro, normaltest, jarque_bera

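# The copy used here is comma-separated; the original UCI download uses
# semicolons, so add sep=";" if you read that file directly.
student = pd.read_csv("student-por.csv")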

x = student["G3"].dropna().astype(float).sort_values().to_numpy()
n = len(x)

empirical_probability = (np.arange(1, n + 1) - 0.5) / n
mean_g3 = np.mean(x)
sd_g3 = np.std(x, ddof=1)

normal_probability = norm.cdf(x, loc=mean_g3, scale=sd_g3)

plt.scatter(normal_probability, empirical_probability)
plt.plot([0, 1], [0, 1], linestyle="--")
plt.xlabel("Theoretical normal cumulative probability")
plt.ylabel("Empirical cumulative probability")
plt.title("P-P Plot Normality Check for G3")
plt.show()

print("Shapiro-Wilk:", shapiro(x))
print("D'Agostino-Pearson:", normaltest(x))
print("Jarque-Bera:", jarque_bera(x))

P-P Plot Normality Check in SPSS

SPSS can create P-P plots with the PPLOT command. The EXAMINE command provides descriptives, normality tests, histograms and normal plots.

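* P-P Plot Normality Check in SPSS.
* Edit /FILE below to point to your own copy of student-por.csv.
* The original UCI file is semicolon-separated; for that file use /DELIMITERS=";".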

GET DATA
  /TYPE=TXT
  /FILE='D:\low kda score priority basis posts\first post\P P Plot Normality Check\student-por.csv'
  /ENCODING='UTF8'
  /DELCASE=LINE
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /VARIABLES=
    school A2
    sex A1
    age F8.0
    address A1
    famsize A3
    Pstatus A1
    Medu F8.0
    Fedu F8.0
    Mjob A12
    Fjob A12
    reason A12
    guardian A8
    traveltime F8.0
    studytime F8.0
    failures F8.0
    schoolsup A3
    famsup A3
    paid A3
    activities A3
    nursery A3
    higher A3
    internet A3
    romantic A3
    famrel F8.0
    freetime F8.0
    goout F8.0
    Dalc F8.0
    Walc F8.0
    health F8.0
    absences F8.0
    G1 F8.0
    G2 F8.0
    G3 F8.0.

CACHE.
EXECUTE.

PPLOT VARIABLES=G1 G2 G3
  /DISTRIBUTION=NORMAL
  /FRACTION=BLOM
  /TIES=MEAN
  /NOSTANDARDIZE
  /TYPE=P-P
  /PLOT=BOTH
  /NOLOG.

EXAMINE VARIABLES=G1 G2 G3
  /PLOT BOXPLOT STEMLEAF NPPLOT HISTOGRAM
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
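The PPLOT command above does not use the (i - 0.5)/n positions from the formula section. According to the SPSS PPLOT documentation, /FRACTION=BLOM computes (r - 3/8)/(n + 1/4), and /TIES=MEAN gives tied values their mean rank. Each distinct grade should therefore appear as a single point rather than a vertical stack, so the SPSS P-P plot can look slightly different from the R, Python and Excel versions. The sketch below compares the two conventions on a tiny tied sample. The helper names blom_positions and half_positions are hypothetical, and the code follows the documented formulas; it is not a reimplementation of SPSS itself.

```python
import numpy as np
from scipy.stats import rankdata


def blom_positions(values):
    """Blom positions (r - 3/8) / (n + 1/4), with tied values sharing their mean rank."""
    x = np.sort(np.asarray(values, dtype=float))
    ranks = rankdata(x, method="average")   # ties -> mean rank, like /TIES=MEAN
    return (ranks - 3 / 8) / (x.size + 1 / 4)


def half_positions(values):
    """(i - 0.5) / n positions with distinct ranks, as in the R, Python and Excel steps."""
    x = np.sort(np.asarray(values, dtype=float))
    return (np.arange(1, x.size + 1) - 0.5) / x.size


print(blom_positions([1, 2, 2, 3]))   # the two tied 2s share one position
print(half_positions([1, 2, 2, 3]))   # the two tied 2s get different positions
```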

P-P Plot Normality Check in Excel

Excel can create a manual P-P plot by calculating empirical cumulative probabilities and fitted normal cumulative probabilities.

Excel column | Content | Example formula or action
A | G3 values | Paste the final grade values.
B | Sorted G3 | Sort from smallest to largest.
C | Rank | 1, 2, 3, ..., n
D | Empirical cumulative probability | =(C2-0.5)/COUNT($B:$B)
E | Theoretical normal cumulative probability | =NORM.DIST(B2,AVERAGE($B:$B),STDEV.S($B:$B),TRUE)
Chart | Scatter plot | X = column E, Y = column D

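After creating the scatter plot, add a diagonal reference line from (0, 0) to (1, 1), for example by adding a second series containing those two points and displaying it as a line. Points near the line indicate closer agreement with the fitted normal distribution.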

How to Report the P-P Plot Normality Check Result

A good report should describe the visual pattern, the reason for any visible clustering, the supporting plots and the formal normality tests. Avoid using the P-P plot alone as proof of normality.

APA-style report: Normality of G3 final grade was assessed using a P-P plot, Q-Q plot, histogram and formal normality tests. The P-P plot showed partial alignment with the diagonal reference line but also visible step-like departures due to the discrete grade scale. The Shapiro-Wilk test was significant, W = .926, p < .001, suggesting that G3 was not normally distributed.

Plain-language report: The final grade data were not perfectly normal. The P-P plot was useful for comparing observed and fitted normal cumulative probabilities, but the Q-Q plot, histogram, ECDF and formal tests showed clear departures from normality.

When Should You Use a P-P Plot Normality Check?

Use a P-P Plot Normality Check when you want a visual comparison between the empirical cumulative distribution of your data and a fitted theoretical distribution, most commonly the normal distribution. It is useful in education research, psychology, social science, regression diagnostics, thesis data analysis and general statistical assumption checking.

Situation | Use it? | Reason
Checking whether a continuous variable is approximately normal | Yes | The plot compares observed and theoretical cumulative probabilities.
Supporting Shapiro-Wilk or Kolmogorov-Smirnov tests | Yes | The plot adds visual context to formal p-values.
Checking regression residuals | Yes | Regression normality is usually assessed on residuals.
Detecting tail problems | Use a Q-Q plot too | Q-Q plots are better for tail departures and outliers.
Proving normality with one graph | No | A P-P plot is diagnostic, not final proof.

Common Mistakes

1. Treating the P-P plot as a final normality test

A P-P plot is visual evidence. It should be combined with Q-Q plots, histograms, ECDF charts and formal tests.

2. Ignoring the measurement scale

Discrete variables such as grade scores often create step-like patterns. These patterns should be explained instead of being treated as software errors.

3. Reporting p = .000

SPSS often prints very small p-values as .000. In formal writing, report these values as p < .001.

4. Checking only the raw dependent variable in regression

For regression models, normality is usually checked on residuals, not only on the raw dependent variable.

5. Confusing P-P plots with Q-Q plots

P-P plots compare cumulative probabilities. Q-Q plots compare quantiles. They are related, but they are not identical.

Download SPSS Output and Verification Files

The SPSS PDF verifies the clean data import, descriptive statistics, P-P plot output, Q-Q plot output, Explore normality tests, grouped normality checks and regression residual normality check.

External References for P-P Plot Normality Check

This post uses verified R, Python and SPSS outputs together with external software documentation and statistical references. These resources help readers verify the background of normal probability plots, normality testing and distribution diagnostics.

FAQs About P-P Plot Normality Check

What does a P-P Plot Normality Check do?

It compares empirical cumulative probabilities from the sample with theoretical cumulative probabilities from a fitted normal distribution.

When should I use a P-P plot?

Use it when you want to visually check how closely your data follow a fitted distribution, especially a fitted normal distribution.

Is a P-P plot the same as a Q-Q plot?

No. A P-P plot compares cumulative probabilities, while a Q-Q plot compares quantiles. Q-Q plots are usually stronger for tail behavior.

What was the main P-P plot result in this example?

The G3 P-P plot showed partial diagonal alignment but also visible step-like departures. The maximum P-P probability deviation was approximately 0.1227.

Was G3 normally distributed?

No. The visual charts and formal tests showed that G3 was not perfectly normally distributed. SPSS reported Shapiro-Wilk W = .926, p < .001.

Why does the P-P plot show vertical clusters?

G3 is a discrete grade variable. Many students have the same integer grade, so the P-P plot forms vertical groups.

Can this plot be made in SPSS?

Yes. SPSS can create P-P plots with the PPLOT command and normality tests with the EXAMINE command.

Can this plot be made in R?

Yes. R can calculate empirical probabilities with ppoints and theoretical normal probabilities with pnorm.

Can this plot be made in Python?

Yes. Python can calculate normal cumulative probabilities with SciPy and plot the result with Matplotlib.

Can this plot be made in Excel?

Yes. Excel can calculate empirical probabilities and theoretical normal probabilities using NORM.DIST, then create a scatter plot.



About the author

Online Internet Cafe publishes practical guides for statistics, research methods, data analysis tools and ethical project support.
