Normality Testing and Distribution Diagnostics
The Lilliefors Test is a corrected Kolmogorov-Smirnov normality test used when the mean and standard deviation of the normal distribution are estimated from the same sample. This guide explains the Lilliefors statistic, null hypothesis, p-value interpretation, R workflow, Python workflow, SPSS output, Excel checking method, charts and verified results from the student-por.csv dataset.
Quick Answer: Lilliefors Test Result
A Lilliefors Test was applied to G3 final grade from the UCI Student Performance dataset. The same normality conclusion was reproduced in R, Python and SPSS. The full dataset contained 649 valid cases, and the overall result rejected the null hypothesis of normality.
Final report sentence: A Lilliefors normality test showed that G3 final grades were not normally distributed. R reported D = 0.12352 with a p-value displayed as approximately zero. Python reported D = 0.12352, p = 0.001. SPSS reported Kolmogorov-Smirnov Statistic = .124, df = 649, Sig. = .000, with the footnote Lilliefors Significance Correction. Therefore, the null hypothesis of normality was rejected.
Important reporting note: SPSS may display very small p-values as .000. Do not write p = .000 in the final article, report or thesis. Report it as p < .001.
Table of Contents
- What Is the Lilliefors Test?
- Lilliefors Test Formula
- Lilliefors Test Null Hypothesis and Alternative Hypothesis
- Dataset and Normality Model Used
- Verified Results in SPSS, R and Python
- Group-Specific Lilliefors Test Results
- Lilliefors Test Charts and Interpretation
- Python Validation Charts for Lilliefors Test
- How to Run the Lilliefors Test in SPSS, R, Python and Excel
- How to Report the Lilliefors Test Result
- Related Data Analysis Guides
- When Should You Use the Lilliefors Test?
- Common Mistakes
- Download SPSS Output and Verification Files
- External References for Lilliefors Test
- FAQs About Lilliefors Test
What Is the Lilliefors Test?
The Lilliefors Test is a normality test used when the population mean and population standard deviation are unknown. In real data analysis, this is the usual situation: researchers estimate the mean and standard deviation from the same sample they are testing. The ordinary one-sample Kolmogorov-Smirnov test assumes that the reference distribution is already fully specified, whereas the Lilliefors Test corrects the normality test for estimated parameters.
The test is also called the Kolmogorov-Lilliefors test or the Lilliefors corrected Kolmogorov-Smirnov normality test. It compares the empirical cumulative distribution of the sample with the cumulative distribution of a normal curve fitted using the sample mean and sample standard deviation.
In this post, the Lilliefors Test is applied to G3 final grade. G3 is a bounded integer grade variable. It ranges from 0 to 19 in this sample. Because it is bounded, discrete and affected by low-end values, it is not surprising that the overall distribution fails a strict normality test.
Practical note: In SPSS, the result usually appears in the Tests of Normality table as the Kolmogorov-Smirnov row with the footnote Lilliefors Significance Correction. That footnote is the key evidence that the SPSS normality output is being interpreted as a Lilliefors-corrected test.
If you are building a full data-analysis workflow, compare this guide with the Kolmogorov-Smirnov Test, D’Agostino-Pearson Test, Cramer-von Mises Test, Greenhouse-Geisser Correction, Hartley F Max Test, Brown-Forsythe Test and Influence Diagnostics.
Lilliefors Test Formula
The Lilliefors Test statistic is the largest absolute distance between the empirical cumulative distribution function and the fitted normal cumulative distribution function. The simplified formula is:
D = max | F_n(x) - Φ((x - x̄) / s) |
In this formula, F_n(x) is the empirical cumulative distribution function, x̄ is the sample mean, s is the sample standard deviation and Φ is the standard normal cumulative distribution function. The larger the maximum distance, the stronger the evidence that the sample does not follow a normal distribution.
How to Calculate the Kolmogorov-Lilliefors Test Statistic Manually
The manual calculation method is useful for understanding the test, although software should be used for the final p-value. To calculate the Kolmogorov-Lilliefors test statistic manually, sort the sample values, calculate the sample mean and sample standard deviation, compute the empirical CDF, compute the fitted normal CDF and find the largest absolute difference.
| Step | Manual calculation action | Meaning in the G3 example |
|---|---|---|
| 1 | Sort all G3 final grades from lowest to highest. | This creates the order needed for the empirical cumulative distribution. |
| 2 | Calculate the sample mean and sample standard deviation. | The fitted normal distribution is based on the observed G3 mean and SD. |
| 3 | Calculate the empirical CDF for each score. | This shows the actual cumulative pattern of the G3 grades. |
| 4 | Calculate the fitted normal CDF for the same score. | This shows what the fitted normal curve would predict. |
| 5 | Find the largest absolute difference. | This largest distance is the Lilliefors D statistic. |
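The steps in the table can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `lilliefors_d` and the sample grades are hypothetical and are not the G3 data. The sketch follows the usual software convention of comparing the fitted normal CDF with the empirical step both at and just below each sorted value.

```python
from statistics import NormalDist, mean, stdev


def lilliefors_d(values):
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    xs = sorted(values)                        # Step 1: sort the sample
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # Step 2: sample mean and SD (n - 1)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)                    # Step 4: fitted normal CDF at x
        d_plus = i / n - cdf                   # Step 3: empirical step at x
        d_minus = cdf - (i - 1) / n            # empirical step just below x
        d = max(d, d_plus, d_minus)            # Step 5: keep the largest gap
    return d


# Hypothetical grades for illustration only (not the G3 data)
sample_grades = [0, 8, 10, 10, 11, 12, 12, 13, 14, 15, 17]
print("D statistic:", round(lilliefors_d(sample_grades), 5))
```

The function returns only D. It does not produce a p-value.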
Important: The D statistic can be understood manually, but the p-value should come from Lilliefors-specific critical values, tables, simulation or software. Ordinary Kolmogorov-Smirnov p-values are not appropriate when the normal mean and standard deviation are estimated from the same sample.
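To show what the simulation route can look like, here is a small Monte Carlo sketch. It assumes the null distribution of D can be approximated by repeatedly drawing normal samples of the same size and recomputing D with estimated parameters. The function names, seed, number of simulations and example sample are all hypothetical, and the resulting p-value will differ slightly from table-based or software approximations.

```python
import random
from statistics import NormalDist, mean, stdev


def d_statistic(values):
    """Lilliefors D: largest gap between the ECDF and the fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    return max(
        max(i / n - fitted.cdf(x), fitted.cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )


def simulated_p_value(values, n_sims=2000, seed=1):
    """Share of simulated normal samples whose D is at least the observed D."""
    rng = random.Random(seed)
    n = len(values)
    observed = d_statistic(values)
    hits = 0
    for _ in range(n_sims):
        # D is unchanged by shifting or rescaling, so N(0, 1) samples suffice
        simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if d_statistic(simulated) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_sims + 1)


# Hypothetical two-cluster sample that is clearly not normal
bimodal = [0] * 10 + [10] * 10
print("simulated p-value:", simulated_p_value(bimodal))
```

Because the mean and standard deviation are re-estimated inside every simulated sample, the simulated distribution reflects the Lilliefors correction rather than the ordinary Kolmogorov-Smirnov distribution.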
Lilliefors Test Null Hypothesis and Alternative Hypothesis
The Lilliefors Test is a formal normality test. It does not test whether the mean is high or low. It tests whether the shape of the sample distribution is consistent with a fitted normal distribution.
| Hypothesis | Meaning | Applied to this example |
|---|---|---|
| H0 | The sample comes from a normal distribution. | G3 final grades follow a fitted normal distribution. |
| H1 | The sample does not come from a normal distribution. | G3 final grades depart from normality. |
| Decision rule | If p < .05, reject the null hypothesis. | The overall p-value is below .05, so normality is rejected. |
Because the overall Lilliefors p-value is below .05 in R, Python and SPSS, the null hypothesis is rejected. The overall distribution of G3 final grade is not normal.
Dataset and Normality Model Used
This worked example uses the student-por.csv Student Performance dataset from the UCI Machine Learning Repository. The tested variable is G3 final grade. Group-specific normality was also checked for school, sex, age group, studytime group, failures group, absences group, schoolsup and romantic.
| Item | Verified value | Explanation |
|---|---|---|
| Rows used | 649 | Total valid observations with usable G3 final grade values. |
| Tested variable | G3 | Final grade variable tested for normality. |
| Mean G3 | 11.9060 | Average final grade in the dataset. |
| Standard deviation | 3.2307 | Spread of final grades around the mean. |
| Observed range | 0 to 19 | The grade scale is bounded and discrete. |
| Main method | Lilliefors normality test | Tests fitted normality when mean and SD are estimated from the sample. |
External dataset source: UCI Machine Learning Repository: Student Performance dataset.
Overall Descriptive Statistics for G3
| Statistic | Verified value | Interpretation |
|---|---|---|
| N | 649 | Large enough for normality tests to detect moderate departures. |
| Mean | 11.9060 | Average final grade is around 12. |
| Median | 12 | Median is close to the mean, but that alone does not prove normality. |
| Standard deviation | 3.2307 | Final grades have moderate spread. |
| Variance | 10.4371 | Variance is used to understand dispersion. |
| Skewness | Approximately -0.91 | Negative skew: the lower tail is longer than the upper tail. |
| Kurtosis | Approximately 5.7 | The distribution has stronger tail behaviour than a perfect normal curve. |
| Minimum and maximum | 0 and 19 | Shows the bounded grade scale. |
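Skewness and kurtosis values depend on the formula a program uses, so it can help to see the calculation spelled out. The sketch below is a hypothetical helper with toy data that uses only built-in Python. It computes moment-based skewness, kurtosis on the scale where a normal curve equals 3, and excess kurtosis on the scale where a normal curve equals 0. Some packages also apply small-sample adjustments, so check which convention your software reports before comparing numbers.

```python
def shape_statistics(values):
    """Moment-based skewness and kurtosis (no small-sample adjustment)."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - centre) ** 3 for x in values) / n   # third central moment
    m4 = sum((x - centre) ** 4 for x in values) / n   # fourth central moment
    kurtosis = m4 / m2 ** 2                           # normal curve = 3
    return {
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": kurtosis,
        "excess_kurtosis": kurtosis - 3.0,            # normal curve = 0
    }


# Hypothetical grades for illustration only (not the G3 data)
toy_grades = [0, 8, 10, 10, 11, 12, 12, 13, 14, 15, 17]
print(shape_statistics(toy_grades))
```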
Verified Results in SPSS, R and Python
The analysis was reproduced in R, Python and SPSS. R used the nortest package. Python used statsmodels.stats.diagnostic.lilliefors. SPSS reported the result through the Explore procedure in the Tests of Normality table.
Overall Lilliefors Test Result
| Software | Method | Test statistic | p-value / Sig. | Decision | Short interpretation |
|---|---|---|---|---|---|
| R | nortest::lillie.test() | D = 0.12352 | Approximately 0 | Reject normality | G3 final grade is not normally distributed. |
| Python | statsmodels.stats.diagnostic.lilliefors() | D = 0.12352 | 0.001 | Reject normality | Python confirms the same substantive result as R. |
| SPSS | Explore → Tests of Normality | Kolmogorov-Smirnov = .124 | Sig. = .000 | Reject normality | SPSS reports Lilliefors Significance Correction. |
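The overall conclusion is stable across all three tools. The p-values are displayed differently because each program uses its own reporting precision or approximation method. For example, statsmodels with pvalmethod="table" interpolates from a table whose smallest reported p-value is 0.001, so the Python value is best read as p ≤ .001, and SPSS rounds Sig. to three decimals. The interpretation does not change. In all cases, p < .05, so the normality assumption is rejected for overall G3 final grade.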
Final result: The G3 final grade variable is statistically different from a fitted normal distribution. The histogram, Q-Q plot and empirical CDF chart show that the departure is mainly caused by the bounded grade scale, integer scoring pattern and lower-tail observations including zero scores.
Group-Specific Lilliefors Test Results
The group-specific results help answer a more practical research question: does G3 look normal inside important student categories? This matters because many analyses compare grades by school, sex, age group, study time, previous failures or support status.
| Grouping variable | Group | N | Mean G3 | SD | D statistic | p-value | Decision |
|---|---|---|---|---|---|---|---|
| school | Gabriel Pereira School | 423 | 12.5768 | 2.6256 | 0.09236 | < .001 | Reject normality |
| school | Mousinho da Silveira School | 226 | 10.6504 | 3.8340 | 0.14734 | < .001 | Reject normality |
| sex | Female | 383 | 12.2533 | 3.1241 | 0.10483 | < .001 | Reject normality |
| sex | Male | 266 | 11.4060 | 3.3207 | 0.14803 | < .001 | Reject normality |
| age group | 16 or younger | 289 | 12.0381 | 2.5998 | 0.10955 | < .001 | Reject normality |
| age group | 17 years | 179 | 12.2682 | 3.1490 | 0.09217 | about .001 | Reject normality |
| age group | 18 years | 140 | 11.7714 | 4.1541 | 0.14204 | < .001 | Reject normality |
| age group | 19 or older | 41 | 9.8537 | 3.3508 | 0.27750 | < .001 | Reject normality |
| studytime group | <2 hours | 212 | 10.8443 | 3.2186 | 0.16068 | < .001 | Reject normality |
| studytime group | 2 to 5 hours | 305 | 12.0918 | 3.2431 | 0.12504 | < .001 | Reject normality |
| studytime group | 5 to 10 hours | 97 | 13.2268 | 2.5021 | 0.10306 | about .016 | Reject normality |
| studytime group | >10 hours | 35 | 13.0571 | 3.0384 | 0.13607 | about .10 | Normality not rejected |
| failures group | No previous failures | 549 | 12.5100 | 2.8288 | 0.09456 | < .001 | Reject normality |
| failures group | One or more previous failures | 100 | 8.5900 | 3.3001 | 0.23905 | < .001 | Reject normality |
| absences group | No absences | 244 | 12.0410 | 4.0722 | 0.16877 | < .001 | Reject normality |
| absences group | 1 to 5 absences | 234 | 12.0556 | 2.5409 | 0.13117 | < .001 | Reject normality |
| absences group | 6 to 15 absences | 150 | 11.6133 | 2.6717 | 0.15745 | < .001 | Reject normality |
| absences group | 16 or more absences | 21 | 10.7619 | 2.4063 | 0.19566 | about .036 | Reject normality |
| schoolsup | No school support | 581 | 11.9793 | 3.3160 | 0.11694 | < .001 | Reject normality |
| schoolsup | Receives school support | 68 | 11.2794 | 2.3041 | 0.17170 | < .001 | Reject normality |
| romantic | Not in romantic relationship | 410 | 12.1293 | 3.0037 | 0.10749 | < .001 | Reject normality |
| romantic | In romantic relationship | 239 | 11.5230 | 3.5608 | 0.14196 | < .001 | Reject normality |
The group-specific results show that non-normality is not limited to one subgroup. It appears across school groups, sex groups, age groups, failure groups, absence groups, school support groups and romantic relationship groups. The main exception is the >10 hours studytime group, where normality was not rejected. However, this group has only 35 cases, so the test has limited power to detect departures from normality. A non-significant result here does not show that these grades are normally distributed.
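As a rough cross-check on the decisions in the table, D can be compared with a large-sample critical value. The sketch below assumes the commonly tabulated approximation of about 0.886 / √n at the .05 level for samples larger than 30. This is an approximation, not the p-value method used by R, Python or SPSS, and the function name is hypothetical. The N and D values are copied from three rows of the table.

```python
import math


def approx_critical_value(n, coefficient=0.886):
    """Approximate .05 critical value for Lilliefors D (assumed valid for n > 30)."""
    if n <= 30:
        raise ValueError("this approximation is intended for n > 30")
    return coefficient / math.sqrt(n)


# (group label, N, D statistic) copied from the group-specific table
rows = [
    ("studytime >10 hours", 35, 0.13607),
    ("studytime 5 to 10 hours", 97, 0.10306),
    ("age 19 or older", 41, 0.27750),
]

for label, n, d in rows:
    critical = approx_critical_value(n)
    decision = "reject normality" if d > critical else "normality not rejected"
    print(f"{label}: D = {d:.5f}, critical about {critical:.5f}, {decision}")
```

Under this approximation, only the >10 hours group falls below its critical value, which agrees with the decisions in the table.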
Lilliefors Test Charts and Interpretation
1. Overall Histogram with Fitted Normal Curve

The histogram shows that most G3 scores are concentrated around the middle of the grade scale, especially between about 9 and 15. However, the distribution is not a smooth bell-shaped curve. The lower tail includes very low values and zero scores, while the upper side is naturally limited by the maximum grade scale. This explains why the fitted normal curve does not match the observed distribution perfectly.
This chart supports the Lilliefors decision. The test rejects normality because the empirical distribution differs from the fitted normal distribution. The visual problem is not only a small irregularity. It is connected with the bounded and discrete nature of the grade variable.
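To make the mismatch between the histogram and the fitted curve concrete, the expected number of students at each integer grade under the fitted normal distribution can be computed and compared with the observed counts from your own data. The sketch below uses the N, mean and standard deviation reported in the descriptive statistics table. The half-point bin edges and the function name are assumptions made for illustration.

```python
from statistics import NormalDist

N_CASES = 649                                   # rows used in this post
fitted = NormalDist(mu=11.9060, sigma=3.2307)   # reported G3 mean and SD


def expected_count(lower, upper):
    """Expected number of cases between two grade boundaries under the fitted normal."""
    return N_CASES * (fitted.cdf(upper) - fitted.cdf(lower))


# Expected count for each integer grade, using half-point bin edges
for grade in range(0, 20):
    print(grade, round(expected_count(grade - 0.5, grade + 0.5), 1))

# Mass the fitted curve places at or beyond the ends of the observed range
at_or_below_zero = N_CASES * fitted.cdf(0.5)
above_nineteen = N_CASES * (1 - fitted.cdf(19.5))
print("expected at or below 0:", round(at_or_below_zero, 2))
print("expected above 19:", round(above_nineteen, 2))
```

The fitted curve expects fewer than one case at or below a grade of 0, so any observed zero scores sit where the normal model predicts almost none.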
2. Overall Q-Q Plot

The Q-Q plot compares observed G3 scores with theoretical normal quantiles. The central points follow the line more closely than the tails, but the lower tail bends away because of the zero and very low final-grade values. The upper tail is also restricted because G3 cannot increase without limit. This visual pattern is consistent with the significant Lilliefors Test result.
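The coordinates behind a Q-Q plot can be computed directly, which helps when checking what a plotting function is doing. This is a minimal sketch using only the standard library. The plotting positions (i - 0.5) / n are one common choice among several, and the function name and sample grades are hypothetical.

```python
from statistics import NormalDist


def qq_points(values):
    """Pairs of (theoretical standard normal quantile, observed value)."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, SD 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(xs, start=1)
    ]


# Hypothetical grades for illustration only (not the G3 data)
sample_grades = [0, 0, 8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 15, 16, 18]
for theoretical, observed in qq_points(sample_grades):
    print(f"{theoretical:6.2f}  {observed}")
```

Repeated integer grades appear as horizontal runs of points, and zero scores show up as points far below the line at the lowest theoretical quantiles.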
3. Empirical CDF vs Fitted Normal CDF

This chart is directly connected to the Lilliefors statistic. The test statistic D is the largest distance between the empirical CDF and the fitted normal CDF. The step-like empirical curve appears because G3 is an integer grade. The visible gaps between the two curves explain why the D statistic is large enough to reject normality.
4. Group-Specific Lilliefors p-value Chart

The group-specific p-value chart is the best summary of subgroup normality. Most groups fall below the 0.05 decision line, so normality is rejected for most student categories. The main visible exception is the >10 hours studytime group, where the p-value is above 0.05.
5. G3 Distribution by School

The school histogram shows that Gabriel Pereira School has a higher mean G3 and tighter spread, while Mousinho da Silveira School has a lower mean and wider spread. Both groups reject normality, but Mousinho da Silveira School has a larger D statistic, which matches its more uneven distribution.
6. G3 Distribution by Sex

The sex-based histograms show non-normality for both female and male students. Female students have a slightly higher mean G3, while male students show a slightly lower mean and wider spread. Both groups still show the same bounded-score and lower-tail features seen in the full dataset.
7. G3 Distribution by Age Group

The age-group histogram shows that younger students cluster more strongly around mid-range and higher scores, while the older group has a lower mean and more irregular pattern. The 19 or older group has only 41 observations, but it shows the largest D statistic among age groups.
8. G3 Distribution by Study Time Group

The study time histogram is the most important subgroup chart. Students studying more than 10 hours are the only group where normality is not rejected. The lower study-time groups show stronger departures from normality, while the highest study-time group has a more balanced shape and fewer extreme low scores.
9. G3 Distribution by Previous Failures

Students with no previous failures have a much higher mean G3 than students with one or more previous failures. The group with one or more previous failures has a lower center, a more irregular shape and a strongly non-normal pattern. Its D statistic (0.23905) is one of the largest group-specific values.
10. G3 Distribution by Absences Group

The absences histogram shows that the no-absence group includes a broad spread and lower-end values. The 1 to 5 and 6 to 15 absence groups are more centrally clustered but still reject normality. The high-absence group has only 21 cases, and its p-value is close to the 0.05 threshold, but it still rejects normality.
11. G3 Distribution by School Support

The school support groups are highly unbalanced. Most students do not receive school support, while the support group is much smaller. Both groups reject normality. The support group has a lower mean and smaller standard deviation, but the distribution still differs from a fitted normal curve.
12. G3 Distribution by Romantic Relationship Status

Both romantic relationship categories reject normality. Students not in a romantic relationship have a slightly higher mean G3, while students in a romantic relationship have a wider spread. The chart supports the test result because both groups show non-normal score patterns.
Python Validation Charts for Lilliefors Test
The Python workflow produced matching validation charts. These charts are useful because they confirm that the conclusion is not dependent on one software program only.
Python Overall Histogram with Fitted Normal Curve

The Python histogram confirms the same central clustering and lower-tail departure seen in the R chart. The fitted normal curve does not fully match the observed grade distribution.
Python Overall Q-Q Plot

The Python Q-Q plot confirms the same tail departures. The central grades are closer to the normal line, but the lower and upper tails depart from the expected normal pattern.
Python Empirical CDF vs Fitted Normal CDF

The Python ECDF chart reproduces the gap between the empirical distribution and the fitted normal distribution. This supports the same Lilliefors D statistic and rejection decision.
Python Group-Specific Lilliefors p-value Chart

The Python p-value chart confirms the R result. Almost all groups fall below the 0.05 threshold. The main exception remains the >10 hours study time group.
Python Q-Q Plot by School

The Python school Q-Q plot verifies that both school groups deviate from the normal line, especially in the lower tail.
Python Q-Q Plot by Sex

The Python sex Q-Q plot confirms that both female and male groups show lower-tail departure from normality.
Python Q-Q Plot by Study Time Group

The Python study-time Q-Q plot supports the group-specific exception. The >10 hours group follows the normal line more closely than the lower study-time groups.
How to Run the Lilliefors Test in SPSS, R, Python and Excel
Lilliefors Test in SPSS
SPSS reports the Lilliefors-corrected normality result through the Explore procedure. In the output, use the Tests of Normality table and read the Kolmogorov-Smirnov row with the Lilliefors Significance Correction footnote.
* Lilliefors normality test in SPSS through Explore.
EXAMINE VARIABLES=g3
/PLOT BOXPLOT HISTOGRAM NPPLOT
/COMPARE GROUPS
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
Lilliefors Test in R
In R, the simplest method is the lillie.test() function from the nortest package.
install.packages("nortest")
library(nortest)
# Overall Lilliefors Test for G3
lillie.test(student_data$g3)
# Group-specific example
by(student_data$g3, student_data$school, lillie.test)
Lilliefors Test in Python
In Python, use the lilliefors() function from statsmodels.stats.diagnostic. This is more appropriate than treating a fitted normal distribution as if it were fully specified in an ordinary Kolmogorov-Smirnov test.
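import pandas as pd
from statsmodels.stats.diagnostic import lilliefors

# Assumption: the UCI file is semicolon-delimited with a grade column named G3;
# column names are lowercased to match the g3 name used in this post.
df = pd.read_csv("student-por.csv", sep=";")
df.columns = df.columns.str.lower()

# Lilliefors test against a normal distribution with estimated mean and SD
statistic_d, p_value = lilliefors(
    df["g3"],
    dist="norm",
    pvalmethod="table"
)
print("D statistic:", statistic_d)
print("p-value:", p_value)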
Lilliefors Test in Excel
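Excel can help explain the manual calculation, but it is not the best tool for final Lilliefors p-values unless a verified add-in or critical-value table is used. Excel can calculate the empirical CDF, fitted normal CDF and maximum absolute difference. Note that statistical software also compares the fitted CDF with the empirical step just below each value, (i − 1)/n, so a single empirical CDF column can slightly understate D.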
| Excel column | Content | Example formula idea |
|---|---|---|
| A | Sorted G3 values | Sort final grades from smallest to largest. |
| B | Empirical CDF | =(ROW(A2)-1)/n, where n is the number of cases and the data start in row 2. |
| C | Fitted normal CDF | =NORM.DIST(A2, mean, sd, TRUE) |
| D | Absolute difference | =ABS(B2-C2) |
| E | D statistic | =MAX(D:D) |
Mean (stored in cell H2):
=AVERAGE(A2:A650)
Standard deviation (stored in cell H3):
=STDEV.S(A2:A650)
Empirical CDF:
=(ROW(A2)-1)/COUNT($A$2:$A$650)
Fitted normal CDF:
=NORM.DIST(A2,$H$2,$H$3,TRUE)
Absolute difference:
=ABS(B2-C2)
Lilliefors D statistic:
=MAX(D2:D650)
How to Report the Lilliefors Test Result
A good report should mention the test name, the tested variable, the test statistic, sample size, p-value and final normality decision. It should also include a chart-based explanation if the normality result is important for the analysis.
APA-style report: A Lilliefors-corrected Kolmogorov-Smirnov normality test indicated that G3 final grades were not normally distributed, D(649) = .124, p < .001. Therefore, the null hypothesis of normality was rejected.
Plain-language report: The final-grade scores did not follow a normal distribution. The histogram and Q-Q plot showed that the grade distribution was affected by a bounded score scale and lower-tail values, including zero scores.
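The reporting conventions above can be applied consistently with a small helper. This is a hypothetical sketch, not part of any statistics package. It drops the leading zero and reports any p-value below .001 as p < .001, which also covers values that software displays as .000 or as approximately zero.

```python
def format_p(p):
    """Format a p-value APA-style: no leading zero, floor at p < .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_lilliefors(d, n, p):
    """Build the statistic part of the report sentence, e.g. D(n) = .xxx, p ..."""
    d_text = f"{d:.3f}".lstrip("0")
    return f"D({n}) = {d_text}, {format_p(p)}"


# Overall G3 result: D from R and Python, p displayed as approximately zero
print(format_lilliefors(0.12352, 649, 0.0))
```

For the overall G3 result, this prints D(649) = .124, p < .001, matching the APA-style sentence above.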
When Should You Use the Lilliefors Test?
Use the Lilliefors Test when you want to test whether a variable follows a normal distribution and the normal mean and standard deviation are estimated from the sample. This is common in education, psychology, medicine, social science, business analytics and general statistical reporting.
| Situation | Use it? | Reason |
|---|---|---|
| Testing normality when mean and SD are unknown | Yes | This is the main use case for the Lilliefors Test. |
| Checking raw grade distribution | Yes, with charts | Useful for understanding distribution shape before analysis. |
| Fully specified theoretical distribution | No, use ordinary K-S if parameters are known | Lilliefors is designed for estimated parameters. |
| Regression residual checking | Sometimes | Use it on residuals if normality of residuals matters. |
| Only two paired measurements | Not specifically | Choose a test based on the research question and model assumptions. |
Common Mistakes
1. Treating SPSS Sig. = .000 as p = 0
SPSS often displays very small p-values as .000. In final reporting, write p < .001, not p = .000.
2. Ignoring the Lilliefors correction footnote
In SPSS, the table may say Kolmogorov-Smirnov, but the footnote Lilliefors Significance Correction changes how the output should be interpreted.
3. Using charts only after the test
Charts are not optional decoration, and they should be examined alongside the test rather than only afterwards. The histogram, Q-Q plot and empirical CDF explain why the Lilliefors Test rejects normality.
4. Assuming non-normality automatically destroys the analysis
A significant normality test does not automatically make every parametric method invalid. The correct decision depends on the model, sample size, residuals, robustness and research purpose.
5. Forgetting that G3 is a bounded integer score
G3 is not an unlimited continuous variable. It is a grade score with a lower and upper bound. Such variables often fail strict normality tests, especially when the sample size is large.
Download SPSS Output and Verification Files
The uploaded PDF verifies the Lilliefors Test workflow and supporting output. It can be used as a supporting analysis file for this post.
External References for Lilliefors Test and Normality Testing
This post uses verified R, Python and SPSS outputs together with external statistical references and software documentation. These references help readers verify the background of normality testing, Kolmogorov-Smirnov logic and Lilliefors correction.
- UCI Machine Learning Repository: Student Performance dataset
- R Documentation: nortest::lillie.test
- statsmodels Documentation: lilliefors
- SciPy Documentation: scipy.stats.kstest
- SciPy Documentation: scipy.stats.normaltest
- NIST/SEMATECH: Normal Probability Plot
- NIST/SEMATECH: Kolmogorov-Smirnov Goodness-of-Fit Test
- IBM SPSS Statistics documentation
FAQs About Lilliefors Test
What does the Lilliefors Test do?
It tests whether a sample comes from a normal distribution when the mean and standard deviation are estimated from the same sample.
When should I use the Lilliefors Test?
Use it when you want to test normality but do not know the population mean and standard deviation in advance.
What is the Lilliefors Test statistic?
The statistic is the largest absolute distance between the empirical cumulative distribution function and the fitted normal cumulative distribution function.
What was the Lilliefors result in this example?
The overall result rejected normality. R and Python gave D = 0.12352, and SPSS reported Kolmogorov-Smirnov Statistic = .124 with Lilliefors Significance Correction.
Which group did not reject normality?
The studytime group of students studying more than 10 hours did not reject normality. Its p-value was approximately .10.
Is the Lilliefors Test the same as the Kolmogorov-Smirnov Test?
No. It is related to the Kolmogorov-Smirnov test, but it corrects the normality test for the case where mean and standard deviation are estimated from the sample.
Can this test be done in SPSS?
Yes. SPSS reports it in the Explore procedure under Tests of Normality as Kolmogorov-Smirnov with Lilliefors Significance Correction.
Can this test be done in R?
Yes. The R package nortest provides the lillie.test() function.
Can this test be done in Python?
Yes. Python users can run it with statsmodels.stats.diagnostic.lilliefors().
Can this test be done in Excel?
Excel can calculate the D statistic manually using empirical CDF and fitted normal CDF columns, but software such as R, Python or SPSS is better for reliable p-values.


