Chi-Square Test Calculator
Calculate the chi-square test statistic and p-value for goodness-of-fit or independence tests using observed and expected frequencies.
Formula
\chi^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i
Here \chi^2 is the chi-square test statistic; O_i is the observed frequency for category i; E_i is the expected frequency for category i; k is the number of categories. Degrees of freedom df = k - 1 for goodness-of-fit tests.
Source: Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157–175.
How it works
The chi-square statistic measures the overall discrepancy between observed and expected counts. For each category, the squared difference between observed (O) and expected (E) frequencies is divided by the expected frequency, and these values are summed across all categories. A larger χ² value indicates greater deviation from what was expected under the null hypothesis.
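The summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own implementation; the function name `chi_square_statistic` and the sample counts are hypothetical.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 observations, three equally likely categories.
observed = [25, 15, 20]
expected = [20, 20, 20]
print(chi_square_statistic(observed, expected))  # 2.5
```

Each category contributes independently to the total, so a single badly fitting category can dominate the statistic.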
For a goodness-of-fit test with k categories, the degrees of freedom equal k − 1. The resulting χ² value is compared against the chi-square distribution with those degrees of freedom to compute a p-value: the probability, assuming the null hypothesis is true, of obtaining a statistic at least as large as the one observed. A p-value below 0.05 conventionally leads to rejection of the null hypothesis, because a discrepancy that large would be unlikely if the null hypothesis held.
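The p-value is the upper tail of the chi-square distribution, which equals the regularized upper incomplete gamma function Q(df/2, χ²/2). The sketch below evaluates it with only the Python standard library, using a power series for small arguments and a continued fraction otherwise. It is one common way to do this and is not claimed to match this calculator's internals; the function names, tolerance, and iteration cap are assumptions.

```python
import math


def regularized_upper_gamma(a: float, x: float, eps: float = 1e-12,
                            max_iter: int = 10_000) -> float:
    """Q(a, x) = 1 - P(a, x), the regularized upper incomplete gamma function."""
    if a <= 0 or x < 0:
        raise ValueError("requires a > 0 and x >= 0")
    if x == 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Power series for P(a, x); converges quickly in this region.
        denom = a
        term = 1.0 / a
        total = term
        for _ in range(max_iter):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(log_prefactor)


def chi_square_p_value(statistic: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be at least 1")
    return regularized_upper_gamma(df / 2.0, statistic / 2.0)


# With df = 2 the tail has the closed form exp(-x / 2), which gives a quick check.
print(chi_square_p_value(4.0, 2), math.exp(-2.0))
```

Where SciPy is available, `scipy.stats.chi2.sf(statistic, df)` computes the same tail probability and is the more robust choice.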
Worked example
Suppose a die is rolled 100 times and you want to test if it is fair. Each face should appear 100/6 ≈ 16.67 times. Observed counts are: 1→14, 2→18, 3→20, 4→15, 5→17, 6→16.
Calculate each term using the unrounded expected value 100/6 (shown here as 16.67): (14−16.67)²/16.67 = 0.427, (18−16.67)²/16.67 = 0.107, (20−16.67)²/16.67 = 0.667, (15−16.67)²/16.67 = 0.167, (17−16.67)²/16.67 = 0.007, (16−16.67)²/16.67 = 0.027. Summing gives χ² = 1.400 with df = 5. The corresponding p-value is ≈ 0.924, so we fail to reject the null hypothesis: the data give no evidence that the die is unfair.
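The arithmetic in this example can be checked exactly with rational numbers, which avoids any rounding of the expected count 100/6. The snippet below is only a verification sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Observed counts for faces 1..6 from the worked example.
observed = [14, 18, 20, 15, 17, 16]
n = sum(observed)                       # 100 rolls
expected = Fraction(n, len(observed))   # exactly 50/3 per face

# Per-face contribution (O - E)^2 / E, kept as exact fractions.
terms = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(terms)
df = len(observed) - 1

for face, term in enumerate(terms, start=1):
    print(f"face {face}: {float(term):.3f}")
print(f"chi-square = {chi_square} = {float(chi_square):.3f}, df = {df}")
```

The exact total is 7/5, which is why the statistic is reported as 1.400 even though the six rounded terms add up to 1.402.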
Limitations & notes
The chi-square test requires that expected frequencies are sufficiently large. A common rule of thumb is that all expected values should be at least 5. When expected counts are small, an exact test (such as Fisher's exact test for contingency tables) or combining categories may be more appropriate. This calculator handles up to 5 categories; for larger tables or two-way independence tests, a full contingency table tool should be used. The p-value approximation uses the incomplete gamma function and may have minor numerical errors for extreme values.
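The rule of thumb above is easy to check before running the test. The helper below is a hypothetical sketch: the name `small_expected_cells`, the default threshold of 5, and the sample values are all illustrative.

```python
from typing import List, Sequence


def small_expected_cells(expected: Sequence[float], minimum: float = 5.0) -> List[int]:
    """Return the zero-based indices of categories whose expected count is below minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Hypothetical expected counts; the last two categories fall below 5.
expected = [12.0, 8.5, 3.2, 1.3]
flagged = small_expected_cells(expected)
print(flagged)  # [2, 3]
```

Flagged categories are candidates for merging with a neighbouring category before the statistic is computed.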
Frequently asked questions
What p-value is considered statistically significant?
A p-value below 0.05 is the conventional threshold for statistical significance. It means that, if the null hypothesis were true, a deviation at least as large as the one observed would occur less than 5% of the time. Some fields use stricter thresholds like 0.01 or 0.001.
What is the difference between goodness-of-fit and test of independence?
A goodness-of-fit test compares one categorical variable's observed frequencies against a theoretical distribution. A test of independence examines whether two categorical variables in a contingency table are related.
Why must expected frequencies be at least 5?
The chi-square distribution is a continuous approximation to a discrete distribution, and this approximation breaks down when expected counts are very small. Small expected values can inflate the χ² statistic, leading to misleading p-values.
Can I use this calculator for a 2×2 contingency table?
You can approximate it by entering the four cell frequencies as four categories with expected values derived from marginal totals, but degrees of freedom must be set to 1. A dedicated contingency table calculator with Yates' continuity correction is more appropriate.
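Deriving expected values from the marginal totals means computing (row total × column total) / grand total for each cell. The sketch below does this for a 2×2 table and then forms the uncorrected statistic; the function name and the counts are hypothetical, and no Yates correction is applied.

```python
from typing import List, Sequence


def expected_2x2(table: Sequence[Sequence[float]]) -> List[List[float]]:
    """Expected counts under independence: row total * column total / grand total."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * col / n for col in col_totals] for r in row_totals]


# Hypothetical 2x2 table of observed counts.
table = [[30, 20], [10, 40]]
expected = expected_2x2(table)

# Flatten both tables into four "categories", as described above.
observed_flat = [o for row in table for o in row]
expected_flat = [e for row in expected for e in row]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed_flat, expected_flat))
df = 1  # (2 - 1) * (2 - 1), not 4 - 1

print(expected)   # [[20.0, 30.0], [20.0, 30.0]]
print(statistic, df)
```

Treating the four cells as a goodness-of-fit problem would give df = 3, which is why the degrees of freedom have to be overridden to 1.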
How are degrees of freedom determined?
For a goodness-of-fit test, df = k − 1, where k is the number of categories. For a two-way independence test, df = (rows − 1) × (columns − 1).
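Both rules can be written as small helpers. These are illustrative only; the function names are hypothetical.

```python
def df_goodness_of_fit(k: int) -> int:
    """Degrees of freedom for a goodness-of-fit test with k categories."""
    if k < 2:
        raise ValueError("need at least two categories")
    return k - 1


def df_independence(rows: int, columns: int) -> int:
    """Degrees of freedom for a two-way test of independence."""
    if rows < 2 or columns < 2:
        raise ValueError("need at least two rows and two columns")
    return (rows - 1) * (columns - 1)


print(df_goodness_of_fit(6))   # 5, as in the six-sided die example
print(df_independence(2, 2))   # 1
```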
Last updated: 2025-01-15 · Formula verified against primary sources.