Kruskal-Wallis Test Calculator

Calculates the Kruskal-Wallis H statistic and p-value to test whether multiple independent groups have the same population distribution.

Formula

H = (12 / (N(N+1))) × Σ(R_i² / n_i) − 3(N+1), with the sum taken over the k groups.

H is the Kruskal-Wallis test statistic. N is the total number of observations across all groups. k is the number of groups. n_i is the number of observations in group i. R_i is the sum of ranks assigned to group i (ranks are assigned across all observations pooled together). Under the null hypothesis of equal distributions, H follows an approximate chi-squared distribution with k − 1 degrees of freedom.

Source: Kruskal, W. H. & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621.

How it works

The Kruskal-Wallis test replaces raw data values with their ranks across all groups combined. Rather than comparing means directly, as a one-way ANOVA does, it evaluates whether the rank sums of the groups differ more than would be expected by chance. This makes the test robust to outliers and non-normal distributions, and suitable for ordinal scales where the precise numerical differences between values may not be meaningful.

The core formula is H = (12 / (N(N+1))) × Σ(R_i² / n_i) − 3(N+1), where N is the total number of observations, the sum runs over the k groups, R_i is the rank sum for group i, and n_i is the number of observations in group i. When the null hypothesis is true (all groups share the same distribution), H follows an approximate chi-squared distribution with k − 1 degrees of freedom. When tied ranks are present, a correction factor can be applied: H is divided by a value slightly below 1, so the corrected H is slightly larger and the test is slightly more likely to reject. The null hypothesis is rejected when H exceeds the critical chi-squared value at the chosen significance level α.
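The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own implementation: the function names average_ranks and kruskal_wallis_h are hypothetical, only the standard library is used, and the result is the uncorrected H (no tie correction). The input lists are the pain relief scores from the worked example on this page.

```python
from itertools import groupby


def average_ranks(values):
    """Return the 1-based rank of each value; tied values get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 0
    # Walk through runs of equal values in sorted order.
    for _, run in groupby(order, key=lambda i: values[i]):
        run = list(run)
        # Sorted positions position+1 .. position+len(run) share one averaged rank.
        mean_rank = position + (len(run) + 1) / 2
        for i in run:
            ranks[i] = mean_rank
        position += len(run)
    return ranks


def kruskal_wallis_h(groups):
    """Uncorrected H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)  # ranks are assigned over the pooled data
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


h = kruskal_wallis_h([[2, 4, 3, 5], [6, 7, 8, 6], [1, 2, 3, 1]])
print(round(h, 2))  # 8.77
```

Ranking the pooled data first and then slicing the ranks back into groups mirrors the order of the hand calculation: pool, rank, sum per group, then apply the formula.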

Common applications include comparing patient recovery scores across three treatment arms in clinical trials, evaluating customer satisfaction ratings across product versions, testing whether soil nutrient levels differ across multiple land-use types, or assessing whether salary distributions differ across job categories. When the Kruskal-Wallis test yields a significant result, post-hoc pairwise comparisons (such as Dunn's test) are used to identify which specific groups differ.

Worked example

Suppose a researcher measures pain relief scores (0–10 scale) for three independent treatment groups:

  • Group 1 (Drug A): 2, 4, 3, 5
  • Group 2 (Drug B): 6, 7, 8, 6
  • Group 3 (Placebo): 1, 2, 3, 1

Step 1 — Pool and rank all 12 observations: Sorted values with ranks assigned (ties receive averaged ranks): 1(1.5), 1(1.5), 2(3.5), 2(3.5), 3(5.5), 3(5.5), 4(7), 5(8), 6(9.5), 6(9.5), 7(11), 8(12).

Step 2 — Assign ranks back to groups:
Group 1 ranks: 3.5, 7, 5.5, 8 → R₁ = 24
Group 2 ranks: 9.5, 11, 12, 9.5 → R₂ = 42
Group 3 ranks: 1.5, 3.5, 5.5, 1.5 → R₃ = 12

Step 3 — Compute H:
N = 12, each group has n = 4
H = (12 / (12 × 13)) × [(24²/4) + (42²/4) + (12²/4)] − 3 × 13
H = (12/156) × [144 + 441 + 36] − 39
H = 0.07692 × 621 − 39
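H = 47.77 − 39 = 8.77 (uncorrected for ties; correcting for the four tied pairs would raise it only slightly, to about 8.89)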

Step 4 — Compare to critical value: With df = 2 and α = 0.05, the critical χ² value is 5.991. Since 8.77 > 5.991, we reject the null hypothesis and conclude that at least one group has a significantly different distribution of pain relief scores.
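For three groups (df = 2) the comparison in Step 4 can be reproduced without a lookup table, because a chi-squared variable with 2 degrees of freedom has the closed-form upper tail exp(−h/2). The sketch below relies on that shortcut, which holds only for df = 2; the function names are hypothetical, and the calculator itself is described as using a lookup table rather than this computation.

```python
import math


def chi2_sf_df2(h):
    """Upper-tail probability P(X > h) for a chi-squared variable with df = 2."""
    # With df = 2 the chi-squared distribution is exponential with mean 2,
    # so the tail is exp(-h / 2). This does not hold for other df.
    return math.exp(-h / 2)


def critical_value_df2(alpha):
    """Value c with P(X > c) = alpha when df = 2 (inverse of the tail above)."""
    return -2 * math.log(alpha)


h = 8.77      # uncorrected H from Step 3
alpha = 0.05
p_value = chi2_sf_df2(h)
critical = critical_value_df2(alpha)
print(f"critical = {critical:.3f}, p = {p_value:.4f}, reject = {h > critical}")
```

Under these assumptions the critical value comes out at 5.991, matching the table value quoted in Step 4, and the approximate p-value is a little above 0.012.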

Limitations & notes

The Kruskal-Wallis test assumes that observations are independent within and across groups. It is not appropriate for paired or repeated-measures designs; use the Friedman test instead. The chi-squared approximation for the H statistic is less accurate when any group has fewer than 5 observations, so exact tables or permutation-based p-values should be used for small samples. The test is distribution-free, but reading a significant result as a shift in central tendency (for example, a difference in medians) requires that the group distributions have the same shape and spread. If groups differ in variance or shape, a significant H may reflect that difference rather than a difference in location, so interpretation requires care. A significant H statistic only tells you that at least one group differs; it does not identify which groups are different. Post-hoc analyses such as Dunn's test with Bonferroni correction are needed for pairwise comparisons. This calculator supports up to three groups and uses a critical value lookup table rather than an exact p-value computation, so treat results as approximate for very small or unbalanced samples.
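For small groups, a permutation-based p-value can be approximated by reshuffling the pooled ranks among the groups many times and counting how often the reshuffled H is at least as large as the observed one. The sketch below is one way to do this under stated assumptions: it uses Monte Carlo sampling rather than full enumeration, the function names and the default of 10,000 shuffles are illustrative choices, and the input is the set of ranks from Step 2 of the worked example.

```python
import random


def h_from_ranks(rank_groups):
    """Uncorrected H computed from already-assigned pooled ranks."""
    n_total = sum(len(g) for g in rank_groups)
    total = sum(sum(g) ** 2 / len(g) for g in rank_groups)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def permutation_p_value(rank_groups, n_permutations=10_000, seed=0):
    """Monte Carlo estimate of P(H >= observed H) under random group labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sizes = [len(g) for g in rank_groups]
    pooled = [r for g in rank_groups for r in g]
    observed = h_from_ranks(rank_groups)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance so floating-point noise does not hide exact ties in H.
        if h_from_ranks(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one estimator keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


ranks = [[3.5, 7, 5.5, 8], [9.5, 11, 12, 9.5], [1.5, 3.5, 5.5, 1.5]]
p_perm = permutation_p_value(ranks)
print(p_perm)
```

Because the ranks are shuffled rather than the raw scores, the tie structure of the data is preserved in every permutation. The estimate varies with the seed and the number of shuffles, so it should be read as an approximation.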

Frequently asked questions

When should I use the Kruskal-Wallis test instead of a one-way ANOVA?

Use the Kruskal-Wallis test when your data violate the normality assumption required by ANOVA, when you have ordinal data (e.g., Likert scale responses), or when sample sizes are too small to reliably verify normality. It is also preferred when outliers are present and cannot be removed, since ranks are resistant to extreme values. If normality holds and sample sizes are adequate, one-way ANOVA has greater statistical power.

What does the H statistic represent?

The H statistic measures the degree to which the rank sums of your groups differ from what would be expected if all observations came from the same population. A larger H value indicates greater discrepancy between groups. Under the null hypothesis, H approximately follows a chi-squared distribution with k − 1 degrees of freedom, where k is the number of groups.

How are tied values handled in the Kruskal-Wallis test?

When two or more observations share the same value, they are each assigned the average of the ranks they would have occupied. For example, if values ranked 4th and 5th are tied, both receive rank 4.5. A correction factor can be applied to the H statistic to account for ties; it divides H by a value slightly below 1, which slightly increases H and makes the test slightly less conservative. For small proportions of ties, the effect is negligible.
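The page does not write out the correction factor. The sketch below uses the standard form C = 1 − Σ(t³ − t) / (N³ − N), where t is the number of observations in each group of tied values, and divides the uncorrected H by C. The function name tie_correction is hypothetical, and the data are the pooled scores from the worked example.

```python
from collections import Counter


def tie_correction(pooled_values):
    """C = 1 - sum(t^3 - t) / (N^3 - N), with t the size of each tie group."""
    n_total = len(pooled_values)
    tie_sizes = Counter(pooled_values).values()
    # Untied values have t = 1 and contribute t^3 - t = 0.
    tie_term = sum(t ** 3 - t for t in tie_sizes)
    return 1 - tie_term / (n_total ** 3 - n_total)


pooled = [2, 4, 3, 5, 6, 7, 8, 6, 1, 2, 3, 1]
h_uncorrected = 8.77  # value from the worked example
c = tie_correction(pooled)
h_corrected = h_uncorrected / c
print(f"C = {c:.4f}, corrected H = {h_corrected:.2f}")
```

With four tied pairs among 12 observations, C is about 0.986, so the corrected H is only about 0.12 larger than the uncorrected value, which illustrates why the effect is negligible when ties are few.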

What does it mean if the Kruskal-Wallis test is significant?

A significant result (H exceeds the critical chi-squared value) means you reject the null hypothesis that all groups come from the same distribution. It indicates that at least one group differs from the others in terms of its central tendency or distribution. To identify which specific pairs of groups differ, you must conduct post-hoc pairwise tests, such as Dunn's test with an appropriate multiple-comparisons correction like Bonferroni.
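As an illustration of the post-hoc step, here is a minimal sketch of Dunn's test with a Bonferroni correction, written against the standard library only. It assumes the usual large-sample normal approximation with the tie-adjusted variance, takes the pooled ranks per group as input (the ranks from Step 2 of the worked example), and uses a hypothetical function name; treat it as a sketch rather than a reference implementation.

```python
import math
from collections import Counter
from itertools import combinations


def dunn_bonferroni(rank_groups):
    """Pairwise Dunn z-tests on pooled ranks with Bonferroni-adjusted p-values."""
    pooled = [r for g in rank_groups for r in g]
    n_total = len(pooled)
    # Tied observations share the same averaged rank, so counting equal ranks
    # recovers the size of each tie group.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n_total * (n_total + 1) / 12 - tie_term / (12 * (n_total - 1))
    mean_ranks = [sum(g) / len(g) for g in rank_groups]
    pairs = list(combinations(range(len(rank_groups)), 2))
    results = {}
    for i, j in pairs:
        se = math.sqrt(variance * (1 / len(rank_groups[i]) + 1 / len(rank_groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
        p_raw = math.erfc(abs(z) / math.sqrt(2))
        # Bonferroni: multiply by the number of comparisons, capped at 1.
        results[(i, j)] = min(1.0, p_raw * len(pairs))
    return results


ranks = [[3.5, 7, 5.5, 8], [9.5, 11, 12, 9.5], [1.5, 3.5, 5.5, 1.5]]
adjusted = dunn_bonferroni(ranks)
for (i, j), p in adjusted.items():
    print(f"Group {i + 1} vs Group {j + 1}: adjusted p = {p:.4f}")
```

For the worked example, only the Drug B versus Placebo comparison falls below 0.05 after adjustment. With groups of four observations the normal approximation is rough, so the pairwise p-values are indicative only.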

What is the minimum sample size for the Kruskal-Wallis test?

The chi-squared approximation used for the p-value is generally considered reliable when each group has at least 5 observations. For smaller groups, the approximation may be inaccurate and exact critical values from published tables or a permutation test should be used instead. In principle the test can be run with as few as 2 groups of 2 observations each, but such small samples offer very low statistical power.

Last updated: 2025-01-15 · Formula verified against primary sources.