Effect Size Calculator
Calculates Cohen's d effect size to quantify the practical significance of the difference between two group means.
Formula
d = \frac{\mu_1 - \mu_2}{s_{\text{pooled}}}
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
Here d is Cohen's d effect size; \mu_1 and \mu_2 are the means of Group 1 and Group 2; s_{\text{pooled}} is the pooled standard deviation; n_1 and n_2 are the sample sizes of each group; s_1 and s_2 are the standard deviations of each group. A positive d indicates Group 1 has a higher mean than Group 2.
Source: Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
How it works
When comparing two groups, such as a treatment group and a control group, a statistically significant p-value tells you that a difference at least as large as the one observed would be unlikely if the two population means were actually equal. However, it says nothing about whether that difference is meaningful in practice. A study with a very large sample can detect tiny, trivial differences as statistically significant. Cohen's d addresses this by expressing the mean difference in units of the pooled standard deviation, producing a dimensionless measure that can be compared across studies regardless of the original measurement scale.
Cohen's d is the difference between the two group means divided by the pooled standard deviation. The pooled standard deviation is the square root of a weighted average of the two sample variances, with each variance weighted by its degrees of freedom (n - 1) to account for potentially unequal sample sizes: s_pooled = sqrt[((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)]. This pooling gives more weight to the group with the larger sample, producing a more stable estimate of the common population standard deviation. The resulting d can be positive or negative, depending on which group mean is larger; the absolute value is often reported when direction is not the focus.
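The two formulas above can be sketched in a few lines of Python. This is a minimal illustration working from summary statistics, not the calculator's actual implementation; the function names `pooled_sd` and `cohens_d` and the numbers in the usage line are hypothetical.

```python
import math


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Square root of the degrees-of-freedom-weighted average of the two variances."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)


def cohens_d(mean1: float, s1: float, n1: int,
             mean2: float, s2: float, n2: int) -> float:
    """Cohen's d for two independent groups; positive when Group 1's mean is higher."""
    sp = pooled_sd(s1, n1, s2, n2)
    if sp == 0:
        raise ValueError("pooled standard deviation is zero; d is undefined")
    return (mean1 - mean2) / sp


# Hypothetical numbers: a large group (n=100, SD=10) and a small group (n=10, SD=20).
# The pooled SD lands much closer to 10 than to 20 because the larger group dominates.
print(round(pooled_sd(10.0, 100, 20.0, 10), 2))  # 11.18
```

Swapping the two groups in the call to `cohens_d` flips the sign of the result but leaves its magnitude unchanged.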
Jacob Cohen (1988) established widely used benchmarks for interpreting d: a value of 0.2 is considered a small effect, 0.5 a medium effect, and 0.8 or above a large effect. These thresholds are guidelines, not rigid rules — a 'small' effect in medicine (such as a drug reducing mortality by a fraction of a standard deviation) can be enormously consequential, while a 'large' effect in a low-stakes educational intervention may still be unimportant. Effect size is also essential for statistical power analysis and meta-analysis, where results across multiple studies are synthesized using a common metric.
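As a rough illustration, the benchmarks can be turned into a labelling helper. The function name `interpret_d` is hypothetical, and two choices here are assumptions rather than anything Cohen specified: treating each benchmark as the lower bound of a band, and calling values below 0.2 "negligible".

```python
def interpret_d(d: float) -> str:
    """Label the magnitude of d using Cohen's (1988) conventional benchmarks.

    Assumption: each benchmark is treated as the lower edge of a band, and
    magnitudes below 0.2 get the label "negligible" (not one of Cohen's terms).
    """
    magnitude = abs(d)  # the sign only encodes direction, not size
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


print(interpret_d(-0.35))  # small
```

Because the thresholds are guidelines, a label like this is best treated as a starting point for interpretation rather than a verdict.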
Worked example
Suppose a researcher tests whether a mindfulness program reduces anxiety scores. Group 1 (treatment) has a mean of 52.4, a standard deviation of 8.2, and a sample size of 30. Group 2 (control) has a mean of 45.1, a standard deviation of 7.9, and a sample size of 30.
Step 1 — Compute the pooled standard deviation:
s_pooled = sqrt[((30 - 1) × 8.2² + (30 - 1) × 7.9²) / (30 + 30 - 2)]
= sqrt[(29 × 67.24 + 29 × 62.41) / 58]
= sqrt[(1949.96 + 1809.89) / 58]
= sqrt[3759.85 / 58]
= sqrt[64.825]
≈ 8.051
Step 2 — Compute Cohen's d:
d = (52.4 − 45.1) / 8.051 = 7.3 / 8.051 ≈ 0.907
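Interpretation: A Cohen's d of approximately 0.91 is a large effect by Cohen's conventional benchmarks (d ≥ 0.8). The treatment group's mean anxiety score is about 0.9 pooled standard deviations higher than the control group's. Because d is positive, the difference runs toward higher scores in the treatment group; if higher scores indicate more anxiety, this is the opposite of the reduction the researcher set out to test, so the sign must be read together with the direction of the scale before concluding that the program helps. Either way, the magnitude shows that the group difference is large in practical terms, which complements any significance test performed on the same data.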
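The arithmetic in Steps 1 and 2 can be checked with a short script. This is only a sketch that replays the numbers from the example; the variable names are illustrative.

```python
import math

# Summary statistics from the worked example
mean1, s1, n1 = 52.4, 8.2, 30  # Group 1 (treatment)
mean2, s2, n2 = 45.1, 7.9, 30  # Group 2 (control)

# Step 1: pooled standard deviation
weighted_sum = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
pooled_variance = weighted_sum / (n1 + n2 - 2)
s_pooled = math.sqrt(pooled_variance)

# Step 2: Cohen's d
d = (mean1 - mean2) / s_pooled

print(f"weighted sum    = {weighted_sum:.2f}")     # 3759.85
print(f"pooled variance = {pooled_variance:.3f}")  # 64.825
print(f"s_pooled        = {s_pooled:.3f}")         # 8.051
print(f"d               = {d:.3f}")                # 0.907
```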
Limitations & notes
Cohen's d assumes that both groups are drawn from populations with approximately normal distributions and roughly equal variances (homogeneity of variance). While it is moderately robust to mild violations, heavily skewed data or extreme outliers can distort the pooled standard deviation and produce a misleading d value; in such cases, a rank-based or robust effect size measure may be preferable. The formula implemented here uses the pooled standard deviation and is appropriate for independent samples; it should not be used for paired or repeated-measures designs, which require a different effect size formula (e.g., d_z, the mean of the difference scores divided by their standard deviation). Additionally, Cohen's d tends to overestimate the population effect size in small samples; Hedges' g applies a small-sample correction factor and is commonly preferred when n1 or n2 is less than 20. Finally, effect size benchmarks (small = 0.2, medium = 0.5, large = 0.8) are heuristics and should be interpreted in the context of the specific field, prior literature, and the practical consequences of the phenomenon being studied.
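For the paired case mentioned above, a minimal sketch of d_z might look like the following. It assumes raw paired measurements are available (this calculator works from summary statistics instead), and the function name `cohens_dz` and the sample values are hypothetical.

```python
import statistics


def cohens_dz(first: list[float], second: list[float]) -> float:
    """d_z for paired data: mean of the difference scores over their sample SD."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    sd_diff = statistics.stdev(diffs)  # sample SD; needs at least 2 pairs
    if sd_diff == 0:
        raise ValueError("all differences are identical; d_z is undefined")
    return statistics.mean(diffs) / sd_diff


# Hypothetical scores for three participants measured twice
print(cohens_dz([3, 5, 7], [2, 3, 4]))  # 2.0
```

Note that d_z is standardized by the variability of the differences, so its values are not directly comparable to a pooled-SD Cohen's d from an independent-groups design.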
Frequently asked questions
What is a good Cohen's d effect size?
Cohen's conventional benchmarks classify d = 0.2 as small, d = 0.5 as medium, and d = 0.8 as large. However, what counts as 'good' depends on the discipline — a d of 0.3 may be highly meaningful in a medical context where the outcome is life or death, while in psychology it might be considered modest.
What is the difference between Cohen's d and Hedges' g?
Both measure standardized mean difference, but Hedges' g applies a correction factor (J) for small sample sizes, making it less biased when n1 or n2 is below 20. For large samples, the two measures are nearly identical. Hedges' g is preferred in meta-analyses to reduce systematic overestimation of effect size in small studies.
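A small sketch shows how the correction behaves. It uses the common approximation J ≈ 1 - 3 / (4·df - 1) with df = n1 + n2 - 2 rather than the exact gamma-function form, and the function name `hedges_g` is hypothetical.

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Apply the approximate small-sample correction J to Cohen's d.

    Assumption: uses J = 1 - 3 / (4 * df - 1), a widely used approximation
    to the exact correction factor, with df = n1 + n2 - 2.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)
    return d * j


# The correction shrinks d noticeably for small groups and barely at all for large ones.
for n in (5, 30, 500):
    print(n, round(hedges_g(1.0, n, n), 4))
```

With d fixed at 1.0, the shrinkage is roughly 10% at 5 per group, a little over 1% at 30 per group, and under 0.1% at 500 per group.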
Can Cohen's d be negative?
Yes. Cohen's d is negative when the mean of Group 2 is larger than the mean of Group 1. The sign simply indicates the direction of the effect. For interpreting magnitude alone, use the absolute value of d. When reporting results, always specify which group had the higher mean so the sign is interpretable.
How does effect size relate to statistical significance (p-value)?
A p-value tests whether an effect is distinguishable from zero given sampling variability, while effect size measures how large the effect actually is. A large sample can yield a very small p-value for a trivially small effect, and a small sample can fail to detect a large, real effect. Both metrics should be reported together for a complete picture of experimental results.
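The link between the two quantities can be made concrete. For the equal-variance two-sample t-test, the test statistic equals d multiplied by sqrt(n1·n2 / (n1 + n2)), so the same d yields a larger t as the samples grow. The sketch below uses hypothetical effect sizes and sample sizes, and the name `t_from_d` is illustrative.

```python
import math


def t_from_d(d: float, n1: int, n2: int) -> float:
    """Pooled-variance two-sample t statistic implied by Cohen's d and the group sizes."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


# Trivial effect, huge samples: t clears the usual two-sided 5% cutoff (about 1.96).
print(round(t_from_d(0.05, 10_000, 10_000), 2))  # 3.54

# Large effect, tiny samples: t falls short of the cutoff for 8 degrees of freedom (about 2.31).
print(round(t_from_d(0.80, 5, 5), 2))  # 1.26
```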
When should I use Cohen's d vs. other effect size measures?
Cohen's d is appropriate for comparing the means of two independent groups on a continuous outcome. Use eta-squared (η²) or omega-squared (ω²) for ANOVA designs with three or more groups, Pearson's r or R² for correlation and regression, and odds ratio or risk ratio for binary outcomes in clinical or epidemiological research.
Last updated: 2025-01-15 · Formula verified against primary sources.