Getting Started with the Kruskal-Wallis Test

One of the best-known statistical tests for analyzing differences between the means of groups is the ANOVA (analysis of variance) test. While ANOVA is a great tool, it assumes that the data in each group is approximately normally distributed. What if your data doesn’t follow a normal distribution, or your sample size is too small to assess normality? That’s where the Kruskal-Wallis test comes in.

The Kruskal-Wallis test can be thought of as the non-parametric equivalent to ANOVA. This test determines if independent groups have the same mean on ranks: instead of using the data values themselves, a rank is assigned to each data point, and those ranks are used to determine if the data in each group originates from the same distribution. When the groups’ distributions have the same shape, this amounts to testing whether the groups have the same median.

As mentioned above, Kruskal-Wallis is a non-parametric test, meaning it does not assume the data comes from a distribution described by particular parameters, such as a mean and variance. Because it works with the ranks of the data rather than the raw values, it does not require the data to be normally distributed.

Kruskal-Wallis is typically used with three or more independent groups, but can be used with just two, and each group should have a sample size of 5 or more. To perform a Kruskal-Wallis test, we use the ranks of the data to calculate the test statistic, H, given by

\[H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i}-3(N+1)\]

where N is the total sample size, k is the number of groups we are comparing, \(R_i\) is the sum of ranks for group i, and \(n_i\) is the sample size of group i.
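
As a minimal sketch of this formula in Python, the function below pools the groups, ranks them, and plugs the rank sums into the expression for H. The function name `kruskal_h` and the toy data are hypothetical, chosen only for illustration, and the sketch assumes there are no tied values (no tie correction is applied).

```python
import numpy as np
from scipy.stats import rankdata


def kruskal_h(*groups):
    """Compute the Kruskal-Wallis H statistic from the formula above.

    Assumes no tied values; no tie correction is applied.
    """
    sizes = [len(g) for g in groups]           # n_i for each group
    pooled = np.concatenate(groups)            # all N observations together
    ranks = rankdata(pooled)                   # ranks 1..N over the pooled data
    n_total = len(pooled)                      # N

    # Split the pooled ranks back into their groups and sum them (R_i)
    boundaries = np.cumsum(sizes)[:-1]
    rank_sums = [r.sum() for r in np.split(ranks, boundaries)]

    weighted = sum(r**2 / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Made-up toy data, purely for illustration
h = kruskal_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 4))
```

For the toy data the rank sums are 6, 15, and 24 with N = 9, so the formula can be checked by hand against the printed value.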

We then compare H to a critical cutoff point determined by the chi-square distribution with k - 1 degrees of freedom. Chi-square is used because it is a good approximation of the distribution of H when the null hypothesis is true, especially if each group’s sample size is 5 or more. If the H statistic is significant (H is larger than the cutoff), we reject the null hypothesis. If the H statistic is not significant (H is smaller than the cutoff), we fail to reject the null hypothesis. In this test the null hypothesis is that the medians of the groups are the same, meaning that all groups come from the same distribution. The alternative hypothesis is that at least one of the groups has a different median, meaning at least one comes from a different distribution than the others.

Assumptions

Ordinal Variables - the variable in question should be ordinal or continuous, i.e., its values should have a natural ordering
Independence - each group should be independent of the others
Distributions - observations in each group come from populations with the same distributional shape
Sample size - each group should have a sample size of 5 or more. With sample sizes in this range, the chi-square distribution approximates the distribution of the H statistic well.

How-To and Example (by hand)

The step-by-step process to calculate the H statistic is as follows:

Step 1: State your hypotheses - Null Hypothesis: the medians (mean on ranks) are equal across the samples; Alternative Hypothesis: at least one median is different

Step 2: Prepare and rank your data - Arrange the data from all groups together in one list in ascending order, then give a rank to each data entry

Step 3: Sum the ranks for each group

Step 4: Calculate the test statistic, H

Step 5: Compare it to the critical cutoff, determined by the critical chi-square value

Step 6: Interpret your results

As an example, we will use data on antibody production after receiving a vaccine. A hospital administered three different vaccines to 6 individuals each and measured the antibody presence in their blood after a chosen time period. The data is as follows:

Vaccine	Antibodies (μg/ml)
A	1232
A	751
A	339
A	848
A	447
A	542
–	–
B	302
B	57
B	521
B	278
B	176
B	201
–	–
C	839
C	342
C	473
C	1128
C	242
C	475

We want to determine how the three vaccines perform compared to each other. This can be quantified by determining if each vaccine causes the recipients to produce the same amount of antibodies. Essentially we are looking to determine if the antibody data for each vaccine originates from the same distribution. Our sample sizes are too small to reliably judge whether the data is normally distributed, so we use the Kruskal-Wallis test.

Step 1:

Null Hypothesis \(H_0\): the vaccines cause the same amount of antibodies to be produced (all three groups originate from the same distribution and have the same median)

Alternative Hypothesis \(H_A\): at least one of the vaccines causes a different amount of antibodies to be produced (at least one group originates from a different distribution and has a different median)

Step 2:

Here we organize our data into ascending order then give each a rank.

Vaccine	Antibodies (μg/ml)	Rank
B	57	1
B	176	2
B	201	3
C	242	4
B	278	5
B	302	6
A	339	7
C	342	8
A	447	9
C	473	10
C	475	11
B	521	12
A	542	13
A	751	14
C	839	15
A	848	16
C	1128	17
A	1232	18
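
The ranking in this step can be reproduced with a short Python sketch. It uses `scipy.stats.rankdata` to rank the pooled values; the variable names are arbitrary choices for this illustration.

```python
import numpy as np
from scipy.stats import rankdata

# Antibody measurements (μg/ml) from the example data
antibodies = {
    "A": [1232, 751, 339, 848, 447, 542],
    "B": [302, 57, 521, 278, 176, 201],
    "C": [839, 342, 473, 1128, 242, 475],
}

# Pool all observations, remembering which vaccine each came from
labels = np.array([v for v, vals in antibodies.items() for _ in vals])
values = np.array([x for vals in antibodies.values() for x in vals])

# Rank the pooled values: the smallest gets rank 1
# (tied values, if there were any, would share their average rank)
ranks = rankdata(values)

# Print the pooled data in ascending order alongside its rank
for i in np.argsort(values):
    print(labels[i], values[i], int(ranks[i]))
```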

Step 3:

Now we put our data back into their original groups and sum the ranks for each group.

Vaccine	Antibodies (μg/ml)	Rank
A	1232	18
A	751	14
A	339	7
A	848	16
A	447	9
A	542	13
–	–	–
B	302	6
B	57	1
B	521	12
B	278	5
B	176	2
B	201	3
–	–	–
C	839	15
C	342	8
C	473	10
C	1128	17
C	242	4
C	475	11

Here, the sum of ranks for vaccine A is 77, the sum of ranks for vaccine B is 29, and the sum of ranks for vaccine C is 65.
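
These rank sums can be double-checked with a small Python sketch. It pools and ranks the data, then sums the ranks belonging to each vaccine; the dictionary layout and variable names are just one convenient way to organize this.

```python
import numpy as np
from scipy.stats import rankdata

groups = {
    "A": np.array([1232, 751, 339, 848, 447, 542]),
    "B": np.array([302, 57, 521, 278, 176, 201]),
    "C": np.array([839, 342, 473, 1128, 242, 475]),
}

# Rank all 18 observations together
pooled = np.concatenate(list(groups.values()))
ranks = rankdata(pooled)

# Walk through the pooled ranks group by group and sum each slice
rank_sums = {}
start = 0
for name, data in groups.items():
    rank_sums[name] = ranks[start:start + len(data)].sum()
    start += len(data)

print(rank_sums)

# Sanity check: ranks 1..N always sum to N(N+1)/2
n_total = len(pooled)
assert sum(rank_sums.values()) == n_total * (n_total + 1) / 2
```

The final check is a useful habit when ranking by hand as well: with N = 18, the three rank sums must add up to 171.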

Step 4:

Now we are ready to calculate our test statistic, \(H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i}-3(N+1)\). For our data,

\[N = 18\]

\[k = 3\]

\[R_i = 77, 29, 65\]

\[n_i = 6, 6, 6 \]

Plugging these in we get:

\[H = \frac{12}{18(18+1)} \left[\frac{77^2}{6} + \frac{29^2}{6} + \frac{65^2}{6}\right]-3(18+1)\]

Working out the math gives us a test statistic of \[H = 7.29824\]
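
The arithmetic behind this value can be written out step by step (intermediate values rounded to four decimal places):

\[H = \frac{12}{342} \cdot \frac{5929 + 841 + 4225}{6} - 57 = \frac{12}{342} \cdot 1832.5 - 57 \approx 64.2982 - 57 = 7.2982\]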

Step 5:

Next we compare this H statistic to the critical cutoff: the corresponding chi-square value. We can determine the chi-square value by referencing a table of chi-square probabilities.

We find the degrees of freedom by subtracting 1 from \(k\):

\[df = k-1 = 3-1 = 2\]

Using this value and a significance level of 0.05, we find

\[\chi^2(2) = 5.99\]

The comparison between H = 7.29824 and \(\chi^2(2)\) = 5.99 gives

\[H > \chi^2(2).\]
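
Instead of a printed table, the cutoff can also be looked up in Python. This is a small sketch using `scipy.stats.chi2.ppf`, the inverse of the chi-square cumulative distribution function; the value of H is copied from Step 4, and the variable names are arbitrary.

```python
from scipy.stats import chi2

alpha = 0.05          # significance level used in the text
k = 3                 # number of groups
df = k - 1            # degrees of freedom

# Upper-tail critical value: the point with probability alpha to its right
critical = chi2.ppf(1 - alpha, df)

h_stat = 7.29824      # H from Step 4
print(round(critical, 3), h_stat > critical)
```

The computed cutoff is about 5.991, which agrees with the table value of 5.99.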

Step 6:

Finally we interpret our results. Since H is larger than the critical cutoff \(\chi^2(2)\), we reject the null hypothesis: the medians are not the same across all three groups, so at least one group has a different median than the others. This means that the three vaccines do not all perform equally; at least one vaccine causes its recipients to produce a different amount of antibodies than the others.

It’s important to note that Kruskal-Wallis can only tell us that at least one of the groups originates from a different distribution. It cannot tell us which group or groups those are.

How-To and Example (with Python)

The Python scipy.stats module has a function called kruskal() that carries out the above calculation for us. This function takes two or more array-like objects as arguments and returns the H statistic and the p-value. Like most statistical software, the kruskal() function computes approximate p-values that are based on the chi-squared distribution. To refresh our memories, the p-value in this case is the probability of seeing differences between the groups at least as large as those we observed if the null hypothesis is true. If we have a small p-value, say less than 0.05, we have evidence against the null. Small p-values with Kruskal-Wallis lead us to reject the null hypothesis and say that at least one of our groups likely originates from a different distribution than the others.

Here we will use the same example data and use kruskal() to carry out the test. We enter the data into three separate arrays, one array for each group (in this case vaccine). We store the data in one array per group to make it easy for kruskal() to tell our groups apart. This function interprets each array input as a separate group and will use each array as its own group in the H statistic and \(\chi^2\) calculations.


from scipy import stats
import numpy as np

# Store the data from each vaccine (the group for this example) in its own array
d1 = np.array([1232, 751, 339, 848, 447, 542])
d2 = np.array([302, 57, 521, 278, 176, 201])
d3 = np.array([839, 342, 473, 1128, 242, 475])

# Conduct the Kruskal-Wallis test; the result holds the H statistic and the p-value
result = stats.kruskal(d1, d2, d3)

print(result)


KruskalResult(statistic=7.298245614035082, pvalue=0.02601393801711558)

Here we see that the p-value is ~0.026, which is less than the cutoff of 0.05, so we reject the null hypothesis: the medians are not the same across all three groups, so at least one group has a different median than the others. This means that the vaccines do not perform equally well because the resulting antibody production is not the same for each vaccine. We draw the same conclusion as we did above when we performed the calculation ourselves!
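
To connect this p-value back to the by-hand calculation, the sketch below recomputes it as the upper-tail area of the chi-square distribution with k - 1 = 2 degrees of freedom beyond H. It uses `scipy.stats.chi2.sf` (the survival function, 1 - CDF); the variable names are arbitrary.

```python
from scipy import stats

d1 = [1232, 751, 339, 848, 447, 542]
d2 = [302, 57, 521, 278, 176, 201]
d3 = [839, 342, 473, 1128, 242, 475]

result = stats.kruskal(d1, d2, d3)

# Area to the right of H under the chi-square curve with k - 1 = 2 degrees of freedom
df = 3 - 1
p_from_chi2 = stats.chi2.sf(result.statistic, df)

print(result.pvalue, p_from_chi2)
```

The two printed values should agree, which shows that comparing H to the chi-square cutoff and comparing the p-value to 0.05 are two views of the same decision.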

Again we emphasize that the Kruskal-Wallis test can only tell us that at least one of the vaccines performs differently than the others. It cannot tell us which vaccine or vaccines those are. In order to determine which vaccine performs differently we would need to conduct a post hoc test.
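
As a rough sketch of what a post hoc analysis might look like, the code below runs pairwise Mann-Whitney U tests with a Bonferroni correction. This is only one possible approach and is an assumption of this illustration, not a method prescribed here; Dunn's test is another common choice after Kruskal-Wallis.

```python
from itertools import combinations
from scipy.stats import mannwhitneyu

groups = {
    "A": [1232, 751, 339, 848, 447, 542],
    "B": [302, 57, 521, 278, 176, 201],
    "C": [839, 342, 473, 1128, 242, 475],
}

# Every pair of vaccines: (A, B), (A, C), (B, C)
pairs = list(combinations(groups, 2))

adjusted = {}
for a, b in pairs:
    p = mannwhitneyu(groups[a], groups[b], alternative="two-sided").pvalue
    # Bonferroni correction: multiply by the number of comparisons, cap at 1
    adjusted[(a, b)] = min(1.0, p * len(pairs))

for pair, p_adj in adjusted.items():
    print(pair, round(p_adj, 4))
```

Pairs whose adjusted p-value falls below 0.05 are the ones this approach would flag as differing. With only six observations per group, such comparisons have limited power, so the results should be read with caution.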

Summary

Kruskal-Wallis tests if groups originate from the same distribution by determining if the groups have the same median
Kruskal-Wallis is a non-parametric test, meaning it does not assume normally distributed data
The test statistic is the H statistic given by \(H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i}-3(N+1)\)
Compare the H statistic to the critical cutoff given by the \(\chi^2\) distribution (with df=k-1 and the chosen significance level)
- H > \(\chi^2\) -> reject the null hypothesis
- H < \(\chi^2\) -> fail to reject the null hypothesis
Use the Python scipy.stats function kruskal() to compute this quickly
- p < 0.05 : reject the null hypothesis
- p >= 0.05 : fail to reject the null hypothesis
Reject the null hypothesis: at least one group has a different median so we're confident at least one group originates from a different distribution
Fail to reject the null hypothesis: we cannot reject the possibility that all groups originate from the same distribution
Kruskal-Wallis can only tell us if the groups originate from the same distribution. If we reject the null hypothesis, we can only conclude that one or more of the groups has a different median (comes from a different distribution). The test cannot tell us which groups originate from a different distribution.

References

May, R.B., Masson, M.E.J., & Hunter, M.A. (1990) Applications of Statistics in Behavioral Research. Harper & Row. Pages 494-496.
Ostertagová, E, Ostertag, O, & Kovác, J. (2014) Methodology and Application of the Kruskal-Wallis Test. Applied Mechanics and Materials. ISSN: 1662-7482, Vol. 611, pp 115-120. doi:10.4028/www.scientific.net/AMM.611.115.

Samantha Lomuscio
StatLab Associate
University of Virginia Library
December 7, 2021

For questions or clarifications regarding this article, contact statlab@virginia.edu.
