Name three examples of nonparametric tests used by educational researchers
What could be the reason for such a high average? Well, one of the highest-paid Indian celebrities, Shahrukh Khan, graduated from Hansraj College in 1988, where he was pursuing economics honours. This and many such examples tell us that the average is not always a good indicator of the centre of the data. It can be heavily influenced by outliers. In such cases, looking at the median is a better choice. It is a better indicator of the centre of the data because half of the data lies below the median and the other half lies above it.
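To make the effect of a single outlier concrete, here is a minimal sketch using Python's standard `statistics` module. The salary figures are invented purely for illustration (they are not real placement data); the last value stands in for one celebrity-sized income.

```python
from statistics import mean, median

# Hypothetical annual salaries (in lakh rupees) of ten graduates.
# The final value is a made-up outlier standing in for one very high earner.
salaries = [4, 5, 5, 6, 6, 7, 7, 8, 9, 2500]

avg = mean(salaries)    # pulled far upwards by the single outlier
mid = median(salaries)  # depends only on the middle of the sorted data

print(f"mean   = {avg}")
print(f"median = {mid}")
```

Nine of the ten values are below 10, yet the mean lands in the hundreds, while the median stays close to what a typical graduate in this made-up sample earns.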
So far, so good – I am sure you have seen people make this point earlier. The problem is that no one tells you how to perform an analysis such as hypothesis testing with the median in mind. Statistical tests are used for making decisions. To perform an analysis using the median, we need to use non-parametric tests. Non-parametric tests are distribution-independent tests, whereas parametric tests assume that the data follows a specific distribution, most commonly the normal distribution. It would not be wrong to say that parametric tests are better known than non-parametric tests, but the former are built around the mean, while the latter work with medians and ranks.
Without wasting any more time, let’s dive into the world of non-parametric tests.
Note: This article assumes that you have prerequisite knowledge of hypothesis testing, parametric tests, one-tailed & two-tailed tests.
Table of Contents
How are non-parametric tests different from parametric tests?
When can I apply non-parametric tests?
Pros and Cons of non-parametric tests
Hypothesis testing with non-parametric tests
1. Mann Whitney U Test
2. Wilcoxon Sign-Rank Test
3. Sign Test
4. Kruskal-Wallis Test
5. Spearman Rank Correlation
1. How are Non-Parametric tests different from Parametric tests?
If you read our articles on probability distributions and hypothesis testing, I am sure you know that there are several assumptions attached to each probability distribution. Parametric tests are used when the form of the population distribution is known (or can reasonably be assumed) and the question is about its parameters, whereas non-parametric tests are used when little or no information is available about the population distribution. In simple words, a parametric test typically assumes that the data is normally distributed. Non-parametric tests, in contrast, do not assume any particular distribution for the data.
2. When can I apply non-parametric tests?
1. The winner of a race is decided by rank, and rank is allotted based on the order in which runners cross the finish line. The first person to cross the finish line is ranked 1, the second person to cross the finish line is ranked 2 and so on. We don’t know by what distance the winner beat the others, so the size of the difference is not known.
2. A sample of 20 people followed a course of treatment and their symptoms were noted by conducting a survey. Each patient was asked to choose among 5 categories after following the course of treatment. The survey looked somewhat like this. Now, if you look carefully, the values in the above survey aren’t measured on a numeric scale; they are based on the experience of the patient. Also, the ranks are allocated and not calculated. In such cases, parametric tests become invalid.
For nominal data, no parametric test exists.
3. Limit of detection is the lowest quantity of a substance that can be detected with a given analytical method but not necessarily quantitated as an exact value. For instance, a viral load is the amount of HIV in your blood. A viral load can either be below the limit of detection or it can be a higher, measurable value.
4. In the example above of the average salary package, Shahrukh’s income would be an outlier. What is an outlier? It is a value that lies at an abnormal distance from the other values in the data. Shahrukh’s income is far removed from the income of other economics graduates, so it becomes an outlier here.
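Two of the situations above, the race ranking and the limit of detection, can be sketched in a few lines of standard-library Python. All names and numbers below are hypothetical, and `ranks` and `median_with_lod` are helper functions written for this illustration, not part of any library. The sketch shows that ranks keep only the order of the data, and that a median can still be reported when some readings are known only to be below the limit of detection.

```python
from statistics import median

# --- Ranks keep the order but discard the gaps (race example) ---
# Hypothetical finish times in seconds; the runner names are made up.
close_race = {"Asha": 10.1, "Ben": 10.2, "Chen": 10.3}
runaway_race = {"Asha": 10.1, "Ben": 14.0, "Chen": 19.5}

def ranks(times):
    """Rank runners 1, 2, 3, ... by finish time (fastest first, no ties assumed)."""
    ordered = sorted(times, key=times.get)
    return {name: position for position, name in enumerate(ordered, start=1)}

print(ranks(close_race))    # same ranking for both races,
print(ranks(runaway_race))  # although the winning margins differ a lot

# --- Median with readings below a limit of detection ---
LOD = 50  # hypothetical limit of detection
# None marks a reading reported only as "below the limit of detection".
readings = [None, None, 120, 300, 75, 410, None, 95, 220]

def median_with_lod(values, lod):
    """Return the median, or None if the median itself falls below the LOD."""
    n_censored = sum(v is None for v in values)
    if 2 * n_censored >= len(values):
        return None  # at least half the data is censored: only "< LOD" is known
    # Any placeholder below the LOD gives the same ordering, hence the same median.
    substituted = [lod - 1 if v is None else v for v in values]
    return median(substituted)

print(median_with_lod(readings, LOD))
```

A mean could not be computed for `readings` without guessing the censored values, whereas the median only needs to know that they sit below everything else.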
To summarize, non-parametric tests can be applied to situations when:
- The data does not follow a known probability distribution
- The data consists of ordinal values or ranks
- There are outliers in the data
- The data has a limit of detection
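The point to be noted here is that if the assumptions of a parametric test hold for a problem, then the parametric test is generally the better choice: a non-parametric test applied to the same data tends to be less powerful, that is, less likely to detect a difference that really exists.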
3. Pros and Cons of using a non-parametric test
In the above discussion, you may have noticed that I mentioned a few points where using non-parametric tests could be beneficial or disadvantageous, so now let’s look at these points collectively.
The pros of using non-parametric tests over parametric tests are
1. Non-parametric tests can deliver reliable results even when the sample size is small.
2. Non-parametric tests are more powerful than parametric tests when the assumption of normality has been violated.
3. They are suitable for all data types, such as nominal, ordinal and interval data, as well as data that contains outliers.
The cons of using non-parametric tests over parametric tests are
1. If the assumptions of a parametric test are met for the data, then using a non-parametric test instead sacrifices statistical power and can be a serious mistake.
2. The critical value tables for non-parametric tests are not included in many computer software packages, so these tests can require more manual calculation.
4. Hypothesis testing with non-parametric tests
Now you know that non-parametric tests are indifferent to the population parameters, so they do not make any assumptions about the mean, standard deviation etc. of the parent population. The null hypothesis here is stated in general terms, for example that the two given populations are equal.
Steps to follow while conducting non-parametric tests:
1. The first step is to set up a hypothesis and choose a level of significance
Now, let’s look at what these two are.
Hypothesis: My prediction is that Rahul is going to win the race, and the other possible outcome is that Rahul isn’t going to win the race. These are our hypotheses. Our alternative hypothesis is that Rahul will win the race, because we set the alternative hypothesis equal to what we want to prove. The null hypothesis is the opposite one; generally, the null hypothesis is the statement of no difference. For example,
Level of significance: It is the probability of rejecting the null hypothesis when it is actually true. In the above hypothesis statement, the null hypothesis indicates no difference between the sample and population mean. Say there’s a 5% risk of rejecting the null hypothesis when there is in fact no difference between the sample and the population mean. This risk, the probability of rejecting the null hypothesis when it is true, is called the level of significance.
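How a hypothesis and a level of significance work together can be sketched with the sign test, one of the tests listed in the table of contents. The sketch below is a minimal, standard-library illustration and not a full treatment of the test: the sample values and the hypothesized median are invented, and `sign_test_p_value` is a helper written for this example. It relies on the fact that, if the null hypothesis about the median is true, each observation is equally likely to fall above or below that median, so the count of observations on one side follows a Binomial(n, 0.5) distribution.

```python
from math import comb

def sign_test_p_value(sample, hypothesized_median):
    """Exact two-sided sign test p-value for H0: population median == hypothesized_median."""
    above = sum(x > hypothesized_median for x in sample)
    below = sum(x < hypothesized_median for x in sample)
    n = above + below          # observations equal to the median are dropped
    k = min(above, below)      # the smaller of the two counts
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical data: ten measurements, tested against a claimed median of 10.
sample = [12, 15, 11, 18, 14, 17, 16, 19, 13, 20]
alpha = 0.05  # level of significance chosen before looking at the data

p_value = sign_test_p_value(sample, hypothesized_median=10)
reject_null = p_value < alpha

print(f"p-value = {p_value:.6f}, reject H0 at alpha = {alpha}: {reject_null}")
```

All ten made-up values lie above 10, which would be very unlikely if the true median were 10, so the p-value falls below the chosen 5% level and the null hypothesis is rejected. With a less lopsided sample the same code would keep the null hypothesis.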