*[Note: This post was partly inspired by a talk given at the NTEN ResearchED event held at Huntington School, York on 3rd May, 2014. Also note that I am not a statistician – which might appear obvious!]*

Many of us in the educational community are at last coming around to the realisation that research does have something to offer. We read it all the time on social media and have all witnessed discussions that seem to go on for days about which research is best or how to be critical about things we’ve read. There is an assumption that if research supports our particular view then we have permission to take the high ground and shoot down all those who refuse to accept the ‘evidence’. Of course, we need evidence (that’s the whole point) but we also need to be critical of it.

A great deal of educational research is positivist; like psychology, educational research often assumes that outcomes can be measured using scientific principles, and anyone familiar with academic papers in psychology will have noticed that there are an awful lot of numbers and bizarre equations involved. The scientific method is one of hypothesis testing – to paraphrase Richard Feynman, the first thing you do is make a guess and then you test the guess by conducting an experiment. If your experiment doesn’t support your guess then your guess is wrong.

In psychology (and many social sciences), what we are looking for is statistical significance, the nuts and bolts of which depend on statistical tests. The main criterion we use to establish significance is something called a p value (or probability value). Psychologists usually set the threshold for this value (the significance level) at 5% and represent this using the statement p≤0.05.

**What does p≤0.05 actually mean?**

This is actually quite straightforward, despite the looks on my students’ faces when presented with it: “Maths! In psychology… nooooo.”

All it means is that, if the manipulation of the independent variable actually made no difference (the null hypothesis), there would be a 5% (or less) chance of getting results at least as extreme as ours through chance alone (for example, random variation or something the researcher was unable to control for). The 0.05 cut-off is fairly arbitrary, but there is a general consensus that it is a good place to start. We could set it higher, but this makes it more likely that we accept our hypothesis on the strength of a false positive (a Type 1 error); or we could set it lower, but then we face a greater risk of rejecting our hypothesis and retaining the null hypothesis when, in fact, there was a real difference (a Type 2 error). So, sometimes that which is true is actually false and that which is false is actually true (cue Robin Thicke).
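
To make the trade-off concrete, here is a rough simulation sketch (my own illustration, not taken from any study). It assumes a simple two-group experiment with normally distributed scores and a known population standard deviation of 1, so a z-test stands in for the t-tests psychologists usually run; the function names and numbers are hypothetical. With `effect=0.0` the intervention does nothing, so every “significant” result is a Type 1 error; with a real effect, every non-significant result is a Type 2 error.

```python
import random
from statistics import NormalDist, fmean


def p_value(a, b, sigma=1.0):
    """Two-sided z-test p value for a difference in group means.
    Assumes the population standard deviation `sigma` is known (a simplification)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def significant_rate(effect, alpha=0.05, n=30, trials=4000, seed=1):
    """Run many imaginary experiments and return the share reaching p <= alpha.
    `effect` is the true treated-minus-control difference in SD units."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(effect, 1) for _ in range(n)]
        if p_value(treated, control) <= alpha:
            hits += 1
    return hits / trials


# No real effect: the significant share is the Type 1 (false positive) rate.
print("Type 1 rate, alpha=0.05:", significant_rate(0.0, alpha=0.05))
print("Type 1 rate, alpha=0.01:", significant_rate(0.0, alpha=0.01))
# A real effect of half an SD: the NON-significant share is the Type 2 rate.
print("Type 2 rate, alpha=0.05:", 1 - significant_rate(0.5, alpha=0.05))
print("Type 2 rate, alpha=0.01:", 1 - significant_rate(0.5, alpha=0.01))
```

Under these assumptions the first two figures should land close to 0.05 and 0.01, while the last two show the price of that caution: the share of real effects we miss rises (roughly from a half to three quarters).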

P values are a hot topic at the moment, with many suggesting that effect size might be a better measure to use (though there are problems here as well). Nevertheless, while p values remain so influential, we need to be mindful that errors do occur. More worrying, perhaps (if less common), is the phenomenon of p-hacking. P-hacking involves manipulating the data until the p value falls below the threshold, for example by removing the data that prevent significance from being reached and thus creating a positive result. So a researcher might remove all the inconvenient outliers, sometimes under the pretext that there was something ‘wrong’ with these results. P-hacking (and other such dubious practices) is often uncovered when others are unable to replicate the results, so be wary of single studies (especially if they are a few years old) with no recent studies to support them.
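
The outlier trick can be sketched as a small, hypothetical simulation. It assumes two groups of normally distributed scores drawn from the same population (so the intervention has no effect at all), a known standard deviation of 1 and a z-test in place of a t-test; the names and numbers are illustrative. An honest researcher tests once. A “hacking” researcher who misses p ≤ 0.05 quietly discards the lowest score in the treatment group and the highest in the control group as “outliers”, then tests again, up to `max_drops` times.

```python
import random
from statistics import NormalDist, fmean


def p_value(a, b, sigma=1.0):
    """Two-sided z-test p value for a difference in means (known sigma assumed)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(max_drops, n=30, alpha=0.05, trials=4000, seed=2):
    """Share of null-effect studies reported as significant when the researcher
    may discard up to `max_drops` 'outliers' from each group after a miss."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups come from the same population: the intervention does nothing.
        control = sorted(rng.gauss(0, 1) for _ in range(n))
        treated = sorted(rng.gauss(0, 1) for _ in range(n))
        for _attempt in range(max_drops + 1):
            if p_value(treated, control) <= alpha:
                hits += 1
                break
            # Not significant: drop the inconvenient extremes and try again.
            treated, control = treated[1:], control[:-1]
    return hits / trials


print("honest analysis:        ", false_positive_rate(max_drops=0))
print("up to 3 'outlier' drops:", false_positive_rate(max_drops=3))
```

With `max_drops=0` the false positive rate should sit near the advertised 5%; allowing just three rounds of convenient “outlier” removal pushes it far higher, even though nothing real is going on.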

So, to claim that true and false (or right and wrong) are absolute in research is perhaps to misunderstand the workings of the scientific method as it applies to real people. Other factors such as bias, demand characteristics and individual differences can blur the lines even further. This is perhaps the reason for the oft-used line ‘research suggests’: there is always the probability (however small) that a ‘significant’ result is really a fluke.

teachingbattleground: Was it my talk which suggested this topic to you? This is one of the many topics I wish I could have discussed in more detail. I did briefly mention that science has its statistical methods and (I think during the questions) that there was more to be thought about in this respect. I think we tend to neglect the role of probability in a few ways.

Firstly, there is looking at too many different sets of data without realising that, if you look at enough of them, some effects will turn up by chance alone. I’ve seen this done when looking at subgroups of a population (e.g. “this policy works for X” where X is just one of many subgroups to have been considered). I’ve also seen it done to subdivide teaching methods in effect size comparisons. So when it turns out that groupwork has only an average effect size, people look at the effect sizes for lots of different types of groupwork, pick the one with the highest effect size, and say “this shows that groupwork is highly effective”. Yet, of course, just by probability, if you look at enough types of groupwork you are bound to find one with a high effect size. This sort of thing: http://xkcd.com/882/
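
The arithmetic behind the jelly bean problem is short. If none of k subgroups (or types of groupwork) really differs and each is tested independently at α = 0.05, the chance that at least one still comes out “significant” is 1 − (1 − α)^k. The sketch below computes that and also simulates the “pick the winner” habit: twenty hypothetical variants of groupwork share exactly the same true effect size, yet the best-looking estimate overstates it. All names and numbers are illustrative, and it assumes independent tests and normally distributed scores with a known standard deviation of 1.

```python
import random


def familywise_error(k, alpha=0.05):
    """Chance that at least one of k independent tests of true null hypotheses
    comes out 'significant' at level alpha: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k


def best_of_k_effect(k, true_effect=0.4, n=30, seed=3):
    """k hypothetical variants of groupwork all share the SAME true effect size.
    Each is estimated from its own small study (n pupils per group, SD known to be 1,
    so the mean difference is already a standardised effect size).
    Returns the largest estimate, i.e. what 'pick the winner' would report."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(k):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(true_effect, 1) for _ in range(n)]
        estimates.append(sum(treated) / n - sum(control) / n)
    return max(estimates)


print(round(familywise_error(1), 2))   # 0.05
print(round(familywise_error(20), 2))  # 0.64
print(best_of_k_effect(20))            # typically well above the true 0.4
```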

Secondly, I’ve seen it neglected when interpreting small samples. People often claim that what you see in one school tells you nothing. But, statistically, even a sample of one tells you something about a population. It can be enough to disprove a claim that something never happens and, as a matter of probability, it can show (with the same sort of probability as the research you describe above) that something isn’t particularly rare. If something is meant to happen in fewer than 1 in 20 schools, then the chance of finding it in a randomly chosen sample of 1 is less than 5%, which is a good level of significance.
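
The sample-of-one argument is the same arithmetic run the other way. Assuming schools are picked at random from a large population (the helper name is purely illustrative), the chance of spotting a feature that occurs in a fraction `prevalence` of schools can be sketched as:

```python
def chance_of_seeing_it(prevalence, schools_sampled):
    """Probability that at least one of `schools_sampled` randomly chosen schools
    shows a feature found in a fraction `prevalence` of all schools.
    Assumes independent random draws from a large population."""
    return 1 - (1 - prevalence) ** schools_sampled


# If it really happened in only 1 school in 20, seeing it in the one school
# we happened to visit would be a 5% event.
print(round(chance_of_seeing_it(1 / 20, 1), 3))   # 0.05
# If it happened in only 1 school in 100, it would be a 1% event.
print(round(chance_of_seeing_it(1 / 100, 1), 3))  # 0.01
```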

Thirdly, and this is more of a layperson’s error than one made by people who are used to statistics, research is often dismissed for having a sample that is only a small percentage of the population. However, statistics tells us that, for large enough populations, it is the absolute size of the sample, not its size relative to the population, that matters.
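
One way to see the third point is through the standard error of a survey percentage. The sketch below assumes simple random sampling and uses the textbook formula for a proportion with the finite population correction; the function name and example sizes are illustrative only.

```python
from math import sqrt


def standard_error(sample_size, population_size, p=0.5):
    """Standard error of an estimated proportion p from a simple random sample,
    including the finite population correction sqrt((N - n) / (N - 1))."""
    n, N = sample_size, population_size
    fpc = sqrt((N - n) / (N - 1))
    return sqrt(p * (1 - p) / n) * fpc


for n, N in [(100, 10_000), (1_000, 10_000), (1_000, 10_000_000)]:
    print(f"n={n:>5}, N={N:>10,}: sample is {n / N:.2%} of population, "
          f"SE = {standard_error(n, N):.4f}")
```

Going from a 10% sample to a 0.01% sample of the population (1,000 respondents each time) barely changes the standard error, whereas a 1% sample of 10,000 (only 100 respondents) is more than three times less precise.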

Overall, I think we would benefit from talking explicitly about the probability of error in all research results, and I don’t think it would undermine anything in my talk. Truth is not about certainty; it’s about honesty.

Marc Smith (post author): Yes, it was your talk. I half expected you to discuss research evidence further, but I realise time was short and you had a lot to cover.

Interestingly, the more I become involved in research, the more critical of it I become. I’ve listened to PhD students telling me that because their results aren’t significant they will need to increase their sample size – which is really just nonsense. There comes a point when we have to accept that the research is ‘right’, at least until some other study comes along and says it’s not.
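
One reason “collect more data until it’s significant” is a problem: every extra look at the data is another chance of a false positive. The hypothetical simulation below assumes normally distributed scores with a known standard deviation of 1 and a z-test in place of a t-test; the names and numbers are illustrative. It starts with 20 participants per group, tests, and adds 10 more per group after each non-significant result, up to 100, even though there is no real effect.

```python
import random
from statistics import NormalDist, fmean


def p_value(a, b, sigma=1.0):
    """Two-sided z-test p value for a difference in means (known sigma assumed)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def optional_stopping_rate(start=20, step=10, max_n=100, alpha=0.05,
                           trials=2000, seed=4):
    """Share of null-effect studies declared 'significant' when the researcher
    keeps adding participants until p <= alpha or max_n is reached."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0, 1) for _ in range(start)]
        treated = [rng.gauss(0, 1) for _ in range(start)]  # no real effect
        while True:
            if p_value(treated, control) <= alpha:
                hits += 1
                break
            if len(control) >= max_n:
                break
            # Not significant yet: top up both groups and test again.
            control += [rng.gauss(0, 1) for _ in range(step)]
            treated += [rng.gauss(0, 1) for _ in range(step)]
    return hits / trials


print("one test at n=20:      ", optional_stopping_rate(max_n=20))
print("keep topping up to 100:", optional_stopping_rate())
```

The second figure should come out well above 5%, which is why the sample size is normally fixed before the data are collected rather than after a disappointing result.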

One problem, I think, is that people (including teachers) latch onto some study or other and accept its findings uncritically (or latch onto a person and assume everything they do or publish is God’s own truth – mentioning no names, and certainly not American cognitive psychologists). Many also seek out evidence to support a view they already hold (confirmation bias).

We also have to accept that most studies conducted (certainly in psychology) produce non-significant results, are never published (cue publication bias) and therefore lead others to conduct the same research only to discover that their results are also non-significant.