It’s like seeing dark clouds and knowing it might rain, and wondering whether to grab an umbrella.
We automatically make a series of decisions when we see or hear certain things. Our brains are wired to make our past experiences work for us in the future.
Making those kinds of predictions is human nature, and at the same time it feeds a powerful illusion.
The illusion is that correlation implies causation.
Dark clouds don’t always produce rain. A seemingly menacing cloud could easily drift apart in a matter of minutes, leaving you holding an umbrella that’s practically useless unless you’re prone to sunburns.
We make these calculations on a daily basis. It’s probably better to grab an umbrella even if there’s only a 50 percent chance of rainfall.
But you probably wouldn’t make a business decision based on results that only had a 50 percent confidence level.
We want to have actionable results, and the easiest way to find them is through statistical significance.
While extremely useful, the term statistical significance is widely misused and misunderstood.
Even statisticians sometimes set aside their own field's formal definition just to make a point.
Statistical significance is part of the background noise. It’s used in political polls, in psychological studies, and across a wide variety of businesses.
And it’s practical when you’re evolving your website, whether by observing the results of random design tweaks or by applying CRO software scientifically: you want to make sure the conclusions you act on are reliable and meaningful.
What is Statistical Significance?
A statistically significant result is one we can trust to a stated degree of confidence, i.e. one that is unlikely to be due to random chance or sampling error.
The term comes from statistical hypothesis testing, and it is the linchpin of getting accurate results from A/B testing.
What you ask with every test is: which option has higher conversions, and did that option actually cause those conversions?
Statistical significance helps us sift the wheat from the chaff.
Sample size is the biggest factor in getting more accurate results. But the harsh reality of data-driven business is that there isn’t always time or resources available to implement mass surveys or studies.
Faith is free but resources are not.
Sometimes, in order to make a decision at all, the confidence level has to be lowered before statistical significance can be reached.
If data on a particular segment trickles in relatively slowly, reaching a high level of confidence might be prohibitively time-consuming. Modern CRO software presents each finding along with an indicator of the finding’s reliability based on the mathematical underpinnings of statistical significance.
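To see why slowly trafficked segments take so long, here's a rough Python sketch (my own illustration, with made-up numbers, not something from any particular CRO tool): the 95 percent margin of error on an observed conversion rate shrinks only with the square root of the number of visitors, so cutting your uncertainty by a factor of 10 takes roughly 100 times the traffic.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for an observed conversion rate p with n visitors,
    using the normal approximation (z = 1.96 for 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical 5% conversion rate measured at increasing traffic levels
for n in (100, 1000, 10000, 100000):
    print(n, round(margin_of_error(0.05, n), 4))
# The error bars shrink like 1/sqrt(n): about ±0.0427 at n=100,
# but still about ±0.0043 even at n=10,000.
```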
Stats Terms You Need to Know:
Since this post can’t completely cover the ins and outs of stats language, here are a few basic terms everyone should understand:
- Alpha (α), or significance level, quantifies the probability of committing a Type I error in an experiment. The standard alpha value is usually set at 5 percent, or .05 (roughly two standard deviations on a two-tailed test, for you A types). Alpha is always between 0 and 1; a higher alpha value means a higher chance of a false positive. Typically, an experimenter accepts a 5 percent possibility of a Type I error occurring, and the corresponding confidence level is set at 95 percent. This does not mean the test is 95 percent accurate; it merely means there’s a 5 percent chance that the findings are unreliable (due possibly to bad sampling, or chance).
- Type I error is rejecting a null hypothesis that is actually true and should have been retained (a false positive).
- Type II error is failing to reject a null hypothesis that is actually false (a false negative).
- p-value, or probability value, is used to determine statistical significance. In hypothesis testing, when the p-value is less than or equal to alpha, the null hypothesis is rejected. However, the p-value doesn’t prove the alternate hypothesis true or the null hypothesis false. If the p-value is correctly calculated, it simply means that the long-run Type I error rate can only be as high as alpha. A p-value helps us quantify our confidence in the results of our tests. There’s a lot of concern over the use of the p-value, mainly over accuracy and misconceptions; Steven N. Goodman coined the term “p-value fallacy” in 1999 to criticize the practice of treating significance as a simple binary verdict.
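To make the link between alpha and Type I errors concrete, here's a minimal Python sketch (my own illustration; the conversion rate and sample sizes are made up). It runs repeated A/A tests, where both variants truly convert at the same rate so the null hypothesis is true, and counts how often a two-tailed z-test still declares significance at α = .05. The false positive rate lands near alpha, just as the definition promises.

```python
import math
import random

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-tailed z-test p-value for the difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    # Two-tailed p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
ALPHA = 0.05
TRUE_RATE = 0.10   # both variants convert at 10%, so the null hypothesis is true
trials = 2000
false_positives = 0
for _ in range(trials):
    a = sum(random.random() < TRUE_RATE for _ in range(1000))
    b = sum(random.random() < TRUE_RATE for _ in range(1000))
    if two_proportion_p_value(a, 1000, b, 1000) <= ALPHA:
        false_positives += 1
print(false_positives / trials)  # hovers near 0.05, i.e. near alpha
```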
Using a Chi-Square for A/B Testing:
There’s a wealth of methods that statisticians use to generate results: Z-tests, F-tests, two-tailed, one-tailed…
But when it comes to A/B testing, the Chi-Square test is an excellent method to see if results are statistically significant.
The Chi-Square test is used to find the relationship between categorical variables, like the relationship between a home page and conversion rates.
If you want to find out whether or not that new home page drives higher conversions, and if the results are statistically significant, a Chi-Square test is your best friend.
All you need to find statistical significance with a chi-square test is a null and an alternate hypothesis, an alpha value, and some collected data.
You can calculate your Chi-square manually, create your own calculator in Excel, or find calculators online. Juliette Kopecky created a really easy Excel spreadsheet for A/B testing. You can download it here.
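If you'd rather see the arithmetic those calculators are doing, here's a minimal sketch of a 2x2 chi-square test of independence in Python (my own illustration; the traffic and conversion numbers are hypothetical): tally observed conversions per variant, compute the expected counts under the null hypothesis that the page makes no difference, and sum the squared deviations.

```python
import math

def chi_square_ab_test(conv_a, n_a, conv_b, n_b):
    """Chi-square test of independence on the 2x2 table
    (converted / not converted) x (variant A / variant B).
    Returns (chi-square statistic, p-value with 1 degree of freedom)."""
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if variant and conversion were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical data: old page converts 90/1000 visitors, new page 120/1000
chi2, p = chi_square_ab_test(90, 1000, 120, 1000)
print(round(chi2, 2), round(p, 4))
if p <= 0.05:
    print("Reject the null: the pages convert at different rates.")
```

With these made-up numbers the p-value comes out below .05, so at a 95 percent confidence level you'd reject the null hypothesis that the two pages convert equally.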
How does statistical significance impact ROI?
While it may or may not be important for every single marketer to understand the nuances of statistical lingo, it’s important to be able to spot misconceptions or misinterpretations.
In fact, one reason the p-value is reported at all is so readers can make their own decisions about the reliability of any given result.
Should data be our only driving force?
“Politicians use statistics in the same way that a drunk uses lamp-posts—for support rather than illumination.” —Andrew Lang
But if Forbes reports that 86 percent of executives using predictive analytics have seen positive return on investment, don’t you want to know how they came to that conclusion?
When you’re seeking statistical significance, some tried and true methods for increasing efficacy with conversion rate optimization include:
- Testing over longer periods of time. It helps reduce false positives, and also increases accuracy.
- Running A/B tests one at a time. Too many results can be misleading.
- Testing according to a defined strategy rather than at random.
- Segmenting your visitors. Running tests on consumer subsets may lead to more interesting results.
- Recognizing the value of a negative result. Not all results are immediately profitable.
The reality of testing is that data is intrinsically tied to the methods used to create it, and the people who analyze it.
However, you can’t run a test on something that doesn’t exist. It takes a fair amount of creativity to conceive and launch a product.
To put it bluntly, there’s not a singular formula for success.
Solid science and profitable businesses are compelled by the same question: why?
Some useful statistics resources for your future use:
Here’s another useful and more detailed A/B test calculator.
And depending on the variety of your tests, this p-value calculator might be useful.
A great explanation of how to run a hypothesis experiment:
Thanks for reading!