Stupid Statistics Estimates Based on Way Too Small Sample Sizes – November 23, 2019

I had two great statistics professors – one at the Bachelor’s level and one at the Master’s level. Both of them would laugh at using small samples to generate serious research on any topic, no matter how trivial. Their point was simple – you can make the same set of statistics show whatever you want them to show. I have also seen too many statistics experts support such use of small samples as valid and reliable. To paraphrase a well-known quote, “There are lies, dang lies, and statistics.”

Yet again, I see an “expert” relying on way too small a sample (20 respondents out of a database of approximately 26 million) to estimate what the blogger is trying to show. Making grand, sweeping statements based on that small a sample is silly in the extreme. Sadly, this has been pretty common among researchers for a while now. It doesn’t matter if it’s medical research, marketing research, scientific research, or any other kind of research. It’s worse when they are looking at a much larger population than 26 million. The U.S. population is around 330 million, but I have seen numerous studies where the researchers look at 30 individuals to generate statistics on Americans. I doubt that even 3,300 individuals are going to generate meaningful statistics on our population. Many of the phone surveys rely on fewer than 2,000 respondents to generate supposedly accurate results.
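To put rough numbers on the sample sizes mentioned above, here is a minimal Python sketch of the textbook margin-of-error formula for a proportion. It rests on assumptions that are not from the post: a simple random sample, a worst-case proportion of 0.5, a 95% confidence level (z = 1.96), and a finite population correction. The function name `margin_of_error` is made up for illustration.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample of size n drawn from `population`;
    p=0.5 is the worst case and z=1.96 corresponds to 95% confidence.
    """
    if population < 2 or not 0 < n <= population:
        raise ValueError("need 0 < n <= population and population >= 2")
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: negligible when n is tiny next to the population.
    correction = math.sqrt((population - n) / (population - 1))
    return z * standard_error * correction

# Sample sizes and populations taken from the paragraph above.
cases = [
    (20, 26_000_000),
    (30, 330_000_000),
    (2_000, 330_000_000),
    (3_300, 330_000_000),
]
for n, population in cases:
    moe = margin_of_error(n, population)
    print(f"n={n:>5} of {population:,}: +/- {moe * 100:.1f} percentage points")
```

Under these idealized assumptions, 20 respondents give a margin of roughly 22 percentage points either way. The formula only covers random sampling error; it says nothing about respondents who were hand-picked or self-selected, which is a separate problem no sample size fixes on its own.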

I don’t care if it’s DNA, genealogy, marketing, etc.; the blogger needed much larger samples. In this case, the “expert” is well-known enough that the blogger could have approached the companies and gotten actual numbers.

In my case, it would have taken some time as I don’t have the name recognition, but I have enough results from the companies in question that I could have generated anywhere from 2,000 to well over 50,000 comparisons. I would have had to run the reports or search the databases to generate the data. Does that mean my data would be more accurate than this blogger’s data? No, but it also doesn’t mean my data would necessarily be any less accurate.

With one of the databases, the blogger could easily have run additional comparisons, as the blogger should have access to certain information from the 20 respondents that would allow a much larger search. With the other companies, the blogger wouldn’t have that access.

I don’t know why researchers think it’s okay to use such small samples, but it’s been a trend for many years now and the numbers are getting smaller. Yes, it’s cheaper and quicker to run small samples, but the results should be taken with a huge dose of salt. Now that computers and programs have become as fast and capable as they are today, it’s much easier to run significantly larger samples than it was 10 years ago. How long before some researcher relies on a single individual to generate research showing whatever?

About Wichita Genealogist

Originally from Gulfport, Mississippi. Live in Wichita, Kansas now. I suffer Bipolar I, ultra-ultra rapid cycling, mixed episodes. Blog on a variety of topics - genealogy, DNA, mental health, among others. Let's

3 Responses to Stupid Statistics Estimates Based on Way Too Small Sample Sizes – November 23, 2019

  1. Betul Erbasi says:

    I have heard these statements a lot too: that you can show anything with your stats. But we need to be honest and take large samples for better representation. Nice points!
