You probably wouldn’t be surprised to hear that every education technology (edtech) publisher says their product works, and they all have some sort of supporting evidence. But oftentimes that evidence—if it’s fully experimental—is very scarce. In many cases, it’s just one study. Yet just that one piece of "gold standard" evidence is often considered good enough by educators when making a purchasing decision.
But it shouldn’t be.
Educators aren’t the only ones stuck in this “one good study” paradigm. Highly regarded edtech evaluation lists give top marks for just one RCT (randomized controlled trial). Meanwhile, any program whose studies don’t meet the RCT bar of rigor is downgraded by comparison, no matter how many other high-quality studies it may have, under how many different conditions, over what timeframe, or even with repeated positive results.
The biggest problem with relying solely on fully experimental RCT studies to evaluate edtech programs is their rarity. To meet the requirements of a full experiment, these studies take years of planning and often years of analysis before publication. These delays create a host of other challenges.
Altogether, these issues demonstrate the often limited relevance of relying on “one good study.”
Extrapolating one study's results to your situation is not fully valid unless the study was performed on a district like yours, on the grade band you’re planning on using, with a student subgroup mix like yours, with usage like you plan to adopt, on your most recent assessment, and with the program revision you’ll be using.
The time has come for a shift in how we evaluate edtech programs. Rather than relying on one “gold standard” study, we should be looking at a large number of studies, using recent program versions, producing repeatable results across many varied districts. Quasi-experiments can study the adoption of a program as it is actually used, without the complexity and time that up-front experimental planning requires. Methods of matching and comparing similar schools with and without the program can be made statistically rigorous and powerful. And if we study at the grade level, average test performance data is universally available on state websites, which makes a quasi-experimental study possible for any sufficiently large school cohort, as sketched below. If we pay attention to quasi-experiments instead of relying on RCTs alone, a far greater number of studies becomes possible.
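To make the matching idea concrete, here is a minimal sketch, in Python, of a matched comparison on grade-level proficiency data. The school names, column names, and figures are entirely hypothetical, and a real quasi-experimental evaluation would use many more matching variables plus formal significance testing; this only illustrates the mechanics of pairing similar cohorts and comparing their year-over-year gains.

```python
# A minimal sketch (not MIND's actual methodology) of a matched,
# quasi-experimental comparison on grade-level proficiency data.
# School names, column names, and figures are hypothetical.
import pandas as pd

# Each row is one school-grade cohort: prior-year proficiency is the
# matching variable, current-year proficiency is the outcome.
cohorts = pd.DataFrame({
    "school":       ["A", "B", "C", "D", "E", "F"],
    "used_program": [True, True, True, False, False, False],
    "prior_prof":   [0.42, 0.55, 0.63, 0.41, 0.56, 0.64],
    "current_prof": [0.51, 0.60, 0.70, 0.43, 0.57, 0.65],
})

treated = cohorts[cohorts["used_program"]]
comparison = cohorts[~cohorts["used_program"]]

# Match each program cohort to the comparison cohort with the closest
# prior proficiency, then compare their year-over-year gains.
diffs = []
for _, t in treated.iterrows():
    pos = (comparison["prior_prof"] - t["prior_prof"]).abs().argmin()
    c = comparison.iloc[pos]
    diffs.append((t["current_prof"] - t["prior_prof"]) -
                 (c["current_prof"] - c["prior_prof"]))

print(f"Mean matched difference in gains: {sum(diffs) / len(diffs):+.3f}")
```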
Crucially, a larger number of studies lets buyers evaluate repeatability. Why does repeatability matter so much? Because even the “gold standard” results of a single study in the social sciences have very often failed to replicate. The conventional significance threshold allows each published study a 5% chance of reporting an effect that isn’t really there, so how do we know that one “gold standard” study wasn’t itself a fluke or a cherry-picked result? “One good study” is not enough evidence. Reliable results, evidenced by replication across many studies, need to be the new normal.
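A back-of-the-envelope calculation shows why replication carries so much weight. Assuming independent studies and the conventional 5% significance threshold, the chance that one “significant” result is a false positive is 5%, but the chance that five independent studies all produce false positives is vanishingly small. The numbers below are illustrative, not drawn from any particular evaluation.

```python
# Illustrative sketch of why replication matters. Assumes independent
# studies and the conventional alpha = 0.05 threshold.
alpha = 0.05      # chance a single study reports an effect that isn't there
n_studies = 5

# One study: a 5% chance the "significant" result is a false positive.
print(f"False-positive risk, single study: {alpha:.1%}")

# Five independent studies all being false positives is far less likely.
print(f"All {n_studies} studies false positives: {alpha ** n_studies:.7%}")
```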
Imagine this new paradigm with a large number of recent studies, say five or more. It would allow us to look for consistent patterns over multiple years, across grade levels, and especially across different types of districts and assessments. This paradigm shows its rigor through repeatability, and it adds vastly improved validity with respect to the very factors listed above: district type, grade band, student subgroup mix, usage, assessment, and program revision.
At MIND, we believe a high volume of effectiveness studies is the future of a healthy market of product information in education. To illustrate and promote this new paradigm, we’ve created a program evaluation rubric.
Download the Program Evaluation Rubric (with ST Math notation)
Download the Program Evaluation Rubric (blank)
While MIND has not yet achieved the highest standard in each of these rubric sections, we are driving toward that goal, as well as toward annual, transparent evaluations of results for all school cohorts. We have already been able to do just that in grades 3, 4, and 5. We want our program to be held accountable for scalable, repeatable, robust results; that is how the program will improve and student results will grow.
Effective learning is important enough that there should be studies published every year covering every customer. We believe MIND is ahead of the curve on study volume, but we are not alone. For example, districts are starting to share the results of many of their own studies with peers on LearnPlatform, which will eventually create a pool of studies covering a variety of conditions for any given program. With more relevant information to inform their purchasing decisions, educators can find the best product fit for their students.
Andrew R. Coulson, Chief Data Science Officer at MIND Research Institute, chairs MIND Education's Science Research Advisory Board and drives and manages all research at MIND enterprises. This includes all student outcomes evaluations, usage evaluations, research datasets, and research partnering with grants-makers, NGOs, and universities. With a background in high-tech manufacturing engineering, he brings expertise in process engineering, product reliability, quality assurance, and technology transfer to edtech. Coulson holds a bachelor's and master's degree in physics from UCLA.