Holding Ourselves to a Higher Standard

During my eight years as a data analyst at MIND Research Institute, my work has mostly focused on annual quasi-experimental design (QED) analyses for new cohorts of school-grades implementing Spatial-Temporal (ST) Math. These repeatable results have consistently demonstrated the positive impact of ST Math in schools compared to a similarly matched control group. Recently, our donors at the Overdeck Family Foundation generously offered to sponsor my participation in cohort 13 of Harvard University’s Strategic Data Project (SDP). During my two-year data education fellowship with SDP, I completed a capstone project and presented my findings at our convening in Chicago in May 2023.

Randomized controlled trials (RCTs) are rightfully the gold standard for establishing that an intervention caused an outcome. Unfortunately, they are costly and time-consuming to conduct, so a product often ends up with a single study that is usually outdated and/or ungeneralizable. While most of our competitors feel comfortable relying on a single RCT from many years past, often on old versions of the product and with results from assessments that are no longer administered, MIND believes in providing a rich set of continuous, rigorous research. In addition to our own RCT, we annually conduct many different QED studies across a wide range of state assessments and demographic backgrounds. While a QED can meet What Works Clearinghouse (WWC) standards with reservations, it is often met with a level of skepticism. My SDP capstone project aims to address these concerns with an innovative methodology that further increases the closeness of match and eliminates one more possible source of bias in a QED control group.

Throughout my tenure at MIND, I have been collecting publicly available state math assessment data for grades 3-5, year after year. This has resulted in math results for 39 states, in addition to the District of Columbia, across numerous years of administration, with some states having data going back to the 2004/2005 school year. My SDP advisor and long-term research partner to MIND, Dr. Teomara (Teya) Rutherford, helped drive this project forward with helpful feedback on how to best utilize this treasure trove of longitudinal data.

As a result, this project expanded upon the typical matching in MIND’s QED analysis of ST Math. The analysis focused on California because of the availability and size of its data. A cohort of grades 3-5 that were new to ST Math in either 2019/20 or 2020/21 was filtered down to those meeting enrollment and progress requirements. The resulting 45 school-grades (grades 3, 4, and 5) in 30 California schools were then matched 1:1 using propensity score matching, via the “matchit” function in R, with “mahalanobis” as the distance measure. The innovation: the matching criteria included five years of California Assessment of Student Performance and Progress (CAASPP) math performance data from before the implementation of ST Math (2014/15-2018/19), covering both the Mean Percent of students with proficiency as “Standard Met” or “Standard Exceeded” and the Mean Scale Score, in addition to the percent of students eligible for free or reduced-price lunch in 2018/19 (from MDR).
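To make the matching idea concrete, here is a minimal Python sketch of 1:1 nearest-neighbor matching on Mahalanobis distance. It is not the R “matchit” call used in the study. It assumes only two covariates per school-grade (a baseline percent proficient and a free or reduced-price lunch percent) rather than the full five-year history, it assumes greedy matching without replacement, and every ID and number in it is made up for illustration.

```python
from statistics import mean


def covariance_2d(points):
    """Sample variances and covariance for a list of (x, y) pairs."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance between two (x, y) points."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d^T S^-1 d, using the closed-form inverse of a 2x2 matrix
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def match_one_to_one(treated, controls):
    """Greedy 1:1 matching without replacement (an assumed strategy)."""
    cov = covariance_2d(list(treated.values()) + list(controls.values()))
    available = dict(controls)
    pairs = {}
    for t_id, t_cov in treated.items():
        best = min(
            available,
            key=lambda c_id: mahalanobis_sq(t_cov, available[c_id], cov),
        )
        pairs[t_id] = best
        del available[best]  # each control can be used only once
    return pairs


# Hypothetical school-grades: (baseline % proficient, % free/reduced lunch)
treated = {"T1": (40.0, 60.0), "T2": (55.0, 30.0)}
controls = {
    "C1": (70.0, 10.0),
    "C2": (41.0, 58.0),
    "C3": (54.0, 33.0),
    "C4": (20.0, 90.0),
}
pairs = match_one_to_one(treated, controls)
print(pairs)
```

Mahalanobis distance scales each covariate by its variance and accounts for correlation between covariates, so a one-point gap in percent proficient and a one-point gap in lunch percentage are not treated as automatically equivalent.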

The resulting 45 matched control grades 3-5 that never used ST Math (Control) were plotted against the ST Math grades (Treatment) for their math performance in the matched years, 2014/15 through 2018/19, and the treatment year, 2021/22. Due to the COVID-19 pandemic, California is missing CAASPP math assessment data for 2019/20 and 2020/21. The resulting line charts demonstrate equivalent longitudinal trends in the matched years: both the Treatment and Control sets had a positive trajectory of math scores. Furthermore, the differences between the Treatment and Control grades met WWC’s baseline equivalence requirements in the 2018/19 school year (directly preceding the implementation of ST Math in the Treatment set).
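A baseline equivalence check of this kind can be sketched as follows. WWC judges baseline equivalence by the standardized difference between groups: an absolute effect size of 0.05 or less satisfies the requirement, a value above 0.05 and up to 0.25 requires statistical adjustment, and anything larger does not satisfy it. The sketch below computes Hedges’ g with the usual small-sample correction; the function names and the sample values are assumptions for illustration, not the study’s code or data.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(treatment, control):
    """Standardized mean difference with a small-sample correction."""
    n_t, n_c = len(treatment), len(control)
    pooled_sd = sqrt(
        ((n_t - 1) * variance(treatment) + (n_c - 1) * variance(control))
        / (n_t + n_c - 2)
    )
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return correction * (mean(treatment) - mean(control)) / pooled_sd


def wwc_baseline_category(g):
    """Classify a baseline effect size against the WWC thresholds."""
    size = abs(g)
    if size <= 0.05:
        return "satisfied"
    if size <= 0.25:
        return "satisfied with statistical adjustment"
    return "not satisfied"


# Hypothetical 2018/19 mean scale scores for a few school-grades
treatment_baseline = [2410.0, 2455.0, 2432.0, 2470.0, 2448.0]
control_baseline = [2412.0, 2450.0, 2436.0, 2468.0, 2444.0]

g = hedges_g(treatment_baseline, control_baseline)
print(round(g, 3), wwc_baseline_category(g))
```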

Finally, comparing math performance between the Treatment and Control sets from 2018/19 to 2021/22 yielded positive, statistically significant results in favor of the grades using ST Math. For both the Mean Percent of students with proficiency as “Standard Met” or “Standard Exceeded” and the Mean Scale Score, the ST Math Treatment set outperformed its matched controls, with statistically significant differences of 7.53 percentage points and 18.41 scale score points, respectively.
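One common way to compare changes between matched groups is to take, for each matched pair, the Treatment gain minus the Control gain and test whether the average of those differences is distinguishable from zero. The sketch below does this with a paired t statistic. The post does not say which test produced the reported results, so this choice is an assumption, and the four pairs of numbers are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def gain_difference(pairs):
    """Mean difference in gains across matched pairs, with a paired t statistic.

    Each pair is (treat_pre, treat_post, control_pre, control_post).
    """
    diffs = [
        (t_post - t_pre) - (c_post - c_pre)
        for t_pre, t_post, c_pre, c_post in pairs
    ]
    estimate = mean(diffs)
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return estimate, estimate / standard_error


# Hypothetical percent-proficient values: 2018/19 (pre) and 2021/22 (post)
matched_pairs = [
    (40, 41, 40, 33),
    (50, 50, 51, 44),
    (30, 29, 30, 20),
    (45, 47, 44, 38),
]

estimate, t_stat = gain_difference(matched_pairs)
print(estimate, round(t_stat, 2))
```

The t statistic would then be compared against a t distribution with one fewer degree of freedom than the number of pairs to obtain a p-value.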

The results of this study, which began simply as an effort to deepen rigor, are especially relevant in today’s post-COVID era: they show a full recovery among the ST Math grades. The learning loss seen across schools in all states during the COVID-19 pandemic is a serious issue. This analysis not only strengthens MIND’s claims about ST Math’s efficacy via a rigorously matched control group, but also shows the Treatment set maintaining its Mean Percent of students with proficiency as “Standard Met” or “Standard Exceeded” from before COVID-19 (2018/19) to afterwards (2021/22). Unfortunately, the matched control group did not show the same recovery over that period: its percent of students meeting standards dropped 8 points.

We hope this quasi-experimental methodology can be widely adopted to help bring more accountability to the edtech space. Using publicly available data in this new way is scalable and practical in many settings. Decision makers must demand rigorous and relevant evidence with repeatable results so they can choose the best products to achieve maximum student learning.

Jessica Guise

About the Author

Jessica Guise is a data analyst at MIND Research Institute. She just completed her two-year Strategic Data Project Data Fellowship at Harvard University.

