Lately, I have been asked some very sophisticated questions about why benchmarking is important, when it is required, and how we can measure performances that depend on unobservable thinking processes.
When standards matter:
The most common reason to benchmark is that common assessments are not always the same as summative assessments. Summative assessments are designed to measure the student’s ability to meet one or more standards by using subject content. Common assessments often measure only content. We need developmental performance data, standard by standard. This is explained in detail in our presentations and videos.
When strategies vary:
However, there is a more sophisticated problem. When the strategy used to master the skill or process is obvious and common, the scores on the assessment may be predictive of how a student will perform on an external test or on an interim test aligned to the external test. For instance, a certain number of math problems on a summative assessment may be predictive of performance on other tests. In such cases, benchmarking is not absolutely required. (However, benchmarking once or twice a year may help administrators track school-wide progress in a manner that all parties can understand.) An even better example is music. If a student can masterfully perform a piece of music that has a known level of sophistication, we know exactly how well the student can play. Whatever rating system is used, it need not be translated into benchmarks, at least not for the music teachers.
When we try to measure the developmental growth of a student’s ability to write, or to measure reading comprehension, measurement becomes difficult. First, objective measures do not work as well as performance measures in these cases. Having a student answer multiple-choice questions about writing rules is not as predictive of the student’s ability to write as having the student write and rewrite an essay. Measuring how a student comprehends what is read is even more difficult. Since we cannot yet look into a student’s brain, we must instead observe the strategies the student is using. We can measure the student’s proficiency at using a reading comprehension or writing strategy.
The measurement process looks like this:
Standard -> strategy -> formative scores -> summative benchmark scores -> interim assessment -> external test
In this case, the subject teacher can teach and measure the student’s formative ability to operate the strategy by using a uniform scoring system agreed to by the PLT or school-wide PLC. After formative practice and re-looping, the student demonstrates the ability to use the strategy on a uniform, summative performance assessment. The summative assessment is benchmarked to the standard. The PLT gets a performance report on how well each student is progressing by benchmark. It is important to understand that this rests on an assumption: that being able to operate the strategy means the student can perform the skill or meet the standard. The assumption may not always hold, but it is the best we can do. When the student is assessed on an interim measure, the data can be used to determine if the correct strategy was used. (How reliably the interim assessment predicts the external test score is a problem for the vendor, the administration, or the school-wide, inter-disciplinary PLC.)
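For readers who keep scores in a spreadsheet or gradebook export, the per-benchmark performance report described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of any particular product: the 1 to 4 rubric, the cutoff of 3, the function name `benchmark_report`, and the sample records are all hypothetical. It keeps each student's most recent summative score for each standard, so that re-looping is reflected, and flags whether that score meets the benchmark.

```python
from collections import defaultdict

# Hypothetical rubric: scores run 1-4, and 3 or above counts as meeting the benchmark.
BENCHMARK_CUTOFF = 3


def benchmark_report(records):
    """Keep each student's most recent summative score per standard.

    records: iterable of (student, standard, attempt_number, score) tuples.
    Returns {student: {standard: {"score": int, "meets": bool}}}.
    """
    latest = {}
    for student, standard, attempt, score in records:
        key = (student, standard)
        # Keep the score from the highest attempt number seen so far.
        if key not in latest or attempt > latest[key][0]:
            latest[key] = (attempt, score)

    report = defaultdict(dict)
    for (student, standard), (_, score) in latest.items():
        report[student][standard] = {
            "score": score,
            "meets": score >= BENCHMARK_CUTOFF,
        }
    return dict(report)


# Made-up example data, not real results.
records = [
    ("Student A", "Reading: summarize", 1, 2),
    ("Student A", "Reading: summarize", 2, 3),
    ("Student A", "Writing: revise", 1, 4),
    ("Student B", "Reading: summarize", 1, 2),
]

report = benchmark_report(records)
for student, standards in sorted(report.items()):
    for standard, result in sorted(standards.items()):
        status = "meets" if result["meets"] else "not yet"
        print(f"{student} | {standard} | {result['score']} | {status}")
```

A PLT would substitute its own agreed scoring scale and cutoff. The point of the sketch is only that the report is organized by standard, not by assignment or by content unit.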
Last, educators have asked, “Can the local summative performance assessment actually be a better measurement than standardized testing when attempting to understand how well a student can carry out a very complex cognitive performance?” My opinion is that highly trained, experienced educators can bring students to perform at high levels. Sports, art, and drama coaches do it all the time. However, no one is looking over their shoulder. Can you prove that your performance assessments are better measures? Politically, I do not know if we will ever have a choice. A school must have some external measure to validate what is done. You cannot ask the public to believe you if there is no evidence that you are right. However, I would hope that over time our country can develop complex performance assessments, such as those used by the International Baccalaureate (IB) or by Westinghouse Scholars and its national science scholarship contest. Westinghouse Scholars uses a true complex performance assessment to measure the best science students in the country. Some schools build six-year programs around these measures. I am sure that the science teachers who coach these students could teach us how to measure rigorously using performance assessment. IB offers an even more practical solution for performance assessment.
This is a very short explanation. I hope it helps a little.
Thanks for reading,
Howard McMackin, Ph.D.
Empowered High Schools