‘Testing’ has become a pejorative term in education. It is used synonymously with exams that are tethered to high stakes. We use high-stakes assessments to award qualifications to students, to hold schools accountable and, increasingly, to compare the educational performance of countries. They have drawn fierce criticism from educators, parents and even students themselves.
We must, therefore, ask: What are the justifications for high-stakes testing? What do we gain – and lose – by lowering the stakes? Is formative assessment a viable alternative? And what role might technology play in shaping the way we measure educational outcomes?
The rationale for high-stakes testing
Exams, in principle, can be a motivational force for learners. Roland Fryer’s famous research attests to the positive impact that financial incentives can have on students’ exam performance. In a more recent study, a group of US students improved their performance in a low-stakes exam, modelled on the PISA test, when they were offered cash incentives. Interestingly, students in Shanghai, a region renowned for its strong performance in PISA, showed no comparable improvement when offered the same incentives. Their motivation levels were already high, possibly as a result of a testing culture in which exam performance signifies rank and status. Whatever the source of high stakes, they are a demonstrable lever for student effort and attainment.
To their proponents, exams are a means of holding educators accountable for improving learning outcomes. Exams generate a wellspring of learning data, which can be used as a proxy for evaluating the performance of learning institutions. In the UK, for example, league tables rank primary and secondary schools in terms of their students’ exam performance. International comparison studies like PISA seek to do the same at the level of entire education systems, and as policymakers pay more attention to the rankings, the stakes get higher.
Exams are also the basis for academic qualifications. Higher education institutions and employers still rely on candidates’ exam scores as a filter for evaluating their potential. Every student deserves the opportunity to showcase their potential, and a standardised evaluation protocol like national exams seems fair enough.
Motivation, accountability and qualifications make for a compelling rationale for high-stakes assessment. But there is a natural dynamic to exams that threatens to distort each of these objectives.
Campbell’s Law applied to education
In his 1979 essay, the social scientist Donald Campbell noted an interesting phenomenon concerning measurement:
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
Put differently, the higher the stakes we attach to any form of measurement, the more it induces unintended behaviours and the less reliable the measurement becomes. With high-stakes testing, Campbell’s Law manifests itself in several ways, each corroding the underlying intent of assessment.
The phenomenon of ‘teaching to the test’ is ubiquitous. When exam scores are linked to school and teacher performance, instruction becomes narrowly focused on the particular aims of the exam. Assessment becomes master of the curriculum as teachers, working against the clock and under the threat of penalty if scores drop, sacrifice rich learning experiences for the singular objective of test performance. Learning and performance become conflated educational objectives, with short-term exam preparation privileged ahead of more enduring learning outcomes.
The narrowing focus of exams is compounded by the anxiety that many students experience under ratcheted expectations. Far from fostering an intrinsic love of learning, exams can have a crippling effect on students’ attitudes. It is widely noted that some of the highest-performing education systems, such as those in the Far East, are also home to some of the most stressed and fatigued students, owing to the intense focus on exams. Moreover, the summative judgements these exams place on students risk condemning them to fixed beliefs about their intelligence – what Carol Dweck has termed a ‘fixed mindset’.
Nefarious exam practices extend to school leadership too. Increasingly, schools are responding to the pressures of accountability by engaging in ‘off-rolling’, where students who are projected to perform poorly in exams are removed from the school roll.
These dynamics are a natural consequence of what journalist Warwick Mansell has termed ‘hyper-accountability’ systems. When the stakes are high enough, stakeholders across the education system align their behaviours according to perverse incentives. The holistic, longitudinal aims of education are all too readily sacrificed at the altar of exams.
Meanwhile, the potential of learning data as a source of meaningful, actionable insight is easily lost in the rush for summative judgements. Jonathan Supovitz, associate professor at the University of Pennsylvania, puts it this way:
“The data from high stakes tests are useful to policymakers for assessing school and system-level performance but insufficient for individual-level accountability and provide meager information for instructional guidance.”
High-stakes testing gives rise to a battle between two types of assessment: formative and summative. The former is intended as a feedback mechanism for students and teachers, where each assessment reveals a critical insight into the student’s progress and informs the next stage of their learning. It is assessment for learning. The latter brings with it finality, a definitive evaluation of the student’s performance in a particular topic or course. It is assessment of learning and it judges rather than informs. It reigns supreme in the realm of high-stakes testing. Supovitz goes on:
“Rather than investing in substantial efforts to improve teaching and learning, we have created a system that values summative testing as the cure to what ails us.”
The key question is whether we can get the best of both forms of assessment. Is there a way to capture the laudable aims of summative assessment without the unintended consequences? Can formative assessment and summative assessment exist in harmony? A case study of one of the world’s best-performing education systems seems to suggest as much.
Applying the lessons of Finland
When the results of the first PISA tests, conducted in 2000, were announced, the Finnish education system became an overnight sensation. Contrary to expectations, Finland has been a consistent high performer in these comparison exams, prompting many educators to plumb its educational depths. What makes Finland’s success all the more intriguing is its system’s relaxed approach to exams.
Finland has resisted high-stakes testing. Exams are not standardised and exam scores have no bearing on how schools or teachers are evaluated. Instead, teachers are granted the autonomy (and professional development) to devise their own assessment schemes. The dominant mode of assessment is formative, with the explicit aim of guiding teachers’ instruction. Finnish education is premised on egalitarian values, which means there is an earnest commitment to maximising every student’s learning outcomes. Assessment data simply serves this ideal.
So far, so good. But bear in mind that Finland is a small country (population 5.5 million) with a relatively homogeneous student base. It is not immediately obvious how its model would transfer to larger, more variable contexts in other parts of the world.
What the Finnish example does validate, however, is the notion that formative assessment can play a more prominent role throughout the schooling experience. It shows how learning data can serve its purpose as an aid to learning and teaching.
It’s here that technologies like Maths-Whizz are poised to make a transformative impact. Virtual tutoring, coupled with real-time analytics, can deliver many of the key benefits of formative assessment. A well-designed system lowers the stakes by pairing learning and assessment as mutually reinforcing processes, so that no single assessment carries undue weight. Fixed judgements make way for informed insights into students’ relative strengths and weaknesses. Assessment data is squarely aligned to the needs of students and teachers, with no incentives to ‘game the system’ (and even where such incentives persist, systems based on continuous assessment are harder to abuse). And the same data can be used to make more holistic, and ultimately more accurate, judgements on how schools are performing.
These systems provide learning insights at a level far more granular and nuanced than single-point summative assessments. The informational potential of learning data comes into its own when it reveals patterns of student understanding and struggle, and when it empowers teachers to take action as the learning is occurring.
The chief lesson from Finland is the importance of teacher autonomy, so such tools must be positioned as virtual teaching assistants – tools in service of each teacher’s instructional goals. We must never elevate digital assessment tools to the status of false prophets, because there is only so much any assessment, particularly one steeped in technology, can capture. Learning data should be indicative and suggestive; the final decisions should always rest in the hands of expert, human teachers.
Can we remove summative assessment altogether?
Alternative models like portfolio assessments, in which students display their work outside of the blunt conditions of an exam, have raised concerns around reliability. Whatever concerns exist around the ‘gaming’ of regulated exams are only amplified in the context of unregulated coursework assignments. It seems that some final, consistent evaluation of students’ learning must remain in place for signalling their future potential, and that formal exams are the most effective way to provide it.
Here, too, technology may make an impact. The ability to assess students on an ongoing basis may pave the way for longitudinal records that stretch back several years. As an admissions tutor or employer, you might get access to the most salient details of a student’s educational profile: how their learning has evolved with time, how they responded to failure, which subjects they showed the most consistency in. But taken to an extreme, this model could have an Orwellian feel about it. After all, as adults, would we welcome the thought of our every learning interaction being tracked and making some contribution to our future career paths? If low-stakes testing is realised through continuous assessment, what ethical questions do we need to consider around privacy? It is essential that we make conscious choices on what to collect – and not collect – around students’ learning.
Whatever our assessment mechanisms – formative or summative, low stakes or high – we must recognise there are always trade-offs to be made. The Finnish model, and the affordances of digital technology, at least suggest that, for the most part, we can make assessment the servant of the curriculum by giving formative assessment the emphasis it deserves. When it comes to evaluating schools, informing teachers and motivating students, formative assessment is more than fit for purpose, without the toxic side effects. But before we send high-stakes exams to the slaughter, we must interrogate the alternatives and consider whether the consequences are any more palatable. That’s the exam question we all face as educators.