Another dot in the blogosphere?

Posts Tagged ‘assessment’

The word “evaluation” might have been ill-defined and misused.

I was surprised to read someone like Senge reportedly saying this about evaluation.

Evaluation is when you add a value judgment into the assessment. Like, ‘Oh, I only walked two steps. I’ll never learn to walk.’ You see, that’s unnecessary. So, I always say, ‘Look, evaluation is really optional. You don’t need evaluation. But you need assessment.’

Evaluation is about adding a value judgement into assessment. That is why it is called eVALUation. But that does not make evaluation negative or optional.

Student A might get an assessment score of 60/100. Student B might get an assessment score of 95/100. One way to evaluate the students is to compare them and say that student B performed better than A. More is better and that is the value, superficial as doing that may be.

If you consider that Student A previously got a score of 20/100 and B a previous score of 90/100, the evaluation can change. Student A improved by 40 points; student B by 5 points. The evaluation: Student A made much more improvement than Student B.
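The two evaluations above can be sketched in a few lines of Python. This is only an illustration: the scores are the ones quoted in this post, while the function names and the two "value rules" (higher current score, larger gain) are my own labels for the judgements described.

```python
# Assessment data: just numbers, taken from the example in the post.
scores = {
    "A": {"previous": 20, "current": 60},
    "B": {"previous": 90, "current": 95},
}


def rank_by_current(scores):
    """Evaluation 1 (assumed rule): a higher current score is better."""
    return sorted(scores, key=lambda s: scores[s]["current"], reverse=True)


def rank_by_gain(scores):
    """Evaluation 2 (assumed rule): a larger improvement is better."""
    return sorted(
        scores,
        key=lambda s: scores[s]["current"] - scores[s]["previous"],
        reverse=True,
    )


# Improvement of each student between the two assessments.
gains = {s: v["current"] - v["previous"] for s, v in scores.items()}

print(rank_by_current(scores))  # ['B', 'A']
print(rank_by_gain(scores))     # ['A', 'B']
print(gains)                    # {'A': 40, 'B': 5}
```

The assessment data never change. Only the value rule applied to them does, and that rule is the evaluation.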

The value judgements we bring into assessments are part of evaluations. Assessments alone are scores and grades, and not to be confused with the value of those numbers and letters.

In the context of working adults who get graded after appraisals, a B-performer is better than a C-performer. The appraisal or assessment led up to those grades; the worker, reporting officer, and human resource manager place value in those letters (no matter how meaningless they might actually be).

The assessments of children and adults are themselves problematic. For kids, it might be a broad way of measuring a narrow band of capabilities (academic). For workers, it might be an overly simplistic way of assessing complex behaviours. So the problem might first lie with assessment, not evaluation.

As flawed as different assessments may be, they are simply forms of measurement. We can measure just about anything: Reasoning ability, level of spiciness, extent of love, degree of beauty, etc. But only evaluation places value on those measurements: Einstein genius, hot as hell, head over heels, having a face only a mother could love.

I have noticed people — some of them claiming to be teachers or educators — not understanding the differences between assessment and evaluation. As the terms have not been made more distinct, evaluation has been misunderstood and misused.

Evaluation is not a negative practice and it is not optional. If evaluations seem overly critical (what went wrong, how to do better), they merely reflect the values, beliefs, and bias of the evaluator. We do not just need assessment, we also need evaluation to give a measurement meaning.

My reflection starts with an Apple Pay verification process and ends with lessons on teaching and assessment.

When Apple Pay launched in Singapore in May, I jumped on the bandwagon by verifying one of my credit cards. The process was quick and painless: Scan the card details into the Wallet app and verify the card by SMS.

I tried the process with another eligible card, but did not receive SMS verification. I put that down to early implementation issues.

However, I tried about ten times between the launch in May and this month and was still unsuccessful. The Wallet app provided the alternative verification process of calling the credit card issuing bank’s customer service.

I dread using such customer “service” because the process makes me feel like a rat being tested in a maze.

I had to get through several layers of number pressing before getting the option to speak with a human. Once there, I was informed that they were “experiencing a high call volume”.

I missed having an old phone that I could slam down on the receiver.

This particular bank provided the option of leaving my contact number so that I would receive a call-back in four hours. That must have been some really high call volume!

I received one shortly before the four-hour mark and explained how I did not receive SMS verification for Apple Pay from that bank’s system. I also mentioned that I had done the verification for another bank’s card quickly and seamlessly with the same process.

The customer service representative (CSR) was puzzled, checked the messaging records, and told me that SMS had been sent to my phone. I wanted to reply that I was not an idiot, but I bit my tongue. I repeated that I did not receive any despite several attempts over two months.

The CSR then advised me not to use my bank-issued security dongle. I told him that the dongle was irrelevant because it was not a verification option in Apple’s Wallet app. So he said he needed to look into my case and asked if he could call me back in an hour.

As soon as we disconnected, something connected. A long time ago, I blocked a few of the bank’s SMS numbers because I kept getting marketing messages despite telling them I did not want any. I wondered if the SMS verification shared one of those numbers.

I figured out how to unblock the numbers and tested the SMS verification for that bank card. It worked as quickly as my first card.

This was not the fault of the bank. It was mine for blocking numbers, irritating as their messages were.

I reminded myself of two lessons on teaching:

  1. You should not just stick to a script. It is important to first listen to the learner’s problem before suggesting a learning solution. The CSR’s advice to not use the dongle was obviously part of a recommended script, but it was irrelevant in this context. Mentioning the dongle not only did not help matters, it added to my frustration.
  2. Thinking out loud is one of the best ways to learn. I knew what the symptom of my problem was (no SMS from the bank), but I did not know its root cause (I had blocked some SMS numbers). Speaking to someone helped me pull thoughts to the surface and helped me find my own solutions.

When the CSR called back, I explained how I had solved the problem myself. He was relieved. I was relieved.

Right after we disconnected, he triggered an SMS to me to rate the customer service by text. It was like being pranked.

Bank SMS.

I did not respond to the SMS because the ratings were too coarse: Below, Meet, Exceed.

The phone service took place over more than one call and had multiple components. Averaging the experience was not meaningful. Detailed feedback on what was good or not good about the experience and analysing a recording of the exchanges are more tedious but better options.

I thought of two lessons on assessment:

  1. The administrative need to collect and collate data drives such bad practice. Collecting these data does not make them meaningful, nor does it help CSRs improve. Administrative needs should not drive assessment.
  2. The average rating approach is a hallmark of summative assessment. It is grading an experience. If the CSR received “Exceed”, did he get a pat on the back? If the feedback was “Meet”, would he just keep reading from scripts? If the grade was “Below”, what can he do with that information? Good assessment is based on quality feedback, not just grades.

It does not take special events, teacher observations, prescribed professional development, or even a personal learning network to learn how to teach or assess better. The lessons and reminders are everywhere, even in the Apple Pay card verification process. You just have to pay attention.

The news that caused ripples in Singapore schooling last week was the official announcement from the Ministry of Education (MOE) of the new scoring system that will be implemented in the Primary School Leaving Examination (PSLE) in 2021.

There was a slew of news following the announcement. Some people made tsunamis out of the ripples, some rode the waves as they were [small sample of both].

Beneath the surface was an undercurrent that did not get much attention, but was the most significant change in terms of education. According to STonline, one of the changes was the switch from norm-referenced testing (NRT) to standards or criterion-referenced testing (CRT).

PSLE2021: From NRT to CRT

What are NRT and CRT in layman terms? Why is the switch an important driver of change?

In NRT, the results of a cohort of students are reduced to scores — T-scores in the case of PSLE — and lined up from the highest to the lowest (or vice versa). The result is a bell-shaped curve of scores: There will be a few very low and very high scores, and many somewhere-in-the-middle ones.
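To make the norm-referencing concrete, here is a minimal sketch of a standard T-score calculation, T = 50 + 10 × (x − mean) / SD. The formula is the textbook definition of a T-score. The cohorts below are made up, the function name is mine, and the choice of the population standard deviation is an assumption rather than a description of how PSLE scores are actually computed.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """Convert raw scores to T-scores relative to the cohort.

    Uses T = 50 + 10 * (x - mean) / sd with the population standard
    deviation (an assumption made for this illustration).
    """
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        # Everyone scored the same, so everyone sits at the cohort mean.
        return [50.0 for _ in raw_scores]
    return [50 + 10 * (x - m) / sd for x in raw_scores]


# Two hypothetical cohorts that both contain a raw score of 80.
weaker_cohort = [40, 50, 60, 70, 80]
stronger_cohort = [60, 70, 80, 90, 100]

print(t_scores(weaker_cohort)[4])    # 80 is the top score here: above 50
print(t_scores(stronger_cohort)[2])  # 80 is the middle score here: 50.0
```

The same raw score of 80 earns a different T-score in each cohort. That is the nature of NRT: the number describes where a child stands relative to others, not what the child can do.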

Reviewers of these scores typically use this distribution to create an even curve (a normal distribution, ND), and to rank and sort. In the adult world of work, this method might help determine who gets promotions or bonuses, what appraisal grade you get (if you are in the civil service), or who gets fired.

For example, a large organisation can first rank the performances of all its employees. If an ideal ND does not result, it can statistically massage it into an ideal bell curve. So if there are too many A-graders, some will be pushed into Bs, and as a result Bs become Cs and so forth. Once there is an ideal bell curve, someone can decide cut-offs and consequences, say, the top 5% get promotions and the bottom 15% are let go.
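A forced ranking like the one described above might be sketched as follows. The 5% and 15% cut-offs come from the example in this post. Everything else (the function name, the outcome labels, the employee names, and the ratings) is hypothetical.

```python
def forced_ranking(performance, top_percent=5, bottom_percent=15):
    """Assign outcomes purely by rank position within the group.

    `performance` maps a name to a rating. Integer percentages are used
    so that the cut-off counts are computed without floating-point error.
    """
    ranked = sorted(performance, key=performance.get, reverse=True)
    n = len(ranked)
    n_top = n * top_percent // 100
    n_bottom = n * bottom_percent // 100

    outcome = {}
    for position, name in enumerate(ranked):
        if position < n_top:
            outcome[name] = "promote"
        elif position >= n - n_bottom:
            outcome[name] = "let go"
        else:
            outcome[name] = "retain"
    return outcome


# Twenty hypothetical employees, all rated between 81 and 100.
performance = {f"E{i:02d}": 80 + i for i in range(1, 21)}
outcome = forced_ranking(performance)

print(outcome["E20"])  # promote
print(outcome["E01"])  # let go
```

Every employee in this made-up group has a rating above 80, yet three of them are still let go. The curve needs a bottom 15% regardless of how well that bottom actually performed.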

If this seems unfair to working adults, then what more for the 12-year-old children who take the PSLE but have no idea what is going on?

The core problem is that people are compared one against the other with or without their knowledge. If with, this can result in unhealthy competition because they want to be on the right part of the ND curve. If without, the people become victims of processes not transparent to them and circumstances beyond their control.

Is there a better way? Yes, it is called CRT (standards-based assessment and/or evaluation).

Modern corporations like Accenture are abandoning the outdated practice of norm-referencing [1] [2] and embracing comparisons of one. The fundamental principle is this: How one improves and contributes individually over time is more important than how one is measured against others.

For example, a worker might show evidence of specific skills that indicate that he or she is a novice, intermediate, or advanced worker. There is no comparing of all the workers regardless of their skill group or even comparing within each skill group.

To make this work, there must be standards or criteria that identify each skill group, e.g., skills A to J for novices to master; K to R for intermediates; S to Z for advanced plus five potential managerial markers.
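A rough sketch of such criterion-referencing, using the letter-coded skills from the example above: the groupings A to J, K to R, and S to Z come from this post, while the function name, the sample worker, and the idea of reporting progress as "criteria met out of total" are my own assumptions. The five managerial markers are left out for brevity.

```python
import string

LETTERS = string.ascii_uppercase

# Criteria for each skill group, as in the example above.
CRITERIA = {
    "novice": set(LETTERS[0:10]),         # skills A to J
    "intermediate": set(LETTERS[10:18]),  # skills K to R
    "advanced": set(LETTERS[18:26]),      # skills S to Z
}


def progress(demonstrated):
    """Report (criteria met, total criteria) for each skill group.

    The result depends only on the evidence from this one worker and
    the fixed criteria, never on how other workers performed.
    """
    shown = set(demonstrated)
    return {
        group: (len(skills & shown), len(skills))
        for group, skills in CRITERIA.items()
    }


# A hypothetical worker who has shown evidence of skills A to L.
print(progress("ABCDEFGHIJKL"))
# {'novice': (10, 10), 'intermediate': (2, 8), 'advanced': (0, 8)}
```

Note that no other worker appears anywhere in the calculation. That absence is the whole point of a comparison of one.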

Back to PSLE 2021. The switch is from NRT to CRT. It is more about the standards or specific criteria that indicate the test-based achievements of the child, and less about the comparison of one child with another.

This is a fundamental shift in mindset from sifting and sorting to measuring performance. The former is about what is good for the system and how to feed it; the latter is about where the learner is at and what is good for the learner.

However, this piecemeal change to a CRT system of Achievement Levels (ALs) still falls short. I share my thoughts on this in more reflections on PSLE2021 over the next few days.

Read Part 2: The Dark Side.

In 1982, the late Prince may have partied like it was 1999.


But in 2016, why are we assessing like it is still 1999? Or maybe even 1899?

Video source

In this video, Noam Chomsky explains the problems with assessments: the ways they are misused, misaligned, and misappropriated.

It is no surprise then that a Secret Teacher wrote the following article in The Guardian about how tests seemed to be dumbing down her students.

The teacher bemoans:

My students are bright, engaged and well-behaved, but there is something missing: they cannot think.

The Secret Teacher goes on to blame a focus on exams and I agree with the teacher for the most part. But tests are not the only thing to blame for students who do not know how to think independently.

Teachers who spoon feed, stifle thought, or fail to stay relevant are just as culpable.

For instance, the teacher said:

Last week I caught another of my A-grade students using his phone in the lesson. As a starter exercise, I told them to think of as many advantages as they could of being on the UN security council. “What are you doing?” I asked. “I’m googling the list of advantages,” came his wary reply. I was flabbergasted. I tried to explain that there is no list of advantages, but that I wanted his own views.

I am confident that the Secret Teacher is also a Good Teacher. But she also sounds like a traditional one in that she is averse to searching for Googleable answers. Perhaps she did not know how to take advantage of a now natural behaviour to show her students how to think, act, and write critically after Googling.

Most people would eventually realize that the most important factor in a schooling or educational system is the quality of its teachers. Those who join the profession are self-selecting by choice and pre-selected by institutes of teacher education.

But only the exceptional step up to deal with the problems with assessment or learn how to skilfully promote critical and creative thinking in a conservative system. The rest need professional development and the mindset of lead learners to do this.

This reflection is a response to a slow chat question on #asiaED about the role of assessment in systemic change.

The question was:

My response was:

The layperson’s likely view of assessment is summative tests and exams, typically of the high stakes variety, because that is what they have experienced. As its name implies, summative assessment is perceived and practiced as a terminal or downstream activity.

Informed educators might point out that formative assessment (on-going feedback) is more important for learning. Educated instructional designers will tell you that assessment or evaluation should be developed before content. Wise educational consultants and leaders will tell you that assessment is a key leverage point in systemic change.

Assessment is actually an upstream component. Change that and you affect processes downstream like teaching, learning support, learning environment design, and policy making.

Imagine for a moment that exams were removed and replaced with learner portfolios. Now imagine how teaching, teacher expectations, teaching philosophies, teacher professional development, and teacher evaluation might change.

I would like to answer a question directed at me:

I cannot say for sure how assessment should change and I do not think that data collected from such assessment only serve as leverage.

Consider an example of a change-in-progress and my suggestions on how to implement change and avoid pitfalls in the process.

There are at least two significant assessment-related changes in Singapore now. One is an emphasis on values-based education (instead of focusing on just grades) and the other is an evaluation of the importance of a degree.

Added after initial posting, a timely tweet from a local rag:

These changes were a result of:

  • parental feedback on the unnecessary stress of high stakes testing (particularly of the Primary School Leaving Examination, PSLE)
  • the recognition of grade inflation (particularly at the GCE A Levels)
  • the mismatch between what employers need and what universities produce
  • new and visionary leadership at the Ministry of Education (MOE), Singapore

All these placed pressures on what we understand and value as traditional, summative assessment.

That said, MOE is not going to sacrifice the sacred cows of tests and exams. But it has started emphasizing other processes and measures.

Values-based lessons are being integrated into previously content-only lessons [news article after its announcement in 2011]. Primary school students can get into Secondary schools of their choice based on non-academic talents with the Direct School Admissions (DSA) scheme.

Experts in systemic change might label these efforts as piecemeal change. They do not profoundly disrupt existing processes and are instead implemented periodically and strategically in an attempt to create overall change.

However, critical observers might also note that significant and sustained change tends to happen with disruptive interventions. Examples might include:

  • the impact of antibiotics and anaesthesia on medical practice
  • the effect of the printing press on schooling and the spread of information
  • the influence of smartphones on banking, commerce, education, entertainment and gaming, information consumption, content creation, and socialization.

I predict that e-portfolios will rise in importance as a means of recording and evaluating (not just assessing) both the processes and products of learning.

e-Portfolios are a systemic and disruptive change in that they:

  • start and end with the learner
  • belong to the learner
  • emphasize processes and not just products of learning
  • showcase holistic or other attributes (not just academic ability)
  • promote lifelong, career-wide learning

The battle to create acceptance, buy-in, and hopefully ownership of what we now label as alternative assessment will probably last a decade or more. During this time, it might be tempting to collect evidence of the effectiveness of e-portfolios during a trial or a full-blown implementation to convince stakeholders that the change is making a difference.

However, this is not a wise move. Efforts to do this would repeat the mistakes of the slew of early educational and action research comparing the effects of intervention A (for example, traditional instruction) and intervention B (technology-assisted instruction). There are far too many factors that influence learning outcomes, attitudes, values, etc.

If data on newer forms of assessment need to be collected, analyzed, and presented, I suggest that they be part of a much larger plan. Such a plan could include:

  • having regular conversations with stakeholders
  • creating a shared vision among stakeholders
  • relating success stories to create buy-in
  • developing informed, forward-thinking, and informal leadership
  • providing financial and implementation leeway for unforeseen obstacles

In summary, assessment is an important leverage point and an upstream component for changing educational systems. Data on disruptive changes like the adoption of e-portfolios for assessment and evaluation can be used to convince stakeholders. However, such data should only be part of a larger and sustainable plan.

272/365: Student by Rrrodrigo, on Flickr
Creative Commons Attribution-Noncommercial 2.0 Generic License by Rrrodrigo

Recently I read an article on The Atlantic, The End of Paper-and-Pencil Exams?

The headline asked a speculative question, but did not deliver a clear answer. It hinted at mammoth change, but revealed that dinosaurs still rule.

Here is the short version.

This is what 13,000 4th grade students in the USA had to do in an online test that was part of the National Assessment of Educational Progress. They had to respond to test prompts to:

  • Persuade: Write a letter to your principal, giving reasons and examples why a particular school mascot should be chosen.
  • Explain: Write in a way that will help the reader understand what lunchtime is like during the school day.
  • Convey: While you were asleep, you were somehow transported to a sidewalk underneath the Eiffel Tower. Write what happens when you wake up there.

This pilot online assessment was scored by human beings. The results were that 40% of students struggled to respond to question prompts as they were rated a 2 (marginal) or 1 (little or no skill) on a 6 point scale.

This was one critique of the online test:

One downside to the NCES pilot study: It doesn’t compare student answers with similar questions answered in a traditional written exam setting.

I disagree that this is necessary. Why should the benchmark be the paper test? Why is a comparison even necessary?

While the intention is to compare the questions, what a paper vs computer-based test might do is actually compare media. After all, the questions are essentially the same, or by some measure very similar.

Cornelia Orr, executive director of the National Assessment Governing Board, stated at a webinar on the results that:

When students are interested in what they’re writing about, they’re better able to sustain their level of effort, and they perform better.

So the quality and type of questions are the greater issues. The medium and strategy of choice (going online and using what is afforded there) also influence the design of questions.

Look at it another way: Imagine that the task was to create a YouTube video that could persuade, explain, or convey. It would not make sense to ask students to write about the video. They would have to design and create it.

If the argument is that the technical, literacy, and thinking skills needed to create a YouTube video are not in the curriculum, I would ask why that curriculum has excluded these relevant and important skills.

The news article mentioned some desired outcomes:

The central goal of the Common Core is deeper knowledge, where students are able to draw conclusions and craft analysis, rather than simply memorize rote fact.

An online test should not be a copy of the paper version. It should have unGoogleable questions so that students can still Google, but they must be tested on their ability to “draw conclusions and craft analysis, rather than simply memorize rote fact”.

An online test should be about collaborating in real-time, responding to real-world issues, and creating what is real to the learners now and in their future.

An online test should not be mired in the past. It might save on paper-related costs and perhaps make some grading more efficient. But that focuses on what administrators and teachers want. It fails to provide what learners need.
