Another dot in the blogosphere?

Posts Tagged ‘assessment’

One of the replies to my tweet about the parliamentary response to stolen exam papers — electronic scanning and marking of scripts — was this tweet.

I had to look up the product and service and found a UK-based website and YouTube video.

Apparently SurpassPaper+ allows students who opt for the electronic version of an exam to take it on their own devices, alongside peers who opt for the paper version.

There are several advantages to taking the electronic version. The ones that stood out for me were:

  • Students use a platform they are already accustomed to.
  • The submissions are immediate and do not incur physical handling, storage, security, and transport costs.
  • Proctors can monitor student progress with an app and intervene if necessary.
  • Students can continue on an alternate device should their own fail them.

If all this seems innovative compared to the old-school method of high-stakes exams, then we should cast our eyes on how some standardised tests are regularly taken on Chromebooks in US school districts.

The change is also just an incremental one. Evolutionarily speaking, the new test animal is not that different from the generation before. It has not replaced the old one and actually lives alongside the incumbent species as a minority and novelty.

The bottom line is this: The medium has changed, but the method has not. Changing the medium is comparatively less disruptive and easier than changing the method of assessment.

To change the method is to face the usual suspects of barrier statements. I share just three and pose three questions as responses.

The first barrier statement is: We should not abandon what is good about the old or current method. My questions are: What is objectively good about it? From whose perspective is “good” defined?

The second barrier is an excuse: Now is not the time. My responses are: If not now, then when? How will you know when the right time is? What if the right time is too late? How can we make it the right time?

The third barrier is a generalisation: Change will take time. My response is: Of course it does. But when will you start?

The breaking news that refused to die was about the A-level Chemistry papers that were stolen last year. This time ministers in Parliament discussed how to prevent this from happening again.

The suggestion: Scan the papers and mark them electronically.

For me this was braking news — I had to stop to think about what was actually going on.

Superficially, the issue was about the security of high-stakes examinations. While student results are important, the larger messages were missed, i.e.,

  • The exams are still handwritten on paper.
  • They are still reliant on factual recall.
  • The assessment is inauthentic — there is no referencing, no cooperating, etc.

This pays lip service to the 21st century competencies that we are supposed to develop in learners. If we are to do this, we need to pull assessment into the same century.

Like it or not, assessment is the tail that wags the dog. Summative forms of assessment like end-of-course examinations are terminal activities — they are the tail. However, they dictate what is taught, how it is taught, and shape how students opt to learn — they wag the dog.

The examination in question was the GCE A-Levels. These are taken by girls whose next destination is likely university, and boys who become men via military service (if they are citizens or permanent residents).

However, these students take paper-based exams much the same way they did ten years before when they were in primary school. Heck, I took my A-levels on dead trees and I am older than some trees!

I now mentor, advise, and teach some future faculty who still clutch at paper as the be-all and end-all technology. They teach and test like a book and by the book. The assessment tail does not just wag the dog; it trains the dog and shapes its psyche as it rewards and punishes the dog.

Am I overreacting? After all, the issue was exam paper security and not assessment redesign. But why was the latter not the issue?

Just consider the logistics and costs. The papers had to be transported to the United Kingdom. They had to be stored and provided with some modicum of security. They also had to be transported securely to graders and then brought back centrally for more processing.

Even if every script was scanned and marked electronically, there is still the cost of scanning every page and retraining the graders.

These exercises help the agencies involved in the processes — question-setting, grading, analysing, transport, storage, security, administration, etc. You might think of this as an assessment mill that is dependent on paper mills.

But what of the current student and future employee who has to rely less and less on paper and paper-led habits? Our duty is not to keep the assessment and paper mills alive. It is to help our learners thrive in their future, not our past.

Take writing for example. We still have to write, but how much on paper and how often?

The medium is part of the message and shapes the way we think and craft those messages. For example, I am drafting this reflection in MacOS Notes and have a web browser with these tabs open: WordPress (for the blog entry), ImageCodr (for the CC-licensed images), and several online references.

The writing skills might be the same — for example, logical paragraphing — but the need to write shorter paragraphs is the new expectation. This reflection is already too long for most people. TLDR. So I also break the message up into chunks with photos (aww, cute doggies and baby!).

But back to the main topic of changing assessment. I am not suggesting that we throw the baby out with the bath water. I am pointing out that the bath water is still there, getting filthier by the minute, and threatening to drown the baby.

If this analogy is not clear, the paper-based exams are the problem because we do not question their purpose. They solved the problem in the past of how to sort students, and they still do that. But they also create unnecessary stress and entrench old mindsets, neither of which is good for our students.

It is time to throw the bath water out, not build a better receptacle, replace the water, or somehow have self-cleaning water.

One basic assessment design principle is: Do not provide answers in your question.

If you provide answers, it is your fault. If learners take advantage of this, you cannot penalise them (see example above).

Instead of telling the student to see you, you should see yourself, i.e., reflect on the error of your ways and sign up for professional development on Assessment 101.

The tweet below is wrong.

The girl in the image is not five metres tall. She is taller than that.

Including a human figure in a physics problem does not make it friendlier. Claiming that she is a giant does not make it authentic.

Apparently, the diagram was from a textbook. This means that more than one person was involved in designing, editing, and approving the question. All of them have provided a free lesson on how not to design assessment questions.

Far wiser and more articulate people have shared their thoughts on assessment, grading and feedback. So I reshare what they shared.

From these and the work of others, I distill some wisdom into these image quotes.

Formative feedback

Quantitative grading ends learning. Quality feedback sustains learning.

The word “evaluation” might have been ill-defined and misused.

I was surprised to read someone like Senge reportedly saying this about evaluation.

Evaluation is when you add a value judgment into the assessment. Like, ‘Oh, I only walked two steps. I’ll never learn to walk.’ You see, that’s unnecessary. So, I always say, ‘Look, evaluation is really optional. You don’t need evaluation. But you need assessment.’

Evaluation is about adding a value judgement into assessment. That is why it is called eVALUation. But that does not make evaluation negative or optional.

Student A might get an assessment score of 60/100. Student B might get an assessment score of 95/100. One way to evaluate the students is to compare them and say that student B performed better than A. More is better and that is the value, superficial as doing that may be.

If you consider that Student A previously got a score of 20/100 and B a previous score of 90/100, the evaluation can change. Student A improved by 40 points; student B by 5 points. The evaluation: Student A made much more improvement than Student B.
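
For the programmatically inclined, here is a minimal sketch of that contrast in Python. The scores are the ones from the example above; the structure and labels are mine and purely illustrative.

```python
# Assessment produces measurements; evaluation adds a value judgement.
scores = {
    "Student A": {"previous": 20, "current": 60},
    "Student B": {"previous": 90, "current": 95},
}

for name, s in scores.items():
    improvement = s["current"] - s["previous"]  # still just a measurement
    print(f"{name}: current {s['current']}/100, improvement {improvement:+d} points")

# Two evaluations of the same assessment data:
# 1. Compare raw scores  -> B (95) "performed better" than A (60).
# 2. Compare improvement -> A (+40) "improved more" than B (+5).
```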

The value judgements we bring into assessments are part of evaluations. Assessments alone are scores and grades, and not to be confused with the value of those numbers and letters.

In the context of working adults who get graded after appraisals, a B-performer is better than a C-performer. The appraisal or assessment led up to those grades; the worker, reporting officer, and human resource manager place value in those letters (no matter how meaningless they might actually be).

The assessments of children and adults are themselves problematic. For kids, it might be a broad way of measuring a narrow band of capabilities (academic). For workers, it might be an overly simplistic way of assessing complex behaviours. So the problem might first lie with assessment, not evaluation.

As flawed as different assessments may be, they are simply forms of measurement. We can measure just about anything: Reasoning ability, level of spiciness, extent of love, degree of beauty, etc. But only evaluation places value on those measurements: Einstein genius, hot as hell, head over heels, having a face only a mother could love.

I have noticed people — some of them claiming to be teachers or educators — not understanding the differences between assessment and evaluation. As the terms have not been made more distinct, evaluation has been misunderstood and misused.

Evaluation is not a negative practice and it is not optional. If evaluations seem overly critical (what went wrong, how to do better), they merely reflect the values, beliefs, and biases of the evaluator. We do not just need assessment; we also need evaluation to give a measurement meaning.

My reflection starts with an Apple Pay verification process and ends with lessons on teaching and assessment.

When Apple Pay launched in Singapore in May, I jumped on the bandwagon by verifying one of my credit cards. The process was quick and painless: Scan the card details into the Wallet app and verify the card by SMS.

I tried the process with another eligible card, but did not receive SMS verification. I put that down to early implementation issues.

However, I tried about ten times between the launch in May and this month and was still unsuccessful. The Wallet app provided the alternative verification process of calling the credit card issuing bank’s customer service.

I dread using such customer “service” because the process makes me feel like a rat being tested in a maze.

I had to get through several layers of number pressing before getting the option to speak with a human. Once there, I was informed that they were “experiencing a high call volume”.

I missed having an old phone with a receiver I could slam down.

This particular bank provided the option of leaving my contact number so that I would receive a call-back in four hours. That must have been some really high call volume!

I received one shortly before the four-hour mark and explained how I did not receive SMS verification for Apple Pay from that bank’s system. I also mentioned that I had done the verification for another bank’s card quickly and seamlessly with the same process.

The customer service representative (CSR) was puzzled, checked the messaging records, and told me that the SMS messages had been sent to my phone. I wanted to reply that I was not an idiot, but I bit my tongue. I repeated that I did not receive any despite several attempts over two months.

The CSR then advised me not to use my bank-issued security dongle. I told him that the dongle was irrelevant because it was not a verification option in Apple’s Wallet app. So he said he needed to look into my case and asked if he could call me back in an hour.

As soon as we disconnected, something connected. A long time ago, I blocked a few of the bank’s SMS numbers because I kept getting marketing messages despite telling them I did not want any. I wondered if the SMS verification shared one of those numbers.

I figured out how to unblock the numbers and tested the SMS verification for that bank card. It worked as quickly as my first card.

This was not the fault of the bank. It was mine for blocking numbers, irritating as their messages were.

I reminded myself of two lessons on teaching:

  1. You should not just stick to a script. It is important to first listen to the learner’s problem before suggesting a learning solution. The CSR’s advice to not use the dongle was obviously part of a recommended script, but it was irrelevant in this context. Mentioning the dongle not only did not help matters, it added to my frustration.
  2. Thinking out loud is one of the best ways to learn. I knew what the symptom of my problem was (no SMS from the bank), but I did not know its root cause (I had blocked some SMS numbers). Speaking to someone helped me pull thoughts to the surface and helped me find my own solutions.

When the CSR called back, I explained how I had solved the problem myself. He was relieved. I was relieved.

Right after we disconnected, he triggered an SMS to me to rate the customer service by text. It was like being pranked.

Bank SMS.

I did not respond to the SMS because the ratings were too coarse: Below, Meet, Exceed.

The phone service took place over more than one call and had multiple components. Averaging the experience was not meaningful. Detailed feedback on what was good or not good about the experience and analysing a recording of the exchanges are more tedious but better options.

I thought of two lessons on assessment:

  1. The administrative need to collect and collate data drives such bad practice. Just because you collect these data does not make the data meaningful or help CSRs improve. Administrative needs should not drive assessment.
  2. The average rating approach is a hallmark of summative assessment. It is grading an experience. If the CSR received “Exceed”, did he get a pat on the back? If the feedback was “Meet”, would he just keep reading from scripts? If the grade was “Below”, what can he do with that information? Good assessment is based on quality feedback, not just grades.

It does not take special events, teacher observations, prescribed professional development, or even a personal learning network to learn how to teach or assess better. The lessons and reminders are everywhere, even in the Apple Pay card verification process. You just have to pay attention.

The news that caused ripples in Singapore schooling last week was the official announcement from the Ministry of Education (MOE) of the new scoring system that will be implemented in the Primary School Leaving Examination (PSLE) in 2021.

There was a slew of news following the announcement. Some people made tsunamis out of the ripples, some rode the waves as they were [small sample of both].

Beneath the surface was an undercurrent that did not get much attention, but was the most significant change in terms of education. According to STonline, one of the changes was the switch from norm-referenced testing (NRT) to standards- or criterion-referenced testing (CRT).

PSLE2021: From NRT to CRT

What are NRT and CRT in layman’s terms? Why is the switch an important driver of change?

In NRT, the results of a cohort of students are reduced to scores — T-scores in the case of PSLE — and lined up from the highest to the lowest (or vice versa). The result is a bell-shaped curve of scores: There will be a few very low and very high scores, and many somewhere-in-the-middle ones.
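
An aside on the maths: a T-score is conventionally a raw mark rescaled against the cohort mean and standard deviation so that the cohort centres on 50. The sketch below assumes that convention; the exact PSLE formula is not spelled out here, and the cohort numbers are made up.

```python
from statistics import mean, stdev

def t_score(mark: float, cohort: list[float]) -> float:
    """Rescale a raw mark to a distribution with mean 50 and SD 10."""
    return 50 + 10 * (mark - mean(cohort)) / stdev(cohort)

cohort = [45.0, 52.0, 58.0, 63.0, 70.0, 74.0, 81.0, 88.0]
print(round(t_score(70.0, cohort), 1))  # above the cohort mean, so above 50
```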

Reviewers of these scores typically use this distribution to fit an even curve (a normal distribution, ND), and to rank and sort. In the adult world of work, this method might help determine who gets promotions or bonuses, what appraisal grade you get (if you are in the civil service), or who gets fired.

For example, a large organisation can first rank the performances of all its employees. If an ideal ND does not result, it can statistically massage it into an ideal bell curve. So if there are too many A-graders, some will be pushed into Bs, and as a result Bs become Cs and so forth. Once there is an ideal bell curve, someone can decide cut-offs and consequences, say, the top 5% get promotions and the bottom 15% are let go.
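
Here is a minimal rank-and-cut sketch of that process. The 5% and 15% cut-offs come from the example above; the names, thresholds, and logic are illustrative assumptions, not any organisation’s actual policy.

```python
def forced_rank(performances: dict[str, float],
                promote_pct: float = 0.05,
                release_pct: float = 0.15) -> dict[str, str]:
    """Rank everyone against everyone, then apply percentile cut-offs."""
    ranked = sorted(performances, key=performances.get, reverse=True)
    n = len(ranked)
    n_promote = max(1, int(n * promote_pct))   # top 5% get promotions
    n_release = max(1, int(n * release_pct))   # bottom 15% are let go
    outcomes = {}
    for i, name in enumerate(ranked):
        if i < n_promote:
            outcomes[name] = "promote"
        elif i >= n - n_release:
            outcomes[name] = "let go"
        else:
            outcomes[name] = "retain"
    return outcomes

staff = {"Ann": 92.0, "Ben": 85.0, "Cai": 78.0, "Dee": 71.0, "Ed": 60.0}
print(forced_rank(staff))  # Ann is promoted, Ed is let go, the rest retained
```

Notice that the outcomes depend entirely on where each person lands relative to everyone else, not on whether anyone actually met a standard.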

If this seems unfair to working adults, then what more for the 12-year-old children who take the PSLE but have no idea what is going on?

The core problem is that people are compared one against the other with or without their knowledge. If with, this can result in unhealthy competition because they want to be on the right part of the ND curve. If without, the people become victims of processes not transparent to them and circumstances beyond their control.

Is there a better way? Yes, it is called CRT (standards-based assessment and/or evaluation).

Modern corporations like Accenture are abandoning the outdated practice of norm-referencing [1] [2] and embracing comparisons of one. The fundamental principle is this: How one improves and contributes individually over time is more important than how one is measured against others.

For example, a worker might show evidence of specific skills that indicate that he or she is a novice, intermediate, or advanced worker. There is no comparing of all the workers regardless of their skill group or even comparing within each skill group.

To make this work, there must be standards or criteria that identify each skill group, e.g., skills A to J for novices to master; K to R for intermediates; S to Z for advanced plus five potential managerial markers.
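
To make that concrete, here is a minimal sketch of criterion-referenced grouping in Python. The A-to-J, K-to-R, and S-to-Z bands follow the example above; the data structures and function are illustrative assumptions, and the five managerial markers are omitted for brevity.

```python
import string

# Criteria per band, per the example above (managerial markers omitted).
BANDS = [
    ("novice", set(string.ascii_uppercase[0:10])),         # skills A-J
    ("intermediate", set(string.ascii_uppercase[10:18])),  # skills K-R
    ("advanced", set(string.ascii_uppercase[18:26])),      # skills S-Z
]

def band_for(worker_skills: set[str]) -> str:
    """Return the highest band whose criteria are fully met.
    The worker is judged against fixed criteria, never against peers."""
    achieved = "unclassified"
    for band, criteria in BANDS:
        if criteria <= worker_skills:  # subset test: all criteria demonstrated
            achieved = band
        else:
            break
    return achieved

print(band_for(set("ABCDEFGHIJKLMNOPQR")))  # -> "intermediate"
```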

Back to PSLE 2021. The switch is from NRT to CRT. It is more about the standards or specific criteria that indicate the test-based achievements of the child, and less about the comparison of one child with another.

This is a fundamental shift in mindset from sifting and sorting to measuring performance. The former is about what is good for the system and how to feed it; the latter is about where the learner is at and what is good for the learner.

However, this piecemeal change to the CRT system of Achievement Levels (ALs) still falls short. I will share thoughts on these in more reflections on PSLE2021 over the next few days.

Read Part 2: The Dark Side.

