Another dot in the blogosphere?


When I reflected on how we might work towards a better essay, I also discovered the work of Mike Sharples.

His informative and enlightened tweet thread on AI and essay writing is an excellent read. About halfway through the thread, he mentioned how teachers could use the AI that helped students “write” to evaluate those same essays. So he suggested this (part 7):

He also suggested three approaches to embrace AI-enabled essay writing which I paraphrase below:

  1. Hook and critique: Students use AI to start essays, but students continue the essays by improving the content and writing.
  2. Take turns with AI: Students take turns with AI to write an essay, e.g., student and AI write alternate paragraphs. 
  3. Rise above: Facilitate discussions with students about the ethics and limits of relying on AI.

One example of the third point was what Sharples pointed out in an earlier tweet (part 3):

The AI generated the wrong reference by using plausible information in its database. It still took human effort and judgement to see it through.

Then Sharples left a challenge for all teachers and educators (part 11): 

We cannot simply throw our collective hands in the air and give up. We need to take the AI bull by the horns and learn to ride it or corral it. His three approaches above provide some riding tips. His last tweet challenges us to design and implement better assessment.

For me, his challenge is like telling teachers not to set Googleable questions. While students need to show that they can remember and understand, they can simply search for those answers. What matters more is their ability to analyse, evaluate, cooperate, and create. The challenge now is the same as it will be in the future: What assignments and assessments are we designing that really help our students to learn?

If we keep trying to answer that question, we are not trying to out-Google search engines or outwit AI. Instead, we battle our own bias and ignorance, and keep ourselves on the path of timeless learning.

Last week, I had planned on reflecting on Martin Weller’s excellent piece, The Tricky Questions For Assessment To Answer. But I am glad that I held off because of this tweet:

I had previously reflected on how we should not return to normal in schooling and education because normal is backward, cruel, and wasteful. We need to do better.

One way to do this is to apply educational research and critical practice that focus on seamless and authentic learning. These challenge the norms of schooling and education because they often break out of curricula, fixed schedules, or standard assessments.

The nature of assessments is a good example and brings me back to Weller’s thoughtful piece about essays as a form of assessment. 

He started with the premise that online exams are the new expectation, citing a published poll:

  • 89% of the sample looked forward to in-person classes
  • But only 31.4% wanted traditional exams

Weller briefly described the different formats of online exams: 

…standard exam essays with anything from 24 hours to 3 weeks to complete; timed ‘real time’ exams over three hours or so; proctored online exams; multiple choice and other automatic assessment.

Then he summarised research from his own institution, the Open University, about the benefits of online exams:

…more students complete the exams, they are closer to their continual assessment scores… EDI* participation is improved…

*I think he was referring to Equality, Diversity and Inclusion.

However, the adoption of online examinations has exacerbated the arms race with sites that offer essays. In the not too distant future, we might have to contend with AI-written papers as @sharpl pointed out.

If we stay where we are (or worse, run backward), we label these developments cheating or unethical. We somehow do not want students to collaborate or innovate on assessment and yet demand this of them when they enter the workforce.

The issue is the essay as the measure of academic ability. It is what drives online paper mills, plagiarism detection tools, and even online proctoring systems.

So Weller listed alternatives like e-portfolios, group projects, solving ill-structured problems, promoting self-evaluation (e.g., process analysis and critical reflection). 

Weller stated that the alternatives, just like traditional exams or essays, are not immune to cheating. It is hard to stop a person who really wants to cheat, which is probably why systems seem to focus on deterrent measures, e.g., anti-plagiarism tools.

But we might not be doing enough to shift the mindsets of students and their teachers. If potential cheating is the problem, then we should face it head on with questions, dialogue, and debate. Weller called this engagement; I consider it good education. After all, what use do we have for engineers or artists if they are not ethical people first?

I like to refer to assessment as the tail that wags the dog. If you do not change it, nothing else really changes about teaching. So if we are to develop and adopt better pedagogy, we need better assessment.


In another excellent reflection about what the pandemic might teach educators, Martin Weller argued how “the shift to online can open up other avenues for assessment”.

He listed and linked eight examples of “Internet native” forms of assessment near the end of his piece. They ranged from what might be familiar forms like assessing contributions in discussion forums and online peer assessment to perhaps less common examples of editing Wikipedia entries and creating open resources. 

While his list is not exhaustive (how can it be?), it links to published research on the eight examples. It is a good start for anyone wishing to pivot to better assessment, be it online or not.

An over-simplified answer to the complex issue of “why scrapping mid-year exams is giving some parents more anxiety than relief” is that many parents still only understand and place value in summative examinations.

So the article, penned by a professor (and ex-colleague) from the National Institute of Education, Singapore, tried to inform readers about:

  • the differences between summative and formative assessment
  • assessment of learning 
  • assessment for learning
  • research on feedback strategies for learning

In short, the tweeted article was an attempt to bridge the gap between where parents are now and where they need to be in supporting their children as high stakes examinations become fewer.

All that said, I wonder if an editor was too hasty with a computer mouse and keyboard. 

For example, I was surprised to read that summative assessments “provide valuable input on how much more is needed to do better”. That is not what summative assessments typically do, especially if they are high stakes like the PSLE. There is no remediation after students take that exam because it sorts them into secondary schools.

There might be remediation after mid-year examinations (and other tests), but as the marks add up to an overall grade, such exams are typically summative. However, since they are a half-year check point, they can be formative as well.

Formative feedback does not just revolve around assessments. Parents need to know that feedback on any student work — homework, group projects, performed skills — can be formative. Such feedback focuses on learning and mastery, without which summative assessment is pointless.

Perhaps some parents do not know how to uncouple assessments and grades. It is important that they understand this because doing work (or improving it) just for a grade (or a better grade) sends the wrong message. 

But wait, there is more! Progressive teachers have already tinkered with alternative strategies (see above) and assessments. A few, particularly in higher education, are part of the ungrading movement. Most Singaporean parents are probably not ready to even contemplate these.

So, in the spirit of summative assessment, I would give them a failing grade for refusing to catch up if they know better. If they seek formative feedback, they can contact any assessment-literate educator.

Anyone aiming to be assessment literate needs to unpack the principle represented above. 

Here is my deconstruction. Not everything that can be assessed is worthwhile. Not everything that is worthwhile can be assessed. The overlap of what is assessable and worthwhile is small — this should be our focus.

If I were still a full-time professor, I would probably be a member of the ungrading movement. This is the pushback against number and letter grades because these often obstruct learning.

As the visual in the tweet above illustrates, grading and current forms of assessment ignore what lies beneath. They are not designed for the long tail of learning or the less tangible aspects of learning.

This is why I work with organisations that have a more progressive stance on what counts as success. For example, with one institute the focus is on formative feedback and the course is pass/fail. In another group I work with, my modules have no required assessment — I can focus on what my students need in the short and long term. Both allow me to facilitate learning by diving into what is hidden from hurried and mechanical assessment procedures.

The age of COVID-19 has pushed us to rely on technologies for remote teaching and learning. But how far have we pushed ourselves pedagogically? How have we actually changed the way we assess learning?

This Times Higher Education (THE) article started with the premise that the assessment of learning in higher education is often an afterthought that still takes the form of pen and paper examinations.

Traditional mainstays of assessment have failed in the age of COVID-19. This was evidenced by remote proctoring debacles and the abandoning of IB and GCE/GCSE exams.

According to the article, such dated assessment design is down to bureaucracy, i.e., administrative needs prioritised over student and learning needs. Students and faculty have little power (if any) to question the status quo.

A professor, Dr Jesse Stommel, who was interviewed for the article declared:

He and other interviewees were effectively suggesting what I like to call the pedagogy of trust (PoT). PoT is built on a foundation that students have varied life experiences, diverse needs, and a broad spectrum of goals.

Part of the PoT in assessment design might include more authentic assessments that are based on real-world issues, perhaps shaped by students themselves, and require meaningful opportunities for cooperation.

The article did not suggest how we might implement PoT in detail. To do so, faculty need to answer this question: Is trust mostly earned or created?

If educators think that students need to show that they are trustworthy first, nothing will change. There will always be some students who will cheat and take shortcuts. Ironically, they might do so because of university rules and procedures that assume that they are not trustworthy in the first place.

For example, students typically need to take an anti-plagiarism/cheating module and quiz that are both online because the university prefers an efficient and hands-off mode. Students soon discover that they can use more than one device and/or cooperate with one another to clear this administrative hurdle.

PoT starts with the educator: Opportunities for trust need to be created. This could mean taking the time and effort to be assessment literate, explaining the design and purpose of assessments to students, and counselling students who make mistakes.

This is my reflection about how a boy gamed an assessment system that was driven by artificial intelligence (AI). It is not about how AI drives games.

If you read the entirety of this Verge article, you will learn that a boy was disappointed with the automatic and near instant grading that an assessment tool provided. The reason why he got quick but poor grades was because his text-based answers were assessed with a vendor’s AI.

The boy soon got over his disappointment when he found out that he could add keywords to the end of his answers. These keywords were seemingly disjointed or disconnected words that represented the key ideas of a paragraph or article. When he included these keywords, he found he could get full marks.

My conclusion: Maybe the boy learnt some content, but he definitely learnt how to game the system.

A traditionalist (or a magazine writer in this case) might say that the boy cheated. A progressive might point out that this is how every student responds to any testing regime, i.e., they figure out the rules and how best to take advantage of them. This is why test-taking tends to reliably measure just one thing — the ability to take the test.

If the boy had really wanted to apply what he learnt, he would have persisted with answering questions the normal way. But if he had, he would have been penalised for doing the right thing. I give him props for gaming a system that was rigged from the start.

This is not an attack on AI. It is a critique of human decision-making. What was poor about the decisions? For one thing, the vendor seemed to assume that the use of keywords indicated understanding or application. If a student did not use the exact keywords, the system would not detect and reward them.

It sounds like the AI was a relatively low-level keyword-matching system, not a more nuanced semantic one. If it were the latter, it would be more like a teacher who could give students credit where it was due even when they expressed the same meaning in different words.
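The article did not reveal how the vendor’s grader actually worked, but the behaviour described — full marks for disjointed keywords tacked onto any answer — is exactly what naive keyword matching produces. Here is a minimal hypothetical sketch (the function and keyword names are my own inventions, not the vendor’s code) that shows why such a system is so easy to game:

```python
def grade_answer(answer: str, keywords: set[str]) -> float:
    """Score an answer by the fraction of expected keywords it contains.

    A hypothetical sketch of naive keyword matching, not the vendor's
    actual algorithm. It rewards the mere presence of words, so
    appending the keyword list to any answer earns full marks.
    """
    if not keywords:
        return 0.0
    # Normalise the answer into a bag of lowercase words.
    words = {w.strip(".,;:!?").lower() for w in answer.split()}
    matched = sum(1 for k in keywords if k.lower() in words)
    return matched / len(keywords)

# A thoughtful answer in the student's own words scores zero...
low = grade_answer("People moved away because the farms failed",
                   {"migration", "drought", "economy"})

# ...while disjointed keywords appended to anything score full marks.
high = grade_answer("stuff happened migration drought economy",
                    {"migration", "drought", "economy"})
```

A semantic grader would instead compare the meaning of the answer against a model response, but that requires far more engineering than string matching — which may be why a quick-fix vendor would not bother.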

The article did not dive into the vendor’s reasons for using that AI. I do not think the company would want to share that in any case. For me, this exhibited all the signs of a quick fix for quick returns. This is not what education stands for, so that vendor gets an F for implementation.

Depending on how you design and implement it, assessment is when students learn the least or the most.

Students might learn little to nothing if there is no assessment or if the assessment is not constructively aligned to learning outcomes, content, teaching strategies, and learning experiences.

On the other hand, learning is tested and measured if well-designed assessment challenges students to apply, analyse, evaluate, create, and/or cooperate.

Whether formative or summative, assessment puts the responsibility of showing evidence of learning on the student.

Prior to this, the teacher might have delivered information or provided other learning experiences. But we still do not know if the student has learnt anything.

Prior to assessment, students might have the opportunity to negotiate meaning with their peers, but we still do not know if the students have learnt anything.

The evidence of learning is often in the assessment. And yet this is one of the most poorly conceived and dreaded aspects of the teaching and learning process. One need only review poorly-written multiple-choice questions, critique vague rubrics, or rail against grading on a curve to see this.

Parts of assessment are also poorly understood by students, parents, and other stakeholders. They see the obvious classroom performances of teaching, but they are not privy to the heart-wrenching and mind-numbing process of, say, grading essays.

In this respect, assessment is like the engine under the hood of a sexy sports car. Everyone sees and appreciates the outsides. Very few know how to design and maintain the insides so that the car actually works like one.

Just like a car, when the engine breaks down, practically everything else stops working. You have a car that isn’t. Likewise, you have teaching that is empty of evidence of learning.

The title of this blog entry was the name of a sit-com in the mid-1990s. It is also how I would title this assessment mistake.

I share the sentiment of the tweet. Who is Susan? Why did she suddenly appear? What does she have to do with the crayon drama?

This was probably a mistake on the teacher’s part, but a needless one. It could have been avoided with some thorough proofreading.

