Another dot in the blogosphere?

Posts Tagged ‘assessment’

Image: “272/365: Student” by Rrrodrigo, on Flickr (Creative Commons Attribution-Noncommercial 2.0 Generic License)

Recently I read an article in The Atlantic, “The End of Paper-and-Pencil Exams?”

The headline asked a speculative question, but did not deliver a clear answer. It hinted at mammoth change, but revealed that dinosaurs still rule.

Here is the short version.

Here is what 13,000 fourth-grade students in the USA had to do in an online test that was part of the National Assessment of Educational Progress (NAEP). They had to respond to prompts that asked them to:

  • Persuade: Write a letter to your principal, giving reasons and examples why a particular school mascot should be chosen.
  • Explain: Write in a way that will help the reader understand what lunchtime is like during the school day.
  • Convey: While you were asleep, you were somehow transported to a sidewalk underneath the Eiffel Tower. Write what happens when you wake up there.

This pilot online assessment was scored by human beings. About 40% of students struggled to respond to the prompts, scoring a 2 (marginal) or a 1 (little or no skill) on a 6-point scale.

This was one critique of the online test:

One downside to the NCES pilot study: It doesn’t compare student answers with similar questions answered in a traditional written exam setting.

I disagree that this is necessary. Why should the benchmark be the paper test? Why is a comparison even necessary?

While the intention might be to compare responses to the questions, a paper versus computer-based test would actually compare media. After all, the questions are essentially the same, or at least very similar.

Cornelia Orr, executive director of the National Assessment Governing Board, stated at a webinar on the results that:

When students are interested in what they’re writing about, they’re better able to sustain their level of effort, and they perform better.

So the quality and type of questions are the greater issues. The medium and strategy of choice (going online and using what is afforded there) also influence the design of questions.

Look at it another way: Imagine that the task was to create a YouTube video that could persuade, explain, or convey. It would not make sense to ask students to write about the video. They would have to design and create it.

If the argument is that the technical, literacy, and thinking skills needed to create such a video are not in the curriculum, I would ask why the curriculum has excluded these relevant and important skills.

The news article mentioned some desired outcomes:

The central goal of the Common Core is deeper knowledge, where students are able to draw conclusions and craft analysis, rather than simply memorize rote fact.

An online test should not be a copy of the paper version. It should have unGoogleable questions: students can still Google, but they must be tested on their ability to “draw conclusions and craft analysis, rather than simply memorize rote fact”.

An online test should be about collaborating in real-time, responding to real-world issues, and creating what is real to the learners now and in their future.

An online test should not be mired in the past. It might save on paper-related costs and perhaps make some grading more efficient. But that focuses on what administrators and teachers want. It fails to provide what learners need.

If this tweet were a statement in a sermon, I would say amen to that.

Teachers, examiners, and administrators disallow and fear technology because doing what has always been done is just more comfortable and easier.

Students are forced to travel back in time and shun today’s technologies in order to take tests that measure a small aspect of their worth. They bear this burden because their parents and teachers tell them they must get good grades. To some extent that is true as they attempt to move from one level or institution to another.

But employers and even universities are not just looking for grades. When students interact with their peers and the world around them, they learn that character, reputation, and other fuzzy traits not measured in exams are just as important, if not more so.

Tests are losing relevance in more ways than one. They are not in sync with the times and they do not measure what we really need.

In an assessment and evaluation Ice Age, there is cold comfort in the slowness of change. There is also money to be made from everything that leads up to testing, the testing itself, and the certification that follows.

Like a glacier, assessment systems change so slowly that most of us cannot perceive any movement. But move they do. Some glaciers might even be melting in the heat of performance evaluations, e-portfolios, and exams where students are allowed to Google.

We can either wait the Ice Age out or warm up to the process of change.

By reading what thought leaders share every day and by blogging, I bring my magnifying glass to examine issues and create hotspots. By facilitating courses in teacher education I hope to bring fuel, heat, and oxygen to light little fires where I can.

What are you going to do in 2014?

I finally read a tab I had open for about a week: A teacher’s troubling account of giving a 106-question standardized test to 11 year olds.

This Washington Post blog entry provided a blow-by-blow account of some terrible test questions and an editorial on the effects of such testing. Here are the questions the article raised:

  • What is the purpose of these tests?
  • Are they culturally biased?
  • Are they useful for teaching and learning?
  • How have the frequency and quantity of testing increased?
  • Does testing reduce learning opportunities?
  • How can testing harm students?
  • How can testing harm teachers?
  • Do we have to?

The article was a thought-provoking piece that asked several good questions. Whether or not you agree with its answers is beside the point. The point is to question questionable testing practices.

I thought this might be a perfect case study of what a poorly designed test looks like and what its short-term impact on learning, learners, and educators might be.

The long-term impact of bad testing (and even of testing itself) is clear in a society like Singapore. We have teach-to-the-test teachers, test-smart students, and grade-oriented parents. We have tuition not for those who need it but for those who are chasing perfect grades. And meaningful learning takes a back seat or is pushed out of the speeding car of academic achievement.

We live in testing times indeed!

I read with great interest the initial reactions to the post-PSLE assessment book/paper burning by a group of parents and kids as well as the forum letters and editorials that followed.

Some people were vehemently against the act and pointed to the dark history of book burning. Others said that the burning was simply cathartic.

I think that many of the responses were emotions disguised as logic. It is perfectly acceptable to be passionate, but passion should not mean losing your head.

To liken the revision paper burning to the way Nazis burnt books is ridiculous. The former was at worst an ill-judged cathartic release. The latter was based on terrible ideology.

No doubt both types of burning look the same, but they have different origins and purposes. We regularly incinerate old books, newspapers, and other paper-based material (along with other rubbish) instead of reusing or recycling. Where are the voices and arms raised then?

To paint both with the same black-or-white brush is like saying all killing is bad. We kill for food, greed, defence, revenge, etc. Depending on the context, some killings are easier to justify than others.

I agree that the parents who organized or facilitated the burning might have inadvertently sent the wrong message that burning is the thing to do. There are certainly other ways, perhaps less emphatic or dramatic, to express relief than burning.

But to judge without first understanding and attempting to educate all parties is just as harmful. It is a way to burn the bridge that links both sides.

Like it or not, we live in a world with more shades of grey than ever before. That is why it is less important to transmit values than to teach the next generation how to think critically and recreate the values that matter.

One of the announcements at this year’s National Day Rally was a wider spectrum of entry criteria for the Direct School Admission programme.

Some might say the DSA makes a mockery of standardized exams because it allows Primary school students to get into the Secondary school of their choice. While Primary School Leaving Examination (PSLE) results are still used as criteria once they are released, the student with entry via DSA already has a foothold that non-DSA students do not.

A few might wonder if the PSLE is even necessary if such an alternative form of evaluation exists. Others might argue that the DSA criteria are not enough.

That brings us back to increasing the selection criteria for DSA. What traits might students be evaluated on? Leadership? Character?

When those traits were bandied about in popular media, people asked if things like character and leadership could be measured among 12-year-olds.

You can measure just about anything, even fuzzy, hard-to-quantify things like happiness (witness the happiness index). But let us not kid ourselves into thinking that these measures are absolute, objective, or universal.

A trait like creativity arises from many things, and no instrument, however elaborate, can measure all aspects of it. Most fuzzy concepts, like beauty, remain subjective no matter how much you quantify them. Ask anyone to define creativity or beauty and you will get different answers; there is no single understanding.

Whenever you measure anything, there are margins of error that originate from the measurer and the measuring instrument. Sometimes the object or subject measured introduces error. Consider what happens if person A measures a fidgety person B’s height with a tape measure.
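To make that point concrete, here is a minimal Python sketch of how independent error sources stack in repeated measurements. All the numbers (a “true” height of 170 cm, the sizes of each error) are made up for illustration; they are not from the article.

```python
import random

random.seed(42)

# Hypothetical scenario: person B's "true" height is 170 cm, but every
# reading picks up error from the instrument, the measurer, and the
# fidgeting subject. Each error is modelled as small Gaussian noise.
TRUE_HEIGHT = 170.0

def one_reading():
    instrument_error = random.gauss(0, 0.2)  # tape-measure precision
    measurer_error = random.gauss(0, 0.5)    # reading angle, rounding
    subject_error = random.gauss(0, 1.0)     # fidgeting changes posture
    return TRUE_HEIGHT + instrument_error + measurer_error + subject_error

readings = [one_reading() for _ in range(1000)]
mean = sum(readings) / len(readings)
spread = max(readings) - min(readings)

print(f"mean of 1000 readings: {mean:.1f} cm")  # averages out close to 170
print(f"spread of readings:    {spread:.1f} cm")  # single readings scatter widely
```

The sketch shows why a single measurement is untrustworthy even when the average of many is not: the individual readings scatter by several centimetres, and the fuzzier the trait, the larger those error terms become.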

Let us say that you could measure leadership or character precisely. Just because you can does not mean you should. How different is a person when he is 6, 12, 18, 24 or 36? What if a value judgement at 12 puts a child on a trajectory that s/he is not suitable for?

We learnt that the hard way when we started streaming kids when they were 10 (Normal, Extended or Monolingual). Thankfully that process has been removed from our schooling system. Actually, I take that back. We still test for “giftedness” at 10. Some schools start pre-selecting at 9.

That said, we would be foolish to think that we do not already gauge people on fuzzy traits like character. It happens in the hiring and firing of employees. Some might argue that we are just bringing that process up the line of development.

There are many ways to measure fuzzy traits. At a recent #edsg conversation, I tweeted:

Whether or not such alternative measures of evaluation are implemented, we will read rhetorical statements in forum letters, blog entries, and Facebook posts like “parents must change their mindsets.”

Of course they must. But they are not going to do so automatically.

Folks who highlight mindset sometimes fail to realize that you have to start somewhere with behaviour modification. In systemic change, you start with one or more leverage points. In our case, it is the way people are evaluated.

When I read this Wired article, The Nielsen Family is Dead, I saw parallels in the de-emphasis and modifications to Nielsen ratings for TV and assessment in schools.

Here is something from the article to chew on:

Since the 1970s, television has been ruled by the Nielsen Family—25,000 households whose TV habits collectively provide a statistical snapshot of a nation’s viewing behavior. Over the years, the Nielsen rating has been tweaked, but it still serves one fundamental purpose: To gauge how many people are watching a given show on a conventional television set. But that’s not how we watch any more. Hulu, Netflix, Apple TV, Amazon Prime, Roku, iTunes, smartphone, tablet—none of these platforms or devices are reflected in the Nielsen rating.

To borrow the structure of the article, the dominant and traditional assessment system has been tweaked, but it still serves one fundamental purpose: to gauge how many students are paying attention in class and are able to regurgitate information or perform drills.

This is not the only way we learn any more. Google, Khan Academy, Coursera, iTunes U, YouTube Edu, TED-Ed, smartphones, slate PCs — almost none of the skills and values that result from leveraging on these resources appear on the radar of traditional assessment.

At the risk of sounding like a TV programme rerun, I have to say teachers and textbooks are no longer the fountains of knowledge that students must make their way to and drink from in a particular way and at a particular time.

Later in the article I read this:

Nielsen and others have been scrambling to generate a new kind of TV rating, one that takes into account all of the activity that occurs on screens other than a television.

One such “new kind of TV rating” is social media-based metrics.

Are schools scrambling to generate new ways to measure learning? Ways that take into account the learning that occurs in situations outside the classroom for example?

We do not learn just because someone talks to us or attempts to teach us in a formal context. Sometimes we learn despite our schooling, not because of it. Why are we not trying to measure learning as and where it happens?

Measuring these elements is difficult. Thinking of alternatives is difficult. Creating buy-in for alternatives is difficult. But often the things most worth doing are difficult.

Image: “Food For Thought, Covent Garden, London” by Kake Pugh, on Flickr (Creative Commons Attribution-Noncommercial-Share Alike 2.0 Generic License)

Last week’s finds provided some really good food for thought.

Humanizing Our Organizations Through Social Media was a great reminder on why institutes of higher learning should leverage on Facebook and Twitter.

Higher Ed and the Monastic Space provided a perspective on how to better use the face-to-face time in classes.

Measuring what you contribute rather than what you collect, my favourite of the lot, provided some concrete examples of so-called alternative assessments. I think they should be mainstream assessments as they are relevant now.

So as not to get mentally fat, I plan on acting on what I consumed!
