Another dot in the blogosphere?

Posts Tagged ‘data’

What 33-year-old lesson is worth revisiting? Data, no matter how precise and revealing, can be massaged and manipulated to say otherwise.

How did I learn this lesson in 1989? I was a young and inexperienced platoon commander (PC) then. Like other PCs in my training school, I was assigned a platoon of about 40 recruits and assisted by four corporals and one sergeant.

Only one of my corporals had the experience of two or three prior batches of recruits. The sergeant had just been promoted and was the most experienced of us all with over a year’s head start.

Despite our collective naiveté, we ended up topping the other three platoons in my company in terms of performance in almost every measure. We were in line for the Best Platoon award and this meant bragging rights on a plaque and an extra day off for my men.

But when the officers met to discuss the results and award, I was shocked to learn that my Officer Commanding (OC) and Second-in-Command (2IC) decided that it was not good form to reward such a junior team. It was a “privilege” for more senior teams.

They massaged and poked at the data to highlight the achievements of the second-best platoon (it was narrowly behind mine) so that they emerged first by selective measures. 


To add insult to injury, the reverse happened to me when I was a more seasoned officer and about to leave the army. A junior officer and his team beat my platoon to first place, but this time the OC and a different 2IC decided to stick strictly to the data.

I learnt such a hard lesson that I remember it to this day. 

When I was an undergraduate, a graduate student, and then a university professor, I did not need lessons on how people massage data to suit their conclusions. I refused to work with such people.

I left full-time university employment almost eight years ago. But half-baked news articles and opinion pieces that arrive at a conclusion first and find data to feed it remind me that massaging and manipulation are still common practices.

I stay true to the scientific method and the broader ethic of the social sciences. I let ideas, clues, and possible conclusions emerge from data. I tell others how to do the same when these teachable moments arrive.


Have you ever wondered what it feels like to be a prophet warning others of impending doom? Climate activists and scientists do. They know what it is like to be ignored.

The edtech world has observers and prophets who gaze forward and see issues too. This world does not have as urgent a set of problems, but they are no less important.

One issue is the commoditisation and commercialisation of education. We should be worried when open courses meant to level access get locked behind paywalls. We should be even more worried when a mega company that is irresponsible with user data wants in on the game.

It has been said that if an online service is free, you are not its user or customer. You are its data and product. 

Like managing climate change, what we do individually matters. We can limit what Facebook does with our data and with us as data points. We might not be in the position to create or enforce regulations, but we can take personal control. If we do not take action, we only have ourselves to blame. 

Both the individuals who tweeted that social media is not inherently harmful expressed their righteous indignation. 

But opinion is not fact. Facts are backed up with rigorous research and critical analysis of data on “screen time” and “addiction”. I curate a running list on those topics at Diigo.

For example, the latest two articles are summaries offered by The Conversation.

If we want to create conditions for change, we need not just righteous indignation, we also need research-based indignation. Most people will shout down the former because there is no firm ground on any side. Some people are going to ignore the latter, but at least we have a firm foundation to stand tall on.

The details in the article, Application installed on students’ devices does not track personal information, reminded me about some unanswered questions on student data management and learner self-regulation.

First, some background on the device management application (DMA) that will be installed on all student-owned devices. It reportedly does not keep track of “location, identification numbers or passwords”. It should not.

But the DMA will “capture data on students’ online activities such as web search history… and device information such as the operating system”. If forensics can use those to identify a person, is that not “personal information”?

Consider how your typing rhythm can already be used to identify you or how the different sounds of a keyboard can be used to figure out what you are typing. Raw data generated by a person can be identifiable and personal data.

According to the tweeted news article, a petition against the DMA from around 6,000 individuals did not dissuade the powers-that-be. The authorities argued that data collection and remote monitoring are necessary to protect children from undesirable sites and behaviours. Cue scary sounds and imagery of pornography, gambling, predators, and screen time.

For argument’s sake, let’s assume that the data is absolutely secure from hackers. It is, however, available to “appointed DMA vendors”. What might the vendors do with such data? They could use it to develop more applications that profit them (example: plagiarism detectors use student papers for free but charge a fee for their service).

If the vendors slide on integrity or if data is hacked, the online preferences and habits of our students become a trove for ad targeting, market development, data bundling and reselling, etc. We need only examine our own experiences with entities like Facebook (we are not the customer, we are the product) to see how this might happen.

Declaring that student data will be securely stored, stringently controlled, and lawfully protected does not guarantee that the policies on all three will not loosen over time. Consider a recent lesson on how TraceTogether data was supposed to only be for COVID-19 contact tracing, but now can also be used to investigate seven forms of serious crimes.

The declaration also does not indicate an expiration and/or expunging of user data. Bluetooth data from TraceTogether is deleted every 25 days.

Another question to ask about data use is: What will MOE/vendors do if the monitoring results in red flags? The alerts could be due to truly nefarious activities (example: the youth who recently self-radicalised and wanted to attack Muslims) or legitimate research on terrorism. What systems are in place in terms of algorithms and human monitors? What constitutes a reasonable response?

Perhaps my questions have already been answered but have not been made public. Perhaps my questions might provoke some reflection.

But I certainly want to provoke some thought and action in the area of student self-management. Might using tools like the DMA create a reliance on them? Such tools trigger extrinsic motivations, e.g., fear of detection for visiting unauthorised sites or waiting for pings from the system to meet deadlines.

We need such tools to be scaffolds. Scaffolds are removed from buildings once they are constructed because the buildings can then stand on their own. What else will be put in place to ensure that our students learn to stand independently and think responsibly on their own?

I know of a few schools that rely on educational and social scaffolds instead of DMA-like tools. Students use their phones and computers like we might at home. These devices are as unencumbered as we wish or as locked down as we make them. We decide.

The message that tools like the DMA are all-powerful and always monitoring might provide some comfort to the public. This is disingenuous because more nuanced questions have not been addressed about their use. Equally, not enough emphasis has been placed on actually nurturing independent and responsible learners.

I am still surprised that there does not seem to be much discussion about the secondary use of TraceTogether (TT) data. That is, how TT data for public health might extend to criminal investigations.

For me, such a move is like sharing your salary information with a trusted life insurance agent only to be approached by car sales folk or income tax auditors. Data for the good of one thing (customising an insurance plan) somehow gets used for something else (selling you something, investigating your income).

What little discussion I have processed seems to focus on user privacy as enabled by technical and policy protections [1] [2] [3]. But these conveniently bypass an equally important and preliminary issue — permission. I elaborate on this after I reflect on what I have read.

Screenshot of the TraceTogether app.

One justification for using TT data for criminal investigation seems to be that the data is just another source of information. We might consider this part of the process of triangulation.

For example, a team that is investigating, say, an unsolved murder will look for as much information as they can to figure out whodunnit and why. The where, when, how, and what is the domain of forensics.

A modern forensics team will not just rely on possible witnesses. It might also look at video-based data (e.g., CCTVs) and digital traces to figure out the who and the why. In this hypothetical case, digital traces might include data from a TT token or app.

A layperson with a rudimentary understanding of how TT works might realise that TT collects proximity (who was around and how near) and temporal (how long) information. A techie and technocrat would like you to focus on how difficult it is to get this information because of siloing and encryption. They would be right to focus on how TT has built-in privacy measures.

But it does not take sophisticated skills to cast doubt on such information. How? My guesses are that a would-be criminal merely need install the TT app on someone else’s phone and have them walk somewhere else, or hand their TT token to that person to do the same.

Alternatively, the would-be criminal might give the token and app to two different people in different places so that s/he would seem to be interacting with very different people at the same time. This could call into question the validity of such data as evidence.

The validity of TT data can be compromised by technically unsophisticated acts. So just how valid and important is TT data for solving this hypothetical case?

In contrast, it would take sophisticated skills to manipulate video data or to not get recorded on video in the first place. Other than video evidence, there are other strong points for triangulation, e.g., data in the form of text exchanges, breadcrumb trails in financial records, etc. Do we need TT data?

That question is something the authorities might already have a firm answer to. It is probably safe to assume that most of the general population is not out to commit crimes while they have the TT token or app on their persons, so we are safe from that worry.

But we are not safe from an insidious erosion of privacy. Insidious because it is not obvious — the TT data was arguably only for COVID-19 contact tracing (i.e., public health safety). But because it has the potential to assist in crime investigations, it was added to the umbrella of overall safety.

Designers of artificial intelligence and information systems know how easy it is to abuse the personal data of people even when those people have given permission for their data to be used. We need only recall the fallout that resulted from Facebook’s loose grip on data that led to misuse by Cambridge Analytica.

With a limited scope of information and imagination, I do not claim to have answers. But I still have questions. Did we give permission for TT data to be used for other purposes? Was it unreasonable to assume that TT data was only for public health safety? How sophisticated are we as a society if we let slide the secondary use of TT data? How much trust are the authorities willing to exchange for expediency? If circumstances change so that secondary use of data is critical, how might those with more power communicate with those with less?

It has been more than a day after Channel News Asia (CNA) reported this parliamentary exchange. We found out that TraceTogether data — collected by tokens or phone apps for COVID-19 contact tracing — could also be used for criminal investigations.

Apparently this is part of an umbrella safety policy where “citizen safety and security is or has been affected” [YouTube video of parliamentary exchange when this was mentioned]. This was news to me.

I downloaded the app and collected my token under the promise that the data had a singular purpose. Re the same CNA report:

A privacy statement on the TraceTogether website had earlier said the data would only be used “for contact tracing purposes”.

Here is one response from an academic at NUS:

In a tweet thread, he related how he had also co-authored a paper cautioning against the secondary use of such data. His rationales:

  • The gains from the secondary use of such data are small compared to the risk of loss of public trust.
  • The use of the token or app is practically mandatory given the need to enter public spaces like malls and train stations. This favours surveillance at the cost of privacy.

My thoughts? I am all for TraceTogether for the purpose it was designed for. If we extend its use without the express consent of its stakeholders, we make the mistake that developers of information systems try to avoid, i.e., the unintended and unexpected consequences of data use and manipulation.

We already have extensive surveillance in the form of our near ubiquitous CCTVs. You need only make the effort to count how many you walk by as you go about your business. Is extending the use of TraceTogether data for crime investigation worth the cost of breaking TrustTogether?

I am a fan of proper infographics, not wannabes or pretend-to-bes or images mistakenly labelled as infographics.

One of my favourite sources of infographics is BeautifulNews. Here is one of their more recent illustrations.

Then there are data visualisations that create an emotional and cognitive impact.

The tweeted two-minute video illustrates how something fuzzy like income inequality in the USA can be cleverly illustrated with an actual pie and not a pie chart.

The impact is made not just from the final product of pie allotments, but also the process of getting there. This creates shock or surprise, which might then trigger some decision-making on the part of the learner.

These are just a few of many clues when deciding when to use static visuals or moving ones for cognitive dissonance.

This timely tweet reminded me to ask some questions.

Other than “learning styles”, are career guidance programmes here going to keep wasting taxpayer money on Myers-Briggs tests for students and the same training for teachers?

Are people who claim to be edtech, change, or thought leaders still going to talk about “21st century competencies” and “disruption” this decade?

Might people keep confusing “computational thinking” or “authoring with HTML” with “coding”?

Will administrators and policymakers lie low in the protection and regulation of the privacy and data rights of students?

Are vendors going to keep using “personalised learning” and “analytics” as catch-all terms to confuse and convince administrators and policymakers?

Are sellers of “interactive” white boards still going to sell these white elephants?

Are proponents of clickers going to keep promoting their use as innovative pedagogy instead of actually facilitating active learning experiences?

I borrow from the tweet and say: Please don’t. I extend this call by pointing out that if these stakeholders do not change tack, they will do more harm than good to learners in the long run.

Recently I tweeted a comprehensive opinion piece that critiqued the amendments to the Children and Young Persons Act.

I agree that Singapore needs to do more by way of legislation and regulation to protect the data privacy and rights of minors. I also favour doing the same for young adults in the context of higher education and LMS use.

But I wonder what unwanted signals this declaration makes:

Thankfully, Singapore has not experienced such high-profile incidents relating to the breach of children’s digital privacy or online harms.

Does it take a “high-profile incident” for us to take action? It should not. It speaks poorly of us as a people if we only react when tragic things happen to important people.

Does the paucity or absence of data for a phenomenon mean it does not happen? I think not.

I recall having a passing conversation with a university staff member about how much abuse was swept under the table. This was right before high-profile cases made the news and universities here seemed to react with countermeasures.

Universities were already aware of what was happening on their campuses. It was just that the data was not shared and so the issues were not transparent to the public.

So shit still happens and about as regularly as we have bowel movements. They seem mundane or are unpleasant to talk about. But if they reveal a health problem we might have, are we going to try to flush the symptoms of an underlying problem away?

Today I combine a quick reflection on a video I just watched and a book that I am reading.

Video source

If you want to know what a dead fish, an MRI machine, and statistics have in common, watch the video above.

The salmon experiment was a red herring. If you focus on the living results coming from dead fish in an MRI machine, you miss the point of the video: Research needs to be based on good design, not led by a foregone conclusion.

That should seem like a given, but the fact that the point needed to be proven and made is evidence that people and scientists need constant reminders.

Here is another reminder and it comes from page 109 of Charles Wheelan’s book, Naked Statistics.

Our ability to analyse data has grown far more sophisticated than our thinking about what to do with the results. — Charles Wheelan

This quote was from Wheelan’s chapter about starting with good data. He was trying to make the point that no amount of elaborate or sophisticated analysis was going to make bad data any better.

For example, we might need a representative sample but select a biased one instead. There is no analysis process that is going to improve a bad sample. That data and the research are tainted from the start.
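Here is a minimal sketch of that point in Python, using made-up numbers rather than any real dataset. The population, the cutoff of 55, and the function names are all my own assumptions for illustration. It draws a representative sample and a biased one from the same simulated population, then compares their averages as the sample size grows.

```python
import random
import statistics

def make_population(rng, size=100_000):
    """Hypothetical population: a numeric trait centred near 50."""
    return [rng.gauss(50, 10) for _ in range(size)]

def random_sample(rng, population, n):
    """Representative sample: every member is equally likely to be picked."""
    return rng.sample(population, n)

def biased_sample(rng, population, n, cutoff=55):
    """Biased sample: only members above an arbitrary cutoff can be picked."""
    eligible = [x for x in population if x > cutoff]
    return rng.sample(eligible, n)

rng = random.Random(42)  # fixed seed so the run is repeatable
population = make_population(rng)
true_mean = statistics.mean(population)

# Collect more and more data and see whether either estimate closes in on the truth.
for n in (100, 1_000, 10_000):
    fair = statistics.mean(random_sample(rng, population, n))
    skewed = statistics.mean(biased_sample(rng, population, n))
    print(f"n={n:>6}  true={true_mean:.1f}  random={fair:.1f}  biased={skewed:.1f}")
```

The random sample should settle near the population average as n grows. The biased sample cannot, because every value in it sits above the cutoff, so collecting more of it only makes the wrong answer look more precise.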

So the next time someone declares “Research says…” we know better than to take what follows at face value.

