Another dot in the blogosphere?

Posts Tagged ‘data’

You need to read and watch these exposés on how Facebook enabled Cambridge Analytica to access and capitalise on user data.

Video source

Video source

If your time is precious, just watch the video immediately above of the whistleblower revealing his role and the impact of his actions.


  • Cambridge Analytica tapped a professor’s idea to get users’ Facebook data with a personality survey app.
  • According to the whistleblower, it took only a few hundred thousand initial users to generate a corpus of data on millions of users who were connected to those few.
  • The tool mined the initial users’ families, friends, and acquaintances so that the company then had access to tens of millions of users’ data.
  • According to the New York Times, only the 270,000 survey participants gave their permission for their data to be used; the rest of the resulting raw corpus of 50 million profiles did not.
  • The company then fed users targeted resources to sway opinion.
  • Cambridge Analytica would not have been able to do this if not for Facebook’s fast and loose data use policies. This was not Facebook’s first strike (see my curated resources).
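The snowball effect described in these bullets can be shown with a toy one-hop expansion. This is a minimal sketch under stated assumptions. The friend graph, the user names, and the functions `harvest_one_hop` and `non_consenting` are all hypothetical. It does not reconstruct the actual app or any Facebook API. It only shows why the set of people who consented is much smaller than the set of profiles collected.

```python
def harvest_one_hop(consenting, friends):
    """Return every profile reachable within one hop of the consenting users.

    consenting: set of user IDs who installed the app and agreed to share data.
    friends: dict mapping a user ID to the set of that user's friends.
    (Both structures are hypothetical stand-ins for illustration.)
    """
    harvested = set(consenting)
    for user in consenting:
        harvested |= friends.get(user, set())
    return harvested


def non_consenting(consenting, harvested):
    """Profiles collected from people who never agreed to anything."""
    return harvested - set(consenting)


# Hypothetical friend graph; only "ana" and "zoe" took the survey.
friends = {
    "ana": {"ben", "cai", "dev"},
    "ben": {"ana", "eli"},
    "zoe": {"fay", "gus", "hal", "ivy"},
}
consenting = {"ana", "zoe"}

harvested = harvest_one_hop(consenting, friends)
print(len(consenting), len(harvested))  # 2 9
print(sorted(non_consenting(consenting, harvested)))
```

At the scale the New York Times reported, 50 million profiles from 270,000 consenting participants works out to roughly 185 profiles collected for every person who agreed (50,000,000 ÷ 270,000 ≈ 185).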

This incident is unlikely to be Facebook’s last because most people seem to close their eyes and mouths to such misuse of data. If Facebook does not lose face, it will brazenly continue along this path.

You can choose to block its path or get off this well-beaten track. Unless you are powerful, influential or heavy-handed, you are unlikely to impede the Facebook juggernaut.

If you are like me, you can choose to stay off Facebook or try not to provide it data that it and its partners can turn against you.

Yesterday I mentioned how the DRIP (data rich, information poor) approach of edtech vendors was like torture. Today I elaborate on one aspect of data-richness and link it to an under-utilised aspect of game-based learning.

The data-richness that some edtech providers tout revolves around a form of data analytics — learning analytics. If they do their homework, they might address different levels of learning analytics: Descriptive, diagnostic, predictive, prescriptive.

A few years of following trends in learning analytics have allowed me to distil some problems with the data or learning analytics that vendors tout:

  • Having data is not the same as having timely and actionable information
  • While the data is used to improve the technological system, it does not guarantee meaningful learning (a smarter system does not necessarily lead to a smarter student)
  • Such data is collected without users’ knowledge or consent
  • Users have no choice but to participate, e.g., they need to access resources and submit assignments through an institutional LMS
  • The technological system sometimes ignores the existing human system, e.g., coaches and tutors

I define learning analytics and highlight a feature in Pokémon Go to illustrate how data needs to become information to be meaningful to the learner.

First, a seminal definition from Long and Siemens (2011):

… learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs

ERIC source

The processes of measurement, collection, analysis, and reporting are key to analytics. I use a recent but frustrating feature of Pokémon Go to illustrate each.

My PoGo EX Raid Pass.

The Pokémon Go feature is the “EX Raid Pass” invite system (I shorten this to ERP). Players need to be invited to periodic raids to battle, defeat, and catch the rare legendary Pokémon, Mewtwo. ERPs seemed as random as a lottery and, like a lottery, rewarded few.

Even though Niantic (Pokémon Go’s developer) provided vague tips on how to get ERPs, players all over the world became frustrated because they did not know why they were not selected despite playing by the rules and putting in much effort.

To make matters worse, a few players seemed to win the lottery more than once. At the time of writing, I know of one player who claimed on Facebook that he had eight ERPs for the next invite on 9 Jan 2018.

Eight Ex Raid Passes!

Players have swarmed Reddit, game forums, and Facebook groups to crack this nut. Some offered their own beliefs and tips. Much of this was hearsay and pseudoscience, but it was data nonetheless — unverifiable and misleading data.

A few Facebookers then decided to poll ERP recipients about where their EX Raids were. This was the start of measurement as they looked for discrete data points. As the number of data points grew, the Facebookers compiled lists (data collection).

Such data measurement and collection were not enough to help players without ERPs take action. The collected data was messy and showed no obvious pattern.

I know of at least one local Pokémon Go player who organised the data as visualisations. He created a tool that pinned potential EX Raid venues on a map of Singapore. With this tool, it became obvious that venues were reused for EX Raids.

Potential EX Raid hotspots.

Pattern of reuse of venues for EX Raids.

However, such a visualisation was still not information. While the data pointed to specific spots where EX Raids were likely to happen, it still did not tell players what they might actually do to get an ERP.

To do this, Facebooker-players asked recipients when their ERPs were valid and when they had previously raided those spots. One pattern that emerged was normal raids of any level (1 to 5) at hotspot gyms a few days before EX Raids. So if an EX Raid was likely to happen on Saturday at Gym X, the advice was to hit that gym on Wednesday, Thursday, and Friday to increase the likelihood of receiving an ERP.

Collectively, these actions were a form of analysis because they attempted to reduce the data, generalise from it, and ultimately suggest a pattern. This actionable information was reported and communicated online (social media networks) and in person (the auntie and uncle network).
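The crowd's measurement, collection, analysis, and reporting can be sketched in a few lines. This is an illustration under loud assumptions. The gym names, the dates, the reuse threshold (`min_reuse`), and the three-day window are hypothetical stand-ins for the crowd's reports and advice. Nothing here models how Niantic actually selects ERP recipients.

```python
from collections import Counter
from datetime import date, timedelta

# Measurement and collection: hypothetical crowd-sourced reports of
# (gym, EX Raid date). The names and dates are invented for illustration.
reports = [
    ("Gym A", date(2017, 12, 9)),
    ("Gym B", date(2017, 12, 9)),
    ("Gym A", date(2017, 12, 16)),
    ("Gym C", date(2017, 12, 16)),
    ("Gym A", date(2017, 12, 23)),
    ("Gym B", date(2017, 12, 30)),
]


def hotspots(reports, min_reuse=2):
    """Analysis: gyms that hosted EX Raids at least min_reuse times, most reused first."""
    counts = Counter(gym for gym, _ in reports)
    return [(gym, n) for gym, n in counts.most_common() if n >= min_reuse]


def raid_days_before(ex_raid_day, days=3):
    """Reporting: the days leading up to an expected EX Raid on which to do normal raids."""
    return [ex_raid_day - timedelta(days=d) for d in range(days, 0, -1)]


print(hotspots(reports))  # [('Gym A', 3), ('Gym B', 2)]

# A Saturday EX Raid (13 Jan 2018) yields Wednesday, Thursday, and Friday as raid days.
for day in raid_days_before(date(2018, 1, 13)):
    print(day.isoformat(), day.strftime("%A"))
```

Counting reuse turns messy reports into a ranked list (analysis). Converting an expected EX Raid date into concrete days to raid turns that list into something a player can act on (reporting).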

The advice to players seeking ERPs distils much data, effort, and knowledge from a crowd. It illustrates how data becomes information. I benefitted from this data-to-information meta-process because I followed the advice and received an ERP (see image embedded earlier).

The advice does not constitute a guarantee. With more players using this strategy, more will enter the pool eligible for selection. There is still a lottery, but you increase your chances with the scientific approach. You do not just rely on lucky red underwear; you create your own “luck”.

Now back to edtech DRIP. Edtech solutions that claim to leverage analytics are only good if they help not just the technical system get better at analysis, but also the teacher and learner take powerful and meaningful action. Edtech solutions that are data rich but information poor only help themselves. Edtech solutions that turn rich data into meaningful information help us.

I reserved this read, Why We Must Embrace Benevolent Friction in Education Technology, for the new year.

A few concepts from the article jumped out at me, but the one that stood out was DRIP — Data Rich, Information Poor. What does this have to do with edtech?

DRIP is a criticism of edtech companies and providers that tout data analytics as a means of controlling, feeding, manipulating, or enabling learners. Data is just that, data. It is not organised information that might become internalised as knowledge and then externalised as intervention.

What edtech providers, particularly LMS and CMS companies, have yet to do is help their clients and partners make sense of the data. This is in part because programmer or provider speak is not the same as teacher and educator speak. There are relatively few people — like me — who can bridge that gap.

So what these providers do is reach out to administrators and policymakers because they all deal with numbers and data. They do so in terms that make sense to those audiences. It does not help that these discussions are not transparent and also make little sense to teachers and educators.

A while ago I heard about an interrogative torture technique that involved slowly dripping water onto a victim’s head. The slow drips quickly wear down psychological resistance and the interrogators get what they want.

That method does not transfer to DRIP in edtech. Instead of wearing down resistance, DRIP only drowns clients and partners in meaningless data that does not actually help teachers or their learners.

Being “data-driven” seems to have garnered a bad name in some schooling and education circles.

This is probably because of its misuse by edtech vendors for so-called analytics and policymakers’ misinterpretations of what being data-driven means. Each is bad enough on its own. Both are lethal in combination.

But here are two recent examples of how being reliant on data is a good thing.

In a recent contest in Singapore, teams of students relied on shared pools of data to create visualisations.

Video source

The video above used data to create awareness of the difficulties that face families who have children with special needs.

Video source

The next video presented data to question commonly held misconceptions about ex-convicts.

Providing concrete visualisations of abstract data is not the same as being data-driven. The former is about seeing what is not immediately apparent. The latter can sometimes be about playing the numbers game above all else, and that often ignores or harms the people who make up those numbers.

When being data-driven loses its original intent of informing decisions that actually help people, data visualisations like the ones above might be a timely reminder of what good data can do.

I had an uncomfortable gut feeling when I read this CNA article about biometric payments being available to schools here in 2018.

I had to dig deep for why I was uncomfortable. After all, I am all for technology making lives better. And therein lay the problem: In doing good, there was also the potential for harm.

The good is the sheer convenience of going cashless while being able to track spending. This might be the start of basic financial literacy.

According to the news article, the system has safety measures:

Fingerprint information will not be stored on the device. Instead, the prints will be encrypted and stored securely in a cloud database.

Anti-spoofing technology will also be put in place to ensure that the fingerprints are real and that the person making the payment is present.

This is the trifecta of data accuracy (reading), data security (keeping), and data integrity (reliably identifying). If just one fails, the system’s users are harmed. Take the recent Instagram hack, for example.

For the sake of argument, let us assume that the three data concepts are sound in practice. What is the harm then?

To answer this question, we need to ask at least one other question: What else can vendors do with data that is accurate, “stored securely”, and reliable?

The short answer is lots. One needs only look at what Facebook and Google did (and continue to do) with our data. They offer their services for “free” to us because our data serves up advertisements which make these companies money. Lots of it.

One needs only to casually search for data breaches and infringements involving these two companies. For example:

The last item was not so much about the privacy of data as about the use and manipulation of data. That is my point: Assuring stakeholders that data is accurate, authentic, and safe is not enough; the problem is the lack of transparency and foresight about what can be done with that data.

Students are particularly vulnerable because adults make decisions about their data and the kids have no say in the biometric scheme. By this I am referring to the scheme being employed as a Smart Nation initiative, not the choice of whether to join the scheme.

The issue is so serious that the Electronic Frontier Foundation (EFF) has tips for teachers about student privacy. These include:

  • Making digital literacy part of the curriculum
  • Advocating for better training for teachers
  • Getting parental consent
  • Selecting technology tools carefully
  • Building a community of like-minded privacy advocates

A Smart Nation needs people to make smart choices. To do that, people need good information. Where is the information about how the data might be used both intentionally and peripherally? What promises and standards of practice can service vendors and providers be held to? Where is the public debate on the data privacy of the especially vulnerable?

The London Underground system will get 4G coverage by 2019. Yay?

The writer’s reaction, summarised in the tweet above, was one of dismay. Mine was simply: Welcome to 2012.

I visited the UK twice two years ago and can relate to the wireless-less experience. I discovered during my second visit that some stations deep underground had wifi so I enjoyed intermittent access.

The article’s writer seems to be predicting some sort of social pandemonium brought about by people yammering loudly and incessantly.

Will it happen? Yes, but probably not to the extent and frequency he projects. Our own train system gets a few loudmouths who have no volume control or social awareness. But really, how many people actually talk that often on their phones?

The writer might gather actual anecdotes and data on loudmouth frequency from other train systems that have 4G access. He might also find out how such access actually helps commuters.

Being able to communicate by voice, video, text, or emoji provides a crucial channel for alerts and in emergencies. 4G access also activates many eyes in a human monitoring system of nefarious activities.

Writers might like making predictions based solely on opinion and limited experience. They could do better with critical data and lived experiences.

Now if only more readers learnt to tell the difference between these writers…

When I conduct workshops or give talks, I often bring my own Internet connection with a MiFi device.

When I had the presence of mind, I took note of how much data I consumed.

I estimated the amount of data by comparing the remaining quota on my prepaid 4G SIM account from M1 before and after each session.
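The before-and-after comparison is simple subtraction. It can then be turned into a per-hour rate for planning. Below is a minimal sketch. The quota figures, the function names, and the 1.5× safety buffer are hypothetical choices for illustration, not actual readings from my M1 account.

```python
def data_used_mb(quota_before_mb: float, quota_after_mb: float) -> float:
    """Estimate data consumed in a session from the prepaid quota before and after."""
    if quota_after_mb > quota_before_mb:
        # A higher quota afterwards suggests a mid-session top-up, so the estimate is invalid.
        raise ValueError("remaining quota increased during the session")
    return quota_before_mb - quota_after_mb


def mb_per_hour(used_mb: float, hours: float) -> float:
    """Average consumption rate for a session of the given length."""
    return used_mb / hours


def budget_mb(rate_mb_per_hour: float, hours: float, buffer: float = 1.5) -> float:
    """Data to set aside for a future session; the 1.5x buffer is an arbitrary safety margin."""
    return rate_mb_per_hour * hours * buffer


# Hypothetical three-hour workshop: the quota fell from 5000 MB to 4550 MB.
used = data_used_mb(5000, 4550)
rate = mb_per_hour(used, 3)
print(used, rate, budget_mb(rate, 3))  # 450 150.0 675.0
```

Working from a per-hour rate rather than a per-session total makes it easier to scale an estimate from a one-hour talk to a three-hour workshop.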

My talks are interactive and rely on Google Slides, but these sessions rarely extend beyond an hour and do not require as much data.

My workshops last about three hours and rely on more media-rich resources, but I save on data consumption by making local copies of online videos.

I share this as a reminder to myself and as a tip for others on roughly how much data to set aside when bringing your own connection to provide professional development.
