Another dot in the blogosphere?

Posts Tagged ‘data’

More people are probably aware of the importance of managing their personal data post-Cambridge Analytica. But how many know how data is mined, managed, or misused?


Video source

The video above provides a broad explanation of data mining. It distills the processes and purposes of data mining into five strategies:

  1. Classification
  2. Regression
  3. Clustering
  4. Anomaly detection
  5. Association learning

I enjoyed the video. Unlike the press that focuses on negatives, the video highlighted the benefits of data mining, e.g., predicting disease before it emerges.

It also made a subtle point that is easy to miss, i.e., the bias introduced by humans who make decisions on which data to focus on and what to do with it.

Data, its collection, and its management are not right or wrong in themselves. It is what we choose to do with them that makes the difference.

We shape our tools and then our tools shape us. -- Marshall McLuhan.


I reflected twice on getting a mobile connection while travelling in Malaysia. The first time I relied on a Digi prepaid SIM; the second time I went with Maxis Hotlink.

I just returned from a short trip, this time with neither a mifi device nor a Malaysian prepaid SIM card.

Local telco providers have made it a bit more convenient to get connected overseas. Emphasis on “a bit” and not on “convenient”.

If you are on a postpaid plan, you might have the option of applying for a data plan without removing your SIM card or breaking the bank. However, these options are not likely to be as cheap as getting a Malaysian SIM the moment you land in a Malaysian airport. The telco kiosks for such prepaid SIMs are typically positioned right before you hit immigration counters.

A better deal might be had with a Singapore prepaid SIM. I use StarHub and I could use my allotted local data overseas. I ensured that I had:

  • enough purchased data
  • activated the data roaming option in the app (see screenshot below)
  • activated the data roaming setting in the phone
  • set the APN correctly (see screenshot below)
  • at least $3 in the prepaid app’s wallet

Data roaming setting in StarHub prepaid app.

The prepaid app provided clear instructions and automated the APN setting. I only found out the minimum wallet amount after receiving an SMS from StarHub once I arrived in Malaysia.

$3 minimum wallet amount required in StarHub prepaid app for roaming.

Your telco might disable the tethering function. This means that you cannot share the prepaid data plan with other devices. This was the case with my prepaid plan with StarHub. However, I discovered that the tethering was enabled once connected to Malaysian providers. Your mileage might vary with the overseas country’s telco service you connect to.

It has taken years for us to reach this “seamless” state and I very much appreciate it. I can still remember a fellow traveller and I getting anxious about getting connected in Denmark just four years ago.

Note: I have not been asked to describe or promote the service by StarHub nor have I been paid by the telco to do so. I am sharing my experience as a reminder of my travel needs and to help others in their decision-making.

You need to read and watch these exposés on how Facebook enabled Cambridge Analytica to access and capitalise on user data.


Video source
 

Video source

If your time is precious, just watch the video immediately above of the whistleblower revealing his role and the impact of his actions.

TLDR?

  • Cambridge Analytica tapped a professor’s idea to get users’ Facebook data with a personality survey app.
  • According to the whistleblower, it took a few hundred thousand initial users to generate the corpus of data that came from millions of users who were associated with the initial few.
  • The tool mined the initial users’ families, friends, and acquaintances so that the company then had access to tens of millions of users’ data.
  • According to the New York Times, only the 270,000 survey participants gave their permission for their data to be used; the rest from the resulting raw corpus of 50 million profiles did not.
  • The company then fed users with targeted resources to sway opinion.
  • Cambridge Analytica would not have been able to do this if not for Facebook’s fast and loose data use policies. This was not Facebook’s first strike (see my curated resources).
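To get a rough sense of the scale in the figures above, here is a back-of-envelope sketch in Python. It uses only the two numbers already mentioned (270,000 survey participants and a raw corpus of 50 million profiles). The function name is my own, and the calculation naively assumes that no profile was reached through more than one participant, which is almost certainly not true.

```python
def profiles_per_participant(total_profiles: int, participants: int) -> float:
    """Average number of profiles in the corpus per consenting participant.

    Naive assumption: friend lists do not overlap.
    """
    if participants <= 0:
        raise ValueError("participants must be positive")
    return total_profiles / participants


# Figures as reported in the list above.
ratio = profiles_per_participant(50_000_000, 270_000)
consenting_share = 270_000 / 50_000_000

print(f"About {ratio:.0f} profiles per participant")
print(f"Consenting share of corpus: {consenting_share:.2%}")
```

In other words, under this simplification each consenting participant opened the door to roughly 185 profiles, and only about half a percent of the corpus came from people who gave permission.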

This incident is unlikely to be Facebook’s last because most people seem to close their eyes and mouths to such misuse of data. If Facebook does not lose face, it will brazenly continue along this path.

You can choose to block its path or get off this well-beaten track. Unless you are powerful, influential or heavy-handed, you are unlikely to impede the Facebook juggernaut.

If you are like me, you can choose to stay off Facebook or try not to provide it data that it and its partners can turn against you.

Yesterday I mentioned how the edtech vendor DRIP — data rich, information poor — approach was like torture. Today I elaborate on one aspect of data-richness and link that to an under-utilised aspect of game-based learning.

The data-richness that some edtech providers tout revolves around a form of data analytics — learning analytics. If they do their homework, they might address different levels of learning analytics: Descriptive, diagnostic, predictive, prescriptive.

A few years of following trends in learning analytics allows me to distill some problems with vendor-touted data or learning analytics:

  • Having data is not the same as having timely and actionable information
  • While the data is used to improve the technological system, it does not guarantee meaningful learning (a smarter system does not necessarily lead to a smarter student)
  • Such data is collected without users’ knowledge or consent
  • Users do not have a choice but to participate, e.g., they need to access resources and submit assignments to institutional LMS
  • The technological system sometimes ignores the existing human system, e.g., coaches and tutors

I define learning analytics and highlight a feature in Pokémon Go to illustrate how data needs to become information to be meaningful to the learner.

First, a seminal definition from Long and Siemens (2011):

… learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs

ERIC source

The processes of measurement, collection, analysis, and reporting are key to analytics. I use a recent but frustrating feature of Pokémon Go to illustrate each.

My PoGo EX Raid Pass.

The Pokémon Go feature is the “EX Raid Pass” invite system (I shorten this to ERP). Players need to be invited to periodic raids to battle, defeat, and catch the rare and legendary Mewtwo. The ERP seemed to be random like a lottery and rewarded few like a lottery as well.

Even though Niantic (Pokémon Go’s parent company) provided vague tips on how to get ERPs, players all over the world became frustrated as they did not know why they were not selected despite playing by the rules and putting in much effort.

To make matters worse, a few players seemed to strike the lottery more than once. At the time of writing, I know of one player who claimed on Facebook that he has eight ERPs for the next invite on 9 Jan 2018.

Eight Ex Raid Passes!

Players have swarmed Reddit, game forums, and Facebook groups to crack this nut. Some offered their own beliefs and tips. Much of this was hearsay and pseudoscience, but it was data nonetheless — unverifiable and misleading data.

A few Facebookers then decided to poll ERP recipients about where their EX Raids were. This was the start of measurement as they looked for discrete data points. As the data points grew, the Facebookers compiled lists (data collection).

Such data measurement and collection was not enough to help non-ERP players take action. The collected data was messy and there was no pattern to it.

I know of at least one local Pokémon Go player who organised the data as visualisations. He created a tool that placed pinned locations in a Singapore map as potential EX Raid venues. With this tool, it became obvious that locations were reused for EX Raids.

Potential EX Raids hotspots.

Pattern of reuse of venues for EX Raids.

However, such a visualisation was still not information. While the data pointed to specific spots where EX Raids were likely to happen, it still did not provide actionable information on what players might actually do to get an ERP.

To do this, Facebooker-players asked recipients when their ERPs were valid and when they raided those spots previously. One of the patterns to emerge was normal raids of any level (1 to 5) at hotspot gyms a few days before EX Raids. So if an EX Raid was likely to happen on Saturday at Gym X, the advice was to hit that gym on Wednesday, Thursday, and Friday to increase the likelihood of receiving an ERP.

Collectively, these actions were a form of analysis because of the attempts to reduce, generalise, and ultimately suggest a pattern of results. This actionable information was reported and communicated online (social media networks) and in-person (auntie and uncle network).
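The measurement-collection-analysis steps above can be sketched in a few lines of Python. This is only an illustration: the poll responses are invented, the function names are my own, and the actual Facebookers did this by hand and with a map tool rather than with this code.

```python
from collections import Counter

# Hypothetical poll responses: (EX Raid venue, days between the player's
# last normal raid at that venue and the EX Raid). Invented for illustration.
responses = [
    ("Gym X", 3),
    ("Gym X", 2),
    ("Gym Y", 1),
    ("Gym X", 1),
    ("Gym Z", 6),
    ("Gym Y", 2),
]


def reused_venues(polls, min_reports=2):
    """Venues reported at least min_reports times, most reported first."""
    counts = Counter(venue for venue, _ in polls)
    return [(venue, n) for venue, n in counts.most_common() if n >= min_reports]


def share_within(polls, max_days=3):
    """Fraction of recipients who raided the venue within max_days before."""
    if not polls:
        return 0.0
    return sum(1 for _, days in polls if days <= max_days) / len(polls)


hotspots = reused_venues(responses)     # collection -> pattern of reuse
recent_share = share_within(responses)  # analysis -> something to act on

print(hotspots)
print(f"{recent_share:.0%} raided the venue in the 3 days before")
```

The raw list is data. The list of reused venues and the share of recipients who raided shortly beforehand are closer to information, because they suggest where and when to raid.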

The advice to players seeking ERPs is a reduction of much data, effort, and distilled knowledge from a crowd. It illustrates how data becomes information. I have benefitted from the data-to-information meta process because I followed the advice and received an ERP (see image embedded earlier).

The advice does not constitute a guarantee. With more players using this strategy, more will enter the pool eligible for selection. There is still a lottery, but you increase your chances with the scientific approach. You do not just rely on lucky red underwear; you create your own “luck”.

Now back to edtech DRIP. Edtech solutions that claim to leverage on analytics are only good if they not only help the technical system get better at analysis, but also help the teacher and learner take powerful and meaningful action. Edtech solutions that are data rich but information poor only help themselves. Edtech solutions that turn rich data into meaningful information help us.

I reserved this read, Why We Must Embrace Benevolent Friction in Education Technology, for the new year.

A few concepts from the article jumped out at me, but the one that stood out was DRIP — Data Rich, Information Poor. What does this have to do with edtech?

DRIP is a criticism of edtech companies and providers that tout data analytics as a means of controlling, feeding, manipulating, or enabling learners. Data is just that, data. It is not organised information that might become internalised as knowledge and then externalised as intervention.

What edtech providers, particularly LMS and CMS companies, have yet to do is help their clients and partners make sense of the data. This is in part because programmer or provider speak is not the same as teacher and educator speak. There are relatively few people — like me — who can bridge that gap.

So what these providers do is reach out to administrators and policymakers because they all deal with numbers and data. They do so in a way that makes sense to them. It does not help that these discussions are not transparent and also make little sense to teachers and educators.
 

 
A while ago I heard about an interrogative torture technique that involved slowly dripping water onto a victim’s head. The slow drips quickly wear down psychological resistance and the interrogators get what they want.

That method does not transfer via DRIP in edtech. It will only drown clients and partners in meaningless data that does not actually help teachers or their learners.

Being “data-driven” seems to have garnered a bad name in some schooling and education circles.

This is probably because of its misuse by edtech vendors for so-called analytics and misinterpretations of what being data-driven means by policymakers. Each is bad enough on its own. Both are lethal in combination.

But here are two recent examples of how being reliant on data is a good thing.

In a recent contest in Singapore, teams of students relied on shared pools of data to create visualisations.


Video source

The video above used data to create awareness of the difficulties that face families who have children with special needs.
 

Video source

The next video presented data to question commonly held misconceptions about ex-convicts.

Providing concrete visualisations of abstract data is not the same as being data driven. The former is about seeing what is not immediately apparent. The latter can sometimes be about playing the numbers game above all else, and that often ignores or harms the people that make up those numbers.

When being data-driven loses its original intent to inform decisions to actually help people, perhaps data visualisations like the ones above are a timely reminder of what good data might do.

I had an uncomfortable gut feeling when I read this CNA article about biometric payments being available to schools here in 2018.

I had to dig deep for why I was uncomfortable. After all, I am all for technology making lives better. And therein lay the problem: In doing good, there was also the potential for harm.

The good is the sheer convenience of going cashless while being able to track spending. This might be the start of basic financial literacy.

According to the news article, the system has safety measures:

Fingerprint information will not be stored on the device. Instead, the prints will be encrypted and stored securely in a cloud database.

Anti-spoofing technology will also be put in place to ensure that the fingerprints are real and that the person making the payment is present.

This is the trifecta of data accuracy (reading), data security (keeping), and data integrity (reliably identifying). If just one fails, the system’s users are harmed. Take the recent Instagram hack, for example.

For the sake of argument, let us assume that the three data concepts are sound in practice. What is the harm then?

To answer this question, we need to ask at least one other question: What else can vendors do with the data that is accurate, “stored securely”, and reliable?

The short answer is lots. One needs only look at what Facebook and Google did (and continue to do) with our data. They offer their services for “free” to us because our data serves up advertisements which make these companies money. Lots of it.

One needs only to casually search for data breaches and infringements involving these two companies. For example:

The last item was not so much about the privacy of data as about the use and manipulation of data. That is my point: Assuring stakeholders that data is accurate, authentic, and safe is not enough; the problem is the lack of transparency and foresight about what can be done with that data.

Students are particularly vulnerable because adults make decisions about their data and the kids have no say in the biometric scheme. By this I am referring to the scheme being employed as a Smart Nation initiative, not the choice of whether to join the scheme.

The issue is so serious that the Electronic Frontier Foundation (EFF) has tips for teachers about student privacy. These include:

  • Making digital literacy part of the curriculum
  • Advocating for better training for teachers
  • Getting parental consent
  • Selecting technology tools carefully
  • Building a community of like-minded privacy advocates

A Smart Nation needs people to make smart choices. To do that, people need good information. Where is the information about how the data might be used both intentionally and peripherally? What promises and standards of practice can service vendors and providers be held to? Where is the public debate on the data privacy of the especially vulnerable?

