Another dot in the blogosphere?

Posts Tagged ‘data’

Recently I tweeted a comprehensive opinion piece that critiqued the amendments to the Children and Young Persons Act.

I agree that Singapore needs to do more by way of legislation and regulation to protect the data privacy and rights of minors. I also favour doing the same for young adults in the context of higher education and LMS use.

But I wonder what unwanted signals this declaration makes:

Thankfully, Singapore has not experienced such high-profile incidents relating to the breach of children’s digital privacy or online harms.

Does it take a “high-profile incident” for us to take action? It should not. It speaks poorly of us as a people if we only react when tragic things happen to important people.

Does the paucity or absence of data for a phenomenon mean it does not happen? I think not.

I recall a passing conversation with a university staff member about how much abuse was swept under the table. This was right before high-profile cases made the news and universities here seemed to react with countermeasures.

Universities were already aware of what was happening on their campuses. It was just that the data was not shared, so the issues were not transparent to the public.

So shit still happens and about as regularly as we have bowel movements. They seem mundane or are unpleasant to talk about. But if they reveal a health problem we might have, are we going to try to flush the symptoms of an underlying problem away?

Today I combine a quick reflection on a video I just watched and a book that I am reading.


Video source

If you want to know what a dead fish, an MRI machine, and statistics have in common, watch the video above.

The salmon experiment was a red herring. If you focus on the living results coming from dead fish in an MRI machine, you miss the point of the video: Research needs to be based on good design, not led by a foregone conclusion.

That should seem like a given, but the fact that the point needed to be proven and made is evidence that people, scientists included, need constant reminders.

Here is another reminder and it comes from page 109 of Charles Wheelan’s book, Naked Statistics.

Our ability to analyse data has grown far more sophisticated than our thinking about what to do with the results. — Charles Wheelan

This quote was from Wheelan’s chapter about starting with good data. He was trying to make the point that no amount of elaborate or sophisticated analysis was going to make bad data any better.

For example, we might need a representative sample but select a biased one instead. No analysis process is going to improve a bad sample. The data and the research are tainted from the start.
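A quick sketch of Wheelan's point, with made-up numbers: below, a hypothetical population rates a policy, a subgroup rates it much higher, and a survey that only reaches supporters produces a mean no amount of downstream analysis can fix. The scenario and all figures are invented for illustration.

```python
import random

random.seed(42)

# Hypothetical population: 10,000 people rating a policy on a 0-10 scale.
# A subgroup (about 30% of the population) rates it much higher on average.
population = [random.gauss(7.5, 1.0) if random.random() < 0.3
              else random.gauss(4.0, 1.0) for _ in range(10_000)]

true_mean = sum(population) / len(population)

# A biased sample: survey only people who already like the policy
# (say, polling at a supporters' rally), i.e. keep only high raters.
biased_sample = [x for x in population if x > 6][:500]
biased_mean = sum(biased_sample) / len(biased_sample)

# A representative sample: simple random sampling from everyone.
random_sample = random.sample(population, 500)
random_mean = sum(random_sample) / len(random_sample)

print(f"population mean:    {true_mean:.2f}")
print(f"biased sample mean: {biased_mean:.2f}")  # far off, whatever the analysis
print(f"random sample mean: {random_mean:.2f}")  # close to the truth
```

The biased mean overshoots by a couple of points, and averaging more biased respondents would not help: the flaw is in the selection, not the sample size.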

So the next time someone declares “Research says…” we know better than to take what follows at face value.

I have avoided reading and reviewing the opinion piece “Analytics can help universities better support students’ learning”. When I scanned the content earlier this month, my edtech Spidey sense was triggered. Why?

Take the oft-cited reason for leveraging the data: They “provide information for faculty members to formulate intervention strategies to support individual students in their learning.”

Nowhere in the opinion piece was there mention of students giving permission for their data to be used that way. Students are paying for an education and a diploma; they are not paying to be data-mined.

I am not against enhancing study or enabling the individualisation of learning. I am against the unethical or unsanctioned use of student data.

Consider the unfair use of student-generated data. Modern universities rely on learning management systems (LMS) for blended and online learning. These LMS are likely to integrate plagiarism checking add-ons like Turnitin. When students submit their work, Turnitin gets an ever-increasing and improving database. It also charges its partner universities hefty subscription fees for the service.

Now take a step back: Students pay university fees while helping a university partner and the university partner makes money off student-generated data. What do students get back in return?

Students do not necessarily learn how to be more responsible academic writers. They might actually learn to game the system. Is that worth their data?

Back to the article. It highlighted two risks:

First, an overly aggressive use of such techniques can be overbearing for students. Second, there is a danger of adverse predictions/expectations leading to self-fulfilling prophecies.

These are real risks, but they sidestep the more fundamental issues of data permissions and fair use. What is done to protect students when they are not even aware of how and when their data is used?

This is not about having a more stringent version of our PDPA* — perhaps an act that disallows any agency from sharing our data with third parties without our express consent.

It is about not telling students that their data is used for behavioural pattern recognition and to benefit a third party. While not on the scale of what Cambridge Analytica did to manipulate political elections, the principle is the same — unsanctioned and potentially unethical use of a population’s data.

*I wonder why polytechnics are included in the list of agencies (last updated 18 March 2013) responsible for personal data protection but universities are not.

The first episode of CrashCourse’s series on artificial intelligence (AI) is as good as the other series created by the group.


Video source

The introductory episode started by making the point that AI is not the scary robot made popular by books or movies. We use everyday AI when we ask a voice assistant to play some music, sort photos using facial recognition, or vacuum clean our floor with a Roomba.

These are ordinary events that we do not question or fear, but it is still AI at a low level. Regardless, how did basic AI become commonplace?

AI has no sense organs, so it needs to be fed a lot of data. The availability of now ubiquitous data has enabled the rise of AI.

Then there are the constant improvements in computing power. What a current supercomputer might process in one second would take the IBM 7090 — the most advanced computer in 1956 — 4,735 years to solve.

Finally, the commonness of AI is due to the information that we create and share, and how we transact on the Internet.

So the seemingly rapid rise of AI is due to three main things: Massive data, computing power, and the Internet as we know it.

AI is not terrifying in itself. What should scare us is ignorance of what it is and what it might become through irresponsible design.

The tweet above is an example of how NOT to start research.

You do not start with a conclusion and look for data to prove it. Instead, you gather data based on one or more research questions, and only then might conclusions emerge.

So how might the tweeted issue be investigated? It might start with questions like these: How does the new surge pricing scheme affect drivers? How does it affect passengers? How do the effects differ by company?

These questions allow for different data sources and types to shed light on a complex phenomenon. They may reveal that the surge pricing is “unfair” (however that is defined) or not. They do not exclude data that might reveal the contrary or uncover even more issues.

This week’s episode of CrashCourse’s Navigating Digital Information focused on data and its visual representation.


Video source

Data, whether represented by raw numbers or graphics, can seem objective. However, they are not neutral because people gather and interpret them. (As a former academic, I shuddered whenever I overheard colleagues talking about “massaging data”.)

In evaluating data, host John Green reminded us to ask:

  • Does the data support the claim? (Is it relevant?)
  • How reliable is the source of data? (Who commissioned the research and why? Who conducted it and why?)

As for data visualisations, Green reminded us to check if the graphic was based on real data (check its source) and that the data was transferred and presented accurately.

Another consideration specific to data visualisations like infographics is how complex phenomena are simplified in the creative process. This might sacrifice the accuracy of the data.

If we combine both sets of principles, we might be in a stronger position to evaluate the following example. Two organisations used the same set of data to send different messages about climate change.

Organisation A’s image is on the left and B’s is on the right.

Screenshot of graphs from https://www.youtube.com/watch?v=OiND50qfCek&t=201s.

Organisation A had already concluded that temperatures were not rising globally over time, so it manipulated the y-axis to range from -10 to 110 deg F. Organisation B zoomed in on a smaller range, so the average temperature increase was more pronounced. B critiqued A’s representation as misleading.

Both organisations used relevant data that supported their claims. The data was sourced from a neutral third party (NASA’s GISS). However, the presentation was manipulated by A to obscure the trend.
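The axis trick is easy to quantify. The sketch below uses made-up round numbers (not the actual NASA GISS figures) to show how the same temperature rise fills a very different share of the chart depending on the y-axis range each organisation chose.

```python
# Illustrative only: suppose the global average temperature rose from
# 57.0 to 58.5 deg F over the period shown (invented round numbers).
rise = 58.5 - 57.0  # 1.5 deg F

# Organisation A's axis: -10 to 110 deg F, a 120-degree span.
span_a = 110 - (-10)
# Organisation B's axis: zoomed in to, say, 56 to 59 deg F, a 3-degree span.
span_b = 59 - 56

# The share of the chart's height the same rise occupies on each axis:
height_a = rise / span_a  # about 1% of the chart: the line looks flat
height_b = rise / span_b  # half the chart: the line climbs steeply

print(f"A: rise fills {height_a:.1%} of the y-axis")
print(f"B: rise fills {height_b:.1%} of the y-axis")
```

Same data, same rise, yet one chart shows a trend forty times taller than the other. Neither axis is “false”, which is exactly why the source and the framing both need checking.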

My perspective: Seeing should not immediately lead to believing because the data might be selectively or “sexily” presented. The first is only sharing data that supports preconceived notions; the second is using elaborate or compelling-looking visuals to disinform or lie.

A side note: Have you ever noticed that “lie” sits at the centre of “believe”?

More people are probably aware of the importance of managing their personal data post-Cambridge Analytica. But how many know how data is mined, managed, or misused?


Video source

The video above provides a broad explanation of data mining. It distills the processes and purposes of data mining into five strategies:

  1. Classification
  2. Regression
  3. Clustering
  4. Anomaly detection
  5. Association learning
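To make one of the five strategies concrete, here is a bare-bones sketch of clustering (strategy 3) using the classic k-means idea: assign each point to its nearest centre, move each centre to the mean of its points, and repeat. The data is a toy set of two obvious groups with invented values; real data miners would use a tested library rather than this hand-rolled version.

```python
import random

random.seed(0)

# Toy data: two obvious groups of 2-D points (think customers plotted by
# two scaled attributes). Crucially, the group labels are NOT given.
points = ([(random.gauss(1, 0.2), random.gauss(1, 0.2)) for _ in range(50)] +
          [(random.gauss(5, 0.2), random.gauss(5, 0.2)) for _ in range(50)])

def kmeans(points, k=2, iterations=20):
    """Bare-bones k-means: assign points to the nearest centre,
    move each centre to the mean of its points, repeat."""
    # Deterministic start: seed the centres with the first and last points.
    centres = [points[0], points[-1]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centres[i][0]) ** 2 +
                                        (p[1] - centres[i][1]) ** 2)
            clusters[nearest].append(p)
        centres = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
                   if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters

centres, clusters = kmeans(points)
for c, members in zip(centres, clusters):
    print(f"centre near ({c[0]:.1f}, {c[1]:.1f}) with {len(members)} points")
```

The algorithm recovers the two groups without ever being told they exist, which is the appeal of clustering, and also where human bias sneaks in: someone still chooses the attributes, the number of clusters, and what the groups are then used for.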

I enjoyed the video. Unlike the press that focuses on negatives, the video highlighted the benefits of data mining, e.g., predicting disease before it emerges.

It also made a subtle point that is easy to miss, i.e., the bias introduced by humans who make decisions on which data to focus on and what to do with it.

Data, its collection, and its management are not right or wrong in themselves. It is what we choose to do with it that makes the difference.

We shape our tools and then our tools shape us. — Marshall McLuhan
