3 Misconceptions about Data Privacy


Misconception 1: Thinking that data privacy is about content, not context. Data collectors are often more interested in our metadata than in the content we produce.
Misconception 2: Thinking that data is collected purely for our benefit. Instead, our behavioral data is used to build predictions that are sold to companies seeking to shape consumer behavior.
Misconception 3: Thinking that data privacy is only for people with something to hide. It is a value that should mean something to all of us.

I share my data on the internet daily, but I barely think about what I’m sharing or whether it matters. When a website invites me to read its Cookie Policy, I automatically click “accept”. When the General Data Protection Regulation took effect in 2018, I was annoyed at the flurry of emails I got about changes to privacy policies. I didn’t think “Great, the government’s doing more to protect my data!” When I found out that Google Photos offered unlimited storage, I was ecstatic and immediately used it to free up space on my iPhone. I didn’t ask “What will happen to my photos?”

Recently, I tried to learn what’s at stake when companies ask for my data. I discovered that I had at least three big misconceptions about data privacy.

Misconception 1:

Thinking that data privacy is about content, not context

We often think about data privacy in terms of content rather than context. But companies that collect our data are often less interested in the words we use in an Instagram post than in when we posted it, where we posted it, and who saw it. This is the distinction between data and metadata.

Metadata is data about data. It is a cluster of tags and markers that is generated automatically whenever you do anything online. Here are some examples of the distinction between data and metadata for various forms of digital communication:

  • Email. Data: the words in the email; Metadata: the time you sent the email, the location from which you sent it, and who owns the computer it was sent from.
  • Phone calls. Data: the sound of your voice, the words you say; Metadata: the date and time of the call, how long the call lasted, the number from which the call was made, the number being called.
  • Social media posts. Data: the post itself, the picture you posted, the caption below the picture; Metadata: where and when you posted, how many likes and shares the post gets, how long people spend looking at it.

From the perspective of the data collector, metadata is invaluable. First, metadata is often what makes data useful. The volume of digital communication on any platform is far too large for anyone, including sophisticated tech companies and government agencies, to sift through rigorously. Metadata helps collectors narrow down which content is worth paying attention to. Second, the unwritten and unspoken information contained in metadata can tell data collectors much more about patterns of behavior than the content itself.

From the perspective of the data producer (you), learning that metadata is what data collectors care about can be worrying. While content is something you produce knowingly, metadata is produced automatically and unconsciously as a byproduct of most online activities. You might pause and think about what to write in a Facebook post, but you probably don’t pause to consider whether to click on one. What’s more, to the extent that data collectors let you access data about yourself, the data you can access is the stuff you were conscious of in the first place. The stuff you don’t think about remains in the hands of the data collector.

After the Cambridge Analytica scandal in 2018, Facebook made it easier for users to download a range of their personal data. But this download covered only what users themselves had provided, including information they had deleted: friends, photos, videos, pokes, posts and so on. It does not give users access to most types of metadata, including details of the targeting analyses that determine which ads you are shown on Facebook. Consider a hypothetical situation in which you download a meme from a group that spreads propaganda and share it on a family WhatsApp group chat. Using metadata about your downloading and sharing behavior, Facebook could tag you as a user who believes in such propaganda, and then let political parties or companies use that tag to target you with ads or more propaganda. Your behavior determines what tags you get, but only the data collector knows how you’ve been tagged.

Misconception 2:

Thinking that data is being collected only for your benefit

It’s easy to think that any data a private company collects is for the users’ benefit. To some extent, that’s true: the cookie policies tell me that websites want to enhance my experience, don’t they?! But examining the economic structure that has enabled Google, Amazon, and Facebook to become some of the most valuable companies in the world shows that the main beneficiaries of data collection are not the people producing the data. Instead, they are the private companies that use this data to predict and shape consumer behavior.

In her book “The Age of Surveillance Capitalism”, Shoshana Zuboff argues that surveillance capitalist firms often collect data for reasons unrelated to product improvement. Data from our consumption behavior (the posts we like on Facebook, the links we click on Amazon, the words we type into the Google search bar) is collected and fed into artificial intelligence systems. These systems use algorithms to predict our future behavior (what we’ll buy next on Amazon, what we’ll watch next on Netflix). These prediction products are then sold to other companies, which use them to market their products to the right groups of people. In short, our behavior on Google, Amazon, and Facebook becomes raw material for “products” that these companies sell to other businesses. Our behavioral data is not necessarily recycled into the system to improve our experience on these platforms.

This shift from collecting data for the benefit of the consumer to collecting data for revenue is nicely illustrated by Google’s journey. Early on, Google used data byproducts such as search terms, query phrasing, and click patterns to improve its search engine. By learning from what people clicked on after a search, Google’s algorithms produced more relevant and comprehensive results for its users. Over time, however, Google began collecting more and more of these byproducts, so that each time a user queried the search engine, a matching algorithm could select a particular ad, in a specific configuration, to show that user. The higher the share of users who actually clicked on an ad (the “click-through rate”), the more private companies wanted to advertise on Google. This is the essence of targeted advertising.

In her book, Zuboff distinguishes between standard market capitalism and surveillance capitalism. In a standard market economy, the main economic exchange takes place between the user and the firm: the user buys a product from the firm (e.g. a car from Ford). This transaction ensures some degree of reciprocity between consumer and producer. Data from the user is collected to improve the product so that users remain loyal to the company. In contrast, in a surveillance capitalist economy, you don’t pay Google anything when you type words into a search bar or watch a YouTube video. The main economic exchange takes place between Google and other businesses that want to make use of the data Google has collected. The absence of a direct transaction between consumer and producer removes any need for reciprocity between the two. The dominance of these companies in the marketplace reduces that need even further, as there is little perceived threat that consumers will leave for another platform.

Misconception 3:

Thinking that data privacy is only for people with something to hide

My excuse for not thinking more about data privacy has always been something along the lines of: I don’t produce anything worth collecting or analyzing! But data privacy isn’t reserved for people who think their safety is at stake.

First, privacy means something to everyone, which makes data privacy a value we should all care about to some degree. Furthermore, data privacy is connected to many other values that you probably do care about: autonomy, liberty, democracy. Paying no attention to data privacy is therefore a disservice to these other values.

Second, data privacy is an inter-temporal matter. The data you disclose today can affect your future self. Just because privacy doesn’t mean much to you today doesn’t mean that it won’t mean something to you tomorrow.

Third, data privacy is an interpersonal affair. On most social networks, your online behavior says something about whom you’re connected to. So one person’s behavior can have knock-on effects on the privacy of the people they’re connected to. As Edward Snowden writes in his autobiography, “Permanent Record”,

“Because a citizenry’s freedoms are interdependent, to surrender your own privacy is really to surrender everyone’s… saying that you don’t need or want privacy because you have nothing to hide is to assume that no one should have, or could have, to hide anything – including their immigration status, unemployment history, financial history, and health records.”

Going forward

Valuing data privacy requires more than just saying that I care about privacy. It requires actively doing things that are consistent with this value. Maybe I’m not ready to abandon Google for DuckDuckGo yet. I’m probably also not going to give up Google Photos. But maybe I will pause to read the fine print before I click “I accept” the next time a website presents me with a policy.