Here’s why web privacy matters (archived post)

Websites can collect a massive amount of data about you. Most people, though, respond to this fact with some variant of “I don’t have anything to hide.”

Keeping your data safe online doesn’t mean you’re hiding; let’s explore some perfectly legitimate reasons to keep your data away from websites.

What counts as “data”?

First, let’s look at what websites mean when they say they’re collecting your data.

It’s simple: “data” is any information that flows from you into a site’s database when you visit, rather than from the site out to you. How dangerous that collection is varies, but there are two fundamental types of data.

Anonymized Data

Anonymized data contains no information about who you are, just about what you’ve done. Ideally, websites can’t even tell if more than one piece of data comes from the same user.

The only data this website collects from you (other than the comment and contact forms) is anonymous. This currently comes in the form of two pieces of data:

Hit counts – Each time an article is loaded—by human or bot—a counter ticks up by one, helping me gauge how popular the article is. This method isn’t very accurate since multiple reloads of a page count separately. Still, it gives me a sense of which types of articles are popular, helping me focus on topics people are interested in.
Reaction counts – On this site, no personal information is collected when you react to an article with an emoji. While this does mean people can cheat the system, it gives me more accurate data than the hit count, which I may remove entirely in light of how successful this feature has been. For more information about how the reaction counter works, visit its dedicated article.
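To make the idea concrete, here is a minimal sketch of an anonymous hit counter. It is not this site's actual code; the `record_hit` function, the in-memory `hit_counts` store, and the article slugs are all invented for illustration. The point is that nothing about the visitor is ever passed in, so hits cannot be linked to a person.

```python
from collections import Counter

# Hypothetical in-memory store; a real site would persist this somewhere.
hit_counts = Counter()


def record_hit(article_slug: str) -> int:
    """Add one hit for an article and return the new total.

    No IP address, cookie, or user agent is accepted or stored,
    so two hits can never be tied to the same visitor.
    """
    hit_counts[article_slug] += 1
    return hit_counts[article_slug]


record_hit("web-privacy")
record_hit("web-privacy")  # a reload counts again, which is why the number is rough
record_hit("recaptcha")
```

Because the counter knows only the article, it cannot tell a reload from a new reader, which is the inaccuracy described above.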

Identifiable Data

If data contains any information about you that could be cross-referenced with other data to provide more insights into your behavior, I refer to it as “identifiable” data. In short: if there is any way for a website to tell who you are or whether you’ve visited before, it is collecting identifiable data.

Sometimes websites have perfectly legitimate reasons to collect this data:

To put your name next to your comment, like this site does. On this site, though, you are not required to use your real name, and you do not need an account to post a comment.
To provide a service. Most sites where you have an account store information about you to help complete the jobs you’ve given them. Be careful, though; it’s been said that if a product is free, you are the product. In other words, the way most free services make money is through the data you give them. We’ll talk more about this in a moment.
To show the information you’ve chosen to reveal. Social media, for example, takes the information you give it and displays it to those who follow you. Most social media platforms will also collect data for their own purposes, but not all.
To save settings. This website stores data about your chosen color theme, font size, and other settings. However, it keeps that data in cookies on your own computer rather than on the server, and uses it only to display what you’ve selected. I, as the website owner, never see any of that data.
To identify spam. Sometimes, so many messages will come from the same computer that sites can tell it is probably owned by a spammer. However, to recognize this, they need to keep track of which computers different messages come from.
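The spam case shows why some identifiable data is hard to avoid. Below is a rough sketch, assuming a site keeps each message alongside the address it came from. The threshold, the function name, and the example addresses are all made up, and real spam filters are far more involved.

```python
from collections import Counter

MAX_MESSAGES_PER_SOURCE = 3  # hypothetical threshold, chosen only for the example


def find_suspected_spammers(messages, limit=MAX_MESSAGES_PER_SOURCE):
    """Return the source addresses that sent more than `limit` messages.

    `messages` is a list of (source_address, text) pairs. Storing the
    source address is exactly what makes this data identifiable.
    """
    per_source = Counter(source for source, _text in messages)
    return {source for source, count in per_source.items() if count > limit}


# Addresses come from ranges reserved for documentation.
messages = [("203.0.113.7", "buy now")] * 5 + [("198.51.100.2", "nice post")]
suspects = find_suspected_spammers(messages)
```

Without the source address in each record, the count per computer could not be made at all, so the site would have no way to spot the flood.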

There are more legitimate uses, but they’re not the focus of this article.

More and more, data on the web is being used in ways that are not in users’ best interests.

Ways Sites Make Money from Data

Here are some ways you probably didn’t realize sites you’re visiting are making money through the data they collect about you.

YouTube and most social media platforms

Social media websites collect an enormous amount of data about their users. They feed all of this data into special programs called Machine Learning models, which learn how to trap you on the site from that data. (I plan to do a dedicated article on Machine Learning and how it works soon.)

Social media platforms (and YouTube) make money by keeping you on the app or site as long as possible so you’ll view more ads. By comparing how likely people are to keep watching when shown specific recommendations or design choices, these sites learn to waste our time far more efficiently.

That’s right; you waste more time than you mean to on some websites because they are intentionally designed that way. Not only that, the more data they have about you, the better they can achieve the goal of wasting your time.

That is not how I want data about me to be used. Using privacy-protecting apps and services helps me avoid being trapped on sites.

reCAPTCHA

This is an odder one.

You’ve all seen reCAPTCHA, Google’s tool that checks if “I’m not a robot” is really true.

Realistically, though, it doesn’t stop robots. I won’t go into the reasoning behind that now; I plan to write a full post on reCAPTCHA and its ineffectiveness in a week or two. For now, go ahead and assume this premise is true.

Even though reCAPTCHA has little practical value, Google has an excellent monetary incentive to keep it running: all the valuable data they’re collecting.

Remember the Machine Learning models I mentioned earlier? A commonly known application of them is to teach self-driving cars what to do in different situations. The problem is that these models need massive amounts of data to draw meaningful conclusions.

Some of you may have already put together the connection: Google uses your selection of which pictures contain traffic lights or crosswalks to teach real self-driving cars how to recognize traffic lights and crosswalks. In part, Google’s self-driving cars were taught by your frustrated clicking of squares with blurry images.

I certainly don’t think this use is justified: First, I’m not clearly told that Google will do this; a “privacy policy” link is not enough. Second, it makes me responsible—in part—for any issues self-driving cars have. Finally, I don’t have much choice in whether or not I fill out these “I’m not a robot” boxes, even if I have read the privacy policy and decided I disagree with it. (Though don’t worry; my dedicated reCAPTCHA article will offer a way around them.)

reCAPTCHA’s other moneymaking quality ties in with our next entry.

Web Analytics (primarily Google Analytics)

Some companies like Google provide a free service to website owners: automating visitor data collection, management, and analysis.

This is not an altruistic service; Google, for example, runs the world’s most profitable advertising business, and every scrap of data they can collect about what sites people visit and what they’re doing on those sites helps them learn how to show people ads they’re more likely to click on.

In fact, Google’s influence goes beyond sites that use its analytics software; Google collects users’ data through reCAPTCHA, embedded YouTube videos, sites with Google Ads, and web searches through Google. By cross-referencing this data, Google can get a highly detailed picture of who you are and what you do online, especially if you’re logged into your Google account—which you probably are.
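Cross-referencing is easier to picture with a toy example. The sketch below is not how Google actually does it; the records, the `visitor` field, and the `build_profiles` function are invented. It only shows that once separate sources share one identifier, merging them into a profile takes a few lines.

```python
from collections import defaultdict

# Invented records: three different sources that happen to see the same visitor ID.
events = [
    {"visitor": "abc123", "source": "analytics", "detail": "read a cooking blog"},
    {"visitor": "abc123", "source": "embedded video", "detail": "watched a knife review"},
    {"visitor": "xyz789", "source": "search", "detail": "searched for flights"},
    {"visitor": "abc123", "source": "search", "detail": "searched for chef's knives"},
]


def build_profiles(events):
    """Group events from every source under the visitor ID they share."""
    profiles = defaultdict(list)
    for event in events:
        profiles[event["visitor"]].append((event["source"], event["detail"]))
    return dict(profiles)


profiles = build_profiles(events)
```

Each record says little on its own, but the merged list for `abc123` already suggests someone shopping for kitchen knives.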

I’ll bet you didn’t realize how pervasive Google’s view of your web browsing is, did you? In a study of the 50,000 most-visited websites, 86% had code that let Google track their users. Until a few days ago (when I removed reCAPTCHA), this site was one of them. There are still a couple of pages that have Google code, but I’m going through those and removing it.

The right browser settings can stop Google from tracking you between sites, but most people use Google’s own web browser—Chrome—which certainly does not have those settings enabled by default. Using Chrome is an invitation to Google to track you. Instead, I recommend Brave, which is based on the same code as Chrome and has a very similar look and feel, or Firefox with the right settings enabled—I’ll be discussing which settings those are in an upcoming article.

The Weakest Link

You may not care about all of this. “Who cares,” you say, “if these big companies have my data. At least they’ve promised not to share it.”

Well, there’s still a problem: websites get hacked. If your data is in a company’s database and that database gets exposed to someone who shouldn’t have access to it, bad things can happen. (What those things are, we’ll explore later in this series.)

You’d be surprised how often this happens. You can actually check this website to see if any accounts connected to your email address or phone number have been exposed in a data breach. In my case, I discovered that two companies I had accounts on—Canva and Wattpad—had been hacked, revealing emails, names, dates of birth, locations, and more.

Luckily, I hardly used those accounts, but it demonstrates how easy it is for larger companies—often ones we think impervious to hackers—to lose control of our data.

Whether or not you trust the companies you’re letting collect your data, each company is another access point for hackers to find that data. And believe me, hackers are far less benevolent than even Google, Facebook, and the like.

Conclusions

On Saturday, I wrote about trust, who should be trusted, and what they should be trusted with. In this article, I’ve looked more closely at whether we should be trusting the sites we visit with our data.

Companies that collect our data do not usually have our best interests in mind. On the contrary, most for-profit websites will do anything legal that has a monetary benefit, regardless of how users feel.

In this article, we’ve looked at privacy on the web and decided that it affects all of us. “I have nothing to hide” doesn’t address the main issue; ignoring data privacy allows websites to learn how to better waste our time and take our money. To better control our time and money, we need to control the data websites collect about us.

Application

In the next article in this series, we’ll examine one of the most important steps you can take to ensure better web privacy: choosing a better web browser. Before then, though, you can get started by adjusting your Google privacy settings.

Google is one of the biggest offenders when it comes to digital privacy. However, they are legally required to give you options to control what data they collect. Here are some steps you can take to exercise those rights:

Take Google’s Privacy Checkup. Keep in mind that this checkup only contains things Google doesn’t mind you changing, so this is only a start.
Go through the Data and Privacy dashboard and make sure that Google only has access to the information you want them to have. I’d especially recommend you switch off Location History; otherwise, Google keeps a record of everywhere they know you’ve been since you created your account. I imagine this applies the most if you have an Android phone or use Google Maps. “Web and App Activity” is another item to switch off; it involves a large portion of Google’s tracking.
Consider changing your password. I know this won’t protect you from Google, but it does help keep other people from gaining access to your account. I recommend choosing a password at least twelve characters long. Though it’s okay for you to have a way to remember it, your password should not include any personal information such as your name, birth date, anniversary, zip code, or any other information that might be available online or to anyone you know—even close friends.
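If you’d like to sanity-check a candidate password against the two rules above (at least twelve characters, no personal information), here is a small sketch. The `check_password` function and the sample values are hypothetical, and passing these two checks does not by itself make a password strong.

```python
def check_password(password: str, personal_info: list[str], min_length: int = 12) -> list[str]:
    """Return a list of problems found; an empty list means both rules passed."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    lowered = password.lower()
    for item in personal_info:
        # Case-insensitive substring match; blank entries are skipped.
        if item and item.lower() in lowered:
            problems.append(f"contains personal information: {item}")
    return problems


# Made-up personal details for the example.
my_info = ["Alex", "1990", "90210"]
weak = check_password("alex1990", my_info)
better = check_password("correct-horse-battery", my_info)
```

Run it only on your own machine, and never paste a real password into a website that offers to check it for you.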

See you next time, when we’ll look at how to choose and configure web browsers!