the learning machine (archived post)

by benjamin hollon on february 8, 2022

Multiple times in my series on digital citizenship, I’ve referred to “Machine Learning models.” I want to unpack what they are and why they affect the ways we behave online.

I’ve written about this topic before, but I’d like to start by summarizing what Machine Learning actually is, perhaps from a different perspective than my earlier article took.

What is Machine Learning?

Long before computers were powerful enough to run them, we dreamt of programs that could make intelligent decisions. One common term for this is “Artificial Intelligence” (though I prefer “Synthetic Intelligence,” which avoids some of the misconceptions people attach to AI).

To allow machines to successfully mimic intelligence, we need to teach them to learn. Computer scientists have built systems called “neural networks” that mimic how people make connections between concepts, which has been a significant step toward computers that can learn. Once a network has been trained, the result is known as a “Machine Learning model,” which can be reused to solve the same kind of problem in future scenarios.

To create a Machine Learning model, we give a neural network many examples of a question paired with its answer and ask it to work out the connection between the two. For example, we might plug in health data, labeling each record with whether or not the person involved developed cancer. The computer could then find relationships in that data that help us detect cancer sooner.
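Here’s a minimal sketch of that question-and-answer training in Python. It uses a single artificial neuron (about the smallest thing you could call a neural network) rather than a real medical model, and the data, the “risk marker” feature, and every function name are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Return the model's estimated probability (0 to 1) for one record."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def train(records, answers, epochs=5000, learning_rate=0.5):
    """Nudge the weights toward the known answers, one record at a time."""
    weights = [0.0] * len(records[0])
    bias = 0.0
    for _ in range(epochs):
        for features, answer in zip(records, answers):
            error = predict(weights, bias, features) - answer
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Made-up data: one hypothetical "risk marker" per person, scaled to 0..1,
# paired with whether that person later developed the illness (1) or not (0).
records = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
answers = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train(records, answers)
low_risk = predict(weights, bias, [0.15])
high_risk = predict(weights, bias, [0.85])
print(f"low marker: {low_risk:.2f}, high marker: {high_risk:.2f}")
```

Nobody tells the program where the cutoff between “low” and “high” sits; it works that out from the examples, which is the whole point. Real models do the same thing with far more inputs and far more data.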

Uses of Machine Learning

While there are amazing uses of Machine Learning, such as detecting when patients are at risk of cancer, there are also far less altruistic uses. In fact, training Machine Learning models is a primary motive websites have for collecting all your data.

Google

Take YouTube (owned by Google), for example. YouTube collects detailed data about whether or not people play the next videos recommended to them. YouTube can then train a Machine Learning model on that data, instructing it to learn which videos it should recommend to get you to watch the next video. More specifically, YouTube wants you to watch as many advertisements as possible. Most social media websites use Machine Learning for similar purposes.
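To make that concrete, here’s a rough sketch of learning from “did they play the recommendation?” data. This is not YouTube’s system, which isn’t public and is certainly far more complex; it’s a simple counting model standing in for one, and the log entries and function names are invented for the example.

```python
from collections import defaultdict

def fit_play_rates(log):
    """Estimate how often each recommendation gets played after each video."""
    shown = defaultdict(int)
    played = defaultdict(int)
    for current, recommended, was_played in log:
        shown[(current, recommended)] += 1
        if was_played:
            played[(current, recommended)] += 1
    # Smoothing (+1 / +2) keeps rarely shown pairs from looking like sure things.
    return {pair: (played[pair] + 1) / (shown[pair] + 2) for pair in shown}

def recommend(rates, current):
    """Pick the recommendation most likely to be played after `current`."""
    candidates = {rec: rate for (cur, rec), rate in rates.items() if cur == current}
    return max(candidates, key=candidates.get) if candidates else None

# Invented viewing log: (video being watched, video recommended, was it played?)
log = [
    ("cat video", "more cats", True),
    ("cat video", "more cats", True),
    ("cat video", "more cats", False),
    ("cat video", "tax advice", False),
    ("cat video", "tax advice", False),
    ("cat video", "tax advice", True),
]

rates = fit_play_rates(log)
best = recommend(rates, "cat video")
print(best)
```

Every row in that log is a piece of behavior someone had to collect, which is why the incentive to track what you click is so strong.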

Gmail has recently begun using Machine Learning for a novel purpose: suggesting replies to emails. I’ve found it disturbingly accurate at times, matching my voice extremely well.

Here’s my problem with that: it means Google is letting a script read and process all of my email, trying to learn how to mimic me. I find that amount of power disturbing. I may have been fine with Google storing my emails, but I was not expecting Google to use them this way without asking.

GitHub/Microsoft

Disturbing uses of Machine Learning abound; GitHub (owned by Microsoft) recently launched Copilot, which can write code for you from the instructions you give it.

Now, GitHub may hold more code than any other site; most open source projects are hosted on it, and many large organizations use it for collaboration. To get data to train Copilot, they fed all public code on the site into a Neural Network.

This concerns me primarily for copyright reasons. Even though most of this code was open source, many open source licenses don’t simply allow use for any purpose. For example, my preferred license for large projects, the GNU General Public License (GPL), requires anything built from my code to be distributed as open source under the same license. Linux, one of the most successful Open Source projects ever, thrives in part because of its GPL-2.0 license: anyone who distributes a modified Linux kernel, as systems like Ubuntu, Fedora, and Android do, has to release those kernel changes under the same license.

Copilot, on the other hand, was trained on code from public projects without paying attention to the license. While the output of Machine Learning models may not technically be able to violate copyright in most nations, using the code this way certainly violates the spirit of Open Source. Code from Verbose Guacamole, my Free and Open Source novel editor, might show up in some nonfree software because Copilot made it available.

There are other cans of worms related to Copilot, including that it seems to sometimes copy code verbatim rather than just learning from it. But even using the code to train a model disappoints me, and it has me seriously considering moving all my own code elsewhere.

Data

The crucial piece of a machine learning model is data: the more data you have, the more accurate the results will be. On the other hand, processing more data takes the neural network more time and, by extension, more money. This limits the most effective business solutions to big companies:

Machine Learning has been a viable tool in businesses’ toolboxes for years now, and they’ve honed some strategies to get hold of more data:

On a positive note, I read that Austria’s data protection authority recently ruled that a website’s use of Google Analytics violated the GDPR, because of how it transfers personal data to the United States. I’m hoping this will cause change in both directions: Google will mend some of its policies, and fewer websites will use its services.

Not all data used in Machine Learning models is user data, but models using user data are often the most harmful. Google takes data from reCAPTCHA, for example, and uses it to train their self-driving cars. That’s not directly harmful to users, though it does incentivise Google to push a false view of reCAPTCHA’s actual usefulness. For users, the more harmful part is that reCAPTCHA lets Google see which of the sites you visit embed it, tracking what you’re interested in so they can train their Machine Learning models to give you better ads and waste your money.

The Good

Despite all this, Machine Learning is a very good thing. I don’t want to leave you with a wrong impression, so I’ll finish with some good things Machine Learning has helped us achieve.

I could go on, but I’ll cut the list off there for now. I don’t want to spread FUD (fear, uncertainty, and doubt) about Machine Learning; it really is an amazing technology, and I’m immensely excited to see where it takes us in the coming decades, despite its negative privacy consequences.

Conclusions

Machine Learning is an amazing field of software that allows computers to learn in ways that mimic our own amazing brains. The positive uses of it are overwhelming, but as conscious digital citizens we need to be aware of the ways companies use it against us:

Remember those issues with Machine Learning, since knowing what companies are doing wrong tells you what to avoid, but never forget that plenty of companies are doing amazing things with it. It’s our responsibility to make sure companies use Machine Learning responsibly.

The future is bright, but the box says “Some Assembly Required.”

Application

First off, I recommend you do some of your own research on Machine Learning. I’m not an expert, and it’s perfectly possible I’ve made some incorrect inferences. Look at the information available and decide for yourself what you think about Machine Learning.

If you want to try making a Machine Learning model yourself, with no coding knowledge required, Google made an amazing tool named Teachable Machine that’s worth checking out.

And yes, I am recommending a Google product. Not everything they make is bad.

Perhaps that’s the moral of this article: not everything is bad.

