AI Content Generation Is Bad for the Internet. Here’s Why

In this blog, we share our opinions on AI content writing tools and show you why AI content generation could make the internet a worse place.

Before we start, we need you to relive your childhood. Take yourself back to when you were just three years old. You know nothing about the world. You see a bright, fluid-like thing dancing in the air.

You get excited, and your eyes brighten up. You move closer and touch it, only to feel the pain caused by the extreme temperature. You quickly pull your finger away. A few days later, you see that thing again, but this time you know it is hot and painful to touch. You no longer feel safe touching it.

Later in life, you learn that it is called fire, and you come to associate brightness with heat: brighter light means a higher temperature. Now come back to the present. We have just seen that we humans learn through experience, feedback and adjustment.

First, we experience something (like the fire above), our system creates feedback about it (how it felt – hot) and we adjust our behaviour (not touching the fire again). Any intelligent system has to have these three things in place. Together, they form what we call a feedback loop. Through this feedback loop, we keep updating our brain with new information and new adjustments. So experiences, environments and contexts are the primary ingredients for any species to develop intelligence.

Following a similar approach, we developed artificial intelligence systems. Replace experience with data, feedback with loops (programming) and adjustments with output. Let’s take a classic example of an “intelligent machine” predicting house prices accurately. We feed data to the machine: the size of the house, the number of floors, bedrooms and bathrooms, and so on, along with the actual price of each house. This data counts as experience for our intelligent system. Our machine learns from this data and tries to predict house prices. It compares its predicted price with the actual price and notes the difference. The difference acts as feedback to the system. The machine keeps adjusting itself until the difference between predicted values and actual values is small. Although this is not how all AI systems work, it gives you the idea that every intelligent system requires data (experience), loops (iterations) and adjustments.
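The loop described above can be sketched in a few lines of Python. This is a toy linear model trained by gradient descent; the data, features and numbers are invented purely for illustration, not taken from any real model.

```python
# A toy sketch of the feedback loop: a linear model predicting house
# prices, trained by gradient descent on made-up data.

# Each sample: (size in hundreds of square metres, bedrooms) -> price in $1000s.
# Sizes are pre-scaled so a single learning rate trains stably.
data = [
    ((0.5, 1.0), 150.0),
    ((0.8, 2.0), 240.0),
    ((1.2, 3.0), 350.0),
    ((2.0, 4.0), 540.0),
]

weights = [0.0, 0.0]  # one weight per feature -- the "adjustments"
bias = 0.0
lr = 0.05             # how strongly each piece of feedback nudges the model


def predict(features):
    return sum(w * x for w, x in zip(weights, features)) + bias


for epoch in range(10_000):
    for features, actual in data:
        error = predict(features) - actual   # feedback: predicted vs actual
        for i, x in enumerate(features):
            weights[i] -= lr * error * x     # adjust each weight a little
        bias -= lr * error

# After training, the model prices an unseen house (100 sq. m, 2 bedrooms):
print(round(predict((1.0, 2.0))))
```

Note that the model never “understands” houses: it only shrinks the gap between its predictions and the historical prices it was shown, which is exactly the experience–feedback–adjustment loop in action.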

Now, if we turn our attention to the AI systems that generate content, we realise that such systems also require data as experience to learn how to form words, sentences, essays and topics. AI systems that deal with language are called Natural Language Processing (NLP) systems. At this point, it is obvious that an NLP system with a lot of experience (data) will be better at writing essays than one with less experience (data). In other words, an artificially intelligent system is only as good as its data.

Take GPT-3 by OpenAI as an example. This NLP model was trained on a vast swathe of the text available on the internet! That is why it is so good at writing content. The content on the internet is written by people of every age, race and era. It is as if GPT-3 has the combined experience of people from all those different backgrounds – or perhaps we are just exaggerating its capabilities. Whatever the case, it is tempting to use it for creating content. And we say: please don’t! There is a fundamental difference between how humans and AI systems work, and we want the web to be a better place.

So far, we have seen almost no difference between how humans and machines learn. But there is one, and it is fundamental. Machines tune their outputs (behaviour) based on the data we feed into them, and this data is mostly historical. Our “intelligent system” predicting house prices could only do its job because it knew what houses of various sizes, with various numbers of rooms and bedrooms, had sold for in the past. Any present-day artificially intelligent system has access only to the data we feed into it. This has some serious consequences.

One of the consequences is that such AI systems will have some kind of bias, introduced through the data. Faulty input data causes faulty output data. Sometimes the data is incomplete, or vital data points are missing, which can make the AI system faulty. But we are not concerned with that kind of problem here, because it can be solved (to some extent) by improving the training data and algorithms. What we are concerned about is what we call conformism.

Conformity is when someone tries to match their attitude, lifestyle and decision-making to the commonly accepted norm. The question is: who sets the norm for the AI system? It is the data fed into it. Any AI system forms its decisions based on the data it was trained on. If we ask such a system for something genuinely new – say, a solution to a never-seen-before problem – it simply won’t be able to deliver. Instead, it will throw out solutions that are in line with its historical data. How could it come up with a solution for a problem it has never dealt with? It can only give ideas and solutions that are conforming in nature. Humans also suffer from conformism, but they have something called meta-thinking: they can recognise that they are being conformists and choose to break out of it. If we had assigned an AI to solve the problem of transportation back in the 1880s, it would have found various ways to make horses run faster. It would never have come up with something like a car! A car was a ‘never seen before’ thing.

We can follow the same logical path and argue that AI content writing tools can never come up with new ideas, thoughts and essays that break our conventional wisdom and accepted norms. This is one of the biggest negative impacts of artificial intelligence on content. At best, such tools can produce essays built on previous ideas. Humans are able to break out of conformity because they have metacognition – thinking about thinking, or thoughts about thoughts. Humans can question what they produce and form new ideas out of intuition. This is what sets humans apart from even the most advanced AI systems. What would happen to the web if every article were produced by a bot that merely extends or refines previous articles, with each piece of content reinforcing the same old content? Growth would stop. Content would move along a smooth horizontal line without ever experiencing a vertical jump! It would create a closed shell of understanding for anyone reading content online.

Even if you ignore conformism, there is another basic problem with present NLP systems. All of today’s NLP models, like GPT-3 by OpenAI, are merely statistical beasts that predict what the next word or letter should be without understanding what it means. They lack understanding: they don’t grasp the meaning behind symbols, letters and words. They are prediction machines built on historical data. They can guess the next word, sentence or missing words, but they can never check whether the guess is wrong. Such language models can guess what one plus one is, but not what 2113 + 800.008 is! That is because language models are trained on examples, while humans learn principles. Humans can add any two numbers not because they have seen examples of every kind, but because they know the underlying principle of addition.

Now, even if we assume a very high level of technological advancement in the future and AI systems that possess understanding, they would still suffer from conformism and could never break out of it until humans retrained them on new data. Now you see the problem. It is deeper than you think. Let us know your thoughts in the comments below.
