As artificial intelligence (AI) becomes more popular, some people are starting to dream big. They consider the concept of Artificial General Intelligence, more commonly known as AGI, feasible. I cringe at nearly every mention of “we have almost achieved AGI”.
Well, let me be the first to disabuse you of that notion — it’s a hallucination at worst and pure science fiction at best.
Understanding AGI
For the uninitiated, AGI is a theoretical AI system that aims to match human versatility and problem-solving capabilities across multiple disciplines. This contrasts with narrow AI, which is designed to handle specific tasks. Essentially, AGI would be able to:
- Work on diverse tasks, using general intelligence on par with a human’s, without being confined to a specific set of functions;
- Apply knowledge to contexts it hasn’t previously experienced.
So, if the AGI evangelists pull this off, machines would basically reach the stage of “whatever a human can do, I can do too, maybe even better.” Except this is not possible. It’s a theory, but a completely infeasible one.
I work with AI and data every day as a software engineer at the Wikimedia Foundation (WMF), the non-profit that runs Wikipedia. Because of this, I see a lot of talk about AGI. I'm writing this to explain why it's a pipe dream.
Flaws in AI training data
When I argue that AGI is impossible, I think about the data we’re already using to train our AI systems to be intelligent. There’s a significant flaw in this approach. For one, the amount of inaccurate and irrelevant data being fed into these models is staggering, and as the datasets grow, so does the risk that the training data is itself AI-generated, leading to the dreaded snake-eating-its-own-tail scenario where models end up learning from their own output.
Not many are willing to do the hard work of vetting the data used for training, mainly because it becomes tedious or straight-up impossible once the volume of data enters the petabyte range.
It’s similar to the cryptocurrency craze—people are excited about the output without understanding the underlying blockchain.
At this point, hundreds, even thousands, of websites are producing AI-generated content. This content, often filled with jokes, output that makes no sense without context, and outright lies, is used to train models. These models, in turn, accept this content as real because, let’s face it, they can’t tell the difference between what’s true and what’s a bluff. This causes classic hallucinations.
Say we managed to pull off an almighty AGI system: imagine the amount of biased, incomplete data, aka "poison," that would flood into it. As the saying goes, "garbage in, garbage out."
It raises the question: would the AGI system then be intelligent?
Human intervention will always be necessary
No matter how advanced AI becomes, human intervention remains crucial. At Wikipedia, we place immense importance on preserving reliable human-generated data because we are technically one of the most trusted sources of information on the Internet.
It's almost humorous, considering how Wikipedia was viewed with scepticism in the past.
Students weren’t even allowed to cite the website in their term papers. Over time, though, we’ve realised that human intervention is essential: unlike bots, humans can be held accountable for their actions and decisions.
Compared to humans, bots work faster. They can produce vast amounts of content quickly, but ensuring the accuracy of that content is another matter entirely. Verifying it would require another bot, and, as you might have already guessed, we’d face the same problem all over again.
We’ll never reach a point where AI can accurately encompass all knowledge. In fact, the larger the training sets, the more likely the models are to hallucinate, i.e., return false, imaginary, or out-of-context outputs.
And if we can neither fully trust AGI to be accurate nor hold it accountable for its mishaps, how intelligent can it really be?
Developing an AGI requires computational power that doesn't yet exist
There are several difficulties with developing AGI models.
But an obvious one, which I don’t see AGI enthusiasts talk about often, is the enormous computational resources it would require. Given our current technology, it would be both impractical and unnecessary. Training an AGI on vast amounts of data would not only be slow but would also require an insane amount of manpower to maintain the system.
I’m not saying it’s impossible forever; I don’t know the future. However, with the infrastructure we have now and the current state of AI technology, AGI is a distant dream. Companies may showcase impressive demos, but demos don’t tell the whole story. For one, they are extremely siloed and rehearsed. They also don’t account for how users will actually interact with the product.
Where AI stands now
At the moment, the term “artificial intelligence” or “AI” is a misnomer. There, I said it!
An AI system is supposed to perform tasks that require intellectual processes, such as the ability to reason and other human characteristics, like intuition. Except that, in its current state, it can’t do that. The truth is, it probably can’t ever.
A lot of what is branded "artificial intelligence" today is actually machine learning (ML).
First, the systems are fed training datasets with specific characteristics, e.g., a bunch of text files, images, videos, etc. Next, this training set is broken down into smaller units (tokens): words, sub-words, phrases, or, for images, pixels, during a process called tokenisation. These tokens provide the primary foundation for training models.
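To make that concrete for text, here is a minimal sketch of tokenisation in Python. The whitespace splitting and toy vocabulary are my own simplifications; production tokenisers typically use sub-word schemes such as byte-pair encoding, but the end result is the same: text becomes a sequence of integer IDs a model can learn from.

```python
# A toy illustration of tokenisation: real systems use sub-word schemes
# such as byte-pair encoding, but the idea is the same -- text becomes
# a sequence of integer IDs that a model can be trained on.

def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign an integer ID to every unique token in the corpus."""
    vocab = {}
    for text in corpus:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def tokenise(text: str, vocab: dict[str, int]) -> list[int]:
    """Convert a piece of text into a list of token IDs."""
    return [vocab[token] for token in text.lower().split() if token in vocab]

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
vocab = build_vocab(corpus)
print(vocab)                           # {'the': 0, 'cat': 1, 'sat': 2, ...}
print(tokenise("the cat sat", vocab))  # [0, 1, 2]
```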
When given a prompt, the “AI” simply predicts the next token given the preceding tokens. So, the systems are not learning to become “intelligent” per se; they are merely predicting outcomes based on what they’ve been fed.
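If it helps to see that in code, the sketch below builds a crude bigram model: it counts which token follows which in its training text and “predicts” by picking the most frequent successor. Real language models learn billions of parameters instead of a count table, but the principle of predicting the next token from what came before is the same.

```python
from collections import defaultdict, Counter

# A crude bigram "language model": for each token, count which token
# follows it in the training text, then predict the most frequent successor.
def train_bigrams(text):
    counts = defaultdict(Counter)
    tokens = text.lower().split()
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent successor of `token`, or None if unseen."""
    if token not in counts:
        return None  # the model has no idea outside its training data
    return counts[token].most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat and the cat slept on the rug")
print(predict_next(model, "the"))   # 'cat' -- the most common successor of 'the'
print(predict_next(model, "sat"))   # 'on'
print(predict_next(model, "moon"))  # None -- never seen during training
```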
For instance, Excel has been doing this kind of prediction for years; technically, ChatGPT is just a far more advanced form of Excel. That’s why narrow AI works fairly well: it can predict accurately within a narrow range.
For example, when guessing at random among three options, your chances of picking the correct answer are around 33%. Up that number to ten thousand options, and your chances drop to a measly 0.01%.
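The back-of-the-envelope arithmetic below shows how quickly the odds of a blind guess collapse as the number of options grows; the specific option counts are just illustrative.

```python
# Chance of a purely random guess landing on the right answer
# as the number of possible options grows.
for options in (3, 100, 10_000):
    print(f"{options:>6} options -> {100 / options:.2f}% chance per guess")

# Output:
#      3 options -> 33.33% chance per guess
#    100 options -> 1.00% chance per guess
#  10000 options -> 0.01% chance per guess
```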
Even then, there are still limitations. Now, imagine what will happen if we start mixing diverse datasets. That’ll result in a lot of confusion and inaccuracy.
People are often surprised when AI provides silly or incorrect results, but those familiar with the backend understand the limitations and challenges.
AGI is not a practical goal. The closest we are likely to get to building an AGI is a bunch of narrow AIs “talking” to one another.
We shouldn’t let AGI distract us from the future
Beyond commercial products and their creators, ordinary users should be able to train the models they want in the future, e.g., systems trained on human-only data.
Additionally, people will be able to build their own models at home to perform small tasks and enhance home automation. We’ll be able to train our own AIs. Apple is on to something with Apple Intelligence.
So, at their peak, AI systems will be like personal assistants that help us with monotonous or boring tasks like analysing documents and writing code, not the god that many fear and that AGI enthusiasts continue to tout.
A good way to think about the direction AI should, and likely will, follow is to consider how computing evolved from ever-bigger mainframes to personal laptops and cell phones.
Moving forward, the focus should remain on developing AI systems that are reliable and accurate within their specific domains. At Wikipedia, for instance, we ensure that our data pipeline is human-generated. This provides assurance and reliability that AI alone cannot achieve.
In theory, AGI is interesting. But in reality, it’s impractical at best and impossible at worst.
At least, for now. I’d argue maybe even forever, but who knows the future, right? For the moment, we still have to deal with the limitations of the current state of AI technology, the infrastructure, and the necessity for human intervention.
I believe we'll continue to rely on narrow AI systems for a while, if not forever.
For either of these, AI or its white-elephant elder sibling AGI, to ensure that AI-generated content is accurate and reliable, we’ll need humans to provide oversight and to hold the systems accountable.