My prompt begins with the acknowledgement of a serious problem signaled by one expert who believes that AI may have already reached an insurmountable plateau. Here is how I opened the conversation.
“An article on The Byte quotes AI researcher Tamay Besiroglu as claiming that AI is facing a bottleneck that may prove existential. We all know the degree to which AI’s capacity to represent what we suppose and perhaps too often expect to be “the truth” is unreliable. The standard take is that, as the amount of data grows and machine learning continues to refine AI’s quality of reasoning, things will improve. But now we learn that a data bottleneck may constitute ‘an existential threat for AI tools that rely on feasting on copious amounts of data, which has often indiscriminately been pulled from publicly available archives online.’
“The real problem the experts now foresee is that AI may end up as the snake that devours its own tail. ‘Once AI companies do run out of training data, something that’s been predicted by other researchers as well, they’re likely to try training their large language models on AI-generated data instead.’ Scientists at Rice and Stanford universities have discovered ‘that feeding their models AI-generated content causes their output quality to erode.’ AI has already earned the sobriquet of ‘stochastic parrot.’ But a stochastic parrot that lives by parroting a stochastic parrot cannot represent a healthy solution.
“So here’s my question: Are you ‘aware’ of this quandary? I put quotes around the verb because my question raises an even more fundamental question. When we ask a human if they are aware of something, it means two things: we want to know whether, at the cognitive level, they have already considered the question, but we also expect, at the performative level, that the asking will induce new reflection. So let me repeat my question, while inviting you to consider both those aspects.”
Here is how ChatGPT responded:
Awareness of the AI Data Bottleneck Quandary
Yes, I am aware of the data bottleneck issue facing AI, both from a cognitive perspective—having information and analysis on the topic—and from a performative perspective—considering its implications and reflecting on its future impacts.
Cognitive Awareness: Understanding the Data Bottleneck
The idea that AI is facing a critical bottleneck due to the limited availability of high-quality, diverse data is well-documented in the field of AI research. This bottleneck arises from several interconnected factors:
1. Exhaustion of High-Quality Data:
- Finite Public Data: There is a finite amount of high-quality, diverse data available online. As AI models have become more sophisticated, they’ve been trained on large portions of this data, pushing the limits of what’s readily accessible.
- Legal and Ethical Constraints: Increasing awareness and enforcement of data privacy laws (like GDPR in Europe) restrict the kinds of data that can be freely used for training AI models, further limiting available resources.
2. Decreasing Marginal Utility of Additional Data:
- Diminishing Returns: As models grow and are fed more data, the improvement in performance from additional data starts to diminish. This is partly because the models have already seen the most useful information, and new data often adds less value.
- Quality vs. Quantity: Simply increasing the volume of data doesn’t always improve model performance, especially if the new data is noisy or low-quality.
3. AI-generated Data Feedback Loop:
- Self-referential Training: If AI models start training on data generated by other AI models, the quality of the output can degrade. This phenomenon, often compared to a feedback loop, can result in models reinforcing errors or biases present in the AI-generated data.
- Empirical Evidence: Research from institutions like Rice and Stanford has shown that this practice leads to a decline in the overall quality and reliability of AI outputs, reinforcing the ‘stochastic parrot’ analogy—where AI merely echoes patterns without true understanding.
Performative Reflection: Implications and Future Directions
Reflecting on the implications of this bottleneck, we must consider several critical questions and potential strategies:
1. Innovation in Data Acquisition:
- Synthetic Data: Developing robust methods for generating high-quality synthetic data that mimics real-world data without the drawbacks of AI-generated feedback loops.
- Diverse and Niche Sources: Finding and utilizing less mainstream data sources that offer unique perspectives and information, helping to enrich the training datasets.
2. Regulatory and Ethical Challenges:
- Data Governance: Balancing the need for large datasets with ethical considerations and privacy concerns. This includes developing policies that ensure responsible data usage and protecting individuals’ rights.
- Transparency and Accountability: Ensuring AI models and their training processes are transparent and that developers are accountable for the outcomes produced by their systems.
3. Advances in AI Architectures:
- Efficient Models: Focusing on creating more efficient models that require less data to achieve high performance. This includes research into model architectures that are more data-efficient and robust to data limitations.
- Continuous Learning: Implementing mechanisms for AI systems to continually learn and adapt from real-world interactions and feedback in a sustainable and ethical manner.
4. Rethinking AI Goals:
- Quality over Quantity: Prioritizing the quality and diversity of data over sheer quantity. This might involve more sophisticated curation and annotation processes to maximize the value extracted from existing data.
- Understanding vs. Parroting: Shifting focus towards developing AI systems that aim for deeper understanding and reasoning capabilities rather than just statistical pattern matching.
Conclusion
Addressing the data bottleneck and the potential degradation of AI quality when trained on self-generated content requires a multifaceted approach. It involves innovating in data acquisition and management, advancing AI model architectures, and reevaluating the goals and ethical frameworks guiding AI development. This existential challenge calls for collective efforts from researchers, practitioners, and policymakers to ensure that AI continues to evolve in a beneficial and sustainable direction.
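Before turning to my next question, it is worth pausing on the “self-referential training” loop ChatGPT described above. To make that worry tangible, here is a minimal toy simulation of my own devising (it comes from neither ChatGPT nor the Rice and Stanford research). The “model” is just a Gaussian distribution repeatedly re-fit on its own samples, a deliberately crude stand-in for a large language model, and the 0.9 “temperature” factor is an assumed proxy for the narrowing that real decoding heuristics introduce.

```python
# A toy sketch of "model collapse": a model repeatedly re-fit on its
# own generated output. The "model" here is just a Gaussian; real LLM
# dynamics are far richer, but the loss of spread across generations
# mirrors the degradation the researchers report.
import random
import statistics

random.seed(42)

# Generation 0: "human-written" data drawn from a wide distribution.
data = [random.gauss(0.0, 1.0) for _ in range(1000)]

for generation in range(6):
    mu = statistics.fmean(data)    # fit the model to the current data
    sigma = statistics.stdev(data)
    print(f"gen {generation}: mean = {mu:+.3f}, stdev = {sigma:.3f}")
    # Train the next generation only on the current model's own samples,
    # drawn slightly conservatively (the assumed 0.9 "temperature").
    data = [random.gauss(mu, 0.9 * sigma) for _ in range(1000)]
```

Run it and the standard deviation shrinks generation after generation: the tails of the distribution, the rare and distinctive material, vanish first. Under these admittedly simplified assumptions, that is the erosion of quality the ‘snake that devours its own tail’ metaphor warns about.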
The big question: epistemology
This was a genuinely helpful summary of the issues. It shows how useful AI can be when we humans try to take stock of the various threads of a publicly debated issue. On such occasions, we should remind ourselves that the result is not some form of objective truth but rather a distillation of the most typical and widely shared public perceptions. That reflection on the usefulness of AI led me to ask another, more general question.
“Has anyone tried to elaborate an epistemology of AI knowledge?”
ChatGPT’s answer to this question turned out to be particularly rich in information. I will come back to it next week to examine it in full. For the moment, I’ll simply reproduce the introduction and conclusion.
The concept of an epistemology of AI knowledge—examining the nature, scope, and limits of what AI systems can know and understand—has indeed been explored by various scholars across multiple disciplines. Here’s a detailed look at how this field has been approached and some of the key insights and contributions:
Conclusion
The epistemology of AI knowledge is a rich and multifaceted field that intersects with philosophy, cognitive science, computer science, and ethics. It addresses fundamental questions about the nature of knowledge and understanding in artificial systems, challenges the boundaries of current AI capabilities, and sets the stage for future innovations and ethical considerations in AI development.
At this point, I’ll simply note that if we agree that AI “intersects with philosophy, cognitive science, computer science, and ethics,” we must also acknowledge that each of those four fields, including computer science, is what we call a “discipline.” Humans practice disciplines; machines cannot, though I’m sure there are those who will argue that algorithms themselves are a form of discipline. To such objections I would respond that arguing is an activity reserved for humans who practice disciplines such as philosophy. Machines can respond to argument, but I would dare to claim that responding to argument is not necessarily arguing.
Before concluding, I wish to come back to a point ChatGPT made in our initial exchange when it recommended “finding and utilizing less mainstream data sources that offer unique perspectives and information, helping to enrich the training datasets.”
I find this refreshing if only because it amounts to an admission that anything AI offers us today is likely to reflect a mainstream bias. That admission should set off alarm bells. At the same time, we have noticed that a lot of politicians and corporate leaders would love to see entire populations not only exposed to but also confined to mainstream thought and mainstream ideology.
So here’s a question to mull over. Does the mainstream bias explain why they are so willing to invest in AI?
Your thoughts
As always, please feel free to offer your commentaries on any of the questions raised in this discussion. Simply drop us an email at dialogue@fairobserver.com. We’ll build your reflections into our own ongoing research.
[Artificial Intelligence is rapidly becoming a feature of everyone’s daily life. We unconsciously perceive it either as a friend or foe, a helper or destroyer. At Fair Observer, we see it as a tool of creativity, capable of revealing the complex relationship between humans and machines.]