In recent tests, Bard, Google's AI chatbot, hallucinated 91.4% of the time, failing to retrieve any relevant papers and achieving 0% precision in its responses, according to the study Hallucination Rates and Reference Accuracy of ChatGPT and Bard. Information generated by such models can be fundamentally misleading, posing significant risks to users who rely on it as fact.
AI is often heralded as a solution to global challenges and a driver of efficiency. However, its current iterations are prone to fundamental errors and biases that can actively harm users and widen societal divides.
Companies and governments are increasingly deploying AI for critical functions, yet without stringent oversight and a fundamental shift in development priorities from fluency to accuracy and fairness, the technology risks eroding trust and deepening existing societal inequities.
Understanding AI's Factual Unreliability
Leading AI models frequently fail to provide accurate or complete information, challenging their perceived reliability. The study Hallucination Rates and Reference Accuracy of ChatGPT and Bard found hallucination rates of 39.6% for GPT-3.5 and 28.6% for GPT-4, while Bard's rate was a striking 91.4%. Precision rates for these models also varied significantly: 9.4% for GPT-3.5, 13.4% for GPT-4, and 0% for Bard.
These figures reveal that despite rapid advancements, even widely adopted AI models frequently generate convincing falsehoods rather than factual data. The technology, instead of augmenting human capabilities with reliable insights, can actively disseminate misinformation, impacting decision-making across various sectors.
Defining AI's Core Flaws: Hallucinations and Biases
AI's errors are not merely minor inaccuracies; they can manifest as outright fabrications or systemic distortions, particularly in critical fields like medicine. For instance, models like ChatGPT can fabricate citations or misinterpret clinical guidelines, as reported by Nature. A hallucination can be as severe as inventing an entire medical paper: the researchers classified a reference as hallucinated when at least two of its three identifying fields, the title, first author, or year of publication, were incorrect.
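To make that criterion concrete, here is a minimal sketch of the two-of-three rule. The Citation structure and the exact-match comparisons are illustrative assumptions, not the study's actual methodology, which involved matching generated references against bibliographic databases.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    title: str
    first_author: str
    year: int

def is_hallucinated(generated: Citation, actual: Citation | None) -> bool:
    """Apply the two-of-three mismatch rule to one generated reference."""
    if actual is None:
        return True  # no matching real paper exists at all: fabricated outright
    # Count how many of the three identifying fields are wrong.
    mismatches = sum([
        generated.title.strip().lower() != actual.title.strip().lower(),
        generated.first_author.strip().lower() != actual.first_author.strip().lower(),
        generated.year != actual.year,
    ])
    return mismatches >= 2
```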
The study also noted geographical and open-access biases in the papers retrieved by the LLMs, indicating that certain regions or types of information are underrepresented or misinterpreted. Biases often stem directly from the data AI models are trained on, leading to outputs that reinforce existing inequalities or misdiagnose needs, especially in underserved populations.
The Root Cause: Prioritizing Fluency Over Fact
The fundamental design choice to prioritize fluent, confident responses over factual accuracy is a primary driver of AI hallucinations. Many Large Language Models (LLMs) are optimized to maximize response fluency, prioritizing plausible guesses over admitting uncertainty, according to Nature. Hallucination is often a feature of their design, not a bug.
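A toy example can illustrate this trade-off. In the sketch below, the "model" is just a fixed probability distribution over candidate answers, and the 0.6 confidence threshold is an arbitrary assumption; real LLM decoding is far more complex. The contrast is between a decoder that always returns its most fluent-sounding guess and one that abstains when no candidate is actually likely to be correct.

```python
# A stand-in for a model's beliefs: no candidate is actually likely to be right.
candidate_answers = {
    "Paper A (plausible but fabricated)": 0.34,
    "Paper B (real, partially relevant)": 0.33,
    "Paper C (real, partially relevant)": 0.33,
}

def fluent_decode(dist: dict[str, float]) -> str:
    # Always answers: returns the single most probable candidate,
    # however weak that probability is.
    return max(dist, key=dist.get)

def calibrated_decode(dist: dict[str, float], threshold: float = 0.6) -> str:
    # Answers only above a confidence threshold; otherwise admits uncertainty.
    best = max(dist, key=dist.get)
    return best if dist[best] >= threshold else "I don't know."

print(fluent_decode(candidate_answers))      # a confident-sounding guess
print(calibrated_decode(candidate_answers))  # "I don't know."
```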
Companies deploying unmitigated AI are not merely risking occasional errors; they are actively embracing a system engineered to generate convincing falsehoods. The trade-off for conversational ease is a direct compromise on reliability, one that erodes trust and propagates incorrect information at scale, undermining the very utility AI promises.
Addressing the Flaws: Mitigation Strategies
Targeted architectural improvements offer a promising path to significantly more reliable AI outputs. Retrieval-Augmented Generation (RAG) systems, for example, restrict a model to answering only from retrieved, verifiable biomedical sources, significantly reducing hallucination rates, as detailed in Nature. By grounding responses in external, credible data, RAG directly tackles the problem of fabricated information.
While RAG offers a technical patch to reduce hallucinations by restricting AI to verifiable sources, it sidesteps the core issue that many LLMs are fundamentally optimized for fluency over truth. Un-augmented AI will likely remain unreliable, making mitigation strategies critical for any deployment requiring high factual accuracy.
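To make the RAG pattern concrete, the sketch below shows its basic structure. The retriever and model calls (search_biomedical_index, llm_complete) are hypothetical placeholders, and the prompt wording is an assumption; the essential idea is that the model is constrained to answer from retrieved, verifiable passages or to explicitly decline.

```python
def search_biomedical_index(query: str, k: int = 3) -> list[str]:
    """Hypothetical retriever over a curated, verifiable corpus."""
    raise NotImplementedError("plug in a real search backend here")

def llm_complete(prompt: str) -> str:
    """Hypothetical call to a language model API."""
    raise NotImplementedError("plug in a real model API here")

def rag_answer(question: str) -> str:
    # 1. Retrieve candidate passages from a trusted corpus.
    passages = search_biomedical_index(question)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    # 2. Constrain the model: answer only from the retrieved sources,
    #    citing them by number, or explicitly refuse.
    prompt = (
        "Answer using ONLY the numbered sources below and cite them. "
        "If the sources do not contain the answer, reply exactly: "
        "'Not found in the provided sources.'\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)
```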
Beyond Errors: The Broader Societal and Economic Impacts
The uncritical deployment of AI, particularly in sensitive sectors, risks exacerbating existing inequalities and disrupting labor markets. Without appropriate safeguards, AI can reinforce biases, misdiagnose needs, or produce flawed decisions, according to the World Bank. These risks are especially acute in lower-income countries.
The World Bank also indicates that AI could widen the gap between high- and lower-income countries because of its requirements for computing power, data, and skills. The study's finding that a model like Bard achieved 0% precision and a 91.4% hallucination rate underscores that uncritical AI adoption will not bridge global skills gaps. Instead, it risks delivering a flood of unreliable, biased information, disproportionately harming lower-income countries that lack the resources for verification.
AI's Promise: Filling Critical Gaps
What are the potential benefits of AI in education?
While AI faces challenges, it can address teacher shortages and provide personalized learning experiences, particularly in remote or underserved areas. The World Bank notes AI's potential to fill skills gaps in education services. AI tutors or adaptive learning platforms tailored to individual student needs could mean a future where educational access is less constrained by geographical or economic barriers, provided accuracy is guaranteed.
How can AI improve health services globally?
AI holds promise for enhancing diagnostics and treatment in health services, especially where medical professionals are scarce. For instance, AI could assist in analyzing medical images or managing patient data, thereby extending expert capabilities to more regions, as highlighted by the World Bank. Contingent on rigorous validation and bias mitigation, this could significantly reduce diagnostic delays and improve patient outcomes in underserved areas.
What role does AI play in bridging global skills gaps?
AI can act as a force multiplier by automating routine tasks and providing accessible training, helping to bridge skills gaps across various sectors. Human workers can focus on more complex problems, creating new efficiencies in areas like technical support or data analysis. AI could democratize access to specialized skills and elevate human capital, if deployed with a focus on equitable access and verifiable information.
Navigating the Future of AI: A Call for Vigilance
The trajectory of AI's societal impact will largely be determined by a collective commitment to address its fundamental flaws. Without significant improvements in reliability and bias mitigation, AI's deployment in critical sectors appears poised to exacerbate existing problems rather than solve them, particularly in underserved regions.
Developers and policymakers must prioritize accuracy and fairness over mere fluency in AI design and deployment. Google's Bard (since rebranded as Gemini), for example, will need to demonstrate substantial improvements in factual precision by late 2026 to rebuild user trust and avoid further entrenching digital divides.