Over the past few years, the use of AI solutions such as ChatGPT, Google Gemini, Perplexity, and many others has increased drastically around the world. Google even integrates AI Overviews into search results now. But how reliable are these tools?
In early 2023, Google’s new Bard (now Gemini) chatbot made headlines for confidently offering a false answer about the James Webb Space Telescope, claiming a discovery that never happened. This is just one of many examples of publicly available generative AI tools hallucinating answers to user queries, an issue that has become so common it has earned its own term: “AI hallucination”.
Simply put, a hallucination is when an AI tool, like a large language model (LLM), confidently produces an incorrect answer, sometimes a completely fabricated one, and may even cite a research paper or website that doesn’t exist.
As businesses have raced to embrace these new AI toolsets, these glitches have raised understandable concern. After all, if an AI can just make things up, how can your business trust it?
A recent global survey found that nearly 47% of enterprise businesses had executives who unknowingly based a major decision on AI’s false output. That sounds scary, and it should be a wake-up call: if AI is going to be in the decision loop, we must ensure it’s as accurate as possible.
Thankfully, businesses are responding to this growing concern. Today, 76% of enterprises include a human review of AI-generated content before it goes live. In other words, companies are treating AI a bit like a new junior employee, double-checking its work until it proves itself.
However, before we dismiss AI for occasional fakery, it’s worth remembering that your human employees are far from infallible too. From typos in documents to misreading a contract, typing a plus instead of a minus in a spreadsheet formula because the buttons sit next to each other, or tagging the wrong person or business in a social media post, the human element of your business is just as prone to mistakes.
But where do we draw the line? And how can businesses strike the balance between embracing AI solutions, securing their data, improving productivity and efficiency, and keeping that peace of mind?
“AI hallucinations” have become a key consideration for many businesses looking to adopt generative AI solutions into their workflows. These are not malicious lies; they are the AI’s tendency to generate plausible-sounding answers when it doesn’t know the facts. Rather than saying “I do not have the data to confidently answer your query,” the model produces something that merely sounds right.
Only a couple of years ago, these “AI hallucinations” were alarmingly frequent. A mid-2023 study found that OpenAI’s then-current GPT-3.5 model made up information in 40% of test cases. Reports showed that the more advanced GPT-4 reduced that hallucination rate to about 29%. Despite this, many users still regularly reported these and other models producing completely fictitious responses.
It is also important to note that OpenAI’s models were not indicative of overall industry performance: models from some other providers were found to hallucinate answers over 90% of the time.
No, that isn’t what we are saying. When using public models, it is important to always verify the responses provided.
These AI systems have been getting far more reliable, but you should always check that any information provided is accurate. Industry-wide analyses show the rate of AI hallucinations plummeted by 32% in 2023 and a further 58% in 2024, and some models cut it by another 64% in 2025.
As of 2025, models such as ChatGPT and Google Gemini are integrating Deep Research and live web-search capabilities. Whilst these take far longer to produce their results, there are already a few AI models with sub-1% hallucination rates because of the deeper on-demand research that takes place, with accuracy approaching that of a well-trained human analyst.
However, with the vast amount of unverified AI-generated content published in recent years, the danger now becomes: how much of the data produced by these models is based on AI-poisoned data?
A generative AI model can produce results drawn from a source that was itself created with AI, complete with hallucinated statistics and sources.
Since these models were launched to the public only a few years ago, researchers and engineers have been focused on ensuring AI factual accuracy. Major efforts from institutions such as the University of Oxford are finding ways to detect and prevent AI-poisoned data before it reaches the end user.
In a study published in 2024, Oxford scientists demonstrated a method to tell when a generative AI or large language model is likely “just making something up” versus giving a trustworthy answer. This is an essential advancement, as hallucinations remain a top concern for businesses, holding back wider AI adoption in fields where accuracy is of the utmost concern, such as legal services and healthcare. Inaccurate AI output in industries like these isn’t just embarrassing; it can be dangerous or costly.
Imagine an AI assistant that fabricates a legal precedent, a previous case, or even a patient symptom; the fallout could be severe.
Even with modern advancements, AI outputs can appear extremely confident and convincing while being wrong or completely fabricated, which makes vigilance around the data produced essential. Users should validate information, data, and sources rather than taking them at face value.
In one experiment, researchers asked ChatGPT to generate scientific abstracts and had experts try to identify which were real. The result? Human reviewers struggled to tell the AI-generated abstracts apart from genuine articles without a more in-depth review. This underscores why business leaders can’t simply take a public AI’s answer at face value.
Gartner analysts warn that AI hallucinations that go unchecked and unverified could mislead decision-making and damage a brand’s reputation. Every falsehood an AI produces and your business publishes, no matter how confident it appears, risks eroding trust among your customers and even your employees.
Think of your most trustworthy employee, the one you can always go to for an honest opinion or for information on their most knowledgeable subject. Now imagine that person made something up when you asked them, and you were later caught out because of it; you would double-check all their work going forward, too.
The silver lining is that today’s best AI models, when professionally trained and used, are the worst they will ever be. They are being built with larger, cleaner data sets and better guardrails to ensure the accuracy of their output.
It is important to note that OpenAI acknowledges that tackling hallucinations is an ongoing research priority as models evolve. Newer GPT-4o and o4 models now include Deep Research tools that, whilst slower, vastly reduce the number of errors and hallucinations.
The broader generative AI industry has responded similarly, pivoting toward techniques such as Deep Research and active web searches that make AI outputs more accurate, because the information is drawn from real-world search results and data from across the internet that can be verified against multiple sources.
This is why we stress the importance of tools such as Microsoft Copilot in your business: they are rooted in your business data, referencing the files and databases your employees already have access to in order to produce answers and content that are truly based on your business and the data you hold.
Okay, we have talked for quite a while about the hallucinations that have plagued generative AI services over the past few years. But it is important to consider that we as humans are fallible! Let’s go back to our article on human errors being one of the biggest security holes for businesses; humans make mistakes... a lot of them.
It is important to remember that entire business departments and protocols exist to catch and mitigate human error, precisely because even the most diligent of people slip up.
One study of data handling found that, on average, 4 out of every 100 data entries are wrong, a 4% error rate for humans manually transcribing data. Even well-trained staff operating with care might only achieve 96–99% accuracy on routine data tasks. And this is not the only example; let’s look at some others:
Yes, some of these are extreme cases, but they reinforce the point: human errors can be just as problematic as AI hallucinations and are often far more expensive!
Even in day-to-day operations, human errors carry hidden costs. Mistakes in spreadsheets, misfiled documents, or forgotten follow-ups all impact productivity and profitability. This, in turn, leads to new business processes that mean spending additional time double-checking and correcting each other’s work.
One report estimates businesses spend 20% of their time (or budget) just detecting and fixing human errors in processes like logistics and data management.
For compliance and quality assurance, many businesses implement multilayer approval workflows to catch human errors. This redundancy is necessary but inefficient, and even with these workflows in place, mistakes still happen. Businesses are now looking to AI to help mitigate these errors by allowing these tools to access their business data directly, automate certain tasks, and have the results verified by users, leading to a reduction in mistakes, as the simple sketch below illustrates.
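As a simple illustration of that “AI drafts, a person verifies” pattern, here is a minimal sketch. The draft_reply and human_approves functions are hypothetical stand-ins, not any specific product’s API; the point is only the shape of the workflow.

```python
# Hypothetical sketch of an "AI drafts, a human approves" workflow.
def draft_reply(record):
    """Stand-in for an AI tool drafting content from business data."""
    return (f"Dear {record['customer']}, your order {record['order_id']} "
            f"is scheduled to ship on {record['ship_date']}.")

def human_approves(draft):
    """The human checkpoint: a reviewer signs off before anything is sent."""
    return input(f"Send this?\n{draft}\n[y/N] ").strip().lower() == "y"

record = {"customer": "A. Smith", "order_id": "10042", "ship_date": "1 July 2025"}
draft = draft_reply(record)
if human_approves(draft):
    print("Sent.")                                      # automation runs only after sign-off
else:
    print("Returned to the reviewer for correction.")   # nothing goes out unverified
```

The time saving comes from the AI doing the drafting; the accuracy comes from the person still owning the final decision.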
So, AI can hallucinate and humans can make mistakes... but simply knowing that isn’t the point. What matters is which errors are more frequent, which are riskier, and how we can minimise them.
The business landscape is shifting quickly; organisations want to adopt AI solutions and automations to improve productivity and efficiency, but they do not want to risk acting on inaccurate information. Thankfully, as we have mentioned, the accuracy of the top AI models has improved dramatically in just the last two years, especially when they are grounded in Deep Research or in data provided to them directly.
Of course, context matters, and we are by no means saying that AI should replace human roles... merely complement them to improve productivity and efficiency. AI might outperform a person in factual recall, data analysis, or reporting accuracy, but a person might still have better judgment in situations that require more... common sense.
It is important to remember that AI should not be a replacement for your employees, but rather a way to enable them to succeed and become more productive.
One of the biggest ways that businesses are seeing the benefits of AI, while minimising risks, is by using integrated AI assistants within their own secure environment, working from their own business data. Microsoft 365 Copilot is a prime example of how to implement this securely, effectively, and easily within your business tenancy (your own walled garden).
Instead of pulling information from across the entire internet (which could be biased or simply inaccurate for your queries), Copilot works from within your business tenancy. This means it only has access to the siloed internal data and resources you permit, and access is further restricted according to which users are allowed to see which data.
Because Copilot’s knowledge is grounded in your actual documents, emails, calendars, intranet, and other data, and not the wild west of the wider internet, the chance of hallucination is greatly reduced.
If Copilot is asked a question about last quarter’s sales figures, it will fetch the answer from sales reports in SharePoint or Excel, rather than guessing or going off-script. It doesn’t need to fabricate a number because it can find the real one.
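To make that grounding idea concrete, here is a minimal sketch of the pattern an integrated assistant follows: search only the data the asking user is permitted to see, answer from what it finds (citing its sources), and decline when nothing is there. This is purely illustrative; the Document, permitted_documents, and answer_with_sources names are our own assumptions, not Microsoft’s actual Copilot implementation.

```python
# Illustrative sketch only: a simplified "grounded answer" flow.
from dataclasses import dataclass

@dataclass
class Document:
    title: str
    content: str
    allowed_users: set      # which users are permitted to see this document

def permitted_documents(user, documents):
    """Only ever search data the asking user is already allowed to access."""
    return [d for d in documents if user in d.allowed_users]

def answer_with_sources(question, user, documents):
    """Answer only from permitted sources, and cite them; otherwise decline."""
    words = question.lower().split()
    sources = [d for d in permitted_documents(user, documents)
               if any(w in d.content.lower() for w in words)]
    if not sources:
        # No grounding data available: decline rather than invent a figure.
        return "I can't find that in the data you have access to."
    cited = ", ".join(d.title for d in sources)
    return f"Answer drawn from: {cited} (figures quoted directly from these files)."

# Example: last quarter's sales figure comes from the sales report, or not at all.
docs = [Document("Q2 Sales Report.xlsx", "Q2 revenue was 1.2m", {"alice"})]
print(answer_with_sources("What was Q2 revenue?", "alice", docs))  # cites the report
print(answer_with_sources("What was Q2 revenue?", "bob", docs))    # no access, so it declines
```

The design choice that matters is the fallback: when there is no permitted source to quote, the assistant says so instead of improvising, which is exactly what removes the opportunity to hallucinate.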
Essentially, Copilot cuts off the AI’s temptation to improvise, by anchoring it to a library of truth that you control. There are several key benefits to this approach, especially when comparing an integrated tool like Copilot to external third-party AI services:
Research by the Adecco Group shows that employees who use AI effectively save at least an hour per day, with 20% of those involved in the study saying they saved two hours or more.
Data stays within your tenancy: Microsoft 365 Copilot operates entirely inside your business’s secure cloud environment, so sensitive information never leaves your own data infrastructure during AI processing. As we have seen with public AI chatbots, the security implications of sharing confidential information with those models can be severe.
Built for privacy and compliance: By design, Copilot is built to meet strict compliance requirements around the world, and we work with businesses to extend those security measures by ensuring they follow strict least-privilege access policies in line with these regulations and industry certifications (including GDPR, Cyber Essentials and ISO 27001).
The AI model doesn’t send your data or your business’s data to any third-party service; it remains encrypted within your business cloud environment. Copilot automatically inherits your business’s security, compliance, and privacy policies and will only surface data that a user is permitted to see: AI assistance without compromising data privacy or sovereignty.
A massive 95% of data breaches can be traced back to human error, often from employees inadvertently mishandling data, whether by sending a file to the wrong email address, using an unsafe app, or uploading data to an untrusted service that does not handle it securely.
Did you know that tools like ChatGPT train their models on the data that you upload by default? By adopting an integrated AI tool like Copilot, you eliminate the need for users to resort to unsanctioned tools, where they might upload or paste company information into a third-party service. This keeps sensitive data within approved channels and greatly reduces the chances of accidental leaks.
Your Data, Verified Answers: Because Copilot exists within your business environment, with access to your business data, it delivers answers that are rooted in that data, often with citations of the source documents it used. Users can easily see why Copilot gave an answer and quickly verify the details against the original source, just as they would fact-check a colleague or a source in a news article, except Copilot makes it easier by pointing directly to the evidence.
The answers produced by a tenancy-based AI model, such as Copilot, build confidence among your employees in the tools they are using. Decisions can be made faster and with greater trust in the information at hand since there’s less need to check and double-check everything.
Unlike a human, the AI doesn’t get tired or rush through a task at 5 PM and make a careless mistake. Copilot can even catch many errors in your own reports, data, and communications, for example by spotting inconsistencies or pulling the latest figures where a human might accidentally reference outdated data. Copilot empowers your team by delivering speed, accuracy, and attention to detail in its responses, helping them work more accurately and effectively.
Microsoft 365 Copilot, and similar tenancy-based integrated AI solutions, are bridging the gap between human error and AI hallucination. These AI tools can recall your business data with speed and consistency, while your team provides the nuance, context, and final sense-checking of the information the AI produces and applies it to their workflows.
It is easy to see the benefit that a tenancy-based AI solution can provide to your business. By rooting themselves in your business data, ensuring that your employees can access and report on only the data they need to perform their roles effectively, and keeping your data secure, these tools can empower your business to succeed.
However, it is essential that before your business implements solutions such as Microsoft 365 Copilot, you ensure that your data is secure, that your infrastructure is ready to support these tools, and that your team understands the benefits of using these new tools.
At TwentyFour IT Services, we have been supporting businesses in implementing new AI and automation tools to improve their efficiency. If you would like to find out more, book a meeting with us or take our AI-readiness check.