Bad Data, Bad AI: Fix The Foundations

February 28, 2025

The garbage in, garbage out reality

No matter how sophisticated your AI tools are, they're only as good as the data feeding them. Think of data as the fuel for your AI engine. Low-quality fuel? Your engine sputters and stalls.

As Mona Chadha of AWS put it, the quality of an AI's predictions "depends strongly on the data used to train the models. Poor data quality can result in inaccurate results and inconsistent model behaviour." These are outcomes no business wants.

Let's explore why data quality matters, practical strategies to improve it, and how proper data management can transform AI from a costly disappointment into a genuine competitive advantage.

The real cost of poor data quality

Picture a company that spends millions on a state-of-the-art AI system, only to watch it make expensive mistakes because it is processing faulty information.

Studies show poor data quality is the leading cause of AI project failures. IBM estimated that bad data costs the UK economy billions annually. For individual companies, the toll is just as stark: the average organisation loses £9.8 million per year due to bad data.

Beyond financial loss, poor data undermines trust. AI models reflect the quality of their training data – if that data contains errors or biases, so too will the AI's outputs. Amazon learned this lesson the hard way with a recruiting AI that became biased against women because it was primarily trained on male candidates' CVs. The tool "taught itself that male candidates were preferable," and Amazon had to scrap it entirely.

These failures aren't rare exceptions – they're the norm. Experts estimate 70-80% of AI projects fail, with poor data quality often cited as the primary culprit. Up to 87% of AI initiatives never even reach production deployment.

Consider these real-world examples:

  • Retail: Walmart's early AI for inventory management struggled with inconsistent product categories and incomplete sales data, resulting in costly inventory errors. Data problems of this kind lead directly to excess inventory, stockouts, and delivery delays, all of which hurt revenue and customer satisfaction.

  • Healthcare: IBM's Watson for Oncology promised AI-driven cancer treatment recommendations but faltered because hospitals recorded patient information inconsistently. Different record formats and terminology meant Watson often gave unreliable suggestions. The issue wasn't the algorithm – it was the messy data feeding it.

  • Finance: Banks using AI for fraud detection found themselves overwhelmed with false alarms because they fed noisy transaction data into their systems. These institutions ended up "drowning in false positives," wasting analysts' time on non-existent threats.

The pattern is clear: AI's value is directly tied to data quality. Companies with clean, unified data reap far greater rewards than those with siloed, messy information.

Unorganised data: The untapped gold mine

Here's an important distinction worth making: unorganised data isn't necessarily bad data.

Unorganised (or unstructured) data simply means information that doesn't fit neatly into databases or spreadsheets – think emails, documents, images, sensor readings, or social media feeds. Poor-quality data, on the other hand, refers to information that's actually wrong, outdated, or incomplete.

Most organisations are sitting on mountains of unorganised data. Analysts estimate 80-90% of all data in organisations is unstructured. This represents enormous untapped potential, but surveys show only 18% of companies effectively leverage their unstructured data.

Think of unorganised data like crude oil – it needs refining, but there's tremendous value locked inside. With the right approach, you can transform seemingly chaotic information into structured insights without compromising its integrity.

For example, a retailer might have years of customer support emails. By applying natural language processing to categorise and tag these messages, they could reveal patterns in customer pain points or product issues. Similarly, a logistics company could combine GPS logs, weather data, and driver notes to improve delivery routes and forecast delays.
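To make the support email example concrete, here is a minimal sketch of tagging messages by issue type. It uses a hand-written keyword list as a stand-in for natural language processing; a real system would more likely rely on a trained text classifier. The category names, keywords, and sample emails are all hypothetical.

```python
from collections import Counter

# Hypothetical categories and trigger words (assumptions for illustration).
# A production system would probably use a trained classifier instead.
CATEGORY_KEYWORDS = {
    "delivery": {"late", "delivery", "courier", "shipping"},
    "billing": {"refund", "charged", "invoice", "payment"},
    "product_fault": {"broken", "faulty", "damaged", "defective"},
}


def tag_message(text):
    """Return the sorted categories whose keywords appear in the text."""
    # Lowercase each word and strip surrounding punctuation before matching.
    words = {word.strip(".,!?;:").lower() for word in text.split()}
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords
    )


def summarise(messages):
    """Count how many messages fall under each category."""
    counts = Counter()
    for message in messages:
        counts.update(tag_message(message))
    return counts


# Made-up sample emails.
emails = [
    "My order arrived late and the box was damaged.",
    "I was charged twice, please issue a refund.",
    "The courier never showed up.",
]

for email in emails:
    print(tag_message(email), "<-", email)
print(summarise(emails))
```

Even a crude tagger like this turns free text into counts that can be tracked over time, which is the point of organising the data before handing it to an AI system.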

The key is data organisation. Even if information is scattered across various documents and systems, technologies like Retrieval-Augmented Generation (RAG) can pull the relevant pieces together for AI processing. But remember: if those source documents contain inaccuracies, even the best RAG system will retrieve incorrect information.
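The retrieval step can be illustrated with a toy sketch. It ranks documents by simple word overlap with the question, which is an assumption made to keep the example short; real RAG systems typically use embedding-based search and then pass the retrieved text to a language model. The document ids and contents are hypothetical.

```python
def tokenize(text):
    """Split text into a set of lowercase words without punctuation."""
    return {word.strip(".,!?;:").lower() for word in text.split()}


def retrieve(query, documents, top_k=1):
    """Return the top_k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc["text"])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical knowledge base scattered across two source documents.
documents = [
    {"id": "policy-2023", "text": "Returns are accepted within 30 days of purchase."},
    {"id": "shipping-faq", "text": "Standard shipping takes three to five working days."},
]

result = retrieve("How many days do customers have for returns?", documents)
print(result[0]["id"], "->", result[0]["text"])
```

Note that the retriever has no way of knowing whether the returns policy it finds is still current. If that document is out of date, the outdated answer is exactly what gets passed on to the AI.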

For AI success, businesses need both organised AND accurate data. Organisation without accuracy is dangerous; accuracy without organisation limits potential.

Practical strategies for better data quality

Improving data quality isn't a mystical process – it's manageable with the right approach. Here are straightforward strategies any business can implement:

  1. Start with a data audit: Begin by thoroughly evaluating your existing data. Where is it coming from? How is it collected? What gaps or errors exist? This audit will surface inconsistencies like duplicate records or missing fields, establishing a clear baseline for improvement.

  2. Break down data silos: When different departments maintain separate data systems that don't communicate, it's impossible to get a "single source of truth." Integrate data across your organisation using a central data warehouse or linked databases that share standards. Unifying information (like connecting in-store sales data with online purchases and supply chain records) ensures everyone works from the same accurate picture.

  3. Implement data governance: Establish clear policies, standards, and ownership for data management. Who's responsible for data quality in each domain? What formats and definitions should everyone follow? For example, set rules that all product entries must use standard units of measure and categories. Many companies now have a Chief Data Officer to champion governance; over 84% of firms surveyed have senior data leadership in place.

  4. Clean and enrich data continuously: Data cleaning isn't a one-off project – it's an ongoing necessity. Use data preparation tools to remove duplicates, correct errors, and fill in missing values regularly. Standardise formats (dates, addresses) across datasets. Consider enriching data by adding context – such as appending geolocation to addresses or sentiment scores to customer feedback.

  5. Structure your unorganised data: Invest in tools that organise and label unstructured data for AI use. This could include using natural language processing to extract key topics from text, image recognition to tag photos, or building an indexed knowledge base for documents. A customer service department might deploy a system that categorises support tickets by issue type, transforming unstructured text into organised insights that an AI assistant can leverage.

  6. Monitor quality continuously: Data management doesn't end when the model is deployed. Continuously monitor your data pipelines and AI outputs. Establish metrics for data quality (percentage of blank fields, frequency of inconsistencies) and track AI performance over time. If results start looking odd, investigate whether bad data has crept into the system. Many organisations implement periodic quality reviews or automated alerts that notify teams when potential issues arise.

  7. Foster a data-conscious culture: Technology alone won't solve everything – people and mindset matter. Encourage employees to treat data as a valuable asset. Simple training can have a significant impact: when people understand why accurate data matters (and the consequences of errors), they're more likely to be careful. Some companies even implement incentives around data quality. When everyone sees that better data leads to better decisions, maintaining quality becomes a shared responsibility rather than an IT headache.
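Several of the steps above (auditing, cleaning, and monitoring) can be sketched in a few lines of code. The example below is a rough illustration using only the Python standard library; the customer records, field names, accepted date formats, and the 20% alert threshold are all assumptions, and a real pipeline would typically use dedicated data preparation tooling.

```python
from datetime import datetime

# Hypothetical customer records with typical quality problems.
records = [
    {"id": "1", "email": "ana@example.com", "signup_date": "2024-01-15"},
    {"id": "2", "email": "", "signup_date": "15/02/2024"},
    {"id": "3", "email": "ANA@example.com ", "signup_date": "2024-03-01"},
    {"id": "4", "email": "raj@example.com", "signup_date": None},
]

MAX_BLANK_RATE = 0.2  # assumed alert threshold, tune for your own data


def is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def blank_rate(rows, field):
    """Audit metric: share of rows where the field is missing."""
    return sum(is_blank(row.get(field)) for row in rows) / len(rows)


def duplicate_count(rows, field):
    """Audit metric: repeated values, ignoring case and surrounding spaces."""
    seen, duplicates = set(), 0
    for row in rows:
        value = row.get(field)
        if is_blank(value):
            continue
        key = str(value).strip().lower()
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates


def standardise_date(value, formats=("%Y-%m-%d", "%d/%m/%Y")):
    """Cleaning step: convert known date formats to ISO 8601, else None."""
    if is_blank(value):
        return None
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def quality_report(rows, fields):
    """Monitoring step: blank rates per field plus fields over the threshold."""
    report = {field: blank_rate(rows, field) for field in fields}
    alerts = [field for field, rate in report.items() if rate > MAX_BLANK_RATE]
    return report, alerts


report, alerts = quality_report(records, ["id", "email", "signup_date"])
print("Blank rates:", report)
print("Fields needing attention:", alerts)
print("Duplicate emails:", duplicate_count(records, "email"))
print("Cleaned dates:", [standardise_date(r["signup_date"]) for r in records])
```

Running a report like this on a schedule gives you the baseline from the audit step and an early warning when blank fields or duplicates start creeping back in.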

By adopting these strategies, businesses can dramatically improve their data infrastructure. High-impact industries like retail and supply chain stand to gain enormously. Clean, unified data enables retail AI to accurately forecast demand, personalise marketing, and manage inventory, while supply chain AI can precisely predict delays, optimise routes, and prevent costly disruptions.

Whatever your sector, the principle holds: when your data house is in order, AI can deliver on its promise of efficiency, innovation, and insight.

In summary

Treat data quality as the foundation of your AI strategy. By recognising that AI is only as good as its data and taking concrete steps to improve that data, you set your organisation up for success rather than frustration. Clean, well-structured data allows AI systems to deliver consistent, trustworthy results that drive growth. The path forward is clear: invest in your data foundation. And if you need a trusted partner in that journey, we are ready to help turn your data into your greatest asset in the AI era.
