
Ask many corporate leaders about the foundation for their organization’s success with artificial intelligence and they will likely talk about using the latest models with the most sophisticated algorithms to generate “game-changing” insights. However, this laser focus on choosing the best AI model is flawed.
Inside the ‘Dirty Data’ Dilemma
AI models depend on data to deliver trusted outputs, which is why data accuracy and cleanliness are essential. Yet, as insurers and other businesses race to adopt AI and stay competitive, they often take an algorithm-first approach. Without equal attention to data, proving return on investment becomes a real challenge.
Considerations for Risk Managers
Just a few years ago, companies focused on “big data,” trying to collect as much information as possible. To do so, they used a process called “extract, transform and load” (ETL) that flattened all incoming data into a single, clean version before it was used.
For AI, ETL no longer makes sense. When you standardize data before it is analyzed, you end up training AI models on diluted, one-size-fits-all information that lacks context. This approach makes it difficult for companies to verify decisions made by AI tools.
A smarter approach is to seek carrier partners who preserve all raw data, no matter how ugly and dirty it might be. Companies that do so can take the best of that raw data, remove any anomalies or biases, and then train their AI models to understand and enhance specific use cases, such as streamlining the underwriting process. Doing so reduces risks and preserves raw data for future use, if needed.
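The pattern described above can be sketched in a few lines. This is a hypothetical illustration, assuming a raw data store and an underwriting use case; the field names and the anomaly rule are invented for the example, not drawn from any specific carrier system.

```python
# Hypothetical sketch of the "preserve raw, transform per use case" pattern.
# Raw records are kept untouched; cleanup happens only when building a
# use-case-specific view, so the originals remain available for future needs.

RAW_STORE = []  # stands in for a raw data lake

def ingest(record: dict) -> None:
    """Load the record exactly as received -- no flattening, no cleanup."""
    RAW_STORE.append(record)

def underwriting_view(raw_records: list) -> list:
    """Build a cleaned, use-case-specific training view from the raw store.
    Anomalies are filtered here, but the raw originals are never altered."""
    view = []
    for r in raw_records:
        premium = r.get("annual_premium")
        if premium is None or premium <= 0:  # drop obvious anomalies
            continue
        view.append({"risk_class": r.get("risk_class", "unknown"),
                     "annual_premium": premium})
    return view

ingest({"risk_class": "A", "annual_premium": 1200, "notes": "messy free text"})
ingest({"risk_class": "B", "annual_premium": -5})  # anomaly: stored raw, excluded from view

clean = underwriting_view(RAW_STORE)
```

The key design choice is that cleaning is a downstream, per-use-case step rather than a one-time flattening at ingestion, which is what distinguishes this from the classic ETL pipeline the article critiques.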
Many carriers and organizations rely on optical character recognition (OCR) solutions to extract unstructured data from policy forms and claims submissions and turn it into a structured format. Just because data is extracted, however, does not mean it is ready for AI.
OCR used in isolation does not ensure that your data can deliver an accurate underwriting or claims decision, nor does it remove bias from historical inputs. It only structures the data. Carriers must decide how that information is governed and contextualized for AI engines.
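What that governance layer might look like can be sketched as a validation pass over raw OCR output. This is a minimal, hypothetical example: the field names, the policy-number format, and the plausibility range are all invented for illustration.

```python
import re

# Hypothetical sketch: OCR gives you structure, not readiness. A governance
# layer still validates, normalizes, and flags fields before any AI engine
# sees them. Field names and rules below are illustrative only.

def validate_ocr_fields(extracted: dict) -> dict:
    """Post-process raw OCR output from a claims form."""
    result = {"fields": {}, "flags": []}

    # Normalize a policy number OCR often mangles (e.g. letter O vs zero).
    policy = extracted.get("policy_number", "").upper().replace("O", "0")
    if re.fullmatch(r"P\d{7}", policy):
        result["fields"]["policy_number"] = policy
    else:
        result["flags"].append("policy_number: failed format check")

    # A claim amount needs a sanity range, not just successful extraction.
    try:
        amount = float(extracted.get("claim_amount", "").replace(",", ""))
        if 0 < amount < 10_000_000:
            result["fields"]["claim_amount"] = amount
        else:
            result["flags"].append("claim_amount: out of plausible range")
    except ValueError:
        result["flags"].append("claim_amount: not a number")

    return result

checked = validate_ocr_fields({"policy_number": "pO123456",
                               "claim_amount": "4,250.00"})
```

Anything that fails these checks is flagged for review rather than passed silently to a model, which is the "governed and contextualized" step the article calls for.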
This is where integration matters the most. Carriers and organizations with agile, cloud-based core systems can embed AI easily, meaning they can implement OCR tools with AI features that also clean and enrich extracted data, thereby improving accuracy. Organizations and carriers with legacy systems cannot do this, which means their AI-related risks are amplified.
When sensitive data such as PII leaves an organization’s direct control, risk managers must understand where it goes, how long it is retained, who can access it, and what happens if something goes wrong.
If your organization or partner sends PII to an external AI tool such as ChatGPT, that tool becomes a data sub-processor. Once that happens, your company may be required to disclose the relationship and ensure that the sub-processor complies with all required privacy and security regulations. The organization also assumes responsibility for how that data is handled downstream. Risk managers must understand these potential pitfalls and develop strategies to address them.
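One mitigation strategy is to redact obvious PII before any text reaches an external tool, so sensitive fields never leave the organization's control in the first place. The sketch below is illustrative only: the regex patterns cover a few common formats and are nowhere near an exhaustive PII filter.

```python
import re

# Hypothetical sketch: strip obvious PII from text before it is sent to an
# external AI tool. The patterns are illustrative, not a complete filter.

PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII with a labeled placeholder before transmission."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

safe = redact("Claimant SSN 123-45-6789, reachable at jane@example.com.")
# Only the redacted text, never the original, would be sent externally.
```

Redaction does not remove the need for sub-processor disclosures where PII does flow externally, but it narrows what the downstream tool ever handles.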
As organizations and carriers advance from generative AI, which requires user input at every step, toward agentic AI, which creates step-by-step processes autonomously, they must determine when humans need to intervene.
Risk managers should verify that their organizations and carrier partners have proper human-in-the-loop controls in place so humans can verify accuracy, especially as automation expands to complex use cases. At a minimum, organizations using AI to automate multi-step processes like underwriting and claims should hold those automated processes to the same decision-making criteria that human experts apply.
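A human-in-the-loop control of this kind can be as simple as a routing rule in front of the automated decision. The sketch below is hypothetical: the confidence and claim-amount thresholds are invented placeholders, and a real deployment would set them from the criteria human adjusters already use.

```python
# Hypothetical human-in-the-loop gate: automated decisions proceed only when
# confidence is high and the case is routine; everything else is routed to a
# human reviewer. Thresholds are illustrative placeholders.

AUTO_APPROVE_CONFIDENCE = 0.95
HIGH_VALUE_THRESHOLD = 50_000  # claims above this always get human review

def route_claim_decision(model_confidence: float, claim_amount: float) -> str:
    """Apply the same escalation criteria a human adjuster would."""
    if claim_amount >= HIGH_VALUE_THRESHOLD:
        return "human_review"   # high-stakes cases: always escalate
    if model_confidence < AUTO_APPROVE_CONFIDENCE:
        return "human_review"   # low model confidence: escalate
    return "auto_process"       # routine and confident: safe to automate
```

For example, `route_claim_decision(0.98, 1_200)` would return `"auto_process"`, while `route_claim_decision(0.98, 80_000)` would return `"human_review"` regardless of model confidence.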