John Davison, Chief Information Officer, speaks to ITProPortal about the opportunities data lakes offer for improving insurance fraud detection.
There has been a great deal of recent debate about how motor insurers can best clamp down on false whiplash claims. The Ministry of Justice (MoJ) has pledged to change the way whiplash claims are handled by setting fixed compensation amounts and banning the practice of settling claims without medical evidence.
The Civil Liability Bill, which is still going through Parliament, is raising a number of questions about how best to tackle and discourage insurance fraud. Citing a whiplash injury has long been a common way for fraudsters to extract money from car insurers, because whiplash can be hard to verify: it involves no broken bones and is not always accompanied by vehicle damage.
Is whiplash insurance fraud being properly detected?
Whilst the UK government is exploring ways to tackle the situation, questions are being asked about whether fraud of this kind is being adequately detected. It is very much in insurers' interests to tackle insurance fraud, so that they can set the best possible price for their honest, loyal policy holders, who may genuinely need to claim for personal injury and who deserve the best deal in the market: one that has not been inflated by a small minority of opportunists.
The industry must therefore develop better systems to detect all types of insurance fraud at the earliest possible opportunity, by properly assessing risk. The answer may lie in part with ‘data lakes’.
How can data lakes play a part in improving the insurance fraud detection process?
Thanks to constantly evolving technology, insurers are detecting customer fraud more rapidly: sophisticated data analysis systems make it easier to spot the anomalies in data patterns that are often associated with insurance fraud. Data lakes, and the capabilities that support them, will enable insurers to go further by providing access to a vastly wider pool of data, and may hold the answer to more efficient fraud detection.
What are data lakes?
A traditional database can typically store only structured data: that is, data with known attributes that can be placed in the relevant columns and rows. By definition, a database storage system must understand the data being stored. A data lake, however, can store information in its raw form and therefore does not need a predefined model. Typically, a data lake contains both structured and unstructured data, and it is this defining principle that also supports the other key difference: size.
Data lakes, as the name suggests, are vast, capable of storing extremely large amounts of data efficiently, because very little computation is required to write the data to disk. By storing data in raw form, an insurer can later analyse and interpret existing or new data in a wider variety of ways, without having to define those requirements at the point the data is physically stored.
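To make the contrast between a predefined database model and raw storage concrete, here is a minimal Python sketch of "schema on write" versus "schema on read". The event records, field names and functions (`write_to_table`, `write_to_lake`, `read_from_lake`) are hypothetical illustrations, not part of any real data lake product; a real lake would write files to distributed or object storage rather than hold a Python list.

```python
import json

# Hypothetical raw events as they might arrive from different systems.
# The records deliberately do not share a common set of fields.
RAW_EVENTS = [
    '{"type": "quote", "policy_id": "P1", "premium": 420.0}',
    '{"type": "page_view", "session": "S9", "page": "/claims"}',
    '{"type": "quote", "policy_id": "P2", "premium": 385.5, "device": "mobile"}',
]

# Schema on write: a fixed table keeps only the columns defined up front.
FIXED_COLUMNS = ("policy_id", "premium")


def write_to_table(raw_lines):
    """Database-style ingest: coerce each record to the predefined columns."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("type") == "quote":  # only records that fit the model survive
            rows.append(tuple(record.get(col) for col in FIXED_COLUMNS))
    return rows


def write_to_lake(raw_lines):
    """Lake-style ingest: store the raw text untouched, with no interpretation."""
    return list(raw_lines)


def read_from_lake(lake, wanted_fields):
    """Schema on read: project raw records onto whatever fields an analysis needs."""
    for line in lake:
        record = json.loads(line)
        yield {field: record.get(field) for field in wanted_fields}
```

Here `write_to_table(RAW_EVENTS)` keeps only the two quote rows and silently drops both the `device` field and the page view, whereas `list(read_from_lake(write_to_lake(RAW_EVENTS), ["type", "device"]))` can still recover the device for the second quote, because nothing was discarded when the data was stored.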
Capture everything, discard nothing
So how can data lakes help insurers? As technology changes and systems become more complex, a database storage approach forces a trade-off: either keep pace with the change by altering the database structures so that no data is lost, or accept that new data not in the predefined format will be unavailable to the analytical models. With a data lake, which stores data in raw form, the data assets can change over time without requiring changes to the storage routines.
An insurer with a 'capture everything, discard nothing' approach to its data strategy can be on the front foot when detecting anomalies in systems that are constantly changing. A few thousand policy sales in a single day will typically generate over one million new data points, each representing a change in the data being captured. Those changes could be as significant as paying for a policy or as simple as navigating between pages on a website. By leveraging the scale of data lakes, an insurer can capture and store every data point, not just the final insurance policy, which vastly improves insight.
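The difference between keeping every data point and keeping only the final policy can be sketched in Python. The event stream, field names and functions below are hypothetical, and repeated edits to a rating factor are used purely as an illustrative signal, not as a claim about which behaviours actually indicate fraud.

```python
from collections import defaultdict

# Hypothetical event stream for two customer journeys. Each tuple is one
# data point: (session_id, event_type, payload).
EVENTS = [
    ("S1", "page_view", {"page": "/quote"}),
    ("S1", "field_change", {"field": "annual_mileage", "value": 12000}),
    ("S1", "field_change", {"field": "annual_mileage", "value": 6000}),
    ("S1", "field_change", {"field": "annual_mileage", "value": 3000}),
    ("S1", "payment", {"amount": 310.0}),
    ("S2", "page_view", {"page": "/quote"}),
    ("S2", "field_change", {"field": "annual_mileage", "value": 8000}),
    ("S2", "payment", {"amount": 355.0}),
]


def final_policies(events):
    """What a 'final policy only' store keeps: the last value of each field."""
    policies = defaultdict(dict)
    for session, kind, payload in events:
        if kind == "field_change":
            policies[session][payload["field"]] = payload["value"]
    return dict(policies)


def amendment_counts(events):
    """Recoverable only from the full event log: how often each field was edited."""
    counts = defaultdict(lambda: defaultdict(int))
    for session, kind, payload in events:
        if kind == "field_change":
            counts[session][payload["field"]] += 1
    return {session: dict(fields) for session, fields in counts.items()}
```

`final_policies(EVENTS)` returns the same kind of record for both sessions, while `amendment_counts(EVENTS)` shows that session `S1` edited its mileage three times before paying; that information exists only because every event was kept.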
However, this model of capturing everything and discarding nothing places huge demands on data storage systems, requiring increasingly sophisticated technology and processes. Producing and analysing such high volumes of data requires infrastructure that can scale horizontally (adding more servers) and vertically (adding more capacity to existing servers). Doing so cost-effectively also requires the insurer to have good-quality systems and good-quality metrics. Because data lakes allow extremely large volumes of data to be stored flexibly, insurers are better placed to detect fraud.
In what way can data lakes help detect and fight fraud?
As with most risks, the risk of fraud can be modelled by an insurer. These models learn from historical outcomes and apply scoring mechanisms to new data. With a wider pool of data, insurers can use more factors to refine the accuracy of their fraud models. Customers intending to commit insurance fraud, perhaps by misleading the insurer with a whiplash claim, are likely to show other indicators in the data collected when the policy was written.
Data lakes enable insurers to analyse more data, in more depth, using models that can be refined over time as the systems built on top of them change to meet the demands of the market. By constantly and iteratively adding new factors and new data relationships, an insurer can spot patterns in data that would not have been detectable from a database alone.
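Scoring new data against historical outcomes can be illustrated with logistic regression, one common scoring technique; the article does not say which models insurers use, so this is a stand-in. The features, training data and function names are hypothetical assumptions, and a production model would use an established library, far more data and validation on held-out cases.

```python
import math

# Hypothetical historical policies with known outcomes
# (1 = claim later confirmed fraudulent, 0 = genuine).
# The three features are illustrative assumptions only:
#   [times a rating factor was edited before purchase,
#    time from policy start to first claim (scaled to 0..1),
#    1 if an injury claim reported no vehicle damage, else 0]
HISTORY_X = [
    [3, 0.05, 1], [4, 0.10, 1], [2, 0.08, 1], [5, 0.02, 0],
    [0, 0.90, 0], [1, 0.70, 0], [0, 0.95, 1], [1, 0.60, 0],
]
HISTORY_Y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Map a linear score to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on mean log-loss."""
    n_features = len(xs[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Prediction error for one historical case.
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(xs)
    return weights, bias


def fraud_score(model, x):
    """Score a new policy or claim; higher means more like past fraud cases."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
```

After `model = train_logistic(HISTORY_X, HISTORY_Y)`, a case resembling the historical fraud examples, such as `[4, 0.05, 1]`, scores higher than one like `[0, 0.85, 0]`. Adding a new factor means appending a column to every feature vector and retraining, which mirrors the iterative refinement the article describes.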
Combination is best
Whilst the insurance industry should recognise the significant value of data lakes, insurers are best served by retaining a combination of data lakes and databases. Databases remain valuable for providing structured, 'certified' data. However, many believe that data lakes are the way forward and will play a growing role in data storage, with the insights they generate set to transform the way insurance businesses operate.
Traditional underwriting versus data lakes
Data lakes are indeed valuable tools and should be used as such. They can help an insurer apply the age-old principles of statistical modelling across huge new data sets, increasing the breadth and depth of modelling to improve accuracy. The insurance industry is predicated on an understanding of the risk of loss, and by leveraging data lakes effectively, an insurer can quantify that risk more accurately.
By using emerging technologies such as machine learning and artificial intelligence over these vast new data stores, the insurer can also refine its models quickly. What could take a traditional underwriter a month to change can be completed in minutes using the power of cloud computing and machine learning.
Data lakes won't clean up whiplash claims overnight, but they will certainly help in the fight. False whiplash claims may never be completely eradicated from the motor insurance sector because of the ambiguity around their diagnosis and the varying severity of the motor incidents that cause them. However, data lakes will help identify those who are more inclined to trick the system. Over time, insurers will be able to make more accurate data-driven predictions, continuously improving insurance fraud detection.
Putting customers first
Data lakes should be considered by all insurers looking to improve the customer experience. Early detection of insurance fraud ultimately reduces the premiums our policy holders pay by minimising the amounts paid out on undeserving claims. Policy holders deserve a fair price, and this is inextricably linked to the insurance company protecting itself from fraud. This will undoubtedly mean investing in more sophisticated data storage, analysis and detection systems. In the future, data lakes and related technologies may well be what sets insurers apart.