Auto insurance companies everywhere are enthralled by a sweeping new wave of big data analytics based on the systematic observation of meaningful patterns in their customers’ driving habits. Collectively, this discipline is known as telematics, a term commonly associated with a multi-faceted approach to collecting and processing asset telemetry in order to identify areas for business improvement as well as growth opportunities.
This concept, borrowed from earlier efforts to reduce the total number of claims, hinges on the growing availability of sensor hardware, which makes an entire set of enhanced predictive models possible. For years now, telematics has also been proposed for uses ranging from collision avoidance (otherwise referred to as near-miss telematics) to in-vehicle mayday systems, while a handful of more pedestrian use cases point to fuel-efficiency metrics based on accelerometer data. Telematics is also the technological bedrock on which financial incentives such as pay-as-you-drive insurance policies are built.
With so many indicators of vehicle usage and driver behavior available, putting the data in context is key. Part of what sets Gigasheet apart in this space is its ability to handle large volumes of data quickly and seamlessly, which is precisely what is needed when trying to discover hidden patterns in the data and improve the resulting analysis.
With this principle in mind, we’ll explore the value of pricing efforts such as UBI (short for Usage-Based Insurance) using a synthetic telematics dataset. It is modeled on a proprietary portfolio of some 70,000 policies held by a Canadian insurer between 2013 and 2016, which served as the groundwork for a University of Connecticut paper on synthetic dataset generation anchored in real-world telematics; the synthetic records mirror the original data, with some noted deviations.
Among the variables we are mainly interested in, features such as “Brake.xxmiles” and “Accel.xxmiles” (discussed later) may serve as reliable covariates for driver profiling. With this information on hand, we can start answering relevant questions about our telematics data the way an insurer would. But first things first: getting the data in.
In Gigasheet, loading a few hundred records, or even millions for that matter, in a supported format such as CSV or ZIP normally takes mere seconds:
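Gigasheet handles this step through its upload dialog. For readers who want to reproduce the analysis outside Gigasheet, a minimal plain-Python sketch of the same step might look like the following. It assumes a UTF-8 CSV with a header row, or a ZIP archive containing one such CSV; the function name `load_rows` and the file name in the comment are our own, hypothetical choices.

```python
import csv
import io
import zipfile


def load_rows(path: str) -> list[dict[str, str]]:
    """Read all rows from a CSV file, or from the first CSV found inside a ZIP archive."""
    if path.lower().endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            # Assumption: the archive holds at least one .csv member; we take the first.
            member = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
            # Wrap the binary member stream so the csv module can read it as text.
            with io.TextIOWrapper(archive.open(member), encoding="utf-8", newline="") as text:
                return list(csv.DictReader(text))
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical file name; point this at your own download.
# rows = load_rows("telematics.zip")
```

This reads every row into memory, which is manageable at this dataset's size of 100,000 records.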
And here is the uploaded file in our Gigasheet library:
To get better visibility into the policyholder population (100,000 in total), we can examine some traditional rating variables, such as gender; Gigasheet’s grouping capabilities make this easy:
The grouped data:
The same approach works for “Car use”. To see how many categories it contains, we again use “Group by”, this time adding the median car age and median driver age as aggregates for each group. The results are as follows:
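To make the “Group by” operation concrete, here is a rough plain-Python equivalent of grouping by “Car use” and computing the median car and driver ages per group. The column names `Car.use`, `Car.age`, and `Insured.age` are assumptions about the file's headers, the sample rows are made up, and `group_by_with_medians` is a hypothetical helper, not a Gigasheet API.

```python
from collections import defaultdict
from statistics import median


def group_by_with_medians(rows, key, numeric_fields):
    """Group rows by `key`, then report each group's size and the median of each numeric field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    summary = {}
    for value, members in groups.items():
        stats = {"count": len(members)}
        for field in numeric_fields:
            # CSV values arrive as strings, so convert before taking the median.
            stats[f"median {field}"] = median(float(m[field]) for m in members)
        summary[value] = stats
    return summary


# Made-up sample rows using the assumed column names.
rows = [
    {"Car.use": "Commercial", "Car.age": "3", "Insured.age": "45"},
    {"Car.use": "Commercial", "Car.age": "5", "Insured.age": "51"},
    {"Car.use": "Private", "Car.age": "8", "Insured.age": "62"},
]
summary = group_by_with_medians(rows, "Car.use", ["Car.age", "Insured.age"])
```

Here `summary["Commercial"]` holds a count of 2 and medians of 4.0 (car age) and 48.0 (driver age).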
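The operand-plus-operation pattern can be sketched as a lookup from a dropdown label to a function. The labels, the `Marital` column name, the sample rows, and the `aggregate` helper below are illustrative assumptions; only the `NB_CLAIM` column comes from the example above.

```python
from statistics import fmean, median

# Hypothetical mapping from a dropdown label to the function that computes it.
OPERATIONS = {
    "Sum": sum,
    "Average": fmean,
    "Median": median,
    "Min": min,
    "Max": max,
}


def aggregate(rows, group_key, operand, operation):
    """Apply one named operation to a numeric column (the operand) within each group."""
    func = OPERATIONS[operation]
    values_by_group = {}
    for row in rows:
        values_by_group.setdefault(row[group_key], []).append(float(row[operand]))
    return {group: func(values) for group, values in values_by_group.items()}


# Made-up sample rows; "Marital" is an assumed column name.
rows = [
    {"Marital": "Married", "NB_CLAIM": "1"},
    {"Marital": "Single", "NB_CLAIM": "0"},
    {"Marital": "Married", "NB_CLAIM": "2"},
]
claims = aggregate(rows, "Marital", "NB_CLAIM", "Sum")
most_claims = max(claims, key=claims.get)  # group with the highest claim total
```

Swapping `"Sum"` for `"Average"` or `"Max"` reuses the same grouping logic, which mirrors how a single dropdown choice changes the result without touching the operand.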
Finally, let’s look at some of the telematics information available and see whether we can derive any significant insights from it. For instance, as an underwriter, I’d probably want to understand the relationship between car use (e.g., commercial) and claim amounts across all age groups; such an analysis could be part of a larger effort to consolidate pricing models, to name one purpose.
The chart below confirms that vehicles with a “Commercial” car use have, on average, the largest claim amounts per observation period.
Car use vs. average claim amounts
There is, however, an easier way to achieve similar results: pivot tables, a no-nonsense, fundamental way of contextualizing data for quick analysis.
In Gigasheet, building a pivot table is as intuitive as dragging and dropping the fields to be summarized onto the appropriate areas of the user interface, which removes most of the guesswork and heavy lifting commonly associated with this step.
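Under the hood, a pivot table like the car-use-by-age view above boils down to bucketing rows on two keys and aggregating a value in each cell. The sketch below assumes column names `Car.use`, `Insured.age`, and `AMT_Claim`, 10-year age bands, and the mean as the aggregate; all of these, along with the helper names and sample rows, are hypothetical choices rather than Gigasheet's internals.

```python
from collections import defaultdict
from statistics import fmean


def age_band(age: float, width: int = 10) -> str:
    """Bucket an age into a label such as '40-49'; the 10-year width is an assumption."""
    low = int(age // width) * width
    return f"{low}-{low + width - 1}"


def pivot_mean(rows, row_key, column_of, value_key):
    """Pivot rows into {row label: {column label: mean of value_key}}."""
    cells = defaultdict(lambda: defaultdict(list))
    for row in rows:
        cells[row[row_key]][column_of(row)].append(float(row[value_key]))
    # Sort column labels so the output order is deterministic.
    return {
        label: {col: fmean(vals) for col, vals in sorted(cols.items())}
        for label, cols in cells.items()
    }


# Made-up sample rows using the assumed column names.
rows = [
    {"Car.use": "Commercial", "Insured.age": "45", "AMT_Claim": "3000"},
    {"Car.use": "Commercial", "Insured.age": "47", "AMT_Claim": "1000"},
    {"Car.use": "Private", "Insured.age": "63", "AMT_Claim": "500"},
]
table = pivot_mean(rows, "Car.use", lambda r: age_band(float(r["Insured.age"])), "AMT_Claim")
```

Passing the column key as a function keeps the pivot generic: the same helper could bucket by region or car age instead of driver age.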
Another distinctive scenario may look like this: Driver X is an xx-year-old customer whose combined number of sudden acceleration and braking events (per 1,000 miles) places him or her above the nth percentile of risky behavior compared with a sample of drivers in the same age group.
With Gigasheet, grouping and sorting to obtain this information takes a matter of seconds. Perhaps surprisingly, not one but three 87-year-old drivers met the above criteria, placing drivers 28620, 32139, and 50019 well above the 90th percentile:
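The percentile check described above can be sketched as: sum each driver's event rates, group drivers by age, compute the cut-off within each age group, and flag anyone above it. The column names (`Accel.06miles`, `Brake.06miles`, `Insured.age`, `id`) are assumptions modeled on the “Accel.xxmiles” and “Brake.xxmiles” fields, the sample rows are made up, and the choice of the inclusive percentile method is ours; other tools may interpolate differently.

```python
from collections import defaultdict
from statistics import quantiles


def risky_drivers(rows, id_key, age_key, event_keys, pct=90):
    """Return ids of drivers whose combined event rate exceeds the pct-th percentile
    (1-99) of their own age group."""
    by_age = defaultdict(list)
    for row in rows:
        # Combined sudden-acceleration and braking events per 1,000 miles.
        score = sum(float(row[k]) for k in event_keys)
        by_age[row[age_key]].append((row[id_key], score))
    flagged = []
    for members in by_age.values():
        scores = [score for _, score in members]
        if len(scores) < 2:
            continue  # a percentile needs at least two drivers to compare
        # n=100 yields 99 cut points; index pct - 1 is the pct-th percentile.
        cutoff = quantiles(scores, n=100, method="inclusive")[pct - 1]
        flagged.extend(driver for driver, score in members if score > cutoff)
    return flagged


# Made-up sample: five drivers of the same age, one with an extreme event rate.
rows = [
    {"id": str(i), "Insured.age": "87", "Accel.06miles": str(a), "Brake.06miles": "0"}
    for i, a in enumerate([1, 2, 3, 4, 100])
]
flagged = risky_drivers(rows, "id", "Insured.age", ["Accel.06miles", "Brake.06miles"])
```

With these sample values the 90th-percentile cut-off interpolates to 61.6, so only the driver with a score of 100 is flagged.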
Exporting these results with the exact filter and pivot conditions we’ve established is as easy as clicking “File” → “Export”; the resulting file contains only the columns and fields we’re interested in.
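Outside Gigasheet, the same “only the columns we care about” export could be sketched with the standard `csv` module. The helper name, column names, and output path below are hypothetical.

```python
import csv


def export_columns(rows, columns, path):
    """Write only the chosen columns of the (already filtered) rows to a new CSV file."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        # extrasaction="ignore" silently drops any field not listed in `columns`.
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical usage with made-up rows and an assumed output path:
# export_columns(flagged_rows, ["id", "Insured.age"], "risky_drivers.csv")
```

Using `extrasaction="ignore"` means the filtered rows can keep every original field in memory while the file on disk carries only the selected columns.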
The exported file ready for download:
Uploading the resulting CSV file to a data visualization tool like RAWGraphs can help us double-check our findings visually if necessary. As seen below, our three 87-year-old customers are indeed part of an “elite” group of very risky drivers!
Proportion of sudden braking and acceleration events based on age
Beyond simple cost optimization or asset tracking, commercial telematics is under immense pressure as the entire transportation sector faces global scrutiny. With sensor-rich applications at the heart of any prospective solution, the field is bound to grow in both size and complexity as we work to extract valuable insight from the very data models we employ.
As readers of this blog are well aware, Gigasheet delivers on its long-standing promise of giving users an adequate level of data analysis proficiency, both quickly and intuitively; at the time of this writing, that assurance is unrivaled by any other off-the-shelf product.