Normal Deviance – Burying the GLM

My previous column praised the virtues of the generalised linear model. This month we dismember it and look to more modern approaches to fill the void.

I love generalised linear models (GLMs) and still fit a lot of them. However, they were developed in a world very different to today’s—datasets were smaller (both the number of records and the number of variables), computers were slower, and applications were not as extensive. This context shows—GLMs do carry a number of important limitations:

  • Time to fit: Choosing the structure of a GLM is a fairly manual process. Not only do you have to decide which predictor variables to leave in or out, but there is a host of subsidiary questions. Do we need to add splines or polynomial effects? Do we need to group up categorical levels? Do we need to search for 2-way or 3-way interactions? This all takes time, and scaling the analysis to cover additional variables and new models means the time cost becomes prohibitive.
  • All-or-nothing effects: Effects in a GLM are either in or out; either you add a risk term for drivers of a certain car or you do not. This forces a degree of ad hoc judgement. While facilitating human judgement is good, the choice is not always clear cut: there will be effects that are marginally significant but material, and for these the all-or-nothing approach is not ideal. It is far better to allow an in-between parameter value, as many other algorithms do.
  • Distributional contortions: GLM fits can go awry if the underlying distribution of the response variable is not close to the one assumed. This is a particular problem when modelling continuous response variables: I have fit many gamma models, but have rarely (never?) seen data that would satisfy a formal test for the gamma distribution. The problem is often in the tails; GLMs usually assume thin-tailed distributions, so the data commonly contain more unusually large or small values than the model expects.
  • Dangerous extrapolation: GLMs seem particularly prone to one type of error, where the combined effect of two strong linear terms leads to an implausibly extreme prediction (too large or too small); a sketch of the problem follows this list. Because these edge cases tend to have fewer data points, they can be overlooked in diagnostic checks but will cause problems down the track when the model is implemented. Capping variables and checking interactions for particularly strong effects can help, but a model structure that naturally prevents the problem would be better.
  • Accuracy: While the predictive accuracy of GLMs is often commendable, they cannot capture the depth and complexity of patterns that more modern algorithms can. In some contexts, the extra performance counts.
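
To make the extrapolation point concrete, below is a minimal sketch in Python (the log link, coefficient values and variable names are illustrative assumptions, not a real pricing model) of how two strong main effects can compound into an implausible prediction at the edge of the data:

    import numpy as np

    # Illustrative (assumed) coefficients for a log-link severity GLM.
    intercept = np.log(500.0)   # baseline claim size of 500
    beta_age = -0.03            # claim size falls about 3% per year of driver age
    beta_power = 0.02           # and rises about 2% per unit of engine power

    def predicted_claim(age, power):
        """Mean prediction from the log-link GLM with two linear terms."""
        return np.exp(intercept + beta_age * age + beta_power * power)

    # A typical risk sits comfortably within the data.
    print(f"typical risk: {predicted_claim(45, 100):,.0f}")   # roughly 960, plausible

    # At the edge of the data both effects push the same way, and the log link
    # compounds them multiplicatively.
    print(f"edge of data: {predicted_claim(18, 300):,.0f}")   # roughly 118,000, implausible

Neither effect looks alarming on its own; it is the rare combination of extremes, thinly represented in the data, that produces the problem.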

So if we are to discard GLMs, what do we replace them with? There are lots of options; the whole area of prediction has been actively researched for decades. The answer can vary too. If interpretation is important, then a decision tree or a GLM could work. If accuracy is paramount, then a more complex model such as a decision tree ensemble or neural network might be best. If the model has to adapt and evolve over time, a Bayesian framework might be the way to go.

My personal interest is in extensions of the GLM that overcome some of its shortcomings. Penalised regression is a good example: it addresses some of the challenges around variable selection and all-or-nothing effects, as the sketch below illustrates. However, it often sacrifices some of the GLM’s advantages, including inference on parameters and some of the more subtle user control. Perhaps one day we’ll reach nirvana, where one framework covers the majority of problems. But for the foreseeable future, we’re in a world where a toolbox of approaches is needed.
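
As a minimal sketch of that idea, the code below fits an L1-penalised (lasso) logistic regression, one form of penalised GLM; the simulated data, scikit-learn estimator and penalty strength are assumptions chosen for illustration rather than a recommended workflow. A single penalty weight shrinks weak effects towards zero instead of forcing an in-or-out decision on each one:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    # Simulated data (an assumption for the example): 20 candidate rating
    # factors, of which only the first three genuinely drive the response.
    rng = np.random.default_rng(0)
    n, p = 5000, 20
    X = rng.normal(size=(n, p))
    true_beta = np.zeros(p)
    true_beta[:3] = [1.0, -0.5, 0.25]
    prob = 1.0 / (1.0 + np.exp(-(X @ true_beta - 1.0)))
    y = rng.binomial(1, prob)

    # Standardise so a single penalty weight treats the predictors comparably.
    X_std = StandardScaler().fit_transform(X)

    # L1 penalty: C is the inverse penalty strength. Weak or spurious effects
    # are typically shrunk exactly to zero, while genuine effects are shrunk
    # but retained.
    model = LogisticRegression(penalty="l1", solver="saga", C=0.01, max_iter=5000)
    model.fit(X_std, y)

    print("non-zero coefficients:", int(np.sum(model.coef_ != 0)))
    print("fitted coefficients:  ", np.round(model.coef_.ravel(), 3))

The shrinkage replaces much of the manual in-or-out decision making, but standard errors and p-values no longer come for free, which is part of the trade-off described above.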
