Exploring the balance between interpretability and performance with carefully designed constrainable neural additive models

The interpretability of an intelligent model automatically derived from data is a property that can be acted upon through a set of structural constraints the model is required to satisfy. These constraints are often at odds with the task objective, and it is not straightforward to explore the balance between model interpretability and performance. To allow an interested user to jointly optimise performance and interpretability, we propose a new formulation of Neural Additive Models (NAMs) that can be subjected to a number of constraints. We call the resulting model Constrainable NAM (CNAM for short); it allows the specification of different regularisation terms. CNAM is differentiable and is built in such a way that it can be initialised with the solution of an efficient tree-based GAM solver (e.g., Explainable Boosting Machines). Starting from this local optimum, the model can then explore solutions with different interpretability-performance tradeoffs, according to different definitions of both interpretability and performance. We empirically benchmark the model on 56 datasets against 12 competing models and observe that, on average, CNAM lies on the Pareto front of optimal solutions, i.e., the models generated by CNAM exhibit a good balance between interpretability and performance. Moreover, we provide two illustrative examples that show, step by step, how CNAM performs well on classification tasks and how it can yield insights on regression tasks.

Keywords: Generalised Additive Models, Explainable AI, Explainable Artificial Intelligence, Interpretable Artificial Intelligence, Interpretability, Accuracy-interpretability tradeoff
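
To make the formulation in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a neural additive model trained with a task loss plus a differentiable, interpretability-oriented penalty whose weight trades performance against interpretability. The class names, the smoothness penalty, and the weighting factor lam are illustrative assumptions; initialisation from an Explainable Boosting Machine solution is omitted.

    # Illustrative sketch of a constrainable neural additive model (not the paper's code).
    import torch
    import torch.nn as nn

    class ShapeFunction(nn.Module):
        """Small per-feature subnetwork f_j(x_j), as in a NAM."""
        def __init__(self, hidden=32):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, x):  # x: (batch, 1)
            return self.net(x)

    class AdditiveModel(nn.Module):
        """Prediction is a sum of per-feature shape functions plus a bias."""
        def __init__(self, n_features, hidden=32):
            super().__init__()
            self.shapes = nn.ModuleList([ShapeFunction(hidden) for _ in range(n_features)])
            self.bias = nn.Parameter(torch.zeros(1))

        def forward(self, x):  # x: (batch, n_features)
            contribs = torch.cat([f(x[:, j:j + 1]) for j, f in enumerate(self.shapes)], dim=1)
            return contribs.sum(dim=1) + self.bias, contribs

    def smoothness_penalty(model, x):
        """Illustrative interpretability term: penalises second differences of each
        shape function over the sorted feature values, discouraging wiggly curves."""
        penalty = 0.0
        for j, f in enumerate(model.shapes):
            grid, _ = torch.sort(x[:, j:j + 1], dim=0)
            y = f(grid)
            penalty = penalty + (y[2:] - 2 * y[1:-1] + y[:-2]).pow(2).mean()
        return penalty

    # Toy training loop: lam controls the interpretability-performance tradeoff.
    torch.manual_seed(0)
    X, y = torch.randn(256, 4), torch.randn(256)
    model = AdditiveModel(n_features=4)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    lam = 0.1
    for _ in range(200):
        pred, _ = model(X)
        loss = nn.functional.mse_loss(pred, y) + lam * smoothness_penalty(model, X)
        opt.zero_grad()
        loss.backward()
        opt.step()

Sweeping lam over a range of values and recording the resulting (performance, penalty) pairs is one way to trace the kind of interpretability-performance frontier discussed in the abstract.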