Argumentative Conversational Agents for Explainable Artificial Intelligence

Recent years have witnessed a striking rise of artificial intelligence (AI) algorithms that show outstanding performance. However, such performance is often achieved at the expense of explainability. Not only can the lack of explainability undermine the user's trust in the algorithmic output, but it can also cause adverse consequences. In this thesis, we contribute to extending the body of knowledge in the research field of Explainable AI (XAI). We advocate the use of interpretable rule-based models that can serve both as stand-alone applications and as proxies for black-box models. More specifically, we design an explanation generation framework that outputs textual factual and counterfactual explanations for interpretable rule-based classifiers. Counterfactual explanations suggest minimal changes in feature values that would change the classifier's prediction in the desired way. In addition, we model a communication channel between the explainer and the explainee (i.e., the user) to effectively convey the automatically generated explanations. Implemented in the form of a conversational agent, it enables users to explore the explanation space in its entirety so that they can make an informed decision about the given prediction. As part of this thesis, we first perform a thorough review of the state-of-the-art contrastive and counterfactual explanation generation algorithms, as well as the corresponding theories, and examine the degree of their interconnection. Relying on the insights from this review, we design a model-specific counterfactual explanation generation method for interpretable classification systems (e.g., decision trees and fuzzy rule-based classification systems) that operates on the internals of the classifier. In addition, we design and implement a model-agnostic counterfactual explanation generation method based on a genetic algorithm.
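The idea of a counterfactual as a minimal feature change that flips a prediction can be illustrated with a minimal sketch. The classifier, feature names, and search strategy below are purely hypothetical and not taken from the thesis; the sketch merely shows the underlying notion of searching for the smallest single-feature perturbation that changes the outcome in the desired way.

```python
def classify(x):
    # Toy rule-based classifier (illustrative rules, not from the thesis):
    # a loan is approved only if income is high enough and debt is low enough.
    return "approved" if x["income"] >= 50 and x["debt"] <= 20 else "rejected"

def counterfactual(x, target, max_delta=100):
    """Brute-force search for the smallest single-feature change (in either
    direction) that makes classify() return the target label.
    Returns (delta, feature, changed_instance) or None."""
    best = None
    for feat in x:
        for delta in range(1, max_delta + 1):
            found = False
            for sign in (+1, -1):
                cand = dict(x)
                cand[feat] = x[feat] + sign * delta
                if classify(cand) == target:
                    if best is None or delta < best[0]:
                        best = (delta, feat, cand)
                    found = True
                    break
            if found:
                break  # smallest delta for this feature found; try the next one
    return best

instance = {"income": 45, "debt": 15}          # classified as "rejected"
cf = counterfactual(instance, "approved")      # -> (5, "income", {"income": 50, "debt": 15})
```

Here the suggested counterfactual reads naturally as a textual explanation: "had the income been 50 instead of 45, the loan would have been approved." A model-agnostic method, such as the genetic algorithm mentioned above, replaces this exhaustive single-feature search with an evolutionary search over multi-feature perturbations.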
Subsequently, we propose a metric of perceived explanation complexity that allows us to assess how complex a given textual explanation appears to the end user. We empirically evaluate the proposed methods through a comparative analysis of human evaluation rankings obtained via two surveys offered to end users. We show that the proposed metric of perceived explanation complexity correlates well with explanation aspects such as informativeness, relevance, and readability. The proposed metric can therefore be used, at least for the target audience in our use case, as a substitute for human evaluation of these explanation aspects. In evaluating the argumentative explanatory dialogue model, we find that all the proposed request types are actively used in information-seeking dialogue settings. Furthermore, the large number of requests for alternative counterfactual explanations indicates that user preferences should be taken into consideration when generating automated explanations. All in all, we show that the resulting explanations are appreciated by a large share of users across different application domains. The research results presented in this thesis suggest several directions for future work. From a theoretical point of view, the proposed framework can be extended to introduce causal relations between the predicted data and related features. From an algorithmic point of view, the proposed explanation generation framework should be further extended with a surrogate-model approach to handle other types of classifiers (including non-interpretable classifiers). Furthermore, different settings may require changes to the dialogue protocol that models the communication process between the explainer and the user, for the sake of deeper customisation. Importantly, it seems impossible to achieve human-centric AI without formalising and modelling ethical relations on the basis of the data being processed.
Thus, bias mitigation, another highly relevant line of research in the XAI community, is a further algorithmic challenge to address. Finally, we are particularly interested in adapting the designed human evaluation framework for future experiments on explanation, trustworthiness, and satisfaction. Altogether, we believe the prospective extensions of the work presented in this thesis have great potential to move the field forward from XAI to Trustworthy AI.

Keywords: Artificial intelligence (AI), Computer science