Dealing with hallucination and omission in neural Natural Language Generation: A use case on meteorology

Hallucinations and omissions must be carefully handled when neural models are used for Natural Language Generation tasks. In the particular case of data-to-text applications, neural models are usually trained on large-scale datasets and sometimes generate text that diverges from the input data. In this paper, we show the impact of the lack of domain knowledge on the generation of texts containing input-output divergences through a use case on meteorology. We propose a novel approach for detecting hallucinations and omissions when neural models are used for the automatic generation of meteorological descriptions from tabular data. Our main contributions are: (i) we provide the Natural Language Generation research community with new resources (a dataset and a corpus curated by meteorologists); (ii) we explain how to adapt a Transformer-based model to generate meteorological texts from tabular data; and (iii) we explain how to detect divergences (i.e., hallucinations and omissions) between the output texts and the input data using common-sense knowledge.

Keywords: Natural language generation, Hallucination, Omission, Neural models