Automatic generation of textual descriptions in data-to-text systems using a fuzzy temporal ontology: Application in air quality index data series

In this paper we present a model based on computational intelligence and natural language generation for the automatic generation of textual summaries from numerical data series, aiming to provide insights which help users to understand the relevant information hidden in the data. Our model includes a fuzzy temporal ontology with temporal references which addresses the problem of managing imprecise temporal knowledge, which is relevant in data series. We fully describe a real use case of application in the environmental information systems field, providing linguistic descriptions about the air quality index (AQI), which is a very well-known indicator provided by all meteorological agencies worldwide. We consider two different data sources of real AQI data provided by the official Galician (NW Spain) Meteorology Agency: (i) AQI distribution in the stations of the meteorological observation network and (ii) time series which describe the state and evolution of the AQI in each meteorological station. Both application models were evaluated following the current standards and good practices of manual human expert evaluation of the Natural Language Generation field. Assessment results by two experts meteorologists were very satisfactory, which empirically confirm that the proposed textual descriptions fit this type of data and service both in content and layout.

keywords: fuzzy linguistic terms, linguistic descriptions of data, Data to text systems, natural language generation