Validation of a linguistic summarization approach for time series meteorological data

Linguistic summaries of data are brief, precise textual descriptions of (usually numeric) datasets. Computational methods for generating linguistic summaries have been developed in recent years, and their usefulness has been demonstrated in several application domains. However, means of validating them, both objectively and subjectively by experts, are still at an early stage of development and need to be explored and discussed in depth. Validation therefore remains a challenging open problem, to which new measures for testing the summaries obtained and new methodologies for assessing their quality can be contributed. A heuristic approach is described for the automatic generation of operational weather forecasts for Galicia (NW Spain). The forecasts are built as spatio-temporal linguistic summaries that describe the results of the numerical meteorological prediction models run at the Galician Weather Service. The summaries involve linguistic values of the cloud coverage meteorological variable, fuzzy quantifiers (‘a few’, ‘many’, ...) and spatio-temporal references (e.g. ‘the sky will be cloudy in the south coast’). This domain is used as a case study for proposing new validation measures and quality assessment criteria and procedures, which are applied to the generated summaries in order to compare them with those written manually by human experts.
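
As an illustration of how a fuzzy quantified statement such as ‘many locations in the south coast will be cloudy’ might be evaluated, the following minimal sketch applies Zadeh's calculus of quantified propositions: the truth degree is the membership of the mean degree of ‘cloudy’ in the quantifier ‘many’. This is an assumption made for illustration and not necessarily the heuristic used in this work. The function names, the trapezoidal membership parameters and the sample coverage values are all hypothetical.

```python
from statistics import mean


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def cloudy(coverage):
    # Hypothetical linguistic value over cloud coverage expressed as a fraction in [0, 1].
    return trapezoid(coverage, 0.5, 0.75, 1.0, 1.0)


def many(proportion):
    # Hypothetical relative fuzzy quantifier over a proportion in [0, 1].
    return trapezoid(proportion, 0.4, 0.7, 1.0, 1.0)


def truth_of_summary(coverages):
    """Truth degree of 'many locations are cloudy' (Zadeh's quantified proposition)."""
    if not coverages:
        raise ValueError("at least one coverage value is required")
    return many(mean(cloudy(c) for c in coverages))


# Hypothetical predicted coverage at four locations of one region.
south_coast = [1.0, 0.75, 0.625, 0.5]
print(truth_of_summary(south_coast))
```

A truth degree of this kind is one possible objective measure of a summary; it does not by itself capture the agreement with expert-written forecasts that the validation proposed here addresses.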