Large amounts of data are generated and consumed today. However, these data have real value only to the extent that they can be transformed into relevant, useful information and communicated effectively enough to inform decision-making. Although tools for communicating the results of data analysis are already common, they are still under development. In this context, data-to-text (D2T) systems, which automatically generate text from numerical or symbolic data sources, are an emerging and useful approach. As part of Natural Language Generation (NLG), D2T systems can process large amounts of numerical data and convert them into texts that convey relevant, understandable information to users.
A problem in the NLG field is that the semantics of the terms used, especially imprecise ones, are not modeled, so the generated texts lose naturalness. To handle this imprecision, several proposals from the field of fuzzy logic model the semantics of imprecise terms; prominent among them, because of their importance in human communication, are fuzzy linguistic quantifiers. This thesis addresses three objectives:
Extend and improve the content determination phase of D2T systems so that it can represent imprecise knowledge and perform intelligent search. To this end, we considered metaheuristic approaches, aiming for a good compromise between solution quality and computational cost.
Measure and compare empirically how the choice of fuzzy quantification method affects the evaluation of fuzzy quantified sentences.
Design a new D2T model for describing time series data, which has been applied successfully in two real-world domains: health and environmental information.
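As an illustration of what evaluating a fuzzy quantified sentence involves, the following minimal sketch computes a truth degree for "most temperatures are high" using Zadeh's classical sigma-count approach. This is only one of several known quantification methods and is not claimed to be the one adopted in the thesis. The label `high_temperature`, the quantifier `most`, their thresholds, and the sample data are all hypothetical choices made for this example.

```python
def ramp(x, low, high):
    """Piecewise-linear membership: 0 at or below low, 1 at or above high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def high_temperature(t):
    # Hypothetical fuzzy label: not "high" at 20 or below, fully "high" at 30 or above.
    return ramp(t, 20.0, 30.0)


def most(p):
    # Hypothetical proportional quantifier: 0 up to 30% of the data, 1 from 80% upward.
    return ramp(p, 0.3, 0.8)


def zadeh_truth(values, label, quantifier):
    """Truth degree of "Q of the values are A" via Zadeh's sigma-count.

    The fuzzy cardinality of A (the sum of membership degrees) is divided by
    the number of values, and the quantifier is applied to that proportion.
    """
    memberships = [label(v) for v in values]
    proportion = sum(memberships) / len(memberships)
    return quantifier(proportion)


# Hypothetical time series of temperatures.
temps = [18.0, 25.0, 31.0, 33.0, 29.0]
truth = zadeh_truth(temps, high_temperature, most)
print(f"most temperatures are high: {truth:.2f}")  # prints 0.76
```

Other quantification methods may assign a different truth degree to the same sentence and data, which is why the choice of method deserves empirical comparison.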
Keywords: natural language generation, fuzzy logic, linguistic description of data, data-to-text systems