Descripciones lingüísticas de fenómenos complejos: Aplicaciones a Big Data (Linguistic Descriptions of Complex Phenomena: Applications to Big Data)

In a highly connected world, the volume and variety of data keep growing. New technologies allow us to acquire and store these vast arrays of data. The aim is to extract valuable knowledge from them, in the hope that this knowledge will make daily life easier, whether at work, at home, or in social settings. Accordingly, engineers face the challenge of developing systems for a huge number of potential scenarios and users. In this context, man-machine interaction becomes a cornerstone. One of the most effective means of human interaction is natural language, so it is highly desirable for man-machine interaction to be carried out in natural language as well. Automatic text generation is a challenging task that is currently attracting the attention of data scientists. Its aim is to produce more human-friendly reports: it concerns computational systems that automatically process data and generate understandable information in natural language. These linguistic reports complement other forms of knowledge representation and reduce the effort of interpreting tables and graphs. The two research lines for text generation from numerical and symbolic data are Natural Language Generation, for the so-called data-to-text applications, and Linguistic Descriptions of Data. The latter is supported by the Computational Theory of Perceptions, introduced by Zadeh in 1999. This theory provides a framework for developing computational systems that can compute with the meaning of natural language expressions, i.e., with imprecise descriptions of the world, much as humans do. The main goal of this dissertation is to contribute significant advances to the research line on the automatic generation of Linguistic Descriptions of Data. Namely, we focus on the linguistic modeling of complex phenomena.
In other words, we focus on developing computational systems able to describe in natural language the data produced by the phenomena under study. Notably, this is the third thesis developed within this research line, which originated in the Computing with Perceptions Research Unit of the European Centre for Soft Computing. It builds on the outcomes of the previous dissertations from the same research group, and it contributes the theoretical definition and practical implementation of novel concepts that constitute a significant breakthrough for the underlying research line. The main contributions are: characterizing and measuring the reliability of data; defining new types of computational perception; customizing linguistic reports according to the needs of each specific user; and enabling real-time interaction with the user through linguistic commands. The dissertation also addresses technical issues such as dealing with Big Data in real-world problems, and it presents a novel open-source library that eases the practical implementation of this type of computational system. Finally, it presents several illustrative experiments on natural language generation, namely linguistic reports about (1) the velocity of buses in an urban area; (2) deforestation in the Amazon region; (3) the perception of comfort in a room; (4) energy consumption at home; (5) the US census; and (6) man-machine interaction in a mobile application that helps blind people frame themselves in a profile photo.

keywords: computing with words, linguistic description of complex phenomena, big data-to-text