Generating Automatic Linguistic Descriptions with Big Data

In a highly connected world, the volume and variety of data keep growing, and the Big Data era brings new challenges. In dealing with Big Data, we have identified and analyzed seven issues: (1) scalability, (2) efficient processing, (3) incomplete and inaccurate data, (4) specific domains, (5) relevance of information, (6) levels of detail, and (7) intuitive and effective knowledge representation. The analysis reveals that five of these issues are related to knowledge representation and human perception. Linguistic Descriptions of Complex Phenomena is a technology aimed at computing and generating linguistic reports customized to the user's needs. In this paper, we present and describe an approach to Big Data based on this technology that addresses the seven issues under study. Namely, we generate linguistic reports from Big Data that fulfill the user's requirements. To evaluate the generated linguistic reports, we propose specific evaluation criteria based on Grice's maxims. We illustrate the usefulness of the proposed solution with a practical experiment based on census data of the United States of America.

Keywords: Linguistic descriptions, Big data, MapReduce, Fuzzy logic