Assisting Forensic Identification through Unsupervised Information Extraction of Free Text Autopsy Reports: The Disappearances Cases during the Brazilian Military Dictatorship
Anthropological, archaeological, and forensic studies situate enforced disappearance as a strategy associated with the Brazilian military dictatorship (1964–1985), leaving hundreds of persons without identity or cause of death identified. Their forensic reports are the only existing clue for people identification and detection of possible crimes associated with them. The exchange of information among institutions about the identities of disappeared people was not a common practice. Thus, their analysis requires unsupervised techniques, mainly due to the fact that their contextual annotation is extremely time-consuming, difficult to obtain, and with high dependence on the annotator. The use of these techniques allows researchers to assist in the identification and analysis in four areas: Common causes of death, relevant body locations, personal belongings terminology, and correlations between actors such as doctors and police officers involved in the disappearances. This paper analyzes almost 3000 textual reports of missing persons in São Paulo city during the Brazilian dictatorship through unsupervised algorithms of information extraction in Portuguese, identifying named entities and relevant terminology associated with these four criteria. The analysis allowed us to observe terminological patterns relevant for people identification (e.g., presence of rings or similar personal belongings) and automate the study of correlations between actors. The proposed system acts as a first classificatory and indexing middleware of the reports and represents a feasible system that can assist researchers working in pattern search among autopsy reports.
keywords: information extraction, named entity recognition, terminology extraction, autopsy reports