Smilegate AI Center releases its hate comments datasets for the study … 2022-01-24

■ Smilegate AI Center builds datasets of troll and hate comments... Around 10,000 entries selected from 50,000 data points

■ Types segmented according to the social context of hate comments... Potential uses in areas such as game communities, customer service chatbots, and polls


Smilegate AI Center (Director Han Woo-jin) announced on the 20th that it will release its datasets of troll and hate comments.

Smilegate AI Center built the datasets to proactively detect and respond to troll and hate comments, whose recent spread across online spaces has become a social issue. The project was carried out in cooperation with Underscore, a knowledge-content startup.

The troll and hate comments were collected from posts on various websites, such as portal sites and online communities, between Jan. 1, 2019, and Jul. 1, 2021. During collection, the timeliness and bias of hate-comment data were taken into consideration, and about 10,000 entries were selected from 550,000 data points.

In particular, while building the datasets, the center categorized the data into 8 subjects: ‘woman/family,’ ‘sexual minority,’ ‘man,’ ‘race/nationality,’ ‘age,’ ‘region,’ ‘religion,’ and ‘others.’ The center also released a standard model for classifying and extracting hate comments.

Smilegate AI Center's datasets of troll and hate comments are expected to be used in a variety of areas. In comments on game communities, customer service chatbots, polls, and elsewhere, they can help identify hate comments directed at particular targets. Building on the technology gained through continued R&D, the center plans to improve the accuracy with which hate comments are detected.

Meanwhile, the collected data is scheduled to be released through Smilegate AI Center's GitHub page in January.

Han Woo-jin, director of Smilegate AI Center, said, “Our AI center is a research institute that studies natural language processing and classification as well as the societal problems caused by AI that lacks ethics, and we conduct our research with a sense of responsibility and an awareness of these problems. In particular, I hope this data can lay a foundation for the safer use of AI by classifying and preventing hateful expressions and ethical issues in AI in advance. Smilegate AI Center will continue working to produce results that contribute to society, in addition to advancing its technology.”
