text generation

IberAuTexTification

A dataset created for a shared task on detecting machine-generated text and attributing it to the model that produced it, covering the six main languages of the Iberian Peninsula: Catalan, English, Spanish, Basque, Galician, and Portuguese. The dataset includes human-written and machine-generated texts in seven domains: Chat, How-to, News, Literary, Reviews, Tweets, and Wikipedia. The machine-generated texts were produced with six language models: BLOOM-1B1, BLOOM-3B, BLOOM-7B1, Babbage, Curie, and text-davinci-003.

RefutES

The RefutES corpus is a dataset designed for the task of refuting hate speech messages through counter-narratives. It consists of pairs of offensive messages and their respective responses, generated with the aim of being reasoned, respectful, and non-offensive, and of containing specific and truthful information. The corpus is presented in CSV files with the following columns: