(named) entity recognition

The corpus consists of (i) translated and manually curated documents with tmVar3 annotations (Wei et al., 2022), including PubMed abstracts, to which associated diseases and symptoms were added; and (ii) manually annotated PubMed abstracts in Spanish.

MultiCoNER-ES

MultiCoNER is a large multilingual dataset for named entity recognition (NER) that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as well as multilingual and code-mixed subsets. It is designed to represent contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions.

SocialDisNER

The goal of SocialDisNER is the automatic recognition of disease mentions in tweets.

LivingNER

DIANN-2018-ES

The corpus is a collection of 500 abstracts from Elsevier journal papers in the biomedical domain, collected between 2017 and 2018. It is divided into two disjoint parts: a training set (80%) and a test set (20%). It is annotated with disabilities, negations, and the scope of each negation.
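
For illustration, an 80/20 disjoint split of 500 documents works out to 400 training and 100 test abstracts if the percentages are exact. The Python sketch below shows one way to produce such a split. It is not the procedure used to build the corpus: the random shuffle, the seed, the function name split_corpus, and the integer document IDs are all assumptions made for this example.

```python
import random


def split_corpus(doc_ids, train_fraction=0.8, seed=0):
    """Shuffle document IDs and split them into disjoint train and test lists."""
    ids = list(doc_ids)
    # A fixed seed keeps the (assumed) random assignment reproducible.
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical integer IDs standing in for the 500 abstracts.
train_ids, test_ids = split_corpus(range(500))
print(len(train_ids), len(test_ids))  # 400 100
```

Because both parts are slices of the same shuffled list, no document can appear in both, which is what makes the two parts disjoint.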