Various NER datasets for multiple domains and languages

===============================
Datasets for Entity Recognition
===============================

This repository contains datasets from several domains and languages annotated with a variety of entity types, useful for entity recognition and named entity recognition (NER) tasks.
**NOTE:** I am actively adding datasets to this list.

Datasets for NER in English
---------------------------

.. |check| unicode:: 0x2714

The following table lists datasets for English-language entity recognition. The ``data`` directory contains information on where to obtain the datasets that could not be shared due to licensing restrictions, as well as code to convert them (if necessary) to the CoNLL 2003 format. NER datasets in other languages are listed below.

========== ====== ======= ======== =========
Dataset    Domain License Language Reference
========== ====== ======= ======== =========
CoNLL 2003 News           en
CoNLL 2002                nl-es
========== ====== ======= ======== =========

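
The conversion scripts in the ``data`` directory are not reproduced here. As a minimal sketch, assuming the common four-column CoNLL 2003 layout (token, POS tag, chunk tag and NER tag on each line, blank lines between sentences, ``-DOCSTART-`` lines between documents), such files can be read and written with plain Python. The ``to_conll`` helper and its ``-X-`` placeholder columns are hypothetical, meant for corpora that only supply tokens and NER tags.

.. code-block:: python

    from typing import Iterable, List, Tuple

    # One CoNLL row, e.g. ("EU", "NNP", "B-NP", "B-ORG"):
    # token, POS tag, chunk tag, NER tag (assumed column order).
    Row = Tuple[str, ...]
    Sentence = List[Row]


    def read_conll(lines: Iterable[str]) -> List[Sentence]:
        """Group CoNLL-formatted lines into sentences of column tuples."""
        sentences: List[Sentence] = []
        current: Sentence = []
        for line in lines:
            line = line.strip()
            if not line or line.startswith("-DOCSTART-"):
                # A blank line or a document marker ends the current sentence.
                if current:
                    sentences.append(current)
                    current = []
                continue
            current.append(tuple(line.split()))
        if current:  # the input may not end with a blank line
            sentences.append(current)
        return sentences


    def write_conll(sentences: Iterable[Sentence]) -> str:
        """Serialize sentences: one row per line, a blank line between sentences."""
        blocks = ["\n".join(" ".join(row) for row in sent) for sent in sentences]
        return "\n\n".join(blocks) + "\n"


    def to_conll(tokens: List[str], ner_tags: List[str]) -> Sentence:
        """Hypothetical helper for corpora that only provide tokens and NER tags.

        The POS and chunk columns are filled with "-X-" placeholders
        (an assumption borrowed from the -DOCSTART- line convention).
        """
        if len(tokens) != len(ner_tags):
            raise ValueError("tokens and ner_tags must have the same length")
        return [(tok, "-X-", "-X-", tag) for tok, tag in zip(tokens, ner_tags)]

For example, ``write_conll([to_conll(["Paris", "is", "nice"], ["B-LOC", "O", "O"])])`` produces the three lines ``Paris -X- -X- B-LOC``, ``is -X- -X- O`` and ``nice -X- -X- O``. Keeping every column as a string tuple lets the reader accept files with more or fewer than four columns.
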

Licenses
--------

The datasets are distributed under various licenses, which I have not yet had time to review.

Datasets for NER in other languages
-----------------------------------

Lexical Named Entity resources
------------------------------

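
Hashes for ner_dataset-0.0.1-py3-none-any.whl
---------------------------------------------

=========== ================================================================
Algorithm   Hash digest
=========== ================================================================
SHA256      7304030ed56cd230cdb0ea448c76ece0d1622ed60d0d0a33ba7bacf970f5c70f
MD5         092f07790e7937721e5bff6842e20e2f
BLAKE2b-256 5adc7fb84d05d84208f902fbd0ecc217befeeb291595cc75b8b2fee6b7bcb6b1
=========== ================================================================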
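
The digests above can be used to check a downloaded wheel before installing it. A minimal sketch using Python's standard ``hashlib`` module, assuming the wheel has been saved under its published filename in the working directory; BLAKE2b-256 corresponds to ``hashlib.blake2b`` with ``digest_size=32``. The ``digests`` and ``verify_wheel`` names are illustrative, not part of the package.

.. code-block:: python

    import hashlib
    from pathlib import Path

    # Expected values copied from the hash table for
    # ner_dataset-0.0.1-py3-none-any.whl.
    EXPECTED = {
        "sha256": "7304030ed56cd230cdb0ea448c76ece0d1622ed60d0d0a33ba7bacf970f5c70f",
        "md5": "092f07790e7937721e5bff6842e20e2f",
        "blake2b_256": "5adc7fb84d05d84208f902fbd0ecc217befeeb291595cc75b8b2fee6b7bcb6b1",
    }


    def digests(data: bytes) -> dict:
        """Compute the three digest types listed in the hash table."""
        return {
            "sha256": hashlib.sha256(data).hexdigest(),
            "md5": hashlib.md5(data).hexdigest(),
            # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
            "blake2b_256": hashlib.blake2b(data, digest_size=32).hexdigest(),
        }


    def verify_wheel(path: str, expected: dict = EXPECTED) -> bool:
        """Return True only if every expected digest matches the file."""
        actual = digests(Path(path).read_bytes())
        return all(actual[name] == value for name, value in expected.items())


    if __name__ == "__main__":
        wheel = Path("ner_dataset-0.0.1-py3-none-any.whl")
        if wheel.exists():
            print("OK" if verify_wheel(str(wheel)) else "MISMATCH")
        else:
            print(f"{wheel} not found")

SHA256 alone is sufficient in practice; MD5 is included only because the table lists it and should not be relied on for integrity against tampering.
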