An NER Data Preparation Tool
NER Data Processor
NER Data Processor is a Python library to help you easily prepare datasets for Named Entity Recognition (NER) and Coreference Resolution tasks. It transforms raw text into formats ready for training token classification models using Hugging Face or other frameworks.
📦 Installation
✅ From PyPI (Recommended)
pip install ner-data-processor
🛠️ From GitHub
git clone https://github.com/rajboopathiking/NER_DATA_PREPROCESSING.git
cd NER_DATA_PREPROCESSING
pip install -r requirements.txt
🚀 Getting Started
from ner_data_processor.Ner_Data_Preparation import Custom_Ner_Dataset
ner = Custom_Ner_Dataset()
📊 Dataset Format
Input should be a pandas DataFrame with two columns:
- text: sentence or paragraph
- entities: list of labeled entities with their tags
Example:
| text | entities |
|---|---|
| Arun Kumar Jagatramka vs Ultrabulk AS on 22 Sept | [Arun Kumar Jagatramka - PLAINTIFF, Ultrabulk AS - Defender] |
| Author Biren Vaishnav | [Biren Vaishnav - PERSON] |
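For illustration, a DataFrame like the one above could be built as follows. The exact shape of each entity entry (here an (entity_text, label) pair) is an assumption drawn from the table; adapt it to the structure your annotations actually use.

import pandas as pd

# Sketch of the expected input. Each entity is assumed to be an
# (entity_text, label) pair, matching the example table above.
df = pd.DataFrame({
    "text": [
        "Arun Kumar Jagatramka vs Ultrabulk AS on 22 Sept",
        "Author Biren Vaishnav",
    ],
    "entities": [
        [("Arun Kumar Jagatramka", "PLAINTIFF"), ("Ultrabulk AS", "Defender")],
        [("Biren Vaishnav", "PERSON")],
    ],
})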
⚙️ API Overview
extract_DataFrame(df)
Convert the annotated DataFrame into a span-based entity format.
data = ner.extract_DataFrame(df)
Output:
| text | entities |
|---|---|
| Arun Kumar Jagatramka vs Ultrabulk AS on... | [(0, 21, PLAINTIFF), (25, 37, Defender)] |
| Author Biren Vaishnav | [(7, 21, PERSON)] |
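The numbers are character offsets into text (end-exclusive), so a quick sanity check is to slice each span back out. The snippet below assumes extract_DataFrame returns a DataFrame with the two columns shown above.

# Slice each (start, end, label) span out of its sentence to verify the offsets.
for _, row in data.iterrows():
    for start, end, label in row["entities"]:
        print(label, "->", row["text"][start:end])
# e.g. PLAINTIFF -> Arun Kumar Jagatramka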
to_dataset(data)
Convert span-format data into token-label (BIO) format for model training.
import pandas as pd
df = pd.DataFrame(ner.to_dataset(data))
Output:
| id | tokens | ner_tags |
|---|---|---|
| 0 | [Arun, Kumar, Jagatramka, ...] | [B-PLAINTIFF, I-PLAINTIFF, I-PLAINTIFF, ...] |
| 1 | [Author, Biren, Vaishnav] | [O, B-PERSON, I-PERSON] |
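To eyeball the result, you can pair each token with its tag; this relies only on the tokens and ner_tags columns shown above.

# Print the first example token by token alongside its BIO tag.
first = df.iloc[0]
for token, tag in zip(first["tokens"], first["ner_tags"]):
    print(f"{token:<15} {tag}")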
Create label maps
import numpy as np

# Collect every tag that appears in the dataset and deduplicate it.
labels = []
for tags in df["ner_tags"]:
    labels.extend(tags)
labels = np.unique(labels).tolist()
Output:
['B-DATE', 'B-Defender', 'B-LOC', 'B-ORG', 'B-PERSON', 'B-PLAINTIFF',
'I-DATE', 'I-Defender', 'I-LOC', 'I-ORG', 'I-PERSON', 'I-PLAINTIFF', 'O']
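Token-classification models typically also need integer mappings between labels and ids; these follow directly from the list above.

# Map every label to an integer id and back, for use in a model config.
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for i, label in enumerate(labels)}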
to_huggingface_dataset(df, labels)
Convert your processed DataFrame into a Hugging Face DatasetDict.
dataset = ner.to_huggingface_dataset(df, labels)
dataset = dataset.train_test_split(test_size=0.1)
Output:
DatasetDict({
train: Dataset({
features: ['id', 'tokens', 'ner_tags'],
num_rows: 3
}),
test: Dataset({
features: ['id', 'tokens', 'ner_tags'],
num_rows: 1
})
})
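From here a typical next step is sub-word tokenization with label alignment. The sketch below uses the standard Hugging Face recipe; it assumes ner_tags still hold the string labels shown earlier and that label2id was built as above, and the checkpoint name is only an example.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # example checkpoint

def tokenize_and_align(example):
    # Tokenize pre-split words and re-align the word-level tags to sub-words.
    encoded = tokenizer(example["tokens"], is_split_into_words=True, truncation=True)
    aligned, previous = [], None
    for word_id in encoded.word_ids():
        if word_id is None:
            aligned.append(-100)            # special tokens: ignored by the loss
        elif word_id != previous:
            aligned.append(label2id[example["ner_tags"][word_id]])
        else:
            aligned.append(-100)            # only label the first sub-token of a word
        previous = word_id
    encoded["labels"] = aligned
    return encoded

tokenized = dataset.map(tokenize_and_align)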
coreference_model(text)
Run a basic coreference resolution model on the input text.
text = "John is Victim. He is Innocent"
result = ner.coreference_model(text)
Output:
{
"mentions": [
{
"text": "He",
"refers_to": "John",
"span": [13, 15]
}
]
}
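One way to consume this output, assuming span holds end-exclusive character offsets into the input text: substitute each mention with its referent, working right to left so earlier offsets stay valid.

# Replace each mention with the entity it refers to, from the end of the text backwards.
resolved = text
for mention in sorted(result["mentions"], key=lambda m: m["span"][0], reverse=True):
    start, end = mention["span"]
    resolved = resolved[:start] + mention["refers_to"] + resolved[end:]
print(resolved)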
🪪 License
This project is licensed under the MIT License.