An NER Data Preparation Tool

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

NER_DATA_PREPROCESSING :

Collecting Data For NER:

This tool makes it easy to create NER and coreference datasets. Training a token classification or coreference resolution model requires a dataset in a specific format rather than raw text, so each sentence has to be converted to that format first. The steps below show how to use this library in your project.

Step - 1:

1 ) install via git

Download :

```bash
git clone https://github.com/rajboopathiking/NER_DATA_PREPROCESSING.git
```

Change into the cloned folder (optional if you are already in it) :

```bash
cd NER_DATA_PREPROCESSING
```

Install the dependencies listed in requirements.txt :

```bash
pip install -r requirements.txt
```

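2 ) install via pypi

```bash
pip install ner-data-processor
```

Then import the class and create an instance :

```python
# Hyphens are not valid in Python module names, so this import assumes the
# package is importable as `ner_data_processor` (an assumption; check the
# installed package if the import fails).
from ner_data_processor.Ner_Data_Preparation import Custom_Ner_Dataset

ner = Custom_Ner_Dataset()
```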

Step - 2:

Dataset format : a pandas DataFrame with a `text` column (for example, "Arun Kumar Jagatramka vs Ultrabulk AS") and an `entities` column that lists each exact phrase together with its entity label (for example, "Arun Kumar Jagatramka - PLAINTIFF").

|   | text | entities |
|---|------|----------|
| 0 | Arun Kumar Jagatramka vs Ultrabulk AS on 22 Se... | [Arun Kumar Jagatramka - PLAINTIFF, Ultrabulk ... |
| 1 | Author Biren Vaishnav | [Biren Vaishnav - PERSON] |
| 2 | The Supreme Court ruled in favor of Jane Smith. | [Supreme Court - LOC, Jane Smith - PLAINTIFF] |
| 3 | The Gujarat High Court issued a judgment in Ah... | [Gujarat High Court - ORG, Ahmedabad - LOC] |
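Before handing rows to the library, it can help to check that each entity string is well formed. The sketch below is plain Python and is not part of ner-data-processor. It assumes each entity is a string of the form "phrase - LABEL", as the sample rows above suggest, and that the phrase must occur verbatim in the text. The helper name `check_row` is hypothetical.

```python
def check_row(text, entities):
    """Return a list of problems found in one (text, entities) row."""
    problems = []
    for item in entities:
        # Expect "phrase - LABEL"; split on the last " - " separator.
        parts = item.rsplit(" - ", 1)
        if len(parts) != 2 or not parts[0] or not parts[1]:
            problems.append(f"bad entity format: {item!r}")
            continue
        phrase, _label = parts
        if phrase not in text:
            problems.append(f"phrase not found in text: {phrase!r}")
    return problems


rows = [
    ("Author Biren Vaishnav", ["Biren Vaishnav - PERSON"]),
    ("The Supreme Court ruled in favor of Jane Smith.",
     ["Supreme Court - LOC", "Jane Smith - PLAINTIFF"]),
]
for text, entities in rows:
    print(check_row(text, entities))  # an empty list means the row looks valid
```

Rows that pass the check can then be placed in a pandas DataFrame with `text` and `entities` columns.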

API Documentation :

The outputs shown below are examples only.

install via Github

extract_DataFrame(df) >>

```python
ner = Custom_Ner_Dataset()
# Convert each "phrase - LABEL" entity into a (start, end, label) character span.
data = ner.extract_DataFrame(df)
```

output :

|   | text | entities |
|---|------|----------|
| 0 | Arun Kumar Jagatramka vs Ultrabulk AS on 22 Se... | [(0, 21, PLAINTIFF), (25, 37, Defender), (41, ... |
| 1 | Author Biren Vaishnav | [(7, 21, PERSON)] |
| 2 | The Supreme Court ruled in favor of Jane Smith. | [(4, 17, LOC), (36, 46, PLAINTIFF)] |
| 3 | The Gujarat High Court issued a judgment in Ah... | [(4, 22, ORG), (44, 53, LOC)] |
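To make the span format concrete, here is a minimal sketch of how such character offsets can be computed. It is not the library's implementation. It assumes "phrase - LABEL" entity strings and uses the first occurrence of each phrase. The function name `find_entity_spans` is hypothetical.

```python
def find_entity_spans(text, entities):
    """Return (start, end, label) tuples, with `end` exclusive."""
    spans = []
    for item in entities:
        # Split on the last " - " so phrases containing hyphens still work.
        phrase, label = item.rsplit(" - ", 1)
        start = text.find(phrase)
        if start == -1:
            continue  # phrase not present in the text; skip it
        spans.append((start, start + len(phrase), label))
    return spans


spans = find_entity_spans("Author Biren Vaishnav", ["Biren Vaishnav - PERSON"])
print(spans)  # [(7, 21, 'PERSON')]
```

The result matches row 1 of the example output: the span starts at character 7 and ends just before character 21.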

to_dataset(data) >>

```python
import pandas as pd
import numpy as np
df = pd.DataFrame(ner.to_dataset(data))
```

output :

|   | id | tokens | ner_tags |
|---|----|--------|----------|
| 0 | 0 | [Arun, Kumar, Jagatramka, vs, Ultrabulk, AS, o... | [B-PLAINTIFF, I-PLAINTIFF, I-PLAINTIFF, O, B-D... |
| 1 | 1 | [Author, Biren, Vaishnav] | [O, B-PERSON, I-PERSON] |
| 2 | 8 | [The, Supreme, Court, ruled, in, favor, of, Ja... | [O, B-LOC, I-LOC, O, O, O, O, B-PLAINTIFF, I-P... |
| 3 | 9 | [The, Gujarat, High, Court, issued, a, judgmen... | [O, B-ORG, I-ORG, I-ORG, O, O, O, O, B-LOC, O] |
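The `ner_tags` column follows the BIO scheme: `B-` marks the first token of an entity, `I-` marks the tokens that continue it, and `O` marks everything else. The sketch below shows one way to derive such tags from character spans. It is an illustration under assumptions, not the library's code: it splits on whitespace, and the function name `spans_to_bio` is hypothetical.

```python
import re


def spans_to_bio(text, spans):
    """Tokenize on whitespace and tag each token using (start, end, label) spans."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tokens.append(match.group())
        tag = "O"
        for start, end, label in spans:
            # A token belongs to an entity if it starts inside the span.
            if start <= match.start() < end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tags.append(tag)
    return tokens, tags


tokens, tags = spans_to_bio("Author Biren Vaishnav", [(7, 21, "PERSON")])
print(tokens)  # ['Author', 'Biren', 'Vaishnav']
print(tags)    # ['O', 'B-PERSON', 'I-PERSON']
```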

Create the label list used to build the Hugging Face Dataset :

```python
# Collect every tag that appears in the dataset,
# then keep the sorted unique values.
labels = []
for i in df["ner_tags"].tolist():
  labels.extend(i)
labels = np.unique(labels).tolist()
labels
```

output :

```
['B-DATE',
 'B-Defender',
 'B-LOC',
 'B-ORG',
 'B-PERSON',
 'B-PLAINTIFF',
 'I-DATE',
 'I-Defender',
 'I-LOC',
 'I-ORG',
 'I-PERSON',
 'I-PLAINTIFF',
 'O']
```
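Token classification models generally work with integer class ids rather than label strings. The source does not say whether `to_huggingface_dataset` does this conversion internally, so the following is only a generic sketch of the usual mapping, using a shortened label list for illustration.

```python
labels = ["B-PERSON", "I-PERSON", "O"]  # shortened list for illustration

# Map each label string to an integer id and back.
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

# Encode one row of BIO tags as integer ids.
tags = ["O", "B-PERSON", "I-PERSON"]
tag_ids = [label2id[tag] for tag in tags]
print(tag_ids)  # [2, 0, 1]
```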

to_huggingface_dataset(data,labels) >>

```python
dataset = ner.to_huggingface_dataset(df,labels)
# Hold out 10% of the rows as a test split.
dataset = dataset.train_test_split(test_size=0.1)
dataset
```

output :

```
DatasetDict({
    train: Dataset({
        features: ['id', 'tokens', 'ner_tags'],
        num_rows: 3
    })
    test: Dataset({
        features: ['id', 'tokens', 'ner_tags'],
        num_rows: 1
    })
})
```

coreference_model(text) >>

ner.coreference_model(text: str)

input : text = "John is Victim. He is Innocent"

output : "He" refers to "John". The result is returned in JSON format containing the text, the mentions, and their spans ...

Release history

- 1.1.1 (Apr 4, 2025)
- 0.2 (Apr 4, 2025)
- 0.0.8 (Apr 4, 2025)
- 0.0.7 (Apr 4, 2025)
- 0.0.5 (Apr 4, 2025)
- 0.0.3 (Apr 4, 2025)
- 0.0.2 (Apr 4, 2025), this version
- 0.0.1 (Apr 4, 2025)

Download files

Source Distribution

ner_data_processor-0.0.2.tar.gz (3.5 kB)

Uploaded Apr 4, 2025 Source

Built Distribution

ner_data_processor-0.0.2-py3-none-any.whl (2.7 kB)

Uploaded Apr 4, 2025 Python 3

File details

Details for the file ner_data_processor-0.0.2.tar.gz.

File metadata

Download URL: ner_data_processor-0.0.2.tar.gz
Upload date: Apr 4, 2025
Size: 3.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for ner_data_processor-0.0.2.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | `07aeb3e584508d77d2e6bbaaa8c45f43c286a260ea08e6535995c8e627bd5152` |
| MD5 | `f550af5e18100348ffcde755123b4d34` |
| BLAKE2b-256 | `fe75a908adffe2d10cfe0baf1f2aac98a49d8c67e8be3ac5fa19b83a210e0891` |
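The digests listed for a file can be used to check a downloaded copy. The sketch below is a generic standard-library approach, not something provided by ner-data-processor. The helper name `sha256_of` and the local file path are assumptions for illustration.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


expected = "07aeb3e584508d77d2e6bbaaa8c45f43c286a260ea08e6535995c8e627bd5152"
path = "ner_data_processor-0.0.2.tar.gz"  # assumed download location
if os.path.exists(path):
    print("hash matches:", sha256_of(path) == expected)
```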

File details

Details for the file ner_data_processor-0.0.2-py3-none-any.whl.

File metadata

Download URL: ner_data_processor-0.0.2-py3-none-any.whl
Upload date: Apr 4, 2025
Size: 2.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for ner_data_processor-0.0.2-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | `b6dc1eb1dbbe103a7b98aa381609d143a7a3f432785f7a1e11b68d92200ac706` |
| MD5 | `c72bc123816ae388561f54269ae6fe58` |
| BLAKE2b-256 | `ef239f6c0f477407d1cb2f84d984f2873b52f07ec85f2c0e47799d1a2eadabc1` |

ner-data-processor 0.0.2
