An NER Data Preparation Tool
NER_DATA_PREPROCESSING:
Collecting Data for NER:
This tool helps you create NER and coreference datasets easily. Training a token classification or coreference resolution model requires a dataset in a specific format, not raw text, so each sentence must first be converted to that format. The steps below show how to use this library in your project.
Step 1:
1) Install via git
Download:
```bash
git clone https://github.com/rajboopathiking/NER_DATA_PREPROCESSING.git
```
Optional (skip this if you are already in the correct folder):
```bash
cd NER_DATA_PREPROCESSING
```
Install the dependencies listed in requirements.txt:
```bash
pip install -r requirements.txt
```
2) Install via PyPI
```bash
pip install ner-data-processor
```
```python
# Assumes the module is importable as ner_data_processor (hyphens are invalid in Python names).
from ner_data_processor.Ner_Data_Preparation import Custom_Ner_Dataset
ner = Custom_Ner_Dataset()
```
Step 2:
Dataset format: a pandas DataFrame with a text column (e.g. "Arun Kumar Jagatramka vs Ultrabulk AS") and an entities column that lists each exact phrase with its entity label (e.g. "Arun Kumar Jagatramka - PLAINTIFF").

| | text | entities |
|---|---|---|
| 0 | Arun Kumar Jagatramka vs Ultrabulk AS on 22 Se... | [Arun Kumar Jagatramka - PLAINTIFF, Ultrabulk ... |
| 1 | Author Biren Vaishnav | [Biren Vaishnav - PERSON] |
| 2 | The Supreme Court ruled in favor of Jane Smith. | [Supreme Court - LOC, Jane Smith - PLAINTIFF] |
| 3 | The Gujarat High Court issued a judgment in Ah... | [Gujarat High Court - ORG, Ahmedabad - LOC] |
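As a minimal sketch, an input DataFrame in this format could be built by hand as shown here. The column names `text` and `entities` come from the example table; the assumption that each entity is a plain string of the form `phrase - LABEL` is inferred from the printed example and is not confirmed by the library.

```python
import pandas as pd

# Two rows taken from the example table; each entity is written as
# "exact phrase - LABEL" (assumed string format, inferred from the printout).
rows = [
    {"text": "Author Biren Vaishnav",
     "entities": ["Biren Vaishnav - PERSON"]},
    {"text": "The Supreme Court ruled in favor of Jane Smith.",
     "entities": ["Supreme Court - LOC", "Jane Smith - PLAINTIFF"]},
]
df = pd.DataFrame(rows)

# Sanity check: every annotated phrase must occur verbatim in its text.
for row in rows:
    for entity in row["entities"]:
        phrase = entity.rsplit(" - ", 1)[0]
        assert phrase in row["text"], f"{phrase!r} not found"
```

Checking that each phrase occurs verbatim in its sentence catches spelling or spacing mismatches before the data is converted.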
API Documentation:
The outputs shown below are examples only (package installed via GitHub).
extract_DataFrame(df) >>

    ner = Custom_Ner_Dataset()
    data = ner.extract_DataFrame(df)

output :

| | text | entities |
|---|---|---|
| 0 | Arun Kumar Jagatramka vs Ultrabulk AS on 22 Se... | [(0, 21, PLAINTIFF), (25, 37, Defender), (41, ... |
| 1 | Author Biren Vaishnav | [(7, 21, PERSON)] |
| 2 | The Supreme Court ruled in favor of Jane Smith. | [(4, 17, LOC), (36, 46, PLAINTIFF)] |
| 3 | The Gujarat High Court issued a judgment in Ah... | [(4, 22, ORG), (44, 53, LOC)] |
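The spans in this output appear to be character offsets with an exclusive end: for "Author Biren Vaishnav", `text[7:21]` is "Biren Vaishnav". The following is a rough sketch of how such spans could be derived from `phrase - LABEL` strings. `find_spans` is a hypothetical helper written for illustration, not the library's implementation.

```python
def find_spans(text, entities):
    """Return (start, end, label) tuples for 'phrase - LABEL' strings.

    Hypothetical helper: uses the first occurrence of each phrase and an
    exclusive end offset, so text[start:end] == phrase.
    """
    spans = []
    for entity in entities:
        phrase, label = entity.rsplit(" - ", 1)
        start = text.find(phrase)
        if start == -1:
            continue  # phrase not present verbatim; skip it
        spans.append((start, start + len(phrase), label))
    return sorted(spans)


text = "The Supreme Court ruled in favor of Jane Smith."
spans = find_spans(text, ["Supreme Court - LOC", "Jane Smith - PLAINTIFF"])
print(spans)  # [(4, 17, 'LOC'), (36, 46, 'PLAINTIFF')]
```

Because this sketch only looks at the first occurrence, a phrase that appears twice in the same sentence would need extra handling.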
to_dataset(data) >>

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(ner.to_dataset(data))

output :

| | id | tokens | ner_tags |
|---|---|---|---|
| 0 | 0 | [Arun, Kumar, Jagatramka, vs, Ultrabulk, AS, o... | [B-PLAINTIFF, I-PLAINTIFF, I-PLAINTIFF, O, B-D... |
| 1 | 1 | [Author, Biren, Vaishnav] | [O, B-PERSON, I-PERSON] |
| 2 | 8 | [The, Supreme, Court, ruled, in, favor, of, Ja... | [O, B-LOC, I-LOC, O, O, O, O, B-PLAINTIFF, I-P... |
| 3 | 9 | [The, Gujarat, High, Court, issued, a, judgmen... | [O, B-ORG, I-ORG, I-ORG, O, O, O, O, B-LOC, O] |
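The `ner_tags` column uses the BIO scheme: the first token of an entity gets `B-LABEL`, later tokens of the same entity get `I-LABEL`, and every other token gets `O`. A minimal sketch of that conversion follows, assuming whitespace tokenization and end-exclusive character spans. `spans_to_bio` is a hypothetical name, and the library's own tokenizer may split text differently (for example around punctuation).

```python
import re


def spans_to_bio(text, spans):
    """Tokenize on whitespace and tag each token with B-/I-/O labels."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            if start <= match.start() < end:
                # The token that begins exactly at the span start opens it.
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


tokens, tags = spans_to_bio("Author Biren Vaishnav", [(7, 21, "PERSON")])
print(tokens)  # ['Author', 'Biren', 'Vaishnav']
print(tags)    # ['O', 'B-PERSON', 'I-PERSON']
```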
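Create the label list used to build the Hugging Face Dataset:
```python
# Collect every tag that occurs in the dataset.
labels = []
for tags in df["ner_tags"].tolist():
    labels.extend(tags)
# Deduplicate and sort the tags (np is numpy, imported earlier).
labels = np.unique(labels).tolist()
labels
```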
output :
['B-DATE',
'B-Defender',
'B-LOC',
'B-ORG',
'B-PERSON',
'B-PLAINTIFF',
'I-DATE',
'I-Defender',
'I-LOC',
'I-ORG',
'I-PERSON',
'I-PLAINTIFF',
'O']
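Token classification models are trained on integer class ids rather than tag strings, so a label list like this is typically turned into two lookup tables. The sketch below shows one way to do that in plain Python. The names `example_labels`, `label2id`, `id2label`, and `encode_tags` are illustrative assumptions, not part of this library's API.

```python
example_labels = ["B-PERSON", "I-PERSON", "O"]  # shortened list for illustration

# Map each tag string to an integer id and back again.
label2id = {label: idx for idx, label in enumerate(example_labels)}
id2label = {idx: label for label, idx in label2id.items()}


def encode_tags(tags):
    """Convert a sequence of tag strings to integer ids."""
    return [label2id[tag] for tag in tags]


print(encode_tags(["O", "B-PERSON", "I-PERSON"]))  # [2, 0, 1]
```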
to_huggingface_dataset(data,labels) >>

    dataset = ner.to_huggingface_dataset(df, labels)
    dataset = dataset.train_test_split(test_size=0.1)
    dataset

output :

    DatasetDict({
        train: Dataset({
            features: ['id', 'tokens', 'ner_tags'],
            num_rows: 3
        })
        test: Dataset({
            features: ['id', 'tokens', 'ner_tags'],
            num_rows: 1
        })
    })

coreference_model(text) >>>

    ner.coreference_model(text: str)

input : text = "John is Victim. He is Innocent"
output : "He" is resolved as a mention of "John". The result is returned in JSON format, containing the text, mentions, and span ...