A Python package that removes META tags from text using an XLM-RoBERTa NER model.
meta_cleaner
meta_cleaner is a Python package that removes META tags from text using XLM-RoBERTa (large-sized model).
trainer.ipynb is a notebook that creates a dataset and trains an NER model.
Installation
pip install meta-cleaner
or
pip install git+https://github.com/pirr-me/meta_cleaner.git
Install Locally
To install locally in editable mode (for development):
pip install -e .
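After any of the install commands above, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution name `meta-cleaner` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
```

Calling `installed_version("meta-cleaner")` should return a version string once the install has succeeded, and `None` otherwise.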
Usage
from meta_cleaner.cleaner import TextCleaner
# Create the cleaner; omit `device` to use the default.
text_cleaner = TextCleaner(model_name='Pirr/xlmr-large-meta-ner-1464', confidence_threshold=0.25)
# To run on CPU explicitly:
# text_cleaner = TextCleaner(model_name='Pirr/xlmr-large-meta-ner-1464', confidence_threshold=0.25, device="cpu")
# To run on GPU explicitly:
# text_cleaner = TextCleaner(model_name='Pirr/xlmr-large-meta-ner-1464', confidence_threshold=0.25, device="cuda")
# Example usage: a story with META content (intro line, chapter heading, author's note)
text = """This is my first story please enjoy it!\nChapter 1\n It was a late evening, we were out for a few drinks and had been chatting for hours. We began to kiss and touched each other. Authors note: Please share this story on Facebook"""
cleaned_text = text_cleaner.clean_text(text)
print("Cleaned Text:", cleaned_text)
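When several texts need cleaning, one option is to create a single cleaner and reuse it for every text, rather than constructing a new one each time. The sketch below assumes only what the usage above shows, namely an object with a `clean_text(text)` method, and it further assumes that method returns a string. `SupportsCleanText` and `clean_many` are hypothetical names, not part of meta_cleaner.

```python
from typing import Iterable, List, Protocol


class SupportsCleanText(Protocol):
    """Anything exposing clean_text(text), as TextCleaner does in the usage above."""

    def clean_text(self, text: str) -> str: ...


def clean_many(cleaner: SupportsCleanText, texts: Iterable[str]) -> List[str]:
    """Clean each text with one shared cleaner; blank inputs pass through unchanged."""
    results: List[str] = []
    for text in texts:
        if not text.strip():
            # Empty or whitespace-only input: skip the model call entirely.
            results.append(text)
            continue
        results.append(cleaner.clean_text(text))
    return results
```

With the `text_cleaner` created above, this would be called as `clean_many(text_cleaner, [text_a, text_b])`, where `text_a` and `text_b` stand for your own input strings.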
Download files
Source Distribution
meta_cleaner-0.2.0.tar.gz (3.5 kB)
Built Distribution
meta_cleaner-0.2.0-py3-none-any.whl (4.1 kB)
File details
Details for the file meta_cleaner-0.2.0.tar.gz.
File metadata
- Download URL: meta_cleaner-0.2.0.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 63473750e96ddf4c192e5e72bbd051ade79a489a1d2954ac5cf136bd3e4789a5
MD5 | 7da9f90a132e862c8a688f924d285bbe
BLAKE2b-256 | 94c8ab7f985590f4a5b27f6e1ae41387ceaca3e30c1bddaf9b73b8269e673415
File details
Details for the file meta_cleaner-0.2.0-py3-none-any.whl.
File metadata
- Download URL: meta_cleaner-0.2.0-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | bad76b62c4f314dd7bbc073b1d6ba1a3d00eca23118ad97fca680ddaa2ae7bf8
MD5 | 3ffe21dc719c39c5a1fac4a00ff5c498
BLAKE2b-256 | 2e65a28965939c657631ab8720a78e3b4893bd12a8bc33d14a4b2d2604440070