A Python package that removes META tags from text using a BERT-style (XLM-RoBERTa) NER model.
Project description
meta_cleaner
meta_cleaner is a Python package that removes META tags from text using an XLM-RoBERTa (large) NER model.
trainer.ipynb is a notebook that builds the dataset and trains the NER model.
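To make the idea concrete, here is a minimal sketch of NER-based cleaning. It is not the package's actual implementation. It assumes a token-classification model has already produced character spans labelled as meta content, each with a confidence score. The `Span` type, the `remove_meta_spans` helper, and the sample offsets and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical model output: one predicted meta span."""
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    score: float  # model confidence in [0, 1]


def remove_meta_spans(text: str, spans: list[Span], threshold: float = 0.65) -> str:
    """Drop every span whose score meets the threshold; keep everything else."""
    kept = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.score < threshold:
            continue  # not confident enough: leave this text untouched
        if span.start > cursor:
            kept.append(text[cursor:span.start])
        # max() guards against overlapping spans moving the cursor backwards
        cursor = max(cursor, span.end)
    kept.append(text[cursor:])
    return "".join(kept).strip()


text = "Please enjoy my story! It was a late evening. Please share on Facebook"
spans = [
    Span(0, 22, 0.98),          # "Please enjoy my story!"
    Span(23, 45, 0.30),         # "It was a late evening." (low score, so kept)
    Span(46, len(text), 0.91),  # "Please share on Facebook"
]
print(remove_meta_spans(text, spans))
```

Raising the threshold makes the cleaner more conservative (less text removed); lowering it removes more at the risk of cutting real content.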
Installation
Install via GitHub
To install directly from GitHub:
pip install git+https://github.com/pirr-me/meta_cleaner.git
Install Locally
To install locally in editable mode (for development):
pip install -e .
Usage
from meta_cleaner.cleaner import TextCleaner
model_name = 'Pirr/xlmr-large-meta-ner-1464'
text_cleaner = TextCleaner(model_name, confidence_threshold=0.65)
# Any text, regardless of length
text = """This is my first story please enjoy it!\nChapter 1\n It was a late evening, we were out for a few drinks and had been chatting for hours. We began to kiss and touched each other. Authors note: Please share this story on Facebook"""
# Clean the text
cleaned_text = text_cleaner.clean_text(text)
print("Cleaned Text:", cleaned_text)
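The "regardless of length" comment suggests long inputs are handled for you, but how the package does this is not described here. XLM-RoBERTa accepts at most 512 tokens per input, and one common way around such a limit is to split the text into overlapping windows and process each one separately. The sketch below illustrates that idea only; `split_into_windows` is a hypothetical helper, and it counts whitespace-separated words rather than real tokenizer tokens.

```python
def split_into_windows(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word windows; consecutive windows share `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return windows


for window in split_into_windows("a b c d e f g h i j", max_words=4, overlap=1):
    print(window)
```

The overlap gives a span that straddles a window boundary a chance to appear whole in at least one window. Note that splitting on whitespace discards line breaks, so a real implementation would more likely work with tokenizer offsets.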