A Python package that removes META tags from text using an XLM-RoBERTa NER model.

Project description

meta_cleaner

meta_cleaner is a Python package that removes META tags from text using a large XLM-RoBERTa model.
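As a rough illustration of the idea, the sketch below removes character ranges that a NER model has flagged with enough confidence. It is not the package's actual implementation, which this description does not show. The `drop_spans` helper is hypothetical, the `start`/`end`/`score` keys follow the shape of Hugging Face token-classification pipeline output, and treating the threshold as "remove spans scoring at or above it" is an assumption.

```python
from typing import List


def drop_spans(text: str, spans: List[dict], threshold: float = 0.25) -> str:
    """Remove every character range whose score is at least `threshold`.

    Hypothetical helper: each span is a dict with `start`, `end` and `score`.
    """
    # Keep only confident predictions and process them left to right.
    confident = sorted(
        (s for s in spans if s["score"] >= threshold),
        key=lambda s: s["start"],
    )
    kept_parts = []
    cursor = 0
    for span in confident:
        if span["start"] > cursor:
            # Text between the previous flagged span and this one survives.
            kept_parts.append(text[cursor:span["start"]])
        # max() handles overlapping or nested spans.
        cursor = max(cursor, span["end"])
    kept_parts.append(text[cursor:])
    # Trim whitespace left at either end of the text.
    return "".join(kept_parts).strip()


text = "Chapter 1. It was late. Authors note: share this."
spans = [
    {"start": text.index("Authors"), "end": len(text), "score": 0.9},
    {"start": 0, "end": 10, "score": 0.1},  # below threshold, so it is kept
]
print(drop_spans(text, spans))
```

A lower threshold removes more text at the risk of deleting story content, and a higher one is more conservative.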

trainer.ipynb is a notebook that creates a dataset and trains an NER model.

Installation

pip install meta-cleaner

or

pip install git+https://github.com/pirr-me/meta_cleaner.git

Install Locally

To install locally in editable mode (for development):

pip install -e .
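To confirm that one of the install commands above worked, a small check like the following can help. It is a minimal sketch that uses only the standard library. The `installed_version` helper is a hypothetical name, and it assumes the distribution is registered as `meta-cleaner`, as in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "meta-cleaner") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("meta-cleaner is not installed in this environment")
else:
    print("meta-cleaner version:", found)
```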

Usage

from meta_cleaner.cleaner import TextCleaner

text_cleaner = TextCleaner(model_name='Pirr/xlmr-large-meta-ner-1464', confidence_threshold=0.25)

# CPU
# text_cleaner = TextCleaner(model_name='Pirr/xlmr-large-meta-ner-1464', confidence_threshold=0.25, device="cpu")

# GPU
# text_cleaner = TextCleaner(model_name='Pirr/xlmr-large-meta-ner-1464', confidence_threshold=0.25, device="cuda")

# Example usage
text = """This is my first story please enjoy it!\nChapter 1\n It was a late evening, we were out for a few drinks and had been chatting for hours. We began to kiss and touched each other. Authors note: Please share this story on Facebook"""

cleaned_text = text_cleaner.clean_text(text)
print("Cleaned Text:", cleaned_text)
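When several texts need cleaning, it is usually worth creating the cleaner once and reusing it, since loading a large model is slow. The sketch below shows one way to do that. `clean_many` and `SupportsCleanText` are hypothetical names that are not part of meta_cleaner, and the code assumes `clean_text` takes a string and returns a string, as the example above suggests.

```python
from typing import Iterable, List, Protocol


class SupportsCleanText(Protocol):
    """Any object exposing clean_text(text), such as a TextCleaner."""

    def clean_text(self, text: str) -> str: ...


def clean_many(cleaner: SupportsCleanText, texts: Iterable[str]) -> List[str]:
    """Clean each text with one shared cleaner, passing blank inputs through."""
    results = []
    for text in texts:
        # Blank strings have nothing to clean, so skip the model call.
        results.append(cleaner.clean_text(text) if text.strip() else text)
    return results


# Usage with the real class (assumption: this downloads the model on first run):
# from meta_cleaner.cleaner import TextCleaner
# text_cleaner = TextCleaner(model_name='Pirr/xlmr-large-meta-ner-1464', confidence_threshold=0.25)
# cleaned = clean_many(text_cleaner, ["first story ...", "second story ..."])
```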

Download files

Source Distribution

meta_cleaner-0.2.0.tar.gz (3.5 kB)

Built Distribution

meta_cleaner-0.2.0-py3-none-any.whl (4.1 kB)
