
A Python package that removes META tags from text using an XLM-RoBERTa NER model.

Project description

meta_cleaner

meta_cleaner is a Python package that removes META tags from text using a named-entity recognition (NER) model based on XLM-RoBERTa (large).

The trainer.ipynb notebook builds the training dataset and trains the NER model.

Installation

Install via GitHub

To install directly from GitHub:

pip install git+https://github.com/pirr-me/meta_cleaner.git

Install Locally

To install locally in editable mode (for development):

pip install -e .

Usage

from meta_cleaner.cleaner import TextCleaner

model_name = 'Pirr/xlmr-large-meta-ner-1464'
text_cleaner = TextCleaner(model_name, confidence_threshold=0.65)

# Any text, regardless of length
text = """This is my first story please enjoy it!\nChapter 1\n It was a late evening, we were out for a few drinks and had been chatting for hours. We began to kiss and touched each other. Authors note: Please share this story on Facebook"""

# Clean the text
cleaned_text = text_cleaner.clean_text(text)

print("Cleaned Text:", cleaned_text)
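The confidence_threshold argument above suggests that only sufficiently confident NER predictions are removed. The package's internals are not shown here, so the following is only a minimal sketch of how such a threshold is commonly applied to predicted spans. The function name remove_confident_spans, the (start, end, score) span layout, and the hand-written spans are all hypothetical and are not part of meta_cleaner's API.

```python
from typing import List, Tuple

# Hypothetical span layout: (start_char, end_char, confidence score).
Span = Tuple[int, int, float]


def remove_confident_spans(text: str, spans: List[Span], threshold: float = 0.65) -> str:
    """Drop character ranges whose score meets the threshold; keep all other text."""
    kept = []
    cursor = 0
    for start, end, score in sorted(spans):
        if score < threshold:
            # Low-confidence prediction: leave this part of the text untouched.
            # (Treating score == threshold as "remove" is an assumption.)
            continue
        if start > cursor:
            kept.append(text[cursor:start])
        # max() keeps the cursor monotonic when confident spans overlap.
        cursor = max(cursor, end)
    kept.append(text[cursor:])
    return "".join(kept)


sample = "Chapter 1. It was late. Author's note: please share!"
# Hand-written spans standing in for model output (not real predictions).
spans = [(0, 11, 0.91), (11, 23, 0.30), (24, len(sample), 0.80)]
cleaned = remove_confident_spans(sample, spans)
print(repr(cleaned))
```

In this sketch, raising the threshold keeps more of the original text, while lowering it removes more aggressively. Whether TextCleaner behaves the same way should be checked against the package source.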

Download files


Source Distribution

meta_cleaner-0.1.0.tar.gz (3.4 kB)

Uploaded Source

Built Distribution

meta_cleaner-0.1.0-py3-none-any.whl (4.1 kB)

Uploaded Python 3

File details

Details for the file meta_cleaner-0.1.0.tar.gz.

File metadata

  • Download URL: meta_cleaner-0.1.0.tar.gz
  • Upload date:
  • Size: 3.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.4

File hashes

Hashes for meta_cleaner-0.1.0.tar.gz
Algorithm Hash digest
SHA256 cbebb0ee9b25e49e5e672536fb2124160ce9b32952eca15bf65f408805743753
MD5 4b9ce508dd3f294903d80344e6302f1b
BLAKE2b-256 814468df64ecc57193be602e9469e77cd3646ac2544d8dcb2468e5383a4e268a
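A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the helper names sha256_of_file and matches_published_hash are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest published above for meta_cleaner-0.1.0.tar.gz.
EXPECTED_SHA256 = "cbebb0ee9b25e49e5e672536fb2124160ce9b32952eca15bf65f408805743753"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected value, ignoring hex case."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_hash("meta_cleaner-0.1.0.tar.gz") should return True for an unmodified copy of the source distribution saved in the current directory.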


File details

Details for the file meta_cleaner-0.1.0-py3-none-any.whl.

File hashes

Hashes for meta_cleaner-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bfa0606403704b7312c5af407ec5eb2fa2c0b2de0c460c01ba345b8957e380d3
MD5 19fd66adf7aeb6869713f220e265dca6
BLAKE2b-256 7e07837718db4a972845433f67c469bf03b7d0b65437358bc09cf63b03a63962

