
clean_plot simplifies cleaning text files to create embeddings and make plots from them

Welcome to clean_plot

The library simplifies cleaning text files to create embeddings and make plots from them.

Install

pip install clean-plot

How to use

The library contains easy-to-use methods for cleaning text and for tokenizing and lemmatizing sentences. The resulting sentences can then be fed to a sentence encoder to create sentence embeddings.

fname = '../files/dummy.txt'
text = get_data(fname)
print(text)
MARLEY was dead: to begin with. There is no doubt
whatever about that. The register of his burial was
signed by the clergyman, the clerk, the undertaker,
and the chief mourner. Scrooge signed it: and
Scrooge's name was good upon 'Change, for anything he
chose to put his hand to. Old Marley was as dead as a
door-nail.

Mind! I don't mean to say that I know, of my
own knowledge, what there is particularly dead about
a door-nail. I might have been inclined, myself, to
regard a coffin-nail as the deadest piece of ironmongery
in the trade. But the wisdom of our ancestors
is in the simile; and my unhallowed hands
shall not disturb it, or the Country's done for. You
will therefore permit me to repeat, emphatically, that
Marley was as dead as a door-nail.
sentences = make_sentences(text)
sentences
['MARLEY was dead: to begin with.',
 'There is no doubt whatever about that.',
 'The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner.',
 "Scrooge signed it: and Scrooge's name was good upon 'Change, for anything he chose to put his hand to.",
 'Old Marley was as dead as a door-nail.',
 'Mind!',
 "I don't mean to say that I know, of my own knowledge, what there is particularly dead about a door-nail.",
 'I might have been inclined, myself, to regard a coffin-nail as the deadest piece of ironmongery in the trade.',
 "But the wisdom of our ancestors is in the simile; and my unhallowed hands shall not disturb it, or the Country's done for.",
 'You will therefore permit me to repeat, emphatically, that Marley was as dead as a door-nail.']
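
To illustrate the kind of transformation shown above, here is a minimal standard-library sketch that joins wrapped lines and splits on sentence-ending punctuation. The name split_sentences_sketch is hypothetical and this is not the library's make_sentences implementation, which may use a proper tokenizer and handle cases (abbreviations, quotations) that this regex does not.

```python
import re


def split_sentences_sketch(text):
    """Hypothetical helper: NOT clean_plot's make_sentences, only an approximation."""
    # Collapse hard line wraps, blank lines and repeated spaces into single spaces
    flat = " ".join(text.split())
    # Split after '.', '!' or '?' when followed by whitespace; drop empty pieces
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


sample = (
    "MARLEY was dead: to begin with. There is no doubt\n"
    "whatever about that.\n\n"
    "Mind! I don't mean to say."
)
parts = split_sentences_sketch(sample)
for part in parts:
    print(part)
```

A lookbehind split keeps the terminating punctuation attached to each sentence, which matches the shape of the output above.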
no_punctuations = []
for sentence in sentences:
    new_sentence = remove_punctuations(sentence)
    no_punctuations.append(new_sentence)
no_punctuations
['MARLEY was dead to begin with',
 'There is no doubt whatever about that',
 'The register of his burial was signed by the clergyman the clerk the undertaker and the chief mourner',
 'Scrooge signed it and Scrooge s name was good upon Change for anything he chose to put his hand to',
 'Old Marley was as dead as a door nail',
 'Mind',
 'I don t mean to say that I know of my own knowledge what there is particularly dead about a door nail',
 'I might have been inclined myself to regard a coffin nail as the deadest piece of ironmongery in the trade',
 'But the wisdom of our ancestors is in the simile and my unhallowed hands shall not disturb it or the Country s done for',
 'You will therefore permit me to repeat emphatically that Marley was as dead as a door nail']
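
Judging from the output above, punctuation appears to be replaced by a space rather than deleted (for example "Scrooge's" becomes "Scrooge s" and "door-nail" becomes "door nail"). A minimal sketch that reproduces this behaviour with the standard library follows. The name strip_punctuation_sketch is hypothetical, and the library's own remove_punctuations may be implemented differently.

```python
import re


def strip_punctuation_sketch(sentence):
    """Hypothetical helper: approximates the output shown, not clean_plot's code."""
    # Replace every character that is neither a word character nor whitespace with a space
    spaced = re.sub(r"[^\w\s]", " ", sentence)
    # Collapse the runs of spaces this creates and trim both ends
    return " ".join(spaced.split())


examples = [
    "Old Marley was as dead as a door-nail.",
    "Scrooge signed it: and Scrooge's name was good upon 'Change, "
    "for anything he chose to put his hand to.",
]
cleaned = [strip_punctuation_sketch(s) for s in examples]
for line in cleaned:
    print(line)
```

Replacing with a space instead of deleting keeps hyphenated words such as "door-nail" as two tokens, at the cost of splitting contractions like "don't" into "don t".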
