
clean_plot simplifies cleaning text files so that you can create embeddings and make plots from them.


Welcome to clean_plot

Install

The easiest way to install the library is with pip:

pip install clean-plot

Alternatively, you can build the library from source. The source is updated more often than the released version, so the release is more likely to contain bugs that have already been fixed upstream. If you plan to add features to clean_plot yourself, or want to be on the cutting edge, use an editable install:

git clone https://github.com/deven367/clean_plot.git
cd clean_plot
pip install -e . 
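
After either install route, it can be useful to confirm which version Python actually sees. The snippet below is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical and is not part of clean_plot.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; not part of clean_plot.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "clean-plot" is the distribution name used in the pip command above.
found = installed_version("clean-plot")
print(found if found is not None else "clean-plot is not installed")
```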

How to use

The library contains easy-to-use methods for cleaning text and for tokenizing and lemmatizing sentences. The resulting sentences can then be fed to a sentence encoder to create sentence embeddings.

fname = '../files/dummy.txt'
text = get_data(fname)
print(text)
MARLEY was dead: to begin with. There is no doubt
whatever about that. The register of his burial was
signed by the clergyman, the clerk, the undertaker,
and the chief mourner. Scrooge signed it: and
Scrooge's name was good upon 'Change, for anything he
chose to put his hand to. Old Marley was as dead as a
door-nail.

Mind! I don't mean to say that I know, of my
own knowledge, what there is particularly dead about
a door-nail. I might have been inclined, myself, to
regard a coffin-nail as the deadest piece of ironmongery
in the trade. But the wisdom of our ancestors
is in the simile; and my unhallowed hands
shall not disturb it, or the Country's done for. You
will therefore permit me to repeat, emphatically, that
Marley was as dead as a door-nail.

This is a new sentence.
sentences = make_sentences(text)
sentences
(#11) ['MARLEY was dead: to begin with.','There is no doubt whatever about that.','The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner.',"Scrooge signed it: and Scrooge's name was good upon 'Change, for anything he chose to put his hand to.",'Old Marley was as dead as a door-nail.','Mind!',"I don't mean to say that I know, of my own knowledge, what there is particularly dead about a door-nail.",'I might have been inclined, myself, to regard a coffin-nail as the deadest piece of ironmongery in the trade.',"But the wisdom of our ancestors is in the simile; and my unhallowed hands shall not disturb it, or the Country's done for.",'You will therefore permit me to repeat, emphatically, that Marley was as dead as a door-nail.'...]
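
To make the sentence-splitting step concrete, here is a rough standard-library sketch of the general idea: unwrap the hard line breaks, then split after sentence-ending punctuation. It is not the implementation of make_sentences, and the name split_sentences is hypothetical. A naive rule like this will also split after abbreviations such as "Mr.", so treat it as an illustration only.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (illustrative only, not clean_plot's code)."""
    # Collapse hard-wrapped lines and blank lines into single spaces.
    unwrapped = " ".join(text.split())
    # Split on whitespace that follows '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", unwrapped) if s]


sample = (
    "MARLEY was dead: to begin with. There is no doubt\n"
    "whatever about that.\n\n"
    "Mind! I don't mean it."
)
for sentence in split_sentences(sample):
    print(sentence)
```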
no_punctuations = [remove_punctuations(sentence) for sentence in sentences]
no_punctuations
['MARLEY was dead to begin with',
 'There is no doubt whatever about that',
 'The register of his burial was signed by the clergyman the clerk the undertaker and the chief mourner',
 'Scrooge signed it and Scrooge s name was good upon Change for anything he chose to put his hand to',
 'Old Marley was as dead as a door nail',
 'Mind',
 'I don t mean to say that I know of my own knowledge what there is particularly dead about a door nail',
 'I might have been inclined myself to regard a coffin nail as the deadest piece of ironmongery in the trade',
 'But the wisdom of our ancestors is in the simile and my unhallowed hands shall not disturb it or the Country s done for',
 'You will therefore permit me to repeat emphatically that Marley was as dead as a door nail',
 'This is a new sentence']
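
In the output above, punctuation is replaced by a space rather than simply deleted, which is why "door-nail" becomes "door nail" and "Scrooge's" becomes "Scrooge s". The sketch below reproduces that behaviour with the standard library. It is an assumption about the effect, not the actual code behind remove_punctuations, and the name strip_punctuation is hypothetical.

```python
import re


def strip_punctuation(sentence: str) -> str:
    """Replace punctuation with spaces, then collapse repeated whitespace.

    Illustrative stand-in only; not clean_plot's remove_punctuations.
    """
    # Anything that is not a word character or whitespace (plus '_') becomes a space.
    spaced = re.sub(r"[^\w\s]|_", " ", sentence)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(spaced.split())


print(strip_punctuation("Old Marley was as dead as a door-nail."))
print(strip_punctuation("Scrooge's name was good upon 'Change, for anything."))
```

If you would rather keep contractions such as "don't" intact, one option is to exclude the apostrophe from the replaced characters before encoding the sentences.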

Contributing

This library exists thanks to nbdev (one of many amazing tools made by fast.ai). PRs and issues are welcome.

After you clone this repository, please run nbdev_install_git_hooks in your terminal. This sets up git hooks that strip extraneous metadata from the notebooks (e.g. which cells you ran), which would otherwise cause unnecessary merge conflicts.

Before submitting a PR, check that the local library and notebooks match. The script nbdev_diff_nbs reports any differences between the local library and the notebooks.

If you made a change to the notebooks in one of the exported cells, you can export it to the library with nbdev_build_lib or make clean_plot.

If you made a change to the library, you can export it back to the notebooks with nbdev_update_lib.
