A package to make NLP easy, fast, and fun

Subclip 0.0.2

A package to make NLP fast and easy for beginners.

  • Efficient text prediction
  • Text pairing, equivalent to NLTK's n-grams
  • Syllable Identification
  • Find frequencies of words in given text
  • Find matching words in two arrays

I still have a lot of plans for this package, so expect frequent updates in the near future. They will include optimizations and more functions, so stay tuned.

Install

pip install subclip

Usage

First, import the package:

import subclip

Predict

A function that predicts the next n words that follow a given phrase in a string, and counts how often each continuation occurs.

Parameters

The function's parameters are:

predict(string, phrase, n=0, case_insensitive=False)
  • string: The main text to search.
  • phrase: The key phrase (prompt). The function tries to predict what comes after this phrase.
  • n: The number of words to return for each prediction. It defaults to 0, which returns all predictions regardless of their word counts.
  • case_insensitive: Set this to True if you want case to be ignored.

Actual usage

So, let's try it out.

string="I am a string. I am also a human being, but most importantly, I am a string."
print(subclip.predict(string, "I am", n=1))

This would output

{'a': 2, 'also': 1}

But if you change n to 2,

print(subclip.predict(string, "I am", n=2))

It would output

{'a string.': 2, 'also a': 1}
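To make the behaviour above concrete, here is a minimal sketch of how such a prediction could be computed in plain Python. It is not subclip's actual implementation: the name predict_sketch is hypothetical, it assumes the text is split on whitespace, and it only handles n of 1 or more (the n=0 behaviour described above is not reproduced).

```python
from collections import Counter

def predict_sketch(string, phrase, n=1, case_insensitive=False):
    """Count the n-word continuations that follow each occurrence of phrase.

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    if case_insensitive:
        string, phrase = string.lower(), phrase.lower()
    words = string.split()
    key = phrase.split()
    counts = Counter()
    # Slide over the text; wherever the phrase matches, record the next n words.
    for i in range(len(words) - len(key) + 1):
        if words[i:i + len(key)] == key:
            following = words[i + len(key):i + len(key) + n]
            # Skip matches too close to the end to have n following words.
            if len(following) == n:
                counts[" ".join(following)] += 1
    return dict(counts)

string = "I am a string. I am also a human being, but most importantly, I am a string."
print(predict_sketch(string, "I am", n=1))
print(predict_sketch(string, "I am", n=2))
```

Under these assumptions, the two calls give the same dictionaries as the examples above.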

Pair

This function splits a string into overlapping groups of n consecutive words.

Parameters

pair(string, n)
  • string: The string you're trying to split into pairs.
  • n: The number of words in each pair (equivalent to the n value in an n-gram).

Usage

Let's set our string to:

string="Sometimes, I just go out and eat sand. I don't know why"

Don't ask. Let's turn this into pairs of 2:

print(subclip.pair(string, 2))

Which outputs

[['Sometimes,', 'I'], ['I', 'just'], ['just', 'go'], ['go', 'out'], ['out', 'and'], ['and', 'eat'], ['eat', 'sand.'], ['sand.', 'I'], ['I', "don't"], ["don't", 'know'], ['know', 'why']]
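The pairing shown above is a sliding window over the words of the string. A rough sketch of that idea, assuming whitespace splitting (pair_sketch is a hypothetical name, not the function subclip ships):

```python
def pair_sketch(string, n):
    """Return every run of n consecutive words as a list (an n-gram).

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    words = string.split()
    # One window per starting index that still leaves room for n words.
    return [words[i:i + n] for i in range(len(words) - n + 1)]

string = "Sometimes, I just go out and eat sand. I don't know why"
print(pair_sketch(string, 2))
```

With n=3 the same function would yield overlapping triples such as ['Sometimes,', 'I', 'just'].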

Identify Syllables

subclip.syllables("carbonmonoxide")

This outputs:

car-bon-mon-ox-ide

But take note that this only works with lowercase strings.

Countwords

Parameters

The function's parameters are:

countwords(string, case_insensitive=False)

Set case_insensitive to True if you want the count to ignore case.

Actual usage

Get yourself a nice string

string = "Sometimes I wonder, 'Am I stupid?' then I realize, yeah. yeah, I am stupid."

Then put it in the function:

x = subclip.countwords(string)
print(x)

It should print:

{'I': 4, 'Sometimes': 1, 'wonder,': 1, "'Am": 1, "stupid?'": 1, 'then': 1, 'realize,': 1, 'yeah.': 1, 'yeah,': 1, 'am': 1, 'stupid.': 1}
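For comparison, a similar count can be produced with the standard library's collections.Counter. This is only a sketch under the assumption that words are split on whitespace and punctuation is kept attached, as in the output above; countwords_sketch is a hypothetical name and may differ from how subclip works internally.

```python
from collections import Counter

def countwords_sketch(string, case_insensitive=False):
    """Count whitespace-separated words, most frequent first.

    Hypothetical helper for illustration; punctuation stays attached to words.
    """
    if case_insensitive:
        string = string.lower()
    counts = Counter(string.split())
    # most_common() orders entries from highest to lowest count.
    return dict(counts.most_common())

string = "Sometimes I wonder, 'Am I stupid?' then I realize, yeah. yeah, I am stupid."
print(countwords_sketch(string))
```

Because punctuation is not stripped, 'yeah.' and 'yeah,' are counted as different words, just as in the output above.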

Matchingwords

A function that finds & counts matching words in two strings

Actual usage

So in this case, our strings are:

string1, string2 = "God, I love drawing, drawing is my favourite thing to do", "God, I hate drawing, drawing is my least favourite thing to do"

If we run this through matchingwords, we would get:

{'God,': 1, 'I': 1, 'drawing,': 1, 'drawing': 1, 'is': 1, 'my': 1, 'favourite': 1, 'thing': 1, 'to': 1, 'do': 1}
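One way to get a result like this is to intersect two word counters. The sketch below assumes whitespace splitting and assumes that a word's count is the smaller of its counts in the two strings; both are guesses about the behaviour, and matchingwords_sketch is a hypothetical name rather than subclip's own function.

```python
from collections import Counter

def matchingwords_sketch(string1, string2):
    """Count the words that appear in both strings.

    Hypothetical helper for illustration; each shared word gets the
    smaller of its two counts (an assumption, not confirmed behaviour).
    """
    # Counter & Counter keeps only shared keys, with the minimum count of each.
    shared = Counter(string1.split()) & Counter(string2.split())
    return dict(shared)

string1 = "God, I love drawing, drawing is my favourite thing to do"
string2 = "God, I hate drawing, drawing is my least favourite thing to do"
print(matchingwords_sketch(string1, string2))
```

Words that appear in only one of the strings, such as 'love', 'hate', and 'least', are left out of the result.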
