Molda is a scikit-learn-inspired Python library for text vectorization of corpora. It is designed to work in scikit-learn pipelines and with NumPy arrays.

Project description

Molda

The current version implements six vectorization algorithms, each exposed as its own class:

  • TTestVectorizer
  • TficfVectorizer
  • ObservedExpectedVectorizer
  • LTUVectorizer
  • Gref94Vectorizer
  • ATCVectorizer

These classes are based on scikit-learn's CountVectorizer.
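
For readers unfamiliar with CountVectorizer, the following is a minimal pure-Python sketch of the counting step it performs with its default settings: lowercase the text, keep tokens of two or more word characters, and assign columns in sorted token order. The helper names (tokenize, fit_vocabulary, transform) are made up for this illustration and are not part of Molda or scikit-learn. How each Molda class reweights such counts is not described here, so the sketch stops at raw counts.

```python
import re
from collections import Counter

# Default CountVectorizer token pattern: runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text.lower())


def fit_vocabulary(corpus):
    """Map each distinct token to a column index, in sorted token order."""
    tokens = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def transform(corpus, vocabulary):
    """Return one dense row of raw term counts per document."""
    rows = []
    for doc in corpus:
        counts = Counter(tokenize(doc))
        # Tokens that were not seen during fitting are simply ignored.
        rows.append([counts.get(tok, 0) for tok in vocabulary])
    return rows


corpus = [
    "I really enjoyed watching that",
    "I resent watching this video",
]
vocabulary = fit_vocabulary(corpus)
matrix = transform(corpus, vocabulary)

print(vocabulary)
print(matrix)
```

Note that the single-character token "I" is dropped by the pattern, and that text containing only unseen words maps to a row of zeros.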

To use a vectorizer, instantiate it with the parameters you need, fit it on a corpus and its labels, and then transform new text. Here is an example:

import numpy as np
from Tficf import TficfVectorizer

# Training documents
corpus = np.array([
    "Even though I enjoyed watching that, This is bullshit",
    "I really enjoyed watching that",
    "I resent watching this video"
])

# One class label per document
y = [1, 0, 1]

# Fit on the labelled corpus, then vectorize unseen text
v = TficfVectorizer()
v.fit(corpus, y)
v.transform(['Hello, there'])

You can also include the vectorizer in a scikit-learn Pipeline, as in the following example:

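from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Vectorize, scale without centering, then classify
pipe = Pipeline([
    ('vectorizer', TficfVectorizer()),
    ('scaler', StandardScaler(with_mean=False)),
    ('estimator', SVC())
])
pipe.fit(corpus, y)
pipe.score(corpus, y)
pipe.predict(['This is wonderful'])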
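
A note on with_mean=False in the pipeline above: scikit-learn's StandardScaler refuses to center sparse input, because subtracting the mean would turn almost every zero into a nonzero value. The sketch below illustrates that effect on a single hand-made feature column using only the standard library. It assumes that Molda's vectorizers, like CountVectorizer, return sparse matrices; the source does not state this explicitly.

```python
from statistics import mean, pstdev

# One feature column across four documents. Most entries are zero,
# as is typical for term-based features.
column = [0.0, 0.0, 4.0, 0.0]

mu = mean(column)
sigma = pstdev(column)  # population standard deviation (ddof=0)

# Scaling only, as with with_mean=False: zeros stay zero.
scaled = [x / sigma for x in column]

# Centering and scaling, as with with_mean=True: zeros become nonzero.
centered = [(x - mu) / sigma for x in column]

zeros_after_scaling = sum(1 for x in scaled if x == 0.0)
zeros_after_centering = sum(1 for x in centered if x == 0.0)

print(zeros_after_scaling, zeros_after_centering)
```

Scaling alone keeps all three zeros, while centering leaves none, which is why the sparse representation cannot survive centering.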

Molda also works with data loaded from a pandas DataFrame, once the columns are converted to NumPy arrays:

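import pandas as pd

# Load the labelled data and drop rows with missing values
df = pd.read_csv('../irony-labeled.csv')
df = df.dropna()

# Convert the text and label columns to NumPy arrays
corpus_ = df['comment_text'].to_numpy()
y_ = df['label'].to_numpy()

v = TficfVectorizer()
v.fit(corpus_, y_)
v.transform(['Hello, there', 'Goodbye'])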

With love from Sigmoid.

We are open to feedback. Please send your impressions to vladimir.stojoc@gmail.com

Download files

Source Distribution

molda-0.1.2.tar.gz (15.2 kB)

Uploaded Source

Built Distribution

molda-0.1.2-py3-none-any.whl (23.2 kB)

Uploaded Python 3

File details

Details for the file molda-0.1.2.tar.gz.

File metadata

  • Download URL: molda-0.1.2.tar.gz
  • Upload date:
  • Size: 15.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for molda-0.1.2.tar.gz
Algorithm Hash digest
SHA256 b95a25dea0cb813ac33b3afdf8fd3de7ad334000c1de2abb586577babf3029f3
MD5 aaba176e8926593bc1663f9a4019d190
BLAKE2b-256 59ec8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e

File details

Details for the file molda-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: molda-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 23.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for molda-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7617192ab291e3db475d11532e2163163a85928a19e64c2c6217469a7542162f
MD5 d374007bf5384aebd5c67d956c124b76
BLAKE2b-256 c4ce5b7a5be136f5268e8a341a4755f5cc5816416d9dee11fd75b82f2bfe060e
