Molda is a scikit-learn-inspired Python library for text vectorization of corpora. It is designed to work with pipelines and NumPy arrays.
Project description
Molda
The current version supports the following weighting algorithms, each implemented as a vectorizer class:
- TTestVectorizer
- TficfVectorizer
- ObservedExpectedVectorizer
- LTUVectorizer
- Gref94Vectorizer
- ATCVectorizer
These classes are based on scikit-learn's CountVectorizer.
Instantiate the vectorizer with the parameters you need, fit it, and then apply the transformation. Here is an example:

    import numpy as np
    from Tficf import TficfVectorizer

    # Three short documents and one label per document
    corpus = np.array([
        "Even though I enjoyed watching that, This is bullshit",
        "I really enjoyed watching that",
        "I resent watching this video",
    ])
    y = [1, 0, 1]

    v = TficfVectorizer()
    v.fit(corpus, y)               # fit on the labeled corpus
    v.transform(['Hello, there'])  # vectorize new text

You can also include the vectorizer in a pipeline, as in the following example:
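
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Reuses TficfVectorizer, corpus and y from the previous example
    pipe = Pipeline([
        ('vectorizer', TficfVectorizer()),
        ('scaler', StandardScaler(with_mean=False)),  # with_mean=False is required for sparse input
        ('estimator', SVC()),
    ])
    pipe.fit(corpus, y)
    pipe.score(corpus, y)  # mean accuracy on the training data
    pipe.predict(['This is wonderful'])
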
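The `with_mean=False` argument in the pipeline matters when the vectorizer returns a sparse matrix, as CountVectorizer does. The sketch below uses CountVectorizer as a stand-in (an assumption, since Molda's output type is not stated here) to show that scikit-learn refuses to center sparse data but will scale it by the standard deviation alone.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

docs = [
    "I really enjoyed watching that",
    "I resent watching this video",
    "watching this was fine",
]
X = CountVectorizer().fit_transform(docs)  # SciPy sparse matrix of counts

# Centering would turn most zeros into non-zeros, so it is rejected for sparse input.
try:
    StandardScaler().fit(X)
    centering_rejected = False
except ValueError:
    centering_rejected = True

# Scaling without centering works and keeps the result sparse.
X_scaled = StandardScaler(with_mean=False).fit_transform(X)
still_sparse = sparse.issparse(X_scaled)
```

If a dense, centered representation is really needed, the matrix would have to be converted with `.toarray()` first, which can be costly for large vocabularies.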
Molda works with pandas DataFrames too:

    import pandas as pd

    df = pd.read_csv('../irony-labeled.csv')
    df = df.dropna()  # drop rows with missing values
    corpus_ = df['comment_text'].to_numpy()
    y_ = df['label'].to_numpy()

    v = TficfVectorizer()
    v.fit(corpus_, y_)
    v.transform(['Hello, there', 'Goodbye'])

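The CSV file from the example is not part of this description, so the following sketch builds a small in-memory DataFrame instead (the rows are made up for illustration; only the column names `comment_text` and `label` come from the example above). It shows why calling `dropna()` before extracting the columns keeps texts and labels aligned.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has no comment text
df = pd.DataFrame({
    "comment_text": [
        "I really enjoyed watching that",
        None,
        "I resent watching this video",
    ],
    "label": [0, 1, 1],
})

df = df.dropna()  # removes the whole row, so its label is dropped as well
corpus_ = df["comment_text"].to_numpy()
y_ = df["label"].to_numpy()
```

Both arrays now have two entries each and can be passed to `fit` together. If the frame has other columns with missing values that are irrelevant here, `df.dropna(subset=["comment_text", "label"])` restricts the check to the two columns that are used.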
With love from Sigmoid.
We are open to feedback. Please send your impressions to vladimir.stojoc@gmail.com
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file molda-0.1.2.tar.gz.
File metadata
- Download URL: molda-0.1.2.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | b95a25dea0cb813ac33b3afdf8fd3de7ad334000c1de2abb586577babf3029f3
MD5 | aaba176e8926593bc1663f9a4019d190
BLAKE2b-256 | 59ec8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e
File details
Details for the file molda-0.1.2-py3-none-any.whl.
File metadata
- Download URL: molda-0.1.2-py3-none-any.whl
- Upload date:
- Size: 23.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7617192ab291e3db475d11532e2163163a85928a19e64c2c6217469a7542162f
MD5 | d374007bf5384aebd5c67d956c124b76
BLAKE2b-256 | c4ce5b7a5be136f5268e8a341a4755f5cc5816416d9dee11fd75b82f2bfe060e