A generic interface for datasets and Machine Learning models
instancelib provides a generic architecture for datasets and machine learning algorithms, such as classification algorithms.
© Michiel Bron, 2021
Quick tour
Load dataset: Load the dataset into an environment
```python
import instancelib as il

# Use the "fulltext" column as instance data and the "label" column as labels
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                 data_cols=["fulltext"],
                                 label_cols=["label"])

ds = text_env.dataset          # A dict-like interface for instances
labels = text_env.labels       # An object that stores all labels
labelset = labels.labelset     # All labels that can be given to instances

ins = ds[20]                   # Get the instance with identifier key 20
ins_data = ins.data            # Get the raw data for instance 20
ins_vector = ins.vector        # Get the vector representation for instance 20, if any
ins_labels = labels.get_labels(ins)  # Get the labels assigned to instance 20
```
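The quick tour treats the dataset as a dict-like mapping from identifiers to instances and keeps labels in a separate object. The plain-Python sketch below mimics only that access pattern so it runs without instancelib installed; `ToyInstance`, `ToyDataset` and `ToyLabelProvider` are hypothetical classes, not instancelib's own types, which offer considerably more.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Iterator, Optional, Sequence


@dataclass
class ToyInstance:
    identifier: int
    data: str                                 # Raw data, e.g. the text
    vector: Optional[Sequence[float]] = None  # Stays None until something vectorizes it


class ToyDataset(Mapping):
    """Hypothetical read-only dict-like view: identifier -> instance."""

    def __init__(self, instances: Iterable[ToyInstance]) -> None:
        self._store: Dict[int, ToyInstance] = {ins.identifier: ins for ins in instances}

    def __getitem__(self, key: int) -> ToyInstance:
        return self._store[key]

    def __iter__(self) -> Iterator[int]:
        return iter(self._store)

    def __len__(self) -> int:
        return len(self._store)


class ToyLabelProvider:
    """Hypothetical label store kept apart from the data: identifier -> set of labels."""

    def __init__(self, labelset: Iterable[str], assignments: Dict[int, Iterable[str]]) -> None:
        self.labelset: FrozenSet[str] = frozenset(labelset)
        self._labels = {key: frozenset(value) for key, value in assignments.items()}
        # Reject labels that are not part of the allowed label set
        unknown = frozenset().union(*self._labels.values()) - self.labelset
        if unknown:
            raise ValueError(f"labels not in labelset: {sorted(unknown)}")

    def get_labels(self, ins: ToyInstance) -> FrozenSet[str]:
        # Unlabeled instances get an empty set rather than an error
        return self._labels.get(ins.identifier, frozenset())


toy_ds = ToyDataset([ToyInstance(20, "a short text"), ToyInstance(21, "another text")])
toy_labels = ToyLabelProvider({"pos", "neg"}, {20: {"pos"}})
ins = toy_ds[20]
print(ins.data, ins.vector, toy_labels.get_labels(ins))
```

Subclassing `collections.abc.Mapping` gives `in`, `keys()`, `items()` and `get()` for free once `__getitem__`, `__iter__` and `__len__` exist, which is what makes `ds[20]` and `20 in ds` style access work.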
Dataset manipulation: Divide the dataset into a train and a test set
```python
train, test = text_env.train_test_split(ds, train_size=0.70)  # 70% train, 30% test
print(20 in train)  # May be True or False, because the split is random
```
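Why `20 in train` can go either way is easiest to see in a small sketch of a random split over instance identifiers. This is a plain-Python illustration with a hypothetical `random_split` helper; instancelib's own `train_test_split` may sample differently.

```python
import random
from typing import Hashable, Iterable, Optional, Set, Tuple


def random_split(keys: Iterable[Hashable], train_size: float,
                 seed: Optional[int] = None) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Shuffle identifiers and cut them into disjoint train and test sets."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)    # A fixed seed makes the split reproducible
    cut = round(len(shuffled) * train_size)  # Number of identifiers that go to train
    return set(shuffled[:cut]), set(shuffled[cut:])


train_keys, test_keys = random_split(range(100), train_size=0.70, seed=42)
print(len(train_keys), len(test_keys))  # 70 30
print(20 in train_keys)  # Depends on the shuffle
```

Without a seed every call shuffles differently, so membership of any single identifier changes from run to run; passing a seed is the usual way to make experiments repeatable.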
Train a model:
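```python
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

# Token counts -> TF-IDF weights -> multinomial Naive Bayes classifier
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

# Wrap the scikit-learn pipeline so it can be used with instancelib instances
model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)  # Fit on the training instances and their labels
predictions = model.predict(test)
```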
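The pipeline above converts each text to token counts, reweights them with TF-IDF and fits a multinomial Naive Bayes classifier. To make the last step concrete, here is a stdlib-only sketch of multinomial Naive Bayes with additive smoothing on raw token counts. It skips the TF-IDF step, uses a crude whitespace tokenizer, and is neither scikit-learn's nor instancelib's implementation; `TinyMultinomialNB` and `tokenize` are hypothetical names.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    # Crude lowercase whitespace tokenizer (CountVectorizer uses a regex pattern instead)
    return text.lower().split()


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token counts with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Smoothing strength (must be > 0 here), like MultinomialNB's `alpha`

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "TinyMultinomialNB":
        self.class_counts: Counter = Counter(labels)  # Documents per class, used for priors
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # Token counts per class
        self.vocab: set = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens: List[str], label: str) -> float:
        # log P(label) + sum of log P(token | label) over tokens seen during training
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        class_total = sum(self.word_counts[label].values())
        denom = class_total + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # Tokens never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, texts: Sequence[str]) -> List[str]:
        # For each text, pick the class with the highest log score
        return [max(self.class_counts, key=lambda c: self._log_score(tokenize(t), c))
                for t in texts]


train_texts = ["good great fun", "great movie good", "bad awful boring", "boring bad plot"]
train_labels = ["pos", "pos", "neg", "neg"]
nb = TinyMultinomialNB().fit(train_texts, train_labels)
print(nb.predict(["good fun", "awful plot"]))  # ['pos', 'neg']
```

Working in log space avoids the underflow that multiplying many small probabilities would cause on longer texts.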
Installation
See installation.md for an extended installation guide.
| Method | Instructions |
|---|---|
| pip | Install from PyPI via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .` or locally run `python setup.py install`. |
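After installing with either method, a quick way to confirm that the interpreter can see the package is to ask the standard library's `importlib.metadata` (Python 3.8+) for its version. A small sketch; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "instancelib") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string, or None if the installation step did not succeed
print(installed_version())
```

Because editable installs (`pip install -e .`) also register package metadata, the same check works for a local clone.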
Documentation
Full documentation of the latest version is provided at https://instancelib.readthedocs.org.
Example usage
See usage.py for an example of how the package can be used.
Releases
instancelib is officially released through PyPI.
See CHANGELOG.md for a full overview of the changes for each version.
Citation
```bibtex
@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}
```
Library usage
This library is used in the following projects:
- python-allib. A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
- text_explainability. A generic explainability architecture for explaining text machine learning models.
- text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.
Maintenance
Contributors
- Michiel Bron (@mpbron)
File details
Details for the file instancelib-0.5.0.tar.gz.
File metadata
- Download URL: instancelib-0.5.0.tar.gz
- Upload date:
- Size: 69.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `dc2794a293e2035f200aee592d4bb4db3c15333d3214a40989a3c61b4199c149` |
| MD5 | `72127171df527f4deccd56ce20eb5260` |
| BLAKE2b-256 | `9163d4e33c9db838519161995fb561ae3b19c89849e87fef1ceca70c5f849d10` |
File details
Details for the file instancelib-0.5.0-py3-none-any.whl.
File metadata
- Download URL: instancelib-0.5.0-py3-none-any.whl
- Upload date:
- Size: 108.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e1785ce554e5b88bfadea6cacf6f07bb637183f5731290b136bf1080bc4275f6` |
| MD5 | `5284b6abbc6089ec89f1cb60cd6aafb8` |
| BLAKE2b-256 | `a916e1f58ce33346f08b92ecd422dbc474e54e3818a292dec3126974937885f8` |
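The SHA256 digests listed for both files can be checked against a local download with Python's standard `hashlib`. A minimal sketch; `sha256_of` and `matches_digest` are hypothetical helper names, and the guarded usage at the end assumes the wheel was saved to the current directory under its original file name.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex: str) -> bool:
    # Constant-time comparison of the lowercase hex digests
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# SHA256 digest copied from the table for instancelib-0.5.0-py3-none-any.whl
wheel = Path("instancelib-0.5.0-py3-none-any.whl")
expected = "e1785ce554e5b88bfadea6cacf6f07bb637183f5731290b136bf1080bc4275f6"
if wheel.exists():
    print(matches_digest(wheel, expected))
```

For the source distribution, swap in `instancelib-0.5.0.tar.gz` and its SHA256 row.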