A typed dataset abstraction toolkit for machine learning projects
A generic dataset interface for Machine Learning models
instancelib provides a generic architecture for datasets.
© Michiel Bron, 2021
Quick tour
Load dataset: Load the dataset in an environment
import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                 data_cols=["fulltext"],
                                 label_cols=["label"])
ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances
ins = ds[20] # Get instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for 20 if any
ins_labels = labels.get_labels(ins)
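To make the dict-like access pattern above concrete, here is a minimal standard-library sketch. It does not use instancelib itself: `ToyInstance`, `ToyLabels`, `set_labels`, and the sample data are hypothetical stand-ins that only mirror the attribute names shown in the tour (`data`, `vector`, `labelset`, `get_labels`).

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Sequence, Set


@dataclass
class ToyInstance:
    """Hypothetical stand-in for an instance: an identifier plus raw data."""
    identifier: int
    data: str
    vector: Optional[Sequence[float]] = None  # stays None until a vector is set


@dataclass
class ToyLabels:
    """Hypothetical label store mapping instance identifiers to label sets."""
    labelset: FrozenSet[str]
    _labels: Dict[int, Set[str]] = field(default_factory=dict)

    def set_labels(self, ins: ToyInstance, *labels: str) -> None:
        # Reject labels that are not part of the allowed labelset.
        unknown = set(labels) - self.labelset
        if unknown:
            raise ValueError(f"Labels not in labelset: {sorted(unknown)}")
        self._labels[ins.identifier] = set(labels)

    def get_labels(self, ins: ToyInstance) -> FrozenSet[str]:
        # Unlabeled instances yield an empty set rather than an error.
        return frozenset(self._labels.get(ins.identifier, set()))


# The dataset is a plain mapping from identifier to instance.
toy_ds: Dict[int, ToyInstance] = {
    20: ToyInstance(20, "An example document"),
    21: ToyInstance(21, "Another document"),
}
toy_labels = ToyLabels(labelset=frozenset({"relevant", "irrelevant"}))
toy_labels.set_labels(toy_ds[20], "relevant")

toy_ins = toy_ds[20]
print(toy_ins.data)
print(toy_labels.get_labels(toy_ins))
```

Keeping labels in a separate object, keyed by instance identifier, means the same instances can be paired with different label stores without copying the data.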
Dataset manipulation: Divide the dataset into a training set and a test set
train, test = text_env.train_test_split(ds, train_size=0.70)
print(20 in train) # May be true or false, because of random sampling
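The comment above notes that membership of key `20` depends on random sampling. The following sketch shows one way such a key-based split could work on a plain dictionary. It is an illustration only, not instancelib's implementation; `split_by_key` and its `seed` parameter are assumptions introduced here.

```python
import random
from typing import Dict, Hashable, Optional, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def split_by_key(dataset: Dict[K, V], train_size: float,
                 seed: Optional[int] = None) -> Tuple[Dict[K, V], Dict[K, V]]:
    """Randomly partition a keyed dataset into train and test mappings."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    keys = list(dataset)
    random.Random(seed).shuffle(keys)        # shuffle identifiers, not the data
    cut = round(len(keys) * train_size)      # number of training instances
    train_part = {k: dataset[k] for k in keys[:cut]}
    test_part = {k: dataset[k] for k in keys[cut:]}
    return train_part, test_part


toy = {i: f"document {i}" for i in range(100)}
toy_train, toy_test = split_by_key(toy, train_size=0.70, seed=42)
print(len(toy_train), len(toy_test))              # 70 30
print((20 in toy_train) != (20 in toy_test))      # True: exactly one side holds key 20
```

Passing a fixed `seed` makes the split reproducible; leaving it as `None` gives a different partition on each run.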
Train a model:
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
# Bag-of-words counts -> TF-IDF weighting -> Naive Bayes classifier
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])
model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
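The wrapper above lets an estimator that works on plain arrays be trained on a keyed dataset. The standard-library sketch below illustrates that adapter idea under simplifying assumptions: `MajorityClassifier` and `KeyedClassifier` are hypothetical classes, labels are single strings, and predictions are returned as a dictionary keyed by identifier. The actual return format of `model.predict` in instancelib may differ.

```python
from collections import Counter
from typing import Dict, List


class MajorityClassifier:
    """Toy estimator with a fit/predict interface over plain lists."""

    def fit(self, X: List[str], y: List[str]) -> "MajorityClassifier":
        # Remember the most frequent training label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: List[str]) -> List[str]:
        return [self.majority_ for _ in X]


class KeyedClassifier:
    """Hypothetical adapter: trains on and predicts for keyed datasets."""

    def __init__(self, estimator: MajorityClassifier) -> None:
        self.estimator = estimator

    def fit(self, data: Dict[int, str], labels: Dict[int, str]) -> None:
        keys = list(data)  # fix an order so data and labels stay aligned
        self.estimator.fit([data[k] for k in keys], [labels[k] for k in keys])

    def predict(self, data: Dict[int, str]) -> Dict[int, str]:
        keys = list(data)
        predicted = self.estimator.predict([data[k] for k in keys])
        return dict(zip(keys, predicted))  # map results back to identifiers


train_data = {1: "good film", 2: "great film", 3: "bad film"}
train_labels = {1: "pos", 2: "pos", 3: "neg"}
test_data = {20: "fine film"}

clf = KeyedClassifier(MajorityClassifier())
clf.fit(train_data, train_labels)
print(clf.predict(test_data))  # {20: 'pos'}
```

The adapter's only job is bookkeeping: it flattens the keyed data into aligned lists for the estimator and re-attaches the identifiers to the output.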
Installation
See installation.md for an extended installation guide.
| Method | Instructions |
|---|---|
| pip | Install from PyPI via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .` or locally run `python setup.py install`. |
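After either installation method, a quick way to confirm that the package is visible to the current interpreter is to query its metadata. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("instancelib")
print(found if found is not None else "instancelib is not installed")
```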
Documentation
Full documentation of the latest version is provided at https://instancelib.readthedocs.org.
Example usage
See usage.py for an example of how the package can be used.
Releases
instancelib is officially released through PyPI.
See CHANGELOG.md for a full overview of the changes for each version.
Citation
@misc{instancelib,
title = {Python package instancelib},
author = {Michiel Bron},
howpublished = {\url{https://github.com/mpbron/instancelib}},
year = {2021}
}
Library usage
This library is used in the following projects:
- python-allib. A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
- text_explainability. A generic explainability architecture for explaining text machine learning models.
- text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.
Maintenance
Contributors
- Michiel Bron (@mpbron)
Todo
Tasks yet to be done:
- Implement support for ONNX models
- Implement support for Python DataLoaders
- Make the external dataset interface more user friendly
- Redesign LabelProvider to support more attribute levels
- CI/CD tests