
A typed dataset abstraction toolkit for machine learning projects


A generic dataset interface for Machine Learning models



instancelib provides a generic architecture for datasets.

© Michiel Bron, 2021

Quick tour

Load dataset: load a dataset into an environment

import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                  data_cols=["fulltext"],
                                  label_cols=["label"])

ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances

ins = ds[20] # Get the instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for 20 if any

ins_labels = labels.get_labels(ins) # Get the labels assigned to instance 20
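The environment above bundles three ideas: instances addressed by an identifier, a dict-like dataset that maps identifiers to instances, and a label store kept separate from the data. The sketch below models those ideas in plain Python so they can be tried without the library. The classes `Instance`, `Dataset` and `LabelStore` and the method `set_labels` are hypothetical illustrations, not instancelib's actual classes or signatures.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Hashable, Iterable, Iterator, Optional, Sequence, Set


# Hypothetical stand-ins that mimic the concepts above; NOT instancelib's classes.
@dataclass
class Instance:
    identifier: Hashable
    data: Any                                  # raw data, e.g. a text string
    vector: Optional[Sequence[float]] = None   # set once some vectorizer has run


class Dataset(Mapping):
    """Read-only dict-like view: identifier -> Instance."""

    def __init__(self, instances: Iterable[Instance]) -> None:
        self._store: Dict[Hashable, Instance] = {ins.identifier: ins for ins in instances}

    def __getitem__(self, key: Hashable) -> Instance:
        return self._store[key]

    def __iter__(self) -> Iterator[Hashable]:
        return iter(self._store)

    def __len__(self) -> int:
        return len(self._store)


class LabelStore:
    """Keeps labels apart from the instances, keyed by instance identifier."""

    def __init__(self, labelset: Iterable[str]) -> None:
        self.labelset: FrozenSet[str] = frozenset(labelset)  # all labels that may be given
        self._labels: Dict[Hashable, Set[str]] = {}

    def set_labels(self, ins: Instance, *labels: str) -> None:
        unknown = set(labels) - self.labelset
        if unknown:
            raise ValueError(f"labels not in labelset: {sorted(unknown)}")
        self._labels[ins.identifier] = set(labels)

    def get_labels(self, ins: Instance) -> FrozenSet[str]:
        # Unlabeled instances get an empty set rather than an error
        return frozenset(self._labels.get(ins.identifier, set()))


ds = Dataset([Instance(20, "a positive review"), Instance(21, "a dull review")])
labels = LabelStore({"pos", "neg"})
labels.set_labels(ds[20], "pos")

ins = ds[20]
print(ins.data)                # a positive review
print(labels.get_labels(ins))  # frozenset({'pos'})
```

In this sketch, keeping labels outside the instances lets the same data carry different label sets (for example gold labels and model predictions) without copying the data.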

Dataset manipulation: divide the dataset into a train and a test set

train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train) # May be True or False, depending on the random split
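Because the split is random, whether a given identifier ends up in `train` varies between runs. A minimal sketch of such a split, assuming it shuffles the identifiers and cuts the list at `train_size`. `split_identifiers` is a hypothetical helper; instancelib's own `train_test_split` returns providers rather than plain lists and may round differently. Passing a seed makes the outcome reproducible.

```python
import random
from typing import Hashable, List, Optional, Sequence, Tuple


def split_identifiers(keys: Sequence[Hashable], train_size: float = 0.70,
                      seed: Optional[int] = None) -> Tuple[List[Hashable], List[Hashable]]:
    """Randomly partition identifiers into disjoint train and test lists (sketch only)."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    rng = random.Random(seed)         # a fixed seed makes the split reproducible
    shuffled = list(keys)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_size)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = split_identifiers(range(100), train_size=0.70, seed=42)
print(len(train_ids), len(test_ids))  # 70 30
print(20 in train_ids)                # depends on the seed
```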

Train a model:

from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

pipeline = Pipeline([
     ('vect', CountVectorizer()),    # raw text -> token counts
     ('tfidf', TfidfTransformer()),  # token counts -> TF-IDF weights
     ('clf', MultinomialNB()),       # multinomial naive Bayes classifier
     ])

model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
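`SkLearnDataClassifier.build` wraps the pipeline so it can be trained directly from a provider and its labels. The sketch below shows the general adapter idea under simplifying assumptions: instances are a plain dict from identifier to raw text, each instance has exactly one label, and `DataClassifierSketch` and `MajorityClassifier` are hypothetical names, not instancelib classes. `MajorityClassifier` is a toy estimator with scikit-learn's fit/predict convention so the sketch runs without scikit-learn; the pipeline above exposes the same two methods and could take its place.

```python
from collections import Counter
from typing import Any, Dict, Hashable, List, Sequence, Tuple


class MajorityClassifier:
    """Toy estimator following scikit-learn's fit/predict convention."""

    def fit(self, X: Sequence[Any], y: Sequence[str]) -> "MajorityClassifier":
        # Remember the most frequent training label
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence[Any]) -> List[str]:
        return [self.majority_ for _ in X]


class DataClassifierSketch:
    """Hypothetical adapter: trains any fit/predict estimator from identifier-keyed data."""

    def __init__(self, estimator: Any) -> None:
        self.estimator = estimator

    def fit_provider(self, instances: Dict[Hashable, str], labels: Dict[Hashable, str]) -> None:
        keys = list(instances)              # fix one order so X and y stay aligned
        X = [instances[k] for k in keys]    # raw data; a text Pipeline vectorizes it itself
        y = [labels[k] for k in keys]
        self.estimator.fit(X, y)

    def predict(self, instances: Dict[Hashable, str]) -> List[Tuple[Hashable, str]]:
        keys = list(instances)
        preds = self.estimator.predict([instances[k] for k in keys])
        return list(zip(keys, preds))       # keep each prediction tied to its identifier


train = {1: "great film", 2: "loved it", 3: "boring"}
train_labels = {1: "pos", 2: "pos", 3: "neg"}
test = {4: "fine", 5: "awful"}

model = DataClassifierSketch(MajorityClassifier())
model.fit_provider(train, train_labels)
predictions = model.predict(test)
print(predictions)  # [(4, 'pos'), (5, 'pos')]
```

Returning identifier/prediction pairs, rather than a bare list, keeps results traceable to instances even after the data has been reordered or split.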

Installation

See installation.md for an extended installation guide.

  • pip: install from PyPI via pip install instancelib.
  • Local: clone this repository and install via pip install -e . or run python setup.py install locally.
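After either installation method, one way to confirm which version is active is to query the package metadata with the standard library (Python 3.8+). `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "instancelib") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())  # a version string, or None if instancelib is not installed
```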

Documentation

Full documentation of the latest version is provided at https://instancelib.readthedocs.org.

Example usage

See usage.py for an example of how to use the package.

Releases

instancelib is officially released through PyPI.

See CHANGELOG.md for a full overview of the changes for each version.

Citation

@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}

Library usage

This library is used in the following projects:

  • python-allib. A typed Active Learning framework for Python, supporting both classification and Technology-Assisted Review systems.
  • text_explainability. A generic explainability architecture for explaining text machine learning models.
  • text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.

Maintenance


Todo

Tasks yet to be done:

  • Implement support for ONNX models
  • Implement support for Python DataLoaders
  • Make the external dataset interface more user-friendly
  • Redesign LabelProvider to support more attribute levels
  • CI/CD tests



Download files


Source Distribution

instancelib-0.3.6.1.tar.gz (49.9 kB, source)

Built Distribution

instancelib-0.3.6.1-py3-none-any.whl (80.4 kB, Python 3)

File details

Details for the file instancelib-0.3.6.1.tar.gz.

File metadata

  • Download URL: instancelib-0.3.6.1.tar.gz
  • Size: 49.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.5.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.5

File hashes

Hashes for instancelib-0.3.6.1.tar.gz
  • SHA256: 160855e9c07ca16b862e1cee12e2c0ec99b94f65634c99a0ebda6f82ed50017a
  • MD5: 6d9b289a105a16ce529b4c06a2d8b0c6
  • BLAKE2b-256: 000f1f32cc9f138b916aad802539cc93d274b6a135be315a87f3f448cfe8a756


File details

Details for the file instancelib-0.3.6.1-py3-none-any.whl.

File metadata

  • Download URL: instancelib-0.3.6.1-py3-none-any.whl
  • Size: 80.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.5.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.5

File hashes

Hashes for instancelib-0.3.6.1-py3-none-any.whl
  • SHA256: b765fd14e7243de514c60ddef6c267d0d67d03f1c2dd1dd3898ea7f77f6926ac
  • MD5: 82e1d0cf2c4e15fb6a97874dbabbe146
  • BLAKE2b-256: cb94cf9937cdf1f7920e6d2350eb3f492f3d2058f591aec02be6544d2a912a44
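The published digests can be checked against a downloaded file with the standard library's `hashlib`. A minimal sketch; `sha256_of` and `matches_published` are hypothetical helper names, and the commented usage assumes the source distribution was downloaded to the current directory. pip can also enforce digests itself when a requirements file pins them (e.g. `instancelib==0.3.6.1 --hash=sha256:<digest>` with `pip install --require-hashes -r requirements.txt`); in that mode every requirement, including dependencies, must be pinned with a hash.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (assumes the sdist is in the current directory):
# matches_published("instancelib-0.3.6.1.tar.gz",
#                   "160855e9c07ca16b862e1cee12e2c0ec99b94f65634c99a0ebda6f82ed50017a")
```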

