A typed dataset abstraction toolkit for machine learning projects

A generic dataset interface for Machine Learning models


instancelib provides a generic architecture for datasets.

© Michiel Bron, 2021

Quick tour

Load dataset: Load the dataset into an environment

import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                  data_cols=["fulltext"],
                                  label_cols=["label"])

ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances

ins = ds[20] # Get the instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for 20 if any

ins_labels = labels.get_labels(ins)
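To make the roles of `ds`, `labels`, and `labelset` above more concrete, the following plain-Python sketch mimics the same access pattern using only the standard library. It is not instancelib's implementation: `ToyInstance`, `ToyLabels`, and `set_labels` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Sequence, Set


@dataclass
class ToyInstance:
    """Hypothetical stand-in for an instance: identifier, raw data, optional vector."""
    identifier: int
    data: str
    vector: Optional[Sequence[float]] = None


@dataclass
class ToyLabels:
    """Hypothetical stand-in for a label store keyed by instance identifier."""
    labelset: FrozenSet[str]
    _by_key: Dict[int, Set[str]] = field(default_factory=dict)

    def set_labels(self, ins: ToyInstance, *labels: str) -> None:
        # Only labels from the fixed labelset may be assigned.
        unknown = set(labels) - self.labelset
        if unknown:
            raise ValueError(f"labels not in labelset: {sorted(unknown)}")
        self._by_key.setdefault(ins.identifier, set()).update(labels)

    def get_labels(self, ins: ToyInstance) -> FrozenSet[str]:
        # Unlabeled instances yield an empty set.
        return frozenset(self._by_key.get(ins.identifier, set()))


# A plain dict plays the role of the dict-like dataset.
toy_ds = {i: ToyInstance(i, f"document {i}") for i in range(30)}
toy_labels = ToyLabels(labelset=frozenset({"relevant", "irrelevant"}))
toy_labels.set_labels(toy_ds[20], "relevant")

toy_ins = toy_ds[20]
print(toy_ins.data)                    # document 20
print(toy_labels.get_labels(toy_ins))  # frozenset({'relevant'})
```

Keeping labels in a separate object, rather than on the instance itself, is what allows the same dataset to be paired with different label stores.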

Dataset manipulation: Divide the dataset into a train set and a test set

train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train) # May be true or false, because of random sampling
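The comment above notes that membership of key `20` depends on random sampling. A minimal standard-library sketch of that idea follows; `toy_train_test_split` and its `seed` parameter are hypothetical and only illustrate a shuffle-and-cut split over identifier keys, not how instancelib performs the split internally.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def toy_train_test_split(keys: Sequence[Hashable], train_size: float,
                         seed: int = 0) -> Tuple[List[Hashable], List[Hashable]]:
    """Shuffle the keys and cut them at the train_size fraction (illustrative only)."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


keys = list(range(100))
train_keys, test_keys = toy_train_test_split(keys, train_size=0.70)

print(len(train_keys), len(test_keys))          # 70 30
print((20 in train_keys) != (20 in test_keys))  # True: key 20 lands in exactly one part
```

Which part key `20` ends up in depends on the shuffle, so it changes with the seed; fixing the seed makes the outcome repeatable.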

Train a model:

from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
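Once predictions are available, a natural next step is to compare them against the known labels. The shape of `predictions` is not shown above, so the sketch below assumes, purely for illustration, that each prediction is a pair of an instance key and a set of labels; check the documentation for the actual return type. `exact_match_accuracy` and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple


def exact_match_accuracy(predictions: Iterable[Tuple[Hashable, FrozenSet[str]]],
                         truth: Dict[Hashable, FrozenSet[str]]) -> float:
    """Fraction of instances whose predicted label set equals the true label set."""
    pairs = list(predictions)
    if not pairs:
        raise ValueError("no predictions to evaluate")
    hits = sum(1 for key, predicted in pairs if truth[key] == predicted)
    return hits / len(pairs)


# Hypothetical toy data, not the output of a real model.
toy_truth = {1: frozenset({"spam"}), 2: frozenset({"ham"}),
             3: frozenset({"spam"}), 4: frozenset({"ham"})}
toy_predictions = [(1, frozenset({"spam"})), (2, frozenset({"spam"})),
                   (3, frozenset({"spam"})), (4, frozenset({"ham"}))]

print(exact_match_accuracy(toy_predictions, toy_truth))  # 0.75
```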

Installation

See installation.md for an extended installation guide.

| Method | Instructions |
|--------|--------------|
| pip | Install from PyPI via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .` or locally run `python setup.py install`. |
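After either installation method, one way to confirm that the package is visible to the current interpreter is to query its installed version. The helper below is a small sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical name, not part of instancelib.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("instancelib")
print(version if version is not None else "instancelib is not installed")
```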

Documentation

Full documentation of the latest version is provided at https://instancelib.readthedocs.org.

Example usage

See usage.py for an example of how the package can be used.

Releases

instancelib is officially released through PyPI.

See CHANGELOG.md for a full overview of the changes for each version.

Citation

@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}

Library usage

This library is used in the following projects:

  • python-allib. A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
  • text_explainability. A generic explainability architecture for explaining text machine learning models.
  • text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.

Maintenance

Contributors

Todo

Tasks yet to be done:

  • Implement support for ONNX models
  • Implement support for Python DataLoaders
  • Make the external dataset interface more user-friendly
  • Redesign LabelProvider to support more attribute levels
  • CI/CD tests
