
A typed dataset abstraction toolkit for machine learning projects


A generic dataset interface for Machine Learning models


instancelib provides a generic architecture for datasets.

© Michiel Bron, 2021

Quick tour

Load dataset: Load the dataset in an environment

import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                  data_cols=["fulltext"],
                                  label_cols=["label"])

ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances

ins = ds[20] # Get the instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for 20 if any

ins_labels = labels.get_labels(ins)

Dataset manipulation: Divide the dataset into a training set and a test set

train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train) # May be true or false, because of random sampling
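
The comment above notes that membership of key 20 depends on random sampling. As a rough illustration of why, here is a minimal standard-library sketch of a random split over identifier keys. The helper `split_keys` and its `seed` parameter are hypothetical and are not part of instancelib; the library's own `train_test_split` may work differently internally.

```python
import random

def split_keys(keys, train_size=0.70, seed=None):
    """Hypothetical helper (not instancelib API): randomly partition keys."""
    rng = random.Random(seed)
    shuffled = list(keys)
    rng.shuffle(shuffled)  # random order decides which keys land in train
    cut = round(len(shuffled) * train_size)
    return set(shuffled[:cut]), set(shuffled[cut:])

keys = range(100)
train_keys, test_keys = split_keys(keys, train_size=0.70, seed=42)

print(len(train_keys), len(test_keys))  # sizes follow train_size: 70 and 30
# Every key ends up in exactly one of the two sets; which one depends on the seed.
print((20 in train_keys) != (20 in test_keys))
```

Fixing the seed makes such a split reproducible; without one, repeated runs can place key 20 in either set.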

Train a model:

from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Bag-of-words counts -> TF-IDF weighting -> Naive Bayes classifier
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
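
To see what the scikit-learn pipeline itself does, independent of the instancelib wrapper, the following sketch fits the same three steps on a handful of made-up sentences. The texts and the labels "spam" and "ham" are invented for illustration only; it uses plain scikit-learn calls and does not show how `SkLearnDataClassifier` handles data internally.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data, invented for this example
train_texts = [
    "cheap pills buy now",
    "limited offer buy cheap",
    "meeting agenda for monday",
    "project meeting notes attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

pipeline = Pipeline([
    ("vect", CountVectorizer()),    # raw text -> token counts
    ("tfidf", TfidfTransformer()),  # counts -> TF-IDF weights
    ("clf", MultinomialNB()),       # classifier on the weighted features
])

pipeline.fit(train_texts, train_labels)
preds = pipeline.predict(["buy cheap pills", "monday project meeting"])
print(list(preds))
```

In the quick tour, the wrapper takes over the role of `fit` and `predict` here, so that training and prediction operate on instance providers such as `train` and `test` rather than on raw lists.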

Installation

See installation.md for an extended installation guide.

  • pip: Install from PyPI via pip install instancelib.
  • Local: Clone this repository and install via pip install -e . or run python setup.py install locally.

Documentation

Full documentation of the latest version is provided at https://instancelib.readthedocs.org.

Example usage

See usage.py for an example of how the package can be used.

Releases

instancelib is officially released through PyPI.

See CHANGELOG.md for a full overview of the changes for each version.

Citation

@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}

Library usage

This library is used in the following projects:

  • python-allib. A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
  • text_explainability. A generic explainability architecture for explaining text machine learning models.
  • text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.

Maintenance

Contributors

Todo

Tasks yet to be done:

  • Implement support for ONNX models
  • Implement support for Python DataLoaders
  • Make the external dataset interface more user friendly
  • Redesign LabelProvider to support more attribute levels
  • CI/CD tests
