
A generic interface for datasets and Machine Learning models


instancelib provides a generic architecture for datasets and machine learning algorithms such as classification algorithms.

© Michiel Bron, 2021

Quick tour

Load dataset: Load the dataset in an environment

import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                  data_cols=["fulltext"],
                                  label_cols=["label"])

ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances

ins = ds[20] # Get the instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for instance 20, if any

ins_labels = labels.get_labels(ins) # Get the labels assigned to instance 20
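
To make the data model above concrete, here is a minimal pure-Python sketch of the idea: instances live in a dict-like collection keyed by identifier, while labels are kept in a separate store. The names `ToyInstance` and `ToyLabelStore` are hypothetical and written only for illustration; they are not instancelib's actual classes, which offer a richer interface.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence, Set


@dataclass(frozen=True)
class ToyInstance:
    """Hypothetical stand-in for an instance: identifier, raw data, optional vector."""
    identifier: int
    data: str
    vector: Optional[Sequence[float]] = None


class ToyLabelStore:
    """Hypothetical label store that maps instance identifiers to sets of labels."""

    def __init__(self, labelset: Set[str]) -> None:
        self.labelset: FrozenSet[str] = frozenset(labelset)
        self._labels: Dict[int, Set[str]] = {}

    def set_labels(self, ins: ToyInstance, *labels: str) -> None:
        # Only labels that belong to the labelset may be assigned.
        unknown = set(labels) - self.labelset
        if unknown:
            raise ValueError(f"Labels not in labelset: {sorted(unknown)}")
        self._labels.setdefault(ins.identifier, set()).update(labels)

    def get_labels(self, ins: ToyInstance) -> FrozenSet[str]:
        # Unlabeled instances yield an empty set.
        return frozenset(self._labels.get(ins.identifier, set()))


# A plain dict plays the role of the dict-like dataset.
ds = {
    20: ToyInstance(20, "An example document"),
    21: ToyInstance(21, "Another example document"),
}
labels = ToyLabelStore({"relevant", "irrelevant"})
labels.set_labels(ds[20], "relevant")

ins = ds[20]
ins_labels = labels.get_labels(ins)
```

Keeping labels outside the instance objects means the same instances can be reused with different label sets, which is the separation the `dataset` and `labels` attributes suggest.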

Dataset manipulation: Divide the dataset into a train and test set

train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train) # May be True or False, because the split is sampled randomly
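
Because the split is random, a given key can land in either part. A minimal sketch of how such a split over identifier keys could work, using only the standard library, is shown here. `toy_train_test_split` is a hypothetical helper written for illustration and is not instancelib's implementation.

```python
import random
from typing import Hashable, Iterable, List, Optional, Tuple


def toy_train_test_split(
    keys: Iterable[Hashable],
    train_size: float,
    seed: Optional[int] = None,
) -> Tuple[List[Hashable], List[Hashable]]:
    """Shuffle the keys and cut them into a train part and a test part."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be a fraction strictly between 0 and 1")
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


keys = range(100)
train_keys, test_keys = toy_train_test_split(keys, train_size=0.70, seed=42)

# Every key ends up in exactly one of the two parts.
in_train = 20 in train_keys
in_test = 20 in test_keys
```

Passing a fixed `seed` makes the sketch reproducible; without one, repeated runs would generally assign key `20` to different parts.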

Train a model: Fit a scikit-learn pipeline on the train set and predict labels for the test set

from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
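
To see what the scikit-learn pipeline does on its own, before it is wrapped by instancelib, the following sketch fits the same three steps on a tiny corpus. The texts and labels are made up for illustration only, and the sketch assumes scikit-learn is installed; it does not use instancelib at all.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up toy corpus, for illustration only.
train_texts = [
    "the match ended with a late goal",
    "the striker scored twice in the game",
    "the central bank raised interest rates",
    "stock markets fell after the announcement",
]
train_labels = ["sports", "sports", "finance", "finance"]

pipeline = Pipeline([
    ("vect", CountVectorizer()),    # raw text -> token counts
    ("tfidf", TfidfTransformer()),  # token counts -> TF-IDF weights
    ("clf", MultinomialNB()),       # TF-IDF weights -> class prediction
])
pipeline.fit(train_texts, train_labels)

test_texts = ["a goal in the final game", "interest rates and markets"]
predicted = pipeline.predict(test_texts)            # one label per text
probabilities = pipeline.predict_proba(test_texts)  # one row of class probabilities per text
```

The pipeline accepts raw strings directly, which is presumably why the quick tour wraps it in a classifier that works on instance data rather than on precomputed vectors.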

Installation

See installation.md for an extended installation guide.

Method | Instructions
pip    | Install from PyPI via `pip install instancelib`.
Local  | Clone this repository and install via `pip install -e .`, or run `python setup.py install` locally.

Documentation

Full documentation of the latest version is provided at https://instancelib.readthedocs.org.

Example usage

See usage.py for an example of how the package can be used.

Releases

instancelib is officially released through PyPI.

See CHANGELOG.md for a full overview of the changes for each version.

Citation

@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}

Library usage

This library is used in the following projects:

  • python-allib. A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
  • text_explainability. A generic explainability architecture for explaining text machine learning models.
  • text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.
