
A generic interface for datasets and Machine Learning models


instancelib provides a generic architecture for datasets and machine learning algorithms such as classification algorithms.

© Michiel Bron, 2021

Quick tour

Load dataset: Load the dataset into an environment

import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                  data_cols=["fulltext"],
                                  label_cols=["label"])

ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances

ins = ds[20] # Get the instance with identifier key `20`
ins_data = ins.data # Get the raw data of instance 20
ins_vector = ins.vector # Get the vector representation of instance 20, if any

ins_labels = labels.get_labels(ins) # Get the labels assigned to instance 20
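The quick tour treats the dataset as a dict-like mapping from identifiers to instances, each exposing its raw data and an optional vector, while labels live in a separate object. The sketch below mirrors that split with plain Python classes. The names `Instance`, `DictDataset` and `LabelStore` are hypothetical and only model the concepts; they are not instancelib's actual classes.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterator, Mapping, Optional, Sequence


@dataclass
class Instance:
    identifier: int
    data: str                                 # raw data, e.g. the full text
    vector: Optional[Sequence[float]] = None  # stays None until a vectorizer runs


class DictDataset(Mapping[int, Instance]):
    """Hypothetical read-only dict-like view: identifier -> Instance."""

    def __init__(self, instances: Sequence[Instance]) -> None:
        self._store: Dict[int, Instance] = {ins.identifier: ins for ins in instances}

    def __getitem__(self, key: int) -> Instance:
        return self._store[key]

    def __iter__(self) -> Iterator[int]:
        return iter(self._store)

    def __len__(self) -> int:
        return len(self._store)


class LabelStore:
    """Hypothetical label object, kept apart from the data and keyed by identifier."""

    def __init__(self, labelset: FrozenSet[str]) -> None:
        self.labelset = labelset  # every label that may be assigned
        self._labels: Dict[int, FrozenSet[str]] = {}

    def set_labels(self, ins: Instance, *labels: str) -> None:
        unknown = set(labels) - self.labelset
        if unknown:
            raise ValueError(f"labels not in labelset: {unknown}")
        self._labels[ins.identifier] = frozenset(labels)

    def get_labels(self, ins: Instance) -> FrozenSet[str]:
        # Unlabeled instances get an empty set rather than an error
        return self._labels.get(ins.identifier, frozenset())


ds = DictDataset([Instance(20, "A short text"), Instance(21, "Another text")])
labels = LabelStore(frozenset({"pos", "neg"}))
labels.set_labels(ds[20], "pos")
print(ds[20].data, ds[20].vector, labels.get_labels(ds[20]))  # A short text None frozenset({'pos'})
```

Keeping labels outside the instances means the same dataset can be paired with different label objects in this sketch, for example ground-truth labels and model predictions.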

Dataset manipulation: Divide the dataset into a train and a test set

train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train) # May be True or False because of random sampling
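Because the split is sampled at random, whether a given identifier ends up in train can differ between runs. The sketch below shows one common way such a split can work: shuffle the identifiers and cut at train_size. The function `split_keys` and its `seed` parameter are hypothetical; instancelib's own `train_test_split` may be implemented differently.

```python
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

K = TypeVar("K")


def split_keys(
    keys: Iterable[K], train_size: float, seed: Optional[int] = None
) -> Tuple[List[K], List[K]]:
    """Hypothetical helper: randomly partition keys into train and test parts."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be between 0 and 1")
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)    # a fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_size)  # number of keys that go to train
    return shuffled[:cut], shuffled[cut:]


train, test = split_keys(range(100), train_size=0.70, seed=42)
print(len(train), len(test), 20 in train)
```

Passing the same seed reproduces the split; without one, `20 in train` can change from run to run, as in the quick tour.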

Train a model:

from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

pipeline = Pipeline([
    ('vect', CountVectorizer()),    # count token occurrences per text
    ('tfidf', TfidfTransformer()),  # reweight the counts with tf-idf
    ('clf', MultinomialNB()),       # multinomial naive Bayes classifier
])

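model = il.SkLearnDataClassifier.build(pipeline, text_env) # Wrap the scikit-learn pipeline for use with instancelib
model.fit_provider(train, labels) # Fit on the training instances and their labels
predictions = model.predict(test) # Predict labels for the test instances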
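A wrapper such as SkLearnDataClassifier has to bridge identifier-keyed data and labels to scikit-learn, whose `fit(X, y)` expects parallel sequences. The standard-library sketch below illustrates that bridging step under stated assumptions: `to_xy` and `predictions_by_key` are hypothetical helpers, each instance is assumed to carry exactly one label, and a dummy function stands in for the fitted pipeline's `predict`.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def to_xy(
    data: Dict[int, str], labels: Dict[int, FrozenSet[str]]
) -> Tuple[List[int], List[str], List[str]]:
    """Hypothetical: flatten keyed data and labels into parallel (keys, X, y) lists.

    Assumption: single-label classification, so each labeled instance has one label.
    """
    keys = sorted(k for k in data if labels.get(k))  # skip unlabeled instances
    X = [data[k] for k in keys]
    y = [next(iter(labels[k])) for k in keys]
    return keys, X, y


def predictions_by_key(
    keys: Sequence[int], predict: Callable[[List[str]], List[str]], data: Dict[int, str]
) -> List[Tuple[int, FrozenSet[str]]]:
    """Hypothetical: run a batch predictor and pair each prediction with its identifier."""
    preds = predict([data[k] for k in keys])
    return [(k, frozenset({p})) for k, p in zip(keys, preds)]


def dummy_predict(texts: List[str]) -> List[str]:
    # Stands in for pipeline.predict after pipeline.fit(X, y)
    return ["animals" if "cat" in t else "finance" for t in texts]


data = {0: "cheap stocks", 1: "cute cat", 2: "cat food"}
labels = {0: frozenset({"finance"}), 1: frozenset({"animals"})}
keys, X, y = to_xy(data, labels)
print(predictions_by_key([2], dummy_predict, data))  # [(2, frozenset({'animals'}))]
```

Pairing each prediction with its identifier keeps predictions traceable to the instances they belong to, which a bare scikit-learn output array does not do on its own.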

Installation

See installation.md for an extended installation guide.

  • pip: Install from PyPI via pip install instancelib.
  • Local: Clone this repository and install via pip install -e . or locally run python setup.py install.

Documentation

Full documentation of the latest version is provided at https://instancelib.readthedocs.org.

Example usage

See usage.py for an example of how the package can be used.

Releases

instancelib is officially released through PyPI.

See CHANGELOG.md for a full overview of the changes for each version.

Citation

@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}

Library usage

This library is used in the following projects:

  • python-allib. A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
  • text_explainability. A generic explainability architecture for explaining text machine learning models.
  • text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.

