Track, version, compare and review your data and models.

Quick Access

Documentation | Watch Demo | Quick example | Get Instant Help | Sign Up for free

Intro

PureML is an open-source version control system for machine learning.

Quick start
How it works
Demo
Main Features
Tutorials
Core design principles
Core abstractions
Why to get involved

Quick start

You can install and run PureML using pip.

Using `pip`

Install PureML
```
pip install pureml
```
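
After installing, it can help to confirm that the package is visible to the interpreter you plan to use. The sketch below relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is ours for illustration and is not part of PureML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pureml")
    print(f"pureml version: {found}" if found else "pureml is not installed")
```

If this reports that the package is missing, the `pip` used for installation probably belongs to a different Python environment than the one running the script.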

How it works

Just add a few lines of code. You don't need to change the way you work.

PureML is a Python library that uploads metadata to S3.

Generating Data Lineage

Load Data

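```python
import pandas as pd
# Import path is an assumption; verify it against your installed PureML version.
from pureml.decorators import dataset, load_data, transformer

@load_data(name='loading data')
def loading_data():
    # Read the raw churn records from a local CSV file.
    return pd.read_csv('churn.csv')
```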

Transform Data

```python
from sklearn.preprocessing import OrdinalEncoder

@transformer(name='fill missing values')
def fill_missing_values(df):
    # fillna() needs an explicit value; 0 is a placeholder, pick one that suits your data.
    return df.fillna(0)


@transformer(name='encode ordinal')
def encode_ordinal(df):
    # Replace each category in the listed columns with an integer code.
    col_ord = ['state', 'phone number']
    df[col_ord] = OrdinalEncoder().fit_transform(df[col_ord])
    return df


@transformer(name='encode binary')
def encode_binary(df):
    # Map yes/no and True/False flags to 1/0.
    df['voice mail plan'] = df['voice mail plan'].map({'yes': 1, 'no': 0})
    df['international plan'] = df['international plan'].map({'yes': 1, 'no': 0})
    df['churn'] = df['churn'].map({True: 1, False: 0})
    return df
```

```python
@dataset(name='telecom churn', parent='encode binary')
def build_dataset():
    # Run the decorated steps in order to produce the final dataset.
    df = loading_data()
    df = fill_missing_values(df)
    df = encode_ordinal(df)
    df = encode_binary(df)
    return df


df = build_dataset()
```

This is how the generated data lineage looks in the UI.
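
To make the idea of name-tagged steps more concrete, here is a minimal, self-contained sketch of how a decorator can record the order in which steps run. It is not PureML's implementation: the `step` decorator, the in-memory `LINEAGE` list, and the toy rows are all hypothetical stand-ins chosen for illustration.

```python
from functools import wraps
from typing import Any, Callable, List

# Hypothetical in-memory log of step names; a real tool would persist this.
LINEAGE: List[str] = []


def step(name: str) -> Callable:
    """Decorator factory that records `name` each time the wrapped function runs."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = func(*args, **kwargs)
            LINEAGE.append(name)  # record only after the step succeeds
            return result
        return wrapper
    return decorator


@step(name="load data")
def load() -> List[dict]:
    return [{"plan": "yes", "churn": True}, {"plan": "no", "churn": False}]


@step(name="encode binary")
def encode(rows: List[dict]) -> List[dict]:
    return [{"plan": int(r["plan"] == "yes"), "churn": int(r["churn"])} for r in rows]


rows = encode(load())
print(" -> ".join(LINEAGE))  # load data -> encode binary
```

Because the name is recorded inside the wrapper, the functions themselves stay unchanged, which is the property the "few lines of code" claim above depends on.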

Demo

Live demo

Build and run a PureML project that creates data lineage and a model, using our demo Colab notebook.

Demo video (2 min)

PureML quick start demo

Main Features


| Feature | Description |
| --- | --- |
| Data Lineage | Automatic generation of data lineage |
| Dataset Versioning | Automatic semantic versioning of datasets |
| Model Versioning | Automatic semantic versioning of models |
| Comparison | Comparing different versions of models or datasets |
| Branches (Coming Soon) | Separation between experimental and production-ready models using branches |
| Review (Coming Soon) | Review and approve models and datasets for the production-ready branch |
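
As a rough illustration of what comparing two versions can involve, the sketch below diffs the metrics logged for two model versions. The function `compare_metrics` and the metric values are hypothetical; PureML's own comparison feature may work differently.

```python
from typing import Dict


def compare_metrics(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Return new - old for every metric present in both versions."""
    shared = old.keys() & new.keys()
    return {key: round(new[key] - old[key], 6) for key in sorted(shared)}


# Made-up metrics for two versions of the same model.
v1 = {"accuracy": 0.90, "f1": 0.80}
v2 = {"accuracy": 0.92, "f1": 0.79, "auc": 0.95}

print(compare_metrics(v1, v2))
```

Metrics that exist in only one version (here `auc`) are skipped rather than reported as a change, since there is nothing to compare them against.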

Tutorials

Core design principles


| Principle | Description |
| --- | --- |
| Easy developer experience | An intuitive open-source package that aims to bridge the gaps in data science teams |
| Engineering best practices built-in | Integrating PureML functionality into your code does not disrupt your workflow |
| Object Versioning | A reliable object versioning mechanism to track changes to your datasets and models |
| Data is a first-class citizen | Your data is secure. It will never leave your system. |
| Reduce Friction | Use data lineage to see which operations were performed on the data, without spending time in lengthy meetings |

Core abstractions

These are the fundamental concepts that PureML uses to operate.


| Abstraction | Description |
| --- | --- |
| Project | A data science project. This is where you store datasets, models, and their related objects. It is similar to a GitHub repository with object storage. |
| Lineage | Contains the series of transformations performed on data to generate a dataset. |
| Data Versioning | Versioning of the data should be comprehensible to the user and should encapsulate the changes in the data and its creation mechanism, among others. |
| Model Versioning | Versioning of the model should be comprehensible to the user and should encapsulate the changes in training data, model architecture, and hyperparameters. |
| Fetch | This functionality is used to fetch registered models and datasets. |
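
One way to picture a version that reflects both the data and its creation mechanism is to fingerprint the two together and bump the version only when the fingerprint changes. The sketch below is an assumption-laden illustration, not PureML's algorithm: `fingerprint`, `next_version`, and the plain integer counter (standing in for a semantic version) are all hypothetical.

```python
import hashlib
import json
from typing import List, Optional, Tuple


def fingerprint(rows: List[dict], lineage: List[str]) -> str:
    """Hash the data together with the ordered steps that produced it."""
    payload = json.dumps({"rows": rows, "lineage": lineage}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def next_version(latest: Optional[Tuple[int, str]], digest: str) -> Tuple[int, str]:
    """Return (version number, digest); bump only when the digest changed."""
    if latest is None:
        return (1, digest)
    number, old_digest = latest
    return latest if digest == old_digest else (number + 1, digest)


steps = ["load data", "encode binary"]
v = next_version(None, fingerprint([{"churn": 1}], steps))
same = next_version(v, fingerprint([{"churn": 1}], steps))
changed = next_version(same, fingerprint([{"churn": 0}], steps))
print(v[0], same[0], changed[0])  # 1 1 2
```

Including the lineage in the hash means that identical data produced by a different sequence of steps would still count as a new version.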

Why to get involved

Version control is much more common in software than in machine learning. So why isn’t everyone using Git? Git doesn’t work well with machine learning. It can’t handle large files, it can’t handle key/value metadata like metrics, and it can’t record information automatically from inside a training script.

GitHub wasn’t designed with data as a core project component. This along with a number of other differences between AI and more traditional software projects makes GitHub a bad fit for artificial intelligence, contributing to the reproducibility crisis in machine learning.

From manual model tracking to Git-based versioning systems that lack an intuitive versioning mechanism, there is no standardized way to track objects. With these mechanisms, it is hard enough to track your own model from a month ago or get it running, let alone a teammate's!

We are trying to build a version control system for machine learning objects: a mechanism that is object-dependent and intuitive for users.

Let's build this together. If you have faced this issue or have worked out a similar solution for yourself, please join us to help build a better system for everyone.

Reporting Bugs

To report a bug you have encountered while using the PureML package, please do one of the following:

Report it in the Discord channel
Open an issue

Contributing and developing

Let's work together to improve the features for everyone.

Work with mutual respect.

License

See the Apache-2.0 file for licensing information.

Release history

| Version | Release date |
| --- | --- |
| 0.4.6 | Mar 31, 2024 |
| 0.4.5 | Mar 30, 2024 |
| 0.4.4 | Nov 27, 2023 |
| 0.4.3 | Nov 19, 2023 |
| 0.4.2 (yanked) | Nov 18, 2023 |
| 0.4.1 | Jul 13, 2023 |
| 0.4.0 | Jul 13, 2023 |
| 0.3.9 | Jul 10, 2023 |
| 0.3.8 | May 14, 2023 |
| 0.3.7 | May 5, 2023 |
| 0.3.6 | Apr 27, 2023 |
| 0.3.5 | Apr 18, 2023 |
| 0.3.4 | Apr 15, 2023 |
| 0.3.3 | Apr 11, 2023 |
| 0.3.2 | Apr 10, 2023 |
| 0.3.1 | Apr 8, 2023 |
| 0.3.0 | Apr 7, 2023 |
| 0.2.3 | Feb 28, 2023 |
| 0.2.2 | Feb 17, 2023 |
| 0.2.1 | Feb 15, 2023 |
| 0.2.0 | Feb 10, 2023 |
| 0.1.6 (this version) | Feb 2, 2023 |
| 0.1.4 | Jan 9, 2023 |
| 0.1.3.0 | Dec 31, 2022 |
| 0.1.2.2 | Dec 14, 2022 |
| 0.1.2.1 | Dec 9, 2022 |
| 0.1.2.0 | Dec 5, 2022 |
| 0.1.1 | Oct 8, 2022 |
| 0.1.0 | Sep 26, 2022 |

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pureml-0.1.6.tar.gz (36.5 kB)

Uploaded Feb 2, 2023 (Source)

Built Distribution

pureml-0.1.6-py3-none-any.whl (59.6 kB)

Uploaded Feb 2, 2023 (Python 3)

Hashes for pureml-0.1.6.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `033a1c2d7a672b8c912a2b1fa473b74b8553fc7ea95d79d1ed64f26508c5ef1b` |
| MD5 | `6759a9b1723a403dc8ec1868394b2ee8` |
| BLAKE2b-256 | `b3014f1f4f5bcbc0f2a1326a3a58c46bc5269e5ee20505ec2828f9654b99a1ec` |

Hashes for pureml-0.1.6-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `9353d58008d36c1f9425fdb262846c200ee550af9ca53e7fdb6cbef71d01f5bb` |
| MD5 | `a046fbd560b5d48200f1f2a4c31f5905` |
| BLAKE2b-256 | `146bd2c42aa92729659a8957ae4b4377d03f74db05e94091346db39b39552725` |
