skrub

Prepping tables for machine learning

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering
- Software Development :: Libraries

Project description

skrub (formerly dirty_cat) is a Python library that facilitates prepping your tables for machine learning.

If you like the package, spread the word and ⭐ this repository! You can also join the Discord server.

Website: https://skrub-data.org/

What can skrub do?

The goal of skrub is to bridge the gap between tabular data sources and machine-learning models.

skrub provides high-level tools for joining dataframes (Joiner, AggJoiner, …), encoding columns (MinHashEncoder, ToCategorical, …), building a pipeline (TableVectorizer, tabular_learner, …), and more.

>>> from skrub.datasets import fetch_employee_salaries
>>> dataset = fetch_employee_salaries()
>>> df = dataset.X
>>> y = dataset.y
>>> df.iloc[0]
gender                                                                     F
department                                                               POL
department_name                                         Department of Police
division                   MSB Information Mgmt and Tech Division Records...
assignment_category                                         Fulltime-Regular
employee_position_title                          Office Services Coordinator
date_first_hired                                                  09/22/1986
year_first_hired                                                        1986

>>> from sklearn.model_selection import cross_val_score
>>> from skrub import tabular_learner
>>> cross_val_score(tabular_learner('regressor'), df, y)
array([0.89370447, 0.89279068, 0.92282557, 0.92319094, 0.92162666])

See our examples.

Installation

skrub can be installed with pip or conda. For details, see the installation instructions.
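After installing (with pip this is typically `pip install skrub`), a quick way to confirm which version ended up in the current environment is to query the package metadata. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of skrub.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


skrub_version = installed_version("skrub")
if skrub_version is None:
    print("skrub is not installed in this environment")
else:
    print(f"skrub {skrub_version} is installed")
```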

Contributing

The best way to support the development of skrub is to spread the word!

Also, if you are already a skrub user, we would love to hear about your use cases and challenges in the Discussions section.

To report a bug or suggest enhancements, please open an issue and/or submit a pull request.

Release history

- 0.3.1 (this version): Sep 25, 2024
- 0.3.0: Aug 2, 2024
- 0.2.0: Jul 2, 2024
- 0.2.0rc1 (pre-release): Jul 2, 2024
- 0.1.1: May 28, 2024
- 0.1.0: Dec 13, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

skrub-0.3.1.tar.gz (6.3 MB)

Uploaded Sep 25, 2024 Source

Built Distribution

skrub-0.3.1-py3-none-any.whl (304.2 kB)

Uploaded Sep 25, 2024 Python 3

Hashes for skrub-0.3.1.tar.gz

Algorithm	Hash digest
SHA256	`b745cca583732f23c9d410e2ca220f4f3bddb71e6549925ab89aa6ee9d3d55a5`
MD5	`b2050a91106383605640b763c1fd5cdb`
BLAKE2b-256	`0efed9d6be2e27e939ed8b6f68f846b2da438653af74b232039ef3cf9d1291b8`

Hashes for skrub-0.3.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`0495ced71f569894b6fbf5239bfae5a4bc839743c36eeeb96d45b370d2bdb4f6`
MD5	`e59cc3d1a10e3c9257874dd4de8e6548`
BLAKE2b-256	`f6da97bfd38b20cfc72ad2cf8e85681d5207b41cec3d6504e4d0f2cfe5b33612`