Prepping tables for machine learning

Project description

skrub (formerly dirty_cat) is a Python library that facilitates prepping your tables for machine learning.

If you like the package, spread the word and ⭐ this repository! You can also join the Discord server.

Website: https://skrub-data.org/

What can skrub do?

The goal of skrub is to bridge the gap between tabular data sources and machine-learning models.

skrub provides high-level tools for joining dataframes (Joiner, AggJoiner, …), encoding columns (MinHashEncoder, ToCategorical, …), building a pipeline (TableVectorizer, tabular_learner, …), and more.
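skrub's Joiner performs fuzzy joins: it matches rows whose keys are similar rather than strictly identical. To illustrate only the general idea, here is a minimal standard-library sketch that scores key similarity with difflib.SequenceMatcher. This is a swapped-in technique, not how skrub's Joiner matches rows, and the tables, the helper names (similarity, fuzzy_join) and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Illustration only: this is NOT skrub's Joiner algorithm.
# Hypothetical tables: the main table's keys have typos, stray spaces and
# different capitalization compared with the auxiliary table.
main = [
    {"city": "Paris", "sales": 10},
    {"city": "Berlin ", "sales": 7},
    {"city": "Lisbonn", "sales": 3},
]
aux = [
    {"city_name": "paris", "country": "France"},
    {"city_name": "berlin", "country": "Germany"},
    {"city_name": "lisbon", "country": "Portugal"},
]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings after trimming and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_join(main, aux, main_key, aux_key, threshold=0.8):
    """Left-join `aux` onto `main`, matching each key to its most similar aux key.

    Rows whose best match scores below `threshold` get None for the aux columns.
    """
    aux_columns = [c for c in aux[0] if c != aux_key]
    joined = []
    for row in main:
        # Pick the auxiliary row whose key is most similar to this row's key.
        best = max(aux, key=lambda r: similarity(row[main_key], r[aux_key]))
        score = similarity(row[main_key], best[aux_key])
        matched = score >= threshold
        extra = {c: (best[c] if matched else None) for c in aux_columns}
        joined.append({**row, **extra, "match_score": round(score, 3)})
    return joined


for r in fuzzy_join(main, aux, "city", "city_name"):
    print(r)
```

The threshold keeps a main-table row unmatched when even its closest auxiliary key is too different, instead of forcing a poor match.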

>>> from skrub.datasets import fetch_employee_salaries
>>> dataset = fetch_employee_salaries()
>>> df = dataset.X
>>> y = dataset.y
>>> df.iloc[0]
gender                                                                     F
department                                                               POL
department_name                                         Department of Police
division                   MSB Information Mgmt and Tech Division Records...
assignment_category                                         Fulltime-Regular
employee_position_title                          Office Services Coordinator
date_first_hired                                                  09/22/1986
year_first_hired                                                        1986
>>> from sklearn.model_selection import cross_val_score
>>> from skrub import tabular_learner
>>> cross_val_score(tabular_learner('regressor'), df, y)
array([0.89370447, 0.89279068, 0.92282557, 0.92319094, 0.92162666])
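String columns such as employee_position_title often hold many distinct but related values. MinHashEncoder addresses this by applying the min-hash method to the character n-grams of each string. The standard-library sketch below only illustrates that idea and is not skrub's implementation: the 3-gram size, the 30 components, the blake2b-based seeded hashing, the helper names and the extra job titles are all assumptions made for this example.

```python
import hashlib

# Illustration of min-hashing only; NOT skrub's MinHashEncoder.


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Distinct character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    grams = {padded[i:i + n] for i in range(len(padded) - n + 1)}
    return grams or {padded}  # very short strings: fall back to the whole string


def seeded_hash(token: str, seed: int) -> int:
    """64-bit hash that is stable across runs (unlike the built-in hash())."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(text: str, n_components: int = 30) -> list[int]:
    """Min-hash signature: for each seeded hash, the smallest hash of any n-gram."""
    grams = char_ngrams(text)
    return [min(seeded_hash(g, seed) for g in grams) for seed in range(n_components)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of equal components; estimates the Jaccard similarity of n-gram sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical job titles (only the first appears in the example above).
titles = ["Office Services Coordinator", "Office Services Coordinator II", "Police Officer III"]
sigs = [minhash(t) for t in titles]
print(estimated_jaccard(sigs[0], sigs[1]), estimated_jaccard(sigs[0], sigs[2]))
```

Because the fraction of matching components estimates the Jaccard similarity of the n-gram sets, the two "Office Services Coordinator" variants should score much higher than the unrelated title.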

See our examples.

Installation

skrub can easily be installed via pip or conda. For more installation information, see the installation instructions.

Contributing

The best way to support the development of skrub is to spread the word!

Also, if you are already a skrub user, we would love to hear about your use cases and challenges in the Discussions section.

To report a bug or suggest enhancements, please open an issue and/or submit a pull request.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

skrub-0.3.0.tar.gz (6.0 MB)

Uploaded Source

Built Distribution

skrub-0.3.0-py3-none-any.whl (284.6 kB)

Uploaded Python 3

File details

Details for the file skrub-0.3.0.tar.gz.

File metadata

  • Download URL: skrub-0.3.0.tar.gz
  • Upload date:
  • Size: 6.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.8

File hashes

Hashes for skrub-0.3.0.tar.gz
Algorithm Hash digest
SHA256 ef12c46b69918ec596052586ebf83c041bd0b9d900b2c5a5bd241c00db38249e
MD5 95b1637e476b29f7fc858dc1da9c3f8e
BLAKE2b-256 68b5d99492a0c4f1c79964c42cade7ec3f04af1ec4d7c854c638ea9d343a2b5e
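A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. The sketch below is illustrative: the function names are hypothetical, and it assumes the archive was saved as skrub-0.3.0.tar.gz in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for skrub-0.3.0.tar.gz.
EXPECTED_SHA256 = "ef12c46b69918ec596052586ebf83c041bd0b9d900b2c5a5bd241c00db38249e"


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks (1 MiB by default) to bound memory use."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_digest(path, expected_hex: str) -> bool:
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("skrub-0.3.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if matches_digest(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

Alternatively, pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file, enforced with `--require-hashes`) performs the same comparison at install time.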

See the PyPI help pages for more details on using hashes.

File details

Details for the file skrub-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: skrub-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 284.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.8

File hashes

Hashes for skrub-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7becfc7c8101e04124af3d5e1f4bef02b52e9da4131d50543e1c3d26ff9992b4
MD5 172734394370a6e3cfe3278bb6ffca31
BLAKE2b-256 6bd7d6ed2a0b9b283e0422fc50ea178b34be9022c4e64edcc2c5c2dc3779797a
