Prepping tables for machine learning

Project description

skrub (formerly dirty_cat) is a Python library that facilitates prepping your tables for machine learning.

If you like the package, spread the word and ⭐ this repository! You can also join the Discord server.

Website: https://skrub-data.org/

What can skrub do?

The goal of skrub is to bridge the gap between tabular data sources and machine-learning models.

skrub provides high-level tools for joining dataframes (Joiner, AggJoiner, …), encoding columns (MinHashEncoder, ToCategorical, …), building a pipeline (TableVectorizer, tabular_learner, …), and interactively exploring your data (TableReport).

An animation showing how TableReport works

>>> from skrub.datasets import fetch_employee_salaries
>>> dataset = fetch_employee_salaries()
>>> df = dataset.X
>>> y = dataset.y
>>> df.iloc[0]
gender                                                                     F
department                                                               POL
department_name                                         Department of Police
division                   MSB Information Mgmt and Tech Division Records...
assignment_category                                         Fulltime-Regular
employee_position_title                          Office Services Coordinator
date_first_hired                                                  09/22/1986
year_first_hired                                                        1986
>>> from sklearn.model_selection import cross_val_score
>>> from skrub import tabular_learner
>>> cross_val_score(tabular_learner('regressor'), df, y)
array([0.89370447, 0.89279068, 0.92282557, 0.92319094, 0.92162666])

See our examples.

Installation

skrub can easily be installed via pip or conda. For more installation information, see the installation instructions.

Contributing

The best way to support the development of skrub is to spread the word!

If you already use skrub, we would love to hear about your use cases and challenges in the Discussions section.

To report a bug or suggest enhancements, please open an issue.

If you want to contribute directly to the library, see the "how to contribute" page on the website.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

skrub-0.4.1.tar.gz (6.5 MB)

Uploaded Source

Built Distribution

skrub-0.4.1-py3-none-any.whl (327.6 kB)

Uploaded Python 3

File details

Details for the file skrub-0.4.1.tar.gz.

File metadata

  • Download URL: skrub-0.4.1.tar.gz
  • Upload date:
  • Size: 6.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for skrub-0.4.1.tar.gz
Algorithm Hash digest
SHA256 2d32267fcae3aec0af187f209039d78b283fe37ddbee112862b7cefc51f0c2d4
MD5 1d492f8569b1a80c9299331e57fe8184
BLAKE2b-256 95b4947b51a9b47fb5301ac14a6759f4d4fc2baa09e0059167de482a5779b822

See more details on using hashes here.
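
One way to use these digests is to recompute the SHA256 of the downloaded file and compare it with the published value. The following is a minimal sketch using only the Python standard library; the helper name `sha256_of_file` and the assumption that the archive sits in the current working directory are illustrative, not part of skrub or PyPI tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for skrub-0.4.1.tar.gz.
EXPECTED_SHA256 = "2d32267fcae3aec0af187f209039d78b283fe37ddbee112862b7cefc51f0c2d4"

# Assumption: the archive was saved in the current working directory.
archive = Path("skrub-0.4.1.tar.gz")
if archive.exists():
    matches = sha256_of_file(archive) == EXPECTED_SHA256
    print("digest matches" if matches else "digest MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

The same check applies to the wheel by swapping in its file name and SHA256 digest.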

File details

Details for the file skrub-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: skrub-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 327.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for skrub-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 011940ec1a0c79cbaaf0cd18e83aad09f7071011b8e3e2cebe658c8bfa969d64
MD5 e1b49e823425590c8d0ba8833337d71d
BLAKE2b-256 e69ab77226bf12a8690a5d8fa7f1198bc4fdd967dc0138f14549d687ea94daea

See more details on using hashes here.
