skrub

Prepping tables for machine learning

Development Status
- 5 - Production/Stable
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering
- Software Development :: Libraries

Project description

skrub (formerly dirty_cat) is a Python library that facilitates prepping your tables for machine learning.

If you like the package, spread the word and ⭐ this repository!

What can skrub do?

skrub provides data-assembling tools (TableVectorizer, fuzzy_join…) and encoders (GapEncoder, MinHashEncoder…) that exploit morphological similarities between strings. We usually identify three common cases: similarities, typos and variations.

See our examples.
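
To give a rough idea of what joining tables on inexact keys involves, here is a minimal standard-library sketch. It is not skrub's implementation and does not use the skrub API: `fuzzy_left_join` is a hypothetical helper, the tables are made-up toy data, and string matching is delegated to `difflib.get_close_matches` with an arbitrary cutoff of 0.6.

```python
from difflib import get_close_matches


def fuzzy_left_join(left_rows, right_rows, left_key, right_key, cutoff=0.6):
    """Hypothetical helper: attach to each left row the right row whose
    key is the closest string match, if any match clears the cutoff."""
    right_by_key = {row[right_key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        # Ask for the single best candidate above the similarity cutoff.
        candidates = get_close_matches(
            row[left_key], list(right_by_key), n=1, cutoff=cutoff
        )
        match = right_by_key[candidates[0]] if candidates else {}
        # Unmatched left rows are kept unchanged, as in a left join.
        joined.append({**row, **match})
    return joined


# Toy data (assumed for illustration): a typo, a case variation, no match.
sales = [
    {"country": "Frnace", "sales": 10},
    {"country": "germany", "sales": 7},
    {"country": "Japan", "sales": 3},
]
capitals = [
    {"name": "France", "capital": "Paris"},
    {"name": "Germany", "capital": "Berlin"},
]

joined = fuzzy_left_join(sales, capitals, "country", "name")
for row in joined:
    print(row)
```

The misspelled and differently cased keys are still paired with the right rows, while the row with no plausible counterpart is left without a match.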

What skrub cannot do

Semantic similarities are currently not supported. For example, the similarity between car and automobile is outside the reach of the methods implemented here.

This kind of problem is tackled by Natural Language Processing methods.

skrub can still help with handling typos and variations in this kind of setting.
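
The difference between morphological and semantic similarity can be illustrated with a small standard-library sketch. It assumes a simple Jaccard overlap of character 3-grams as the similarity measure; `char_ngrams` and `ngram_similarity` are hypothetical helpers written for this illustration, not part of skrub.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


# A typo keeps most n-grams in common with the intended word.
typo = ngram_similarity("automobile", "automoble")

# Synonyms share meaning but no character n-grams.
semantic = ngram_similarity("car", "automobile")

print(f"typo: {typo:.2f}, semantic: {semantic:.2f}")
```

A character-based measure of this kind scores the typo pair well above the synonym pair, which shares no 3-grams at all. This is why semantic matches such as car and automobile call for different methods.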

For a detailed description of the problem of encoding dirty categorical data, see Similarity encoding for learning with dirty categorical variables [1] and Encoding high-cardinality string categorical variables [2].

Installation (WIP)

There are currently no PyPI releases. You can still install the package from the GitHub repository with:

pip install git+https://github.com/skrub-data/skrub.git

Dependencies

Dependencies and minimal versions are listed in the setup file.

Contributing

The best way to support the development of skrub is to spread the word!

Also, if you already are a skrub user, we would love to hear about your use cases and challenges in the Discussions section.

To report a bug or suggest enhancements, please open an issue and/or submit a pull request.

Additional resources

References

Release history
- 0.3.1: Sep 25, 2024
- 0.3.0: Aug 2, 2024
- 0.2.0: Jul 2, 2024
- 0.2.0rc1 (pre-release): Jul 2, 2024
- 0.1.1: May 28, 2024
- 0.1.0 (this version): Dec 13, 2023

Download files

Source Distribution
- skrub-0.1.0.tar.gz (148.6 kB), uploaded Dec 13, 2023

Built Distribution
- skrub-0.1.0-py3-none-any.whl (141.6 kB), uploaded Dec 13, 2023, Python 3

Hashes for skrub-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`eaa64fe16b2f738c5fc29782c5a367d9936251480b67d08d3837e72df261de85`
MD5	`6e2ddc11342aa77f8f07c63b2752484d`
BLAKE2b-256	`63f6c489e124a16b58a81dcf31ee2cde860f733b0c2d08d66a743deafe9893e3`

Hashes for skrub-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6f369905d2b1acdb5e49f1f7713414205985f6db0bb8546fcebd8583062a4f91`
MD5	`6d92b8c96247a4c890eb15d6bd0f8ea5`
BLAKE2b-256	`21a37b651bdfad2ee94d477b9c36302d74f4c396c40e184c6c7b228adf90f602`
