A Python library for error generation in machine learning datasets

Project description

pucktrick

Pucktrick is a Python library that provides utility functions for introducing errors into a dataframe. The library is named after Puck, the elf in William Shakespeare's “A Midsummer Night’s Dream”, who is famous for causing trouble and playing tricks on mortals and fairies alike.

Features

Pucktrick is organized into modules, one per error type. Each module includes a function with the same name as the module. Every function takes as parameters the dataset to modify, the strategy, and, if mode="extended", the original dataset. Each function returns two values: an error code (0 or 1) and the generated dataset.
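
The calling convention described above can be sketched without the library itself. In the minimal example below, the function name missing_sketch, the strategy keys "feature", "percentage" and "mode", and the list-of-dicts stand-in for a dataframe are all assumptions made for illustration, not pucktrick's actual API. The reading of the error code (0 for success, 1 for failure) and the behaviour of the extended mode are likewise assumed.

```python
import random


def missing_sketch(dataset, strategy, original=None, seed=0):
    """Hypothetical pucktrick-style function (not the library's real API).

    Returns (error, new_dataset). Assumption: 0 means success, 1 means failure.
    """
    mode = strategy.get("mode", "new")
    if "feature" not in strategy or not 0 <= strategy.get("percentage", -1) <= 1:
        return 1, dataset  # malformed strategy
    if mode == "extended" and original is None:
        return 1, dataset  # extended mode needs the original dataset

    feature = strategy["feature"]
    rows = [dict(row) for row in dataset]  # copy so the input stays untouched
    target = round(strategy["percentage"] * len(rows))

    already = 0
    if mode == "extended":
        # Assumption: a row that differs from the original is already corrupted.
        already = sum(
            1 for new, old in zip(rows, original) if new[feature] != old[feature]
        )

    candidates = [i for i, row in enumerate(rows) if row[feature] is not None]
    to_corrupt = max(0, min(target - already, len(candidates)))
    rng = random.Random(seed)
    for i in rng.sample(candidates, to_corrupt):
        rows[i][feature] = None  # inject a missing value
    return 0, rows


clean = [{"age": a} for a in range(10)]
err, dirty = missing_sketch(clean, {"feature": "age", "percentage": 0.3})
err2, dirtier = missing_sketch(
    dirty,
    {"feature": "age", "percentage": 0.5, "mode": "extended"},
    original=clean,
)
```

In this sketch the extended call only adds enough missing values to raise the total from 30% to 50% of the rows, which is one plausible reason a function would need the original dataset alongside the already modified one.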

Version

version 0.5

  • strategy added: a JSON file that defines an error model by specifying the affected features (one or more), a selection criterion (a Boolean predicate that identifies the subset of rows to corrupt), the mode, the percentage, and the distribution function used to inject errors.
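
To make the elements of a strategy concrete, here is a minimal sketch of what such a JSON document might look like and how a selection predicate could narrow down the rows. Every key name ("affected_features", "selection_criteria", "mode", "percentage", "distribution"), the predicate encoding, and the helper select_rows are assumptions chosen for illustration. The schema that pucktrick actually expects may differ, so consult the library's own documentation for the real format.

```python
import json

# Hypothetical strategy document: all key names are assumptions that only
# mirror the elements listed in the release note.
STRATEGY_JSON = """
{
  "affected_features": ["income"],
  "selection_criteria": {"feature": "age", "op": ">=", "value": 30},
  "mode": "new",
  "percentage": 0.5,
  "distribution": "uniform"
}
"""

# Supported comparison operators for the illustrative predicate encoding.
OPS = {
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}


def select_rows(rows, criteria):
    """Return the indices of rows that satisfy the Boolean predicate."""
    test = OPS[criteria["op"]]
    return [
        i for i, row in enumerate(rows)
        if test(row[criteria["feature"]], criteria["value"])
    ]


strategy = json.loads(STRATEGY_JSON)
rows = [
    {"age": 25, "income": 10},
    {"age": 30, "income": 20},
    {"age": 41, "income": 30},
    {"age": 52, "income": 40},
    {"age": 67, "income": 50},
]
eligible = select_rows(rows, strategy["selection_criteria"])
# Assumption: the percentage applies to the selected rows, not the whole dataset.
budget = round(strategy["percentage"] * len(eligible))
```

Here four of the five rows satisfy the predicate, so a percentage of 0.5 gives a budget of two rows to corrupt in the "income" feature.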

version 0.4

  • error type added: missing values

version 0.3

  • error type added: duplicated

version 0.2

  • error type added: outliers

version 0.1

  • error types added: noisy errors and inconsistent labels

Installation

You can install pucktrick using pip:

pip install pucktrick
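
After installing, one way to confirm which version is present is to query the package metadata from Python. This sketch uses only the standard library, and the helper name installed_version is introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed pucktrick version, or None if it is not installed.
print(installed_version("pucktrick"))
```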

Contributing

We welcome contributions from the community. To contribute:

  • Fork the repository
  • Create a new branch (git checkout -b feature/your-feature)
  • Commit your changes (git commit -am 'Add new feature')
  • Push to the branch (git push origin feature/your-feature)
  • Create a new Pull Request

Please ensure your code adheres to our coding standards and includes appropriate tests.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). See the LICENSE file for details.

Acknowledgements

Thanks to the contributors and the open-source community for their support.

Download files

Source Distribution

pucktrick-0.5.0.tar.gz (7.6 kB)

Uploaded Source

Built Distribution

pucktrick-0.5.0-py3-none-any.whl (9.1 kB)

Uploaded Python 3

File details

Details for the file pucktrick-0.5.0.tar.gz.

File metadata

  • Download URL: pucktrick-0.5.0.tar.gz
  • Upload date:
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for pucktrick-0.5.0.tar.gz
Algorithm Hash digest
SHA256 0c46c7b8c1c0ac1f3e4e0a7ab487e76dffce1aeb1fd3f38565c18428343aa153
MD5 422318a329884dc3fcbacfcaeedd1c1a
BLAKE2b-256 0532e8cab9ce999fe9841f85c471d52d2e90d2c86d1bea8f4e6834e82bbc20a9

File details

Details for the file pucktrick-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: pucktrick-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for pucktrick-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 319343df8054126bc2129fa9e0f0bf78b2415a0205750066bfd620ccfac78454
MD5 3167cb8d6dfb5c79ed998d443530319c
BLAKE2b-256 df677d4b589b01fd3df5855866eb5e750fdb33e1ed649f271f840a0605f841ef
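
The SHA256 digests listed for the two files can be checked against a local download with Python's standard hashlib module. The sketch below is one simple way to do that. The helper names sha256_of and matches are introduced here for illustration, and the file path passed in is assumed to point at a copy you downloaded yourself.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("pucktrick-0.5.0.tar.gz", "0c46c7b8c1c0ac1f3e4e0a7ab487e76dffce1aeb1fd3f38565c18428343aa153") should return True for an unmodified copy of the source distribution.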
