PetHarbor is a Python package designed for anonymizing datasets using either a pre-trained model or a hash-based approach. It provides two main classes for anonymization: lite and advance.

PetHarbor

Installation

To install the required dependencies, run:

pip install -r requirements.txt

Lite Anonymization

The lite anonymization class uses a hash-based approach to anonymize text data. Here is an example of how to use it:

Arguments

Usage

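from petharbor.lite import annonymise

# NOTE: `petharbor_lite` is not bound by the import above; import or alias it
# to match the class name exported by your installed version.
lite = petharbor_lite(
    dataset_path="testing/data/out/predictions.csv",  # input dataset
    hash_table="petharbor/data/pet_names_hashed.txt",  # file of hashed pet names
    salt="shared_salt",  # salt used for hashing
    text_column="item_text",  # column holding the text to anonymize
    cache=True,  # skip records already flagged as anonymized
    output_dir="testing/data/out",  # where results are written
)
lite.annonymise()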
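
The hash-based approach can be pictured as a salted lookup: each word in the text is hashed with the shared salt and replaced when its digest appears in the table of hashed pet names. The sketch below is an illustration under assumptions, not PetHarbor's actual implementation: the use of SHA-256, the salt-then-word concatenation, the lowercasing, the `<<NAME>>` placeholder, and the helper names `hash_word` and `redact` are all hypothetical.

```python
import hashlib
import re


def hash_word(word: str, salt: str) -> str:
    """Return the hex SHA-256 digest of salt + lowercased word (assumed scheme)."""
    return hashlib.sha256((salt + word.lower()).encode("utf-8")).hexdigest()


def redact(text: str, hashed_names: set, salt: str, placeholder: str = "<<NAME>>") -> str:
    """Replace every word whose salted hash appears in hashed_names."""

    def swap(match):
        word = match.group(0)
        return placeholder if hash_word(word, salt) in hashed_names else word

    # \w+ matches whole words, so punctuation and spacing are left untouched.
    return re.sub(r"\w+", swap, text)


# Hypothetical hash table built from two names with the same salt.
salt = "shared_salt"
hashed_names = {hash_word(name, salt) for name in ("bella", "max")}

print(redact("Bella was seen with Max today.", hashed_names, salt))
# prints: <<NAME>> was seen with <<NAME>> today.
```

In a scheme like this the lookup only works when the text is hashed with the same salt that was used to build the table, which is presumably why the usage example passes a shared salt alongside the hash table.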

Advanced Anonymization

The advance anonymization class uses a pre-trained model to anonymize text data. Here is an example of how to use it:

Arguments

Usage

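from petharbor.advance import annonymise

# NOTE: `petharbor_advanced` is not bound by the import above; import or alias
# it to match the class name exported by your installed version.
advance = petharbor_advanced(
    dataset_path="testing/data/out/predictions.csv",  # input dataset
    model_path="testing/models/best-model.pt",  # pre-trained model file
    text_column="item_text",  # column holding the text to anonymize
    cache=True,  # skip records already flagged as anonymized
    logs="logs/",  # directory for log output
    output_dir="testing/data/out/predictions.csv",
)
advance.annonymise()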

Configuration

Device Configuration

The device (CPU or CUDA) can be configured by passing the device parameter to the anonymization classes. If not specified, the package will automatically configure the device.
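
Automatic device selection of this kind is commonly implemented as "use the caller's choice if given, otherwise prefer CUDA when it is available". The following is a minimal sketch of that pattern, assuming a PyTorch backend; `resolve_device` is a hypothetical helper and not part of PetHarbor's API.

```python
from typing import Optional


def resolve_device(device: Optional[str] = None) -> str:
    """Return the requested device, or pick one automatically."""
    if device is not None:
        # An explicit choice always wins.
        return device
    try:
        import torch  # assumed backend; optional for this sketch
    except ImportError:
        # Without PyTorch installed there is nothing to probe, so use the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(resolve_device("cpu"))  # explicit choice is returned unchanged
print(resolve_device())       # "cuda" if a GPU is usable, otherwise "cpu"
```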

Caching

Both methods cache their progress so that records that have already been anonymized are not processed again. After the first run, subsequent anonymization of the same dataset should therefore be quicker. An 'annonymised' flag is added to the dataset: records marked '1' in this field are skipped and then merged back into the complete dataset at the end.
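
The skip-and-merge behaviour described above can be sketched with plain Python dictionaries. This is an illustration only: `anonymise_with_cache` and `fake_anonymiser` are hypothetical names, and the package itself may store and merge records differently.

```python
def anonymise_with_cache(records, anonymise_text, text_column="item_text", flag="annonymised"):
    """Anonymize only unflagged records, then merge them with the flagged ones."""
    done = [r for r in records if r.get(flag) == 1]
    todo = [r for r in records if r.get(flag) != 1]

    # Process the remaining records and mark them so the next run skips them.
    processed = [
        {**r, text_column: anonymise_text(r[text_column]), flag: 1}
        for r in todo
    ]

    # Previously anonymized records are added back to form the complete dataset.
    return done + processed


calls = []


def fake_anonymiser(text):
    """Stand-in for the real anonymizer; records which texts it was given."""
    calls.append(text)
    return text.replace("Max", "<<NAME>>")


records = [
    {"item_text": "<<NAME>> is well", "annonymised": 1},
    {"item_text": "Max is limping", "annonymised": 0},
]
result = anonymise_with_cache(records, fake_anonymiser)

print(len(calls))  # prints: 1 (only the unflagged record was processed)
```

Running the same function again on `result` would call the anonymizer zero times, since every record now carries the flag.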

Logging

Logging is set up using Python's standard logging module. Logs provide information about the progress and status of the anonymization process.
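
Because the standard logging module is used, verbosity can usually be adjusted from the calling script. The snippet below is a sketch that assumes the package logs under a logger named `petharbor`; that name is a guess and should be checked against the installed version.

```python
import logging

# Send log records to the console with a timestamp, logger name and level.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumed logger name: raise its verbosity without affecting other libraries.
logger = logging.getLogger("petharbor")
logger.setLevel(logging.DEBUG)

logger.info("Starting anonymization run")
```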

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License.
