
A package for pseudonymizing travel-specific PII data

Project description

TravelSpecificPIIPseudonymization

TravelSpecificPIIPseudonymization is a Python tool that detects and pseudonymizes personally identifiable information (PII) in travel-related documents. It combines custom recognizers with fake data generators for aviation-specific entities such as Passenger Name Records (PNRs), e-tickets, flight numbers, airline and aircraft codes, and airport codes. By replacing these values with synthetic ones, it helps protect the privacy and confidentiality of sensitive data commonly found in aviation documents, which makes it suitable for airlines, travel agencies, and other organizations that handle flight information.
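
The detection side can be pictured as a set of pattern-based recognizers. The sketch below is a simplified, standard-library-only illustration and is not the package's recognizers.py (the package itself relies on Presidio for detection). The regular expressions, the find_travel_entities function and the small airport allow-list are hypothetical, and assume common formats: 13-digit e-tickets, a two-character airline designator followed by up to four digits for flight numbers, and six-character alphanumeric PNRs.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    entity_type: str
    text: str
    start: int
    end: int


# Hypothetical, simplified patterns (illustration only). Earlier entries take
# priority when two matches overlap, e.g. "BA1234" is a flight number, not a PNR.
PATTERNS = [
    # E-ticket: 3-digit airline prefix + 10-digit serial, optional hyphen.
    ("E_TICKET", re.compile(r"\b\d{3}-?\d{10}\b")),
    # Flight number: 2-character airline designator, optional space, 1-4 digits.
    ("FLIGHT_NUMBER", re.compile(r"\b(?:[A-Z]{2}|[A-Z]\d|\d[A-Z]) ?\d{1,4}[A-Z]?\b")),
    # PNR: 6 uppercase letters/digits containing at least one of each.
    ("PNR", re.compile(r"\b(?=[A-Z0-9]{0,5}\d)(?=[A-Z0-9]{0,5}[A-Z])[A-Z0-9]{6}\b")),
]

# Airport codes are checked against an allow-list (illustrative subset) so
# that ordinary three-letter words such as "THE" are not flagged.
KNOWN_AIRPORTS = {"CDG", "FRA", "JFK", "LHR", "SIN"}
AIRPORT_RE = re.compile(r"\b[A-Z]{3}\b")


def find_travel_entities(text: str) -> List[Entity]:
    """Return non-overlapping travel entities found in text, sorted by position."""
    found: List[Entity] = []

    def overlaps(start: int, end: int) -> bool:
        return any(start < e.end and e.start < end for e in found)

    for entity_type, pattern in PATTERNS:
        for m in pattern.finditer(text):
            if not overlaps(m.start(), m.end()):
                found.append(Entity(entity_type, m.group(), m.start(), m.end()))

    for m in AIRPORT_RE.finditer(text):
        if m.group() in KNOWN_AIRPORTS and not overlaps(m.start(), m.end()):
            found.append(Entity("AIRPORT_CODE", m.group(), m.start(), m.end()))

    return sorted(found, key=lambda e: e.start)
```

Because patterns are tried in order and overlapping matches are discarded, a string such as BA1234 is reported as a flight number rather than a PNR. Real documents would still need context words or extra validation to keep false positives down.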

Features

  • Pseudonymization of Travel-Related PII: Detects and pseudonymizes common entities in airline and aviation-related documents such as:

    • Passenger Name Records (PNRs)
    • E-tickets
    • Aircraft registration numbers
    • IATA/ICAO aircraft and airline codes
    • IATA/ICAO/FAA airport codes
    • Contact information (phone numbers, email addresses)

  • Custom Recognizers: Includes custom patterns to detect industry-specific codes such as flight numbers, e-ticket prefixes, and more.

  • Faker Integration: Replaces sensitive data with synthetic data using the Faker library, while also providing flexibility for adding custom fake data generators (e.g., generating fake PNRs, e-tickets).

  • Reversible Pseudonymization: The tool keeps a mapping between the original and pseudonymized values, so the pseudonymization can be reversed when required (for example, for testing or regulatory purposes).
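
Reversible pseudonymization comes down to remembering which fake value replaced which original. Below is a minimal sketch of that idea; the ReversiblePseudonymizer class and the fake_pnr helper are hypothetical and do not reproduce the package's CustomPseudonymizer or its Faker-based generators. It assumes each entity type is matched by a single regular expression and that the same original should always map to the same pseudonym.

```python
from __future__ import annotations

import random
import re
import string
from typing import Callable


class ReversiblePseudonymizer:
    """Replace every match of a pattern with a fake value and remember the mapping."""

    def __init__(self, pattern: re.Pattern[str], make_fake: Callable[[], str]) -> None:
        self.pattern = pattern
        self.make_fake = make_fake
        self.forward: dict[str, str] = {}   # original -> pseudonym
        self.backward: dict[str, str] = {}  # pseudonym -> original

    def _pseudonym_for(self, original: str) -> str:
        # The same original always gets the same pseudonym (consistency).
        if original not in self.forward:
            fake = self.make_fake()
            # Retry on collisions so the mapping stays one-to-one and a fake
            # never equals a value that was already seen as an original.
            while fake in self.backward or fake in self.forward or fake == original:
                fake = self.make_fake()
            self.forward[original] = fake
            self.backward[fake] = original
        return self.forward[original]

    def pseudonymize(self, text: str) -> str:
        return self.pattern.sub(lambda m: self._pseudonym_for(m.group()), text)

    def depseudonymize(self, text: str) -> str:
        if not self.backward:
            return text
        # Longest pseudonyms first so a shorter one cannot match inside a longer one.
        alternatives = sorted(self.backward, key=len, reverse=True)
        reverse_re = re.compile("|".join(re.escape(p) for p in alternatives))
        return reverse_re.sub(lambda m: self.backward[m.group()], text)


# Hypothetical fake-PNR generator, seeded only to make the example reproducible.
_rng = random.Random(42)


def fake_pnr() -> str:
    return "".join(_rng.choices(string.ascii_uppercase + string.digits, k=6))
```

With p = ReversiblePseudonymizer(re.compile(r"\b[A-Z0-9]{6}\b"), fake_pnr), calling p.depseudonymize(p.pseudonymize(text)) should give back the original text. Since the forward and backward dictionaries are enough to undo the pseudonymization, they are as sensitive as the original data and should be stored accordingly. One caveat of this simple approach: if a pseudonym string also occurs naturally in the text, depseudonymize will replace it too.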

Project Structure

The project is organized into several Python modules for better scalability and maintenance:

Files Overview:

  1. tspii.py: The main entry point of the project. It reads the input document, runs the pseudonymization, and saves the pseudonymized document to a file.

  2. pseudonymizer.py: Contains the core class CustomPseudonymizer, which performs the pseudonymization process and stores the depseudonymization mappings. It integrates with custom recognizers and synthetic data generators.

  3. recognizers.py: Defines custom recognizers to detect specific PII entities in travel-related documents (e.g., PNR, e-tickets, IATA/ICAO codes).

  4. generators.py: Implements custom fake data generators that create realistic synthetic data for aviation-related entities (e.g., generating fake PNRs or e-tickets).

  5. tests: Contains unit tests for the CustomPseudonymizer class. They check that pseudonymization is accurate, that sensitive information is properly pseudonymized, and that the mapping needed for depseudonymization is correct.
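
To give an idea of what generators such as those in generators.py might produce, here is a hedged sketch of format-preserving fake values. The function names, the default prefix "125" and the assumed formats are illustrative only and are not taken from the package. Each function takes a random.Random instance, so a unit test can pass a seeded generator and get reproducible values.

```python
import random
import string
from typing import Optional

PNR_ALPHABET = string.ascii_uppercase + string.digits


def fake_pnr(rng: random.Random) -> str:
    """Six random uppercase letters/digits, shaped like a booking reference."""
    return "".join(rng.choices(PNR_ALPHABET, k=6))


def fake_eticket(rng: random.Random, airline_prefix: str = "125") -> str:
    """13 digits: a 3-digit airline prefix followed by a random 10-digit serial."""
    if len(airline_prefix) != 3 or not airline_prefix.isdigit():
        raise ValueError("airline_prefix must be exactly 3 digits")
    return airline_prefix + "".join(rng.choices(string.digits, k=10))


def fake_flight_number(rng: random.Random, airline_code: Optional[str] = None) -> str:
    """Airline designator (two random letters unless given) plus 1-4 digits."""
    code = airline_code or "".join(rng.choices(string.ascii_uppercase, k=2))
    return f"{code}{rng.randint(1, 9999)}"
```

Keeping the real 3-digit airline prefix makes a fake e-ticket look plausible but preserves the issuing airline; if that is itself considered sensitive, the prefix can be randomized as well.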

Usage

To use the tool, install the package from PyPI and then run the travel_pii command:

  pip install travel-pii-anonymisation

  travel_pii

Contributing

Contributions to improve the tool are welcome! Feel free to open issues for bugs or feature requests, or submit pull requests for enhancements.

Acknowledgements

This project uses several libraries, including LangChain for document processing, Presidio for PII detection and anonymization, and Faker for generating synthetic replacement data.

Download files

Source Distribution

travel_pii_anonymisation-0.2.tar.gz (7.4 kB)

Built Distribution

travel_pii_anonymisation-0.2-py3-none-any.whl (8.2 kB, Python 3)

File details

Details for the file travel_pii_anonymisation-0.2.tar.gz.

File metadata

  • Download URL: travel_pii_anonymisation-0.2.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for travel_pii_anonymisation-0.2.tar.gz:

  • SHA256: 9c38e4cb38273e110e803424024fd843b59652064623c519bfbc5525336bdb95
  • MD5: f746e214ef21aac2047335ad54e8eeae
  • BLAKE2b-256: 9ca9bfc23d9de65e31236a2bbdc59a8a9727f370f38910223a67bb0b0e5ae754
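
To check a downloaded file against the published digests, compute its SHA256 locally and compare. This is a minimal standard-library sketch; the helper names sha256_of and verify are hypothetical, and the file path in the usage line assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for the 0.2 source distribution.
EXPECTED_SDIST_SHA256 = "9c38e4cb38273e110e803424024fd843b59652064623c519bfbc5525336bdb95"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an intact download, verify(Path("travel_pii_anonymisation-0.2.tar.gz"), EXPECTED_SDIST_SHA256) should return True. pip can also enforce this check itself in hash-checking mode (the --require-hashes option, with --hash=sha256:... entries in a requirements file).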

File details

Details for the file travel_pii_anonymisation-0.2-py3-none-any.whl.

File hashes

Hashes for travel_pii_anonymisation-0.2-py3-none-any.whl:

  • SHA256: ce85f6cd4e26d2a66c03e8e642275801b3c0902555c1de35c62ac2dec70b6e51
  • MD5: fe6c6f1f9d726e7efc6345fcdda1ce24
  • BLAKE2b-256: 08b80e070570b446f350f98d24e6f7d56c3a62d96a96eef1b4ae8836bd559488