A package for pseudonymizing travel-specific PII data
TravelSpecificPIIPseudonymization
TravelSpecificPIIPseudonymization is a Python-based tool designed to detect and pseudonymize personally identifiable information (PII) in travel-related documents. It leverages custom recognizers and fake data generators for specific aviation industry entities such as Passenger Name Records (PNRs), e-tickets, flight numbers, airline and aircraft codes, and airport codes. The pseudonymization tool ensures the privacy and confidentiality of sensitive data commonly found in aviation documents, making it suitable for airline companies, travel agencies, and other organizations dealing with flight information.
Features
- Pseudonymization of Travel-Related PII: Detects and pseudonymizes common entities in airline and aviation-related documents such as:
  - Passenger Name Records (PNRs)
  - E-tickets
  - Aircraft registration numbers
  - IATA/ICAO aircraft and airline codes
  - IATA/ICAO/FAA airport codes
  - Contact information (phone numbers, email addresses)
- Custom Recognizers: Includes custom patterns to detect industry-specific codes such as flight numbers, e-ticket prefixes, and more.
- Faker Integration: Replaces sensitive data with synthetic data using the Faker library, while also providing flexibility for adding custom fake data generators (e.g., generating fake PNRs, e-tickets).
- Reversible Pseudonymization: The tool provides a mapping between the original and pseudonymized data, allowing for reversible pseudonymization when required (useful for testing or regulatory purposes).
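The reversible mapping described above can be illustrated with a minimal, standard-library-only sketch. It is not the package's implementation (which builds on Presidio and Faker); the PNR pattern, the label-based context, and the function names `fake_pnr`, `pseudonymize`, and `depseudonymize` are all assumptions made for illustration.

```python
import random
import re
import string

# Hypothetical pattern: a 6-character record locator that follows the label "PNR ".
PNR_RE = re.compile(r"(?<=PNR )[A-Z0-9]{6}\b")


def fake_pnr(rng):
    """Return a random 6-character uppercase alphanumeric string."""
    return "".join(rng.choices(string.ascii_uppercase + string.digits, k=6))


def pseudonymize(text, rng):
    """Replace each PNR with a fake one and return (new_text, mapping)."""
    mapping = {}  # fake value -> original value, kept for reversal
    forward = {}  # original value -> fake value, so repeats stay consistent

    def substitute(match):
        original = match.group(0)
        if original not in forward:
            fake = fake_pnr(rng)
            # Regenerate until the fake is unused and differs from the original.
            while fake in mapping or fake == original:
                fake = fake_pnr(rng)
            forward[original] = fake
            mapping[fake] = original
        return forward[original]

    return PNR_RE.sub(substitute, text), mapping


def depseudonymize(text, mapping):
    """Restore original values in a single pass using the saved mapping."""
    return PNR_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Calling `pseudonymize("Booking PNR ABC123 confirmed.", random.Random())` returns the rewritten text together with the mapping; passing both to `depseudonymize` recovers the original string. Keeping the mapping separate from the document is what makes the process reversible only for whoever holds it.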
Components Overview:
- tspii.py: The main entry point for the project. It handles document input, runs the pseudonymization, and saves the pseudonymized document to a file. You can customize the (de)pseudonymization of your documents by using the ReversibleAnonymizer class.
- reversible_anonymizers: This directory contains the core ReversibleAnonymizer class, which performs reversible pseudonymization. The class provides methods to (de)anonymize documents and offers options for customization.
- recognizers: Defines custom recognizers to detect specific PII entities in travel-related documents (e.g., PNR, e-tickets, IATA/ICAO codes).
- operators: Defines custom operators that determine the new value of anonymized sensitive data. It implements custom fake data generators that create realistic synthetic data for aviation-related entities (e.g., generating fake PNRs or e-tickets).
- anonymizers: Defines a custom method that specifies how sensitive data is anonymized.
- deanonymizers: Defines a custom method that specifies how sensitive data is deanonymized.
- tests: Contains unit tests for the CustomPseudonymizer class, validating the accuracy of the pseudonymization process and ensuring that sensitive information is properly anonymized while maintaining a correct mapping for potential deanonymization.
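The split between recognizers (which find entities) and operators (which choose replacement values) can be sketched without any third-party dependency. The code below is an illustration under stated assumptions, not the package's code: the entity names, the regular expressions (a 13-digit e-ticket number with an optional dash after the first three digits, and a two-letter flight designator followed by one to four digits), and the helper names are hypothetical.

```python
import random
import re
import string

# Hypothetical recognizers: entity name -> regex (assumed shapes, simplified).
RECOGNIZERS = {
    "E_TICKET": re.compile(r"\b\d{3}-?\d{10}\b"),
    "FLIGHT_NUMBER": re.compile(r"\b[A-Z]{2}\d{1,4}\b"),
}


def fake_e_ticket(original, rng):
    """Randomize every digit but keep any dash where it was."""
    return "".join(rng.choice(string.digits) if ch.isdigit() else ch for ch in original)


def fake_flight_number(original, rng):
    """Build a random two-letter designator plus a 1-4 digit number."""
    letters = "".join(rng.choices(string.ascii_uppercase, k=2))
    return letters + str(rng.randint(1, 9999))


# Hypothetical operators: entity name -> function producing the replacement.
OPERATORS = {
    "E_TICKET": fake_e_ticket,
    "FLIGHT_NUMBER": fake_flight_number,
}


def recognize(text):
    """Return (start, end, entity) spans from all recognizers, sorted by start."""
    spans = []
    for entity, pattern in RECOGNIZERS.items():
        spans.extend((m.start(), m.end(), entity) for m in pattern.finditer(text))
    return sorted(spans)


def anonymize(text, rng):
    """Replace spans from right to left so earlier offsets remain valid."""
    for start, end, entity in reversed(recognize(text)):
        replacement = OPERATORS[entity](text[start:end], rng)
        text = text[:start] + replacement + text[end:]
    return text
```

Keeping detection and replacement in separate tables means a new entity type only needs one pattern and one generator. This sketch does not resolve overlapping spans, which a real recognizer pipeline would need to handle.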
Usage
To use the tool, follow these steps:
1. Install the package:
pip install travel-pii-anonymisation
2. Run the command-line tool:
travel_pii
3. Answer the prompt to choose the input document:
Would you like to (1) Load a document from a file or (2) Use the sample document? Enter 1 or 2:
If you choose 1, you need to specify a file path (you can use sample.txt in this repository).
Minimum Requirements:
Ensure you have at least 1 GB of free storage and a stable internet connection: the software downloads the en-core-web package, which is approximately 500 MB.
Contributing
Contributions to improve the tool are welcome! Feel free to open issues for bugs or feature requests, or submit pull requests for enhancements.
Acknowledgements
This project utilizes various libraries, including LangChain for document processing and Presidio for PII detection and anonymization.
File details
Details for the file travel_pii_anonymisation-0.3.3.tar.gz.
File metadata
- Download URL: travel_pii_anonymisation-0.3.3.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b40a93e33fab18dc2b9b8218bb838a159d2ee89a7d02b0e5fa0245e1fbea432c |
| MD5 | f1a5af5c936b7730e41af9efb7e94686 |
| BLAKE2b-256 | 5f8d5c366d372bc3aeeb43a8b79e94c5c4156817c3cf9f7b699c7b8a97e2c531 |
File details
Details for the file travel_pii_anonymisation-0.3.3-py3-none-any.whl.
File metadata
- Download URL: travel_pii_anonymisation-0.3.3-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c86eede760ec2dd7c3e72b1c20b3ef8c6df0a23c8fd9d323d58808e3c8a50d93 |
| MD5 | 8e285ea7b1e3be300c6a55065995753d |
| BLAKE2b-256 | 2d7ab9905ea40c1b16d12d9004a8404a7a66f5aaf96ec8d2eca7ed377b8c96ce |
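A downloaded file can be checked against the SHA256 digests listed in the file details. The following is a minimal sketch using only Python's standard `hashlib` module; the function names and the local file path passed to them are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True when the file's digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("travel_pii_anonymisation-0.3.3.tar.gz", "<SHA256 digest from the file details>")` should return `True` for an intact download of the source distribution, assuming the file was saved under that name in the current directory.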