Pseudonymize email content in Romance languages
mailcom
Tool that parses the email body from an email file (eml file) and retains only the text, with names removed, for French or Spanish emails.
Installation
Install using
python -m pip install mailcom
You will also need to download the French and Spanish models for spaCy and Stanza using the provided script - run this in the terminal:
./get-models.sh
For an overview of the available languages and models, check the spaCy website.
Usage
The package uses spaCy for sentence splitting, based on the default language models, and transformers for named entity recognition (NER).
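The name-removal step can be sketched without loading any model. The snippet below is a minimal illustration, not mailcom's actual code: it assumes the NER step yields dictionaries with `start`, `end` and `entity_group` keys (the shape a transformers token-classification pipeline returns when entities are aggregated), that persons are labelled `PER`, and that names are replaced by a fixed placeholder. The function name `pseudonymize`, the placeholder and the hand-written entity list are all hypothetical.

```python
from typing import Iterable, Mapping


def pseudonymize(text: str, entities: Iterable[Mapping], placeholder: str = "[name]") -> str:
    """Replace every person span in `text` with `placeholder`."""
    # Keep only person entities. The "PER" label is an assumption; check the
    # label set of the model you actually use.
    spans = sorted(
        (int(e["start"]), int(e["end"]))
        for e in entities
        if e.get("entity_group") == "PER"
    )
    pieces = []
    cursor = 0
    for start, end in spans:
        if start < cursor:
            # Skip spans that overlap one already replaced.
            continue
        pieces.append(text[cursor:start])
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sentence = "Bonjour Marie Dupont, je suis à Paris."
# Hand-written stand-in for NER output; a real run would obtain this from the model.
entities = [
    {"entity_group": "PER", "start": 8, "end": 20, "word": "Marie Dupont"},
    {"entity_group": "LOC", "start": 32, "end": 37, "word": "Paris"},
]
print(pseudonymize(sentence, entities))  # Bonjour [name], je suis à Paris.
```

Working from character offsets rather than searching for the entity text avoids replacing unrelated occurrences of the same word elsewhere in the sentence.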
Currently, you have to set the language and the eml file directory manually at the top of parse.py; the default directory is data/in. Then run python parse.py. After the run, the output can be found in data/out.
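For orientation, the input/output flow described above can be approximated with the Python standard library alone. This is a rough sketch under stated assumptions, not the contents of parse.py: the helper names `extract_body` and `process_directory` are hypothetical, only the plain-text part of each message is kept, and the output file naming (`<stem>.txt`) is invented for the example.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def extract_body(raw: bytes) -> str:
    """Return the plain-text body of a raw eml message, or "" if there is none."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content().strip() if part is not None else ""


def process_directory(in_dir: Path, out_dir: Path) -> list[Path]:
    """Write the body of every .eml file in `in_dir` to a .txt file in `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for eml in sorted(in_dir.glob("*.eml")):
        target = out_dir / (eml.stem + ".txt")  # hypothetical naming scheme
        target.write_text(extract_body(eml.read_bytes()), encoding="utf-8")
        written.append(target)
    return written


# Small in-memory message so the sketch runs without any files on disk.
SAMPLE = (
    b"From: alice@example.org\r\n"
    b"To: bob@example.org\r\n"
    b"Subject: Hola\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hola Bob, nos vemos pronto.\r\n"
)
print(extract_body(SAMPLE))  # Hola Bob, nos vemos pronto.
```

With the default directories from the text, the loop would be called as `process_directory(Path("data/in"), Path("data/out"))`; the name-removal step would then run on each extracted body before it is written.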
File details
Details for the file mailcom-0.0.1.tar.gz.
File metadata
- Download URL: mailcom-0.0.1.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 98ee29553951d8568dec8d311f9384afc00a2a7511df6243dc3f6bb421781d88
MD5 | b8cd3aeb397d7e24e3e7f46e9e3c0476
BLAKE2b-256 | a7a1d1107ca1c4f44a1f6e7887cdc40a463439160cbca5d73a069f8b77ace4e2
File details
Details for the file mailcom-0.0.1-py3-none-any.whl.
File metadata
- Download URL: mailcom-0.0.1-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | c241a16dee6006efe29dce404e708fdc970d1b4631932a11de3f9002a91107c2
MD5 | 678736bfa20e2f53247f66e3e5154d14
BLAKE2b-256 | 6c3981f6accfc420bf220e4a51b15d3fb57f79b2f27a7ae4ba4cc076085d0b72