Tool for extracting emails from PDF and DOCX files (designed especially for resumes).
PyEmailExtractor 
PyEmailExtractor is a tool designed specifically for extracting emails from PDF and DOCX files, with a focus on resumes. It provides a convenient way to extract email addresses from these document formats, which can be useful for various applications, such as recruitment, data analysis, or contact management.
Features
- Extract email addresses from PDF files.
- Extract email addresses from DOCX files.
- Designed with resumes in mind, to improve the accuracy of email extraction.
- Simple and easy-to-use.
Installation
You can install PyEmailExtractor using pip:
pip install PyEmailExtractor
Example Usage
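from PyEmailExtractor import extract_emails
# Directory containing the PDF and DOCX resumes to scan
# (named resume_dir to avoid shadowing the built-in dir())
resume_dir = "/home/username/Downloads/resumes"
list_emails = extract_emails(resume_dir)
print(list_emails)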
- The example above prints the extracted emails as a list.
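For readers curious how email addresses can be pulled out of text once it has been read from a document, here is a minimal sketch using a regular expression. It is not PyEmailExtractor's actual implementation: the function name `find_emails` and the pattern are assumptions made for illustration, and the pattern is deliberately simple rather than a complete address validator.

```python
import re

# Simplified pattern (an assumption for illustration): local part, "@",
# one or more domain labels, then a top-level domain of two or more letters.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def find_emails(text):
    """Return unique email-like strings in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_PATTERN.findall(text):
        key = match.lower()  # treat addresses as case-insensitive duplicates
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


sample = (
    "Jane Doe | jane.doe@example.com | Phone: 555-0100 | "
    "alt: JANE.DOE@example.com, j.doe@mail.example.org"
)
print(find_emails(sample))
```

The sketch keeps the first spelling of each address and drops later case variants, which is one reasonable choice when a resume repeats its contact details in a header and footer.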
Requirements
PyEmailExtractor requires the following dependencies:
- docx2txt==0.8
- lxml==4.9.2
- PyPDF2==3.0.1
- python-docx==0.8.11
- PyMuPDF==1.22.5
- pytesseract
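Because most of these dependencies are pinned to exact versions, it can help to check what is actually installed in an environment. The following is a rough sketch using the standard library's `importlib.metadata`; the helper name `check_requirements` and the `PINNED` mapping are hypothetical and are not part of PyEmailExtractor.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list above; pytesseract is unpinned there.
PINNED = {
    "docx2txt": "0.8",
    "lxml": "4.9.2",
    "PyPDF2": "3.0.1",
    "python-docx": "0.8.11",
    "PyMuPDF": "1.22.5",
    "pytesseract": None,
}


def check_requirements(pins):
    """Map each problematic package to a short description; empty means no problems."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Only compare versions for packages that have an exact pin.
        if wanted is not None and installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


for name, problem in check_requirements(PINNED).items():
    print(f"{name}: {problem}")
```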
Contributing
Contributions are welcome! If you encounter any issues or have suggestions for improvements, please open an issue on the GitHub repository.
License
PyEmailExtractor is released under the MIT License. See the LICENSE file for more details.
Download files
Source Distribution
File details
Details for the file PyEmailExtractor-0.2.1.tar.gz.
File metadata
- Download URL: PyEmailExtractor-0.2.1.tar.gz
- Upload date:
- Size: 2.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 836e45a0041adec62143ea33449f70ff1889abf1fb1cfdbdff4d6a022942bede |
| MD5 | a26a94a25137db56884a9a09374874a8 |
| BLAKE2b-256 | b293d8ef97080ebd0d5620e3ef925702674f1498e5cb790fc4d0aae80695913d |
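A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The sketch below uses only the standard library's `hashlib`; the function names `sha256_of_file` and `matches_published_hash` are hypothetical helpers, and the file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for PyEmailExtractor-0.2.1.tar.gz.
EXPECTED_SHA256 = "836e45a0041adec62143ea33449f70ff1889abf1fb1cfdbdff4d6a022942bede"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_published_hash("PyEmailExtractor-0.2.1.tar.gz")` on the downloaded archive should return `True` if the file is intact; a `False` result suggests a corrupted or different file.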