Parser that converts lists of email addresses and URLs into JSON, CSV, or plain-text strings or files

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Pyrolysate

Pyrolysate is a Python library and CLI tool for parsing and validating URLs and email addresses. It breaks down URLs and emails into their component parts, validates against IANA's official TLD list, and outputs structured data in JSON, CSV, or text format.

The library offers both a programmer-friendly API and a command-line interface, making it suitable for both development integration and quick data processing tasks. It handles single entries or large datasets efficiently using Python's generator functionality, and provides flexible input/output options including file processing with custom delimiters.

Features

URL Parsing

Extract scheme, subdomain, domain, TLD, port, path, query, and fragment components
Support for complex URL patterns including ports, queries, and fragments
Support for IP addresses in URLs
Support for both direct input and file processing via CLI or API
Output as JSON, CSV, or text format through CLI or API
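
As a rough illustration of the component split listed above, the sketch below uses the standard library's urllib.parse rather than Pyrolysate itself. The helper name split_url is hypothetical, and treating the last host label as the TLD is a simplifying assumption: it ignores the IANA list, IP addresses, and multi-part suffixes such as gov.uk.

```python
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "https://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def split_url(raw: str) -> dict[str, str]:
    """Hypothetical helper: split an http(s) URL into named components."""
    # urlsplit only recognises the host after "//", so prefix scheme-less
    # input such as "example.com:8080" before parsing.
    parts = urlsplit(raw if _SCHEME.match(raw) else "//" + raw)
    labels = (parts.hostname or "").split(".")
    # Simplifying assumption: the last label is the TLD, the one before it is
    # the second-level domain, and anything earlier is the subdomain.
    has_tld = len(labels) > 1
    return {
        "scheme": parts.scheme,
        "subdomain": ".".join(labels[:-2]),
        "second_level_domain": labels[-2] if has_tld else labels[0],
        "top_level_domain": labels[-1] if has_tld else "",
        "port": "" if parts.port is None else str(parts.port),
        "path": parts.path.lstrip("/"),
        "query": parts.query,
        "fragment": parts.fragment,
    }


print(split_url("https://www.example.com:443/blog/post?q=test#section1"))
```

The field names mirror those in the URL Parse Output table; the real library may resolve edge cases differently.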

Email Parsing

Extract local, mail server, and domain components
Support for plus addressing (e.g., user+tag@domain.com)
Support for both direct input and file processing via CLI or API
Output as JSON, CSV, or text format through CLI or API
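
The email split described above can be approximated with plain string methods. This is a minimal sketch, not the library's implementation: split_email is a hypothetical name, and the choice to treat everything after the first dot of the host as the domain is an assumption.

```python
def split_email(address: str) -> dict[str, str]:
    """Hypothetical helper: split an email address into named components."""
    if "@" not in address:
        raise ValueError(f"not an email address: {address!r}")
    # The last "@" separates the local part from the host.
    local_part, _, host = address.rpartition("@")
    # Text after the first "+" is the plus address (empty if there is none).
    local, _, plus_address = local_part.partition("+")
    # Assumption: everything after the first dot of the host is the domain.
    mail_server, _, domain = host.partition(".")
    return {
        "local": local,
        "plus_address": plus_address,
        "mail_server": mail_server,
        "domain": domain,
    }


print(split_email("user+tag@gmail.com"))
```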

Top Level Domain Validation

Automatic updates from IANA's official TLD list
Local TLD file caching for offline use
Fallback to common TLDs if both online and local sources fail
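
The three-step fallback above (online list, local cache, built-in defaults) follows a common pattern that can be sketched as below. The names load_tlds and COMMON_TLDS and the contents of the fallback list are hypothetical, and the loaders are stand-ins; how Pyrolysate's own functions signal failure is not stated here.

```python
from typing import Callable, Iterable

# Hypothetical last-resort list; the library's actual fallback set may differ.
COMMON_TLDS = ["com", "org", "net", "edu", "gov"]


def load_tlds(sources: Iterable[Callable[[], list[str]]]) -> list[str]:
    """Return the first non-empty TLD list, trying each source in order."""
    for source in sources:
        try:
            tlds = source()
        except Exception:
            # Any failure (no network, missing file) moves on to the next source.
            continue
        if tlds:
            return tlds
    return list(COMMON_TLDS)


def offline_fetch() -> list[str]:
    # Stand-in for an online lookup that fails without a connection.
    raise ConnectionError("no network")


def cached_fetch() -> list[str]:
    # Stand-in for reading a locally cached TLD file.
    return ["com", "org", "uk"]


print(load_tlds([offline_fetch, cached_fetch]))
```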

Flexible Input/Output

Process single or multiple entries
Support for government domain emails (.gov.tld)
Custom delimiters for file input
Multiple output formats (JSON, CSV, text), with text (.txt) as the default
Pretty-printed or minified JSON output
Console output or file saving options
Memory-efficient processing of large datasets using Python generators
Support for compressed input files:
- ZIP archives (processes all text files within .zip)
- GZIP (.gz)
- BZIP2 (.bz2)
- LZMA (.xz, .lzma)
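
To make the generator and compressed-input points concrete, here is a small sketch built only on the standard library. iter_entries is a hypothetical helper, not part of Pyrolysate's API, and it covers single-file compression only (ZIP archives need zipfile and are left out).

```python
import bz2
import gzip
import lzma
from pathlib import Path
from typing import Iterator

# Map file extensions to stdlib openers; anything else is read as plain text.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open, ".lzma": lzma.open}


def iter_entries(path: str, delimiter: str = "\n") -> Iterator[str]:
    """Hypothetical helper: lazily yield non-empty entries from a file."""
    opener = _OPENERS.get(Path(path).suffix.lower(), open)
    with opener(path, "rt", encoding="utf-8") as handle:
        if delimiter == "\n":
            # Iterating the handle reads one line at a time.
            chunks = handle
        else:
            # Simplification: a custom delimiter loads the whole file first.
            chunks = handle.read().split(delimiter)
        for chunk in chunks:
            entry = chunk.strip()
            if entry:
                yield entry
```

Because the function is a generator, a caller can loop over a large newline-delimited file without holding every entry in memory at once.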

Developer Friendly

Type hints for better IDE support
Comprehensive docstrings
Modular design for easy integration
Command-line interface for quick testing

API Reference

Email Class

| Method | Parameters | Description |
| --- | --- | --- |
| `parse_email(email_str)` | `email_str: str` | Parses single email address |
| `parse_email_array(emails)` | `emails: list[str]` | Parses list of email addresses |
| `to_json(emails, prettify=True)` | `emails: str\|list[str]`, `prettify: bool` | Converts to JSON format |
| `to_json_file(file_name, emails, prettify=True)` | `file_name: str`, `emails: list[str]`, `prettify: bool` | Converts and saves JSON to file |
| `to_csv(emails)` | `emails: str\|list[str]` | Converts to CSV format |
| `to_csv_file(file_name, emails)` | `file_name: str`, `emails: list[str]` | Converts and saves CSV to file |

URL Class

| Method | Parameters | Description |
| --- | --- | --- |
| `parse_url(url_str, tlds=[])` | `url_str: str`, `tlds: list[str]` | Parses single URL |
| `parse_url_array(urls, tlds=[])` | `urls: list[str]`, `tlds: list[str]` | Parses list of URLs |
| `to_json(urls, prettify=True)` | `urls: str\|list[str]`, `prettify: bool` | Converts to JSON format |
| `to_json_file(file_name, urls, prettify=True)` | `file_name: str`, `urls: list[str]`, `prettify: bool` | Converts and saves JSON to file |
| `to_csv(urls)` | `urls: str\|list[str]` | Converts to CSV format |
| `to_csv_file(file_name, urls)` | `file_name: str`, `urls: list[str]` | Converts and saves CSV to file |

Miscellaneous

| Method | Parameters | Description |
| --- | --- | --- |
| `file_to_list(input_file_name, delimiter='\n')` | `input_file_name: str`, `delimiter: str` | Splits the input file into a Python list using the delimiter |
| `get_tlds_from_iana` | | Fetches the latest top-level domains from IANA |
| `get_tlds_from_local` | `path_to_tlds_file: str` | Fetches TLDs from a local file. Defaults to the project's local file if no path is specified |

CLI Reference

| Argument | Type | Value when argument is omitted | Description |
| --- | --- | --- | --- |
| `target` | `str` | `None` | Email or URL string(s) to process |
| `-u`, `--url` | `flag` | `False` | Specify URL input |
| `-e`, `--email` | `flag` | `False` | Specify Email input |
| `-i`, `--input_file` | `str` | `None` | Input file name with extension |
| `-o`, `--output_file` | `str` | `None` | Output file name without extension |
| `-c`, `--csv` | `flag` | `False` | Save output as CSV format |
| `-j`, `--json` | `flag` | `False` | Save output as JSON format |
| `-np`, `--no_prettify` | `flag` | `False` | Turn off prettified JSON output |
| `-d`, `--delimiter` | `str` | `'\n'` | Delimiter for input file parsing |
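
For readers who want to see how flags like these fit together, the sketch below declares the same arguments with argparse. It is an assumption about how such a parser could be written, not the project's actual source; only the flag names, defaults, and help strings are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented pyro arguments."""
    parser = argparse.ArgumentParser(prog="pyro")
    # Zero or more positional strings (emails or URLs).
    parser.add_argument("target", nargs="*", help="Email or URL string(s) to process")
    parser.add_argument("-u", "--url", action="store_true", help="Specify URL input")
    parser.add_argument("-e", "--email", action="store_true", help="Specify Email input")
    parser.add_argument("-i", "--input_file", default=None, help="Input file name with extension")
    parser.add_argument("-o", "--output_file", default=None, help="Output file name without extension")
    parser.add_argument("-c", "--csv", action="store_true", help="Save output as CSV format")
    parser.add_argument("-j", "--json", action="store_true", help="Save output as JSON format")
    parser.add_argument("-np", "--no_prettify", action="store_true", help="Turn off prettified JSON output")
    parser.add_argument("-d", "--delimiter", default="\n", help="Delimiter for input file parsing")
    return parser


args = build_parser().parse_args(["-e", "user1@example.com", "user2@example.com", "-j", "-o", "output"])
print(args.target, args.json, args.output_file)
```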

Input File Support

| Format | Extension | Description |
| --- | --- | --- |
| Text | .txt | Plain text files |
| Log | .log | Plain text log files |
| CSV | .csv | Comma-separated values |
| ZIP | .zip | Archives containing text files |
| GZIP | .gz | GZIP compressed files |
| BZIP2 | .bz2 | BZIP2 compressed files |
| LZMA | .xz, .lzma | LZMA compressed files |

Output Types

Email Parse Output

| Field | Description | Example |
| --- | --- | --- |
| input | Full email | user+tag@gmail.com |
| local | Part before + or @ symbol | user |
| plus_address | Optional part between + and @ | tag |
| mail_server | Domain before TLD | gmail |
| domain | Top-level domain | com |

Example output:

{"user+tag@gmail.com":
    {
    "local": "user",
    "plus_address": "tag",
    "mail_server": "gmail",
    "domain": "com"
    }
}

email,local,plus_address,mail_server,domain
user+tag@gmail.com,user,tag,gmail,com

URL Parse Output

| Field | Description | Example |
| --- | --- | --- |
| scheme | Protocol | https |
| subdomain | Domain prefix | www |
| second_level_domain | Main domain | example |
| top_level_domain | Domain suffix | com |
| port | Port number | 443 |
| path | URL path | blog/post |
| query | Query parameters | q=test |
| fragment | URL fragment | section1 |

Example output:

{"https://www.example.com:443/blog/post?q=test#section1":
    {
    "scheme": "https",
    "subdomain": "www",
    "second_level_domain": "example",
    "top_level_domain": "com",
    "port": "443",
    "path": "blog/post",
    "query": "q=test",
    "fragment": "section1"
    }
}

url,scheme,subdomain,second_level_domain,top_level_domain,port,path,query,fragment
https://www.example.com:443/blog/post?q=test#section1,https,www,example,com,443,blog/post,q=test,section1
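
The JSON and CSV shapes shown in this section can be reproduced with the standard json and csv modules. The sketch below is one way to do it under stated assumptions: records_to_json and records_to_csv are hypothetical names, the four-space indent is a guess at the prettified style, and the input is a dict keyed by the original string as in the examples.

```python
import csv
import io
import json


def records_to_json(records: dict[str, dict[str, str]], prettify: bool = True) -> str:
    """Serialize parsed records as indented or compact JSON."""
    if prettify:
        return json.dumps(records, indent=4)
    # Compact separators drop the spaces json.dumps adds by default.
    return json.dumps(records, separators=(",", ":"))


def records_to_csv(records: dict[str, dict[str, str]], key_column: str) -> str:
    """Flatten records into CSV with the input string as the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header comes from the first record's field names.
    first = next(iter(records.values()), {})
    writer.writerow([key_column, *first.keys()])
    for key, fields in records.items():
        writer.writerow([key, *fields.values()])
    return buffer.getvalue()


parsed = {
    "user+tag@gmail.com": {
        "local": "user",
        "plus_address": "tag",
        "mail_server": "gmail",
        "domain": "com",
    }
}
print(records_to_csv(parsed, key_column="email"))
```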

🚀 Installation

From PyPI

pip install pyrolysate

For Development

Clone the repository

git clone https://github.com/dawnandrew100/pyrolysate.git
cd pyrolysate

Create and activate a virtual environment

# Using hatch (recommended)
hatch env create

# Or using venv
python -m venv .venv
# Windows
.venv\Scripts\activate
# Unix/MacOS
source .venv/bin/activate

Install in development mode

# Using hatch
hatch run dev

# Or using pip
pip install -e .

Verify Installation

# Using hatch (recommended)
hatch run pyro -u example.com

# Or using the CLI directly
pyro -u example.com

The CLI command pyro will be available after installation. If the command isn't found, ensure Python's Scripts directory is in your PATH.

Usage

Input File Parsing

from pyrolysate import file_to_list

Parse file with default newline delimiter

urls = file_to_list("urls.txt")

Parse file with custom delimiter

emails = file_to_list("emails.csv", delimiter=",")

Supported Outputs

JSON (prettified or minified)
CSV
Text (default)
File output with custom naming
Console output

Email Parsing

from pyrolysate import email

Parse single email

result = email.parse_email("user@example.com")

Parse plus addressed email

result = email.parse_email("user+tag@example.com")

Parse multiple emails

emails = ["user1@example.com", "user2@agency.gov.uk"]
result = email.parse_email_array(emails)

Convert to JSON

json_output = email.to_json("user@example.com")
json_output = email.to_json(["user1@example.com", "user2@example.com"])

Save to JSON file

email.to_json_file("output", "user@example.com")
email.to_json_file("output", ["user1@example.com", "user2@test.org"])

Convert to CSV

csv_output = email.to_csv("user@example.com")
csv_output = email.to_csv(["user1@example.com", "user2@test.org"])

Save to CSV file

email.to_csv_file("output", "user@example.com")
email.to_csv_file("output", ["user1@example.com", "user2@test.org"])

URL Parsing

from pyrolysate import url

Parse single URL

result = url.parse_url("https://www.example.com/path?q=test#fragment")

Parse multiple URLs

urls = ["example.com", "https://www.test.org"]
result = url.parse_url_array(urls)

Convert to JSON

json_output = url.to_json("example.com")
json_output = url.to_json(["example.com", "test.org"])

Save to JSON file

url.to_json_file("output", "example.com")
url.to_json_file("output", ["example.com", "test.org"])

Convert to CSV

csv_output = url.to_csv("example.com")
csv_output = url.to_csv(["example.com", "test.org"])

Save to CSV file

url.to_csv_file("output", "example.com")
url.to_csv_file("output", ["example.com", "test.org"])

Command Line Interface

CLI help

pyro -h

Parse single URL

pyro -u example.com

Parse multiple URLs

pyro -u example1.com example2.com

Parse URLs from file (one per line by default)

pyro -u -i urls.txt

Parse URLs from CSV file with comma delimiter

pyro -u -i urls.csv -d ","

Parse email with plus addressing

pyro -e user+newsletter@example.com

Parse multiple emails and save as JSON

pyro -e user1@example.com user2@example.com -j -o output

Parse URLs from file and save as CSV

pyro -u -i urls.txt -c -o parsed_urls

Parse emails from file with comma delimiter

pyro -e -i emails.txt -d "," -o output

Parse emails with non-prettified JSON output

pyro -e user@example.com -j -np

Parse different file types

# Parse log file
pyro -u -i server.log

# Parse compressed log file
pyro -u -i server.log.gz

# Parse BZIP2 compressed file
pyro -e -i emails.txt.bz2

# Parse ZIP archive containing logs and text files
pyro -u -i archive.zip

Supported Formats

Email Formats

Standard: example@mail.com
Plus Addresses: example+tag@mail.com
Government: example@agency.gov.uk

URL Formats

Basic: example.com
With subdomain: www.example.com
With scheme: https://example.org
With path: example.com/path/to/file.txt
With port: example.com:8080
With query: example.com/search?q=test
With fragment: example.com#section1
IP addresses: 192.168.1.1:8080
Government domains: agency.gov.uk
Full complex URLs: https://www.example.gov.uk:8080/path?q=test#section1

Input File Support

Plain text files (.txt)
Plain text log files (.log)
Comma-separated values (.csv)
ZIP archives containing text files (.zip)
GZIP compressed files (.gz)
BZIP2 compressed files (.bz2)
LZMA compressed files (.xz, .lzma)

ZIP Archive Support

Processes all text files within the archive (.txt, .csv, .log)
Handles nested directories
Continues processing if some files are corrupted
UTF-8 encoding expected for text files
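
The behaviour listed above (text members only, nested directories, skipping unreadable files, UTF-8) can be sketched with the standard zipfile module. iter_zip_lines is a hypothetical helper and the exact set of errors treated as "corrupted" is an assumption, not a description of the library's internals.

```python
import io
import zipfile
import zlib
from typing import BinaryIO, Iterator, Union

TEXT_SUFFIXES = (".txt", ".csv", ".log")


def iter_zip_lines(archive: Union[str, BinaryIO]) -> Iterator[str]:
    """Yield non-empty lines from every text member of a ZIP archive."""
    with zipfile.ZipFile(archive) as bundle:
        # infolist() includes members in nested directories.
        for info in bundle.infolist():
            if info.is_dir() or not info.filename.lower().endswith(TEXT_SUFFIXES):
                continue
            try:
                text = bundle.read(info).decode("utf-8")
            except (zipfile.BadZipFile, zlib.error, UnicodeDecodeError):
                # Skip members that fail to decompress or decode, keep going.
                continue
            for line in text.splitlines():
                if line.strip():
                    yield line.strip()


# Build a small in-memory archive to try it out.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as bundle:
    bundle.writestr("logs/nested/a.log", "example.com\n")
    bundle.writestr("b.txt", "test.org\n\n")
demo.seek(0)
print(list(iter_zip_lines(demo)))
```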

Outputs

Text file (default)
JSON file (prettified or minified)
CSV file
Console output

[!IMPORTANT] This library handles email address comments by removing them from the final output
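
For context, email comments are parenthesized text such as john.smith(comment)@example.com. The sketch below shows one simple way to remove them with a regular expression; strip_comments is a hypothetical name and this is not necessarily how the library does it.

```python
import re

# Matches an innermost parenthesized group (no nested parentheses inside).
_COMMENT = re.compile(r"\([^()]*\)")


def strip_comments(address: str) -> str:
    """Remove parenthesized comments, including nested ones."""
    previous = None
    # Repeat until nothing changes so nested comments are peeled layer by layer.
    while previous != address:
        previous = address
        address = _COMMENT.sub("", address)
    return address


print(strip_comments("john.smith(comment)@example.com"))
```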

[!CAUTION]

This library does not specially handle emails containing double quotes. Double quotes are valid in the local part of an email, but many modern email systems either block or mark emails with quotes as spam.

Make sure that the `requests` package is installed before calling `get_tlds_from_iana`.

[!WARNING] This library is designed and tested to handle http and https URLs. Other URL schemes may produce undefined results.

Release history

- 1.0.2 (this version): May 1, 2026
- 1.0.1: Mar 2, 2026
- 1.0.0: Jan 1, 2026
- 0.14.1: Dec 24, 2025
- 0.14.0: Dec 20, 2025
- 0.12.0: Apr 2, 2025

Download files

Source Distribution

pyrolysate-1.0.2.tar.gz (21.5 kB)

Uploaded May 1, 2026 Source

Built Distribution

pyrolysate-1.0.2-py3-none-any.whl (24.4 kB)

Uploaded May 1, 2026 Python 3

File details

Details for the file pyrolysate-1.0.2.tar.gz.

File metadata

Download URL: pyrolysate-1.0.2.tar.gz
Upload date: May 1, 2026
Size: 21.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for pyrolysate-1.0.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `84264dd3071969723c7378567bf9b373dc8c109aa00815daa00241be90ec8cb5` |
| MD5 | `45b1dcf3a8ae08dc8f82f869cf760eb5` |
| BLAKE2b-256 | `a5655ce8965ccadef400a1df9ca3179cd3ecdddc2316b1108d4d6ace0580d562` |

File details

Details for the file pyrolysate-1.0.2-py3-none-any.whl.

File metadata

Download URL: pyrolysate-1.0.2-py3-none-any.whl
Upload date: May 1, 2026
Size: 24.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for pyrolysate-1.0.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `16cd3b7a5a9e1973fb435c7c58e6d3d10fe4b98adf3f1f0ab26b155a913ee9ca` |
| MD5 | `5b0e0dfd6c81ec0682ff34a77fb905bd` |
| BLAKE2b-256 | `eca02538e991b034a2bf8e227b92516a8df5eadde6ee9696eb564e144dd64df8` |
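
A downloaded file can be checked against the SHA256 digests listed above with a few lines of hashlib. This is a generic sketch: sha256_of is a hypothetical helper, and the file path passed to it is whatever location the file was saved to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with the published SHA256 value for the same file name confirms the download is intact.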
