Highly optimized Domain Name Extraction library written in C++
About The Project
PyDomainExtractor is a library for quickly parsing domain names into their parts: subdomain, domain, and suffix. It is written in C++ to achieve the highest performance possible.
Performance
The benchmark was run on a file containing 10 million random domains from various TLDs.
| Library | Function | Time |
|---|---|---|
| tldextract | __call__ | 67.0s |
| publicsuffix2 | publicsuffix2.get_tld | 25.8s |
| PyDomainExtractor | pydomainextractor.extract | 2.76s |
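The benchmark script itself is not part of this description. A minimal sketch of how such a timing could be taken is shown below; `time_extractor` and `naive_extract` are hypothetical names introduced only for illustration, and `naive_extract` is a stand-in that splits on dots without consulting any suffix list, so its timings say nothing about the libraries in the table.

```python
import time


def time_extractor(extract, domains):
    """Return the wall-clock seconds spent calling extract() on every domain."""
    start = time.perf_counter()
    for domain in domains:
        extract(domain)
    return time.perf_counter() - start


def naive_extract(domain):
    """Hypothetical stand-in extractor: treats the last label as the suffix.

    It does not use the Public Suffix List and exists only so that the
    harness can run without any third-party package installed.
    """
    head, _, suffix = domain.rpartition('.')
    subdomain, _, name = head.rpartition('.')
    return {'subdomain': subdomain, 'domain': name, 'suffix': suffix}


if __name__ == '__main__':
    # A small synthetic input; the table above used 10 million domains.
    sample = ['www.example{}.com'.format(i) for i in range(100000)]
    elapsed = time_extractor(naive_extract, sample)
    print('naive_extract: {:.3f}s'.format(elapsed))
```

To time a real extractor, pass its callable (for example `pydomainextractor.extract` after calling `pydomainextractor.load()`) in place of `naive_extract`.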
Prerequisites
To compile this package, you need GCC, libidn2, and the Python development package installed.
- Fedora
sudo dnf install python3-devel libidn2-devel gcc-c++
- Ubuntu 18.04
sudo apt install python3-dev libidn2-dev g++-8
Installation
pip3 install PyDomainExtractor
Usage
The usual use case:
import pydomainextractor
# Loads the version of the PublicSuffixList bundled with the package. Does not download any data.
pydomainextractor.load()
pydomainextractor.extract('google.com')
>>> {
>>> 'subdomain': '',
>>> 'domain': 'google',
>>> 'suffix': 'com'
>>> }
# Loads custom suffix list data, which should follow the PublicSuffixList format.
pydomainextractor.load(
'tld\n'
'custom.tld\n'
)
pydomainextractor.extract('google.com')
>>> {
>>> 'subdomain': 'google',
>>> 'domain': 'com',
>>> 'suffix': ''
>>> }
pydomainextractor.extract('google.custom.tld')
>>> {
>>> 'subdomain': '',
>>> 'domain': 'google',
>>> 'suffix': 'custom.tld'
>>> }
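The results above follow from matching the longest known suffix at the end of the name. The pure-Python sketch below reproduces that behavior for the two custom-list examples; it is a simplified illustration, not the library's C++ implementation. The function names `parse_suffix_list` and `split_domain` are hypothetical, and wildcard and exception rules from the real PublicSuffixList format are deliberately ignored.

```python
def parse_suffix_list(text):
    """Collect suffix rules from PublicSuffixList-style text.

    Blank lines and // comments are skipped. Wildcard (*.) and
    exception (!) rules are not handled in this simplified sketch.
    """
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith('//'):
            rules.add(line.lower())
    return rules


def split_domain(name, rules):
    """Split a name into subdomain, domain, and suffix using the given rules."""
    labels = name.lower().split('.')
    # Default: no known suffix, so the suffix starts past the last label.
    suffix_start = len(labels)
    # Candidates are tried from longest to shortest, so the first hit
    # is the longest matching suffix.
    for i in range(len(labels)):
        if '.'.join(labels[i:]) in rules:
            suffix_start = i
            break
    suffix = '.'.join(labels[suffix_start:])
    domain = labels[suffix_start - 1] if suffix_start > 0 else ''
    subdomain = '.'.join(labels[:max(suffix_start - 1, 0)])
    return {'subdomain': subdomain, 'domain': domain, 'suffix': suffix}


rules = parse_suffix_list('tld\ncustom.tld\n')
print(split_domain('google.com', rules))
print(split_domain('google.custom.tld', rules))
```

Because `com` is not in the custom list, no suffix matches `google.com`, so the last label becomes the domain and the rest becomes the subdomain, which mirrors the output shown above.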
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Gal Ben David - wavenator@gmail.com
Project Link: https://github.com/wavenator/PyDomainExtractor