Highly optimized Domain Name Extraction library written in C++
About The Project
PyDomainExtractor is a library for quickly parsing domain names into their parts: subdomain, domain, and suffix. It is written in C++ to achieve the highest possible performance.
Built With
Performance
The benchmark was run on a file containing 10 million random domains from various TLDs. The improvement factor is relative to tldextract.
| Library | Function | Time | Improvement Factor |
| --- | --- | --- | --- |
| tldextract | __call__ | 67.0s | 1.0x |
| publicsuffix2 | publicsuffix2.get_tld | 25.8s | 2.6x |
| PyDomainExtractor | pydomainextractor.extract | 2.76s | 24.3x |
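The improvement factors follow directly from the reported times. The snippet below is a minimal sketch of how such a comparison could be timed and how the factors are derived. The helper names `benchmark` and `improvement_factor` are hypothetical and are not part of any of the libraries listed; the original benchmark script is not shown here, so this is an assumption about its general shape rather than a reproduction of it.

```python
import time


def benchmark(extract_func, domains):
    """Return the wall-clock seconds needed to call extract_func on every domain."""
    start = time.perf_counter()
    for domain in domains:
        extract_func(domain)
    return time.perf_counter() - start


def improvement_factor(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline, to one decimal place."""
    return round(baseline_seconds / candidate_seconds, 1)


# Times copied from the table above; tldextract is the baseline.
reported_times = {
    'tldextract': 67.0,
    'publicsuffix2': 25.8,
    'PyDomainExtractor': 2.76,
}
baseline = reported_times['tldextract']
factors = {
    name: improvement_factor(baseline, seconds)
    for name, seconds in reported_times.items()
}
print(factors)
```

To time a real library, pass its extraction callable (for example `pydomainextractor.extract` after calling `pydomainextractor.load()`) and a list of domains to `benchmark`.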
Prerequisites
To compile this package, you need GCC, libidn2, and the Python development package installed.
- Fedora
sudo dnf install python3-devel libidn2-devel gcc-c++
- Ubuntu 18.04
sudo apt install python3-dev libidn2-dev g++-9
Installation
pip3 install PyDomainExtractor
Usage
A typical use case:
import pydomainextractor
# Load the version of the Public Suffix List supplied with the repository. No data is downloaded.
pydomainextractor.load()
pydomainextractor.extract('google.com')
# {
#     'subdomain': '',
#     'domain': 'google',
#     'suffix': 'com'
# }
# Load custom suffix list data. It should follow the Public Suffix List format.
pydomainextractor.load(
    'tld\n'
    'custom.tld\n'
)
# 'com' is not in the custom list, so no suffix is recognized.
pydomainextractor.extract('google.com')
# {
#     'subdomain': 'google',
#     'domain': 'com',
#     'suffix': ''
# }
pydomainextractor.extract('google.custom.tld')
# {
#     'subdomain': '',
#     'domain': 'google',
#     'suffix': 'custom.tld'
# }
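The results with a custom suffix list can be understood as longest-suffix matching against the loaded entries. The following is a minimal pure-Python sketch of that idea, written only to reproduce the outputs shown in this section. It is not the library's C++ implementation: the functions `load_suffixes` and `extract` are hypothetical stand-ins, and the sketch assumes plain suffix entries only, ignoring wildcard rules, exception rules, and internationalized domain names.

```python
def load_suffixes(text):
    """Parse suffix-list text: one suffix per line, skipping blanks and // comments."""
    suffixes = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith('//'):
            suffixes.add(line.lower())
    return suffixes


def extract(domain, suffixes):
    """Split a domain into subdomain, domain, and suffix using longest-suffix matching."""
    labels = domain.lower().split('.')
    # Try the longest candidate suffix first, always leaving
    # at least one label to serve as the domain part.
    for i in range(1, len(labels)):
        candidate = '.'.join(labels[i:])
        if candidate in suffixes:
            return {
                'subdomain': '.'.join(labels[:i - 1]),
                'domain': labels[i - 1],
                'suffix': candidate,
            }
    # No known suffix: the last label becomes the domain and the suffix is empty.
    return {
        'subdomain': '.'.join(labels[:-1]),
        'domain': labels[-1],
        'suffix': '',
    }


custom = load_suffixes('tld\ncustom.tld\n')
print(extract('google.com', custom))
print(extract('google.custom.tld', custom))
```

A set lookup per candidate keeps the sketch short. For real workloads, use `pydomainextractor.extract`, which handles the full Public Suffix List format.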
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Gal Ben David - gal@intsights.com
Project Link: https://github.com/Intsights/PyDomainExtractor