Highly optimized Domain Name Extraction library written in C++

About The Project

PyDomainExtractor is a library for quickly parsing domain names into their parts: subdomain, domain, and suffix. The library is written in C++ to achieve the highest performance possible.

Performance

The benchmark was run on a file containing 10 million random domains from various TLDs.

Library            Function                   Time
tldextract         __call__                   67.0s
publicsuffix2      publicsuffix2.get_tld      25.8s
PyDomainExtractor  pydomainextractor.extract  2.76s
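
A comparison like the one above can be reproduced with a small timing loop. The following is only a minimal sketch, not the harness used to produce the published numbers: `time_extractor`, `naive_split`, and `sample_domains` are hypothetical names introduced here, and `naive_split` is a stand-in that lets the sketch run without any third-party package.

```python
import time


def time_extractor(extract_fn, domains):
    """Return the seconds extract_fn takes to process every domain once."""
    start = time.perf_counter()
    for domain in domains:
        extract_fn(domain)
    return time.perf_counter() - start


def naive_split(domain):
    # Hypothetical stand-in extractor so the sketch is self-contained.
    # It only splits on the last dot and is NOT suffix-aware.
    return domain.rsplit('.', 1)


# Tiny illustrative input; the benchmark above used 10 million domains.
sample_domains = ['google.com', 'example.co.uk', 'sub.example.org'] * 1000

elapsed = time_extractor(naive_split, sample_domains)
print(f'{len(sample_domains)} domains in {elapsed:.4f}s')
```

To time this library instead, call `pydomainextractor.load()` once and pass `pydomainextractor.extract` as `extract_fn`.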

Prerequisites

To compile this package, you need GCC, libidn2, and the Python development package installed.

  • Fedora
sudo dnf install python3-devel libidn2-devel gcc-c++
  • Ubuntu 18.04
sudo apt install python3-dev libidn2-dev g++-8

Installation

pip3 install PyDomainExtractor

Usage

The usual use case:

import pydomainextractor


# Load the version of the Public Suffix List supplied with the package. Does not download any data.
pydomainextractor.load()

pydomainextractor.extract('google.com')
# Returns:
# {
#     'subdomain': '',
#     'domain': 'google',
#     'suffix': 'com'
# }

# Load custom suffix list data. It should follow the Public Suffix List format.
pydomainextractor.load(
    'tld\n'
    'custom.tld\n'
)

# 'com' is not in the custom list, so no suffix is recognized.
pydomainextractor.extract('google.com')
# Returns:
# {
#     'subdomain': 'google',
#     'domain': 'com',
#     'suffix': ''
# }

pydomainextractor.extract('google.custom.tld')
# Returns:
# {
#     'subdomain': '',
#     'domain': 'google',
#     'suffix': 'custom.tld'
# }
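
Two small helpers can make the calls above easier to work with. This is a sketch under stated assumptions: `build_suffix_list` and `registered_domain` are hypothetical names that are not part of PyDomainExtractor, and the dictionaries below are written by hand to mirror the result shape shown in the usage example.

```python
def build_suffix_list(suffixes):
    """Join suffixes into newline-terminated text, one suffix per line."""
    return ''.join(f'{suffix}\n' for suffix in suffixes)


def registered_domain(parts):
    """Join 'domain' and 'suffix' from an extract()-style dict.

    Empty parts are skipped, so a result with no suffix yields
    just the domain.
    """
    return '.'.join(p for p in (parts['domain'], parts['suffix']) if p)


# Same custom data as in the usage example above.
custom_data = build_suffix_list(['tld', 'custom.tld'])

# Hand-written dicts in the shape returned by extract().
with_suffix = {'subdomain': '', 'domain': 'google', 'suffix': 'custom.tld'}
without_suffix = {'subdomain': 'google', 'domain': 'com', 'suffix': ''}

print(repr(custom_data))
print(registered_domain(with_suffix))
print(registered_domain(without_suffix))
```

The string produced by `build_suffix_list` has the same form as the argument passed to `pydomainextractor.load(...)` in the example above, and `registered_domain` accepts any dictionary with `domain` and `suffix` keys.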

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - wavenator@gmail.com

Project Link: https://github.com/wavenator/PyDomainExtractor
