Highly optimized Domain Name Extraction library written in C++

About The Project

PyDomainExtractor is a library for quickly parsing domain names into their parts: subdomain, domain, and suffix. It is written in C++ to achieve the highest possible performance.

Performance

The benchmark was run on a file containing 10 million random domains from various TLDs.

Library            Function                   Time
tldextract         __call__                   67.0s
publicsuffix2      publicsuffix2.get_tld      25.8s
PyDomainExtractor  pydomainextractor.extract  2.76s
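
The source does not describe the benchmark methodology or hardware, so the following is only a minimal sketch of how a comparison like this could be timed. The `time_calls` helper and the sample input are hypothetical and are not part of PyDomainExtractor. The stand-in workload uses `str.lower` so the snippet runs without any of the libraries installed.

```python
import time


def time_calls(func, inputs):
    """Return the wall-clock seconds needed to call func once per input."""
    start = time.perf_counter()
    for item in inputs:
        func(item)
    return time.perf_counter() - start


# Hypothetical stand-in workload; a real run would read domains from a file.
sample_domains = ['example.com', 'www.example.co.uk'] * 1000

elapsed = time_calls(str.lower, sample_domains)
print(f'{len(sample_domains)} calls took {elapsed:.4f}s')
```

To time the library itself, one would call `pydomainextractor.load()` first and then pass `pydomainextractor.extract` as `func`. Absolute numbers will differ by machine and input.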

Prerequisites

To compile this package, you need GCC, libidn2, and the Python development package installed.

  • Fedora
sudo dnf install python3-devel libidn2-devel gcc-c++
  • Ubuntu 18.04
sudo apt install python3-dev libidn2-dev g++-8

Installation

pip3 install PyDomainExtractor

Usage

The usual use case:

import pydomainextractor


# Load the version of the Public Suffix List supplied with the package. No data is downloaded.
pydomainextractor.load()

pydomainextractor.extract('google.com')
# {
#     'subdomain': '',
#     'domain': 'google',
#     'suffix': 'com'
# }

# Load custom suffix list data. It should follow the Public Suffix List format.
pydomainextractor.load(
    'tld\n'
    'custom.tld\n'
)

# 'com' is not in the custom list, so no suffix is recognized.
pydomainextractor.extract('google.com')
# {
#     'subdomain': 'google',
#     'domain': 'com',
#     'suffix': ''
# }

pydomainextractor.extract('google.custom.tld')
# {
#     'subdomain': '',
#     'domain': 'google',
#     'suffix': 'custom.tld'
# }
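
To show why the custom list changes the result for 'google.com', here is a simplified pure-Python sketch of longest-suffix matching. It is an illustration only, not the library's C++ implementation: `parse_suffix_list` and `split_domain` are hypothetical names, and the sketch ignores the wildcard rules, exception rules, and internationalized names that the Public Suffix List format also supports.

```python
def parse_suffix_list(text):
    """Collect plain suffix rules, skipping blank lines and // comments."""
    suffixes = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith('//'):
            suffixes.add(line.lower())
    return suffixes


def split_domain(name, suffixes):
    """Split name into subdomain, domain, and the longest matching suffix."""
    labels = name.lower().split('.')
    suffix_start = len(labels)  # default: no suffix recognized
    # Try the longest candidate first, then drop labels from the left.
    for i in range(len(labels)):
        if '.'.join(labels[i:]) in suffixes:
            suffix_start = i
            break
    remaining = labels[:suffix_start]
    return {
        'subdomain': '.'.join(remaining[:-1]),
        'domain': remaining[-1] if remaining else '',
        'suffix': '.'.join(labels[suffix_start:]),
    }


custom = parse_suffix_list('tld\ncustom.tld\n')
print(split_domain('google.com', custom))
print(split_domain('google.custom.tld', custom))
```

Because neither 'google.com' nor 'com' is in the custom list, the suffix stays empty and the last label is treated as the domain, which matches the output shown in the usage example.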

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - wavenator@gmail.com

Project Link: https://github.com/wavenator/PyDomainExtractor
