A blazingly fast domain extraction library written in Rust

About The Project

PyDomainExtractor is a Python library for parsing domain names quickly. To maximize performance, the library is written in Rust.

Performance

Extract From Domain

The test was run on a file containing 10 million random domains from various top-level domains (Mar. 13th, 2022).

Library           | Function                  | Time
PyDomainExtractor | pydomainextractor.extract | 1.50s
publicsuffix2     | publicsuffix2.get_sld     | 9.92s
tldextract        | __call__                  | 29.23s
tld               | tld.parse_tld             | 34.48s

Extract From URL

The test was run on a file containing 1 million random URLs (Mar. 13th, 2022).

Library           | Function                           | Time
PyDomainExtractor | pydomainextractor.extract_from_url | 2.24s
publicsuffix2     | publicsuffix2.get_sld              | 10.84s
tldextract        | __call__                           | 36.04s
tld               | tld.parse_tld                      | 57.87s
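
The benchmark script itself is not shown here. As a rough sketch of how timings like these could be collected, the helper below measures the wall-clock time of calling any extraction function over a list of inputs. The names `time_extractor`, `naive_split`, and `sample_domains` are hypothetical, and `naive_split` is only a stand-in for a real library call.

```python
import time
from typing import Callable, Iterable


def time_extractor(extract: Callable[[str], object], inputs: Iterable[str]) -> float:
    """Return the wall-clock seconds spent calling `extract` on every input."""
    items = list(inputs)  # materialize first so reading the inputs is not timed
    start = time.perf_counter()
    for item in items:
        extract(item)
    return time.perf_counter() - start


def naive_split(domain: str) -> list:
    # Hypothetical stand-in for a real extractor: splits on the last dot only.
    return domain.rsplit('.', 1)


sample_domains = ['google.com', 'example.co.uk', 'sub.domain.org'] * 1000
elapsed = time_extractor(naive_split, sample_domains)
print(f'{len(sample_domains)} domains in {elapsed:.4f}s')
```

To compare libraries, the same input list would be passed to each library's function in turn, for example `domain_extractor.extract` in place of `naive_split`.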

Installation

pip3 install PyDomainExtractor

Usage

Extraction

import pydomainextractor


# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.
domain_extractor = pydomainextractor.DomainExtractor()

domain_extractor.extract('google.com')
# {
#     'subdomain': '',
#     'domain': 'google',
#     'suffix': 'com'
# }

# Loads custom suffix list data. It should follow the Public Suffix List format.
domain_extractor = pydomainextractor.DomainExtractor(
    'tld\n'
    'custom.tld\n'
)

# 'com' is not in the custom list, so no suffix is recognized.
domain_extractor.extract('google.com')
# {
#     'subdomain': 'google',
#     'domain': 'com',
#     'suffix': ''
# }

domain_extractor.extract('google.custom.tld')
# {
#     'subdomain': '',
#     'domain': 'google',
#     'suffix': 'custom.tld'
# }
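
The results above are consistent with longest-suffix matching against the loaded list. The following is a simplified pure-Python sketch of that idea, not the library's Rust implementation: `split_domain` and `custom_suffixes` are hypothetical names, wildcard and exception rules from the Public Suffix List format are ignored, and the fallback for an unknown suffix is an assumption chosen to reproduce the output shown above.

```python
def split_domain(domain: str, suffixes: set) -> dict:
    """Split `domain` using the longest suffix found in `suffixes`."""
    labels = domain.lower().split('.')
    # Try the longest candidate suffix first, always leaving one label for the domain.
    for start in range(1, len(labels)):
        candidate = '.'.join(labels[start:])
        if candidate in suffixes:
            return {
                'subdomain': '.'.join(labels[:start - 1]),
                'domain': labels[start - 1],
                'suffix': candidate,
            }
    # Assumption: with no known suffix, the last label is treated as the domain.
    return {
        'subdomain': '.'.join(labels[:-1]),
        'domain': labels[-1],
        'suffix': '',
    }


custom_suffixes = {'tld', 'custom.tld'}
print(split_domain('google.com', custom_suffixes))
print(split_domain('google.custom.tld', custom_suffixes))
```

Checking longer candidates first is what makes `custom.tld` win over `tld` for `google.custom.tld`.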

URL Extraction

import pydomainextractor


# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.
domain_extractor = pydomainextractor.DomainExtractor()

domain_extractor.extract_from_url('http://google.com/')
# {
#     'subdomain': '',
#     'domain': 'google',
#     'suffix': 'com'
# }
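
Conceptually, URL extraction is host isolation followed by domain extraction. The sketch below shows only the first step using the standard library; it is not how `extract_from_url` is implemented, and `host_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def host_from_url(url: str) -> str:
    """Return the lowercase host of `url`, without port or credentials."""
    host = urlsplit(url).hostname
    if host is None:
        # urlsplit finds no host when the scheme separator is missing.
        raise ValueError(f'no host found in {url!r}')
    return host


print(host_from_url('http://google.com/'))
print(host_from_url('https://user:pw@Sub.Example.co.uk:8443/path?q=1'))
```

The returned host could then be passed to `domain_extractor.extract`, which is useful when the URLs have already been parsed elsewhere.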

Validation

import pydomainextractor


# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.
domain_extractor = pydomainextractor.DomainExtractor()

domain_extractor.is_valid_domain('google.com')
# True

domain_extractor.is_valid_domain('domain.اتصالات')
# True

domain_extractor.is_valid_domain('xn--mgbaakc7dvf.xn--mgbaakc7dvf')
# True

domain_extractor.is_valid_domain('domain-.com')
# False

domain_extractor.is_valid_domain('-sub.domain.com')
# False

domain_extractor.is_valid_domain('\xF0\x9F\x98\x81nonalphanum.com')
# False
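
The examples above suggest per-label syntax rules: no leading or trailing hyphen, and only letters, digits, and hyphens. The sketch below encodes rules inferred from those examples only. It is not taken from the library's source, it performs no suffix list lookup, and `looks_like_valid_domain` is a hypothetical name, so the real `is_valid_domain` may accept or reject inputs differently.

```python
def looks_like_valid_domain(domain: str) -> bool:
    """Apply simplified per-label syntax checks; no suffix list lookup is done."""
    labels = domain.split('.')
    if len(labels) < 2:
        return False
    for label in labels:
        # Assumption: the 63-character limit is applied to the label as written.
        if not label or len(label) > 63:
            return False
        if label.startswith('-') or label.endswith('-'):
            return False
        # str.isalnum() accepts Unicode letters and digits, so IDN labels pass.
        if not all(ch.isalnum() or ch == '-' for ch in label):
            return False
    return True


for candidate in ('google.com', 'domain-.com', '-sub.domain.com'):
    print(candidate, looks_like_valid_domain(candidate))
```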

TLDs List

import pydomainextractor


# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.
domain_extractor = pydomainextractor.DomainExtractor()

domain_extractor.get_tld_list()
# [
#     'bostik',
#     'backyards.banzaicloud.io',
#     'biz.bb',
#     ...
# ]

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project Link: https://github.com/Intsights/PyDomainExtractor
