
Library to download and use the latest set of TLDs and public multi-label domain suffixes from IANA and ICANN


Note: the API is still being fleshed out and subject to change

Overview

FQDN Parser (Fully Qualified Domain Name Parser) is a library used to (surprise!) parse FQDNs into their component parts, including subdomains, domain names, and the public suffix.

It also provides additional contextual metadata about the domain’s public suffix, including:

  • International TLDs in both Unicode and punycode format

  • The TLD type: generic, generic-restricted, country-code, sponsored, test, infrastructure, and host_suffix (.onion)

  • The date the TLD was registered with ICANN

  • For multi-label effective TLDs, whether the suffix is public, like .co.uk, which is owned by a Registrar, or private, like .duckdns.org, which is owned by a private company

  • If the TLD (or any label in the FQDN) is punycode encoded, the ascii’ification of the decoded Unicode. This is useful for spotting registrable domains that use Unicode characters closely resembling the ASCII characters in legitimate domains, a common phishing technique.
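To illustrate the last bullet, here is a minimal sketch that uses only the Python standard library. It is not fqdn_parser's own implementation. The helper names `decode_label` and `asciify` are hypothetical, and the accent-stripping approach (NFKD normalization) is an assumption chosen for brevity.

```python
import unicodedata


def decode_label(label: str) -> str:
    """Decode one punycode ("xn--") label to Unicode; other labels pass through."""
    if label.lower().startswith("xn--"):
        # Strip the "xn--" prefix and decode the rest with the stdlib punycode codec.
        return label[4:].encode("ascii").decode("punycode")
    return label


def asciify(text: str) -> str:
    """Approximate text in ASCII: split accents off with NFKD, then drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


fqdn = "www.xn--mnchen-3ya.de"
decoded = ".".join(decode_label(label) for label in fqdn.split("."))
ascii_form = asciify(decoded)
print(decoded)
print(ascii_form)
```

For this input the decoded form is www.münchen.de and the ASCII approximation is www.munchen.de. NFKD only strips accents. Look-alike characters from other scripts (for example Cyrillic letters that resemble Latin ones) are simply dropped, so real homoglyph detection would need a confusables mapping on top of this.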

The suffix metadata can be used as contextual features for machine learning models that generate predictions about domain names and FQDNs.

Data sources used by FQDN Parser:

The first time fqdn_parser is run, it makes two HTTP calls to the data sources above to pull down the latest ICANN and Public Suffix List information. Downloading, parsing, and persisting the data to a cache file may take a few seconds. Subsequent calls to fqdn_parser use the existing cache file and are much faster. The cache file can be managed via arguments to the Suffixes class constructor.
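The fetch-once-then-cache behavior described above can be sketched generically as follows. This is an illustration of the pattern only, not the library's code. `load_with_cache`, `fake_fetch`, the JSON cache format, and the file name are all hypothetical, and a stand-in function replaces the real HTTP calls so the example runs offline.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_with_cache(cache_path: Path, fetch: Callable[[], dict],
                    read_cache: bool = True) -> dict:
    """Return cached data when allowed and present; otherwise fetch and persist it."""
    if read_cache and cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = fetch()  # the slow path: download and parse
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data


calls = []


def fake_fetch() -> dict:
    """Stand-in for the real downloads; records how often it is invoked."""
    calls.append(1)
    return {"uk": {"type": "country-code"}}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "suffix_cache.json"
    first = load_with_cache(path, fake_fetch)   # cache miss: fetches and writes
    second = load_with_cache(path, fake_fetch)  # cache hit: reads the file

print(len(calls))
```

The second call never reaches `fake_fetch`, which is why repeated runs are faster once the cache file exists.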

Terminology

Naming each part of a FQDN consistently is harder than it sounds, and common usage is often inconsistent and confusing.

Take for example somedomain.co.jp. Many people would call somedomain the second-level domain, or SLD, but strictly speaking the second-level domain is co and somedomain is the third-level domain. Because most domain names have only two levels, a lot of people have standardized on SLD. When writing code to parse FQDNs, though, it is much clearer to be pedantic about naming.

This library uses a very specific naming convention in order to be explicitly clear about what every label means.

tld - the actual top level domain of the FQDN. This is the domain that is controlled by IANA.

effective_tld - this is the full domain suffix, which can be made up of one or more labels. The effective TLD is the thing a person chooses to register a domain under and is controlled by a Registrar or, in the case of private domain suffixes, by the company that owns the private suffix (like DuckDNS).

registrable_domain - this is the full domain name that a person registers with a Registrar and includes the effective tld.

registrable_domain_host - this is the label of the registrable domain without the effective tld. Most people call this the second level domain, but as you can see this can get confusing.

fqdn (Fully Qualified Domain Name) - this is the full list of labels.

pqdn (Partially Qualified Domain Name) - this is the list of sub-domains in a FQDN, not including the registrable domain and the effective TLD.

To give a concrete example of these names, take the FQDN test.integration.api.somedomain.co.jp:

tld - jp

effective_tld - co.jp

registrable_domain - somedomain.co.jp

registrable_domain_host - somedomain

fqdn - test.integration.api.somedomain.co.jp

pqdn - test.integration.api
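The naming convention above can be reproduced with a short, self-contained sketch. It is not how fqdn_parser is implemented. `Parts`, `split_fqdn`, and `SAMPLE_SUFFIXES` are hypothetical names, and the suffix set is a tiny hand-written sample rather than the real Public Suffix List (whose wildcard and exception rules are ignored here).

```python
from typing import NamedTuple, Set


class Parts(NamedTuple):
    tld: str
    effective_tld: str
    registrable_domain: str
    registrable_domain_host: str
    fqdn: str
    pqdn: str


# Hand-written sample for illustration only; the real list is far larger.
SAMPLE_SUFFIXES = {"jp", "co.jp", "uk", "co.uk", "org", "duckdns.org"}


def split_fqdn(fqdn: str, suffixes: Set[str] = SAMPLE_SUFFIXES) -> Parts:
    """Split a FQDN into named parts using the longest matching suffix."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Try candidate suffixes from longest to shortest, always leaving at
    # least one label to the left to act as the registrable domain host.
    for start in range(1, len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in suffixes:
            host = labels[start - 1]
            return Parts(
                tld=labels[-1],
                effective_tld=candidate,
                registrable_domain=f"{host}.{candidate}",
                registrable_domain_host=host,
                fqdn=".".join(labels),
                pqdn=".".join(labels[:start - 1]),
            )
    raise ValueError(f"no known suffix in {fqdn!r}")


parts = split_fqdn("test.integration.api.somedomain.co.jp")
print(parts)
```

Because longer candidates are tried first, co.jp wins over jp, which is what makes somedomain (rather than co) the registrable domain host.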

Doesn’t tldextract do this for me? How is fqdn_parser different?

tldextract is a great library if all you need to do is parse a FQDN to get its subdomain, domain, or full suffix.

fqdn_parser, however, adds more contextual metadata about each TLD/suffix and also supports punycoded labels within FQDNs.

Usage Examples

Parse the registrable domain host from a FQDN:

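from fqdn_parser.suffixes import Suffixes

# Build the suffix lookup; read_cache=True asks it to use the cached suffix data
suffixes = Suffixes(read_cache=True)
fqdn = "login.mail.stuffandthings.co.uk"
result = suffixes.parse(fqdn)
# Per the terminology above, this is the label directly left of the effective TLD (co.uk)
print(result.registrable_domain_host)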

Install

To install via PyPI:

pip install fqdn-parser

