Library to download and use the latest set of TLDs and public multi-label domain suffixes from IANA and ICANN
Project description
Overview
FQDN Parser (Fully Qualified Domain Name Parser) is a library used to (surprise) parse FQDNs into their component parts, including subdomains, domain names, and the public suffix.
It also provides additional contextual metadata about the domain’s TLD including:
International TLDs in both Unicode and Punycode formats
The TLD type: generic, generic-restricted, country-code, sponsored, test, infrastructure, and host_suffix (.onion)
The date the TLD was registered with ICANN
In the case of multi-label effective TLDs, whether the suffix is public, like .co.uk, which is owned by a Registrar, or private, like .duckdns.org, which is owned by a private company
If the TLD (or any label in the FQDN) is Punycode encoded, the ASCII-ification of the Unicode. This can be useful for identifying registrable domains that use Unicode characters that are very similar to ASCII characters used by legitimate domains, a common phishing technique.
The TLD metadata can be used as contextual features for machine learning models that generate predictions about domain names and FQDNs.
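The Punycode decoding and ASCII-ification mentioned in the list above can be sketched with the Python standard library alone. This is a minimal illustration, not fqdn_parser's own implementation: the helper names `decode_label` and `asciiify` are hypothetical, and the accent-stripping approach (NFKD normalization) is an assumption chosen for simplicity.

```python
import unicodedata


def decode_label(label: str) -> str:
    """Decode one Punycode ("xn--") label to Unicode; other labels pass through."""
    lowered = label.lower()
    if lowered.startswith("xn--"):
        # The stdlib "idna" codec converts an ACE label back to Unicode.
        return lowered.encode("ascii").decode("idna")
    return label


def asciiify(text: str) -> str:
    """Decompose accented characters, then drop every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


unicode_label = decode_label("xn--mnchen-3ya")
print(unicode_label, asciiify(unicode_label))
```

Note that NFKD only handles characters that decompose into an ASCII base plus combining marks. Look-alikes from other scripts (for example Cyrillic letters) are simply dropped by this sketch, so real homoglyph detection would need a confusables mapping.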
Data sources used by FQDN Parser:
TLD metadata comes from the IANA Root Zone Database
Multi-label suffix data comes from the Mozilla Public Suffix List
The first time fqdn_parser is run, it makes two HTTP calls to the sources above to pull down the latest IANA and Public Suffix List information. It may take a few seconds to download, parse, and persist the data into a cache file. Subsequent calls to fqdn_parser use the existing cache file and are much faster. The cache file can be managed via arguments to the Suffixes class constructor.
Note: As of the last commit, there are 9 country code TLDs listed in the Mozilla Public Suffix List that are not listed in the IANA Root Zone Database. These TLDs are added to the parsing cache file, but you will see a warning for each one:
WARNING: 澳门 not in IANA root zone database. Adding to list of TLDs
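The download-once-then-reuse behavior described above follows a common caching pattern, sketched below. This is a generic illustration and not fqdn_parser's code: `load_with_cache`, the `fetch` callable, the `force_refresh` flag, and the JSON cache format are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable


def load_with_cache(cache_path: Path, fetch: Callable[[], dict],
                    force_refresh: bool = False) -> dict:
    """Return cached data when the file exists; otherwise fetch and persist it."""
    if cache_path.exists() and not force_refresh:
        # Fast path: reuse what an earlier run already downloaded.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    # Slow path: in a real parser this is where the HTTP downloads would happen.
    data = fetch()
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Passing the download step in as a callable keeps the sketch free of network access and makes the caching logic easy to test with a stub.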
Terminology
Coming up with a consistent name for each part of a FQDN can get confusing.
Take for example somedomain.co.jp; many people would call somedomain the second level domain, or SLD, but the 2nd level domain is actually co, and somedomain is the 3rd level domain. Since most domain names have only 2 levels, a lot of people have standardized on SLD. But when writing code to parse FQDNs, it is much clearer to be pedantic about naming.
This library uses a very specific naming convention in order to be explicitly clear about what every label means.
tld - the actual top level domain of the FQDN. This is the domain that is controlled by IANA.
effective_tld - this is the full domain suffix, which can be made up of 1 to many labels. The effective TLD is the thing a person chooses to register a domain under and is controlled by a Registrar, or in the case of private domain suffixes the company that owns the private suffix (like DuckDNS).
registrable_domain - this is the full domain name that a person registers with a Registrar and includes the effective tld.
registrable_domain_host - this is the label of the registrable domain without the effective tld. Most people call this the second level domain, but as you can see this can get confusing.
fqdn (Fully Qualified Domain Name) - this is the full list of labels.
pqdn (Partially Qualified Domain Name) - this is the list of sub-domains in a FQDN, not including the registrable domain and the effective TLD.
To give a concrete example of these names, take the FQDN test.integration.api.somedomain.co.jp:
tld - jp
effective_tld - co.jp
registrable_domain - somedomain.co.jp
registrable_domain_host - somedomain
fqdn - test.integration.api.somedomain.co.jp
pqdn - test.integration.api
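The breakdown above can be reproduced with a small longest-suffix match. The sketch below is not how fqdn_parser works internally: `split_fqdn` and `SAMPLE_SUFFIXES` are hypothetical names, the suffix set is hand-picked, and the wildcard and exception rules of the real Public Suffix List are ignored.

```python
def split_fqdn(fqdn: str, known_suffixes: set[str]) -> dict[str, str]:
    """Split an FQDN using the longest matching suffix from known_suffixes."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first, always leaving at least one
    # label in front of it to act as the registrable domain host.
    for start in range(1, len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in known_suffixes:
            return {
                "tld": labels[-1],
                "effective_tld": candidate,
                "registrable_domain": ".".join(labels[start - 1:]),
                "registrable_domain_host": labels[start - 1],
                "fqdn": ".".join(labels),
                "pqdn": ".".join(labels[: start - 1]),
            }
    raise ValueError(f"no known suffix matches {fqdn!r}")


# Hypothetical, hand-picked suffixes; the real data comes from IANA and the
# Public Suffix List.
SAMPLE_SUFFIXES = {"com", "jp", "co.jp", "uk", "co.uk"}

parts = split_fqdn("test.integration.api.somedomain.co.jp", SAMPLE_SUFFIXES)
for name, value in parts.items():
    print(f"{name} - {value}")
```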
Doesn’t tldextract do this for me? How is fqdn_parser different?
tldextract is a great library if all you need to do is parse a FQDN to get its subdomain, domain, or full suffix.
But fqdn_parser adds more contextual metadata about each TLD/suffix, and also supports Punycode-encoded labels within FQDNs.
Usage Examples
Parse the registrable domain host from a FQDN:
from fqdn_parser.suffixes import Suffixes
suffixes = Suffixes(read_cache=True)
fqdn = "login.mail.stuffandthings.co.uk"
result = suffixes.parse(fqdn)
print(result.registrable_domain_host)
Private Suffixes
The “Public Suffix List” (https://publicsuffix.org/list/public_suffix_list.dat) lists all known public domain suffixes, including both single and multi-label TLDs (.com vs .co.uk).
It also has a section of “Private Suffixes”. These are not considered TLDs, but are domain names privately owned by companies under which people can get subdomains. Dynamic DNS providers are a good example: duckdns.org is a Dynamic DNS provider, and you can register subdomains under duckdns.org.
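The Public Suffix List file separates its two sections with `// ===BEGIN ICANN DOMAINS===` and `// ===BEGIN PRIVATE DOMAINS===` marker comments. A rough sketch of splitting the file on those markers follows; `split_psl_sections` is a hypothetical helper written for this example, not part of fqdn_parser, and the sample text is a tiny hand-made excerpt rather than the real list.

```python
def split_psl_sections(psl_text: str) -> tuple[set[str], set[str]]:
    """Return (icann_suffixes, private_suffixes) from Public Suffix List text."""
    icann: set[str] = set()
    private: set[str] = set()
    current = None  # the set for the section we are inside, if any
    for raw_line in psl_text.splitlines():
        line = raw_line.strip()
        if line.startswith("// ===BEGIN ICANN DOMAINS==="):
            current = icann
        elif line.startswith("// ===BEGIN PRIVATE DOMAINS==="):
            current = private
        elif line.startswith("// ===END"):
            current = None
        elif line and not line.startswith("//") and current is not None:
            # A rule ends at the first whitespace; keep it verbatim,
            # including any wildcard ("*.") or exception ("!") prefix.
            current.add(line.split()[0])
    return icann, private


SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
// uk
uk
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
// DuckDNS
duckdns.org
// ===END PRIVATE DOMAINS===
"""

icann, private = split_psl_sections(SAMPLE)
print(sorted(icann), sorted(private))
```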
Private Suffixes can be identified by inspecting the ParsedResult.private_suffix property.
Example: api.fake_aws_login.duckdns.org
tld - org
effective_tld - org
registrable_domain - duckdns.org
registrable_domain_host - duckdns
private_suffix - duckdns.org
fqdn - api.fake_aws_login.duckdns.org
pqdn - api.fake_aws_login
A more complex example, using the private suffix cdn.prod.atlassian-dev.net: assets.some_company.cdn.prod.atlassian-dev.net
tld - net
effective_tld - net
registrable_domain - atlassian-dev.net
registrable_domain_host - atlassian-dev
private_suffix - cdn.prod.atlassian-dev.net
fqdn - assets.some_company.cdn.prod.atlassian-dev.net
pqdn - assets.some_company
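In both examples the pqdn stops at the private suffix rather than at the registrable domain. One way to picture that is a longest-match lookup against the private suffix set, sketched below. `match_private_suffix` and `SAMPLE_PRIVATE` are hypothetical names for illustration only; the library's own logic may differ.

```python
from typing import Optional


def match_private_suffix(fqdn: str, private_suffixes: set[str]) -> Optional[dict[str, str]]:
    """Return the longest private suffix in the FQDN and the labels in front of it."""
    labels = fqdn.lower().rstrip(".").split(".")
    for start in range(1, len(labels)):  # longest candidate first
        candidate = ".".join(labels[start:])
        if candidate in private_suffixes:
            return {"private_suffix": candidate, "pqdn": ".".join(labels[:start])}
    return None  # no private suffix applies to this FQDN


# Hypothetical sample; the real entries live in the private section of the
# Public Suffix List.
SAMPLE_PRIVATE = {"duckdns.org", "cdn.prod.atlassian-dev.net"}

print(match_private_suffix("api.fake_aws_login.duckdns.org", SAMPLE_PRIVATE))
print(match_private_suffix("assets.some_company.cdn.prod.atlassian-dev.net", SAMPLE_PRIVATE))
```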
Install
To install via PyPI:
pip install fqdn-parser
To Do Wish List
A lot of the suffixes listed in https://publicsuffix.org/list/public_suffix_list.dat are not actually recognized TLDs, but are suffixes used for Dynamic DNS (https://en.wikipedia.org/wiki/Dynamic_DNS). At some point I’d like to parse that information and separate the Dynamic DNS suffixes from the actual TLDs.