Query the Public Suffix List (PSL) via DNS and check the PSL status of a domain.

DNS-based Public Suffix List handling for Python

This Python package provides a PSL class for querying the Public Suffix List (PSL) via DNS. With the library, you can retrieve the public suffix status of a domain as well as the PSL rules governing it. A corresponding command-line tool, psl-dns_query, makes the library convenient to use from the shell.

Public suffix information is based on DNS lookups only; no rule matching is performed at lookup time. To make this possible, the PSL rules have been encoded in the DNS itself (currently under the DNSSEC-enabled zone _psl.desec.io). This facilitates easy querying without the need to keep the PSL at hand. The PSL zone is usually updated once a day.
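As a rough illustration of the lookup scheme: the verbose log output of psl-dns_check (shown further down) suggests that a name is looked up by prepending it to the PSL zone. The helper below is a hypothetical sketch under that assumption; psl_query_name is not part of the package, and IDN conversion and wildcard semantics are ignored.

```python
def psl_query_name(domain: str, zone: str = "_psl.desec.io") -> str:
    """Build the fully qualified DNS name to query for `domain`.

    Assumption: the PSL zone mirrors the domain's labels directly,
    as the psl-dns_check log lines suggest. Hypothetical helper.
    """
    labels = domain.rstrip(".")  # tolerate a trailing dot on the input
    return f"{labels}.{zone}."


print(psl_query_name("zone.id"))
```

The resulting name would then be queried for PTR (and TXT) records using any DNS resolver.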

The Parser class (along with the psl-dns_parse command) is used to iterate over a PSL file and convert the ruleset into a list of DNS Resource Record sets for submission to the DNS operator. The tool adds an extra TXT record at the root of the PSL zone, containing the parsing timestamp as well as the PSL file SHA-256 hash for currentness checking.
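A currentness check against that TXT record might look roughly like the sketch below. It assumes the record value is a quoted string of the form "timestamp hash" (as in the sample psl-dns_parse output further down) and that the hash is taken over the raw bytes of the PSL file; both function names are hypothetical and not part of the package.

```python
import hashlib
from typing import Tuple


def parse_psl_txt(value: str) -> Tuple[int, str]:
    """Split a TXT value such as '"1555895008 d205..."' into (timestamp, hash)."""
    timestamp, digest = value.strip('"').split()
    return int(timestamp), digest


def psl_file_is_current(psl_bytes: bytes, txt_value: str) -> bool:
    """Compare the SHA-256 of a local PSL file with the hash in the TXT record.

    Assumption: the hash covers the raw file bytes.
    """
    _, digest = parse_psl_txt(txt_value)
    return hashlib.sha256(psl_bytes).hexdigest() == digest
```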

The package also contains the psl-dns_check command (based on the Checker class), which iterates over a PSL file and queries the DNS for each rule encountered to verify that the PSL zone contents agree with the file. (Note that DNS caching may cause update delays: after a zone update, you may receive outdated information until the TTL of the PSL DNS records expires. To rule this out, specify one of the PSL zone's authoritative servers as the resolver argument.)
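For orientation, iterating over the rules of a PSL file can be sketched as follows. This is not the package's Parser or Checker code; it only relies on the published PSL file format, in which lines starting with // are comments and each rule line is read up to the first whitespace. The function name is hypothetical.

```python
from typing import Iterable, Iterator


def iter_psl_rules(lines: Iterable[str]) -> Iterator[str]:
    """Yield the rules contained in the lines of a PSL file."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        yield line.split()[0]  # a rule ends at the first whitespace


sample = ["// ck : example section", "", "*.ck", "!www.ck", "com"]
print(list(iter_psl_rules(sample)))
```

Each yielded rule could then be checked against the PSL zone, for example via psl.get_rules().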

Usage

Python

The following examples show how to query the PSL via DNS using the PSL class. For advanced use, please refer to the source.

Example use cases for the Parser and Checker classes can be found in the scripts under psl/commands/.

Initialize

>>> from psl_dns import PSL
>>> psl = PSL(resolver='8.8.8.8')

Query public suffix status of a domain (for the rules, see below)

>>> psl.is_public_suffix('com')
True
>>> psl.is_public_suffix('checkip.dedyn.io')
False
>>> psl.is_public_suffix('takatsu.kawasaki.jp')
True
>>> psl.is_public_suffix('www.ikuoufukushi.takatsu.kawasaki.jp')
False
>>> psl.is_public_suffix('city.kawasaki.jp')
False
>>> psl.is_public_suffix('www.library.city.kawasaki.jp')
False

Get the public suffix for a domain

>>> psl.get_public_suffix('com')
'com'
>>> psl.get_public_suffix('checkip.dedyn.io')
'dedyn.io'
>>> psl.get_public_suffix('takatsu.kawasaki.jp')
'*.kawasaki.jp'
>>> psl.get_public_suffix('www.ikuoufukushi.takatsu.kawasaki.jp')
'*.kawasaki.jp'
>>> psl.get_public_suffix('city.kawasaki.jp') # Wildcard exception
'jp'
>>> psl.get_public_suffix('www.library.city.kawasaki.jp') # same
'jp'

If the queried domain has a trailing dot, the dot is preserved in the response. Furthermore, the IDNA mode is preserved, so that Unicode queries return Unicode responses and Punycode queries return Punycode responses:

>>> psl.get_public_suffix('www.xn--55qx5d.cn')
'xn--55qx5d.cn'
>>> psl.get_public_suffix('www.公司.cn.')
'公司.cn.'

Get the set of rules applicable for a domain

>>> psl.get_rules('com')
{'com'}
>>> psl.get_rules('checkip.dedyn.io')
{'dedyn.io'}
>>> psl.get_rules('takatsu.kawasaki.jp')
{'*.kawasaki.jp'}
>>> psl.get_rules('www.ikuoufukushi.takatsu.kawasaki.jp')
{'*.kawasaki.jp'}
>>> psl.get_rules('city.kawasaki.jp') # Note wildcard exception
{'jp', '!city.kawasaki.jp', '*.kawasaki.jp'}
>>> psl.get_rules('www.library.city.kawasaki.jp') # same
{'jp', '!city.kawasaki.jp', '*.kawasaki.jp'}

Rules are always returned in Unicode encoding and without a trailing dot, consistent with the encoding in the Public Suffix List itself:

>>> psl.get_rules('www.xn--55qx5d.cn.')
{'公司.cn'}
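If a rule needs to be compared with a name in the form it was queried in, the two conventions above can be bridged with Python's built-in idna codec. This is a minimal sketch, not the library's own conversion logic; to_query_form is a hypothetical helper, and the built-in codec implements IDNA 2003, which may differ from what the package uses.

```python
def to_query_form(rule: str, query: str) -> str:
    """Render a Unicode, dot-less name in the same form as `query`.

    Punycode (ASCII-only) queries yield a Punycode result, and a
    trailing dot on the query is carried over.
    """
    name = rule
    if query.isascii():  # the query was given in Punycode/ASCII form
        name = name.encode("idna").decode("ascii")
    if query.endswith("."):
        name += "."
    return name


print(to_query_form("公司.cn", "www.xn--55qx5d.cn"))
print(to_query_form("公司.cn", "www.公司.cn."))
```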

Rules with inline wildcards

Unfortunately, rules with inline wildcard labels * (i.e. wildcards that are not at the leftmost position) cannot be represented using DNS lookups. Luckily, the PSL does not contain any such rules as of the time of this writing (but this may change).

To demonstrate what would happen in such a case, a few test rules have been added to the PSL zone under the *.wildcard.test domain. (As these rules are made up, they are not included in the PSL checksum calculation.)

When you query the public suffix (or the public suffix status) of a domain that falls under a wildcard label which is used as an inline label in at least one PSL rule, an UnsupportedRule exception is raised:

# Query public suffix status
>>> psl.is_public_suffix('unsupported.wildcard.test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
    raise UnsupportedRule
psl.exceptions.UnsupportedRule: Domain unsupported.wildcard.test is affected by an unsupported rule

# Get the public suffix
>>> psl.get_public_suffix('unsupported.wildcard.test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
    raise UnsupportedRule
psl.exceptions.UnsupportedRule: Domain unsupported.wildcard.test is affected by an unsupported rule

However, you can retrieve the relevant rules for manual consumption:

# Get the applicable rules
>>> psl.get_rules('unsupported.wildcard.test')
{'*.wildcard.test', '!except.inline.*.wildcard.test', 'inline.*.wildcard.test', '*.inline.*.wildcard.test'}

This behavior applies to the entire DNS subtree that is defined by the first (right-most) wildcard label in the rule.
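When consuming such a rule set manually, it can help to sort the rules by kind first. The sketch below classifies a rule string purely by its syntax (leading ! for exceptions, * as the leftmost label for ordinary wildcards, * anywhere else for inline wildcards). The function and the category names are made up for illustration and are not part of the package.

```python
def classify_rule(rule: str) -> str:
    """Classify a PSL rule as 'plain', 'wildcard', 'exception' or 'inline-wildcard'."""
    is_exception = rule.startswith("!")
    labels = (rule[1:] if is_exception else rule).split(".")
    if "*" in labels[1:]:
        return "inline-wildcard"  # wildcard label not at the leftmost position
    if is_exception:
        return "exception"
    if labels[0] == "*":
        return "wildcard"
    return "plain"


rules = {'*.wildcard.test', '!except.inline.*.wildcard.test',
         'inline.*.wildcard.test', '*.inline.*.wildcard.test'}
for rule in sorted(rules):
    print(rule, classify_rule(rule))
```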

Command line

psl-dns_query

This is a command-line interface to the PSL class demonstrated in the previous section.

$ psl-dns_query -h
usage: psl-dns_query [-h] [--zone ZONE] [--resolver RESOLVER]
                     [--timeout TIMEOUT] [-l] [-c] [-v]
                     domain

Query the PSL via DNS and check the PSL status of a domain.

Returns the word "public" or "private", followed by the public
suffix that covers the queried domain. Public Suffix List (PSL) rules
with inline wildcards are not fully supported. If the queried name is
governed by such a rule, the word "unknown" is returned.

Optionally, the set of applicable rules and the PSL checksum can be
displayed.

Exit codes: 0 (public), 1 (private), or 2 (unknown).

positional arguments:
  domain               Domain to query

optional arguments:
  -h, --help           show this help message and exit
  --zone ZONE          PSL zone to use (default: _psl.desec.io)
  --resolver RESOLVER  DNS resolver to use (default: 8.8.8.8)
  --timeout TIMEOUT    DNS query timeout (seconds) (default: 5)
  -l                   Show set of applicable rules (default: False)
  -c                   Show PSL checksum (default: False)
  -v, --verbose        Increase output verbosity (default: 0)

Retrieve status and public suffix

# Plain
$ psl-dns_query com
public com

# Same, followed by the set of relevant rules, in no particular order
$ psl-dns_query www.ck -l
private *
*.ck
!www.ck
*
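The three outcomes described in the help text (public, private, unknown) could be reproduced in Python roughly as follows. This is a sketch, not the tool's actual implementation: psl_status is a hypothetical helper, and the exception class is passed in as a parameter because the import path of UnsupportedRule is not confirmed here.

```python
from typing import Tuple, Type

# Exit codes as listed in the psl-dns_query help text.
EXIT_CODES = {"public": 0, "private": 1, "unknown": 2}


def psl_status(psl, domain: str,
               unsupported_exc: Type[Exception]) -> Tuple[str, int]:
    """Return (status word, exit code) for `domain`.

    `psl` is any object offering is_public_suffix(domain);
    `unsupported_exc` is the exception raised for inline-wildcard rules.
    """
    try:
        status = "public" if psl.is_public_suffix(domain) else "private"
    except unsupported_exc:
        status = "unknown"
    return status, EXIT_CODES[status]
```

With the real library, psl would be a PSL instance and unsupported_exc the UnsupportedRule class shown in the tracebacks above.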

psl-dns_parse

$ psl-dns_parse -h
usage: psl-dns_parse [-h] [--zone ZONE] [--format FORMAT] [-l] [-v] psl_file

Print rules from a Public Suffix List (PSL) file in DNS RRsets format.

positional arguments:
  psl_file         Path to PSL file

optional arguments:
  -h, --help       show this help message and exit
  --zone ZONE      PSL zone to use (default: _psl.desec.io)
  --format FORMAT  Output format to use (default: deSEC)
  -l               List available formats (default: False)
  -v, --verbose    Increase output verbosity (default: 0)

Convert current PSL file to deSEC RRsets

# Note: This produces very long output
$ time psl-dns_parse <(curl https://publicsuffix.org/list/public_suffix_list.dat) | jq .
[
  {
    "subname": "ac",
    "ttl": 86400,
    "type": "PTR",
    "records": [
      "ac."
    ]
  },
  ... # shortened for readability
  {
    "subname": "",
    "ttl": 86400,
    "type": "TXT",
    "records": [
      "\"1555895008 d205f587d61c6bbf05bec818776da1dd030ce68f2e8912fea732158b9a33cc54\""
    ]
  }
]

real	0m1.262s
user	0m0.475s
sys	0m0.239s

psl-dns_check

$ psl-dns_check -h
usage: psl-dns_check [-h] [--resolver RESOLVER] [--timeout TIMEOUT]
                     [--zone ZONE] [-v]
                     psl_file

Check rules from the Public Suffix List (PSL) via DNS and output
inconsistencies.

positional arguments:
  psl_file             Path to PSL file

optional arguments:
  -h, --help           show this help message and exit
  --resolver RESOLVER  DNS resolver to use (default: 8.8.8.8)
  --timeout TIMEOUT    DNS query timeout (in seconds) (default: 5)
  --zone ZONE          PSL zone to use (default: _psl.desec.io)
  -v, --verbose        Increase output verbosity (default: 0)

Verifying the correctness of the PSL zone

$ time psl-dns_check -v <(curl https://publicsuffix.org/list/public_suffix_list.dat)
... # shortened for readability
INFO:psl:Querying for zone.id._psl.desec.io. TXT
INFO:psl:Querying for zone.id._psl.desec.io. PTR
INFO:psl:Querying for _psl.desec.io. TXT
WARNING:psl:Hash mismatch! Input PSL file appears to differ from remote version.
8684 rules with 3 inconsistencies found

real	13m42.366s
user	0m38.560s
sys	0m8.383s
