publicsuffixlist implement
Project description
publicsuffixlist
===
[Public Suffix List](https://publicsuffix.org/) parser implementation for Python 2.5+/3.x.
- Compliant with [TEST DATA](http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/test_psl.txt?raw=1)
- Support IDN (unicode or punycoded).
- Support Python2.5+ and Python 3.x
- Shipped with built-in PSL and the updater script.
- Written in Pure Python. No library dependencies.
[![Build Status](https://travis-ci.org/ko-zu/psl.svg?branch=master)](https://travis-ci.org/ko-zu/psl)
Install
===
`publicsuffixlist` can be installed via `pip` or `pip3`.
```
$ sudo pip install publicsuffixlist
```
If you are in a bit old distributions (RHEL/CentOS6.x), you may need to update `pip` itself before install.
```
$ sudo pip install -U pip
```
Usage
===
```python
from publicsuffixlist import PublicSuffixList
psl = PublicSuffixList()
# uses built-in PSL file
psl.publicsuffix("www.example.com") # "com"
# longest public suffix part
psl.privatesuffix("www.example.com") # "example.com"
# shortest domain assigned for a registrant
psl.privatesuffix("com") # None
# None if no private (non-public) part found
psl.publicsuffix("www.example.unknownnewtld") # "unkownnewtld"
# new TLDs are valid public suffix by default
psl.publicsuffix(u"www.example.香港") # u"香港"
# accept unicode
psl.publicsuffix("www.example.xn--j6w193g") # "xn--j6w193g"
# accept punycoded IDNs by default
```
Latest PSL can be passed as a file like line-iterable object.
```python
with open("latest_psl.dat", "rb") as f:
psl = PublicSuffixList(f)
```
Works with both Python 2.x and 3.x.
```
$ python2 setup.py test
$ python3 setup.py test
```
Drop-in compatibility code to replace [publicsuffix](https://pypi.python.org/pypi/publicsuffix/)
```python
# from publicsuffix import PublicSuffixList
from publicsuffixlist.compat import PublicSuffixList
psl = PublicSuffixList()
psl.suffix("www.example.com") # return "example.com"
psl.suffix("com") # return "" rather than None
```
Some convenient methods available.
```python
psl.is_private("example.com") # True
psl.privateparts("aaa.www.example.com") # ("aaa", "www", "example.com")
psl.subdomain("aaa.www.example.com", depth=1) # "www.example.com"
```
Limitation
===
`publicsuffixlist` do NOT provide domain name validation.
In DNS protocol, most of 8-bit characters are acceptable label of domain name. ICANN compliant registries do not accept domain names that have `_` (underscore) but hostname may have. DMARC records, for example.
Users need to confirm the input is valid based on the users' context.
Partially encoded (Unicode-mixed) Punycode is not supported because of very slow Punycode en/decoding and unpredictable encoding of results.
If you are not sure the input is valid Punycode or not, you should do `unknowndomain.encode("idna")` which is idempotence.
ICANN and private suffixes
===
The public suffix list contains both suffixes for ICANN domains and private suffixes. Using the flag `only_icann` the private suffixes can be deactivated:
```
>>> psl = PublicSuffixList()
>>> psl.publicsuffix("example.priv.at")
'priv.at'
>>> psl = PublicSuffixList(only_icann=True)
>>> psl.publicsuffix("example.priv.at")
'at'
```
License
===
- This module is licensed under Mozilla Public License 2.0.
- Public Suffix List maintained by Mozilla Foundation is licensed under Mozilla Public License 2.0.
- PSL testcase dataset is public domain (CC0).
Source / Link
===
- Git repository on GitHub (https://github.com/ko-zu/psl)
- PyPI (https://pypi.python.org/pypi?name=publicsuffixlist&:action=display)
===
[Public Suffix List](https://publicsuffix.org/) parser implementation for Python 2.5+/3.x.
- Compliant with [TEST DATA](http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/test_psl.txt?raw=1)
- Support IDN (unicode or punycoded).
- Support Python2.5+ and Python 3.x
- Shipped with built-in PSL and the updater script.
- Written in Pure Python. No library dependencies.
[![Build Status](https://travis-ci.org/ko-zu/psl.svg?branch=master)](https://travis-ci.org/ko-zu/psl)
Install
===
`publicsuffixlist` can be installed via `pip` or `pip3`.
```
$ sudo pip install publicsuffixlist
```
If you are in a bit old distributions (RHEL/CentOS6.x), you may need to update `pip` itself before install.
```
$ sudo pip install -U pip
```
Usage
===
```python
from publicsuffixlist import PublicSuffixList
psl = PublicSuffixList()
# uses built-in PSL file
psl.publicsuffix("www.example.com") # "com"
# longest public suffix part
psl.privatesuffix("www.example.com") # "example.com"
# shortest domain assigned for a registrant
psl.privatesuffix("com") # None
# None if no private (non-public) part found
psl.publicsuffix("www.example.unknownnewtld") # "unkownnewtld"
# new TLDs are valid public suffix by default
psl.publicsuffix(u"www.example.香港") # u"香港"
# accept unicode
psl.publicsuffix("www.example.xn--j6w193g") # "xn--j6w193g"
# accept punycoded IDNs by default
```
Latest PSL can be passed as a file like line-iterable object.
```python
with open("latest_psl.dat", "rb") as f:
psl = PublicSuffixList(f)
```
Works with both Python 2.x and 3.x.
```
$ python2 setup.py test
$ python3 setup.py test
```
Drop-in compatibility code to replace [publicsuffix](https://pypi.python.org/pypi/publicsuffix/)
```python
# from publicsuffix import PublicSuffixList
from publicsuffixlist.compat import PublicSuffixList
psl = PublicSuffixList()
psl.suffix("www.example.com") # return "example.com"
psl.suffix("com") # return "" rather than None
```
Some convenient methods available.
```python
psl.is_private("example.com") # True
psl.privateparts("aaa.www.example.com") # ("aaa", "www", "example.com")
psl.subdomain("aaa.www.example.com", depth=1) # "www.example.com"
```
Limitation
===
`publicsuffixlist` do NOT provide domain name validation.
In DNS protocol, most of 8-bit characters are acceptable label of domain name. ICANN compliant registries do not accept domain names that have `_` (underscore) but hostname may have. DMARC records, for example.
Users need to confirm the input is valid based on the users' context.
Partially encoded (Unicode-mixed) Punycode is not supported because of very slow Punycode en/decoding and unpredictable encoding of results.
If you are not sure the input is valid Punycode or not, you should do `unknowndomain.encode("idna")` which is idempotence.
ICANN and private suffixes
===
The public suffix list contains both suffixes for ICANN domains and private suffixes. Using the flag `only_icann` the private suffixes can be deactivated:
```
>>> psl = PublicSuffixList()
>>> psl.publicsuffix("example.priv.at")
'priv.at'
>>> psl = PublicSuffixList(only_icann=True)
>>> psl.publicsuffix("example.priv.at")
'at'
```
License
===
- This module is licensed under Mozilla Public License 2.0.
- Public Suffix List maintained by Mozilla Foundation is licensed under Mozilla Public License 2.0.
- PSL testcase dataset is public domain (CC0).
Source / Link
===
- Git repository on GitHub (https://github.com/ko-zu/psl)
- PyPI (https://pypi.python.org/pypi?name=publicsuffixlist&:action=display)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
publicsuffixlist-0.6.0.tar.gz
(81.1 kB
view hashes)
Built Distribution
Close
Hashes for publicsuffixlist-0.6.0-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6aed31a42edbdfbf8b4fe98a3373bcda689680962aa1650be1add7fb09485b89 |
|
MD5 | a0159344b86bb79eb3ead54b3ef4f9b0 |
|
BLAKE2b-256 | 95fefd3a15cedf662b928445b648f796fcc687cf84c2815149834dd89292195d |