Extract the top-level domain (TLD) from the URL given.

Extract the top-level domain (TLD) from the URL given. The list of TLD names is taken from the Public Suffix List.

By default, an exception is raised for a non-existing TLD. If the fail_silently argument is set to True, None is returned instead.
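The lookup idea can be sketched without the package: take the host name from the URL and find the longest ending that appears in a list of known suffixes. The sketch below is not the package's implementation. `SUFFIXES`, `extract_suffix` and `UnknownSuffixError` are hypothetical names, and the three-entry set stands in for the full list (wildcard and exception rules are ignored).

```python
from urllib.parse import urlsplit

# Tiny stand-in for the real suffix list (assumption: the real list is far larger).
SUFFIXES = {"com", "uk", "co.uk"}


class UnknownSuffixError(ValueError):
    """Raised when no known suffix matches the host name (hypothetical)."""


def extract_suffix(url, suffixes=SUFFIXES, fail_silently=False):
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")
    # Try the longest candidate first so that 'co.uk' wins over 'uk'.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in suffixes:
            return candidate
    if fail_silently:
        return None
    raise UnknownSuffixError(f"no known suffix in {host!r}")
```

With `fail_silently=True` the caller has to check for `None`; without it, the caller handles the exception. Which one reads better depends on whether unknown suffixes are expected input or a genuine error.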

Prerequisites

  • Python 3.6, 3.7 and 3.8

Support for Python 2.7 and 3.5 is available as well.

Documentation

Documentation is available on Read the Docs.

Installation

Latest stable version on PyPI:

pip install tld

Or latest stable version from GitHub:

pip install https://github.com/barseghyanartur/tld/archive/stable.tar.gz

Usage examples

In addition to the examples below, see the Jupyter notebook workbook file.

Get the TLD name as string from the URL given

from tld import get_tld

get_tld("http://www.google.co.uk")
# 'co.uk'

get_tld("http://www.google.idontexist", fail_silently=True)
# None

Get the TLD as an object

from tld import get_tld

res = get_tld("http://some.subdomain.google.co.uk", as_object=True)

res
# 'co.uk'

res.subdomain
# 'some.subdomain'

res.domain
# 'google'

res.tld
# 'co.uk'

res.fld
# 'google.co.uk'

res.parsed_url
# SplitResult(
#     scheme='http',
#     netloc='some.subdomain.google.co.uk',
#     path='',
#     query='',
#     fragment=''
# )
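How the attributes above relate to each other can be spelled out with the standard library only. `split_host` is a hypothetical helper, and the suffix is passed in by hand instead of being looked up, so this illustrates the relationship between the fields and not the package's own code.

```python
from urllib.parse import urlsplit


def split_host(url, tld):
    """Return (subdomain, domain, fld, parsed_url) for a host with a known suffix."""
    parsed = urlsplit(url)
    host = parsed.netloc
    if host != tld and not host.endswith("." + tld):
        raise ValueError(f"{host!r} does not end with suffix {tld!r}")
    # Labels left of the suffix: the last one is the domain, the rest the subdomain.
    left = host[: -len(tld)].rstrip(".").split(".")
    domain = left[-1]
    subdomain = ".".join(left[:-1])
    fld = f"{domain}.{tld}" if domain else tld
    return subdomain, domain, fld, parsed


subdomain, domain, fld, parsed = split_host(
    "http://some.subdomain.google.co.uk", "co.uk"
)
```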

Get TLD name, ignoring the missing protocol

from tld import get_tld, get_fld

get_tld("www.google.co.uk", fix_protocol=True)
# 'co.uk'

get_fld("www.google.co.uk", fix_protocol=True)
# 'google.co.uk'
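Why a missing protocol matters can be seen with `urllib.parse.urlsplit` alone: without a scheme, the host ends up in `path` and `netloc` stays empty. The sketch below prepends a scheme when none is present. `ensure_scheme` is a hypothetical helper, and the choice of `https://` as the default is an assumption, not a statement about what `fix_protocol` does internally.

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default="https://"):
    # Leave URLs alone that already carry a scheme or are protocol-relative.
    if url.startswith(("http://", "https://", "//")):
        return url
    return default + url


bare = urlsplit("www.google.co.uk")
fixed = urlsplit(ensure_scheme("www.google.co.uk"))
# bare.netloc is empty and the host sits in bare.path;
# fixed.netloc holds the host as expected.
```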

Return TLD parts as tuple

from tld import parse_tld

parse_tld('http://www.google.com')
# ('com', 'google', 'www')

Get the first level domain name as string from the URL given

from tld import get_fld

get_fld("http://www.google.co.uk")
# 'google.co.uk'

get_fld("http://www.google.idontexist", fail_silently=True)
# None

Check whether a string is a valid TLD

from tld import is_tld

is_tld('co.uk')
# True

is_tld('uk')
# True

is_tld('tld.doesnotexist')
# False

is_tld('www.google.com')
# False

Update the list of TLD names

To update/sync the TLD names with the most recent version, run the following from your terminal:

update-tld-names

Or simply do:

from tld.utils import update_tld_names

update_tld_names()

Note that this will update all registered TLD source parsers (not only the list of TLD names taken from Mozilla). To run the update for a single parser, append the uid of that parser as an argument.

update-tld-names mozilla

Custom TLD parsers

By default, the list of TLD names is taken from Mozilla. Parsing is implemented in the tld.utils.MozillaTLDSourceParser class. If you want to use another parser, subclass tld.base.BaseTLDSourceParser, provide uid, source_url and local_path, and implement the get_tld_names method. Take tld.utils.MozillaTLDSourceParser as a good example of such an implementation. You can then use get_tld (as well as other tld module functions) as shown below:

from tld import get_tld
from some.module import CustomTLDSourceParser

get_tld(
    "http://www.google.co.uk",
    parser_class=CustomTLDSourceParser
)

Custom list of TLD names

You can maintain your own custom version of the TLD names list (even multiple ones) and use them simultaneously with the built-in TLD names list.

Store each list locally and provide the path to it as shown below:

from tld import get_tld
from tld.utils import BaseMozillaTLDSourceParser

class CustomBaseMozillaTLDSourceParser(BaseMozillaTLDSourceParser):

    uid: str = 'custom_mozilla'
    local_path: str = 'tests/res/effective_tld_names_custom.dat.txt'

get_tld(
    "http://www.foreverchild",
    parser_class=CustomBaseMozillaTLDSourceParser
)
# 'foreverchild'

Same goes for first level domain names:

from tld import get_fld

get_fld(
    "http://www.foreverchild",
    parser_class=CustomBaseMozillaTLDSourceParser
)
# 'www.foreverchild'

Note that in both examples shown above, the original TLD names file has been modified in the following way:

...
// ===BEGIN ICANN DOMAINS===

// This one actually does not exist, added for testing purposes
foreverchild
...
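The file format in the excerpt above is line-based: blank lines and lines starting with `//` carry no rules, and every other line is one suffix. A minimal reader for that format might look as follows. `read_suffix_lines` is a hypothetical name, and the sketch returns a plain set, which is an assumption made for illustration and not necessarily the structure the package builds.

```python
def read_suffix_lines(text):
    """Return the set of suffix rules found in Public-Suffix-style text."""
    suffixes = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '//' comment lines (including section markers).
        if not line or line.startswith("//"):
            continue
        suffixes.add(line)
    return suffixes


sample = """\
// ===BEGIN ICANN DOMAINS===

// This one actually does not exist, added for testing purposes
foreverchild
"""
custom_suffixes = read_suffix_lines(sample)
```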

Free up resources

To free up memory occupied by loaded custom TLD names, use the reset_tld_names function with the tld_names_local_path parameter.

from tld import get_tld, reset_tld_names

# Get TLD from a custom TLD names parser
get_tld(
    "http://www.foreverchild",
    parser_class=CustomBaseMozillaTLDSourceParser
)

# Free resources occupied by the custom TLD names list
reset_tld_names("tests/res/effective_tld_names_custom.dat.txt")

Support for Python 2.7 and 3.5

As you might have noticed, typing (Python 3.6+) is used extensively in the code. However, Python 3.5 will likely be supported until its EOL. All recent versions (starting from tld 0.11.7) are fully compatible with Python 2.7 and 3.5 (pip install tld just works).

Install from pip

pip install tld

Development tips follow:

Python 2.7

Install locally in development mode

python setup.py develop --python-tag py27

Prepare dist

./scripts/prepare_build_py27.sh

Run tests

tox -e py27

Python 3.5

Install locally in development mode

python setup.py develop --python-tag py35

Prepare dist

./scripts/prepare_build_py35.sh

Run tests

tox -e py35

Troubleshooting

If domain names listed here are not recognised, make sure you have the most recent version of the TLD names in your virtual environment:

update-tld-names

To update TLD names list for a single parser, specify it as an argument:

update-tld-names mozilla

Testing

Simply type:

./runtests.py

Or use tox:

tox

Or use tox to check specific env:

tox -e py38

Writing documentation

Keep the following hierarchy.

=====
title
=====

header
======

sub-header
----------

sub-sub-header
~~~~~~~~~~~~~~

sub-sub-sub-header
^^^^^^^^^^^^^^^^^^

sub-sub-sub-sub-header
++++++++++++++++++++++

sub-sub-sub-sub-sub-header
**************************

License

MPL-1.1 OR GPL-2.0-only OR LGPL-2.1-or-later

Support

For any issues contact me at the e-mail given in the Author section.

Author

Artur Barseghyan <artur.barseghyan@gmail.com>
