tldextract

tldextract accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally support the Public Suffix List's private domains as well.

Why? Naive URL parsing, such as splitting on dots and taking the second-to-last label, fails for domains like forums.bbc.co.uk: it yields "co" instead of "bbc". tldextract handles these edge cases, so you don't have to.
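
To make the failure concrete, here is a minimal sketch of that naive approach using only the standard library. The function name naive_domain is illustrative and not part of tldextract.

```python
def naive_domain(host: str) -> str:
    """Guess the domain as the second-to-last dot-separated label."""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


print(naive_domain("forums.news.cnn.com"))  # cnn (happens to be right)
print(naive_domain("forums.bbc.co.uk"))     # co  (wrong: the domain is "bbc")
```

The guess only works when the public suffix is a single label, which is exactly the assumption the Public Suffix List removes.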

Quick Start

>>> import tldextract

>>> tldextract.extract('http://forums.news.cnn.com/')
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com', is_private=False)

>>> tldextract.extract('http://forums.bbc.co.uk/')
ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk', is_private=False)

>>> # Access the parts you need
>>> ext = tldextract.extract('http://forums.bbc.co.uk')
>>> ext.domain
'bbc'
>>> ext.top_domain_under_public_suffix
'bbc.co.uk'
>>> ext.fqdn
'forums.bbc.co.uk'

Install

pip install tldextract

How-to Guides

How to disable HTTP suffix list fetching for production

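# An empty suffix_list_urls disables the HTTP fetch; tldextract then falls back to its bundled snapshot of the list.
no_fetch_extract = tldextract.TLDExtract(suffix_list_urls=())
no_fetch_extract('http://www.google.com')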

How to set a custom cache location

Via environment variable:

export TLDEXTRACT_CACHE="/path/to/cache"

Or in code:

custom_cache_extract = tldextract.TLDExtract(cache_dir='/path/to/cache/')

How to update TLD definitions

Command line:

tldextract --update

Or delete the cache folder (the default location is shown; adjust the path if you set a custom cache location):

rm -rf $HOME/.cache/python-tldextract

How to treat private domains as suffixes

extract = tldextract.TLDExtract(include_psl_private_domains=True)
extract('waiterrant.blogspot.com')
# ExtractResult(subdomain='', domain='waiterrant', suffix='blogspot.com', is_private=True)

How to use a local suffix list

extract = tldextract.TLDExtract(
    suffix_list_urls=["file:///path/to/your/list.dat"],
    cache_dir='/path/to/cache/',
    # Do not fall back to the snapshot bundled with tldextract
    fallback_to_snapshot=False)

How to use a remote suffix list

extract = tldextract.TLDExtract(
    suffix_list_urls=["https://myserver.com/suffix-list.dat"])

How to add extra suffixes

extract = tldextract.TLDExtract(
    extra_suffixes=["foo", "bar.baz"])

How to validate URLs before extraction

from urllib.parse import urlsplit

split_url = urlsplit("https://example.com:8080/path")
result = tldextract.extract_urllib(split_url)
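
The snippet above parses the URL but does not reject anything by itself. If you want an explicit check first, one possible approach is sketched below with the standard library only. The helper split_if_valid and the ALLOWED_SCHEMES set are assumptions for illustration, not part of tldextract; adapt the rules to whatever "valid" means in your application.

```python
from urllib.parse import SplitResult, urlsplit

# Assumption: only absolute web URLs are acceptable in this application.
ALLOWED_SCHEMES = {"http", "https"}


def split_if_valid(url: str) -> SplitResult:
    """Parse url; raise ValueError unless it has an allowed scheme and a hostname."""
    split_url = urlsplit(url)
    if split_url.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported or missing scheme: {url!r}")
    if not split_url.hostname:
        raise ValueError(f"missing hostname: {url!r}")
    return split_url


split_url = split_if_valid("https://example.com:8080/path")
print(split_url.hostname)  # example.com
```

The returned SplitResult can then be handed to the extraction call shown above, so the URL is parsed only once.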

Command Line

$ tldextract http://forums.bbc.co.uk
forums bbc co.uk

$ tldextract --update  # Update cached suffix list
$ tldextract --help    # See all options

Understanding Domain Parsing

Public Suffix List

tldextract uses the Public Suffix List, a community-maintained list of domain suffixes. The PSL contains both:

Public suffixes: Where anyone can register a domain (.com, .co.uk, .org.kg)
Private suffixes: Operated by companies for customer subdomains (blogspot.com, github.io)

Web browsers use this same list for security decisions like cookie scoping.

Suffix vs. TLD

While .com is a top-level domain (TLD), many suffixes, such as .co.uk, are technically second-level domains. The PSL uses the term "public suffix" to cover both.
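
To illustrate how a suffix list resolves this, here is a toy matcher over a handful of hand-picked rules, including one PSL wildcard rule (*.ck) and one exception rule (!www.ck). It is a simplified sketch of the matching idea, not tldextract's implementation; the names SAMPLE_RULES and split_host are hypothetical.

```python
# Hand-picked sample rules; the real PSL has thousands of entries.
SAMPLE_RULES = {"com", "uk", "co.uk", "*.ck", "!www.ck"}


def split_host(host: str, rules: set[str] = SAMPLE_RULES) -> tuple[str, str, str]:
    """Return (subdomain, domain, suffix) for a bare hostname."""
    labels = host.lower().strip(".").split(".")
    suffix_len = 1  # default: with no matching rule, the last label is the suffix
    for start in range(len(labels)):
        candidate = labels[start:]
        exact = ".".join(candidate)
        wildcard = ".".join(["*"] + candidate[1:])
        if "!" + exact in rules:
            # An exception rule wins outright; the suffix drops its leftmost label.
            suffix_len = len(candidate) - 1
            break
        if exact in rules or wildcard in rules:
            # Among ordinary rules, the longest match prevails.
            suffix_len = max(suffix_len, len(candidate))
    suffix = ".".join(labels[-suffix_len:])
    rest = labels[:-suffix_len]
    domain = rest[-1] if rest else ""
    subdomain = ".".join(rest[:-1])
    return subdomain, domain, suffix


print(split_host("forums.bbc.co.uk"))  # ('forums', 'bbc', 'co.uk')
print(split_host("shop.foo.ck"))       # ('', 'shop', 'foo.ck')
print(split_host("www.ck"))            # ('', 'www', 'ck')
```

Here "co.uk" beats "uk" because it is the longer match, which is why the suffix can span more than one label.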

Default behavior with private domains

By default, tldextract treats private suffixes as regular domains:

>>> tldextract.extract('waiterrant.blogspot.com')
ExtractResult(subdomain='waiterrant', domain='blogspot', suffix='com', is_private=False)

To treat them as suffixes instead, see How to treat private domains as suffixes.

Caching behavior

By default, tldextract fetches the latest Public Suffix List on first use and caches it indefinitely in $HOME/.cache/python-tldextract.

URL validation

tldextract accepts any string and is very lenient: it prioritizes ease of use over strict validation and will extract domain parts even from partial URLs or non-URLs. If you need validation, do it before calling tldextract.

FAQ

Can you add/remove suffix ____?

tldextract doesn't maintain the suffix list. Submit changes to the Public Suffix List.

Meanwhile, use the extra_suffixes parameter, or fork the PSL and pass it to this library with the suffix_list_urls parameter.

My suffix is in the PSL but not extracted correctly

Check if it's in the "PRIVATE" section. See How to treat private domains as suffixes.

Why does it parse invalid URLs?

See URL validation and How to validate URLs before extraction.

Contribute

Setting up

git clone this repository.
Change into the new directory.
Install the package in editable mode with its testing dependencies: pip install --upgrade --editable '.[testing]'

Running tests

tox --parallel       # Test all Python versions
tox -e py311         # Test specific Python version
ruff format .        # Format code

History

This package started from a StackOverflow answer about regex-based domain extraction. The regex approach fails for many domains, so this library switched to the Public Suffix List for accuracy.

Release history

5.3.1 (this version)

Dec 28, 2025

5.3.0

Apr 22, 2025

5.2.0

Apr 7, 2025

5.1.3

Nov 5, 2024

5.1.2

Mar 19, 2024

5.1.1

Nov 17, 2023

5.1.0

Nov 6, 2023

5.0.1

Oct 17, 2023

5.0.0

Oct 11, 2023

4.0.0

Oct 11, 2023

3.6.0 yanked

Sep 19, 2023

3.5.0

Sep 7, 2023

3.4.4

May 20, 2023

3.4.3

May 18, 2023

3.4.2

May 16, 2023

3.4.1

Apr 26, 2023

3.4.0

Oct 4, 2022

3.3.1

Jul 8, 2022

3.3.0

May 4, 2022

3.2.1

Apr 11, 2022

3.2.0

Feb 21, 2022

3.1.2

Sep 1, 2021

3.1.1

Aug 27, 2021

3.1.0

Nov 22, 2020

3.0.2

Oct 25, 2020

3.0.1

Oct 22, 2020

3.0.0

Oct 20, 2020

3.0.0rc1 pre-release

Oct 13, 2020

2.2.3

Aug 6, 2020

2.2.2

Oct 16, 2019

2.2.1

Mar 5, 2019

2.2.0

Oct 27, 2017

2.1.0

May 25, 2017

2.0.3

May 20, 2017

2.0.2

Oct 16, 2016

2.0.1

Apr 26, 2016

2.0.0

Apr 21, 2016

2.0rc1 pre-release

Apr 4, 2016

1.7.5

Feb 7, 2016

1.7.4

Dec 27, 2015

1.7.3

Dec 12, 2015

1.7.2

Nov 28, 2015

1.7.1

Aug 23, 2015

1.7

Aug 22, 2015

1.6

Mar 22, 2015

1.5.1

Oct 14, 2014

1.5

Sep 9, 2014

1.4

Jun 1, 2014

1.3.1

Dec 17, 2013

1.3

Dec 9, 2013

1.2.2

Oct 10, 2013

1.2.1

Oct 9, 2013

1.2

Jul 7, 2013

1.1.3

Jan 29, 2013

1.1.2

Sep 17, 2012

1.1.1

Jul 25, 2012

1.1

Mar 22, 2012

1.0

Feb 12, 2012

0.4

Jan 19, 2012

0.3.2

Jan 7, 2012

0.3.1

Jul 7, 2011

0.3

Jun 19, 2011

0.2

Mar 2, 2011

0.1.1

Mar 1, 2011

0.1

Feb 28, 2011

Download files

Source Distribution

tldextract-5.3.1.tar.gz (126.1 kB)

Uploaded Dec 28, 2025 Source

Built Distribution

tldextract-5.3.1-py3-none-any.whl (105.9 kB)

Uploaded Dec 28, 2025 Python 3

File details

Details for the file tldextract-5.3.1.tar.gz.

File metadata

Download URL: tldextract-5.3.1.tar.gz
Upload date: Dec 28, 2025
Size: 126.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for tldextract-5.3.1.tar.gz
Algorithm	Hash digest
SHA256	`a72756ca170b2510315076383ea2993478f7da6f897eef1f4a5400735d5057fb`
MD5	`e4e429649a5567af70c86669d5b7b9d4`
BLAKE2b-256	`657b644fbbb49564a6cb124a8582013315a41148dba2f72209bba14a84242bf0`

File details

Details for the file tldextract-5.3.1-py3-none-any.whl.

File metadata

Download URL: tldextract-5.3.1-py3-none-any.whl
Upload date: Dec 28, 2025
Size: 105.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for tldextract-5.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6bfe36d518de569c572062b788e16a659ccaceffc486d243af0484e8ecf432d9`
MD5	`327cc796139cc5910a6342ff555c2f8b`
BLAKE2b-256	`6d420e49d6d0aac449ca71952ec5bae764af009754fcb2e76a5cc097543747b3`
