
Accurately separate the gTLD/ccTLD component from the registered domain and subdomains of a URL.

Project description

The tldextract module accurately separates the gTLD or ccTLD from the registered domain and subdomains of a URL. For example, you may want the 'www.google' part of http://www.google.com. A simple approach is to split the hostname on '.' and keep everything but the last element. That approach fails for URLs with an arbitrary number of subdomains and multi-part country-code suffixes, unless you already know what every country-code suffix looks like. Consider http://forums.bbc.co.uk, where the suffix is 'co.uk' rather than just 'uk'.
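To make the failure concrete, here is a minimal sketch of the naive split described above. The helper name naive_split is hypothetical and is not part of tldextract; it only illustrates why taking the last label as the TLD breaks on multi-part suffixes.

```python
from urllib.parse import urlparse


def naive_split(url):
    """Treat the last dot-separated label as the TLD (illustration only)."""
    host = urlparse(url).hostname or ""
    # rpartition splits on the final '.', leaving everything else in `head`.
    head, _, tld = host.rpartition(".")
    return head, tld


print(naive_split("http://www.google.com"))    # ('www.google', 'com')
print(naive_split("http://forums.bbc.co.uk"))  # ('forums.bbc.co', 'uk')
```

The second result keeps 'co' with the domain part, which is the mistake a suffix-aware lookup avoids.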

tldextract can give you the subdomains, domain, and gTLD/ccTLD component of a URL because it looks up the TLDs currently listed by iana.org and caches them locally.

>>> import tldextract
>>> ext = tldextract.extract('http://forums.news.cnn.com/')
>>> ext['subdomain'], ext['domain'], ext['tld']
('forums.news', 'cnn', 'com')
>>> ext = tldextract.extract('http://forums.bbc.co.uk/')
>>> ext['subdomain'], ext['domain'], ext['tld']
('forums', 'bbc', 'co.uk')
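
The general idea of matching a hostname against a set of known suffixes can be sketched as follows. This is not tldextract's implementation: the function name split_host and the tiny hard-coded KNOWN_SUFFIXES set are assumptions made for illustration, standing in for the much larger list a real lookup would use.

```python
from urllib.parse import urlparse

# Hypothetical, hand-picked suffixes for illustration; a real list is far longer.
KNOWN_SUFFIXES = {"com", "uk", "co.uk"}


def split_host(url, suffixes=KNOWN_SUFFIXES):
    """Split a URL's hostname using the longest matching known suffix."""
    labels = (urlparse(url).hostname or "").split(".")
    # Candidates run from the whole hostname down to the last label,
    # so the longest known suffix is found first.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[:i - 1]) if i > 1 else ""
            return {"subdomain": subdomain, "domain": domain, "tld": candidate}
    # No known suffix: treat the last label as the domain and leave the TLD empty.
    return {"subdomain": ".".join(labels[:-1]), "domain": labels[-1], "tld": ""}


print(split_host("http://forums.bbc.co.uk/"))
```

Checking the longest candidate first is what lets 'co.uk' win over 'uk' for the BBC example.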

Download files

Files for tldextract, version 0.1.1
tldextract-0.1.1.tar.gz (3.3 kB), source distribution
