
Accurately separate the gTLD/ccTLD component from the registered domain and subdomains of a URL.

Project description

The tldextract module accurately separates the gTLD or ccTLD from the registered domain and subdomains of a URL. For example, you may want the ‘www.google’ part of http://www.google.com. This is simple to do by splitting on the ‘.’ and using all but the last element. However, that approach fails for URLs with an arbitrary number of subdomains and country-code suffixes, unless you know what every country-code suffix looks like. Consider http://forums.bbc.co.uk: a naive split yields ‘forums.bbc.co’ rather than ‘forums.bbc’.
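To make the failure concrete, here is a minimal sketch of the naive approach. The helper name `naive_split` is hypothetical and is not part of tldextract; it only illustrates splitting on ‘.’ and dropping the last element.

```python
from urllib.parse import urlparse

def naive_split(url):
    """Split the hostname on '.' and treat only the last label as the TLD."""
    host = urlparse(url).hostname
    labels = host.split(".")
    # Everything except the last label is assumed to be subdomain + domain.
    return ".".join(labels[:-1]), labels[-1]

print(naive_split("http://www.google.com"))    # ('www.google', 'com')
print(naive_split("http://forums.bbc.co.uk"))  # ('forums.bbc.co', 'uk')
```

The second result keeps ‘co’ on the wrong side of the split, which is the problem a list of known suffixes is meant to solve.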

tldextract can give you the subdomains, domain, and gTLD/ccTLD component of a URL because it looks up, and caches locally, the currently active TLDs according to iana.org.

>>> import tldextract
>>> ext = tldextract.extract('http://forums.news.cnn.com/')
>>> ext['subdomain'], ext['domain'], ext['tld']
('forums.news', 'cnn', 'com')
>>> ext = tldextract.extract('http://forums.bbc.co.uk/')
>>> ext['subdomain'], ext['domain'], ext['tld']
('forums', 'bbc', 'co.uk')
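The idea behind results like these can be sketched as a longest-suffix match against a set of known suffixes. This is an illustration under stated assumptions, not tldextract's actual implementation: `split_host` and `KNOWN_SUFFIXES` are hypothetical names, and the suffix set is a tiny hand-picked sample rather than a list fetched and cached from an authoritative source.

```python
from urllib.parse import urlparse

# Hypothetical, hand-picked sample; a real list would hold every known suffix.
KNOWN_SUFFIXES = {"com", "uk", "co.uk"}

def split_host(url, suffixes=KNOWN_SUFFIXES):
    """Return (subdomain, domain, tld) using the longest matching suffix."""
    labels = urlparse(url).hostname.split(".")
    # Try the longest candidate suffix first, so 'co.uk' wins over 'uk'.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[:i - 1]) if i > 1 else ""
            return subdomain, domain, candidate
    # No known suffix matched: treat the last label as the domain.
    return ".".join(labels[:-1]), labels[-1], ""

print(split_host("http://forums.news.cnn.com/"))  # ('forums.news', 'cnn', 'com')
print(split_host("http://forums.bbc.co.uk/"))     # ('forums', 'bbc', 'co.uk')
```

Checking longer candidates before shorter ones is what keeps ‘co.uk’ together; the quality of the result depends entirely on how complete the suffix set is.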

Download files

Source Distribution

tldextract-0.1.tar.gz (3.2 kB)

Uploaded Source

File details

Details for the file tldextract-0.1.tar.gz.

File metadata

  • Download URL: tldextract-0.1.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for tldextract-0.1.tar.gz
Algorithm    Hash digest
SHA256       9b6095ffd073da7593f7c624dcf24bfbba0e07a6ca2fcfa268fb140b073d0ade
MD5          27e050e2e2037bc4d59d1cf52c3944e8
BLAKE2b-256  90bbf366fca33da1e33d38ad4b355b8d33b7efb85b5bc7305b73833e9966220d
