cleanurl

Remove clutter from URLs and return a canonicalized version

Install

pip install cleanurl

Or, if you're using Poetry:

poetry add cleanurl

Usage

By default, cleanurl returns a cleaned URL without respecting semantics. For example:

>>> import cleanurl
>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/focus.html?utm_content=buffercf3b2&utm_medium=social&utm_source=snapchat.com&utm_campaign=buffe')
>>> r.url
'https://xojoc.pw/blog/focus'
>>> r.parsed_url
ParseResult(scheme='https', netloc='xojoc.pw', path='/blog/focus', params='', query='', fragment='')

The default parameters are useful when you want a canonical URL and do not care whether the resulting URL is still valid.
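
To make the idea concrete, here is a minimal standard-library sketch of the kind of aggressive cleanup shown above. It is not cleanurl's implementation: the function name `strip_clutter` and the `TRACKING_PREFIXES` list are hypothetical, and the rules (drop `utm_*` parameters, a leading `www.` and a trailing `.html`) are assumptions inferred only from the example output.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: a tiny illustrative list, not the rules cleanurl ships with.
TRACKING_PREFIXES = ("utm_",)


def strip_clutter(url: str) -> str:
    """Hypothetical helper: aggressively canonicalize a URL."""
    parts = urlsplit(url)

    # Keep only query parameters that do not look like tracking parameters.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES)
    ]

    # Drop a leading "www." from the host.
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]

    # Drop a trailing ".html" from the path. The result may no longer resolve.
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")]

    # The fragment is discarded on purpose (empty last component).
    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))


print(strip_clutter("https://www.example.com/a.html?id=1&utm_source=x"))
# prints https://example.com/a?id=1
```

Because the host and path are rewritten, the output is good for comparing links but is not guaranteed to point at a working page.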

If you want a clean URL that is still valid, pass respect_semantics=True:

>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/////focus.html', respect_semantics=True)
>>> r.url
'https://www.xojoc.pw/blog/focus.html'
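
For comparison, a conservative cleanup can be sketched with the standard library as follows. This is an illustration only, not how cleanurl works internally: `collapse_slashes` is a hypothetical name, and collapsing repeated slashes is the single transformation assumed here, based on the example above.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Hypothetical helper: collapse repeated slashes in the path only."""
    parts = urlsplit(url)
    # Replace any run of two or more slashes in the path with a single slash.
    path = re.sub(r"/{2,}", "/", parts.path)
    # Host, query and fragment are passed through untouched.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(collapse_slashes("https://www.xojoc.pw/blog/////focus.html"))
# prints https://www.xojoc.pw/blog/focus.html
```

The host keeps its `www.` prefix and the path keeps its `.html` suffix, which matches the output shown for respect_semantics=True.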

cleanurl.cleanurl parameters:

  • generic -> if True, don't use site-specific rules
  • respect_semantics -> if True, make sure the returned URL is still valid, although it may still contain some superfluous elements
  • host_remap -> whether to remap hosts. Example:
>>> import cleanurl
>>> cleanurl.cleanurl('https://threadreaderapp.com/thread/1453753924960219145', host_remap=True).url
'https://twitter.com/i/status/1453753924960219145'
>>> cleanurl.cleanurl('https://threadreaderapp.com/thread/1453753924960219145', host_remap=False).url
'https://threadreaderapp.com/thread/1453753924960219145'
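
The host remapping shown above can be pictured as a site-specific rewrite rule. The sketch below is an assumption about the general shape of such a rule, not cleanurl's code: `remap_host` is a hypothetical function, and the only rule it knows is the threadreaderapp.com example from this section.

```python
from urllib.parse import urlsplit

# Path prefix used by the one rule in this sketch.
THREAD_PREFIX = "/thread/"


def remap_host(url: str, host_remap: bool = True) -> str:
    """Hypothetical helper: rewrite a mirror URL to the original site."""
    if not host_remap:
        return url

    parts = urlsplit(url)
    if parts.netloc == "threadreaderapp.com" and parts.path.startswith(THREAD_PREFIX):
        status_id = parts.path[len(THREAD_PREFIX):]
        # Only rewrite when the remainder looks like a numeric status id.
        if status_id.isdigit():
            return f"https://twitter.com/i/status/{status_id}"

    # No rule matched: return the URL unchanged.
    return url


print(remap_host("https://threadreaderapp.com/thread/1453753924960219145"))
# prints https://twitter.com/i/status/1453753924960219145
```

Keeping the rule behind a flag mirrors the host_remap parameter: callers that need the URL to stay on the original host can turn the rewrite off.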

For more examples see the unit tests.

Why?

While some libraries handle the general cases, this library also has website-specific rules that normalize URLs more aggressively.

Users

Initially used for discu.eu.

Discussions around the web

Who?

cleanurl was written by Alexandru Cojocaru.

License

cleanurl is Free Software, released under the AGPLv3.
