cleanurl

Remove clutter from URLs and return a canonicalized version

Install

pip install cleanurl

or if you're using poetry:

poetry add cleanurl

Usage

By default cleanurl returns a cleaned URL without respecting semantics, so the result may no longer resolve to the original page. For example:

>>> import cleanurl
>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/focus.html?utm_content=buffercf3b2&utm_medium=social&utm_source=snapchat.com&utm_campaign=buffe')
>>> r.url
'https://xojoc.pw/blog/focus'
>>> r.parsed_url
ParseResult(scheme='https', netloc='xojoc.pw', path='/blog/focus', params='', query='', fragment='')

The default parameters are useful if you want to get a canonical URL without caring if the resulting URL is still valid.
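To make the idea concrete, here is a minimal standard-library sketch of this kind of aggressive canonicalization. It is not cleanurl's implementation: the helper name `aggressive_canonical` and the three rules it applies (drop a leading `www.`, drop a trailing `.html`, drop `utm_*` query parameters) are assumptions chosen only to reproduce the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def aggressive_canonical(url: str) -> str:
    """Hypothetical helper: canonicalize without preserving validity."""
    parts = urlsplit(url)

    # Drop a leading "www." from the host (assumed rule).
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # Drop a trailing ".html" extension from the path (assumed rule).
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")]

    # Keep only query parameters that are not utm_* tracking parameters.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]

    return urlunsplit((parts.scheme, host, path, urlencode(kept), parts.fragment))


print(aggressive_canonical(
    "https://www.xojoc.pw/blog/focus.html"
    "?utm_content=buffercf3b2&utm_medium=social"
    "&utm_source=snapchat.com&utm_campaign=buffe"
))
# prints: https://xojoc.pw/blog/focus
```

Rules like these are handy for deduplicating links, but the server may not actually serve the page at the shortened address, which is why the result is not guaranteed to be valid.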

If you want a clean URL that is still valid, pass respect_semantics=True:

>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/////focus.html', respect_semantics=True)
>>> r.url
'https://www.xojoc.pw/blog/focus.html'
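For comparison, a conservative cleanup of this kind can be sketched with the standard library alone. The helper `collapse_slashes` below is hypothetical and is not how cleanurl implements respect_semantics; it only collapses repeated slashes in the path and leaves the host, file extension, query and fragment untouched.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Hypothetical helper: collapse runs of '/' in the path only."""
    parts = urlsplit(url)
    # Only the path is rewritten; the "//" after the scheme is not part of it.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


print(collapse_slashes("https://www.xojoc.pw/blog/////focus.html"))
# prints: https://www.xojoc.pw/blog/focus.html
```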

cleanurl.cleanurl parameters:

  • generic -> if True, don't use site-specific rules
  • respect_semantics -> if True, make sure the returned URL is still valid, although it may still contain some superfluous elements
  • host_remap -> whether to remap hosts. Example:
>>> import cleanurl
>>> cleanurl.cleanurl('https://threadreaderapp.com/thread/1453753924960219145', host_remap=True).url
'https://twitter.com/i/status/1453753924960219145'
>>> cleanurl.cleanurl('https://threadreaderapp.com/thread/1453753924960219145', host_remap=False).url
'https://threadreaderapp.com/thread/1453753924960219145'
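One way to picture host remapping is as a lookup table keyed by host. The sketch below is an illustration only: the table `HOST_REMAP`, the function `remap_host` and the single rule it contains are hypothetical and were written to mirror the example above, not taken from cleanurl's source.

```python
import re
from urllib.parse import urlsplit

# Hypothetical remap table: host -> (path pattern, target URL template).
HOST_REMAP = {
    "threadreaderapp.com": (
        re.compile(r"^/thread/(\d+)$"),
        "https://twitter.com/i/status/{0}",
    ),
}


def remap_host(url: str, host_remap: bool = True) -> str:
    """Hypothetical helper: rewrite a mirror URL to the site it mirrors."""
    if not host_remap:
        return url

    parts = urlsplit(url)
    rule = HOST_REMAP.get(parts.netloc.lower())
    if rule is None:
        return url  # no rule for this host

    pattern, template = rule
    match = pattern.match(parts.path)
    if match is None:
        return url  # host is known, but the path does not fit the rule

    return template.format(match.group(1))


url = "https://threadreaderapp.com/thread/1453753924960219145"
print(remap_host(url, host_remap=True))
# prints: https://twitter.com/i/status/1453753924960219145
print(remap_host(url, host_remap=False))
# prints: https://threadreaderapp.com/thread/1453753924960219145
```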

For more examples see the unit tests.

Why?

While there are some libraries that handle general cases, this library has website-specific rules that normalize URLs more aggressively.

Users

Initially used for discu.eu.

Who?

cleanurl was written by Alexandru Cojocaru.

License

cleanurl is Free Software and is released under the AGPLv3.
