cleanurl

Removes clutter from URLs and returns a canonicalized version

Install

pip install cleanurl

or if you're using poetry:

poetry add cleanurl

Usage

By default, cleanurl returns a cleaned URL without respecting semantics, so the result is not guaranteed to still be a valid address. For example:

>>> import cleanurl
>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/focus.html?utm_content=buffercf3b2&utm_medium=social&utm_source=snapchat.com&utm_campaign=buffe')
>>> r.url
'https://xojoc.pw/blog/focus'
>>> r.parsed_url
ParseResult(scheme='https', netloc='xojoc.pw', path='/blog/focus', params='', query='', fragment='')

The default parameters are useful if you want to get a canonical URL without caring if the resulting URL is still valid.
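To illustrate the kind of clutter being removed, here is a minimal standard-library sketch that strips `utm_*` tracking parameters from a query string. It is not cleanurl's implementation: the helper name `strip_tracking_params` and the prefix list are hypothetical, and cleanurl applies further rules (such as dropping `www.` and `.html` in the example above) that this sketch does not attempt.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical prefix list for illustration only; cleanurl's own rules are broader.
TRACKING_PREFIXES = ("utm_",)


def strip_tracking_params(url: str) -> str:
    """Return url with query parameters matching TRACKING_PREFIXES removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL with only the surviving parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking_params(
    "https://www.xojoc.pw/blog/focus.html?utm_medium=social&utm_source=snapchat.com&page=2"
))
# prints: https://www.xojoc.pw/blog/focus.html?page=2
```

Unlike cleanurl's default mode, this sketch leaves the host and path untouched, so the result still points at the same page.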

If you want a clean URL that is still valid, call it like this:

>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/////focus.html', respect_semantics=True)
>>> r.url
'https://xojoc.pw/blog/focus.html'
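One of the changes visible in this example, collapsing the repeated slashes in the path while keeping the `.html` extension, can be sketched with the standard library alone. This is an illustrative approximation rather than cleanurl's code; the function name `collapse_path_slashes` is hypothetical, and the sketch leaves the host as-is instead of dropping `www.`.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url: str) -> str:
    """Collapse runs of '/' in the path; scheme, host, query and fragment are untouched."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))


print(collapse_path_slashes("https://www.xojoc.pw/blog/////focus.html"))
# prints: https://www.xojoc.pw/blog/focus.html
```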

For more examples see the unit tests.

Why?

While there are some libraries that handle general cases, this library has website-specific rules that normalize URLs more aggressively.

Users

Initially used for discu.eu.

Who?

cleanurl was written by Alexandru Cojocaru.

License

cleanurl is Free Software and is released under the AGPLv3.
