Remove clutter from URLs and return a canonicalized version
cleanurl
Install
pip install cleanurl
or if you're using poetry:
poetry add cleanurl
Usage
By default, cleanurl returns a cleaned URL without respecting semantics. For example:
>>> import cleanurl
>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/focus.html?utm_content=buffercf3b2&utm_medium=social&utm_source=snapchat.com&utm_campaign=buffe')
>>> r.url
'https://xojoc.pw/blog/focus'
>>> r.parsed_url
ParseResult(scheme='https', netloc='xojoc.pw', path='/blog/focus', params='', query='', fragment='')
The default parameters are useful if you want to get a canonical URL without caring if the resulting URL is still valid.
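The parsed_url value shown above has the same shape as the standard library's urllib.parse.ParseResult. Assuming that is the type cleanurl returns, its fields can be read and recombined in the usual way. The sketch below uses only the standard library and rebuilds the result above by hand instead of calling cleanurl:

```python
from urllib.parse import ParseResult, urlparse

# Rebuilt by hand from the output shown above; not obtained by calling cleanurl.
parsed = ParseResult(scheme='https', netloc='xojoc.pw', path='/blog/focus',
                     params='', query='', fragment='')

host = parsed.netloc        # 'xojoc.pw'
rebuilt = parsed.geturl()   # 'https://xojoc.pw/blog/focus'

# Parsing the string form gives back an equal ParseResult.
assert urlparse(rebuilt) == parsed
```

Working with the parsed form is convenient when only one component, such as the host, is needed for grouping or deduplication.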
If you want a clean URL that is still valid, call it like this:
>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/////focus.html', respect_semantics=True)
>>> r.url
'https://www.xojoc.pw/blog/focus.html'
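To make the idea of a semantics-preserving cleanup concrete, here is a minimal sketch using only the standard library. It is not cleanurl's implementation: the helper name conservative_clean and the two rules it applies (dropping utm_* parameters and collapsing repeated slashes) are assumptions chosen to mirror the examples above.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def conservative_clean(url):
    """Hypothetical helper: drop utm_* parameters and collapse repeated slashes."""
    parts = urlparse(url)
    # Keep every query parameter except the utm_* tracking ones.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith('utm_')]
    # Collapse runs of '/' in the path; host and file extension are untouched.
    path = re.sub(r'/{2,}', '/', parts.path)
    return urlunparse(parts._replace(path=path, query=urlencode(kept)))


print(conservative_clean(
    'https://www.xojoc.pw/blog/////focus.html?utm_source=snapchat.com&id=7'))
# prints https://www.xojoc.pw/blog/focus.html?id=7
```

The host prefix and the .html suffix are left alone here, since removing them could produce a URL the server no longer answers for.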
cleanurl.cleanurl
parameters:
generic
-> if True, don't use site-specific rules
respect_semantics
-> if True, make sure the returned URL is still valid, although it may still contain some superfluous elements
host_remap
-> whether to remap hosts. Example:
>>> import cleanurl
>>> cleanurl.cleanurl('https://threadreaderapp.com/thread/1453753924960219145', host_remap=True).url
'https://twitter.com/i/status/1453753924960219145'
>>> cleanurl.cleanurl('https://threadreaderapp.com/thread/1453753924960219145', host_remap=False).url
'https://threadreaderapp.com/thread/1453753924960219145'
For more examples see the unit tests.
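The host_remap behaviour shown above can be pictured as a rewrite rule keyed on the host. The sketch below is a toy with a single rule taken from that example; the function name remap_thread is hypothetical, and how cleanurl actually stores or applies its rules is not described here.

```python
from urllib.parse import urlparse, urlunparse


def remap_thread(url, host_remap=True):
    """Hypothetical one-rule remapper modelled on the example above."""
    parts = urlparse(url)
    prefix = '/thread/'
    if (host_remap
            and parts.netloc == 'threadreaderapp.com'
            and parts.path.startswith(prefix)):
        # Everything after '/thread/' is treated as the status id.
        status_id = parts.path[len(prefix):]
        parts = parts._replace(netloc='twitter.com',
                               path='/i/status/' + status_id)
    return urlunparse(parts)


url = 'https://threadreaderapp.com/thread/1453753924960219145'
print(remap_thread(url))                    # https://twitter.com/i/status/1453753924960219145
print(remap_thread(url, host_remap=False))  # https://threadreaderapp.com/thread/1453753924960219145
```

Remapping of this kind lets two links to the same content compare equal even when they point at different front ends.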
Why?
While there are some libraries that handle general cases, this library has website-specific rules that normalize URLs more aggressively.
Users
Initially used for discu.eu.
Who?
cleanurl was written by Alexandru Cojocaru.
License
cleanurl is Free Software and is released under the AGPLv3.