Utilities to clean up URLs
Welcome to urlclean’s documentation!
urlclean provides:
a function to follow an HTTP redirect,
a function to follow an HTML META redirect,
a function to remove Urchin and Facebook tracker URL parameters,
plugins for further cleaning power,
a function that combines all of these to unshorten and resolve URLs.
Try it out from the command line:
python -m urlclean <some url>
Documentation for the Code
urlcleaner: a module that resolves redirected URLs and removes tracking URL parameters.
urlclean.weedparams(url)
Removes Urchin Tracker and Facebook surveillance parameters from URLs.
Args:
url (str): The URL to scrub of tracking parameters
Returns:
(str). The cleaned URL
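As a rough illustration of what this kind of scrubbing involves, here is a minimal sketch built on the standard library. It is not urlclean's implementation: the name `strip_trackers` and the parameter lists (`utm_*` for Urchin, `fb_*` and `fbclid` for Facebook) are assumptions made for this example, and the real list in urlclean may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these prefixes and names approximate Urchin (utm_*) and
# Facebook (fb_*, fbclid) tracking parameters. urlclean's own list may differ.
TRACKER_PREFIXES = ("utm_", "fb_")
TRACKER_NAMES = {"fbclid"}


def strip_trackers(url):
    """Return url with tracking query parameters removed (hypothetical helper)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKER_PREFIXES) and key not in TRACKER_NAMES
    ]
    # Rebuild the URL with only the surviving parameters, in original order
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Calling `strip_trackers("http://example.com/a?id=7&utm_source=x")` keeps `id=7` and drops the `utm_source` parameter.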
urlclean.httpresolve(url, ua=None, proxyhost='', proxyport='')
Resolves one redirection of an HTTP request.
Args:
url (str): The URL to follow for one redirect
ua (fn): A function returning a User Agent string (optional)
proxyhost (str): HTTP proxy server (optional)
proxyport (int): HTTP proxy server port (optional)
Returns:
(str, httplib.response). The resolved URL and the response from the HTTP query
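The core of resolving a single redirect is turning the `Location` header into an absolute URL, including the relative case mentioned in the changelog. The sketch below shows only that step and performs no network request. `next_url` and `REDIRECT_CODES` are hypothetical names for this example, not part of urlclean.

```python
from urllib.parse import urljoin

# Assumption: the set of status codes treated as redirects in this sketch
REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(url, status, location):
    """Return the target of one redirect, or url unchanged if there is none.

    status and location stand in for the status code and Location header
    of an HTTP response that was already fetched elsewhere.
    """
    if status in REDIRECT_CODES and location:
        # urljoin resolves a relative Location against the request URL
        return urljoin(url, location)
    return url
```

A relative `Location` such as `/landing` is resolved against the host of the original request, while an absolute one replaces the URL entirely.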
urlclean.unmeta(url, res)
Finds any META redirects in a httplib.response object that has text/html as its content type.
Args:
url (str): The URL to follow for one redirect
res (httplib.response): a httplib.response object
Returns:
(str). The resolved URL
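For readers unfamiliar with META redirects, the sketch below shows one way to pull the target out of a `<meta http-equiv="refresh">` tag using the standard library. It works on an HTML string rather than a response object, and `MetaRefreshParser` and `meta_redirect` are hypothetical names for this example. How urlclean itself parses the page is not documented here.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/"
        content = attrs.get("content") or ""
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", content, re.I)
        if match:
            self.target = match.group(1).strip()


def meta_redirect(url, html):
    """Return the META refresh target of html, or url if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    return urljoin(url, parser.target) if parser.target else url
```

As with HTTP redirects, a relative target is resolved against the URL of the page that contained it.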
urlclean.unshorten(url, cache=None, ua=None, **kwargs)
Resolves all HTTP/META redirects and optionally caches the results in any object supporting the __getitem__/__setitem__ interface.
Args:
url (str): The URL to resolve
cache (PersistentCryptoDict): an optional PersistentCryptoDict instance
ua (fn): A function returning a User Agent string (optional); the default is googlebot.
**kwargs (dict): optional proxy arguments for urlclean.httpresolve (default: localhost:8118)
Returns:
(str). The final cleaned URL
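The idea of following redirects until the URL settles, with a dict-like cache in front, can be sketched as follows. This is an illustration under stated assumptions, not urlclean's code: `follow_all`, `resolve_once`, and `max_hops` are hypothetical names, the single-step resolver is passed in so the example needs no network access, and the cache is assumed to raise `KeyError` on a miss as a plain dict does.

```python
def follow_all(url, resolve_once, cache=None, max_hops=10):
    """Apply resolve_once until the URL stops changing; memoise in cache.

    resolve_once is any callable mapping a URL to the next URL in the chain
    (or to the same URL when there is nothing left to follow).
    """
    if cache is not None:
        try:
            return cache[url]  # only __getitem__ is required for lookups
        except KeyError:
            pass
    seen = {url}
    current = url
    for _ in range(max_hops):
        nxt = resolve_once(current)
        if nxt == current or nxt in seen:
            break  # chain has settled, or we hit a redirect loop
        seen.add(nxt)
        current = nxt
    if cache is not None:
        cache[url] = current  # only __setitem__ is required for storing
    return current
```

With a plain dict as the cache, a second call for the same URL returns the stored result without invoking the resolver again. The hop limit and the `seen` set guard against redirect loops.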
Plugins
Plugins should have a convert function that receives and returns a URL. In case of an error an unchanged URL should be returned.
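A plugin following this contract might look like the sketch below. The behaviour chosen here (dropping the fragment) is purely an assumption for illustration. Only the `convert` name and the rule of returning the URL unchanged on error come from the description above. How plugins are registered with urlclean is not covered here.

```python
from urllib.parse import urlsplit, urlunsplit


def convert(url):
    """Hypothetical plugin: return url without its #fragment."""
    try:
        parts = urlsplit(url)
        return urlunsplit(parts._replace(fragment=""))
    except Exception:
        # Plugin contract: on any error, hand back the input untouched
        return url
```

Catching the exception inside the plugin keeps one misbehaving URL from interrupting the rest of the cleaning chain.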
Changelog
v0.5.4 - fixed httpresolve for relative urls
v0.5.1 - install/doc fixes
v0.5 - added plugins
File details
Details for the file urlclean-0.5.4.tar.gz.
File metadata
- Download URL: urlclean-0.5.4.tar.gz
- Upload date:
- Size: 5.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | a32d85e709acbd918c6487dad8fd793c14eaa680ea2cacd9d06f1e52ab13b96c
MD5 | 0f3bb6e3f911c66a957402102d7555a1
BLAKE2b-256 | a25c479e3bcae6984bd1bbedfaa1a0b354d1991b9f3ff5ff75e8aaf3a5d09eca
File details
Details for the file urlclean-0.5.4-py2.7.egg.
File metadata
- Download URL: urlclean-0.5.4-py2.7.egg
- Upload date:
- Size: 11.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | f134a3d1b10423f873429fa5817d3e71cff2d0fd00bc46f4ddfc6c656e1b057b
MD5 | 5f385680861fcb085fe06ee272599acb
BLAKE2b-256 | 7003b8e3aaff4b09f201a89c1b07bff6d869d0769ab8735e03e48f4b6540814d