domain_utils
A collection of util functions for extracting domains from urls.
Repo: https://github.com/mozilla/domain_utils
Install:
pip install domain_utils
Use:
    import domain_utils as du

    # Return just the url `my.domain.cloudfront.net/a/path/to/a/file.html`
    du.stem_url('https://my.domain.cloudfront.net/a/path/to/a/file.html?a=1')

    # Return just the eTLD+1 `domain.cloudfront.net`
    du.get_etld1('https://my.domain.cloudfront.net/a/path/to/a/file.html?a=1')

    # Get the port `5000`
    du.get_port('https://localhost:5000/a/path/to/a/file.html?a=1')

    # Get the scheme `wss`
    du.get_scheme('wss://somedomain.example.com/a/path/to/a/ws')
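For readers who want a feel for what these calls compute, here is a minimal standard-library sketch. It is not the package's implementation: `stem_url_sketch` is a hypothetical helper written only for illustration, and it assumes that "stemming" means dropping the scheme and query string, as the example output above suggests. The eTLD+1 case is left out because it needs a public suffix list, which the standard library does not provide.

```python
from urllib.parse import urlsplit


def stem_url_sketch(url):
    """Hypothetical helper: keep only the network location and the path.

    This drops the scheme, the query string, and any fragment. It is an
    illustration only, not the domain_utils implementation.
    """
    parts = urlsplit(url)
    return parts.netloc + parts.path


url = "https://my.domain.cloudfront.net/a/path/to/a/file.html?a=1"

# Stemmed URL: host plus path, without scheme or query string.
print(stem_url_sketch(url))

# Port and scheme are exposed directly by urlsplit.
print(urlsplit("https://localhost:5000/a/path/to/a/file.html?a=1").port)
print(urlsplit("wss://somedomain.example.com/a/path/to/a/ws").scheme)
```

The sketch only handles URLs that include a scheme; the package itself documents support for more cases, such as URLs without a scheme.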
This package was originally extracted from openwpm-utils.
Free software: Mozilla Public License
Documentation: https://domain-utils.readthedocs.io.
Community Participation Guidelines
This project is governed by Mozilla’s code of conduct and etiquette guidelines.
For more details, please read the Mozilla Community Participation Guidelines.
For more information on how to report violations of the Community Participation Guidelines, please read our How to Report page.
History
0.7.1 (2020-04-10)
Fix building on readthedocs.
0.7.0 (2020-04-10)
Thanks to new contributor @yabirgb for two PRs (#20 and #25) in this release.
API changes: #26 renamed get_stripped_url to stem_url and get_ps_plus_1 to get_etld1; the old method names continue to work. #22 updated the keyword arguments to get_stripped_url; the default behavior is essentially unchanged.
API changes (#26 and #22)
Support parsing ws/wss urls (#22)
Add get_port method (#25)
Add get_scheme method (#20)
Correct license declaration in setup.py (#24)
0.6.0 (2020-04-06)
Use tldextract for parsing domains (#12)
Use numpy style docstrings
Support case of no scheme and port in URL (#13)
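To illustrate why a URL with a port but no scheme needs special handling, the following sketch uses only the standard library. It assumes the changelog entry refers to inputs such as `localhost:5000/a/path`; `split_schemeless` is a hypothetical helper, not part of domain_utils, and the package may solve the problem differently.

```python
from urllib.parse import urlsplit


def split_schemeless(url):
    """Hypothetical helper: parse a URL that may lack a scheme.

    urlsplit only recognises a network location when it is introduced
    by '//', so a scheme-less input is prefixed with '//' first.
    """
    if "://" not in url:
        url = "//" + url
    return urlsplit(url)


# Without the prefix, the host ends up in the path, not in netloc.
naive = urlsplit("example.com/a/path")
print(repr(naive.netloc), naive.path)

# With the prefix, host, port, and path are separated as expected.
parts = split_schemeless("localhost:5000/a/path?a=1")
print(parts.hostname, parts.port, parts.path)
```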
0.5.0 (2020-04-03)
Remove support for python 3.5
Handle more cases in get_stripped_url and change default behavior:
handle a lack of scheme
add a boolean flag controlling whether non-HTTP URLs are returned; the default is to return them, which is a change in behavior, as previously they were not returned
Use netloc by default instead of hostname with a boolean flag to use hostname.
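The difference between netloc and hostname can be seen with the standard library's `urlsplit`. This is only an illustration of the two terms as `urllib.parse` defines them; it does not show how domain_utils exposes the choice, and the example URL is made up.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://user@Example.COM:8080/a/path")

# netloc is the raw network location: user info, host as written, and port.
print(parts.netloc)

# hostname is only the host, lower-cased, without user info or port.
print(parts.hostname)
```

Using netloc keeps the port, so `localhost:5000` and `localhost:8000` stay distinct; using hostname collapses them to the same value.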
0.4.0 (2020-03-25)
Remove py27 support
0.3.0 (2020-03-25)
Restore py27 support.
Last version with py27 support.
Remove tox
0.2.0 (2020-03-24)
Extracted from https://github.com/mozilla/openwpm-utils/blob/master/openwpm_utils/domain.py
Removed python 2 support and dependencies
Removed broken get_stripped_urls function
First release on PyPI.