Fastest URL parser in the world
The fastest domain extractor library, written in C++ with Python bindings.
The first complete library for parsing URLs in C++, Python, and the command line.
About The Project
Features
- Supports multiple languages: Python, C++, and the shell (command line)
- Intuitive interface, identical in C++ and Python
- Two separate classes, Url and Host, for cleaner code
- Supports the public suffix list for known multi-part suffixes such as "ac.ir"
- Supports unknown suffixes such as "google.comm" (it detects "comm" as the suffix)
- Updates the public suffix list automatically before each build and deployment
- Host properties:
  - subdomain
  - domain
  - domain_name
  - suffix
- Url properties:
  - protocol
  - userinfo
  - host (and all the host properties)
  - port
  - path
  - query
  - params
  - fragment
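The Host properties above depend on knowing where the suffix starts. As a rough illustration only (not liburlparser's implementation), the sketch below applies the standard public suffix list matching rules (the longest matching rule wins, `*.` rules are wildcards, `!` rules are exceptions, and an implicit `*` default means an unknown suffix like "comm" falls back to the last label) to a tiny hand-written excerpt in the list's text format. The names `load_rules`, `public_suffix`, and `split_host` are hypothetical, and treating `domain` as `domain_name` plus `suffix` is our assumption; check the library's own output for its exact definitions.

```python
# Illustrative sketch only: a stdlib re-implementation of public suffix list
# (PSL) matching, NOT liburlparser's code. All names here are hypothetical.

# A tiny hand-picked excerpt in PSL text format (the real list is far larger).
SAMPLE_PSL = """
// comments start with two slashes
com
ir
ac.ir
*.ck
!www.ck
"""


def load_rules(text):
    """Collect rules, skipping blank lines and // comments."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("//"):
            rules.add(line.split()[0].lower())  # a rule ends at the first whitespace
    return rules


def public_suffix(host, rules):
    """Return the public suffix of host using PSL matching rules."""
    labels = host.lower().rstrip(".").split(".")
    best = 1  # implicit default rule "*": an unknown suffix is the last label
    for i in range(len(labels)):
        candidate = labels[i:]
        name = ".".join(candidate)
        if "!" + name in rules:
            # exception rules win outright; the suffix drops their leftmost label
            return ".".join(candidate[1:])
        wildcard = ".".join(["*"] + candidate[1:])
        if name in rules or (len(candidate) > 1 and wildcard in rules):
            best = max(best, len(candidate))  # keep the longest matching rule
    return ".".join(labels[-best:])


def split_host(host, rules):
    """Split host into subdomain, domain_name, domain and suffix (assumed semantics)."""
    host = host.lower().rstrip(".")
    suffix = public_suffix(host, rules)
    rest = host[: len(host) - len(suffix)].rstrip(".")  # labels left of the suffix
    labels = rest.split(".") if rest else []
    domain_name = labels[-1] if labels else ""
    return {
        "subdomain": ".".join(labels[:-1]),
        "domain_name": domain_name,
        "domain": f"{domain_name}.{suffix}" if domain_name else "",
        "suffix": suffix,
    }


rules = load_rules(SAMPLE_PSL)
print(split_host("ee.aut.ac.ir", rules))
# {'subdomain': 'ee', 'domain_name': 'aut', 'domain': 'aut.ac.ir', 'suffix': 'ac.ir'}
print(split_host("google.comm", rules))
# {'subdomain': '', 'domain_name': 'google', 'domain': 'google.comm', 'suffix': 'comm'}
```

Because "ac.ir" is a listed rule, it beats the shorter "ir" rule; "comm" matches nothing, so the default rule takes the last label, which mirrors the "google.comm" behavior described in the feature list.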
Setup
C++:
Build steps:
git clone https://github.com/mohammadraziei/liburlparser
cd liburlparser
mkdir -p build; cd build
cmake ..
# Build the project:
make
# [Optional] run tests:
make test
# [Optional] make documents:
make docs
# [Optional] Run examples:
./example
# Install:
sudo make install
Python and Command Line:
Note: Python 3.8 or newer is required.
Installation
From PyPI with pip:
pip install liburlparser
To use psl.update for updating the public suffix list, install the online version instead:
pip install "liburlparser[online]"
Or
From Git with pip:
pip install git+https://github.com/mohammadraziei/liburlparser
Or
Manually:
git clone https://github.com/mohammadraziei/liburlparser
pip install ./liburlparser
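After installing by any of the methods above, a quick stdlib-only check like the following can confirm that the interpreter meets the 3.8 requirement and report which liburlparser version pip installed. It relies only on `importlib.metadata` (standard library since Python 3.8); the helper name `installed_version` is hypothetical.

```python
import sys

# liburlparser requires Python >= 3.8; importlib.metadata also first appeared in 3.8.
if sys.version_info < (3, 8):
    raise RuntimeError("liburlparser requires Python 3.8 or newer")

from importlib.metadata import PackageNotFoundError, version


def installed_version(package="liburlparser"):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found else "liburlparser is not installed")
```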
Usage
Command Line
python -m liburlparser --help # show help section
python -m liburlparser --version # show version
python -m liburlparser --url "https://mail.google.com/about" | jq # return as JSON
python -m liburlparser --host "mail.google.com" | jq # return as JSON
Python
liburlparser is designed to be intuitive, and every class has a help section:
import liburlparser
help(liburlparser)
print(liburlparser.__version__)
from liburlparser import Url, Host
help(Url)
help(Host)
Parsing URLs and hosts:
from liburlparser import Url, Host
## parse url:
url = Url("https://ee.aut.ac.ir/#id") # parse all parts of the URL
print(url, url.suffix, url.domain, url.fragment, url.host, url.to_dict(), url.to_json())
## parse host
host = url.host # ee.aut.ac.ir
# or
host = Host("ee.aut.ac.ir")
# or
host = Host.from_url("https://ee.aut.ac.ir/#id") # the fastest way for parsing host from url
# each of these returns a Host object that has already parsed the host part of the URL
print(host, host.domain, host.suffix, host.to_dict(), host.to_json())
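For comparison, Python's standard `urllib.parse` can produce most of the Url properties listed under Features, though it stops at the host string and does not split it into subdomain, domain, and suffix. The sketch below maps `urlsplit` results onto those property names; the helper `url_parts` is hypothetical, treating `params` as the parsed query string is our assumption, and none of this reflects how liburlparser computes its values.

```python
from urllib.parse import parse_qs, urlsplit


def url_parts(url):
    """Map urllib's split of a URL onto liburlparser-style property names (sketch)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "userinfo": parts.netloc.rpartition("@")[0],  # empty when there is no "@"
        "host": parts.hostname or "",  # lowercased, without userinfo or port
        "port": parts.port,  # int, or None when absent
        "path": parts.path,
        "query": parts.query,
        "params": parse_qs(parts.query),  # assumption: params = parsed query
        "fragment": parts.fragment,
    }


print(url_parts("https://user:pw@ee.aut.ac.ir:8080/about?lang=en#id"))
```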
There are also helper APIs that give better performance for small tasks:
# if you only need the host of a URL as a string, without any further parsing
host_str = Url.extract_host("https://ee.aut.ac.ir/about") # very fast
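The point of a helper like `Url.extract_host` is to return the host without building a full Url object. A rough stdlib illustration of that idea is plain string slicing. This is not the library's implementation, its behavior is our own choice (it drops userinfo and port, and does not handle IPv6 literals such as `[::1]`), and the function name `slice_host` is hypothetical.

```python
def slice_host(url):
    """Return the host part of url using only string searches (illustrative sketch)."""
    # Skip "scheme://" only if the text before "://" looks like a scheme,
    # so a "://" inside a query string is not mistaken for one.
    start = 0
    sep = url.find("://")
    if sep > 0 and all(c.isalnum() or c in "+-." for c in url[:sep]):
        start = sep + 3
    # The authority ends at the first "/", "?" or "#" after the scheme.
    end = len(url)
    for ch in "/?#":
        i = url.find(ch, start)
        if i != -1:
            end = min(end, i)
    authority = url[start:end]
    authority = authority.rpartition("@")[2]  # drop "user:pass@" if present
    return authority.partition(":")[0]  # drop ":port" if present


print(slice_host("https://ee.aut.ac.ir/about"))  # ee.aut.ac.ir
```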
If you are a fan of pydomainextractor, liburlparser offers a similar interface:
import pydomainextractor
extractor = pydomainextractor.DomainExtractor()
extractor.extract("ee.aut.ac.ir") # from host
extractor.extract_from_url("https://ee.aut.ac.ir/about") # from url
# alternatively you can use:
from liburlparser import Host
Host.extract("ee.aut.ac.ir") # from host
Host.extract_from_url("https://ee.aut.ac.ir/about") # from url
# the API is the same
C++
There are more examples in the examples folder.
#include "liburlparser"
...
/// for parsing a URL
TLD::Url url("https://ee.aut.ac.ir/about");
std::string domain = url.domain(); // also for subdomain, port, params, ...
/// for parsing a host
TLD::Host host("ee.aut.ac.ir");
// or
TLD::Host host_from_url_object = url.host();
// or
TLD::Host host_from_url = TLD::Host::fromUrl("https://ee.aut.ac.ir/about");
All of the methods available in Python can be used in C++ just as easily.
Performance
Extract From Host
Tests were run on a file containing 10 million random domains from various top-level domains (March 13, 2022).
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host | 1.12s |
PyDomainExtractor | pydomainextractor.extract | 1.50s |
publicsuffix2 | publicsuffix2.get_sld | 9.92s |
tldextract | __call__ | 29.23s |
tld | tld.parse_tld | 34.48s |
Extract From URL
The test was conducted on a file containing 1 million random URLs (March 13, 2022).
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host.from_url | 2.10s |
PyDomainExtractor | pydomainextractor.extract_from_url | 2.24s |
publicsuffix2 | publicsuffix2.get_sld | 10.84s |
tldextract | __call__ | 36.04s |
tld | tld.parse_tld | 57.87s |
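The benchmark script and data files behind these numbers are not included here. To time functions on your own data in the same spirit, a minimal harness along these lines is enough; `time_calls` is a hypothetical helper, the host list is a stand-in workload, and the commented liburlparser line assumes only the `Host(...)` constructor shown in the Python usage section.

```python
import time


def time_calls(fn, items, repeat=3):
    """Return the best wall-clock time (seconds) of calling fn on every item."""
    items = list(items)  # materialize once so reading the input is not timed
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for item in items:
            fn(item)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload; replace with real data, e.g. one host per line of a file.
hosts = ["ee.aut.ac.ir", "mail.google.com", "google.comm"] * 100_000
print(f"str.split: {time_calls(lambda h: h.split('.'), hosts):.3f}s")
# With liburlparser installed (assumption: Host(...) as in the usage section):
# from liburlparser import Host
# print(f"Host: {time_calls(Host, hosts):.3f}s")
```

Taking the best of several runs reduces noise from other processes; absolute times will differ from the table depending on hardware and data.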
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Project Link: