Fastest URL parser in the world
The fastest domain extractor library, written in C++ with Python bindings.
The first complete library for parsing URLs in C++, Python, and on the command line.
About The Project
Features
- Usable from multiple languages and environments: Python, C++, and the shell
- Intuitive interface that is identical in C++ and Python
- Provides two separate classes, Url and Host, to keep code clean
- Supports the public_suffix_list for known multi-label suffixes such as "ac.ir"
- Supports unknown suffixes such as "google.comm" (it detects "comm" as the suffix)
- Updates the public_suffix_list automatically before each build and deploy
- Host properties:
  - subdomain
  - domain
  - domain_name
  - suffix
- Url properties:
  - protocol
  - userinfo
  - host (and all of the Host properties)
  - port
  - path
  - query
  - params
  - fragment
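For readers unfamiliar with these component names, the sketch below shows where each Url property sits in a URL. It uses only Python's standard `urllib.parse.urlparse`, not liburlparser, and the example URL and the mapping from the standard library's field names to the property names above are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URL chosen so that every component is present.
parts = urlparse("https://user:pw@ee.aut.ac.ir:8080/about;v=1?lang=en#id")

# Left: property names from the feature list. Right: the standard-library
# field that plays the same role (an assumed mapping, for illustration only).
components = {
    "protocol": parts.scheme,
    "userinfo": f"{parts.username}:{parts.password}",
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "params": parts.params,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} {value}")
```

Note that the standard library does not split the host into subdomain, domain, and suffix; that step needs a suffix list.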
Setup
C++:
Build steps:
git clone https://github.com/mohammadraziei/liburlparser
cd liburlparser
mkdir -p build; cd build
cmake ..
# Build the project:
make
# [Optional] Run tests:
make test
# [Optional] Build the documentation:
make docs
# [Optional] Run examples:
./example
# Install:
sudo make install
Python and Command Line:
Requires Python >= 3.8.
Installation
pip install liburlparser
Or
pip install git+https://github.com/mohammadraziei/liburlparser
Or
git clone https://github.com/mohammadraziei/liburlparser
pip install ./liburlparser
Usage
Command Line
python -m liburlparser --help # show help section
python -m liburlparser --version # show version
python -m liburlparser --url "https://mail.google.com/about" | jq # return as JSON
python -m liburlparser --host "mail.google.com" | jq # return as JSON
Python
You can use liburlparser intuitively.
All classes have a help section:
import liburlparser
help(liburlparser)
print(liburlparser.__version__)
from liburlparser import Url, Host
help(Url)
help(Host)
Parse a URL with liburlparser:
from liburlparser import Url, Host
## Parse a URL:
url = Url("https://ee.aut.ac.ir/#id") # parses every part of the URL
print(url, url.suffix, url.domain, url.fragment, url.host, url.to_dict(), url.to_json())
## Parse a host:
host = url.host # ee.aut.ac.ir
# or
host = Host("ee.aut.ac.ir")
# or
host = Host.from_url("https://ee.aut.ac.ir/#id") # the fastest way to parse the host from a URL
# Each of these returns a Host object with the host part of the URL already parsed
print(host, host.domain, host.suffix, host.to_dict(), host.to_json())
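The Features list says that known multi-label suffixes such as "ac.ir" are recognised and that an unknown last label such as "comm" is still treated as the suffix. One way to get that behaviour is longest-suffix matching with a fallback, sketched below in plain Python. This is not liburlparser's implementation: `KNOWN_SUFFIXES` is a tiny stand-in for the public suffix list, and `split_host` and its `name` key are hypothetical names that are not meant to mirror the library's `domain` or `domain_name` properties.

```python
# Tiny stand-in for the public suffix list (assumption: the real list is far larger).
KNOWN_SUFFIXES = {"ir", "ac.ir", "com", "uk", "co.uk"}


def split_host(host: str) -> dict:
    """Split a host into subdomain, name, and suffix (illustrative sketch only)."""
    labels = host.lower().strip(".").split(".")
    if len(labels) < 2:
        # A single label such as "localhost" has no suffix to split off.
        return {"subdomain": "", "name": labels[0], "suffix": ""}

    # Fallback for unknown suffixes: treat the last label as the suffix.
    suffix_start = len(labels) - 1
    # Try the longest candidate first ("aut.ac.ir", then "ac.ir", then "ir").
    # Starting at index 1 always leaves at least one label for the name.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            suffix_start = i
            break

    return {
        "subdomain": ".".join(labels[: suffix_start - 1]),
        "name": labels[suffix_start - 1],
        "suffix": ".".join(labels[suffix_start:]),
    }


print(split_host("ee.aut.ac.ir"))
print(split_host("google.comm"))
```

The real public suffix list also contains wildcard and exception rules, which this sketch ignores.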
There are also some helper APIs that give better performance for small tasks:
# If you only need the host of a URL as a string, without any parsing:
host_str = Url.extract_host("https://ee.aut.ac.ir/about") # very fast
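To illustrate why pulling out the host as a plain string can be cheaper than a full parse, here is a minimal sketch that does it with string slicing alone. It is an assumption about the general idea, not the code behind `Url.extract_host`; the function name `extract_host_sketch` is hypothetical, and the sketch does not handle bracketed IPv6 hosts.

```python
def extract_host_sketch(url: str) -> str:
    """Return the host part of a URL using only string slicing (illustrative)."""
    # Drop the scheme ("https://") if there is one.
    _, sep, rest = url.partition("://")
    if not sep:
        rest = url

    # The authority section ends at the first "/", "?" or "#".
    end = len(rest)
    for ch in "/?#":
        pos = rest.find(ch)
        if pos != -1:
            end = min(end, pos)
    authority = rest[:end]

    # Strip userinfo ("user:pw@") and then a trailing port (":8080").
    host = authority.rpartition("@")[2]
    host = host.partition(":")[0]
    return host


print(extract_host_sketch("https://ee.aut.ac.ir/about"))
```

No suffix lookup or object construction happens here, which is where the savings over a full parse would come from.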
If you are a fan of pydomainextractor, there is a similar interface:
import pydomainextractor
extractor = pydomainextractor.DomainExtractor()
extractor.extract("ee.aut.ac.ir") # from host
extractor.extract_from_url("https://ee.aut.ac.ir/about") # from url
# alternatively you can use:
from liburlparser import Host
Host.extract("ee.aut.ac.ir") # from host
Host.extract_from_url("https://ee.aut.ac.ir/about") # from url
# As you can see, the API is the same
C++
There are some examples in the examples folder.
#include "liburlparser"
...
/// for parsing url
TLD::Url url("https://ee.aut.ac.ir/about");
std::string domain = url.domain(); // also for subdomain, port, params, ...
/// for parsing host
TLD::Host host("ee.aut.ac.ir");
// or
TLD::Host host = url.host();
// or
TLD::Host host = TLD::Host::fromUrl("https://ee.aut.ac.ir/about");
As you can see, the methods available in Python can be used just as easily in C++.
Performance
Extract From Host
Tests were run on a file containing 10 million random domains from various top-level domains (Mar. 13th, 2022).
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host | 1.12s |
PyDomainExtractor | pydomainextractor.extract | 1.50s |
publicsuffix2 | publicsuffix2.get_sld | 9.92s |
tldextract | __call__ | 29.23s |
tld | tld.parse_tld | 34.48s |
Extract From URL
Tests were run on a file containing 1 million random URLs (Mar. 13th, 2022).
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host.from_url | 2.20s |
PyDomainExtractor | pydomainextractor.extract_from_url | 2.24s |
publicsuffix2 | publicsuffix2.get_sld | 10.84s |
tldextract | __call__ | 36.04s |
tld | tld.parse_tld | 57.87s |
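The exact benchmark script is not shown here, so the following is only a sketch of how timings like those in the tables could be collected. The helper `time_extractor`, the stand-in extractor `last_two_labels`, and the sample hosts are all hypothetical; to compare real libraries, pass each library's extraction function in place of the stand-in and load the inputs from your own file.

```python
import time
from typing import Callable, Iterable


def time_extractor(extract: Callable[[str], object], inputs: Iterable[str]) -> float:
    """Return the wall-clock seconds needed to call `extract` on every input."""
    items = list(inputs)  # materialise first so reading the inputs is not timed
    start = time.perf_counter()
    for item in items:
        extract(item)
    return time.perf_counter() - start


def last_two_labels(host: str) -> str:
    """Stand-in extractor (assumption): a naive split, not a real suffix lookup."""
    return ".".join(host.split(".")[-2:])


hosts = ["mail.google.com", "ee.aut.ac.ir"] * 1000
elapsed = time_extractor(last_two_labels, hosts)
print(f"{len(hosts)} hosts in {elapsed:.4f}s")
```

Running each candidate several times and keeping the best time helps reduce noise from other processes.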
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Project Link: https://github.com/mohammadraziei/liburlparser