Fastest URL parser in the world
Fastest domain extractor library, written in C++ with Python bindings.
The first complete library for parsing URLs from C++, Python, and the command line.
About The Project
Features
- Multiple programming languages are supported: Python, C++, and Shell
- Intuitive interface that is identical in C++ and Python
- Provides two separate classes, Url and Host, to keep code clean
- Supports public_suffix_list for known multi-part suffixes such as "ac.ir"
- Supports unknown suffixes such as "google.comm" (it detects "comm" as the suffix)
- Updates public_suffix_list automatically before each build and deploy
- Host properties:
  - subdomain
  - domain
  - domain_name
  - suffix
- Url properties:
  - protocol
  - userinfo
  - host (and all the host properties)
  - port
  - path
  - query
  - params
  - fragment
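The suffix handling described in the feature list can be illustrated with a minimal pure-Python sketch. It is not liburlparser's implementation: `KNOWN_SUFFIXES` is a tiny hypothetical stand-in for the public suffix list, `split_host` is a made-up helper name, and the returned keys follow the common subdomain/domain/suffix convention rather than any confirmed liburlparser output.

```python
# Tiny hypothetical stand-in for the public suffix list (the real list is far larger).
KNOWN_SUFFIXES = {"com", "ir", "ac.ir", "uk", "co.uk"}


def split_host(host: str) -> dict:
    """Split a host into subdomain, domain and suffix (illustrative only)."""
    labels = host.lower().strip(".").split(".")
    # Try the longest candidate suffix first, always leaving one label for the domain.
    suffix_len = 0
    for start in range(1, len(labels)):
        if ".".join(labels[start:]) in KNOWN_SUFFIXES:
            suffix_len = len(labels) - start
            break
    if suffix_len == 0 and len(labels) > 1:
        # Unknown suffix: fall back to treating the last label as the suffix.
        suffix_len = 1
    split_at = len(labels) - suffix_len
    suffix = ".".join(labels[split_at:]) if suffix_len else ""
    domain = labels[split_at - 1]
    subdomain = ".".join(labels[: split_at - 1]) if split_at > 1 else ""
    return {"subdomain": subdomain, "domain": domain, "suffix": suffix}


print(split_host("ee.aut.ac.ir"))  # multi-part suffix "ac.ir" is found in the set
print(split_host("google.comm"))   # "comm" is not in the set, so the fallback applies
```

Checking the longest candidate first is what lets "ac.ir" win over "ir" when both are listed.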
Setup
C++:
Build steps:
git clone https://github.com/mohammadraziei/liburlparser
cd liburlparser
mkdir -p build; cd build
cmake ..
# Build the project:
make
# [Optional] Run tests:
make test
# [Optional] Build the documentation:
make docs
# [Optional] Run the examples:
./example
# Install:
sudo make install
Python and Command Line:
Note that Python >= 3.8 is required.
Installation
pip install liburlparser
Or
pip install git+https://github.com/mohammadraziei/liburlparser
Or
git clone https://github.com/mohammadraziei/liburlparser
pip install ./liburlparser
Usage
Command Line
python -m liburlparser --help # show help section
python -m liburlparser --version # show version
python -m liburlparser --url "https://mail.google.com/about" | jq # return as JSON
python -m liburlparser --host "mail.google.com" | jq # return as JSON
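Because the command line prints JSON, its output can also be consumed from a script instead of being piped to jq. The sketch below assumes only what the commands above show (the `--url` flag and JSON on standard output); `run_json` and `parse_url_via_cli` are hypothetical helper names, and the keys of the returned dictionary are not documented here, so none are assumed.

```python
import json
import subprocess
import sys


def run_json(command: list) -> dict:
    """Run a command and decode its standard output as JSON."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def parse_url_via_cli(url: str) -> dict:
    """Call `python -m liburlparser --url <url>` (requires liburlparser to be installed)."""
    return run_json([sys.executable, "-m", "liburlparser", "--url", url])
```

With the package installed, `parse_url_via_cli("https://mail.google.com/about")` should return the same data that the jq pipeline displays.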
Python
You can use liburlparser intuitively.
Every class has a help section:
import liburlparser
help(liburlparser)
print(liburlparser.__version__)
from liburlparser import Url, Host
help(Url)
help(Host)
Parse a URL and a host:
from liburlparser import Url, Host
## parse a URL:
url = Url("https://ee.aut.ac.ir/#id") # parse all parts of the URL
print(url, url.suffix, url.domain, url.fragment, url.host, url.to_dict(), url.to_json())
## parse a host:
host = url.host # ee.aut.ac.ir
# or
host = Host("ee.aut.ac.ir")
# or
host = Host.from_url("https://ee.aut.ac.ir/#id") # the fastest way to parse the host from a URL
# all of these return a Host object with the host part of the URL already parsed
print(host, host.domain, host.suffix, host.to_dict(), host.to_json())
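To make the Url property names concrete, the sketch below splits a URL that contains every component using Python's standard `urllib.parse`. The mapping from the README's property names to the standard-library fields is an assumption for illustration; liburlparser may define some of them differently (for example, the type of `port` or the exact meaning of `params`).

```python
from urllib.parse import urlparse

# Example URL chosen to contain every component; it is not taken from the project.
parts = urlparse("https://user:secret@ee.aut.ac.ir:8080/about;v=1?lang=en#id")

# Left: property names from the README. Right: standard-library equivalents (assumed mapping).
components = {
    "protocol": parts.scheme,
    "userinfo": f"{parts.username}:{parts.password}",
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "params": parts.params,
    "query": parts.query,
    "fragment": parts.fragment,
}
for name, value in components.items():
    print(f"{name:9} {value}")
```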
There are also some helper APIs that give better performance for small tasks:
# if you only need the host of a URL as a string, without any parsing
host_str = Url.extract_host("https://ee.aut.ac.ir/about") # very fast
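As a rough idea of why pulling out the host as a string can be cheap, here is a pure-Python sketch that uses only string slicing. It is an assumption about the general approach, not the library's actual code; `extract_host_sketch` is a hypothetical name, and IPv6 literals are deliberately not handled.

```python
def extract_host_sketch(url: str) -> str:
    """Return the host part of a URL using plain string slicing (illustrative)."""
    # Skip the scheme ("https://", "ftp://", ...) only if it precedes any "/", "?" or "#".
    scheme_end = url.find("://")
    if scheme_end != -1 and not any(char in url[:scheme_end] for char in "/?#"):
        start = scheme_end + 3
    else:
        start = 0
    # The authority ends at the first "/", "?" or "#" after the scheme.
    end = len(url)
    for delimiter in "/?#":
        position = url.find(delimiter, start)
        if position != -1:
            end = min(end, position)
    authority = url[start:end]
    # Drop "user:password@" if present.
    authority = authority.rsplit("@", 1)[-1]
    # Drop ":port" if present (IPv6 literals are not handled in this sketch).
    return authority.split(":", 1)[0]


print(extract_host_sketch("https://ee.aut.ac.ir/about"))
```

No suffix lookup happens here, which is the difference from building a full Host object.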
If you are a fan of pydomainextractor, liburlparser offers a similar interface:
import pydomainextractor
extractor = pydomainextractor.DomainExtractor()
extractor.extract("ee.aut.ac.ir") # from host
extractor.extract_from_url("https://ee.aut.ac.ir/about") # from url
# alternatively you can use:
from liburlparser import Host
Host.extract("ee.aut.ac.ir") # from host
Host.extract_from_url("https://ee.aut.ac.ir/about") # from url
# as you can see, the API is the same
C++
There are some examples in the examples folder.
#include "liburlparser"
...
/// for parsing url
TLD::Url url("https://ee.aut.ac.ir/about");
std::string domain = url.domain(); // also for subdomain, port, params, ...
/// for parsing host (distinct variable names so the snippet compiles in one scope)
TLD::Host host("ee.aut.ac.ir");
// or
TLD::Host host2 = url.host();
// or
TLD::Host host3 = TLD::Host::fromUrl("https://ee.aut.ac.ir/about");
As you can see, every method available in Python can be used just as easily in C++.
Performance
Extract From Host
Tests were run on a file containing 10 million random domains from various top-level domains (Mar. 13th, 2022)
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host | 1.12s |
PyDomainExtractor | pydomainextractor.extract | 1.50s |
publicsuffix2 | publicsuffix2.get_sld | 9.92s |
tldextract | __call__ | 29.23s |
tld | tld.parse_tld | 34.48s |
Extract From URL
Tests were run on a file containing 1 million random URLs (Mar. 13th, 2022)
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host.from_url | 2.20s |
PyDomainExtractor | pydomainextractor.extract_from_url | 2.24s |
publicsuffix2 | publicsuffix2.get_sld | 10.84s |
tldextract | __call__ | 36.04s |
tld | tld.parse_tld | 57.87s |
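A comparable measurement can be sketched with a small timing harness. This is not the script behind the tables above: the inputs are synthetic, `time_extractor` and `stdlib_host` are hypothetical helpers, and the standard-library baseline is included only so that the sketch runs without any third-party package. `Host.from_url` is timed only when liburlparser is installed.

```python
import time
from urllib.parse import urlsplit


def time_extractor(extract, inputs, repeat=3):
    """Return the best wall-clock time in seconds for calling extract on every input."""
    best = float("inf")
    for _ in range(repeat):
        started = time.perf_counter()
        for item in inputs:
            extract(item)
        best = min(best, time.perf_counter() - started)
    return best


def stdlib_host(url):
    """Baseline: host extraction with the standard library."""
    return urlsplit(url).hostname


# Synthetic URLs; the benchmark above used a file of random URLs instead.
urls = [f"https://sub{i}.example.com/path?q={i}" for i in range(10_000)]
candidates = {"urllib.parse.urlsplit": stdlib_host}

try:
    from liburlparser import Host  # optional; only timed when installed
    candidates["liburlparser.Host.from_url"] = Host.from_url
except ImportError:
    pass

for name, function in candidates.items():
    print(f"{name}: {time_extractor(function, urls):.4f}s")
```

Taking the best of several runs reduces noise from other processes; absolute numbers will differ from the tables because the inputs and hardware differ.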
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Project Link: https://github.com/mohammadraziei/liburlparser