Fastest URL parser in the world
Fastest domain extractor library, written in C++ with Python bindings.
The first complete library for parsing URLs in C++, in Python, and on the command line.
About The Project
liburlparser is a powerful domain extractor library written in C++ with Python bindings. It provides efficient URL parsing capabilities for both C++ and Python, making it a valuable tool for projects that involve working with web addresses.
Features
Here are some key features of liburlparser:
- Multiple Language Support:
  - liburlparser can be used from multiple languages, including Python, C++, and Shell.
  - It offers an intuitive interface that remains consistent across both C++ and Python.
- Clean Code Design:
  - The library provides two separate classes: Url and Host.
  - This separation allows for cleaner and more organized code when dealing with URLs.
- Public Suffix List Support:
  - liburlparser supports known combinatorial suffixes (e.g., "ac.ir") using the public_suffix_list.
  - It can also handle unknown suffixes (e.g., "comm" in "google.comm").
- Automatic Public Suffix List Updates:
  - Before each build and deployment, liburlparser updates the public_suffix_list automatically.
- Host Properties:
  - The Host class includes properties such as subdomain, domain, domain name, and suffix.
- URL Properties:
  - The Url class provides properties like protocol, userinfo, host (and all host properties), port, path, query parameters, and fragment.
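To make the suffix handling described above concrete, here is a minimal sketch of splitting a host against a suffix list. It is an illustration only, not liburlparser's implementation: the `KNOWN_SUFFIXES` set is a tiny hand-written stand-in for the real public suffix list, and `split_host` and the dictionary keys are hypothetical names chosen for this example.

```python
# Toy suffix set for illustration only; the real public suffix list is far larger.
KNOWN_SUFFIXES = {"ir", "ac.ir", "com", "uk", "co.uk"}


def split_host(host: str) -> dict:
    """Split a host into subdomain, domain, and suffix (illustrative sketch)."""
    labels = host.lower().strip(".").split(".")
    # Try the longest candidate suffix first so "ac.ir" wins over "ir".
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in KNOWN_SUFFIXES:
            break
    else:
        # Unknown suffix: fall back to treating the last label as the suffix.
        start = len(labels) - 1
    suffix = ".".join(labels[start:])
    domain = labels[start - 1] if start >= 1 else ""
    subdomain = ".".join(labels[: start - 1]) if start >= 2 else ""
    return {"subdomain": subdomain, "domain": domain, "suffix": suffix}


print(split_host("ee.aut.ac.ir"))  # known combinatorial suffix "ac.ir"
print(split_host("google.comm"))   # unknown suffix "comm"
```

Checking the longest candidate first is what lets a two-label suffix such as "ac.ir" take precedence over the plain "ir" entry, while the fallback branch covers suffixes that are not in the list at all.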
Usage
Command Line
python -m liburlparser --help # show help section
python -m liburlparser --version # show version
python -m liburlparser --url "https://mail.google.com/about" | jq # return as JSON
python -m liburlparser --host "mail.google.com" | jq # return as JSON
Python
liburlparser is designed to be intuitive to use.
Every class has a help section:
import liburlparser
help(liburlparser)
print(liburlparser.__version__)
from liburlparser import Url, Host
help(Url)
help(Host)
Parse a URL and a host:
from liburlparser import Url, Host
## parse a URL
url = Url("https://ee.aut.ac.ir/#id") # parses every part of the URL
print(url, url.suffix, url.domain, url.fragment, url.host, url.to_dict(), url.to_json())
## parse a host
host = url.host # ee.aut.ac.ir
# or
host = Host("ee.aut.ac.ir")
# or
host = Host.from_url("https://ee.aut.ac.ir/#id") # the fastest way to parse the host from a URL
# each of these returns a Host object with the host part already parsed
print(host, host.domain, host.suffix, host.to_dict(), host.to_json())
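When trying the library out, it can help to cross-check the URL parts against the standard library. The sketch below uses only `urllib.parse`; the function name `url_parts` and the dictionary keys are labels chosen here to mirror the property list in the Features section, and are not necessarily the keys returned by `url.to_dict()`. Note that the standard library knows nothing about public suffixes, so it cannot split the host into subdomain, domain, and suffix.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Break a URL into its components using only the standard library."""
    parts = urlsplit(url)
    # Everything before the last "@" in the authority is the userinfo.
    userinfo = parts.netloc.rpartition("@")[0]
    return {
        "protocol": parts.scheme,
        "userinfo": userinfo,
        "host": parts.hostname or "",
        "port": parts.port,  # None when the URL has no explicit port
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


print(url_parts("https://ee.aut.ac.ir/#id"))
```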
There are also some helper APIs that give better performance for small tasks:
# if you need to extract the host of url as a string without any parsing
host_str = Url.extract_host("https://ee.aut.ac.ir/about") # very fast
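To see why pulling out the host string can be much cheaper than a full parse, here is a rough sketch of the idea using plain string slicing. This is an assumption about the general technique, not the code behind `Url.extract_host`; `extract_host_sketch` is a hypothetical helper written for this example.

```python
def extract_host_sketch(url: str) -> str:
    """Return the host part of a URL using plain string slicing (hypothetical helper)."""
    # Skip the scheme separator if present.
    scheme_end = url.find("://")
    start = scheme_end + 3 if scheme_end != -1 else 0
    # The authority ends at the first "/", "?" or "#".
    end = len(url)
    for delimiter in "/?#":
        pos = url.find(delimiter, start)
        if pos != -1:
            end = min(end, pos)
    authority = url[start:end]
    # Drop userinfo ("user:pass@") and a trailing ":port".
    host = authority.rpartition("@")[2]
    if not host.startswith("["):  # leave bracketed IPv6 literals untouched
        host = host.partition(":")[0]
    return host


print(extract_host_sketch("https://ee.aut.ac.ir/about"))
```

No suffix lookup happens here, which is why this kind of extraction suits cases where only the raw host string is needed.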
If you are a fan of pydomainextractor, liburlparser offers a similar interface:
import pydomainextractor
extractor = pydomainextractor.DomainExtractor()
extractor.extract("ee.aut.ac.ir") # from host
extractor.extract_from_url("https://ee.aut.ac.ir/about") # from url
# alternatively you can use:
from liburlparser import Host
Host.extract("ee.aut.ac.ir") # from host
Host.extract_from_url("https://ee.aut.ac.ir/about") # from url
# as you can see, the two APIs mirror each other
C++
There are some examples in the examples folder:
#include "liburlparser"
...
/// for parsing url
TLD::Url url("https://ee.aut.ac.ir/about");
std::string domain = url.domain(); // also for subdomain, port, params, ...
/// for parsing host (each variable uses its own name so the snippet compiles in one scope)
TLD::Host host("ee.aut.ac.ir");
// or
TLD::Host host_from_parsed_url = url.host();
// or
TLD::Host host_from_url = TLD::Host::fromUrl("https://ee.aut.ac.ir/about");
As you can see, all the methods available in Python can be used from C++ just as easily.
Installation
C++:
Build steps:
git clone https://github.com/mohammadraziei/liburlparser
cd liburlparser
mkdir -p build; cd build
cmake ..
# Build the project:
make
# [Optional] run tests:
make test
# [Optional] make documents:
make docs
# [Optional] Run examples:
./example
# Install:
sudo make install
Python and Command Line:
Note that Python >= 3.8 is required.
Installation
pip from PyPI
pip install liburlparser
If you want to use psl.update to update the public suffix list, you must install the online version:
pip install "liburlparser[online]"
Or
pip from git
pip install git+https://github.com/mohammadraziei/liburlparser
Or
manually
git clone https://github.com/mohammadraziei/liburlparser
pip install ./liburlparser
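After installing, a quick check like the following can confirm that the interpreter meets the Python >= 3.8 requirement and that the package is importable. It uses only the standard library; `environment_report` is a hypothetical helper name for this sketch.

```python
import importlib.util
import sys


def environment_report(package: str = "liburlparser", minimum=(3, 8)) -> dict:
    """Report whether the Python version is new enough and the package is installed."""
    python_ok = sys.version_info >= minimum
    # find_spec returns None when a top-level package cannot be found.
    installed = importlib.util.find_spec(package) is not None
    return {"python_ok": python_ok, "installed": installed}


print(environment_report())
```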
Performance
Extract From Host
Tests were run on a file containing 10 million random domains from various top-level domains (Mar. 13th, 2022)
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host | 1.12s |
PyDomainExtractor | pydomainextractor.extract | 1.50s |
publicsuffix2 | publicsuffix2.get_sld | 9.92s |
tldextract | __call__ | 29.23s |
tld | tld.parse_tld | 34.48s |
Extract From URL
The test was run on a file containing 1 million random URLs (Mar. 13th, 2022)
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host.from_url | 2.10s |
PyDomainExtractor | pydomainextractor.extract_from_url | 2.24s |
publicsuffix2 | publicsuffix2.get_sld | 10.84s |
tldextract | __call__ | 36.04s |
tld | tld.parse_tld | 57.87s |
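The README does not describe the benchmark harness itself, so the following is only a sketch of how timings like these could be reproduced on your own data. `time_extractor` and the stand-in `naive_suffix` function are hypothetical names for this example; to compare real libraries, pass a callable such as `liburlparser.Host` or `liburlparser.Host.from_url` in place of the stand-in.

```python
import time
from typing import Callable, Iterable


def time_extractor(extract: Callable[[str], object], inputs: Iterable[str]) -> float:
    """Return the wall-clock seconds spent calling `extract` on every input."""
    items = list(inputs)  # materialize first so reading the data is not timed
    started = time.perf_counter()
    for item in items:
        extract(item)
    return time.perf_counter() - started


# Stand-in extractor so the sketch runs without third-party packages.
def naive_suffix(host: str) -> str:
    return host.rpartition(".")[2]


hosts = ["ee.aut.ac.ir", "mail.google.com"] * 1000
print(f"naive_suffix: {time_extractor(naive_suffix, hosts):.4f}s")
```

Absolute numbers depend heavily on hardware and input data, so results from a harness like this are best read as relative comparisons on the same machine.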
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Project Link: