Fastest Url parser in the world
The fastest domain extractor library, written in C++ with Python bindings.
The first complete library for parsing URLs from C++, Python, and the command line.
About The Project
Features
- Supports multiple programming languages: Python, C++, and shell
- Intuitive interface that is identical in C++ and Python
- Provides two separate classes, Url and Host, to keep code clean
- Supports the public_suffix_list for known multi-part suffixes such as "ac.ir"
- Supports unknown suffixes such as "google.comm" (it detects "comm" as the suffix)
- Updates the public_suffix_list automatically before each build and deploy
- Host properties:
  - subdomain
  - domain
  - domain_name
  - suffix
- Url properties:
  - protocol
  - userinfo
  - host (and all the host properties)
  - port
  - path
  - query
  - params
  - fragment
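To make the suffix handling in the list above more concrete, here is a minimal sketch of how a host could be split using a suffix set with a fallback for unknown suffixes. It is not the library's implementation: the function `split_host`, the tiny `KNOWN_SUFFIXES` set, and the result keys are all hypothetical, and the real public suffix list also has wildcard and exception rules that this sketch ignores.

```python
# Illustrative only: a tiny hand-written suffix set, not the real public suffix list.
KNOWN_SUFFIXES = {"ir", "ac.ir", "com", "uk", "co.uk"}


def split_host(host: str, suffixes=KNOWN_SUFFIXES) -> dict:
    """Split a host into subdomain, name, and suffix (hypothetical field names)."""
    labels = host.lower().strip(".").split(".")
    if len(labels) == 1:
        # A bare label such as "localhost" has no suffix at all.
        return {"subdomain": "", "name": labels[0], "suffix": ""}

    # Fallback for unknown suffixes: treat the last label as the suffix,
    # so "google.comm" still yields "comm".
    suffix_start = len(labels) - 1
    # Try the longest candidate first so "ac.ir" wins over "ir".
    # Starting at 1 always leaves at least one label for the name.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            suffix_start = i
            break

    return {
        "subdomain": ".".join(labels[: suffix_start - 1]),
        "name": labels[suffix_start - 1],
        "suffix": ".".join(labels[suffix_start:]),
    }


print(split_host("ee.aut.ac.ir"))
print(split_host("google.comm"))
```

The key names here are the sketch's own. How they correspond to the `domain` and `domain_name` properties of Host is not stated in this README, so check `help(Host)` for the actual meaning.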
Setup
C++:
Build steps:
git clone https://github.com/mohammadraziei/liburlparser
cd liburlparser
mkdir -p build; cd build
cmake ..
# Build the project:
make
# [Optional] run tests:
make test
# [Optional] make documents:
make docs
# [Optional] Run examples:
./example
# Install:
sudo make install
Python and Command Line:
Note that Python >= 3.8 is required.
Installation
pip install liburlparser
Or
pip install git+https://github.com/mohammadraziei/liburlparser
Or
git clone https://github.com/mohammadraziei/liburlparser
pip install ./liburlparser
Usage
Command Line
python -m liburlparser --help # show help section
python -m liburlparser --version # show version
python -m liburlparser --url "https://mail.google.com/about" | jq # return as json
python -m liburlparser --host "mail.google.com" | jq # return as json
Python
liburlparser is intuitive to use.
Every class has a help section:
import liburlparser
help(liburlparser)
print(liburlparser.__version__)
from liburlparser import Url, Host
help(Url)
help(Host)
Parse a URL and a host:
from liburlparser import Url, Host
# Parse a URL:
url = Url("https://ee.aut.ac.ir/#id") # parses every part of the URL
print(url, url.suffix, url.domain, url.fragment, url.host, url.to_dict(), url.to_json())
# Parse a host:
host = url.host # ee.aut.ac.ir
# or
host = Host("ee.aut.ac.ir")
# or
host = Host.from_url("https://ee.aut.ac.ir/#id") # the fastest way to parse the host from a URL
# All of these return a Host object with the host part of the URL already parsed
print(host, host.domain, host.suffix, host.to_dict(), host.to_json())
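To see which part of a URL each Url property name refers to, the sketch below splits a fuller URL with Python's standard `urllib.parse` module and reports the pieces under the names from the Features list. It does not use liburlparser: the helper `url_parts` is hypothetical, and the mapping of names is an assumption, so the library's own values may differ in detail (for example, how it represents a missing port).

```python
from urllib.parse import urlparse


def url_parts(url: str) -> dict:
    """Return URL components keyed by the property names from the Features list."""
    p = urlparse(url)
    # Rebuild "user:password" from the two fields urlparse exposes.
    userinfo = p.username or ""
    if p.password is not None:
        userinfo += ":" + p.password
    return {
        "protocol": p.scheme,
        "userinfo": userinfo,
        "host": p.hostname or "",
        "port": p.port,  # int, or None when the URL has no port
        "path": p.path,
        "params": p.params,  # the ";..." part of the last path segment
        "query": p.query,
        "fragment": p.fragment,
    }


parts = url_parts("https://user:pw@ee.aut.ac.ir:8080/about;v=1?q=1#id")
for name, value in parts.items():
    print(f"{name:9} {value!r}")
```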
Some helper APIs are also provided for better performance on small tasks:
# If you only need the host of a URL as a string, without any parsing:
host_str = Url.extract_host("https://ee.aut.ac.ir/about") # very fast
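The idea of pulling out the host as a plain string, without building a parsed object, can be sketched with simple slicing. This is only an illustration of the technique, not how `Url.extract_host` is implemented. The function below is hypothetical, and whether the library also strips userinfo and port the way this sketch does is an assumption.

```python
def extract_host(url: str) -> str:
    """Return the host substring of a URL using plain string slicing."""
    # Skip the scheme ("https://") if one is present.
    start = url.find("://")
    start = start + 3 if start != -1 else 0

    # The authority ends at the first "/", "?" or "#" after the scheme.
    end = len(url)
    for sep in "/?#":
        pos = url.find(sep, start)
        if pos != -1:
            end = min(end, pos)
    authority = url[start:end]

    # Drop userinfo ("user:pw@") and a trailing numeric port (":8080").
    authority = authority.rsplit("@", 1)[-1]
    host, _, port = authority.rpartition(":")
    return host if host and port.isdigit() else authority


print(extract_host("https://ee.aut.ac.ir/about"))
print(extract_host("http://user:pw@mail.google.com:8080?x=1"))
```

Because nothing is looked up in the suffix list, this kind of extraction is cheap. Use Host when you need the subdomain, domain, or suffix.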
If you are a fan of pydomainextractor, a similar interface is available:
import pydomainextractor
extractor = pydomainextractor.DomainExtractor()
extractor.extract("ee.aut.ac.ir") # from host
extractor.extract_from_url("https://ee.aut.ac.ir/about") # from url
# alternatively you can use:
from liburlparser import Host
Host.extract("ee.aut.ac.ir") # from host
Host.extract_from_url("https://ee.aut.ac.ir/about") # from url
# As you can see, the API is the same
C++
There are some examples in the examples folder.
#include "liburlparser"
...
/// for parsing url
TLD::Url url("https://ee.aut.ac.ir/about");
std::string domain = url.domain(); // also for subdomain, port, params, ...
/// for parsing host
TLD::Host host("ee.aut.ac.ir");
// or
TLD::Host host = url.host();
// or
TLD::Host host = TLD::Host::fromUrl("https://ee.aut.ac.ir/about");
As you can see, all the methods available in Python can be used just as easily in C++.
Performance
Extract From Host
Tests were run on a file containing 10 million random domains from various top-level domains (Mar. 13th, 2022).
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host | 1.12s |
PyDomainExtractor | pydomainextractor.extract | 1.50s |
publicsuffix2 | publicsuffix2.get_sld | 9.92s |
tldextract | __call__ | 29.23s |
tld | tld.parse_tld | 34.48s |
Extract From URL
Tests were run on a file containing 1 million random URLs (Mar. 13th, 2022).
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host.from_url | 2.20s |
PyDomainExtractor | pydomainextractor.extract_from_url | 2.24s |
publicsuffix2 | publicsuffix2.get_sld | 10.84s |
tldextract | __call__ | 36.04s |
tld | tld.parse_tld | 57.87s |
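If you want to run a comparison like the ones above on your own data, a timing harness along these lines is one way to do it. This is not the script used to produce the tables. The helper `time_extractor` and the stand-in extractor `last_label` are hypothetical and exist only so that the sketch runs without any third-party packages.

```python
import time
from typing import Callable, Iterable


def time_extractor(extract: Callable[[str], object], inputs: Iterable[str]) -> float:
    """Return the wall-clock seconds spent calling extract on every input."""
    items = list(inputs)  # materialize first so reading the input is not timed
    start = time.perf_counter()
    for item in items:
        extract(item)
    return time.perf_counter() - start


# Stand-in extractor so the sketch runs on its own.
def last_label(host: str) -> str:
    return host.rsplit(".", 1)[-1]


hosts = ["ee.aut.ac.ir", "mail.google.com"] * 1000
elapsed = time_extractor(last_label, hosts)
print(f"{len(hosts)} hosts in {elapsed:.4f}s")
```

With liburlparser installed, you could pass `Host` or `Host.from_url` as `extract`, and the corresponding function of each other library to compare them on the same input.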
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Project Link: https://github.com/mohammadraziei/liburlparser