Fastest Url parser in the world
A fast domain extraction library written in C++ with Python bindings.
A complete URL parsing library usable from C++, Python, and the command line.
About The Project
Features
- Supports multiple languages: Python, C++, and Shell
- Intuitive interface, identical in C++ and Python
- Provides two separate classes, Url and Host, to keep code clean
- Supports public_suffix_list for known multi-label suffixes such as "ac.ir"
- Supports unknown suffixes such as "google.comm" (it detects "comm" as the suffix)
- Updates public_suffix_list automatically before each build and deploy
- Host properties:
  - subdomain
  - domain
  - domain_name
  - suffix
- Url properties:
  - protocol
  - userinfo
  - host (and all of the host properties)
  - port
  - path
  - query
  - params
  - fragment
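The suffix handling described above can be illustrated with a minimal sketch. This is not the library's implementation: `KNOWN_SUFFIXES` is a tiny hypothetical stand-in for the public suffix list, `split_host` is a made-up helper, and the result keys (`subdomain`, `name`, `suffix`) are chosen for illustration and are not claimed to match the Host properties one to one.

```python
# Hypothetical, tiny stand-in for the public suffix list (the real list is far larger).
KNOWN_SUFFIXES = {"com", "ir", "ac.ir", "uk", "co.uk"}


def split_host(host):
    """Split a host into subdomain, name and suffix (illustrative only)."""
    labels = host.lower().strip(".").split(".")
    if len(labels) < 2:
        # A single label has no suffix to separate.
        return {"subdomain": "", "name": labels[0], "suffix": ""}

    # Fallback for unknown suffixes: treat the last label as the suffix.
    suffix_start = len(labels) - 1
    # Try the longest candidate first so "ac.ir" wins over "ir".
    # Starting at 1 always leaves one label for the name.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            suffix_start = i
            break

    return {
        "subdomain": ".".join(labels[:suffix_start - 1]),
        "name": labels[suffix_start - 1],
        "suffix": ".".join(labels[suffix_start:]),
    }


print(split_host("ee.aut.ac.ir"))  # known multi-label suffix
print(split_host("google.comm"))   # unknown suffix, last label is used
```

The longest-match-first loop is what lets a multi-label suffix take priority, and the fallback is what keeps an unlisted suffix such as "comm" from breaking the split.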
Setup
C++:
Build steps:
git clone https://github.com/mohammadraziei/liburlparser
cd liburlparser
mkdir -p build; cd build
cmake ..
# Build the project:
make
# [Optional] Run tests:
make test
# [Optional] Build the documentation:
make docs
# [Optional] Run examples:
./example
# Install:
sudo make install
Python and Command Line:
Requires Python >= 3.8.
Installation
pip install liburlparser
Or
pip install git+https://github.com/mohammadraziei/liburlparser
Or
git clone https://github.com/mohammadraziei/liburlparser
pip install ./liburlparser
Usage
Command Line
python -m liburlparser --help # show help section
python -m liburlparser --version # show version
python -m liburlparser --url "https://mail.google.com/about" | jq # parse a URL, output as JSON
python -m liburlparser --host "mail.google.com" | jq # parse a host, output as JSON
Python
liburlparser is designed to be intuitive to use.
Every class has a help section:
import liburlparser
help(liburlparser)
print(liburlparser.__version__)
from liburlparser import Url, Host
help(Url)
help(Host)
Parse a URL with liburlparser:
from liburlparser import Url, Host
## parse url:
url = Url("https://ee.aut.ac.ir/#id") # parse all part of url
print(url, url.suffix, url.domain, url.fragment, url.host, url.to_dict(), url.to_json())
## parse host
host = url.host # ee.aut.ac.ir
# or
host = Host("ee.aut.ac.ir")
# or
host = Host.from_url("https://ee.aut.ac.ir/#id") # the fastest way for parsing host from url
# all of these return a Host object with the host part of the URL already parsed
print(host, host.domain, host.suffix, host.to_dict(), host.to_json())
There are also some helper APIs that give better performance for small tasks:
# if you need to extract the host of url as a string without any parsing
host_str = Url.extract_host("https://ee.aut.ac.ir/about") # very fast
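To show why pulling out the host as a plain string can be cheaper than a full parse, here is a minimal sketch using only string operations. It is an illustration under assumptions, not the code behind `Url.extract_host`: the helper name `extract_host_sketch` is hypothetical, and IPv6 literals in brackets are not handled.

```python
def extract_host_sketch(url):
    """Return the host part of a URL as a string (illustrative only)."""
    # Drop the scheme if there is one; ignore "://" that appears later
    # in the path, query or fragment.
    head, sep, rest = url.partition("://")
    if not sep or any(c in head for c in "/?#"):
        rest = url

    # The authority ends at the first "/", "?" or "#".
    end = len(rest)
    for ch in "/?#":
        pos = rest.find(ch)
        if pos != -1:
            end = min(end, pos)
    authority = rest[:end]

    # Strip userinfo ("user:pass@") and a trailing ":port".
    host = authority.rpartition("@")[2]
    return host.partition(":")[0]


print(extract_host_sketch("https://ee.aut.ac.ir/about"))
print(extract_host_sketch("https://user:pw@mail.google.com:8080/a?b#c"))
```

No suffix lookup or object construction happens here, which is the kind of work a host-only shortcut avoids.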
If you are a fan of pydomainextractor, liburlparser offers a similar interface:
import pydomainextractor
extractor = pydomainextractor.DomainExtractor()
extractor.extract("ee.aut.ac.ir") # from host
extractor.extract_from_url("https://ee.aut.ac.ir/about") # from url
# alternatively you can use:
from liburlparser import Host
Host.extract("ee.aut.ac.ir") # from host
Host.extract_from_url("https://ee.aut.ac.ir/about") # from url
# both libraries expose the same API
C++
There are some examples in the examples folder.
#include "liburlparser"
...
/// for parsing url
TLD::Url url("https://ee.aut.ac.ir/about");
std::string domain = url.domain(); // also for subdomain, port, params, ...
/// for parsing host
TLD::Host host("ee.aut.ac.ir");
// or
TLD::Host host = url.host();
// or
TLD::Host host = TLD::Host::fromUrl("https://ee.aut.ac.ir/about");
All of the methods available in Python can be used just as easily in C++.
Performance
Extract From Host
Tests were run on a file containing 10 million random domains from various top-level domains (Mar. 13th, 2022).
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host | 1.12s |
PyDomainExtractor | pydomainextractor.extract | 1.50s |
publicsuffix2 | publicsuffix2.get_sld | 9.92s |
tldextract | __call__ | 29.23s |
tld | tld.parse_tld | 34.48s |
Extract From URL
Tests were run on a file containing 1 million random URLs (Mar. 13th, 2022).
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host.from_url | 2.20s |
PyDomainExtractor | pydomainextractor.extract_from_url | 2.24s |
publicsuffix2 | publicsuffix2.get_sld | 10.84s |
tldextract | __call__ | 36.04s |
tld | tld.parse_tld | 57.87s |
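The benchmark script and input files behind these tables are not shown here. As a rough sketch of how such a comparison could be timed, the harness below measures any callable over a list of inputs. Everything in it is an assumption for illustration: `time_extractor` and `stdlib_host` are hypothetical names, the URLs are synthetic, and the standard library's `urlsplit` stands in for the extractor being measured.

```python
import time
from urllib.parse import urlsplit


def time_extractor(extract, inputs, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` passes."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for item in inputs:
            extract(item)
        best = min(best, time.perf_counter() - start)
    return best


def stdlib_host(url):
    # Stand-in extractor; swap in the function you want to measure.
    return urlsplit(url).hostname


# Synthetic inputs; a real run would load URLs from a file instead.
urls = [f"https://sub{i}.example.com/path?q={i}" for i in range(10_000)]
print(f"urlsplit: {time_extractor(stdlib_host, urls):.3f}s")
```

Taking the best of several passes reduces noise from other processes; absolute numbers will differ from the tables above because the hardware and inputs differ.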
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Project Link: https://github.com/mohammadraziei/liburlparser