
Fastest URL parser in the world



The fastest domain extractor library, written in C++ with Python bindings.

The first complete library for parsing URLs in C++, Python, and the command line.


About The Project

liburlparser is a powerful domain extractor library written in C++ with Python bindings. It provides efficient URL parsing capabilities for both C++ and Python, making it a valuable tool for projects that involve working with web addresses.

Features

Here are some key features of liburlparser:

  1. Multiple Language Support:

    • liburlparser can be used from Python, C++, and the shell (command line).
    • It offers an intuitive interface that remains consistent across both C++ and Python.
  2. Clean Code Design:

    • The library provides two separate classes: Url and Host.
    • This separation allows for cleaner and more organized code when dealing with URLs.
  3. Public Suffix List Support:

    • liburlparser supports compound (multi-label) suffixes such as "ac.ir" using the public suffix list.
    • It can also handle unknown suffixes (e.g., "comm" in "google.comm").
  4. Automatic Public Suffix List Updates:

    • Before each build and deployment, liburlparser updates the public_suffix_list automatically.
  5. Host Properties:

    • The Host class includes properties such as subdomain, domain, domain name, and suffix.
  6. URL Properties:

    • The Url class provides properties like protocol, userinfo, host (and all host properties), port, path, query parameters, and fragment.
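To make the public suffix list behaviour concrete, here is a minimal, hypothetical sketch of how a host can be split into subdomain, domain, and suffix by longest suffix match, falling back to the last label when no rule matches (the "google.comm" case). It uses a tiny hard-coded rule set instead of the real list and ignores wildcard and exception rules; it is not liburlparser's implementation, and the name `split_host` is illustrative only.

```python
# Tiny stand-in for the public suffix list (assumption: the real list has
# thousands of rules, including wildcard "*." and exception "!" rules).
SUFFIXES = {"ir", "ac.ir", "com", "co.uk", "uk"}


def split_host(host: str) -> dict:
    """Split a host into subdomain, domain, and suffix (hypothetical helper)."""
    labels = host.lower().rstrip(".").split(".")
    suffix_len = 0
    # Try the longest candidate suffix first, so "ac.ir" wins over "ir".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            suffix_len = len(labels) - i
            break
    if suffix_len == 0:
        # Unknown suffix such as "comm": treat the last label as the suffix.
        suffix_len = 1
    if suffix_len >= len(labels):
        # The whole host is a suffix, so there is no registrable domain.
        return {"subdomain": "", "domain": "", "suffix": ".".join(labels)}
    suffix = ".".join(labels[-suffix_len:])
    domain = labels[-suffix_len - 1]
    subdomain = ".".join(labels[: -suffix_len - 1])
    return {"subdomain": subdomain, "domain": domain, "suffix": suffix}


print(split_host("ee.aut.ac.ir"))
print(split_host("google.comm"))
```

Checking the longest candidate first is what lets the multi-label rule "ac.ir" take precedence over the single-label rule "ir"; the last-label fallback mirrors the public suffix list's default rule for unlisted suffixes.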

Usage

Command Line

python -m liburlparser --help # show the help section
python -m liburlparser --version # show the version
python -m liburlparser --url "https://mail.google.com/about" | jq # return as JSON
python -m liburlparser --host "mail.google.com" | jq # return as JSON

Python

liburlparser is designed to be intuitive to use.

All classes have a built-in help section:

import liburlparser
help(liburlparser)
print(liburlparser.__version__)

from liburlparser import Url, Host
help(Url)
help(Host)

Parse a URL and a host:

from liburlparser import Url, Host
## parse url:
url = Url("https://ee.aut.ac.ir/#id") # parses every part of the URL
print(url, url.suffix, url.domain, url.fragment, url.host, url.to_dict(), url.to_json())
## parse host
host = url.host # ee.aut.ac.ir
# or
host = Host("ee.aut.ac.ir")
# or 
host = Host.from_url("https://ee.aut.ac.ir/#id") # the fastest way for parsing host from url
# each of these returns a Host object that has already parsed the host part of the URL
print(host, host.domain, host.suffix, host.to_dict(), host.to_json())
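As a point of reference for the URL properties listed above (protocol, userinfo, host, port, path, query, fragment), Python's standard-library `urllib.parse.urlsplit` splits a URL into similar components. This is only a comparison of vocabulary: the stdlib does not consult the public suffix list, so it cannot produce domain or suffix, and the mapping onto liburlparser's property names is an assumption.

```python
from urllib.parse import parse_qs, urlsplit

# Standard-library split of a URL into parts comparable to the Url properties.
parts = urlsplit("https://user:pass@ee.aut.ac.ir:8080/about?lang=en#id")

components = {
    "protocol": parts.scheme,
    "userinfo": parts.netloc.rpartition("@")[0],  # text before "@", or ""
    "host": parts.hostname,  # lower-cased host without port or userinfo
    "port": parts.port,  # int, or None when no port is given
    "path": parts.path,
    "query": parse_qs(parts.query),  # values are lists of strings
    "fragment": parts.fragment,
}
print(components)
```

Here the host comes back as the single string "ee.aut.ac.ir"; splitting it into subdomain, domain, and suffix is the public-suffix-aware step that the Host class adds.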

There are also helper APIs that give better performance for small tasks:

# if you only need the host of a URL as a string, without parsing it
host_str = Url.extract_host("https://ee.aut.ac.ir/about") # very fast
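As a rough illustration of why extracting the host without full parsing can be cheap, the hypothetical `extract_host_str` below finds it with a few string scans: skip past `://`, stop at the first `/`, `?` or `#`, then drop any userinfo and port. It is a sketch under simplifying assumptions (no IPv6 literals, no validation), not the code behind `Url.extract_host`.

```python
def extract_host_str(url: str) -> str:
    """Return the host part of a URL using plain string scans (hypothetical helper)."""
    start = url.find("://")
    start = 0 if start == -1 else start + 3
    end = len(url)
    # The authority ends at the first "/", "?" or "#" after the scheme.
    for sep in "/?#":
        pos = url.find(sep, start)
        if pos != -1 and pos < end:
            end = pos
    authority = url[start:end]
    # Drop "user:pass@" and ":port"; IPv6 literals such as "[::1]" are not handled.
    authority = authority.rpartition("@")[2]
    return authority.partition(":")[0]


print(extract_host_str("https://ee.aut.ac.ir/about"))
```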

If you are a fan of pydomainextractor, liburlparser offers a similar interface:

import pydomainextractor
extractor = pydomainextractor.DomainExtractor()
extractor.extract("ee.aut.ac.ir") # from host
extractor.extract_from_url("https://ee.aut.ac.ir/about") # from url

# alternatively you can use:
from liburlparser import Host
Host.extract("ee.aut.ac.ir") # from host
Host.extract_from_url("https://ee.aut.ac.ir/about") # from url
# the API is the same

C++

There are more examples in the examples folder.

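#include "urlparser.h"
#include <iostream>
#include <string>

int main() {
    // ...
    /// for parsing url
    TLD::Url url("https://ee.aut.ac.ir/about");
    std::string domain = url.domain(); // also for subdomain, port, params, ...
    /// for parsing host (each Host needs its own name; redeclaring `host` would not compile)
    TLD::Host host("ee.aut.ac.ir");
    // or
    TLD::Host host_from_url = url.host();
    // or
    TLD::Host host_fast = TLD::Host::fromUrl("https://ee.aut.ac.ir/about");
    std::cout << domain << std::endl;
    return 0;
}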

All of the methods available in Python can be used just as easily in C++.

Installation

C++:

Build steps:

git clone https://github.com/mohammadraziei/liburlparser
cd liburlparser
mkdir -p build; cd build
cmake ..
# Build the project:
make
# [Optional] Run the tests:
make test
# [Optional] Build the documentation:
make docs
# [Optional] Run the examples:
./example
# Install:
sudo make install

Python and Command Line:

Be aware that Python >= 3.8 is required.

Installation

From PyPI with pip:
pip install liburlparser

To update the public suffix list with psl.update, install the online extra:

pip install "liburlparser[online]"
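The public suffix list that psl.update refreshes is a plain-text file. As a hedged sketch of what such a file contains, the hypothetical `parse_psl` below reads rules the way the Public Suffix List format describes them: blank lines and `//` comments are skipped, each rule is read only up to the first whitespace, `*.` marks a wildcard rule and `!` an exception rule. It is not liburlparser's loader, makes no network calls, and works on a small sample in that format.

```python
def parse_psl(text: str) -> dict:
    """Group public suffix list rules by kind (hypothetical helper)."""
    rules = {"normal": set(), "wildcard": set(), "exception": set()}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # blank line or comment
        rule = line.split()[0]  # a rule ends at the first whitespace
        if rule.startswith("!"):
            rules["exception"].add(rule[1:])
        elif rule.startswith("*."):
            rules["wildcard"].add(rule[2:])
        else:
            rules["normal"].add(rule)
    return rules


SAMPLE = """
// ===BEGIN ICANN DOMAINS===
ir
ac.ir
*.ck
!www.ck
"""
print(parse_psl(SAMPLE))
```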

Or

From Git with pip:
pip install git+https://github.com/mohammadraziei/liburlparser

Or

Manually:
git clone https://github.com/mohammadraziei/liburlparser
pip install ./liburlparser

Performance

Extract From Host

Tests were run on a file containing 10 million random domains from various top-level domains (Mar. 13, 2022).

Library           | Function                  | Time
liburlparser      | liburlparser.Host         | 1.12 s
PyDomainExtractor | pydomainextractor.extract | 1.50 s
publicsuffix2     | publicsuffix2.get_sld     | 9.92 s
tldextract        | __call__                  | 29.23 s
tld               | tld.parse_tld             | 34.48 s

Extract From URL

The test was run on a file containing 1 million random URLs (Mar. 13, 2022).

Library           | Function                           | Time
liburlparser      | liburlparser.Host.from_url         | 2.10 s
PyDomainExtractor | pydomainextractor.extract_from_url | 2.24 s
publicsuffix2     | publicsuffix2.get_sld              | 10.84 s
tldextract        | __call__                           | 36.04 s
tld               | tld.parse_tld                      | 57.87 s
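The benchmark script itself is not included here. A minimal timing harness in the same spirit, assuming one input per item and a callable under test, could look like this; `time_extractor` is a hypothetical name and the workload is a made-up stand-in, so its numbers are not comparable to the tables above.

```python
import time
from typing import Callable, Iterable


def time_extractor(extract: Callable[[str], object], inputs: Iterable[str]) -> float:
    """Return wall-clock seconds spent calling `extract` on every input."""
    items = list(inputs)  # materialize first so reading the inputs is not timed
    start = time.perf_counter()
    for item in items:
        extract(item)
    return time.perf_counter() - start


# Stand-in workload; a real run would read the domains (or URLs) file and pass
# a library function such as liburlparser.Host as `extract`.
hosts = ["ee.aut.ac.ir", "mail.google.com"] * 1000
elapsed = time_extractor(lambda h: h.rsplit(".", 2), hosts)
print(f"{len(hosts)} hosts in {elapsed:.4f} s")
```

Loading the inputs before starting the clock keeps file I/O out of the measurement, so only the extraction calls are timed.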

License

Distributed under the MIT License. See LICENSE for more information.


Contact

Project Link:



Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions


  • liburlparser-1.6.0-cp312-cp312-win_amd64.whl (272.2 kB): CPython 3.12, Windows x86-64
  • liburlparser-1.6.0-cp312-cp312-win32.whl (242.8 kB): CPython 3.12, Windows x86
  • liburlparser-1.6.0-cp311-cp311-win_amd64.whl (283.1 kB): CPython 3.11, Windows x86-64
  • liburlparser-1.6.0-cp311-cp311-win32.whl (252.8 kB): CPython 3.11, Windows x86
  • liburlparser-1.6.0-cp310-cp310-win_amd64.whl (280.9 kB): CPython 3.10, Windows x86-64
  • liburlparser-1.6.0-cp310-cp310-win32.whl (250.4 kB): CPython 3.10, Windows x86
  • liburlparser-1.6.0-cp39-cp39-win_amd64.whl (281.1 kB): CPython 3.9, Windows x86-64
  • liburlparser-1.6.0-cp39-cp39-win32.whl (250.9 kB): CPython 3.9, Windows x86
  • liburlparser-1.6.0-cp38-cp38-win_amd64.whl (280.6 kB): CPython 3.8, Windows x86-64
  • liburlparser-1.6.0-cp38-cp38-win32.whl (251.1 kB): CPython 3.8, Windows x86

File details

Details for the file liburlparser-1.6.0-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for liburlparser-1.6.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 7a6e49ffcab9270061c938cfa008f73dec68b2602239e5f471e881e9dbbbfc47
MD5 77e2eb4a7e6580953e89337fd05c1a35
BLAKE2b-256 23cdd59918287570f7e42a44fa7442db7f3d836bced19bf392c625204c89d068


File details

Details for the file liburlparser-1.6.0-cp312-cp312-win32.whl.

File metadata

  • Download URL: liburlparser-1.6.0-cp312-cp312-win32.whl
  • Upload date:
  • Size: 242.8 kB
  • Tags: CPython 3.12, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for liburlparser-1.6.0-cp312-cp312-win32.whl
Algorithm Hash digest
SHA256 7401a051d9203961373da40e04df4d2db56832d37443c353efa60b4e1e523b7d
MD5 4a82a6375a02128d3822f1a0b72d4ed5
BLAKE2b-256 80bbfda4ef37bff2ec65b061ddd76ba378f6298859df862149079b946a6dd12a


File details

Details for the file liburlparser-1.6.0-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for liburlparser-1.6.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 0d204e57baf731d614a5eaa4fead6d37d42130c17e29092b10a7470cc83f9ff4
MD5 cac39efde186e853e68faa9a044cfbd6
BLAKE2b-256 8330bd16bac180a3143a7786a5b1aeda8a804c9a94511a391d47ebf2ac982570


File details

Details for the file liburlparser-1.6.0-cp311-cp311-win32.whl.

File metadata

  • Download URL: liburlparser-1.6.0-cp311-cp311-win32.whl
  • Upload date:
  • Size: 252.8 kB
  • Tags: CPython 3.11, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for liburlparser-1.6.0-cp311-cp311-win32.whl
Algorithm Hash digest
SHA256 7f8e74c95860e7250981ccbc5bd313b77aa661c1bebbc00ed6bb77d6b2621631
MD5 986ee50ce906e5148db7117f2fdc4e6b
BLAKE2b-256 4a266454b1e298c5e89fb385da1c6a81181fb2130f828c8f48369d357d8268a3


File details

Details for the file liburlparser-1.6.0-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for liburlparser-1.6.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 427fb3d7fbaf0e0a7ef05136eae144b725049b99144d0cc3ac2afcddb2281fd8
MD5 3b704e0dbe9bfbdc6dcc3034d0d9d535
BLAKE2b-256 4f0d1c85fb3a280053152a44ec1ef9a7b9fa5db0c418615d0dc8e6b6143e146b


File details

Details for the file liburlparser-1.6.0-cp310-cp310-win32.whl.

File metadata

  • Download URL: liburlparser-1.6.0-cp310-cp310-win32.whl
  • Upload date:
  • Size: 250.4 kB
  • Tags: CPython 3.10, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for liburlparser-1.6.0-cp310-cp310-win32.whl
Algorithm Hash digest
SHA256 4c0d0af4ba57df651ed545cd3b756dc865ce5735a2d3539a071faf056d1e4f24
MD5 0bd13717ca073a353feae64729b3f5f9
BLAKE2b-256 69f129cee984aef88986a1c346dcc0f5c4c1ea8a97ccf296768fb521ed634a17


File details

Details for the file liburlparser-1.6.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: liburlparser-1.6.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 281.1 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for liburlparser-1.6.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 69013f31a16ce5c4f3874fe50c07a0fe16bb71ef531dae053d9e5b20d0a7bbf5
MD5 c1e608f0e97dfd67e9e6a7932bf7bbd2
BLAKE2b-256 1b0af3a20efbca08cfb2681b83a6448da8ccf6958ee9f16c257689f07781d22b


File details

Details for the file liburlparser-1.6.0-cp39-cp39-win32.whl.

File metadata

  • Download URL: liburlparser-1.6.0-cp39-cp39-win32.whl
  • Upload date:
  • Size: 250.9 kB
  • Tags: CPython 3.9, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for liburlparser-1.6.0-cp39-cp39-win32.whl
Algorithm Hash digest
SHA256 6b71f7825d820c7ec130cefdd4f7a8da37ebe2faae9d3de551ced350fce0ca8b
MD5 d5a0ba13f42d0348e732d66ed3c9c716
BLAKE2b-256 5ea8d4cb720cfe325f805bd493e51d6cc26ca9b2f720a5713d86b4b0a4a6f516


File details

Details for the file liburlparser-1.6.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: liburlparser-1.6.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 280.6 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for liburlparser-1.6.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 500e2f0c2371486570383d8d15c75ace9a9193505c53d12ac70d7d070c219a8f
MD5 e498c16645ad7cffc11276744f7e8616
BLAKE2b-256 fcd7e05605374917c0ad98763923a224c873c638effa36d6edb48d7afdaea452


File details

Details for the file liburlparser-1.6.0-cp38-cp38-win32.whl.

File metadata

  • Download URL: liburlparser-1.6.0-cp38-cp38-win32.whl
  • Upload date:
  • Size: 251.1 kB
  • Tags: CPython 3.8, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for liburlparser-1.6.0-cp38-cp38-win32.whl
Algorithm Hash digest
SHA256 8dce94526a50fe7aa2322504a6ca7ba84da182d5495488237bb90e14d85edca6
MD5 42398dbaa3d68c928af3d3614d0a723e
BLAKE2b-256 92c8890fd9fd1022b9c2a22ddcc08f49c554135a226355ab7f7679bb9432f0fe

