Compare web pages and evaluate their level of similarity.

Similarius

Similarius is a Python library that compares web pages and evaluates their level of similarity.

It can be used as a stand-alone command-line tool or as a library that feeds other systems.
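
To give an idea of what a similarity level between two pages can look like, here is a minimal standard-library sketch. It is not the implementation behind Similarius' `sk_similarity`; the function name `cosine_similarity` and the choice of cosine similarity over plain word counts are assumptions made only for illustration.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts.

    Hypothetical helper for illustration, not part of Similarius.
    """
    words_a = Counter(re.findall(r"\w+", text_a.lower()))
    words_b = Counter(re.findall(r"\w+", text_b.lower()))

    # Dot product over the words that appear in both texts.
    dot = sum(words_a[w] * words_b[w] for w in words_a.keys() & words_b.keys())
    norm_a = math.sqrt(sum(c * c for c in words_a.values()))
    norm_b = math.sqrt(sum(c * c for c in words_b.values()))

    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)
```

Identical texts score close to 1 and texts with no words in common score 0, so the value can be read as a rough "how alike are these pages" indicator.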

Requirements

Installation

Source install

Similarius can be installed with poetry. If you don't have poetry installed, you can install it with:

$ curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

$ poetry install
$ poetry shell
$ similarius -h

pip installation

$ pip3 install similarius

Usage

$ similarius --help
usage: similarius.py [-h] [-o ORIGINAL] [-w WEBSITE [WEBSITE ...]]

optional arguments:
  -h, --help            show this help message and exit
  -o ORIGINAL, --original ORIGINAL
                        Website to compare
  -w WEBSITE [WEBSITE ...], --website WEBSITE [WEBSITE ...]
                        Website to compare

Usage example

$ similarius -o circl.lu -w europa.eu circl.eu circl.lu

Used as a library

import argparse
import sys

from similarius import get_website, extract_text_ressource, sk_similarity, ressource_difference, ratio

parser = argparse.ArgumentParser()
# Both arguments are required: the loop below needs at least one website to compare
parser.add_argument("-w", "--website", nargs="+", required=True, help="Websites to compare against the original")
parser.add_argument("-o", "--original", required=True, help="Original website used as the reference")
args = parser.parse_args()

# Fetch the original (reference) website
original = get_website(args.original)

if not original:
    print("[-] The original website is unreachable...")
    sys.exit(1)

original_text, original_ressource = extract_text_ressource(original.text)

for website in args.website:
    print(f"\n********** {args.original} <-> {website} **********")

    # Compare
    compare = get_website(website)

    if not compare:
        print(f"[-] {website} is unreachable...")
        continue

    compare_text, compare_ressource = extract_text_ressource(compare.text)

    # Calculate
    sim = str(sk_similarity(compare_text, original_text))
    print(f"\nSimilarity: {sim}")

    ressource_diff = ressource_difference(original_ressource, compare_ressource)
    print(f"Ressource Difference: {ressource_diff}")

    ratio_compare = ratio(ressource_diff, sim)
    print(f"Ratio: {ratio_compare}")
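
When the results are meant to feed another system rather than a terminal, the three values computed in the loop above could be emitted as one JSON line per compared website. The sketch below is only a suggestion: the helper name `to_record` and the field names are assumptions, not part of the Similarius API.

```python
import json


def to_record(original, website, similarity, resource_difference, ratio):
    """Serialise one comparison as a single JSON line.

    Hypothetical helper: the field names are illustrative assumptions.
    """
    record = {
        "original": original,
        "website": website,
        "similarity": similarity,
        "resource_difference": resource_difference,
        "ratio": ratio,
    }
    # default=str keeps the call from failing on values json cannot encode natively
    return json.dumps(record, sort_keys=True, default=str)
```

Inside the loop, the three print calls could then be replaced with print(to_record(args.original, website, sim, ressource_diff, ratio_compare)), which gives downstream tools one parseable line per website.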
