
Easily identify a site's technologies

Project description

Python wappalyzer

A modern, easy way to identify the web technologies used on a site from Python

Installation

  • Install the package from PyPI
pip install pywappalyzer
  • Install and set up geckodriver
# if your platform is Linux
export GECKO_DRIVER_VERSION='v0.24.0'
wget https://github.com/mozilla/geckodriver/releases/download/$GECKO_DRIVER_VERSION/geckodriver-$GECKO_DRIVER_VERSION-linux64.tar.gz
tar -xvzf geckodriver-$GECKO_DRIVER_VERSION-linux64.tar.gz
rm geckodriver-$GECKO_DRIVER_VERSION-linux64.tar.gz
chmod +x geckodriver
cp geckodriver /usr/local/bin/

# if your platform is Windows, skip this step
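Before running the examples, it can help to confirm that the geckodriver binary is actually discoverable. The following is a minimal sketch using only the standard library; the helper name `find_driver` is hypothetical and is not part of pywappalyzer.

```python
import shutil
from typing import Optional


def find_driver(name: str = "geckodriver") -> Optional[str]:
    """Return the full path of the driver executable if it is on PATH, else None."""
    # find_driver is an illustrative helper, not a pywappalyzer function.
    return shutil.which(name)


path = find_driver()
if path is None:
    print("geckodriver was not found on PATH; revisit the setup step")
else:
    print(f"geckodriver found at {path}")
```

If the lookup fails on Linux, the usual cause is that the binary was not copied to a directory listed in PATH.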

Usage

Get technologies

from pywappalyzer import Pywappalyzer


wappalyzer = Pywappalyzer()

data = wappalyzer.analyze(url="https://www.python.org/")
print(data)

>>> {'Web servers': ['Nginx'], 'Reverse proxies': ['Nginx'], 'Caching': ['Varnish'], 
>>>  'Analytics': ['Google Analytics'], 'JavaScript libraries': ['jQuery UI', 'Modernizr', 'jQuery']}
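The result maps each category to a list of technologies, so the same technology can appear under several categories (Nginx above is both a web server and a reverse proxy). As a small sketch, assuming the return value is a plain dict shaped like the printed output, the mapping can be inverted to look up categories by technology. The function name `categories_by_technology` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Sample result copied from the output shown above (assumed to be a plain dict).
data = {
    "Web servers": ["Nginx"],
    "Reverse proxies": ["Nginx"],
    "Caching": ["Varnish"],
    "Analytics": ["Google Analytics"],
    "JavaScript libraries": ["jQuery UI", "Modernizr", "jQuery"],
}


def categories_by_technology(result: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert {category: [technology, ...]} into {technology: [category, ...]}."""
    inverted = defaultdict(list)
    for category, technologies in result.items():
        for technology in technologies:
            inverted[technology].append(category)
    return dict(inverted)


by_tech = categories_by_technology(data)
print(by_tech["Nginx"])  # ['Web servers', 'Reverse proxies']
```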

Update the technologies JSON list that is used to identify technologies

from pywappalyzer import Pywappalyzer


wappalyzer = Pywappalyzer()

wappalyzer.use_latest()  # call this method only once, to update the file
data = wappalyzer.analyze(url="https://www.python.org/")
print(data)

>>> {'Web servers': ['Nginx'], 'Reverse proxies': ['Nginx'], 'Caching': ['Varnish'], 
>>>  'Analytics': ['Google Analytics'], 'JavaScript libraries': ['jQuery UI', 'Modernizr', 'jQuery']}
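One way to honor the "only once" advice in a script that runs repeatedly is to guard the call with a marker file. This is a sketch under assumptions: it interprets "once" as once per environment, and both the `run_once` helper and the marker path are hypothetical and not provided by pywappalyzer.

```python
from pathlib import Path
from typing import Callable


def run_once(action: Callable[[], object], marker: Path) -> bool:
    """Call action() only if marker does not exist yet; return True if it ran."""
    if marker.exists():
        return False
    action()
    # Record the successful run so that later calls skip the action.
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


# Hypothetical usage with the object from the example above:
# run_once(wappalyzer.use_latest, Path.home() / ".cache" / "pywappalyzer_updated")
```

Deleting the marker file forces the update to run again on the next call.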

Analyze your own HTML content or an HTML file.
This method cannot detect every technology. If you want the full list, use the default .analyze() method instead.

import requests
from pywappalyzer import Pywappalyzer


wappalyzer = Pywappalyzer()
response = requests.get("https://python.org/")

data = wappalyzer.analyze_html(response.content)
print(data)

>>> {'Analytics': ['Google Analytics'], 'JavaScript libraries': ['Modernizr', 'jQuery UI', 'jQuery']}
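To see what the HTML-only analysis leaves out, the two results can be compared category by category. The sketch below uses the sample outputs printed in this document and assumes both are plain dicts; `missing_technologies` is a hypothetical helper, not part of pywappalyzer.

```python
from typing import Dict, List

# Sample results copied from the outputs shown in this document.
full = {
    "Web servers": ["Nginx"],
    "Reverse proxies": ["Nginx"],
    "Caching": ["Varnish"],
    "Analytics": ["Google Analytics"],
    "JavaScript libraries": ["jQuery UI", "Modernizr", "jQuery"],
}
html_only = {
    "Analytics": ["Google Analytics"],
    "JavaScript libraries": ["Modernizr", "jQuery UI", "jQuery"],
}


def missing_technologies(
    complete: Dict[str, List[str]], partial: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return the technologies present in `complete` but absent from `partial`."""
    missing = {}
    for category, technologies in complete.items():
        found = set(partial.get(category, []))
        absent = [tech for tech in technologies if tech not in found]
        if absent:
            missing[category] = absent
    return missing


print(missing_technologies(full, html_only))
# {'Web servers': ['Nginx'], 'Reverse proxies': ['Nginx'], 'Caching': ['Varnish']}
```

In this sample, the entries that only .analyze() reports are the web server, reverse proxy, and caching layers.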

Analyze HTML file

from pywappalyzer import Pywappalyzer


wappalyzer = Pywappalyzer()

# Pass the path of a saved HTML file instead of raw content
data = wappalyzer.analyze_html(file="path_to_file")
print(data)

>>> {'Analytics': ['Google Analytics'], 'JavaScript libraries': ['Modernizr', 'jQuery UI', 'jQuery']}

Pywappalyzer uses Selenium's webdriver.Firefox driver. To use webdriver.Chrome instead, write your own class that overrides get_html:

from typing import Optional, Union

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

from pywappalyzer import Site


class MySite(Site):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def get_html(
        self, url: Optional[str] = None, *, as_text: bool = False
    ) -> Union[bytes, str]:
        """
        Scrape site's html
        :param url: Site's url
        :param as_text: Return html as string
        :return: Site's HTML as bytes or string
        """
        if url is None:
            url = self.url

        options = Options()
        options.add_argument("--headless")
        with webdriver.Chrome(options=options) as driver:
            driver.get(url)
            page_source = driver.page_source
            self.handle_js(driver)

        if as_text:
            return page_source
        return page_source.encode("utf-8")

CONTRIBUTING

To contribute code, create a new local branch named after the issue you are working on. For example, if you are working on Issue Ticket #34, name the branch “feature/34”:

git checkout -b "feature/34"

Once you have made all your changes, format, check, test, and commit them:

inv format (formats all files according to Python standards)
inv check (checks the formatting again)
inv test (runs the tests)
git add .
git commit -m "#34 <commit message>"

Example: git commit -m "#34 Add support for feature X"

git push --set-upstream origin feature/34

Your changes are now pushed to the new remote branch “feature/34”.

Next, open your branch online and create a Pull Request to merge “feature/34” into “master”.

Once the Pull Request is approved after code review, you can merge the Pull Request. :-)

Download files


Source Distribution

pywappalyzer-0.1.1.tar.gz (125.3 kB)

Uploaded Source

Built Distribution

pywappalyzer-0.1.1-py3-none-any.whl (129.1 kB)

Uploaded Python 3

File details

Details for the file pywappalyzer-0.1.1.tar.gz.

File metadata

  • Download URL: pywappalyzer-0.1.1.tar.gz
  • Upload date:
  • Size: 125.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.0

File hashes

Hashes for pywappalyzer-0.1.1.tar.gz
Algorithm Hash digest
SHA256 43311c7d35e50b88d5731515a27a4291dbb60ad56b9b3ec8d02423827945ce7f
MD5 3c7c1080f23b64d2a86e913ae5be7f64
BLAKE2b-256 d51669914ed253b46bc8c37ae74ea391d4004ee411f2f780e07fe47a0ad2c178


File details

Details for the file pywappalyzer-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pywappalyzer-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 129.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.0

File hashes

Hashes for pywappalyzer-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7f3223b7f1d614b47ce4decddbec16685e9c820ce1e33b81041600df4bc49019
MD5 6dd85645f327253a132909be73c3edd0
BLAKE2b-256 878b983cc70902fa1b2afb7120a7b39d4b7697abecaf98709aeeabf4c8f53240
