Easily identify a site's technologies
Python wappalyzer
A modern, easy way to identify the web technologies used on a site with Python
Installation
- Install the package from PyPI
pip install pywappalyzer
- Install and set up geckodriver
# if your platform is Linux
export GECKO_DRIVER_VERSION='v0.24.0'
wget https://github.com/mozilla/geckodriver/releases/download/$GECKO_DRIVER_VERSION/geckodriver-$GECKO_DRIVER_VERSION-linux64.tar.gz
tar -xvzf geckodriver-$GECKO_DRIVER_VERSION-linux64.tar.gz
rm geckodriver-$GECKO_DRIVER_VERSION-linux64.tar.gz
chmod +x geckodriver
cp geckodriver /usr/local/bin/  # may require sudo
# if your platform is Windows, skip this step
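After the steps above, it can help to confirm that the driver is actually discoverable on PATH. The following is a minimal sketch using only the standard library; the helper name `find_geckodriver` is hypothetical and not part of pywappalyzer.

```python
import shutil
from typing import Optional


def find_geckodriver(name: str = "geckodriver") -> Optional[str]:
    """Return the full path of the driver executable if it is on PATH, else None."""
    # shutil.which searches the directories listed in the PATH environment variable.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_geckodriver()
    if path is None:
        print("geckodriver was not found on PATH")
    else:
        print(f"geckodriver found at {path}")
```

If the helper returns None on Linux, the copy to /usr/local/bin/ most likely did not succeed, or that directory is not on PATH.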
Usage
Get technologies
from pywappalyzer import Pywappalyzer
wappalyzer = Pywappalyzer()
data = wappalyzer.analyze(url="https://www.python.org/")
print(data)
>>> {'Web servers': ['Nginx'], 'Reverse proxies': ['Nginx'], 'Caching': ['Varnish'],
>>> 'Analytics': ['Google Analytics'], 'JavaScript libraries': ['jQuery UI', 'Modernizr', 'jQuery']}
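The printed output suggests that the result is a plain dictionary mapping each category to a list of technology names. Assuming that shape, the sketch below shows two ways of working with it; the sample data is copied from the output above, and the helper names `technology_names` and `categories_of` are hypothetical.

```python
from typing import Dict, List, Set

# Sample result copied from the printed output above (assumed shape:
# category name -> list of technology names).
data: Dict[str, List[str]] = {
    "Web servers": ["Nginx"],
    "Reverse proxies": ["Nginx"],
    "Caching": ["Varnish"],
    "Analytics": ["Google Analytics"],
    "JavaScript libraries": ["jQuery UI", "Modernizr", "jQuery"],
}


def technology_names(result: Dict[str, List[str]]) -> Set[str]:
    """Collect every technology name, ignoring categories and duplicates."""
    return {name for names in result.values() for name in names}


def categories_of(result: Dict[str, List[str]], technology: str) -> List[str]:
    """Return the sorted list of categories in which a technology appears."""
    return sorted(cat for cat, names in result.items() if technology in names)


if __name__ == "__main__":
    print(sorted(technology_names(data)))
    print(categories_of(data, "Nginx"))
```

A technology can appear under more than one category (Nginx here is both a web server and a reverse proxy), which is why the first helper returns a set.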
Update the technologies JSON list that is used to identify technologies
from pywappalyzer import Pywappalyzer
wappalyzer = Pywappalyzer()
wappalyzer.use_latest() # call this method only once, to update the file
data = wappalyzer.analyze(url="https://www.python.org/")
print(data)
>>> {'Web servers': ['Nginx'], 'Reverse proxies': ['Nginx'], 'Caching': ['Varnish'],
>>> 'Analytics': ['Google Analytics'], 'JavaScript libraries': ['jQuery UI', 'Modernizr', 'jQuery']}
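Because the update only needs to happen once, a script that runs repeatedly may want to guard the call. The following is a minimal sketch of one way to do that with a marker file; the helper `run_once` and the marker path are hypothetical and not part of pywappalyzer.

```python
from pathlib import Path
from typing import Callable


def run_once(action: Callable[[], object], marker: Path) -> bool:
    """Run `action` unless `marker` already exists, then create `marker`.

    Returns True if `action` was executed and False if it was skipped.
    """
    if marker.exists():
        return False
    action()
    # Record that the action has completed so later runs skip it.
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

With the objects from the example above, this could be called as `run_once(wappalyzer.use_latest, Path.home() / ".pywappalyzer_updated")`, where the marker location is an arbitrary choice. Deleting the marker file forces the next run to update again.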
Analyze your HTML or an HTML file.
This method may not detect every technology. If you want the full list of technologies,
please use the default method, .analyze()
import requests
from pywappalyzer import Pywappalyzer
wappalyzer = Pywappalyzer()
response = requests.get("https://python.org/")
data = wappalyzer.analyze_html(response.content)
print(data)
>>> {'Analytics': ['Google Analytics'], 'JavaScript libraries': ['Modernizr', 'jQuery UI', 'jQuery']}
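To see what the HTML-only analysis leaves out, the two results can be compared. The sketch below uses the outputs printed in this section as sample data and assumes both methods return a dictionary of category to list of names; the helper `missing_technologies` is hypothetical.

```python
from typing import Dict, List

Result = Dict[str, List[str]]

# Sample output of .analyze(), copied from the usage example.
full: Result = {
    "Web servers": ["Nginx"],
    "Reverse proxies": ["Nginx"],
    "Caching": ["Varnish"],
    "Analytics": ["Google Analytics"],
    "JavaScript libraries": ["jQuery UI", "Modernizr", "jQuery"],
}

# Sample output of .analyze_html(), copied from the example above.
html_only: Result = {
    "Analytics": ["Google Analytics"],
    "JavaScript libraries": ["Modernizr", "jQuery UI", "jQuery"],
}


def missing_technologies(full: Result, partial: Result) -> Result:
    """Return the entries of `full` that do not appear in `partial`."""
    missing: Result = {}
    for category, names in full.items():
        found = set(partial.get(category, []))
        absent = [name for name in names if name not in found]
        if absent:
            missing[category] = absent
    return missing


if __name__ == "__main__":
    print(missing_technologies(full, html_only))
```

For the sample data, the difference consists of the web server, reverse proxy, and caching entries, while the analytics and JavaScript library entries are found by both methods.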
Analyze an HTML file
from pywappalyzer import Pywappalyzer
wappalyzer = Pywappalyzer()
data = wappalyzer.analyze_html(file="path_to_file")
print(data)
>>> {'Analytics': ['Google Analytics'], 'JavaScript libraries': ['Modernizr', 'jQuery UI', 'jQuery']}
Pywappalyzer uses Selenium's webdriver.Firefox driver. To use webdriver.Chrome instead, you need to write your own class:
from typing import Optional, Union

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

from pywappalyzer import Site


class MySite(Site):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def get_html(
        self, url: Optional[str] = None, *, as_text: bool = False
    ) -> Union[bytes, str]:
        """
        Scrape site's html
        :param url: Site's url
        :param as_text: Return html as string
        :return: Site's HTML as bytes or string
        """
        if url is None:
            url = self.url
        options = Options()
        options.add_argument("--headless")
        with webdriver.Chrome(options=options) as driver:
            driver.get(url)
            page_source = driver.page_source
            self.handle_js(driver)
        if as_text:
            return page_source
        return page_source.encode("utf-8")
CONTRIBUTING
To contribute code, create a new local branch named after the issue you are working on. For example, if you are working on Issue Ticket #34, the branch should be named “feature/34”:
git checkout -b "feature/34"
Once you have made all your changes, run:
inv format (formats all files according to Python standards)
inv check (checks the formatting once again)
inv test (runs the tests)
git add .
git commit -m "#34 <commit message>"
Example: git commit -m "#34 Add support for feature X"
git push --set-upstream origin feature/34
Your changes have now been pushed to the new remote branch “feature/34”.
Next, open your branch online and create a Pull Request to merge “feature/34” into “master”.
Once the Pull Request has been approved in code review, you can merge it. :-)
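The branch and commit naming convention described above can be sketched as a few small helpers, for instance for use in a local script. All function names here are hypothetical and are not part of the project's tooling.

```python
import re


def branch_name(issue: int) -> str:
    """Build the branch name for an issue, e.g. 34 -> 'feature/34'."""
    return f"feature/{issue}"


def commit_message(issue: int, summary: str) -> str:
    """Build a commit message of the form '#<issue> <summary>'."""
    return f"#{issue} {summary.strip()}"


# A message follows the convention when it starts with '#<digits> ' and some text.
_COMMIT_RE = re.compile(r"^#\d+ \S.*$")


def follows_convention(message: str) -> bool:
    """Check whether a commit message starts with an issue reference."""
    return _COMMIT_RE.match(message) is not None


if __name__ == "__main__":
    print(branch_name(34))
    print(commit_message(34, "Add support for feature X"))
```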