A Python library to detect technologies used by websites
TechFinder
TechFinder is a Python library designed to detect technologies used on websites. It checks both the HTML content and HTTP headers of a given URL to identify various technologies like frameworks, libraries, and server software. The library provides both default patterns and support for user-defined patterns via a JSON configuration file.
Features
- Technology Detection: Detects web technologies such as frameworks (e.g., React, Angular), server technologies (e.g., Apache, Nginx), and cloud services (e.g., AWS, Google Cloud).
- Custom Patterns: Users can define their own patterns for technology detection via a custom JSON file.
- HTML and HTTP Header Parsing: Examines both HTML content and HTTP headers to identify technologies.
- Extensible: Easily extendable to support additional technologies by updating the patterns.
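The combined HTML and header check listed above can be pictured with a small, self-contained sketch. This is not techfinder's actual implementation: the function name `match_patterns`, the case-insensitive regex matching, and the sample inputs are all assumptions made for illustration. Only the `html_patterns` / `header_patterns` layout is taken from the custom pattern file format shown later in this document.

```python
import re

def match_patterns(html, headers, patterns):
    """Return the sorted names of technologies whose pattern matches.

    Hypothetical helper: illustrates pattern-based detection in general,
    not the code that techfinder runs internally.
    """
    found = set()
    # Check every HTML pattern against the page body.
    for name, pattern in patterns.get("html_patterns", {}).items():
        if re.search(pattern, html, re.IGNORECASE):
            found.add(name)
    # Flatten the headers into "Name: value" lines so one regex can see both parts.
    header_text = "\n".join(f"{key}: {value}" for key, value in headers.items())
    for name, pattern in patterns.get("header_patterns", {}).items():
        if re.search(pattern, header_text, re.IGNORECASE):
            found.add(name)
    return sorted(found)

# Made-up sample data for the sketch.
sample_patterns = {
    "html_patterns": {"React": r"data-reactroot"},
    "header_patterns": {"Nginx": r"server:\s*nginx"},
}
sample_html = '<div id="root" data-reactroot=""></div>'
sample_headers = {"Server": "nginx/1.24.0"}

detected = match_patterns(sample_html, sample_headers, sample_patterns)
print(detected)  # ['Nginx', 'React']
```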
Installation
To install the techfinder library, run the following command:
pip install techfinder
Alternatively, clone this repository and install dependencies manually:
git clone https://github.com/Prathameshsci369/Detector.git
cd Detector
pip install -r requirements.txt
Usage
Importing and Initializing
First, import the Detector class from the techfinder package:
from techfinder import Detector
You can initialize the detector with a default pattern set or provide a custom JSON file containing user-defined patterns.
# Initialize with default patterns
detector = Detector()
# Initialize with custom patterns (provide path to your custom JSON config)
detector = Detector('custom_patterns.json')
Basic Detection Use Case
To detect technologies from a website, you can use the final_function method, which fetches the URL and analyzes its HTML and headers:
url = 'https://example.com'
detected_tech = detector.final_function(url)
print("Detected Technologies:", detected_tech)
Custom Patterns
If you want to use your own patterns to detect specific technologies, you can create a custom JSON file like the one below:
custom_patterns.json
{
"html_patterns": {
"MyCustomTech": "mycustomtech"
},
"header_patterns": {
"MyCustomServer": "mycustomserver"
}
}
You can then initialize Detector with this file:
detector = Detector('custom_patterns.json')
detected_tech = detector.final_function('https://example.com')
print("Detected Technologies:", detected_tech)
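If you would rather generate the pattern file from code than write it by hand, something like the following sketch could work. The helper `write_pattern_file` and the temporary path are hypothetical and not part of techfinder. The sketch only assumes the two top-level keys shown in the JSON above.

```python
import json
import os
import tempfile

def write_pattern_file(path, html_patterns, header_patterns):
    """Write a pattern file with the two sections shown above (hypothetical helper)."""
    for section in (html_patterns, header_patterns):
        for name, pattern in section.items():
            # Both the technology name and its pattern are expected to be strings.
            if not isinstance(name, str) or not isinstance(pattern, str):
                raise ValueError(f"Pattern entries must be strings: {name!r}")
    config = {
        "html_patterns": dict(html_patterns),
        "header_patterns": dict(header_patterns),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    return config

# Write to a temporary directory so the sketch does not touch your project files.
pattern_path = os.path.join(tempfile.mkdtemp(), "custom_patterns.json")
written = write_pattern_file(
    pattern_path,
    {"MyCustomTech": "mycustomtech"},
    {"MyCustomServer": "mycustomserver"},
)

# Read the file back to confirm it is valid JSON with the expected sections.
with open(pattern_path, encoding="utf-8") as fh:
    loaded = json.load(fh)
print(sorted(loaded))  # ['header_patterns', 'html_patterns']
```

The resulting path can then be passed to `Detector(pattern_path)` in the same way as the hand-written file.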
Example Outputs
Example 1: Default Patterns
For a URL like https://example.com, the output might be:
Detected Technologies: ['React', 'Node.js', 'Express']
Example 2: Custom Patterns
If the URL matches custom patterns in custom_patterns.json, the output might look like:
Detected Technologies: ['MyCustomTech', 'MyCustomServer']
Logging
The techfinder library uses Python's built-in logging module to provide detailed information during execution. By default, it logs important actions such as pattern loading and technology detection. You can customize the logging level as needed:
import logging
logging.basicConfig(level=logging.DEBUG) # Change logging level to DEBUG
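For finer control, the standard logging module also lets you set a message format and raise verbosity for a single logger. The sketch below assumes that techfinder logs under a logger named "techfinder"; that name is a guess, not something this document confirms. If the library logs on the root logger instead, only the `basicConfig` call has an effect.

```python
import logging

# Configure the root logger; force=True (Python 3.8+) replaces any existing handlers.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,
)

# Assumption: the library's loggers live under the name "techfinder".
logging.getLogger("techfinder").setLevel(logging.DEBUG)
```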
Error Handling
The library handles common errors, such as invalid URLs or failed requests, gracefully. If something goes wrong, an error message is written to the log and the program continues running.
detected_tech = detector.final_function('https://invalid-url.com')
# Will log an error: "Error fetching the URL"
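If you want an extra layer of protection in your own code, you can wrap the call yourself. The sketch below is a hypothetical wrapper, not part of techfinder. It takes any detection callable (for example `detector.final_function`), rejects obviously malformed URLs before any request is made, and falls back to an empty list. What `final_function` returns on failure is not stated here, so the wrapper treats `None` and exceptions the same way.

```python
from urllib.parse import urlparse

def is_http_url(url):
    """Return True if the string looks like an absolute http(s) URL."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def safe_detect(detect, url):
    """Call detect(url) and always return a list (hypothetical wrapper)."""
    if not is_http_url(url):
        return []
    try:
        result = detect(url)
    except Exception:
        # Any failure inside the detection call is treated as "nothing detected".
        return []
    return list(result) if result else []

# Stand-in callables so the sketch runs without network access.
def fake_detect(url):
    return ["React"]

def failing_detect(url):
    raise RuntimeError("simulated failure")

print(safe_detect(fake_detect, "https://example.com"))     # ['React']
print(safe_detect(fake_detect, "not a url"))               # []
print(safe_detect(failing_detect, "https://example.com"))  # []
```

With the real library, the call would look like `safe_detect(detector.final_function, url)`.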
Use Cases
Use Case 1: Identify Web Frameworks and Libraries
techfinder can be used to determine which frameworks and libraries a website uses. For example, it can detect whether a site is built with React, Vue.js, or Angular.
detector = Detector()
url = 'https://some-react-site.com'
detected_tech = detector.final_function(url)
print(detected_tech)  # Possible output: ['React']
Use Case 2: Identify Server Technologies
You can use this library to detect the server-side technology used by a website, such as Apache, Nginx, or a cloud platform like AWS.
detector = Detector()
url = 'https://some-apache-server.com'
detected_tech = detector.final_function(url)
print(detected_tech)  # Possible output: ['Apache']
Use Case 3: Customize Patterns for Specific Technologies
If you have specific technologies that are not part of the default set, you can define your own patterns in a custom JSON file.
{
"html_patterns": {
"MyCustomTech": "mycustomtech"
},
"header_patterns": {
"MyCustomServer": "mycustomserver"
}
}
This allows you to track and detect technologies that are unique to your environment or your use case.
Use Case 4: Monitor Technology Changes
By integrating techfinder into your monitoring tools, you can keep track of which technologies are being used on various websites over time. This could be useful for identifying when websites update their tech stack.
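import logging

logging.basicConfig(level=logging.INFO)
detector = Detector()
url = 'https://example.com'
detected_tech = detector.final_function(url)
# Record the result; run this script on a schedule (e.g., weekly) to track changes
logging.info("Detected technologies for %s: %s", url, detected_tech)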
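To turn periodic scans into a change report, you can compare the latest result with the previous one. The sketch below is one possible approach and not a techfinder feature. The helpers `diff_technologies` and `make_snapshot` and the sample lists are hypothetical; in practice the lists would come from `detector.final_function(url)`.

```python
import json
from datetime import datetime, timezone

def diff_technologies(previous, current):
    """Report which technologies appeared or disappeared between two scans."""
    prev, curr = set(previous), set(current)
    return {"added": sorted(curr - prev), "removed": sorted(prev - curr)}

def make_snapshot(url, technologies):
    """Bundle one scan result with a UTC timestamp so it can be stored as JSON."""
    return {
        "url": url,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "technologies": sorted(technologies),
    }

# Made-up results standing in for two weekly scans.
last_week = ["Apache", "jQuery"]
this_week = ["Nginx", "jQuery", "React"]

changes = diff_technologies(last_week, this_week)
print(json.dumps(changes))  # {"added": ["Nginx", "React"], "removed": ["Apache"]}

snapshot = make_snapshot("https://example.com", this_week)
print(snapshot["technologies"])  # ['Nginx', 'React', 'jQuery']
```

Storing each snapshot (for example, one JSON line per scan) gives you a history to diff against on the next run.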
Contributing
We welcome contributions to the techfinder library! If you'd like to report bugs, suggest new features, or help improve the documentation, feel free to open an issue or submit a pull request.