A package for testing user agents on specific websites
UserAgentFilter
UserAgentFilter is a Python package for testing user agents against a specific website. It helps identify which user agents are usable for web scraping or automated testing by keeping those that succeed and discarding those that fail.
Key Features
- Tests a list of user agents against a specified website.
- Supports optional proxy configuration.
- Handles errors and retries for transient issues.
- Adds random delays between requests to mimic human browsing behavior.
- Writes results to a text file for easy review.
Prerequisites
- Python 3.6 or higher
- requests library
- beautifulsoup4 library
Installation
You can install UserAgentFilter via pip. Run the following command:
pip install useragentfilter
Usage
To use UserAgentFilter, follow these steps:
- Import the Package: First, import the UserAgentTester class from the package.
from UserAgentFilter import UserAgentTester
- Initialize the UserAgentTester: Create an instance of the UserAgentTester class. You must specify the URL of the website you want to test the user agents against. Optionally, you can provide proxy settings, a timeout period, the number of retries, and a range for random delays between requests.
tester = UserAgentTester(
    test_url='https://www.example.com',  # The URL to test user agents against
    proxy={'http': 'http://your_proxy:port', 'https': 'https://your_proxy:port'},  # Optional proxy settings
    timeout=10,  # Timeout for each request in seconds
    max_retries=3,  # Number of retries for each request
    delay_range=(3, 8)  # Random delay range between requests in seconds
)
- Prepare a List of User Agents: Prepare a text file containing a list of user agents, with each user agent on its own line. For example, save the following content to tests/user_agents.txt:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Edge/91.0.864.64
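If you would rather generate the input file from Python than create it by hand, the following is a minimal sketch that writes one user agent per line, as described above. The helper name write_user_agents is hypothetical and is not part of UserAgentFilter; it only uses the standard library.

```python
from pathlib import Path

# The three sample user agents listed above.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Edge/91.0.864.64",
]


def write_user_agents(path, user_agents):
    """Hypothetical helper: write one user agent per line and return the count written."""
    path = Path(path)
    # Create the parent directory (for example tests/) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Drop blank entries and surrounding whitespace so each line holds one user agent.
    cleaned = [ua.strip() for ua in user_agents if ua.strip()]
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)
```

Calling write_user_agents('tests/user_agents.txt', USER_AGENTS) would produce the file used in the next step.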
- Filter User Agents: Call the filter_user_agents method to filter the user agents. This method takes two arguments: the path to the input file containing user agents and the path to the output file where the filtered user agents will be saved.
tester.filter_user_agents(
    user_agents_file='tests/user_agents.txt',  # Path to the input file with user agents
    output_file='filtered_user_agents.txt'  # Path to the output file to save the filtered user agents
)
- Review the Results: After the filtering process is complete, the successful user agents are saved to the specified output file (filtered_user_agents.txt). You can review this file to see which user agents passed the test.
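To review the results programmatically, a sketch along these lines could compare the input and output files. It assumes the output file lists one successful user agent per line, written exactly as it appears in the input file; the README does not specify the output format, so verify this against your own output first. The names load_user_agents and summarize_results are hypothetical and are not part of UserAgentFilter.

```python
from pathlib import Path


def load_user_agents(path):
    """Return the non-empty, stripped lines of a user-agent file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def summarize_results(input_file, output_file):
    """Compare tested and passed user agents (assumes one user agent per line in both files)."""
    tested = load_user_agents(input_file)
    passed = load_user_agents(output_file)
    passed_set = set(passed)
    # Anything tested but absent from the output file is treated as failed.
    failed = [ua for ua in tested if ua not in passed_set]
    return {"tested": len(tested), "passed": len(passed), "failed": failed}
```

For example, summarize_results('tests/user_agents.txt', 'filtered_user_agents.txt') returns the counts together with the list of user agents that did not make it into the output file.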
Example Workflow
Here’s a complete example of the entire workflow:
from UserAgentFilter import UserAgentTester

# Define the target URL and proxy settings (if needed)
test_url = 'https://www.example.com'
proxy = {'http': 'http://your_proxy:port', 'https': 'https://your_proxy:port'}

# Create an instance of UserAgentTester
tester = UserAgentTester(
    test_url=test_url,
    proxy=proxy,
    timeout=10,
    max_retries=3,
    delay_range=(3, 8)
)

# Filter user agents from the input file and save the successful ones to the output file
tester.filter_user_agents(
    user_agents_file='tests/user_agents.txt',
    output_file='filtered_user_agents.txt'
)

print("User agents have been filtered and saved to 'filtered_user_agents.txt'")
Additional Tips
- Error Handling: The UserAgentTester handles various errors such as connection timeouts and HTTP errors. It retries requests up to the specified max_retries before giving up on a user agent.
- Random Delays: The delay_range parameter introduces random delays between requests to mimic human browsing behavior, which can reduce the chance of detection when testing many user agents.
- Proxy Configuration: If you need to use a proxy, make sure to provide the correct proxy settings in the proxy dictionary. The dictionary should include keys for http and https proxies.
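The retry and delay behavior described above can be pictured with the sketch below. It is not the package's actual implementation, which this README does not show. The function try_user_agent and its fetch and sleep parameters are hypothetical: fetch stands in for any callable that performs the request with the given user agent and returns the HTTP status code, and treating only status 200 as success is an assumption.

```python
import random
import time


def try_user_agent(fetch, user_agent, max_retries=3, delay_range=(3, 8), sleep=time.sleep):
    """Return True if fetch(user_agent) yields status 200 within max_retries attempts.

    fetch is a hypothetical callable returning an HTTP status code; it may raise
    OSError subclasses such as TimeoutError or ConnectionError on transient failures.
    """
    for attempt in range(1, max_retries + 1):
        try:
            if fetch(user_agent) == 200:
                return True
        except OSError:
            # Treat timeouts and connection problems as transient and retry.
            pass
        if attempt < max_retries:
            # Wait a random time within delay_range before the next attempt.
            sleep(random.uniform(*delay_range))
    return False
```

Passing sleep as a parameter keeps the sketch easy to test, since a test can record the delays instead of actually waiting.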
Configuration Options
- test_url: The URL of the website to test user agents against.
- proxy: A dictionary containing proxy settings (optional). Consider using a proxy if requests fail with 403 Forbidden errors.
- timeout: The maximum amount of time to wait for a response (in seconds). Default value is 10.
- max_retries: The number of times to retry a request in case of transient errors. Default value is 3.
- delay_range: A tuple specifying the range (in seconds) for random delays between requests. Default value is (3, 8).
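Before starting a long run, it can help to sanity-check the option values listed above. The sketch below is a hypothetical pre-flight helper, not part of UserAgentFilter; the specific rules it enforces (an absolute http or https URL, both proxy keys present, positive timeout, non-negative retries, an ordered delay range) are assumptions about reasonable values rather than checks the package is known to perform.

```python
from urllib.parse import urlparse


def check_options(test_url, proxy=None, timeout=10, max_retries=3, delay_range=(3, 8)):
    """Hypothetical helper: return a list of problems found in the options (empty if none)."""
    problems = []
    parsed = urlparse(test_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("test_url must be an absolute http(s) URL")
    if proxy is not None:
        # The proxy dictionary is expected to carry both an http and an https entry.
        missing = [key for key in ("http", "https") if key not in proxy]
        if missing:
            problems.append("proxy is missing keys: " + ", ".join(missing))
    if timeout <= 0:
        problems.append("timeout must be positive")
    if max_retries < 0:
        problems.append("max_retries must not be negative")
    low, high = delay_range
    if low < 0 or low > high:
        problems.append("delay_range must satisfy 0 <= low <= high")
    return problems
```

An empty list means the values look plausible; otherwise each entry names one option to fix before constructing UserAgentTester.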
Contributing
Contributions are welcome! If you would like to contribute to UserAgentFilter, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Commit your changes.
- Push your branch and create a pull request.
License
UserAgentFilter is licensed under the MIT License. See the LICENSE file for more information.
Contact
If you have any questions, suggestions, or issues, please feel free to contact us at shahana50997@gmail.com or ambilybiju2408@gmail.com.