Grabs page & article titles from lists of URLs contained in files passed in as arguments
title_grabber
Usage instructions
- Just feed it 1 or more files containing URLs (1 per line):
python -m title_grabber /abs/path/2/urls1.csv rel/path/2/urls2.csv
- Optionally, change the output file:
python -m title_grabber -o output.csv /abs/path/2/urls1.csv rel/path/2/urls2.csv
- See all available config options:
python -m title_grabber -h
usage: title_grabber [-h] [-o OUT_FILE] [--connect-timeout TIMEOUT]
                     [--read-timeout TIMEOUT] [-r RETRIES] [-t THREADS] [-d]
                     [FILES [FILES ...]]

positional arguments:
  FILES                 1 or more CSV files containing URLs (1 per line)

optional arguments:
  -h, --help            show this help message and exit
  -o OUT_FILE, --output OUT_FILE
                        Output file (defaults to out.csv)
  --connect-timeout TIMEOUT
                        HTTP connect timeout. Defaults to the value of the
                        CONNECT_TIMEOUT env var or 10
  --read-timeout TIMEOUT
                        HTTP read timeout. Defaults to the value of the
                        READ_TIMEOUT env var or 15
  -r RETRIES, --max-retries RETRIES
                        Max. # of times to retry failed HTTP reqs. Defaults to
                        the value of the MAX_RETRIES env var or 3
  -t THREADS, --max-threads THREADS
                        Max. # of threads to use. Defaults to the value of the
                        MAX_THREADS env var or the # of logical processors in
                        the system (8)
  -d, --debug           Log to STDOUT instead of to a file in the CWD.
                        Defaults to the value of the DEBUG env var or False
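Each input file is expected to hold one URL per line. The sketch below is an illustration, not the project's implementation: it shows one way to read such files and pull the `<title>` text out of an HTML page using only the standard library's `html.parser`. The names `read_urls`, `TitleParser` and `extract_title` are hypothetical, and the actual page download (with the connect/read timeouts and retries described above) is left out so the example runs offline.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    @property
    def title(self):
        # Collapse newlines and runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def extract_title(html):
    """Return the first <title> text in `html`, or "" if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def read_urls(paths):
    """Yield stripped, non-blank lines (one URL each) from every file in `paths`."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                url = line.strip()
                if url:
                    yield url
```

Skipping blank lines and surrounding whitespace is an assumed convenience; the real tool may treat malformed lines differently.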
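The help text above describes a three-level precedence for most options: an explicit flag wins, then an environment variable, then a built-in default. A minimal `argparse` sketch of that pattern follows, assuming the env var names from the help text (`CONNECT_TIMEOUT`, `READ_TIMEOUT`, `MAX_RETRIES`, `MAX_THREADS`, `DEBUG`). The helpers `env_default`, `env_flag` and `build_parser` are hypothetical, and how the real tool parses a `DEBUG` value is an assumption.

```python
import argparse
import os


def env_default(name, fallback, cast=int):
    """Return env var `name` converted with `cast`, or `fallback` if it is unset."""
    value = os.environ.get(name)
    return cast(value) if value is not None else fallback


def env_flag(name):
    """Assumed convention: "1", "true" or "yes" (any case) enable the flag."""
    return os.environ.get(name, "").strip().lower() in {"1", "true", "yes"}


def build_parser():
    # Env vars are read here, when the parser (and its defaults) is built.
    parser = argparse.ArgumentParser(prog="title_grabber")
    parser.add_argument("files", nargs="*", metavar="FILES",
                        help="1 or more CSV files containing URLs (1 per line)")
    parser.add_argument("-o", "--output", dest="out_file", metavar="OUT_FILE",
                        default="out.csv")
    parser.add_argument("--connect-timeout", type=float, metavar="TIMEOUT",
                        default=env_default("CONNECT_TIMEOUT", 10, float))
    parser.add_argument("--read-timeout", type=float, metavar="TIMEOUT",
                        default=env_default("READ_TIMEOUT", 15, float))
    parser.add_argument("-r", "--max-retries", type=int, metavar="RETRIES",
                        default=env_default("MAX_RETRIES", 3))
    # os.cpu_count() counts logical processors and can return None.
    parser.add_argument("-t", "--max-threads", type=int, metavar="THREADS",
                        default=env_default("MAX_THREADS", os.cpu_count() or 1))
    parser.add_argument("-d", "--debug", action="store_true",
                        default=env_flag("DEBUG"))
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because the defaults are computed from the environment, `READ_TIMEOUT=30 python -m title_grabber urls.csv` and `python -m title_grabber --read-timeout 30 urls.csv` would resolve to the same value under this scheme.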
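The `--max-retries` and `--max-threads` options suggest bounded retries and a pool of worker threads. One way to combine them is sketched below with `concurrent.futures.ThreadPoolExecutor`; this is an assumption about the approach, not the project's code. `fetch` stands in for whatever function actually downloads a page and returns its title, `with_retries` and `grab_all` are hypothetical names, and "retry" is taken to mean up to `max_retries` extra attempts after the first failure.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def with_retries(fetch, url, max_retries=3):
    """Call fetch(url), retrying up to `max_retries` more times if it raises."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception:
            # A real client would catch only network/timeout errors here.
            if attempt == max_retries:
                raise


def grab_all(urls, fetch, max_threads=None, max_retries=3):
    """Map each URL to fetch()'s result, or to None once its retries are used up."""
    urls = list(urls)

    def safe_fetch(url):
        try:
            return with_retries(fetch, url, max_retries)
        except Exception:
            return None

    # Default to the number of logical processors, as the help text describes.
    workers = max_threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Catching exceptions inside the worker keeps one bad URL from aborting the whole batch, since `pool.map` re-raises a worker's exception when that result is consumed.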
Dev setup instructions
- Clone the project
git clone git@github.com:cristianrasch/title_grabber.git
- Create a new virtual environment for it and activate it
cd title_grabber && python3 -m venv venv && source venv/bin/activate
- Install its dependencies
pip install -r requirements.txt
- Run the test suite to make sure everything is set up OK
python -m unittest discover -v -s title_grabber/tests/
Download files
Hashes for title_grabber-cristianrasch-0.1.1.tar.gz (source distribution)
Algorithm | Hash digest
---|---
SHA256 | 19a970a78fc438ef944464e7b8d9178fdee6720f6039c64381e4d4e336d36c90
MD5 | fb8a58ef60e1eee76be2f79d74141804
BLAKE2b-256 | 429a6011e9b20237912527971f613edb56c5945b989adcc0c4ec24846ac6de26
Hashes for title_grabber_cristianrasch-0.1.1-py3-none-any.whl (built distribution)
Algorithm | Hash digest
---|---
SHA256 | fd479f3efe758dc9d467cd3a4864d88072884a2f034238965a975ac44d235d59
MD5 | 5cd8da43337a2e2b4eb03616654abd41
BLAKE2b-256 | 71361561ff875fdbba6174f6ef9547c387114d833387f70a32b37c586f17567f
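To check a downloaded distribution against the digests listed above, you can hash the file locally. A minimal sketch using `hashlib`; the function names `file_digest_hex` and `verify` and the script name `verify.py` are hypothetical, and "BLAKE2b-256" is taken to mean BLAKE2b with a 32-byte digest.

```python
import hashlib
import sys


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    if algorithm == "blake2b-256":
        # BLAKE2b configured for a 32-byte (256-bit) digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, expected_sha256):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return file_digest_hex(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Usage: python verify.py title_grabber-cristianrasch-0.1.1.tar.gz <sha256>
    print(verify(sys.argv[1], sys.argv[2]))
```

SHA256 is the digest to prefer here; MD5 is listed on the page but is not collision-resistant.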