Grabs page & article titles from lists of URLs contained in files passed in as arguments
title_grabber
Usage instructions
- Just feed it 1 or more files containing URLs (1 per line)
python -m title_grabber /abs/path/2/urls1.csv rel/path/2/urls2.csv
- Optionally, change the output file:
python -m title_grabber -o output.csv /abs/path/2/urls1.csv rel/path/2/urls2.csv
- See all available config options:
python -m title_grabber -h
usage: title_grabber [-h] [-o OUT_FILE] [--connect-timeout TIMEOUT]
                     [--read-timeout TIMEOUT] [-r RETRIES] [-t THREADS] [-d]
                     [FILES [FILES ...]]

positional arguments:
  FILES                 1 or more CSV files containing URLs (1 per line)

optional arguments:
  -h, --help            show this help message and exit
  -o OUT_FILE, --output OUT_FILE
                        Output file (defaults to out.csv)
  --connect-timeout TIMEOUT
                        HTTP connect timeout. Defaults to the value of the
                        CONNECT_TIMEOUT env var or 10
  --read-timeout TIMEOUT
                        HTTP read timeout. Defaults to the value of the
                        READ_TIMEOUT env var or 15
  -r RETRIES, --max-retries RETRIES
                        Max. # of times to retry failed HTTP reqs. Defaults to
                        the value of the MAX_RETRIES env var or 3
  -t THREADS, --max-threads THREADS
                        Max. # of threads to use. Defaults to the value of the
                        MAX_THREADS env var or the # of logical processors in
                        the system (8)
  -d, --debug           Log to STDOUT instead of to a file in the CWD.
                        Defaults to the value of the DEBUG env var or False
  -V, --version         Print program version and exit
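The "env var or built-in default" behaviour described in the help text above can be sketched with the standard library alone. This is an illustration, not the project's actual source: the helper names (`env_number`, `env_flag`, `build_parser`), the `dest` names, the float type for timeouts, and the set of strings treated as true for DEBUG are all assumptions. The -V option is left out for brevity.

```python
import argparse
import os


def env_number(name, default, cast=int):
    """Return env var `name` converted with `cast`, or `default` if unset."""
    return cast(os.environ.get(name, default))


def env_flag(name):
    """Treat a few common spellings as true (assumed convention)."""
    return os.environ.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def build_parser():
    # Hypothetical reconstruction of the options listed in the help output.
    parser = argparse.ArgumentParser(prog="title_grabber")
    parser.add_argument("files", metavar="FILES", nargs="*",
                        help="1 or more CSV files containing URLs (1 per line)")
    parser.add_argument("-o", "--output", dest="out_file", metavar="OUT_FILE",
                        default="out.csv", help="Output file")
    parser.add_argument("--connect-timeout", metavar="TIMEOUT", type=float,
                        default=env_number("CONNECT_TIMEOUT", 10, float))
    parser.add_argument("--read-timeout", metavar="TIMEOUT", type=float,
                        default=env_number("READ_TIMEOUT", 15, float))
    parser.add_argument("-r", "--max-retries", metavar="RETRIES", type=int,
                        default=env_number("MAX_RETRIES", 3))
    # os.cpu_count() reports logical processors; it may return None.
    parser.add_argument("-t", "--max-threads", metavar="THREADS", type=int,
                        default=env_number("MAX_THREADS", os.cpu_count() or 1))
    parser.add_argument("-d", "--debug", action="store_true",
                        default=env_flag("DEBUG"))
    return parser
```

Under these assumptions, a flag given on the command line wins over the environment variable, which in turn wins over the built-in default, because the environment is only consulted when computing each option's default.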
Dev setup instructions
- Clone the project
git clone git@github.com:cristianrasch/title_grabber.git
- Create and activate a new virtual environment for it (POSIX shell shown)
cd title_grabber && python3 -m venv venv && source venv/bin/activate
- Install its dependencies
pip install -r requirements.txt
- Run the test suite to make sure everything is set up OK
python -m unittest discover -v -s title_grabber/tests/
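For orientation, here is the general shape of a test module that the discover command above would pick up (by default, files named test*.py). It is a self-contained sketch, not one of the project's real tests: `extract_title` and `_TitleParser` are hypothetical helpers built on the standard library's `html.parser`, and the project's own title extraction may work differently.

```python
import unittest
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Hypothetical helper: return the page title, or None if there is none."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so multi-line titles become one line.
    title = " ".join("".join(parser.parts).split())
    return title or None


class ExtractTitleTest(unittest.TestCase):
    def test_returns_page_title(self):
        html = "<html><head><title> Hello,\n World </title></head></html>"
        self.assertEqual(extract_title(html), "Hello, World")

    def test_returns_none_without_title(self):
        self.assertIsNone(extract_title("<html><body>no title</body></html>"))
```

Keeping the parsing logic in a plain function that takes an HTML string means tests like these need no network access.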