Grabs page & article titles from lists of URLs contained in files passed in as arguments
title_grabber
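
To give a feel for what "grabbing a page title" involves, here is a minimal standard-library sketch that pulls the text of the first `<title>` element out of an HTML string. It is not the package's actual implementation, which is not shown here: `TitleParser` and `extract_title` are hypothetical names, and article titles (as opposed to page titles) are not handled.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    @property
    def title(self) -> Optional[str]:
        # Collapse runs of whitespace and newlines into single spaces.
        text = " ".join("".join(self._chunks).split())
        return text or None


def extract_title(html: str) -> Optional[str]:
    """Return the page title found in `html`, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

Downloading the HTML for each URL is left out on purpose, so the sketch can be tried on any string without network access.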
Usage instructions
- Pass it one or more files containing URLs (one per line):
python -m title_grabber /abs/path/2/urls1.csv rel/path/2/urls2.csv
- Optionally, change the output file:
python -m title_grabber -o output.csv /abs/path/2/urls1.csv rel/path/2/urls2.csv
- See all available config options:
python -m title_grabber -h
usage: title_grabber [-h] [-o OUT_FILE] [--connect-timeout TIMEOUT]
                     [--read-timeout TIMEOUT] [-r RETRIES] [-t THREADS] [-d]
                     [FILES [FILES ...]]

positional arguments:
  FILES                 1 or more CSV files containing URLs (1 per line)

optional arguments:
  -h, --help            show this help message and exit
  -o OUT_FILE, --output OUT_FILE
                        Output file (defaults to out.csv)
  --connect-timeout TIMEOUT
                        HTTP connect timeout. Defaults to the value of the
                        CONNECT_TIMEOUT env var or 10
  --read-timeout TIMEOUT
                        HTTP read timeout. Defaults to the value of the
                        READ_TIMEOUT env var or 15
  -r RETRIES, --max-retries RETRIES
                        Max. # of times to retry failed HTTP reqs. Defaults to
                        the value of the MAX_RETRIES env var or 3
  -t THREADS, --max-threads THREADS
                        Max. # of threads to use. Defaults to the value of the
                        MAX_THREADS env var or the # of logical processors in
                        the system (8)
  -d, --debug           Log to STDOUT instead of to a file in the CWD.
                        Defaults to the value of the DEBUG env var or False
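
Most options above resolve in the same order: an explicit command-line flag wins, then the matching environment variable, then a built-in default. A minimal sketch of how that fallback could be wired up with `argparse` follows. It is an illustration under assumptions, not the package's real code: `env_int` and `build_parser` are hypothetical names, the timeouts are treated as whole seconds, and the `-d/--debug` flag is omitted for brevity.

```python
import argparse
import os


def env_int(name: str, default: int) -> int:
    """Return env var `name` as an int, or `default` if it is unset or empty."""
    raw = os.environ.get(name)
    return int(raw) if raw else default


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options in the help text (hypothetical)."""
    parser = argparse.ArgumentParser(prog="title_grabber")
    parser.add_argument("files", metavar="FILES", nargs="*")
    parser.add_argument("-o", "--output", metavar="OUT_FILE", default="out.csv")
    # Env vars are read when the parser is built; a flag on the CLI overrides them.
    parser.add_argument("--connect-timeout", metavar="TIMEOUT", type=int,
                        default=env_int("CONNECT_TIMEOUT", 10))
    parser.add_argument("--read-timeout", metavar="TIMEOUT", type=int,
                        default=env_int("READ_TIMEOUT", 15))
    parser.add_argument("-r", "--max-retries", metavar="RETRIES", type=int,
                        default=env_int("MAX_RETRIES", 3))
    parser.add_argument("-t", "--max-threads", metavar="THREADS", type=int,
                        default=env_int("MAX_THREADS", os.cpu_count() or 1))
    return parser
```

With this layout, running with `CONNECT_TIMEOUT=7` set in the environment and `--read-timeout 20` on the command line would yield a connect timeout of 7 and a read timeout of 20.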
Dev setup instructions
- Clone the project
git clone git@github.com:cristianrasch/title_grabber.git
- Create and activate a new virtual environment for it
cd title_grabber && python3 -m venv venv && source venv/bin/activate
- Install its dependencies
pip install -r requirements.txt
- Run the test suite to make sure everything is set up OK
python -m unittest discover -v -s title_grabber/tests/
Hashes for title_grabber-cristianrasch-0.1.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 2cbf9283d23bd0c5a58352ee3a9240fc852ced0278b092fccd6a9886a86f882a
MD5 | d6f7033de675621dde54d6dd694dbe6d
BLAKE2b-256 | 78afc656671c4043707e3ac095478bf24971f6ba0adfa1acfb8adfcd27255d03
Hashes for title_grabber_cristianrasch-0.1.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8b2ee475f5e62874880ca0beeb67e8a26b6f668ef354295440ea8ddc87d76713
MD5 | 56383476f80329409a74d59ed17a194d
BLAKE2b-256 | b50e2fa63276d66b8caf2b00972bd7f2908a324ec3838dee2d15d4eae43fa4cc
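
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only `hashlib`. `sha256_of` and `matches_digest` are hypothetical helper names, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `matches_digest("title_grabber-cristianrasch-0.1.2.tar.gz", "2cbf9283d23bd0c5a58352ee3a9240fc852ced0278b092fccd6a9886a86f882a")` should return `True` for an intact copy of the source distribution.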