Run cmd for every line of input
Project description
Free software: MIT license
Documentation: (COMING SOON!) https://cmdfor.readthedocs.org.
Features
A shell utility (and package) which runs a command for every line of input.
It can spawn an arbitrary number of concurrent threads and gives control over where each command’s output is kept.
In my daily work, I have to run all manner of commands on huge batches of items. These things are usually not CPU bound, so it makes sense to multithread these tasks.
Thus, I find myself writing bash one-liners such as the following, which takes an input file of items, splits it into equal(ish) parts, and then spawns a worker for each part, all the while keeping granular logs and return codes:
lines=`wc -l domains.txt | awk '{print $1}'`; threads=10; split=$(((lines/threads)+1)); mkdir -p in out; split -d -l ${split} domains.txt in/part. ; ls in/ | while read -r f; do cat in/${f} | while read -r d; do host -t a "${d}" > out/${d} 2>&1; echo -e "${d}\t$?"; done > log.${f} & echo ${!}; done > pids
That gets pretty tiring to type all the time. Why not use xargs -P, you say? Well, that works perfectly fine for cases where I don’t need to build very complicated commands and don’t need to log every return code. Maybe I can do all of that with xargs, but I wanted to make this anyway as a learning experience.
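For comparison, the pattern that the one-liner implements (one command per input item, several workers, one return code logged per item) can be sketched in a few lines of Python 3. This is only an illustration of the idea, not cmdfor’s actual implementation; the names run_one, run_all and format_log are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(command, item):
    """Run `command` with `item` appended as the final argument."""
    proc = subprocess.run(
        list(command) + [item],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like `2>&1`
    )
    return item, proc.returncode, proc.stdout


def run_all(command, items, threads=10):
    """Run the command once per item on a pool of worker threads.

    Results come back in input order because pool.map preserves ordering.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda item: run_one(command, item), items))


def format_log(results):
    """Build the same `item<TAB>returncode` log lines as the one-liner."""
    return "".join("{}\t{}\n".format(item, code) for item, code, _ in results)
```

Something like run_all(["host", "-t", "a"], hostnames) would then cover the lookup part of the one-liner. Threads are enough here because each worker spends its time waiting on a subprocess rather than doing CPU work.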
How-To
The program can take input from STDIN or from a file passed with the -i option.
All arguments that aren’t options are considered the subcommand to run. All wildcards {} are replaced with the corresponding positional field from the input data.
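The substitution rule can be illustrated with a small Python sketch. It is written under stated assumptions and is not taken from cmdfor’s source: the function name fill_placeholders is hypothetical, a line with no delimiter is treated as a single field, and running out of fields raises an error (cmdfor may well behave differently in both cases).

```python
def fill_placeholders(argv, line, delimiter=None):
    """Replace each {} in argv with the next field from the input line.

    Assumption: with no delimiter the whole line is one field.
    Assumption: more placeholders than fields is treated as an error.
    """
    fields = [line] if delimiter is None else line.split(delimiter)
    remaining = iter(fields)
    filled = []
    for arg in argv:
        # Split on the placeholder so substituted values are never rescanned.
        parts = arg.split("{}")
        out = parts[0]
        for part in parts[1:]:
            try:
                out += next(remaining) + part
            except StopIteration:
                raise ValueError("more {} placeholders than input fields")
        filled.append(out)
    return filled
```

With this sketch, the argument list imaplogin -u {} -p {} and the input line bob@example.com,hunter2 split on "," would yield imaplogin -u bob@example.com -p hunter2.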
To delete a list of files, basically the same behaviour as xargs:
cat files.txt | cmdfor rm {}
To run the fictional command imaplogin for every line of a CSV file that contains <email>,<password> fields, logging each individual command’s output to a file in the directory ./out:
cat email_users.csv | cmdfor -d, -o ./out -- imaplogin -u {} -p {}
To look up the IP addresses of a large number of hostnames, using 10 concurrent threads, and storing each individual command’s stdout and stderr in separate files in the directory ./results, with each file being named after the hostname on which the query was performed:
cat hostnames.txt | cmdfor -t 10 -Eo ./results -l 1 -- host -t a {}
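The idea of keeping each command’s stdout and stderr in separate, per-item files can be sketched in Python as follows. This is a rough illustration only: the function name run_and_store and the .stdout/.stderr file suffixes are assumptions made for the example, not cmdfor’s actual file layout.

```python
import os
import subprocess


def run_and_store(command, label, out_dir):
    """Run command, writing its stdout and stderr to separate files.

    The files are named after `label` (for example the hostname).
    Assumption: `label` contains no path separators.
    """
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, label + ".stdout")  # hypothetical suffix
    err_path = os.path.join(out_dir, label + ".stderr")  # hypothetical suffix
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        # subprocess.call returns the exit status of the child process.
        return subprocess.call(command, stdout=out, stderr=err)
```

Writing straight to files, rather than capturing output in memory, keeps the memory footprint flat even when a command produces a lot of output.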
To-Do
1. Come up with a real test case. Since this is a shell utility and really only deals with shell subcommands, I don’t know what will and won’t work on Travis CI (can I run a shell command there?).
2. By default, it suppresses all output from subprocesses and writes a message to STDOUT for each process spawn and reap. This output is too verbose for the default behaviour, so it should be toggled with -v. The default should be quieter and simpler, perhaps just the return code of each task.
3. Refactor some things to be a little less messy. The function signatures are huge, and messages are generated in odd places. I think it would be better to pass a context object.
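As a rough idea of what the context object from the third item could look like, here is a minimal Python 3 sketch. Everything in it is hypothetical: the class name RunContext, its fields (loosely modelled on the -t, -d, -o and -v options mentioned above) and the report helper are assumptions for illustration, not existing cmdfor code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RunContext:
    """Bundle of settings passed around instead of long argument lists."""
    command: List[str]
    threads: int = 1
    delimiter: Optional[str] = None
    out_dir: Optional[str] = None
    verbose: bool = False


def report(ctx, item, returncode):
    """Build the per-task message in one place, driven by the context.

    Quiet default: just `item<TAB>returncode`. Verbose: a longer message.
    """
    if ctx.verbose:
        return "reaped {!r}: {} exited with {}".format(
            item, " ".join(ctx.command), returncode
        )
    return "{}\t{}".format(item, returncode)
```

Functions would then take a single ctx argument, and the quieter default from the second item becomes a single check in one place.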
History
2018-04-09 v0.1.0 initial release, still need to do tests and docs
2018-04-10 v0.1.5 fixing import issues and removing <2.7 support
Hashes for cmdfor-0.1.5-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c25c8dcce2a15983baaa2179c51f3b07226c9830eef4c940788cf7ef12a76ab5
MD5 | b193058b8987e90d47e27bdc052388f3
BLAKE2b-256 | e4560098400caed87faf87d23be67269d4c4e4a00d1a2a193c7ab91d32c76318