Submit jobs to LSF with Python

Python wrapper to submit jobs to bsub (and later qsub)

@brentp, @brwnj


>>> from bsub import bsub
>>> sub = bsub("some_job", R="rusage[mem=1]", verbose=True)

# submit a job by calling the sub object with the command to run.
# the returned object's job_id attribute holds the numeric job id.
>>> print(sub("date").job_id.isdigit())

# 2nd argument can be a shell script, in which case
# the call() is empty.
#>>> bsub("somejob", "", verbose=True)()

# dependencies:
>>> job_id = bsub("sleeper", verbose=True)("sleep 2").job_id
>>> bsub.poll(job_id)



For file jobs, we can emulate shell syntax:


job = bsub('my-job') < ''

Same for text commands:


"echo hello" | bsub('other-job')
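
Both shell-like forms can be built on Python operator overloading. The following is a minimal sketch of that idea, not the library's implementation: `FakeSub` is a hypothetical class that only records the command line it would run, and `myjob.sh` is a placeholder file name.

```python
class FakeSub(object):
    """Hypothetical stand-in for bsub: records a command, submits nothing."""

    def __init__(self, job_name):
        self.job_name = job_name
        self.command = None

    def __lt__(self, script_path):
        # FakeSub('name') < 'script' mimics: bsub -J name < script
        self.command = "bsub -J %s < %s" % (self.job_name, script_path)
        return self

    def __ror__(self, text):
        # "cmd" | FakeSub('name') mimics: echo 'cmd' | bsub -J name
        # (str defines no __or__, so Python falls back to our __ror__)
        self.command = "echo '%s' | bsub -J %s" % (text, self.job_name)
        return self


job = FakeSub("my-job") < "myjob.sh"
other = "echo hello" | FakeSub("other-job")
print(job.command)
print(other.command)
```

Returning `self` from both operators keeps the result usable as a job object, which is what lets the expression be assigned as in `job = ... < ...`.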



It's possible to specify dependencies to LSF using a flag like:

bsub -w 'done("other-name")' < myjob

We make this more pythonic with:


>>> j = sub('sleep 1').then('sleep 2')

which will wait for the first job `sleep 1` to complete
before running the second job `sleep 2`. These can be chained as:


j = sub('myjob')
j2 = j('sleep 1')
j3 = j2.then('echo "hello"')
j4 = j3.then('echo "world"')
j5 = j4.then('my scripts.p')

# or:

j('sleep 1').then('echo "hello"').then('echo "world"')

Each job passed to `.then()` is not run until the preceding job
is `done()` according to LSF.
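
One way to picture the chaining is that each `.then()` call submits a new job carrying a `-w 'done(<job id>)'` dependency on the job before it. The sketch below models this under stated assumptions: `ChainSketch` is a hypothetical class, job ids are faked with a counter, and nothing is sent to LSF. It is not the library's actual code.

```python
import itertools

_fake_ids = itertools.count(1000)  # pretend LSF job ids (assumption)


class ChainSketch(object):
    """Hypothetical model of .then(): builds command lines, runs nothing."""

    def __init__(self, job_name, log=None):
        self.job_name = job_name
        self.job_id = None
        # all jobs in one chain share the same log of command lines
        self.log = log if log is not None else []

    def __call__(self, command, depends_on=None):
        flags = "-J %s" % self.job_name
        if depends_on is not None:
            # LSF dependency expression: start only after that job is done
            flags += " -w 'done(%s)'" % depends_on
        self.log.append("echo '%s' | bsub %s" % (command, flags))
        self.job_id = str(next(_fake_ids))
        return self

    def then(self, command, job_name=None):
        # the next job depends on the id of the job just "submitted"
        nxt = ChainSketch(job_name or self.job_name, self.log)
        return nxt(command, depends_on=self.job_id)


j = ChainSketch("myjob")("sleep 1").then('echo "hello"').then('echo "world"')
for line in j.log:
    print(line)
```

Because every call returns a job object with its own `job_id`, the chain can be extended indefinitely, and each link only needs to know its immediate predecessor.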

Bioinformatics example of chaining:

This submits jobs for positive- and negative-strand coverage in parallel,
while the jobs for each strand run serially.


from bsub import bsub

submit = bsub("bam2bg", verbose=True)

# convert bam to stranded bg then bw
sample = "subject_1"
chrom_sizes = "chrom_sizes.txt"
bam = "%s.bam" % sample  # assumed path to the input BAM file

# submit jobs by strand for parallel processing
for symbol, strand in zip(["+", "-"], ["pos", "neg"]):

    bigwig = "" % (sample, strand)
    bedgraph = "%s_%s.bedgraph" % (sample, strand)

    bam_to_bg = ("bedtools genomecov -strand %s -bg "
                 "-ibam %s | bedtools sort -i - > %s") % (symbol, bam, bedgraph)
    bg_to_bw = "bedGraphToBigWig %s %s %s" % (bedgraph, chrom_sizes, bigwig)
    gzip_bg = "gzip -f %s" % bedgraph

    # process strand-based steps serially
    # submit first 2 jobs to default queue; final job to 'gzip' queue
    submit(bam_to_bg).then(bg_to_bw, job_name="bg2bw").then(gzip_bg, "gzipbg", q='gzip')



Use the command line to run jobs with automatically named error and output files:

echo "hello" | python -m bsub -J "fake"

which runs the equivalent of:

bsub -J fake -e fake.%J.err -o fake.%J.out < /tmp/

If a log/ directory exists, the logs will be placed there.

The shell script is automatically created and cleaned up after use.
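
The file-naming rule described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper name `log_paths` and its `log_dir` parameter are hypothetical. The `%J` placeholder is left untouched because LSF itself expands it to the job id.

```python
import os


def log_paths(job_name, log_dir="log"):
    """Return (err, out) file patterns for a job.

    Files go under log_dir when that directory exists, otherwise into
    the current directory. %J is expanded by LSF to the job id.
    """
    prefix = log_dir + "/" if os.path.isdir(log_dir) else ""
    err = "%s%s.%%J.err" % (prefix, job_name)
    out = "%s%s.%%J.out" % (prefix, job_name)
    return err, out


err, out = log_paths("fake", log_dir="no-such-directory")
print("bsub -J fake -e %s -o %s" % (err, out))
```

With a directory that does not exist, as here, the patterns fall back to `fake.%J.err` and `fake.%J.out` in the working directory.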

